A second key programming tool is the for loop.
A for loop is a structure used to execute a set of code
repeatedly.
The for statement specifies an index and the set of values over
which the loop runs.
For example, here I'll execute a for loop using an
index called i.
The object, i, will start the loop by taking the first value
in the 1 through 10 vector.
That is, to start, i will equal 1.
Next, the for loop will execute.
Then i will take the next value in this vector, which is
2, and the loop will execute again.
This will continue until i has taken the last value in the
vector, 10, and the code executes one last time.
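The iteration just described can be sketched in R as follows. This is a minimal illustration only: print(i) is a placeholder body assumed here so the loop has something to do, not the code used in the video.

```r
# The index i takes each value of the vector 1:10 in turn.
# print(i) is a stand-in body (an assumption for illustration).
for (i in 1:10) {
  print(i)
}
```

After the loop finishes, i still holds the last value it took, which is 10.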
In this set of code, the value i squared will be appended to
the vector, x, using the append function.
Here I got an error.
I'll take a closer look.
I can see that x never actually existed, so there was
nothing to append onto in the first iteration.
To fix this, I'll just initialize x as an empty
vector using the concatenate function, but leaving the
arguments empty.
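A sketch of the corrected code, assuming the loop body is written with append as described above:

```r
x <- c()                 # empty vector, so there is something to append onto

for (i in 1:10) {
  x <- append(x, i^2)    # add i squared to the end of x
}

x                        # display the result
```

Without the first line, the first call to append fails because x does not exist yet.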
OK, this runs well.
Look at x.
In each iteration the value of i squared was appended
to the end of x.
So the first element was created when i was 1, the
second when i was 2, and so on.
OK, I've done something pretty cool here.
I've done 10 calculations using a for loop, and it
wouldn't be hard to do many more with
the same set of code.
For example, i could easily go from 1 to 100, rather than
just 1 to 10.
While there are other, better ways to do this particular
calculation, there are instances where a for loop is
very useful.
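The video does not say which better way it has in mind. One possibility, shown here as an assumption, is R's vectorized arithmetic, which squares every element of a vector in a single expression:

```r
# Vectorized alternative (an assumption; the video does not name one).
# Squares every value from 1 to 100 at once, with no loop and no append.
x <- (1:100)^2

length(x)   # number of values computed
```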
All right, one more look at the stock data.
To calculate the smallest and largest values for each stock
in the stock data set, I'm going to start by creating an
object called the.tickers that is just a vector of the unique
stock tickers in the data.
Since for loops can iterate over any vector, I will write
a for loop to iterate over the object that I've called
the.tickers.
It's sometimes helpful to also give a meaningful name to the
index, so I'm going to change the index i to ticker.
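A sketch of this setup, using a small made-up data frame in place of the real stocks data. The column name ticker and all of the values are assumptions for illustration, and print(ticker) is a placeholder body.

```r
# Hypothetical stand-in for the stocks data set (names and values assumed).
stocks <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "BBB", "CCC"),
  low    = c(10.0, 9.5, 20.0, NA, 5.0),
  high   = c(11.0, 10.5, 21.0, 22.0, 6.0),
  stringsAsFactors = FALSE
)

# One entry per distinct ticker in the data.
the.tickers <- unique(stocks$ticker)

# The index is named ticker rather than i, so the body reads more clearly.
for (ticker in the.tickers) {
  print(ticker)
}
```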
Now I need to create code for the general case.
For a given ticker, calculate both the low
and the high value.
I'll start by identifying which rows are of interest in
the stocks data set.
The vector called look.at is a Boolean vector indicating
which observations belong to the ticker
for the given iteration.
Next, I can create two statements to calculate the
lowest low and highest high of these observations.
Finally, I need to store these values somehow.
I can start by initializing two objects, lows and highs.
Next, I can use an append command to append a value on
to the end of the vector.
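Put together, the loop might look like the sketch below. The data frame is a made-up stand-in that deliberately includes one missing value; the column names ticker and high and the helper names lowest.low and highest.high are assumptions, while the.tickers, look.at, lows, and highs follow the description above.

```r
# Hypothetical stand-in for the stocks data set (names and values assumed).
stocks <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "BBB", "CCC"),
  low    = c(10.0, 9.5, 20.0, NA, 5.0),
  high   = c(11.0, 10.5, 21.0, 22.0, 6.0),
  stringsAsFactors = FALSE
)

the.tickers <- unique(stocks$ticker)

lows  <- c()   # will hold one lowest low per ticker
highs <- c()   # will hold one highest high per ticker

for (ticker in the.tickers) {
  look.at <- stocks$ticker == ticker          # TRUE for rows of this ticker

  lowest.low   <- min(stocks$low[look.at])    # lowest low for this ticker
  highest.high <- max(stocks$high[look.at])   # highest high for this ticker

  lows  <- append(lows, lowest.low)
  highs <- append(highs, highest.high)
}

lows
highs
```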
All right, I can run the code and print the results, but
something's wrong.
While I'd want to spot check some of my data anyway,
here NA values have shown up in the results.
A value of NA in R means that a value is missing, and, more
generally, oftentimes functions will return NA if
any of the observations are missing.
If I took a look through our data set, I'd find that there
are several observations with missing (NA) values.
Here, I've checked how many entries in the column low of
the stocks data set are missing.
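The transcript does not show the command used for this check. One common way to do it, assumed here, is to combine is.na with sum, shown on a small made-up vector standing in for the low column:

```r
# Hypothetical stand-in for the low column of the stocks data (values assumed).
stocks <- data.frame(low = c(10.0, 9.5, 20.0, NA, 5.0))

# is.na() gives TRUE for each missing entry; sum() counts the TRUEs.
n.missing <- sum(is.na(stocks$low))
n.missing
```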
In many functions, such as min and max, there's an optional
extra argument that is useful for ignoring missing data, the
na.rm argument, which I will set to TRUE in
the min and max functions.
Now when I re-run the code, I get sensible results.
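A sketch of the loop with na.rm = TRUE added to min and max, again on made-up data with assumed column names:

```r
# Hypothetical stand-in for the stocks data set (names and values assumed).
stocks <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "BBB", "CCC"),
  low    = c(10.0, 9.5, 20.0, NA, 5.0),
  high   = c(11.0, 10.5, 21.0, 22.0, 6.0),
  stringsAsFactors = FALSE
)

the.tickers <- unique(stocks$ticker)
lows  <- c()
highs <- c()

for (ticker in the.tickers) {
  look.at <- stocks$ticker == ticker

  # na.rm = TRUE drops missing values before taking the min or max.
  lows  <- append(lows,  min(stocks$low[look.at],  na.rm = TRUE))
  highs <- append(highs, max(stocks$high[look.at], na.rm = TRUE))
}

lows
highs
```

Note that R is case sensitive, so the argument must be written na.rm and the value TRUE.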
I'd want to look at the data more carefully to see why some
observations are missing, but I'll leave this as a topic for
another set of videos.
One final word:
even in this example, there are other, better functions
that could have been used to get the same
results much more quickly.
This would be important for code that should be
implemented efficiently, and I'll get to these functions in
the future.
However, for the beginning R programmer, it's sometimes
easier and clearer to simply implement a for loop.
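The video does not name the faster functions it refers to. As one possibility only, base R's tapply can apply min or max within each group in a single call. The data frame and its column names below are made up for illustration.

```r
# Hypothetical stand-in for the stocks data set (names and values assumed).
stocks <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "BBB", "CCC"),
  low    = c(10.0, 9.5, 20.0, NA, 5.0),
  high   = c(11.0, 10.5, 21.0, 22.0, 6.0),
  stringsAsFactors = FALSE
)

# tapply splits the first argument by the second and applies the function
# to each group; na.rm = TRUE is passed through to min and max.
lows  <- tapply(stocks$low,  stocks$ticker, min, na.rm = TRUE)
highs <- tapply(stocks$high, stocks$ticker, max, na.rm = TRUE)

lows
highs
```

The results are named by ticker, so a single value can be looked up with, for example, lows[["BBB"]].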