There are two other functions I'll consider that provide a
brief overview of a data frame.
But first, I'll load in a data set:
this time, stock market data.
I've already navigated to the proper folder, and I can see
the file in my current working directory if I use the
list.files function.
This particular data set is saved as a tab-delimited text
file, so to import it into R, I'm going to use the
read.delim function.
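
As a rough sketch of this step: the file name and contents below are hypothetical stand-ins, written to a temporary folder only so the example runs on its own. They are not the actual stock market file.

```r
# Hypothetical tab-delimited file with invented values, for illustration only.
path <- file.path(tempdir(), "stocks_example.txt")
writeLines(c("company\tdate\tprice",
             "AAA\t03/01/2012\t10.5",
             "BBB\t03/01/2012\t20.1"), path)

# Check that the file is in the folder we expect.
list.files(tempdir(), pattern = "stocks_example")

# read.delim expects tab-separated fields and a header row by default.
stocks <- read.delim(path)
stocks
```

With a real file in the working directory, the call would simply take that file's name in place of `path`.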
In the last video, we saw how to load in CSV files, and in
this video we've now seen how to load in
tab-delimited files.
If you aren't sure how to load in your particular data set,
Google your question.
There are many online resources about loading data
into R that are likely to be very useful.
All right, back to the data set.
I'll take a look at the first and the last three rows.
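
A minimal sketch of peeking at both ends of a data frame, assuming the head and tail functions and a small invented data frame in place of the real stocks data:

```r
# Toy data frame standing in for the stocks data (values are invented).
stocks <- data.frame(id = 1:10, price = seq(10, 19))

head(stocks, 3)   # first three rows
tail(stocks, 3)   # last three rows
```

Leaving out the second argument returns six rows, which is the default for both functions.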
I can also get a better overview of the data set by
using the str function, which provides a breakdown of the
object's structure.
Here, I can see that stocks is an object with over 70,000
rows and eight variables.
I can also see the name of each variable
and the first several observations to get a sense of
what each contains.
Note that str shows factor variables by their integer codes, so
their observations may look like numerical variables.
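
Here is a small sketch of what that looks like, using an invented two-column data frame rather than the real stocks data:

```r
# Toy data frame (invented values) with one factor and one numeric column.
toy <- data.frame(
  company = factor(c("AAA", "BBB", "AAA")),
  price   = c(10.5, 20.1, 11.2)
)

# str lists each column with its type and first few observations.
# The factor column is shown with its levels followed by integer codes.
str(toy)

# These are the integer codes that str displays for the factor.
as.integer(toy$company)   # 1 2 1
```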
Another helpful function to get an alternative look at an
R object is the summary function.
In the case of a data frame, the summary function returns a
summary of each column.
Notice that there are NA values represented in some of
the columns.
In R, NA means that there's a missing observation, and here
it lists the number of missing observations for
each of these columns.
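
A minimal sketch of how missing values show up, assuming an invented data frame with a few NA entries:

```r
# Toy data frame (invented values) with some missing observations.
toy <- data.frame(price  = c(10.5, NA, 11.2, NA),
                  volume = c(100, 200, NA, 400))

# summary reports an "NA's" count for each column that has missing values.
summary(toy)

# The same counts can be computed directly.
colSums(is.na(toy))   # price: 2, volume: 1
```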
Note that the str and summary functions are not specific to
data frames.
They can be applied to any R object to get a quick peek
at the object and its characteristics.
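
For instance, both functions work on a plain numeric vector. The values here are arbitrary:

```r
# An arbitrary numeric vector, not taken from the stocks data.
x <- c(3, 1, 4, 1, 5)

str(x)       # type, length, and first few values
summary(x)   # minimum, quartiles, mean, and maximum
```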
You might have noticed that one of the variables in the
stock data set is a date.
I'm going to take a closer look at the date and print out
the first 20 values.
If I look carefully, I can see there's also a levels
attribute associated with these dates.
That means that R has interpreted this field as
being a factor.
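
A small sketch of how to check for this, using invented date strings. Note that since R 4.0.0, read.delim no longer converts text columns to factors by default, so in a newer version of R the same column would come in as character instead.

```r
# Invented date strings, stored as a factor to mimic the situation described.
date_factor <- factor(c("03/01/2012", "04/01/2012", "03/01/2012"))

# Printing a factor shows a "Levels:" line under the values.
head(date_factor, 20)

is.factor(date_factor)   # TRUE
levels(date_factor)      # the distinct values R has recorded
```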
It would be much more useful to keep this as
an actual date object.
In this case, the dates are formatted as day, month, and
year, and in such cases, I want to convert this variable
to a date object using the as.Date function.
I also need to specify the format of the date.
There are many different ways to format dates, so you may
need to look up how to specify the date formats you run into.
For this purpose, you'll probably want to look in the
strptime function help file.
You can do this by typing ?strptime and hitting enter.
I'm going to save the formatted dates in an object
called s.date.
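
A sketch of the conversion, assuming the dates are written as day/month/year with slashes. The separator and the sample values are assumptions; the format string would need to match whatever the real file uses.

```r
# Invented day/month/year strings stored as a factor.
raw_dates <- factor(c("03/01/2012", "15/02/2012"))

# %d = day, %m = month, %Y = four-digit year (see ?strptime for other codes).
# as.character makes the conversion from factor explicit.
s.date <- as.Date(as.character(raw_dates), format = "%d/%m/%Y")

s.date          # "2012-01-03" "2012-02-15"
class(s.date)   # "Date"
```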
While having this date object is helpful, I really would
like to replace the original date variable with this one.
In fact, it actually would have been much easier had I
just skipped the step of creating s.date and just saved
the modified version over the date object
right from the start.
Now if I look at the stocks object again with the head
function, I can see that the date variable is now formatted
in the standard way, starting with all four digits of the
year, then the next two digits for the month, and then the
last two digits for the day.
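
As a sketch, the conversion can be saved straight over the original column. The data frame and the day/month/year slash format here are invented for illustration:

```r
# Toy data frame (invented values) with dates stored as text.
stocks <- data.frame(date  = c("03/01/2012", "15/02/2012"),
                     price = c(10.5, 11.2))

# Overwrite the date column with its Date version in one step.
stocks$date <- as.Date(as.character(stocks$date), format = "%d/%m/%Y")

# Dates now print as year-month-day.
head(stocks)
```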
You might wonder, why go to all this trouble
to format the date?
Why not just leave it as a factor or
just set it as a character?
First, you might like to compute the difference between
dates to learn how close in time two
observations are to each other.
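
For example, subtracting one Date from another gives the number of days between them. The two dates below are arbitrary:

```r
# Two arbitrary dates.
d1 <- as.Date("2012-01-03")
d2 <- as.Date("2012-02-15")

d2 - d1               # Time difference of 43 days
as.numeric(d2 - d1)   # 43
```

This arithmetic would not work on factors or character strings.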
Second, if you generate a time series plot that makes use of
the date object, R will do its part to help make
the plot look nice.
For example, here I'll plot the time series of the stock
price for Google.
I'm specifying the date as a variable for
the horizontal axis.
And since this is formatted as a date, R will use this
information and plot the years along the axis.
Had I not converted the date over to a date object, the
plot wouldn't have looked nearly as nice.
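
A minimal sketch of such a plot, using an invented monthly price series rather than the real Google data:

```r
# Invented monthly series covering three years.
toy <- data.frame(
  date  = seq(as.Date("2010-01-01"), by = "month", length.out = 36),
  price = seq(100, 135, length.out = 36)
)

# Because toy$date is a Date, the horizontal axis is labeled with dates.
plot(toy$date, toy$price, type = "l",
     xlab = "Date", ylab = "Price")
```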
There are several other reasons to properly process
and format dates in R, but the general reason is that doing
so communicates the data structure accurately, and this
will make it easier for you and others to use
and reuse your code.
In the next video, we'll talk about if statements and also
the which function.