Hi! I am Mike Marin, and in this video
we'll talk more about subsetting data using square brackets.
I've already gone ahead and imported the LungCapData
into R and attached it. The LungCapData was introduced earlier
in this series of videos. In previous videos
we learned the "dim" or dimensions command. We can use this to ask about the
dimensions of our data; here we can see
it has 725 rows and 6 columns. We can also use the "length" command
to ask about the number of observations in a vector or a variable;
here we can see Age consists of 725 observations.
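As a quick recap of those two commands, here is a minimal sketch. The LungCapData file itself is not reproduced here, so `toy` is a small made-up data frame standing in for it (8 rows, 2 columns, invented values), and the columns are reached with `toy$` rather than by attaching the data.

```r
# Hypothetical stand-in for LungCapData (invented values, not the real data)
toy <- data.frame(
  Age    = c(6, 18, 16, 14, 17, 11, 19, 15),
  Gender = factor(c("male", "female", "female", "male",
                    "male", "female", "male", "male"))
)

dim(toy)          # number of rows, then number of columns: 8 2
length(toy$Age)   # number of observations in a single variable: 8
```

With the real data attached, the equivalent calls would be `dim(LungCapData)` and `length(Age)`.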
We've also explored the use of square brackets on a single variable or
vector;
here we can take a look at Ages for observations 11
up to 14. We've also looked at
the use of square brackets on a matrix or data frame;
here we can look at the LungCapData, all observations
in rows 11 up to 14, and we leave the column position blank to include all columns.
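The two forms of square-bracket indexing just recalled can be sketched like this. The data frame `toy` is a made-up stand-in for LungCapData, and because it only has 8 rows the example uses positions 2 to 4 rather than 11 to 14.

```r
# Hypothetical stand-in for LungCapData (invented values, not the real data)
toy <- data.frame(
  Age    = c(6, 18, 16, 14, 17, 11, 19, 15),
  Gender = factor(c("male", "female", "female", "male",
                    "male", "female", "male", "male"))
)

# Vector: one index inside the brackets
toy$Age[2:4]    # elements 2 through 4: 18 16 14

# Data frame: [rows, columns]; an empty column position keeps every column
toy[2:4, ]
```

With the real data attached, the same idea would be written `Age[11:14]` and `LungCapData[11:14, ]`.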
Let's now take subsetting one step further
and see how we can subset data based on the values of other variables
in the dataset. For example, let's calculate the mean Age
but only for females. Here
we'd like to calculate the mean for the variable Age,
and we can use the square brackets to subset only females:
here we specify that we want Ages for only those whose Gender
is equal to female, using a double equal sign.
What does this mean? Recall that a single equal sign
can be used to assign values to objects, while the double equal sign
is used to represent equality in a mathematical sense.
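To make the difference between the single and double equal sign concrete, here is a small sketch on made-up data. `toy` is a hypothetical stand-in for LungCapData, and the object name `x` is only there for illustration.

```r
# Hypothetical stand-in for LungCapData (invented values, not the real data)
toy <- data.frame(
  Age    = c(6, 18, 16, 14, 17, 11, 19, 15),
  Gender = factor(c("male", "female", "female", "male",
                    "male", "female", "male", "male"))
)

x = 5      # single equal sign: assigns the value 5 to x
x == 5     # double equal sign: asks "is x equal to 5?" and returns TRUE

# The comparison gives one TRUE/FALSE per observation...
is_female <- toy$Gender == "female"

# ...and the brackets keep only the Ages where it is TRUE
mean(toy$Age[is_female])                 # 15
mean(toy$Age[toy$Gender == "male"])      # 14.2

levels(toy$Gender)   # the labels we can compare against: "female" "male"
```

With the real data attached, the female mean would be written `mean(Age[Gender == "female"])`.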
The word female is placed in quotations
as it is a character string or a factor level; here
we can see the levels of the Gender variable
are "female" and "male". Now
let's do the same and calculate the mean Age, but this time subsetting
for "males". We can also go ahead
and create a subset of the data containing information for only the "females".
We will go ahead and do this and save it in an object called FemData.
Here we would like to
go into this object LungCapData and extract the
rows where the Gender is "female",
and we will include all columns. Let's go ahead and do the same
but this time for "males", producing a subset of data
containing only the males; we'll call this object MaleData.
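The two subsets just described might be built as follows. This is a sketch on a made-up data frame `toy` standing in for LungCapData; only the object names FemData and MaleData come from the video.

```r
# Hypothetical stand-in for LungCapData (invented values, not the real data)
toy <- data.frame(
  Age    = c(6, 18, 16, 14, 17, 11, 19, 15),
  Gender = factor(c("male", "female", "female", "male",
                    "male", "female", "male", "male"))
)

# Rows where Gender is "female"; empty column position = all columns
FemData  <- toy[toy$Gender == "female", ]

# Same idea for the males
MaleData <- toy[toy$Gender == "male", ]

dim(FemData)              # 3 2
dim(MaleData)             # 5 2
summary(FemData$Gender)   # counts per level; "male" is still listed, with a count of 0
```

The row counts of the two subsets add up to the number of rows in the full data, which is a simple way to confirm nobody was dropped or counted twice.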
Let's go ahead and confirm that R has done what we wanted
and subsetted the "female" and "male" data. First
we can check the dimensions of each of these objects:
FemData: 358 rows, 6 columns;
MaleData: 367 rows,
6 columns. If we ask for a summary
of the Gender variable, we can see there are 358 females and
367 males, as there should be. We can also take a look at
FemData: we will look at the first 4 rows and all columns
of this object. Now let's take this one step further
and pull out a subset of data for males who are over 15 years old.
We'll call this object MaleOver15,
and in it we will store the rows of LungCapData
where the Gender
is male and
the Age is greater than 15,
and we leave the column position blank again, indicating we'd like
all columns. We can go ahead and check the dimensions
of this object: we can see
89 rows, or in other words there are 89 individuals
who are male and older than 15 in our dataset.
Let's go ahead and look at the first
4 rows in this object, all columns; here we can see
that the first 4 individuals are all males with ages greater than 15.
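Combining two conditions like this can be sketched with the `&` ("and") operator. Again `toy` is a made-up stand-in for LungCapData, so the counts here are not the 89 rows seen in the video.

```r
# Hypothetical stand-in for LungCapData (invented values, not the real data)
toy <- data.frame(
  Age    = c(6, 18, 16, 14, 17, 11, 19, 15),
  Gender = factor(c("male", "female", "female", "male",
                    "male", "female", "male", "male"))
)

# Keep a row only if BOTH conditions are TRUE for it
MaleOver15 <- toy[toy$Gender == "male" & toy$Age > 15, ]

dim(MaleOver15)       # 2 2
MaleOver15$Age        # 17 19 (the male aged exactly 15 is not "greater than 15")
head(MaleOver15, 4)   # up to the first 4 rows, all columns
```

`head()` is used here instead of `[1:4, ]` only because the toy subset has fewer than 4 rows; on the real data either form shows the first 4 rows.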
In the next video in this series we'll introduce the use of logic commands in R
as well as a few other random commands.
Thanks for watching this video, and make sure to check out my other instructional videos.