So we'll transition now to talk about a chi squared test of independence. Last time we
talked about a chi squared test of goodness of fit which was a little simpler. This is the
chi squared test of independence. And why is it called that? Because we have two variables,
we're testing whether they're independent of one another or not. They're either independent
or related and that's what we test for. Now this test is used when you have two variables
that are categorical, nominal, they're qualitative, you can't put them on a scale, it's just categories
that people fall into. So let's take an example, let's say we have a race car course
and we run a bunch of people through this race car course and it will just happen to
be either raining or not raining when we do this. And let's say those people are either
each going to have an accident or no accident. So what we're testing here is are accidents
and raininess independent or are they related? Our null hypothesis is that they are independent
and that we expect to find no relationship. Really we are hoping to find a relationship.
So you're going to take -- in our example it's going to be 136 people and each person
will run through the course and we'll measure one thing, and that one thing will be "Did you
have an accident today?" yes or no. And then we'll also record whether it happened to be
raining at the time they drove. So I'm going to put observed frequencies in black -- observed
frequencies -- meaning what we're putting in here is numbers of people, think of them
as piles of people, the numbers we put here represent how many people are in the pile. Because
in this design we're not analyzing a mean, we're analyzing how many people fall into
these categories. So it's the size of the crowd that winds up here versus here versus here
versus here. So let's say we take 136 people and we find that 19 of them
happened to be driving in the rain and had an accident, 26 of them happened to be driving
in the rain and did not happen to have an accident, 20 of them were driving in dry conditions
and wound up having an accident, and 71 of them happened to be driving when it's not
raining and they did not have accidents. Now here's the common misconception not to have
-- this is not a repeated measures design, we do not put each person in each condition.
Each person's going to fall into only one of these piles of people so either here or
here or here or here -- that's the common misconception that people have. You can't
take repeated measures on subjects and run a chi squared test of independence, it would
not be a correct test for that. So in fact let's write this down in big letters in your
notes and pause the video until you have this written out. In this particular test each
person gets piled -- and I want to use the word piled because you should be thinking
that these are piles of people and we're analyzing the sizes of the piles. Each person gets piled
into just one cell, they wind up sort of like in one category, and we only take one observation
per person, per research subject. So we only run them through our race course one time
and then we record was it raining or not and did they have an accident or not. We don't
run them through the rain and the dry and the accident and the non-accident -- that's
NOT what we're talking about design-wise here. All right, well now we want to figure out
what would we expect if these two things are independent of each other? We want to get expected
frequencies. The way to think about this is: we need to know, when we leave rain out of
the picture, how often we should expect accidents to happen or not. And so I want you to pause
the video right now and see if you're following along properly by computing some totals
right now -- and by the way the total number of people was 136 people in total, that's
how many people ran through our course. So see if you can compute how
many of all the people in our study had an accident and that would include both people
in the rain and people not in the rain. Pause the video and see if you get it right and if you got it right you should
have taken 19 plus 20 is 39 people had accidents out of 136 on this race course. Now I want
you to pause the video and see if you can figure out how many of everybody in our study
had no accidents. Okay, and if you're following along properly you should have taken 26 plus
71 which is 97. Now don't you just watch this video without doing the calculation. If you
do that you're kidding yourselves because then you're going to get to exam time and
end up being very sorry so I really want you to do these and see if you're following along.
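
If you want to check your hand totals so far, here is a minimal Python sketch. It assumes the four observed frequencies given above; the dictionary layout and variable names are just illustrative choices, not something from the lecture. It collapses across weather to count accidents and non-accidents.

```python
# Observed frequencies from the lecture's table: number of people in each pile.
observed = {
    ("rain", "accident"): 19,
    ("rain", "no accident"): 26,
    ("no rain", "accident"): 20,
    ("no rain", "no accident"): 71,
}

# Everybody in the study, regardless of pile.
total_people = sum(observed.values())

# Collapse across weather: add up each outcome whether or not it was raining.
accident_total = sum(
    count for (weather, outcome), count in observed.items() if outcome == "accident"
)
no_accident_total = sum(
    count for (weather, outcome), count in observed.items() if outcome == "no accident"
)

print("total people:", total_people)
print("accidents:", accident_total)
print("no accidents:", no_accident_total)
```

Do the arithmetic on paper first and use this only to confirm what you got.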
Make your mistakes now where it's not going to cost you anything point-wise. Now pause
the video, see if you can figure out of all the people in our study how many were driving
the course in the rain. Okay, so that would be the 19 who had accidents in the rain plus
the 26 who had no accidents in the rain, 19 and 26 is 45. Now just like with everything
you might want to label things -- rain: 45 and here 39 accidents, 97 no accidents -- so
that you can never get confused about what's what. Now pause the video and see if you can
figure out how many people in the study happened to be driving when there was no rain. And
if you're following along and understanding, it was 20 plus 71, which is 91 people. So in figuring out expected
frequencies which we're going to put in red it sort of depends on how often accidents
happened overall, what we should expect for accidents should be this proportion here -- 39
out of 136 should be accidents and 97 out of 136 should be non-accidents. How do we
know that? Well when we ignore the weather and we just look at the data this is the best
indication we have from this race course, isn't it? How often do accidents happen generally,
right? So we're collapsing across weather. So we're going to figure out our expected
frequencies based on that. So we're figuring out norms. Think of an expected frequency
as a norm. So of 136 outings how many will be in the rain? We're assuming this
is a typical day probably, 45. And of 136 outings how many will be not in the rain?
91 -- so we should expect 45 divided by 136, 45 out of 136 should be in the rain generally,
this would be what we're taking as the norm. So here's how to do the calculations -- get all
this on paper because we're going to do now expected frequencies. All right, well let's
think about this. We expect -- ask yourself this -- how many of the 136 to be in rain?
Pause the video and see if you can figure out how many of the 136 we expect to be in
rain based on the table that you now have on paper. Well, we expect 45 of them, right?
So do this calculation -- 45 divided by 136 should be about 0.33, so 0.33 of the outings are
in the rain. And don't just be writing a number, always label it in this way. Label, label,
label -- you'll be glad you did.
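
The totals and proportions worked out so far can be sketched in Python as a cross-check. This is only a sketch under stated assumptions: the list layout and names are illustrative, and the last step uses the standard expected-count rule for a test of independence (row total times column total, divided by the grand total), which this part of the lecture has not yet written out.

```python
# Observed frequencies (counts of people), rows = weather, columns = outcome.
row_labels = ["rain", "no rain"]
col_labels = ["accident", "no accident"]
observed = [
    [19, 26],  # rain: accident, no accident
    [20, 71],  # no rain: accident, no accident
]

# Marginal totals: collapse across the other variable.
row_totals = [sum(row) for row in observed]        # rain 45, no rain 91
col_totals = [sum(col) for col in zip(*observed)]  # accident 39, no accident 97
grand_total = sum(row_totals)                      # 136 people

# Labeled proportions, the "norms" the lecture describes.
row_props = {label: t / grand_total for label, t in zip(row_labels, row_totals)}
col_props = {label: t / grand_total for label, t in zip(col_labels, col_totals)}

# Assumed standard rule: expected count = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for label, p in row_props.items():
    print(f"proportion of outings in '{label}': {p:.2f}")
for label, p in col_props.items():
    print(f"proportion of outings with '{label}': {p:.2f}")
for r_label, row in zip(row_labels, expected):
    for c_label, value in zip(col_labels, row):
        print(f"expected count, {r_label} / {c_label}: {value:.2f}")
```

Keeping the labels attached to every number in the printout follows the same "label, label, label" habit as on paper.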