Chi-square test of independence, video 1

So we'll transition now to talk about the chi-square test of independence. Last time we talked about the chi-square test of goodness of fit, which was a little simpler. This is the chi-square test of independence. And why is it called that? Because we have two variables, and we're testing whether they're independent of one another or not. They're either independent or related, and that's what we test for. Now, this test is used when you have two variables that are categorical: nominal, qualitative. You can't put them on a scale; they're just categories that people fall into.
So let's take an example. Let's say we have a race car course and we run a bunch of people through it, and it will just happen to be either raining or not raining when we do this. And let's say each of those people is going to have either an accident or no accident. So what we're testing here is: are accidents and raininess independent, or are they related? Our null hypothesis is that they are independent, so under the null we expect to find no relationship. Really, we are hoping to find a relationship.
So in our example it's going to be 136 people. Each person will run through the course and we'll measure one thing, and that one thing will be "Did you have an accident today?", yes or no. And then we'll also record whether it happened to be raining at the time they drove. So I'm going to put observed frequencies in black. Observed frequencies means that what we're putting in here is numbers of people. Think of them as piles of people; the numbers we put here represent how many people are in each pile. Because in this design we're not analyzing a mean, we're analyzing how many people fall into these categories. So it's the size of the crowd that winds up here versus here versus here versus here.
So let's say we take 136 people and we find that 19 of them happened to be driving in the rain and had an accident, 26 of them happened to be driving in the rain and did not have an accident, 20 of them were driving in dry conditions and had an accident, and 71 of them happened to be driving when it was not raining and did not have an accident.
Now here's the common misconception not to have: this is not a repeated measures design. We do not put each person in each condition. Each person is going to fall into only one of these piles of people, so either here or here or here or here. You can't take repeated measures on subjects and run a chi-square test of independence; it would not be a correct test for that. So in fact, let's write this down in big letters in your notes, and pause the video until you have this written out. In this particular test each person gets piled, and I want to use the word piled because you should be thinking that these are piles of people and we're analyzing the sizes of the piles. Each person gets piled into just one cell, so they wind up in one category, and we only take one observation per person, per research subject. So we only run them through our race course one time, and then we record: was it raining or not, and did they have an accident or not? We don't run them through the rain and the dry and the accident and the non-accident. That's NOT what we're talking about design-wise here.
All right, well, now we want to figure out what we would expect if these two things are independent of each other. We want to get expected frequencies. The way to think about this is: setting rain aside as a factor, how often should we expect accidents to happen or not?
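If you want to see the "one pile per person" idea on a computer, here is a minimal Python sketch. It is not part of the lecture: it assumes each driver is stored as a single (weather, outcome) record, and the names `cell_counts`, `records`, and `observed` are made up for illustration. The point is only that each person adds exactly one to exactly one cell.

```python
from collections import Counter

# Observed frequencies from the example: (weather, outcome) -> number of people.
cell_counts = {
    ("rain", "accident"): 19,
    ("rain", "no accident"): 26,
    ("no rain", "accident"): 20,
    ("no rain", "no accident"): 71,
}

# Hypothetical raw data: one record per person, one run through the course each.
records = [cell for cell, n in cell_counts.items() for _ in range(n)]

# Piling people into cells: every record lands in exactly one cell.
observed = Counter(records)

print("people in the study:", len(records))
for cell, n in observed.items():
    print(cell, n)
```

Because there is one record per person, the four piles have to add up to the number of people in the study. If someone had been run through more than one condition, that would no longer hold, and this test would not be the right one.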
And so I want you to pause the video right now and see if you're following along properly by computing some totals. By the way, the total number of people was 136; that's how many people in total ran through our study. So see if you can compute how many of all the people in our study had an accident, and that would include both the people in the rain and the people not in the rain. Pause the video and see if you get it right. If you got it right, you should have taken 19 plus 20, which is 39 people who had accidents out of 136 on this race course. Now I want you to pause the video and see if you can figure out how many of everybody in our study had no accidents. Okay, and if you're following along properly you should have taken 26 plus 71, which is 97.
Now don't just watch this video without doing the calculations. If you do that you're kidding yourselves, because then you're going to get to exam time and end up being very sorry. So I really want you to do these and see if you're following along. Make your mistakes now, where it's not going to cost you anything point-wise.
Now pause the video and see if you can figure out, of all the people in our study, how many were driving the course in the rain. Okay, so that would be the 19 who had accidents in the rain plus the 26 who had no accidents in the rain; 19 plus 26 is 45. Now, just like with everything, you might want to label things (rain: 45, and here 39 accidents, 97 no accidents) so that you can never get confused about what's what. Now pause the video and see if you can figure out how many people in the study happened to be driving when there was no rain. And if you're following along and understanding, it was 20 plus 71, which is 91 people.
So in figuring out expected frequencies, which we're going to put in red, it depends on how often accidents happened overall. What we should expect for accidents is this proportion here: 39 out of 136 should be accidents and 97 out of 136 should be non-accidents. How do we know that?
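To check your hand calculations for the totals above, here is a short Python sketch. The dictionary layout and the variable names are assumptions for illustration, not anything shown in the video; the counts are the four observed frequencies from the example.

```python
# Observed frequencies: (weather, outcome) -> number of people.
observed = {
    ("rain", "accident"): 19,
    ("rain", "no accident"): 26,
    ("no rain", "accident"): 20,
    ("no rain", "no accident"): 71,
}

# Totals for each outcome, collapsing across weather.
accident_total = sum(n for (w, o), n in observed.items() if o == "accident")
no_accident_total = sum(n for (w, o), n in observed.items() if o == "no accident")

# Totals for each weather condition, collapsing across outcome.
rain_total = sum(n for (w, o), n in observed.items() if w == "rain")
no_rain_total = sum(n for (w, o), n in observed.items() if w == "no rain")

# Grand total: everyone in the study.
grand_total = sum(observed.values())

# Label, label, label.
print("accidents:", accident_total)        # 19 + 20 = 39
print("no accidents:", no_accident_total)  # 26 + 71 = 97
print("rain:", rain_total)                 # 19 + 26 = 45
print("no rain:", no_rain_total)           # 20 + 71 = 91
print("total people:", grand_total)        # 136
```

Notice that the two outcome totals add up to 136 and so do the two weather totals. That is a quick way to catch an arithmetic slip.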
Well, when we ignore the weather and we just look at the data, this is the best indication we have from this race course, isn't it? How often do accidents happen generally, right? So we're collapsing across weather, and we're going to figure out our expected frequencies based on that. So we're figuring out norms. Think of an expected frequency as a norm. So of 136 outings, how many will be in the rain? Assuming this is a typical day, probably 45. And of 136 outings, how many will be not in the rain? 91. So we should expect 45 out of 136 to be in the rain generally; this is what we're taking as the norm.
So here's how to do the calculations. Get all this on paper, because we're now going to do expected frequencies. All right, well, let's think about this. Ask yourself: how many of the 136 do we expect to be in the rain? Pause the video and see if you can figure out how many of the 136 we expect to be in the rain based on the table that you now have on paper. Well, we expect 45 of them, right? So do this calculation: 45 divided by 136 is about 0.33, so 0.33 of the outings are in the rain. And don't just write a number; always label it in this way. Label, label, label. You'll be glad you did.
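The proportion calculation above can be sketched in a few lines of Python. This is only an illustration: the variable names are made up, and the totals are the ones worked out earlier (45 in the rain, 91 not in the rain, 39 accidents, 97 no accidents, 136 people overall).

```python
# Totals from the table, each one labeled.
total_people = 136
totals = {
    "rain": 45,
    "no rain": 91,
    "accident": 39,
    "no accident": 97,
}

# Each proportion is a total divided by the number of people in the study.
proportions = {label: count / total_people for label, count in totals.items()}

# Print every proportion with its label, rounded to two decimals.
for label, p in proportions.items():
    print(f"proportion {label}: {p:.2f}")

# The rain proportion is the one computed in the lecture: 45 / 136, about 0.33.
```

The rain and no-rain proportions add up to 1, and so do the accident and no-accident proportions, because every person falls into exactly one weather category and exactly one outcome category.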