Interpreting the SPSS Output for a Chi-Square Analysis

>> Well, hello. We're going to spend a few minutes talking about how to interpret the SPSS output for the chi-square goodness of fit test. As you're aware, the chi-square goodness of fit test is applied when we're working with nominal variables; that is, the possible values for the variable are categories that cannot be ranked. When we use a chi-square goodness of fit test, there are some basic requirements we need to meet in order for anything we find to be of value to us. Number 1, the sample must have been randomly drawn from the population, so that the sample is representative of the population. That's going to be important if we hope to generalize our results from the sample to the entire population. Number 2, the values for the variable must be mutually exclusive. For example, if our variable is "Do you work?", the values could be yes and no, and those two values fulfill this requirement: either someone works or they don't; they can't do both. Finally, our third requirement is a minimum expected frequency of five in each category, so let's take a moment to look at that. We'll actually be using the variable "Do you work?" throughout this presentation. Its values, yes and no, represent categories, so we have a yes category and a no category, and this third requirement says that we have to expect at least five people in each category. So if I interviewed 10 people and expected half of them to work, I would expect 5 people to say yes, I work, and 5 people to say no.
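The third requirement comes down to simple arithmetic: under the null hypothesis, each category's expected frequency is the sample size times that category's hypothesized proportion. Here is a small Python sketch of that check, assuming the usual rule of thumb of at least 5 per category; the function names and the `minimum` parameter are hypothetical illustrations, not anything SPSS provides.

```python
def expected_counts(n, proportions):
    """Expected frequency per category under the null hypothesis: n * p_i."""
    return [n * p for p in proportions]


def meets_min_expected(n, proportions, minimum=5):
    """True if every category's expected frequency is at least `minimum`."""
    return all(e >= minimum for e in expected_counts(n, proportions))


# 10 people, null hypothesis of a 50/50 split on "Do you work?"
print(expected_counts(10, [0.5, 0.5]))     # [5.0, 5.0]
print(meets_min_expected(10, [0.5, 0.5]))  # True
# With only 8 people, each category would expect 4, below the minimum.
print(meets_min_expected(8, [0.5, 0.5]))   # False
```

The same check scales to any number of categories, since it only multiplies the sample size by each hypothesized proportion.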
If the number of people we expect to pick a particular value ever drops below 5, then the chi-square goodness of fit test may be less appropriate than another statistical test, and the results may not be as valid. We'll talk a little more about this as we go along. For this scenario, let's say we surveyed 20 students about whether they work. For our first requirement, we'll assume the sample was randomly drawn from the population; that is, our sample is representative of the population, so it would be appropriate to generalize from the sample to the population as a whole. Let's say our survey had a question on it, "Do you work?", with the possible responses yes and no. In terms of the second requirement, these two responses are mutually exclusive: someone would not select both of them; they would select one or the other. Then we have our null hypothesis, and the null hypothesis specifies the expected frequency for each category. We interviewed 20 people, so the null hypothesis tells us how many people we would expect to say yes and how many we would expect to say no. Let's say an instructor, when assigning homework, assumes that only half the students work and the other half don't, so there's no reason to go any lighter on homework, since at least half the students aren't working.
That would be the null hypothesis. Of course, the students in the class might be saying, "Hey, a lot of us work," and that would be the research hypothesis. So the null hypothesis is the instructor's assumption that only half the students work, and the research hypothesis is the students saying, "No, more than half of us work." To test these hypotheses we look at the observed frequency; that is, we hand out a survey and observe what the responses are. Right now you see question marks under the observed frequency for yes and no because the survey hasn't been handed out yet. Okay, here we're taking a look at the chi-square output. The SPSS basics book shows you the step-by-step process to get this output, but for now we're just going to focus on the output itself. Looking at it, you'll see that there are two tables. The top table shows our descriptive statistics; that is, it describes the results for our particular sample, and you can see that 16 people said that they work and only 4 people said that they do not work. Our null hypothesis was a 50/50 split, 10 saying yes and 10 saying no, and what we observed certainly did not match what was expected under the null hypothesis. The bottom table shows the inferential statistics. It addresses whether we should generalize the difference we found between the observed and the expected frequencies from our sample to the entire population, because perhaps what we observed was just due to chance. That's what inferential statistics is all about, so we'll take a look at this bottom table as well. Focusing first on the top table, the descriptive statistics, you'll see that there are three columns, labeled Observed N, Expected N, and Residual.
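The Observed N column is simply a frequency count of the responses. As a minimal sketch, assuming hypothetical raw survey data constructed to match the 16 yes / 4 no result in the output (not real data), Python's `collections.Counter` produces the same counts that SPSS reports:

```python
from collections import Counter

# Hypothetical raw responses to "Do you work?", constructed to match the
# 16 yes / 4 no counts in the SPSS output (not real survey data).
responses = ["yes"] * 16 + ["no"] * 4

observed = Counter(responses)
print(observed["yes"], observed["no"])  # 16 4
print(sum(observed.values()))           # 20
```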
The Observed N, as mentioned, tells us what we observed in our data collection: 16 people saying yes and 4 people saying no to the question "Do you work?" The Expected N is what we expected based on the null hypothesis. As you recall, the null hypothesis was that only half the people would be working, so out of 20, half, or 10, would say yes and the other 10 would say no. Finally, the last column is the residual, which is the difference between the observed and the expected frequency. For the yes response, we observed 16 and only 10 were expected, so 6 more people said yes than expected. For the no response, we observed 4 and 10 were expected, so 6 fewer people than expected said no. So the residual tells us how big a difference there was between what we observed and what was expected under the null hypothesis. The larger the residuals, the more confident we can be that this is a real difference and not just a chance fluctuation, such as happening to pick a sample where more or fewer people work than in the population as a whole. So we want large residuals if we hope to reject our null hypothesis. Now we have to stop for a moment and consider the third requirement for a chi-square test. As you recall, the first requirement was random sampling, to get a representative sample that allows us to generalize to the population. The second requirement was that the values for our variable, "Do you work?", namely yes and no, be mutually exclusive. The third requirement was that the expected frequency for each cell, that is, each category, be at least 5; if it isn't, our results are of less value in making sense of the data.
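The three columns of the top table can be reproduced from the observed counts and the null hypothesis. This Python sketch assumes the 50/50 null hypothesis and the 16 / 4 counts discussed here; the variable names are illustrative, and the layout only approximates the SPSS table.

```python
# Observed counts from the survey and the 50/50 null hypothesis.
observed = {"yes": 16, "no": 4}
null_proportions = {"yes": 0.5, "no": 0.5}

n = sum(observed.values())  # 20 students

rows = []
for category, obs in observed.items():
    expected = n * null_proportions[category]  # Expected N
    residual = obs - expected                   # Residual = Observed N - Expected N
    rows.append((category, obs, expected, residual))

print(f"{'':<5}{'Observed N':>12}{'Expected N':>12}{'Residual':>10}")
for category, obs, expected, residual in rows:
    print(f"{category:<5}{obs:>12}{expected:>12.1f}{residual:>10.1f}")
```

The residuals come out to 6.0 and -6.0; in a goodness of fit test they always sum to zero, because the observed and expected frequencies both add up to the sample size.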
Looking at our top table, we can see that the expected number of people saying yes and the expected number saying no are 10 in both cases. In the bottom table, footnote a says that 0 cells have expected frequencies less than 5. That's good: it means we're meeting the third requirement. In fact, the minimum expected cell frequency is 10, so we're clear to move forward. Now look at the bottom table more closely. You'll see that there are three rows: the Chi-Square row, which gives the value of the chi-square statistic; df, for degrees of freedom; and Asymp. Sig. (asymptotic significance), which is our p value. In the top row, notice the fancy-looking χ²; the value of the chi-square statistic is 7.2, and the larger that value, the more likely we'll be able to reject the null hypothesis. The degrees of freedom equal 1, and degrees of freedom are calculated as the number of categories minus 1. How many categories did we have? People could respond either yes or no, so we had two categories, and 2 minus 1 is 1. Our degrees of freedom are 1, and SPSS tells us that. Finally, Asymp. Sig. is our p value. We observed a difference between the expected and the observed frequencies, but perhaps that difference is just due to chance. If, in the population of all students, half worked and half did not, and we randomly sampled 20 students from that population, the probability of getting a difference at least as large as 16 people saying yes, I work, and 4 saying no, I don't, just by chance is .007: less than 1 percent, a very small probability of this occurring due to chance. So if the null hypothesis were right, and in the whole population 50 percent of people work and 50 percent don't, you could get a sample with a difference this large, but by chance it would happen only .007 of the time.
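The three rows of the bottom table can be reproduced by hand. The chi-square statistic sums (observed - expected)² / expected over the categories, and df is the number of categories minus 1. For the p value, this sketch relies on a fact that holds only for df = 1: a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability equals erfc(sqrt(χ²/2)), available in Python's standard library as `math.erfc`. For other df you would need a general chi-square distribution function; SciPy's `scipy.stats.chisquare`, for example, returns the statistic and p value directly.

```python
import math

observed = [16, 4]    # yes, no
expected = [10, 10]   # 50/50 null hypothesis for 20 students

# Chi-square statistic: sum of (O - E)^2 / E over categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus 1.
df = len(observed) - 1

# Upper-tail p value. This closed form is valid ONLY for df == 1,
# where a chi-square variable is the square of a standard normal.
if df != 1:
    raise ValueError("this sketch only computes p for df == 1")
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"Chi-Square = {chi_square:.2f}, df = {df}, Asymp. Sig. = {p_value:.3f}")
# Chi-Square = 7.20, df = 1, Asymp. Sig. = 0.007
```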
Importantly, the probability of this happening due to chance, .007, is less than or equal to our alpha level of .05, so we're able to reject the null hypothesis. So the Asymp. Sig. value is our p value, and we want it to be less than or equal to .05. At the bottom of the screen you'll see χ²(1) = 7.20, p ≤ .05. This is the APA format for writing up the result, and it's taken from the bottom table of inferential statistics. We write χ², then in parentheses the degrees of freedom, which is 1, then an equals sign followed by the actual chi-square value, 7.20. For the probability, we write p ≤ .05, because .05 is our alpha level and p is less than or equal to it. If p is less than or equal to .05, we get to reject the null, which is a wonderful thing because that could mean you get to publish your results. In this particular scenario, it means the students could tell their instructor, "Hey, way more than half of us work; please be kinder with the homework." The instructor might say, "This homework is necessary for you to learn the material," but perhaps some negotiation could take place. If p is greater than .05, then the probability of the result happening due to chance is greater than .05, so we're not willing to say it's a real effect, and in that case we would retain the null. So if p is less than or equal to .05, we reject: we say the probability of this happening due to chance is so small that we won't accept that possibility, and instead we conclude that something real is taking place. If p is greater than .05, we retain the null hypothesis.
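The decision rule and the APA-style statistic string fit in a few lines. This is a hypothetical helper, assuming alpha = .05 and the p ≤ .05 / p > .05 reporting convention used in this presentation; the function name and its parameters are illustrative only.

```python
def report_chi_square(chi_square, df, p_value, alpha=0.05):
    """Return (decision, APA-style string) for a chi-square goodness of fit test."""
    # APA style drops the leading zero for p values and alpha levels: .05
    alpha_text = f"{alpha:.2f}".lstrip("0")
    if p_value <= alpha:
        decision = "reject the null hypothesis"
        p_text = f"p \u2264 {alpha_text}"
    else:
        decision = "retain the null hypothesis"
        p_text = f"p > {alpha_text}"
    return decision, f"\u03c7\u00b2({df}) = {chi_square:.2f}, {p_text}"


# Values read from the SPSS bottom table.
decision, apa = report_chi_square(7.2, 1, 0.007)
print(decision)  # reject the null hypothesis
print(apa)       # χ²(1) = 7.20, p ≤ .05
```

Many journals now prefer an exact p value (for example, p = .007); switching `p_text` to the formatted `p_value` would produce that instead.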
Okay, and here you can see how someone might write up the results of what we've been looking at in the SPSS output. We could say, "We sampled 20 students," which lets the reader know how many people were in the survey. That's important because we're dealing with frequencies here: the frequency of people who said yes and the frequency of people who said no. Next: we evaluated whether the number of students who work, and here we let the reader know how many people work, f = 16 (f for frequency; notice that it's italicized), was equal to the number of students who do not work, again with an italicized f = 4. So our first sentence says we sampled 20 students and evaluated whether the number of students who work (f = 16) was equal to the number of students who do not work (f = 4). That tells the reader what we were evaluating, whether the same number of people work as do not work, and it also tells them what we actually observed: 16 work and 4 do not. Then, "The data were analyzed using a chi-square goodness of fit test," which lets the reader know which inferential statistical test we used to analyze our data. "The null hypothesis was rejected," and here we give the reader the inferential test results: χ², the 1 degree of freedom in parentheses, equals the chi-square value 7.20, comma, p ≤ .05, meaning the probability was less than or equal to alpha, so again we get to reject the null. Finally, our last sentence puts it in everyday, understandable English: more than half the students at the college also work. Okay, I hope this was helpful. Again, this presentation focused on how to interpret the SPSS output for a chi-square goodness of fit test and then write it up in APA style.