This presentation is on calculating and interpreting the two-cell chi-square statistic. This is
a highly useful statistic: it is relatively easy to comprehend, the formula is basic, and the
test is versatile, like the ANOVA and the independent-samples t-test.
The chi-square statistic was invented by the British statistician Karl Pearson, who also
developed the standard deviation. Pearson developed the chi-square especially for nominal-level
data. Recall what nominal-level variables look like: they take on two values, 1 and
0, depending upon whether or not the condition is present. For instance, a registered Democrat
scores a 1 for a variable called registered Democrat, while a citizen who is not a registered
Democrat scores a 0.
The problem with analyzing nominal-level data is that the mean values and the variances
are less illuminating. The mean values, by definition, have to be between 0 and 1, but
it is hard to tell what a mean value of, for instance, .72 means, other than that there are
more 1s than 0s. In fact, the most appropriate measure of central tendency for nominal-level
data is the mode.
Chi-square (denoted by the Greek letter χ²) is appropriate for comparing frequencies
of variable results. The basic χ² calculation is the sum, over all cells, of the observed
frequency minus the expected frequency, squared, divided by the expected frequency:
χ² = Σ [ (observed frequency − expected frequency)² / expected frequency ]
The observed frequency is the actual frequency count recorded for a variable. The expected
frequency is what we would predict the frequency count to be if there were no difference
between the two possible values of the variable.
Making that calculation for each observed and corresponding expected frequency, and
summing up the results of those calculations, yields the χ² result.
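In code, the calculation is a one-liner. Here is a minimal sketch in Python; the function and variable names are my own for illustration, not part of the lecture:

```python
# Minimal sketch of the chi-square calculation: sum over all cells of
# (observed - expected)^2 / expected.
def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The two-cell election example worked later in this presentation:
print(chi_square([110, 90], [100, 100]))  # 2.0
```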
As with the F-test, there is a critical value for χ²: if the calculated χ² exceeds it, the
difference between the observed and expected frequencies is large enough to be statistically
significant. These critical values are displayed
on the χ² table, at the .05 level of significance, on page 168 of the *** I book. As a hint,
though, we will be using only one critical value: 3.841.
The usual rule applies: if the χ² value at the appropriate degrees of freedom is equal to
or above the critical value listed in the table, then you reject the null hypothesis
that there is no statistically significant difference between the observed frequency
and the expected frequency.
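If you happen to have Python with SciPy available (an assumption; the course itself only requires the printed table), you can confirm the tabled critical value directly:

```python
# Assuming SciPy is installed: the .05 critical value for chi-square, df = 1.
# ppf takes the area to the LEFT, so .05 to the right means .95 to the left.
from scipy.stats import chi2

print(round(chi2.ppf(0.95, df=1), 3))  # 3.841
```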
Let's go through an example. Let's imagine that two candidates, Smith and Jones, are
running for election, and a sample of likely voters has been asked their views of the two
candidates. The results of the 200-voter sample are as follows (the first cell is for Smith,
the second cell is for Jones):

CANDIDATE PREFERENCE
                          Smith   Jones
Responses                 110     90
Percent of Respondents    55%     45%
We want to know if this sample is statistically different from a dead heat between Smith and
Jones. What would the null hypothesis be, expressed in percentage terms? 50% to 50%.
Thus, the sample would be split evenly, 100 votes for Smith, 100 votes for Jones.
In order to figure out whether 55% is statistically different from 50%, we perform the
following operation:
χ² = Σ [ (observed frequency − expected frequency)² / expected frequency ]
χ² = [ (110 − 100)² / 100 ] + [ (90 − 100)² / 100 ]
χ² = [ (10)² / 100 ] + [ (−10)² / 100 ]
χ² = [ 100 / 100 ] + [ 100 / 100 ]
χ² = 1 + 1
χ² = 2
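As a cross-check (again assuming SciPy; the lecture relies only on the hand calculation), scipy.stats.chisquare reproduces this result:

```python
from scipy.stats import chisquare

# With no f_exp given, chisquare assumes an even split: 100 votes each.
result = chisquare(f_obs=[110, 90])
print(result.statistic)          # 2.0
print(round(result.pvalue, 3))   # 0.157, well above .05
```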
One property of χ² worth noting: it can never be less than zero, since each term squares
the difference between the observed and expected frequencies.
We find the degrees of freedom by subtracting 1 from the number of categories. Thus, 2 − 1
= 1.
Next, we would normally turn to the table of critical values for the Chi-square test in the ***
I book, on page 168. Yet, unlike with the other statistical tests we have conducted
this semester, we won't need this table for our Chi-square tests to figure out the critical
value at which we decide to accept or reject the null hypothesis. This is because the degrees
of freedom for all of the Chi-square tests we will conduct in this course will be
the same: df = 1. We will use the .05 "Area to the right of critical value" column, and
the critical value we will always use is 3.841. This you can memorize if you
wish: 3.841.
If the calculated Chi-square is less than 3.841, we accept the null hypothesis. If the
calculated value is equal to or greater than the critical value, then we reject the null
hypothesis. Since 2 is less than 3.841, we accept the null hypothesis and conclude that
the observed frequency of support for Smith is not statistically different from the observed
frequency of support for Jones.
We will find that our results for the Chi-square test are highly dependent on the size of the
sample. Let's change our example around a bit to demonstrate this. Let's keep the percentage
of support for Smith and Jones the same as before, 55% supporting Smith, 45% supporting
Jones in the pre-election survey. However, we will change the sample size from 200 to
1000. Let's see what happens.
CANDIDATE VOTE INTENTION
                          Smith   Jones
Responses                 550     450
Percent of Respondents    55%     45%
χ² = Σ [ (observed frequency − expected frequency)² / expected frequency ]
χ² = [ (550 − 500)² / 500 ] + [ (450 − 500)² / 500 ]
χ² = [ (50)² / 500 ] + [ (−50)² / 500 ]
χ² = [ 2500 / 500 ] + [ 2500 / 500 ]
χ² = 5 + 5
χ² = 10
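The same SciPy cross-check works for the larger sample (again, an optional aside assuming SciPy is installed):

```python
from scipy.stats import chisquare

# Expected frequencies under the null: an even 500/500 split.
result = chisquare(f_obs=[550, 450])
print(result.statistic)          # 10.0
print(round(result.pvalue, 4))   # 0.0016, well below .05
```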
And with 1 degree of freedom, 10 is clearly larger than the critical value of 3.841, so
this time we reject the null hypothesis and conclude that there is a statistically significant
difference between Smith and Jones: Smith is statistically ahead of Jones. The moral
of the story is: the larger the sample size, the better the opportunity to find statistical
significance.
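A short sketch makes the moral concrete: hold the 55%/45% split fixed and let the sample grow, and χ² grows in direct proportion to the sample size (the sample sizes here are illustrative, not from the lecture):

```python
from scipy.stats import chisquare

# Same 55/45 proportions at increasing sample sizes.
for n in (200, 1000, 5000):
    observed = [0.55 * n, 0.45 * n]
    result = chisquare(observed)  # null hypothesis: an even split
    print(n, round(result.statistic, 2), round(result.pvalue, 4))

# n = 200  -> chi-square 2.0,  p = 0.1573: accept the null
# n = 1000 -> chi-square 10.0, p = 0.0016: reject the null
# n = 5000 -> chi-square 50.0, p = 0.0:    reject the null
```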
This concludes our presentation of the two-cell Chi-square test.