This presentation is on calculating and interpreting the two-cell chi-square statistic. This is
a highly useful statistic: it is relatively easy to comprehend, the formula is basic, and the
test is versatile, like the ANOVA and the independent-samples t-test.
The chi-square statistic was invented by the British statistician Karl Pearson, who also
developed the standard deviation. Pearson developed the chi-square especially for nominal-level
data. Recall what nominal-level variables look like: they take on two values, 1 and
0, depending upon whether or not the condition is present. For instance, a registered Democrat
scores a 1 for a variable called registered Democrat, while a citizen who is not a registered
Democrat scores a 0.
The problem with analyzing nominal-level data is that the mean values and the variances
are less illuminating. The mean values, by definition, have to be between 0 and 1, but
it is hard to tell what a mean value of, for instance, .72 means, other than that there are
more 1s than 0s. In fact, the most appropriate measure of central tendency for nominal-level
data is the mode.
Chi-square (denoted by the Greek letter χ²) is appropriate for comparing frequencies
of variable results. The basic χ² calculation is the sum, over all cells, of the observed
frequency minus the expected frequency, squared, divided by the expected frequency:
χ² = Σ [ (observed frequency − expected frequency)² / expected frequency ]
The observed frequency is the actual frequency count recorded for a variable. The expected
frequency is what we would predict the frequency count to be if there were no difference
between the two possible values of the variable.
Making that calculation for each observed and corresponding expected frequency, and
summing up the results of those calculations, yields the χ² result.
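In code, the calculation is a one-liner. Here is a minimal sketch in Python; the function and variable names are my own for illustration, not part of the lecture:

```python
# Minimal sketch of the chi-square calculation: sum over all cells of
# (observed - expected)^2 / expected.
def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The two-cell election example worked later in this presentation:
print(chi_square([110, 90], [100, 100]))  # 2.0
```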
As with the F-test, there is a critical value for χ²: if the calculated χ² exceeds it, the
difference between the observed and expected frequencies is large enough to be statistically
significant. These critical values are displayed
on the χ² table, at the .05 level of significance, on page 168 of the *** I book. As a hint,
though, we will be using only one critical value: 3.841.
The usual rule applies: if the χ² value at the appropriate degrees of freedom is equal to
or above the critical value listed in the table, then you reject the null hypothesis
that there is no statistically significant difference between the observed frequency
and the expected frequency.
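If you happen to have Python with SciPy available (an assumption; the course itself only requires the printed table), you can confirm the tabled critical value directly:

```python
# Assuming SciPy is installed: the .05 critical value for chi-square, df = 1.
# ppf takes the area to the LEFT, so .05 to the right means .95 to the left.
from scipy.stats import chi2

print(round(chi2.ppf(0.95, df=1), 3))  # 3.841
```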
Let's go through an example. Let's imagine that two candidates, Smith and Jones, are
running for election, and a sample of likely voters has been asked their views of the two
candidates. The results of the 200-voter sample are as follows (the first cell is for Smith,
the second cell is for Jones):

CANDIDATE PREFERENCE
                          Smith   Jones
Responses                 110     90
Percent of Respondents    55%     45%
We want to know if this sample is statistically different from a dead heat between Smith and
Jones. What would the null hypothesis be, expressed in percentage terms? 50% to 50%.
Thus, the sample would be split evenly, 100 votes for Smith, 100 votes for Jones.
In order to figure out whether 55% is statistically different from 50%, we perform the
following operation:
χ² = Σ [ (observed frequency − expected frequency)² / expected frequency ]
χ² = [ (110 − 100)² / 100 ] + [ (90 − 100)² / 100 ]
χ² = [ (10)² / 100 ] + [ (−10)² / 100 ]
χ² = [ 100 / 100 ] + [ 100 / 100 ]
χ² = 1 + 1
χ² = 2
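As a cross-check (again assuming SciPy; the lecture relies only on the hand calculation), scipy.stats.chisquare reproduces this result:

```python
from scipy.stats import chisquare

# With no f_exp given, chisquare assumes an even split: 100 votes each.
result = chisquare(f_obs=[110, 90])
print(result.statistic)          # 2.0
print(round(result.pvalue, 3))   # 0.157, well above .05
```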
One property of χ² worth noting: it can never be less than zero, since each term squares
the difference between the observed and expected frequencies.
We find the degrees of freedom by subtracting 1 from the number of categories. Thus, 2 − 1
= 1.
Next, we would normally turn to the table of critical values for the Chi-square test in the ***
I book, on page 168. Yet, unlike with the other statistical tests we have conducted
this semester, we won't need this table for our Chi-square tests to figure out the critical
value at which we decide to accept or reject the null hypothesis. This is because the degrees
of freedom for all of the Chi-square tests we will conduct in this course will be
the same: df = 1. We will use the .05 "Area to the right of critical value" column, and
the critical value we will always use is 3.841. This you can memorize if you
wish: 3.841.
If the calculated Chi-square is less than 3.841, we accept the null hypothesis. If the
calculated value is equal to or greater than the critical value, then we reject the null
hypothesis. Since 2 is less than 3.841, we accept the null hypothesis and conclude that
the observed frequency of support for Smith is not statistically different from the observed
frequency of support for Jones.
We will find that our results for the Chi-square test are highly dependent on the size of the
sample. Let's change our example around a bit to demonstrate this. Let's keep the percentage
of support for Smith and Jones the same as before, 55% supporting Smith, 45% supporting
Jones in the pre-election survey. However, we will change the sample size from 200 to
1000. Let's see what happens.
CANDIDATE VOTE INTENTION
                          Smith   Jones
Responses                 550     450
Percent of Respondents    55%     45%
χ² = Σ [ (observed frequency − expected frequency)² / expected frequency ]
χ² = [ (550 − 500)² / 500 ] + [ (450 − 500)² / 500 ]
χ² = [ (50)² / 500 ] + [ (−50)² / 500 ]
χ² = [ 2500 / 500 ] + [ 2500 / 500 ]
χ² = 5 + 5
χ² = 10
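The same SciPy cross-check works for the larger sample (again, an optional aside assuming SciPy is installed):

```python
from scipy.stats import chisquare

# Expected frequencies under the null: an even 500/500 split.
result = chisquare(f_obs=[550, 450])
print(result.statistic)          # 10.0
print(round(result.pvalue, 4))   # 0.0016, well below .05
```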
And with 1 degree of freedom, 10 is clearly larger than the critical value of 3.841, so
this time we reject the null hypothesis and conclude that there is a statistically significant
difference between Smith and Jones: Smith is statistically ahead of Jones. The moral
of the story is: the larger the sample size, the better the opportunity to find statistical
significance.
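A short sketch makes the moral concrete: hold the 55%/45% split fixed and let the sample grow, and χ² grows in direct proportion to the sample size (the sample sizes here are illustrative, not from the lecture):

```python
from scipy.stats import chisquare

# Same 55/45 proportions at increasing sample sizes.
for n in (200, 1000, 5000):
    observed = [0.55 * n, 0.45 * n]
    result = chisquare(observed)  # null hypothesis: an even split
    print(n, round(result.statistic, 2), round(result.pvalue, 4))

# n = 200  -> chi-square 2.0,  p = 0.1573: accept the null
# n = 1000 -> chi-square 10.0, p = 0.0016: reject the null
# n = 5000 -> chi-square 50.0, p = 0.0:    reject the null
```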
This concludes our presentation of the two-cell Chi-square test.