07.1 Independent Samples T-Test #1

This presentation is the first of two on understanding, calculating, and interpreting an independent samples t-test. The independent samples t-test is a versatile test that works when we have two subsets of a larger sample of data and we want to test whether there is a significant difference between the means of the subsets. There are many instances where this test is appropriate: for instance, regions of the United States such as the Confederacy and the Northeast, Hispanic American and Asian American demographics in a survey, or members of Congress in the Tea Party Caucus and the Congressional Progressive Caucus. In each instance, these sub-samples are part of a larger sample: states in the U.S., respondents to a survey in the U.S., and members of Congress, respectively.
Here are the fundamental criteria for the independent samples t-test. First, the samples that we are comparing have their own summary statistics. This means that each sample has its own mean, median, mode, variance, and standard deviation values, and these will usually differ between the samples. Next, the sample sizes can be different; they do not have to be equal. Finally, cases cannot appear in both samples. For instance, for this test to work, a U.S. state cannot be in both the Confederacy and the Northeast, a respondent cannot be both Hispanic American and Asian American, and a member of Congress cannot be in both the Tea Party Caucus and the Congressional Progressive Caucus.
And here is an example of the criteria being met. You're looking at a map of the state of Idaho, with all 44 counties represented. If you grew up attending elementary and secondary school in Idaho, you probably know the Idaho counties song. (If you didn't and you're curious, there are YouTube videos of the tune that you can watch.) You'll notice that the map of Idaho is divided into six planning districts. In 1972, the state created these planning districts so that the counties could better work together on planning matters.
The planning districts correspond to different regions of the state. It's easy to see that planning districts I and II make up the northern region of the state. Idaho County, where Grangeville is located, is at the bottom of District II, whereas Boundary County is at the top of District I and borders Canada. Planning districts III and IV make up the southwest region of Idaho, whereas districts V and VI are in the southeastern part of the state.
One general hypothesis about Idaho politics and society is that they are heavily influenced by region. For instance, the territorial capital of Idaho was first placed in Lewiston, in the north, in Nez Perce County. By 1867, the capital was moved to Boise in Ada County, and the land grant state university was placed in Moscow, in Latah County, in exchange. The regional rivalry sure has been bitter at times! Additionally, each region now has a major university: the University of Idaho in Moscow, Boise State University in Boise, and Idaho State University in Pocatello. Some folks refer to the "Great State of Ada" in derision of Boise residents in particular, accusing them of being out of touch with the rest of the state. And so it goes....
But are there meaningful differences between the regions of Idaho? One way to test this is to use an independent samples t-test. Here's the formula, which I'll explain a couple of times, so don't sweat it if you don't get it at first. Let's focus on the numerator for starters. The numerator is easy: X bar sub 1 minus X bar sub 2 means that you subtract the mean value of the second independent sample from the mean value of the first independent sample. The denominator is more complex. On the left hand side of the denominator, the first operation is to take the number of cases in the first independent sample and multiply that number by the squared standard deviation of the first independent sample.
Then the second operation is to take the number of cases in the second independent sample and multiply that number by the squared standard deviation of the second independent sample. Add the result of the first operation to the result of the second operation. Then divide that sum by the first sample size plus the second sample size, minus two. Then take the square root of that division result. This is your overall first denominator result.
On the right hand side of the denominator, add the number of cases in the first independent sample to the number of cases in the second independent sample, then divide by the number of cases in the first independent sample multiplied by the number of cases in the second independent sample. Then take the square root of that division result. This is your overall second denominator result.
Your next step is to multiply your overall first denominator result by your overall second denominator result. The result of this multiplication is your overall third denominator result. Your final step is to divide your numerator result by your overall third denominator result. The answer is your t value.
In our next presentation, we will work through a data example of the independent samples t-test and interpret the results.
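If it helps to see the steps written out, the formula as described above works out to t = (X̄1 - X̄2) / [ sqrt((N1*s1^2 + N2*s2^2) / (N1 + N2 - 2)) * sqrt((N1 + N2) / (N1*N2)) ], where X̄ is a sample mean, s is a sample standard deviation, and N is a sample size. Here is a minimal sketch of those same steps in Python. The function name and argument names are my own choices for illustration, and the numbers in the usage line are made up, not Idaho data. The function takes the summary statistics as inputs, so it assumes you have already calculated each sample's mean and standard deviation.

```python
import math


def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return the t value from summary statistics, step by step.

    Hypothetical helper for illustration; the argument names are assumptions.
    """
    # Numerator: first sample mean minus second sample mean.
    numerator = mean1 - mean2

    # Left hand side of the denominator:
    # (N1 * s1 squared + N2 * s2 squared) / (N1 + N2 - 2), then square root.
    first_result = math.sqrt((n1 * sd1 ** 2 + n2 * sd2 ** 2) / (n1 + n2 - 2))

    # Right hand side of the denominator:
    # (N1 + N2) / (N1 * N2), then square root.
    second_result = math.sqrt((n1 + n2) / (n1 * n2))

    # Multiply the two pieces to get the full denominator.
    third_result = first_result * second_result

    # Divide the numerator by the full denominator.
    return numerator / third_result


# Made-up numbers: two samples of 5 cases each, means 10 and 8, both SDs 2.
print(round(independent_t(10.0, 2.0, 5, 8.0, 2.0, 5), 3))
```

Notice that swapping the two samples only flips the sign of the t value, since the denominator stays the same and the numerator reverses.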