Let's take a look at this example-- Spring Break Drinks.
This data comes from a class that I
had in the spring of 2012.
And they were asked, on a StatCrunch survey,
to report the number of servings of alcoholic beverages
they consumed over spring break.
I also asked them whether they were
members of a Greek system or not.
What I want to find out is if there is evidence
that the mean number of servings of alcohol consumed by members
of the Greek system differs from non-members.
So to begin, I'll just look at the summary statistics.
And this is always the first step you should take.
You should never jump right into a confidence
interval or a hypothesis test.
You should always start with the graphical display and summary
statistics.
And when I take a look at this dot plot,
I have number of drinks, here, on the horizontal axis,
and I have a dot plot for both the students
who are not members of a Greek system and students
who are members of a Greek system.
And this is the actual data that they reported.
You see that both groups are slightly right-skewed.
You have quite a number of students
who reported they did not drink any alcohol at all.
And even if you took those students away,
you would still see a little bit of a right skew.
OK?
We also want to look at a graphical display
because we need to check for outliers.
I would definitely say that this is an outlier, here,
and this is a little bit of an outlier--
not as extreme as this one, over here.
And so, if I were going to really write this study up
and try to use it to make decisions based off
of this data, I would want to investigate
these outliers a little bit more and see
if they are valid observations, or maybe
find out what the story is.
If I was truly doing a research study on this topic,
I would want to find a little bit more about these subjects
and why they're so different from the rest of the Greek students.
OK.
Next, I'm going to take a look at the summary statistics.
And I do have two different sample sizes.
I have 53 students who were not Greek and 22 who were Greek.
That semester I had three classes,
and this data amounts to about two classes' worth of students,
so I had lots of students.
The mean for the non-Greeks was 11.6; the mean for the Greeks
was a good bit higher than that.
Their average was 17 servings of alcohol.
All right?
So I can see that, in my sample, the mean for the Greeks
is higher than the non-Greeks.
But what I want to know now is, is that enough information
for me to assert that, on average, Greek students drink
more than non-Greek students?
And maybe our population of interest
here is all USC students.
All right?
Let's take a look at the standard deviation.
You see these numbers are quite different from each other.
I have a lot more variability in the Greek students
than I do in the non-Greek students.
That is what this information is telling me, right here.
All right?
Now that I've taken a look at the summary statistics,
I'm going to think about doing a hypothesis test,
to see if I can determine whether there's a difference
in the mean number of alcoholic servings between the two groups.
So let's write out what our hypotheses would be.
That's usually step two, and I like to do step two first,
even though the book lists checking assumptions as step one.
Having a hard time finding a nice pen today.
I'm going to use some meaningful subscripts.
My null hypothesis is that mu sub G (that's the Greek students)
equals mu sub NG (for non-Greek students).
Notice that I wrote this a little bit
differently than what I showed you in the notes.
I want you to see that this is equivalent to saying
that their difference is 0.
So I could write it either way.
These are equivalent; it doesn't matter which way I write it.
Software tends to use a statement of differences.
I think this is more intuitive and simple to follow,
but, because software usually gives its output like this,
you have to recognize that those are the same thing.
Our alternative hypothesis is simply
that there's a difference.
Notice the question of interest.
It simply says, "is there evidence
that the mean number of servings consumed by members
of the Greek system is different from non-members?"
Not "more than," not "less than;" simply "different."
So my alternative hypothesis is that the mean number of drinks
for Greek students does not equal
the mean number of drinks for non-Greek students.
Again, I could write this as a statement of differences--
mu Greek minus mu non-Greek does not equal 0.
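Written symbolically, with G for Greek and NG for non-Greek, the two equivalent ways of stating the hypotheses look like this:

```latex
\begin{aligned}
H_0 &: \mu_G = \mu_{NG} \quad\Longleftrightarrow\quad \mu_G - \mu_{NG} = 0 \\
H_a &: \mu_G \neq \mu_{NG} \quad\Longleftrightarrow\quad \mu_G - \mu_{NG} \neq 0
\end{aligned}
```

The right-hand column is the "statement of differences" form that software such as StatCrunch reports.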
OK?
I have the results from the StatCrunch analysis already,
and I'll show you how I got that, in a minute.
So I don't have to chop up this video too much,
I'm going to just go ahead and show you the results,
and then I'll move over to StatCrunch.
So I told StatCrunch to take this data
and test the hypotheses mu 1 minus mu 2 equals 0
versus mu 1 minus mu 2 not equal to 0.
All right?
I had my nice subscripts, but StatCrunch
won't do that for you.
All right?
And here, for the difference mu Greek minus mu non-Greek,
the sample mean difference is 5.4, or about 5.5.
That's the difference in my sample means.
The test statistic is 1.02, or about 1.03,
and the P-value is 0.312.
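Because the pooled-variance option is left unchecked in the StatCrunch walkthrough, the statistic being reported should be the unpooled (Welch) two-sample t statistic. Its form is sketched below, with the degrees of freedom from the Welch–Satterthwaite approximation that software typically uses for the unpooled case. The sample standard deviations are not read out in the lecture, so this only shows the structure; here n_G = 22, n_NG = 53, and the numerator is the sample mean difference of about 5.4.

```latex
t = \frac{(\bar{x}_G - \bar{x}_{NG}) - 0}{\sqrt{\dfrac{s_G^2}{n_G} + \dfrac{s_{NG}^2}{n_{NG}}}},
\qquad
df \approx \frac{\left(\dfrac{s_G^2}{n_G} + \dfrac{s_{NG}^2}{n_{NG}}\right)^2}
{\dfrac{\left(s_G^2/n_G\right)^2}{n_G - 1} + \dfrac{\left(s_{NG}^2/n_{NG}\right)^2}{n_{NG} - 1}}
```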
Let's draw a picture of this.
I have a not equal to alternative,
so my P-value is split into two tails.
My test statistic is 1.03, so I'm
going to show that as well as its opposite, negative 1.03.
And the area in these tails represents the P-value-- 0.31--
split into two.
It's about 0.155.
And the same thing over here.
0.31 split into two is 0.155.
OK?
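The two-tailed picture can be sketched in code: the P-value is the area beyond +t plus the area beyond -t under a t curve. This is a minimal standard-library sketch that integrates the t density numerically with Simpson's rule; the function names are hypothetical, and the degrees of freedom StatCrunch used are not stated in the lecture, so any df you plug in is an assumption.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat: float, df: float, steps: int = 10_000) -> float:
    """P(|T| >= |t_stat|): the area in both tails beyond +/- t_stat."""
    b = abs(t_stat)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3   # area between 0 and |t|
    one_tail = 0.5 - central_half  # area beyond |t| in one tail
    return 2 * one_tail            # both tails together
```

With t = 1.03 and any df in the range a sample of this size would give, the result lands at roughly 0.31, and half of it, about 0.155, sits in each tail.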
This is a pretty large P-value.
Let's think about what this is telling us,
in the context of this problem.
It's saying, under the null hypothesis,
if there's no difference between Greeks and non-Greeks,
with regard to the mean number of drinks
that they have over spring break,
then the probability of getting a sample mean
difference as large as this one, or one even
further from 0, is about 31%.
So in other words, it wouldn't be that unusual
to get this kind of sample mean difference,
even if the two population means are the same.
So this doesn't give me much evidence
to support the alternative hypothesis.
I'll just make a couple notes, here.
Here's step three of the hypothesis test;
here's step four of the hypothesis test.
Now I'm going to write out that interpretation for step five.
So I'll say something like this.
With a P-value equal to 0.312, there is not
sufficient evidence that the true mean number
of alcoholic servings for Greek students differs
from that of non-Greek students over spring break.
OK?
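The lecture judges the P-value as simply "pretty large." If you want to make that step explicit, a common convention is to compare it with a significance level alpha. The value 0.05 below is an assumption (the lecture does not name one), and the function name is hypothetical.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a P-value with a significance level alpha (0.05 assumed)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return "reject H0: evidence that the population means differ"
    return ("fail to reject H0: not sufficient evidence "
            "that the population means differ")


print(decide(0.312))  # the P-value from this example
```

A P-value of 0.312 is well above any commonly used alpha, which is why the conclusion is "not sufficient evidence."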
Right.
Now let's take a look at the results for the 95% confidence
interval.
OK.
For the 95% confidence interval, I
have a difference in means of mu G minus mu non-G. All right?
They've shown us this, up here.
Mu 1 is Greek, mu 2 is non-Greek.
This is very important.
I have to really pay attention to the order of subtraction.
The order of subtraction is Greek minus non-Greek.
Here's the lower limit; here's the upper limit.
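For reference, the unpooled interval has the form below, where t* is the critical value for 95% confidence. Because the interval is symmetric about the point estimate, its midpoint should match the sample mean difference, and it does. Since the interval and the two-sided test share the same standard error, an interval that contains 0 goes together with a two-sided P-value above 0.05, consistent with the 0.312 found earlier.

```latex
(\bar{x}_G - \bar{x}_{NG}) \pm t^{*}\sqrt{\dfrac{s_G^2}{n_G} + \dfrac{s_{NG}^2}{n_{NG}}},
\qquad
\text{midpoint: } \frac{-5.4 + 16.32}{2} = 5.46 \approx \bar{x}_G - \bar{x}_{NG}
```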
This is actually one of the more difficult ones to interpret.
It suggests that I can't be sure which group is higher.
The interval ranges all the way from negative 5.4 up to 16.
So it could be that the mean number of drinks
is higher for Greeks.
It could be that there is no difference,
because 0 is inside the interval.
Or it could be that the mean is higher for non-Greeks.
So here is the long interpretation.
I'm going to have to say, with 95% confidence,
it is unclear whether Greeks or non-Greeks
drink a larger mean number of servings
of alcohol over spring break.
The mean for Greeks might be higher than
non-Greeks by as much as 16.32 servings.
Look right here.
If Greek is higher than non-Greek,
that's going to result in the positive number.
The highest positive difference that I might see is 16.
So the mean for Greeks might be higher than non-Greeks
by at most 16.32 servings; that's the largest difference
the interval allows in that direction.
However, the mean for non-Greeks might be higher than Greeks
by as much as 5.4 servings.
The largest negative difference is negative 5.4, so, therefore,
non-Greeks could be higher than Greeks by as much as 5.4.
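The three-way reasoning above (both limits positive, both negative, or straddling 0) can be written as a small helper. This is a minimal sketch with a hypothetical function name; it assumes the interval is for mu first minus mu second, matching the Greek minus non-Greek order of subtraction used here.

```python
def interpret_interval(lower: float, upper: float,
                       first: str = "Greeks",
                       second: str = "non-Greeks") -> str:
    """Describe a CI for mu_first - mu_second in words."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if lower > 0:
        # Both limits positive: the first group's mean is higher.
        return f"mean for {first} is higher by between {lower} and {upper}"
    if upper < 0:
        # Both limits negative: the second group's mean is higher.
        return f"mean for {second} is higher by between {-upper} and {-lower}"
    # The interval contains 0: either direction (or no difference) is plausible.
    return (f"unclear which mean is larger: {first} might be higher by as much "
            f"as {upper}, or {second} might be higher by as much as {-lower}")


print(interpret_interval(-5.4, 16.32))
```

With the limits from this example, it returns the "unclear" message, with Greeks possibly higher by as much as 16.32 and non-Greeks possibly higher by as much as 5.4.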
All right.
Let's back up a little bit, before I move over
to StatCrunch.
I've just realized I never came back and checked
my assumptions, here.
I do need to check that I have a quantitative response variable.
And I do; it is number of servings.
And I'm observing this for both Greeks and non-Greeks.
I also want to point out that I do have an independent sample
design.
And if you look back at the two ways
that we end up with an independent sample design
(let's look back at them), I hope you
will see that the drinking example falls
into this category.
It is "an observational study that separates subjects
into groups according to their value
for an explanatory variable."
The explanatory variable, this time,
is actually whether they are Greek or non-Greek.
We didn't take students and purposefully put them
in the Greek or the non-Greek system;
we just saw which system they were already in.
So this is an observational study.
OK?
The second assumption that I have to check is
whether I have a random selection.
This assumption is not met-- all right?
The way I collected this data is, I simply
did a StatCrunch survey of my spring 2013 Stat 201 students.
It is not a representative sample,
if my population of interest is all USC students.
OK?
Of course, I have limitations here, as do a lot of studies.
Ideally, we want a random selection,
but in practice, it's not always feasible.
We're still going to do the analysis,
but we are going to make a note that it is not
a random selection of students.
The last check is whether we have
met the normality assumption.
And we have a problem here, too, actually.
For the normality assumption, I need to either see that the problem states that I'm
sampling from a normal population,
which it does not,
or have fairly large sample sizes; I'm OK for the non-Greek group,
but I'm not OK for the Greek group.
So my next step is to make a graphical display
of both sets of data, which I did, for both samples.
And the smaller sample, the Greek group, has an extreme outlier.
So I'm kind of in trouble there as well.
So I'm going to say, is the normality assumption met?
Not really.
We'll say the dot plot of Greek students
shows two extreme outliers.
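The normality check above is done by eye from the dot plots. As a rough automated companion, the sketch below flags a small sample and lists points outside the 1.5 × IQR fences. Both pieces are assumptions rather than the course's rules: the cutoff of 30 is a common rule of thumb (the lecture only says "fairly large"), and the 1.5 × IQR rule is a swapped-in outlier screen. The function names are hypothetical, and the test data are made up.

```python
import statistics


def iqr_outliers(values: list[float]) -> list[float]:
    """Values outside the 1.5 * IQR fences (a common informal outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


def normality_concerns(values: list[float], large_n: int = 30) -> list[str]:
    """List reasons to doubt the normality condition for one sample."""
    concerns = []
    if len(values) < large_n:  # rule-of-thumb cutoff, assumed
        concerns.append(f"small sample (n = {len(values)} < {large_n})")
    outliers = iqr_outliers(values)
    if outliers:
        concerns.append(f"possible outliers: {sorted(outliers)}")
    return concerns
```

A small sample that also has outliers produces both concerns, the same combination the lecture describes for the Greek group.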
So, in other words, if you were
going to go out and try to make claims based on this study,
they would not be very convincing,
because we have a couple of problems.
We have a small sample that shows extreme outliers.
Furthermore, our sample is not really
representative of the population of interest.
But for the sake of seeing how to work through and do
this analysis, it's a good example to look at.
OK.
Now I'm going to move into StatCrunch.
And I'm going to show you how I got both the summary
statistics, the dot plots, and also
the results of the hypothesis test and the confidence
interval.
I'm going to share my screen with you.
Move over to StatCrunch.
Here is the data that I collected.
And the variables of interest are right here,
whether the student is in the Greek system or not.
And I also asked for the number of drinks over spring break;
that's right here.
Number of drinks.
I started by looking at the dot plots.
So I asked for a dot plot, number of drinks.
I asked it to group it by whether the student
was Greek or not.
That's how I obtained those dot plots.
Now I want to look at the summary
statistics for both groups.
So I went to Summary Stats, then Columns.
I'm interested in the number of drinks,
but I want to break it out by this explanatory variable,
Greek.
So I group by Greek.
And there we go.
And now I can compare the two groups fairly easily.
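Outside StatCrunch, the same "group by" summary can be sketched with Python's standard library. This is a minimal illustration with a hypothetical function name and made-up records, not the class survey data. Like the standard deviation in a typical summary-statistics table, `statistics.stdev` is the sample standard deviation (dividing by n - 1).

```python
import statistics
from collections import defaultdict


def summarize_by_group(records):
    """records: iterable of (group_label, number_of_drinks) pairs."""
    groups = defaultdict(list)
    for group, drinks in records:
        groups[group].append(drinks)
    # n, mean, and sample SD for each group, like Summary Stats grouped by Greek
    return {
        g: {"n": len(v), "mean": statistics.mean(v), "sd": statistics.stdev(v)}
        for g, v in groups.items()
    }


# Hypothetical records, for illustration only
sample = [("Yes", 10), ("Yes", 20), ("No", 5), ("No", 7), ("No", 9)]
print(summarize_by_group(sample))
```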
Let's focus on the standard deviation for a second.
That's going to come into play when
I do the hypothesis test.
I want you to notice that these two numbers are
quite different from each other.
The standard deviation for the Greek students
is almost twice as much as it is for the non-Greek students,
meaning I see a lot more variability
among the Greek students than the non-Greeks.
Keep that in mind.
Now, to do the hypothesis test, I'm
going to go to Stat, then T Stats, then Two Sample.
And I'm going to choose the With Data option,
because I have the full data set.
Sample 1 is going to be number of drinks
where Greek equals Yes.
This condition is a little difficult to type into StatCrunch.
And sample 2 will be number of drinks where Greek equals No.
And I do not want to pool the variances.
Pooling means assuming that the population variances are equal,
but remember, the sample standard
deviations we saw were quite different from each other.
So that would be a lofty assumption to make.
So I'm going to uncheck that option, there.
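To see what the pooling checkbox changes, here is a minimal sketch of both statistics computed from summary values. The function names are hypothetical. The unpooled version uses the Welch–Satterthwaite degrees of freedom; the pooled version assumes equal population variances and uses n1 + n2 - 2. The lecture does not read out the sample standard deviations, so the inputs in the usage lines are made up.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled (Welch) t statistic and Welch-Satterthwaite df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic, assuming equal population variances."""
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2


# Means and sample sizes from the lecture; the SDs (20 and 10) are hypothetical
print(welch_t(17, 20, 22, 11.6, 10, 53))
print(pooled_t(17, 20, 22, 11.6, 10, 53))
```

With these made-up inputs, where the smaller group has the larger spread, the pooled statistic comes out larger in magnitude than the unpooled one. That happens because pooling lets the bigger, less variable group dominate the variance estimate. This is one reason to avoid pooling when the sample SDs look quite different.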
And I do want to leave this alone.
I want to test that the difference in means
is 0 versus not equal to 0.
And Compute.
And here are the results.
Now I want to show you how to find the confidence interval.
So I'm going to edit this.
I'm going to just go back and ask for a confidence interval.
And I want you to notice that, when
I entered the samples in this order,
I set up the order of subtraction
to be Greek minus non-Greek.
Compute.
And here's the lower limit and the upper limit
of the confidence interval.
I want to point something out, to help you out
with the homework.
In the homework, for a lot of the questions
you're going to look at, you do not
have the full data set, as I do here.
You are instead given the summary statistics.
And so, in that case, you can choose With Summary.
You still have two samples that you're comparing,
but you have the summary data instead of the raw data.
With that option, you just have to type in your values.
And you're always going to have to make this choice
about whether or not to pool the variances.
I gave you a lot of hints on the homework
to help you out with this.
But it's generally a matter of looking at the two sample
standard deviations and seeing if they are reasonably
close to each other or not.
I'm not going to give you hard and fast rules for it
in this class.
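When you only have summary statistics, as in the With Summary option, the unpooled interval can be sketched directly from the means, standard deviations, and sample sizes. This is a minimal illustration with a hypothetical function name. It does not compute the critical value: `t_star` is assumed to be supplied from a t table or software for your chosen confidence level and degrees of freedom.

```python
import math


def welch_ci(mean1, sd1, n1, mean2, sd2, n2, t_star):
    """Unpooled CI for mu1 - mu2; t_star is supplied by the caller."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # unpooled standard error
    diff = mean1 - mean2                           # order of subtraction: 1 minus 2
    margin = t_star * se
    return diff - margin, diff + margin
```

As with StatCrunch, the order in which you enter the two samples sets the order of subtraction, so swapping them flips the signs of both limits.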
OK.
We're going to stop here, and I'll
pick up in the next segment with another example.