All right everyone. Hello and welcome. We're ready to get started.
If you weren't able to go to the review on Sunday, that review is on Blue Review.
Today in class, let's get started, please.
Today we do have a little bit of new material, but there's a lot of review within it. Some
of the questions I ask you in fact will list conditions that you would have to know for
Thursday night right at the start. So there will be some good review in that. And then
I do hope to get to your Exam 2 lecture review and do the first three out of the four questions
with you.
There is a prelab that's open, but you probably won't work on it until after the exam. But
it is for next Monday. Homework 7 is due tomorrow, not Thursday. So you can have the solutions
open for you. That homework 7 was a little shorter. Homework 8 is going to open up, but
it's not due until the Monday before Thanksgiving. So not due next week right away. That way
we don't have a homework due on Thanksgiving week except that Monday. So homework 8 is
still open, so you can start to work on it over the weekend, but you have next week and
the weekend after that too.
All right. Do you have a general question or comment at all before we do a clicker question
towards the beginning here? That's just your exam information, where you go. Exam is sent
off for copying. It is a little longer in terms of number of pages, because of output
that's provided and adequate space for answering questions, graphs, and things too.
Same as Exam 1, you're free to leave when you're done with the exam. Make sure you've
got a picture ID. Come on in and sign up. We'll have those exams graded again over the
weekend and back to you next Monday.
Clicker question right at the beginning. It's an easy one.
So why is this an easy question?
Because no matter which answer you give, they're all right. Mm-hmm.
[Laughing]
Pick your favorite parameter. Which is your favorite? [Laughs]
All right. But just to remind you that those, that very first parameter p still could be
on the exam. And even p1 minus p2, we covered that right before Exam 1, that was that week
of Exam 1. Also any of those five parameters, how they would be estimated with the corresponding
statistic. Right, these are the five parameters you could think about what are the individual
statistics that would estimate those quantities, p-hats and X-bars type of thing.
Confidence interval. Either make it or here is a confidence interval and interpret it.
Use it to make a hypothesis test decision seeing if 0 is in your interval or not, those
ideas, or conducting a hypothesis test about that parameter. Now with five parameters,
five confidence intervals, five tests, I can't have you do all of them. I'm not going
to have you do every one of those scenarios, both confidence interval and testing, as a complete
problem with the summaries where you work through the whole thing. That would take too
long.
So there's some that are on the hypothesis test side, there's some that might be on the
confidence interval side instead. You might have to do all five steps of the hypothesis
test or you may have to just take that test statistic that's reported and find the p-value
and make your decision. So there will be some partial ones too.
As always, show all your work so we can give you consistency points: if you've
got a p-value and you're not sure it's right, still make your decision with it,
and you can get credit for that decision. Things like that.
All right. We are going to cover a few pages of notes. It's a technique that's called Analysis
of Variance. But I'm going to be reviewing with you for a good part of this time that
two-sample pooled t-test, because that's all we're going to do here is an extension of
that.
So in labs this week you're doing that lab 10, two independent samples, two means.
Today we're going to see a technique that allows us to extend that two-sample pooled
t-test, which compares two population means, mu1 and mu2, to be able to have an experiment
or observational study that lets you compare three means or more. Analysis of Variance,
ANOVA. Analysis of Variance allows us to compare means of two or more normal populations. That
was the condition about our populations, and that you take independent samples. We are
going to extend the pooled version, which is when we assume the population variances are
equal. Now on Thursday night, if I ask you to write an H0 for the two independent sample
problem with means comparing two population means, what would your H0 look like? It would
be a mu1 not an X-bar1 and you could write it as what? Mu1 equals mu2. Or the equivalent
way to write it would be what? Mu1 minus mu2 is equal to 0. Either one is fine. And we
might have you look at the output and decide whether you can do the pooled technique or
not. The pooled technique is reasonable, then you'll calculate a t statistic or maybe pull
it from the output. Equal variance is assumed. And that t statistic would follow a t distribution
with what degrees of freedom? In the pooled t-test you get to use all of the degrees of
freedom from both samples put together and that was what? N1 plus n2 minus 2. If we're
going to extend this technique to more than two populations, then we're going to have
more than two samples and we'll have say n1 plus n2 plus n3 minus 3, if there's three
groups. So we're going to extend this technique. We're going to first remember the assumptions.
Your picture that we're going to start filling in a little bit and those four bullets at
the bottom are really the assumptions for your two-sample pooled t-tests or confidence
interval. Four assumptions, two are about the samples and two are about the populations.
For ANOVA, we're just extending it from one and two populations up to some number of populations
k. K represents that number of populations. It might be your new treatment, the standard
treatment that's being compared to, and then a placebo treatment all in the same experiment
so you can make multiple comparisons.
So what do we have? We have that each population of responses, which is some quantitative response,
has a normal model, that's one of our conditions. For each population the model for the response
is normal. So we have a normal model that describes the scores or the times or whatever
it is that we're measuring in each population. They might have different population means,
that's what we're trying to learn about. Are the population means all the same, mu1 and
mu2 and so on, or are they different? So we still have to keep the 1 and the 2 that notation
for representing the means, because they could be different. That's what we're trying to
assess. But what else do we assume on the population side? That the population variances
or the population standard deviations are all equal. Equal population variances or equal
population standard deviations. So the standard deviation sigma that I write to represent
the model for each population is just a sigma, without a 1 or a 2, because the populations
all have the same variability. The test scores vary the same amount in each of my three teaching
methods, even though the means might be different.
And then the two about the data: you go to each population and you take a
representative sample, a random sample, from that population of some size, n1 for the
first. Take another random sample, the size might be different or it might be balanced
and the same, but of size n2. We'll do that for all of the populations. And then we do
assume that these samples, these sets of data that we have, that they're not matched or
paired in any way, that they are independent.
So if you just take the first two populations and instead of writing k in these bullets
you write 2. You have exactly the assumptions to do the pooled two-sample t-test or t interval.
And that very last assumption is the one that would not be needed if you're doing the general
or unpooled technique.
All right, so we're extending it to more than two populations. So we just wrote out the
H0 when it is just two populations to be compared, mu1 and mu2 are the same. What will the H0
and the Ha look like to do more than two? Well H0 is pretty easy. H0 is always that
there's no difference on average, no effect, so all of those population means should all
be equal. Mu1 is equal to mu2 and that's equal to mu3. There are three groups or however
many groups. So the H0 is still equality of the population means. What will Ha be? On
Thursday night, you could write between your two population means a greater than, a less
than, or a not equal to, depending on what the researcher is trying to establish. When
you have three groups, then there's lots of possibilities. It might be that they're all
different. It could be that the first two are really the same, but the last one is different.
So there's lots of possibilities. So in writing out the Ha here we actually take care of all
those possible options by saying at least one. At least one population mean, population
mean response, it's our mu. At least one of those population means is different. So we
will be testing the hypothesis that they are all the same, that there is no difference between
those population means, versus not H0. At least one of them is different. They're not all
equal. Somewhere there's something that's different and it could be that they're all
different. There's just lots of possibilities.
So we'll be doing a test that first establishes whether there's no effect across all those
populations or there is something going on somewhere. And then we'll figure out where
it is. There's a couple pictures of what the model for your response would look like under
H0. H0 already has the means being equal. The standard deviations were equal. They all
had a normal model. So it's really the same model for everybody. And here's a possibility
for what could happen under an alternative. That all the populations have slightly different
means. Two are sort of closer together than the other, but that's just one possible picture.
All right. So we're doing this thing called ANOVA. It's going to help us decide whether
the population means are all equal, no difference on average or not. But we're calling it Analysis
of Variance, which seems like a strange name for a hypothesis test that's about means.
Well it turns out to be able to decide between those different hypotheses about the means,
we're going to actually compare two estimators of that common population variance. So we're
going to use our data and in two ways we're going to come up with a way to estimate the
variability. They're called mean squares. Probably the first formula you might have
had to look at on your formula card was one for the standard deviation for a set of data.
We call that S, right. S is your sample standard deviation. And we went through how to compute
this measure of spread back in Chapter 2. We took every value, we subtracted the mean,
to look at the distance from the mean. To get rid of the cancellation we squared those
distances and added them all up. And then what is the divisor in that standard deviation
S? It's n minus 1, which you learned is also now called the?
Degrees of freedom for a single set of data. So that looks like to me a sums of squares
on the top, you're summing up a bunch of squared terms on the top and you divide by a degrees
of freedom. So this kind of looks like what we'll call the sum of squares over a degrees
of freedom. And that's really the variance that I'm talking about here. So that's S squared
and we're going to refer to this as being a mean square. You're averaging these squared
terms.
So we are estimating variances in that same way. We're taking a sums of squares over a
degrees of freedom. We're going to use the data we have and come up with two similar
sums of squares over degrees of freedom, two mean squares, and be able to compare them
because of their properties.
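As an aside for anyone reading these notes (not something shown in lecture), here is a minimal Python sketch of that idea: the sample variance is just a sum of squared deviations divided by its degrees of freedom. The data values are made up.

```python
# Sketch: the sample variance as a sum of squares over degrees of freedom.
# The values below are made up for illustration.
data = [7.3, 8.1, 9.0, 7.9, 8.4]

n = len(data)
xbar = sum(data) / n
sum_of_squares = sum((x - xbar) ** 2 for x in data)   # squared deviations from the mean
s_squared = sum_of_squares / (n - 1)                  # divide by the degrees of freedom
print(s_squared)                                      # this "mean square" is S squared
```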
So one of them is called the mean square within, or due to error. And that mean square is going to
turn out to be a good, or unbiased, estimator of that common population variance sigma squared.
It does a good job. Whether H0's true or not, it does a nice job. In fact, that mean square
error is really going to be something you have seen before: that Sp squared, that pooled variance
estimate. You pooled things between two groups. We're going to pool between three groups or
how many groups we have. But that ends up being a really good estimate to use in your
standard error term, so we'll have that one to use. It's a good unbiased estimator.
And the other mean square that we'll end up calculating also from our data is looking
at variability between the groups, between the groups, and it also is a good or unbiased
estimator of that common population variance sigma squared, but only if H0's true. If the population means
really are all equal, this one does a good job too. It should be equal to that variance
sigma squared on average. It does a good job. If H0's not true, this particular way of calculating
a variance estimate ends up blowing up, getting really, really large. So otherwise it tends
to be too big.
So we're going to take our data and calculate two ways to estimate this variance. That's
the same in all of our populations and we've got these samples to use. One of them does
a really nice job in general. The other one does a good job if H0's true. If H0's not
true, it tends to be much bigger than what it should be. So we can compare these two.
These should both be about the same if H0's true. If H0's not true, then I should expect
to see the top one being much larger than the bottom. So we're going to compare these
two in a ratio rather than just side by side. That ratio of these two variance estimates
is called an F statistic. So we have Z statistics, we have t statistics, and here is our first
F statistic. The ratio of these two mean squares. Now you actually have seen an F in output
before. In your output for the two independent samples, that Levene's test has an F sitting
there, but we said don't worry about the test statistic; the p-value that's sitting next
to it, you know how to interpret the p-value. But that test actually is comparing those two S squareds,
those two variance estimates, to each other in a ratio. So it's very similar to this kind
of statistic here.
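Purely as a reference (the lecture reads Levene's result off SPSS output), a Levene-style comparison of sample variances can also be run in Python with scipy; the two samples below are hypothetical.

```python
# Sketch: Levene's test for equal variances on two hypothetical samples.
# SPSS reports the same kind of F statistic and a "Sig." (p-value) column.
from scipy import stats

group1 = [7.3, 8.1, 9.0, 7.9, 8.4]
group2 = [9.5, 8.8, 10.1, 9.2, 9.9, 8.7, 9.4]

f_stat, p_value = stats.levene(group1, group2)
print(f_stat, p_value)   # compare the p-value to alpha = 0.10 for this condition check
```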
So we're going to take a look at these two variance estimates, put them in a ratio. The
mean square that's on the top is the one that's only good if H0's true. Otherwise, it gets
to be too big. The one that's on the bottom is the good one that I would typically use
no matter what. So with that ratio formed that way, when should I reject H0? When
I look at that F statistic, what kind of values for that F would lead me to say maybe reject
H0? If it's what, too... which way, too big or too small?
Too big. The one on the top should be about the same as the one on the bottom if H0's
true, so the ratio should be about 1. If H0's not true, the one on the top gets to be really
big, so this ratio would be big, much more than 1. And that would lead us to rejecting
the null hypothesis. So we'll figure out how big is big enough, but that gives us a
frame of reference.
All right. There is a nice write up for the logic behind ANOVA. I would request that you
read that little bit that's already completed for you sometime over the weekend before you
go to lab next week. And when you do that prelab over the weekend, you get to play around
with buttons on a simulation that basically demonstrates this logic idea. So that's something
to look at and go through Friday, Saturday, Sunday, Monday before your lab.
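If you want a preview of what that simulation demonstrates, here is a small Python sketch of the same logic (my own illustration, not the prelab itself): draw three samples from identical normal populations, so H0 is true, and compute the F ratio many times. The F values cluster around 1, and large values are rare.

```python
# Sketch: under H0 (identical populations), the ANOVA F ratio hovers near 1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
f_values = []
for _ in range(5000):
    # Three samples from the SAME normal population, so H0 really is true.
    groups = [rng.normal(loc=15, scale=2, size=7) for _ in range(3)]
    f_stat, _ = stats.f_oneway(*groups)
    f_values.append(f_stat)

print(np.mean(f_values))                  # typically near 1
print(np.mean(np.array(f_values) > 4))    # only a small fraction are that large
```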
What we are going to do is show you how to get that F statistic today. We are going to
look at some data and the graph that I have to show you, something that I could ask you
Thursday night. So here's the steps to get that F statistic. It's a bit of work, which
is why we really like SPSS output.
So the steps are laid out with generic data. There's your data, but there will be numbers
there so you don't have to really worry about the 1s and the 2s or the notation. That's
just going to be a set of numbers, your samples from your different populations. Since you're
trying to learn about the population means, those parameters, the mu1, the mu2, the mu3,
I would hope that it makes sense that you would calculate the sample means. The X-bar1,
the X-bar2, to take a look at them to see if they're looking similar or not. So that's
step 1 and usually that summary's going to be provided for you, because I don't need
you to take three sets of data and calculate X-bar three times and calculate the S's. That
summary is usually provided even with two samples for two-sample t-tests you'll get
that summary. We'll get the sample means and the sample variances for each group.
Then we'll calculate in step 2 the overall sample mean. That's adding everything up and
dividing by the total sample size. The total sample size is represented by this big N,
that's the way the textbook uses and represents the overall sample size. So there's the overall
sample mean, which would only really make sense if H0's true, because then the population
means are all the same so you could combine all the data to come up with an overall estimate.
With those two summaries, we then have the two steps that get us the sums of squares.
So we can divide them by the degrees of freedom and get these mean squares that we form our
F.
So here are these formulas for those sums of squares. Sums of squares for the groups.
Sums of squares due to error. They are a sum of squared terms. A sum of some squared terms.
And these two sums of squares for the groups in error are really not that bad. They don't
look nice in formula form, but they're a summation over the groups. So if you have treatment
one, treatment two, and treatment three, three groups you're comparing, there's only three
things you have to calculate and add up, it's just three terms. So it won't be too bad.
And it involves the X-bars and the S's. Likewise, the sum of squares for error is over the three
groups, so there's just three things you have to plug numbers into and add them up. This
optional sums of squares for total, not very fun to find, because if I had three groups
with 25 subjects in each, I'd have 75 terms I'd have to work out and add up, because that's
over each of the individual values. So that one we usually say no thank you to and we
just find it by the total of the others. But there is a third way you can find it to check
your steps three and four calculations out if you wanted.
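To make those formulas concrete, here is a short Python sketch (with made-up summary numbers) of the group-level versions: SS Groups adds up n_i times (x-bar_i minus the overall mean) squared, SS Error adds up (n_i minus 1) times s_i squared, and the two together give SS Total.

```python
# Sketch: ANOVA sums of squares computed from group summaries.
# The sample sizes, means, and variances below are made-up placeholders.
import numpy as np

n    = np.array([25, 25, 25])           # group sample sizes n_i
xbar = np.array([72.0, 78.0, 75.0])     # group sample means
s2   = np.array([30.0, 28.0, 33.0])     # group sample variances s_i squared

N = n.sum()                             # total sample size
grand_mean = (n * xbar).sum() / N       # overall sample mean

ss_groups = (n * (xbar - grand_mean) ** 2).sum()   # between the groups
ss_error  = ((n - 1) * s2).sum()                   # within the groups (error)
ss_total  = ss_groups + ss_error                   # no need to add up 75 separate terms
print(ss_groups, ss_error, ss_total)
```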
So we'll be able to take each of our sums of squares, add them up to get the total.
It is these two sums of squares that we need to start to form our last step of filling
in a table. A table called the ANOVA table, which is a pretty standard output that's provided,
the full ANOVA table, which really just leads you to that F statistic at the end.
The variability that's in your data, the fact that all those responses are not all the same,
usually comes from two sources. The differences between the groups because they all had different
teaching methods, so there's differences that way. And then there is still differences within,
because students that had the same teaching method still have different scores. So: between
the groups, and within the groups, which is the part due to error. So we'll find these two sums of squares and then
we have to work out the degrees of freedom.
So the very first degrees of freedom you saw which is for your one-sample t-test is just
n minus 1, right. However many things you have minus 1. I need the degrees of freedom
for groups. So if I have three groups, how many degrees of freedom for groups will I
have? Number of things you have minus 1. So number of groups minus 1 is my degrees of
freedom here. K represents the number of groups. Number of populations you're comparing. So
my degrees of freedom for the group's component is just how many groups you had minus 1. You
lose one degree of freedom for estimating that variability.
The sums of squares for error. This is the one that's like that Sp squared idea. So when
you did n1 plus n2 minus 2 and we said we're going to extend it to three groups, it would
be n1 plus n2 plus n3 minus 3, which is generically your total sample size minus however many
groups you have. So that's the degrees of freedom here. Your total sample size minus
the number of groups, that is, n1 plus n2 minus 2 if it were two groups. What do those
two add up to be for your total degrees of freedom for the entire data set, throwing
it all together? N minus k plus k minus 1 gets you back to N minus 1.
So with these degrees of freedom we can now compute these two variance estimates, that
first one called the mean square for groups, which takes the sums of squares between the
groups and divides by the appropriate degrees of freedom and that's a variance estimate
that works if H0's true. Otherwise, this one tends to be really big by the design and how
it's calculated. The other mean square, mean square for error, again takes the sums of
squares for error over the degrees of freedom, n minus k, and that's the variance estimate
that does a good job generally. It's the one we will use when we do confidence intervals
next week, maybe Thursday but most likely next week. This is really Sp squared extended
to three groups or four groups or however many groups you have. So we've got these two
variance estimates that can be worked out then.
The test statistic is an F statistic. It's the ratio of those two variance estimates,
the mean square between the groups on top, the error one on the bottom. The one on the
bottom does a good job generally. The one on the top does a good job, should be about
the same as the one on the bottom if H0's true, but otherwise the top one gets to be
really big. You add up a whole bunch of other positive stuff that gets added in. So I'll
be able to look at that ratio and see whether it looks like I should reject or not. Now
for you, you've done z-test and t-tests and you have a sense now that if you calculate
a t statistic or a Z and you got a test statistic of 3.8, would that seem extreme to you? Because
you're talking about being like almost four standard errors away from your H0. But I don't
know what's extreme for an F statistic yet. I need to know the distribution of the F statistic
so I can have a feel for what values you might see under H0 and which ones are starting to
be extreme or unusual. So your Z statistics have what model that you use to find the p-values?
The model for H0 for your test statistic. Z's, z-values, bell curve, what kind of bell
curve? The normal (0,1), otherwise known as standard normal. Every t statistic you have,
if you're finding the p-value you're drawing the picture of a distribution for that t statistic,
which is always a t distribution with a certain degrees of freedom. What kind of distribution
do you think we'll have for an F statistic under the H0 being true?
An F distribution. Clever names. An F distribution indexed by the pair of degrees of freedom
rather than just one set of degrees of freedom, because there's a numerator and a denominator.
Now these F distributions they don't look like the other distributions you've been working
with. The F distribution is definitely not symmetric around 0. It's what, skewed to the,
which way?
Skewed to the right. You can't get variances or standard deviations that are negative.
So when you put a ratio of two variance estimates, you can't get any negative numbers. So that's
why it starts out at 0. It is skewed to the right. In order to pull off numbers from an
F distribution and probabilities under our curve for an F distribution, we need big tables.
If I were to put the F tables on your formula card, we would be able to make a little house
around you to be able to take your exam in. The F tables are long. So we're not going
to have you look up on all these different tables the bounds for p-values. We're just
going to pull the p-value off the output that is provided by SPSS. But I do want you to
be able to know what the distribution looks like and the value of 1 of course is what
you'd expect to get if H0's true. That's sort of your expected value or your balancing point,
because 1 would be the ratio being the same on top and bottom. That's exactly what you
should see if H0's true. If I got a value of say 3 for my F statistic, I would find
the p-value as the probability of getting what you got or more extreme, assuming H0's
true, assuming this is the model I should be using then, the H0 model. More extreme
for F-test though are always if it's too big, to the right. So this little tail area here
for example might represent my p-value for a test.
But instead of having to try to find some bounds for it from a table that's really big
to work with, we'll report it from SPSS. All right.
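For reference, that right-tail area can also be computed directly; here is a minimal sketch in Python with scipy, using a hypothetical F of 3 with 2 and 16 degrees of freedom (on the exam and homework, you simply read the Sig. value from the SPSS output).

```python
# Sketch: the p-value for an F test is the area to the right of the observed F.
# The F value and degrees of freedom below are just illustrative.
from scipy import stats

f_stat    = 3.0     # hypothetical observed F statistic
df_groups = 2       # k - 1
df_error  = 16      # N - k

p_value = stats.f.sf(f_stat, df_groups, df_error)   # P(F >= f_stat) under H0
print(p_value)
```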
So you have a section on your formula card that is the ANOVA section. Do you need to
look at it or reference it on Thursday night? No. I've already had a student from yesterday's
class say do I have to know that F stuff. No, you don't have to know the F stuff for
the exam. Eventually you do for, it's on the final for sure with a good number of points
behind it. But the F-test we're not going to have on the exam. It is so for comparing
many means and I just want to show you that you have lots of formulas, but you don't have
to memorize any of them. And when you have SPSS give you the output, you hardly have
to use any of them that are there.
But let's take a look at that, some data then. Let's start looking at not just formulas,
but actually some actual values which if I were giving it to you on Thursday night it
wouldn't have three different drugs, it would have two drugs to be compared. The background
of this experiment for comparing three different drugs for treating some condition. We are
going to measure a quantitative response. It has something to do with time to relief.
How quickly did you feel relief from your condition or symptoms? I think this is in
days. And so we have 19 patients in this clinical trial. We randomly assigned them to one of
three different treatment groups. The randomization of your subjects to one of three treatment
groups without matching or pairing them up in anyway means you have independent random
samples.
19 patients means I can't balance the design perfectly. I do have five subjects that were
given drug one, seven subjects given drug two, and seven given drug three. So those
are my sample sizes. My total sample size is 19. And I'm comparing three different drugs
simultaneously in this experiment, so my k, my number of groups or populations I'm comparing
are three. So there's my data. It's a quantitative response. So what's the first thing we should
do with any quantitative response? Well you've got a clue what's right underneath it. You
should graph your data in appropriate ways. What is the data trying to tell you descriptively?
Not just with numerical summaries like just calculate averages right away, but look at
the data. So this question I could ask you on Thursday night. I would ask it in the two-sample
problem with two groups. The box plots side-by-side could be given to you.
And these are the four assumptions again for doing both ANOVA here, but the two-sample
t-test. What assumption is best checked with this particular graph?
This was on the Sunday review a little bit. We commented on it there, as well as last
week.
Not too bad.
If you have to write out the conditions for doing a two-sample t-test, you might have
your checklist, but this is written out in a little bit more form. Because if you just
say normal as a condition, I'm going to want to know what's normal. What are you saying
has a normal model? One normal model, two normal models, right? If you just say equal
variance, that's not enough either, because what do you need to be talking about? Not
just equal variance for your samples, but equal variance for your populations. Your
populations have the same standard deviation or the same variance. That's how you would
state that last condition. As far as which of these conditions is best checked with this
graph, indeed most of you are picking the equal variances for the populations. Good.
Populations have equal variances being checked by this graph. I had some students say you're
checking that your three samples are independent, because there's three separate box plots there.
So they must be independent. But couldn't you make the box plots side-by-side quite nicely
of the before measurements and after measurements in a paired design, to see whether things went
up generally? You would really be looking at differences, but you could make before
and afters. That doesn't make those two sets of data independent anymore. The independence
is mainly by the design that they randomly allocated subjects to groups. The randomness
of each sample has to do with can you say that those 19 subjects really reflect the
general subject pool? The normality would not be checked here. It would be better checked
with what?
A Q-Q plot or three Q-Q plots?
Three. Mm-hmm.
All right. So equal variances in the populations. I'm not looking at the population variances.
I'm looking at the sample IQRs, the interquartile ranges here. But I can use that to assess.
What would you conclude about that condition? Does that assumption seem reasonable based
on this so far?
Are the IQRs exactly the same? No. But these are sample results, just like your S's when
we look at them in a minute don't have to be exactly the same. I think they look pretty
good, pretty comparable. I also don't see outliers, which is nice. But I still would
want to see three Q-Q plots to check the normality, especially with my sample sizes here being
quite small, 5, 7, and 7, would definitely need to check that.
All right. Which drug right now would you like to get?
Lower responses are better, because then you would be feeling better more quickly. So drug
three seems to be the winner descriptively. We'll decide through a test whether or not
that is the case. Is drug three really the overall winner? Are drugs one and two equally
as ineffective or is maybe drug one a little better than drug two? We'll check that out.
All right. Let's write the hypotheses out. I only have three populations I'm comparing
three population means, so I'm going to write mu1 equals mu2, and I go one more, equals mu3. Don't
write it out generically to muk when you actually have a problem you're working on. And remember
how we write out the alternative, you have to say it some words, because you can't really,
you don't want to have to list all the possibilities. At least one population mean response, mean
time to cure, is different. You're not saying that they're all different. It's not required
that they all have to be distinct from each other, just that at least one of them
will be different from the others, and it might be that they all are. We will see.
All right. There's our hypotheses, so step #1. I know you could do this. You could calculate
the X-bars. And I'm not going to have you sit here and get your calculators out and
do that. I'm going to give you the X-bars, but you would average those first five numbers
in your calculator and get the overall average, which would tend to be what? About 8, 8.22
days. The average of those seven observations for the seven subjects in drug two, their
average time to cure was the higher one, 9.3. And finally drug three which we saw descriptively
looks best in the box plot, and indeed it has the smallest sample mean. The average
time to cure for the student or subjects that had drug three only 6.8 days. Would you be
able to get the S's if you had to if it were a homework question? Could you throw the five
numbers maybe in your calculator or work out the couple of squared terms? I would do it
with a calculator or computer. But those S's could be found. How would that first S be
found? Just to remind us. First observation, 7.3, how far away is it from the sample mean
for that group? 8.22. Square that. Do that for all five terms. So the last one there
is 9.5. Especially with these not so nice numbers, I would not have you do this on an
exam. You would get the S provided for you. You divide by the degrees of freedom for that
sample. There are what? One, two, three, four, five subjects so 5 minus 1, and here is that
S squared, 2.74.
And I'm going to give you the other two sample variances, 2.61, 2.56, for those seven and
seven observations. What are some observations that you see now with those three numbers
there, those S squares?
Bless you.
Are those S squares the same?
They're not exactly the same, but are they comparable? Same kind of thing you'd look
for Thursday night with two sample variances or two sample standard deviations. I didn't
put units here because these are the sample variances. If I were going to put units on
each of those, they would be not days, but days squared, which seems kind of strange.
Good. All right. We'll comment then that these are similar. What does that support? It supports
the equal population variance assumption that is needed for ANOVA, and that would also be needed for
a two-sample pooled t-test.
All right. So, one set of notes had the sums of squares. On a homework, your homework
8, you might have to calculate one sums of squares somewhere. Otherwise, most of the
time this is done through SPSS. We'll get the X-bar overall so we can get those sums
of squares. I need to average all 19 numbers. 7.3 plus all the way down to the 5.2. All
19 values averaged out here end up being about 8.09; I'm going to carry it out a bit,
8.0947. What would be a check to make sure that that looks reasonable? What are you comparing
that number to?
How about those three X-bars that you just calculated? And it should be somewhere in
between.
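As a quick check (my own sketch), the overall mean can be recovered, at least approximately, as a weighted average of the rounded group means from step 1, and it should indeed land between the smallest and largest group means.

```python
# Sketch: check the overall sample mean against the group means.
# Group means are the rounded values reported in step 1, so this is approximate.
import numpy as np

n    = np.array([5, 7, 7])           # subjects per drug group
xbar = np.array([8.22, 9.3, 6.8])    # group mean times to relief, in days

grand_mean = (n * xbar).sum() / n.sum()
print(grand_mean)                    # about 8.09, between 6.8 and 9.3
```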
All right. Now we have everything we need to get our sums of squares. So as said, these
sums of squares are not too bad. There's really only three things we got to do here. Let's
take the first group. How many observations in that first group? How many subjects in
that first drug group?
Five.
What was the average for that first drug group? 8.22. How far away is it from that overall
average we just found of 8.0947? There's one term. Can you write out term #2?
Move on to the second group with seven people. A higher sample mean.
And the third and final term.
And so if I gave you steps 1 and 2 summarized, you could find a sums of squares pretty easily,
a little bit of plugging in. We get a total of 21, 21.98 is this first sums of squares
between the groups. Each sample mean for a group compared to the overall mean, between
the groups. One more sums of squares and then we get to throw it in the ANOVA table and
we'll analyze that, those results next time.
So the next sums of squares, three terms. Each term looks like this. First one then
should take the sample size minus 1 degrees of freedom. I'm going to put next to it the
2.74. Do I need to square the 2.74?
It says S squared here. But in step 1 we calculated S squared, right. We didn't take the square
root and report it as a standard deviation, so that is the sample variance. So let's do
the next term of 7 and a really similar S squared. And the final sample had a 2.56 as
that sample variance. And these three terms added up give a 41.98.
Now I might often on an exam give you one of those, then you just have to find one other
one that you like the best, and get the other by subtraction. Because once you have two
of them, there is a step 5 that says to calculate this total sums of squares with all 19 terms
added up, but we'll just say no thank you. I don't want to do that. We would much rather
just add the two up and put it into our ANOVA table.
So our last step today of material, we're going to plug in these numbers into our ANOVA
table to get that first F statistic and analyze those results next time.
Sums of squares for groups was the 21. And the sums of squares for error, and don't cheat
and look at the table right below it. Cover that one up.
With our calculations rounding a little bit here, our two sums of squares add
up to 63.96. The generic table is on your formula card, so it would tell you that these
degrees of freedom right here are k minus 1. What's k again?
The number of groups you're comparing. We've got three drugs. So this is 3 minus 1 or 2
degrees of freedom. Your formula card would remind you that the degrees of freedom for
the within, or error, is your N minus k. So that's your total sample size. How many subjects
all together?
19. And then we had three groups. So 19 minus 3, 16 degrees of freedom. Then just add those
two up, 16 plus 2. And 18 makes sense as an overall degrees of freedom because I had 19
observations all together, 18 degrees of freedom.
Now the mean squares are just your sums of squares over the right degrees of freedom.
We're going to form our two variance estimates by taking the sums of squares over the degrees
of freedom. So I just cut that 21.98 in half. So almost 11, 10.99. There's the first mean
square, the first estimate of a variance that's common. Here's the next one, 41.98 but I get
to divide that by the 16 degrees of freedom, and that's about 2.6, 2.62.
Now remember both of those mean squares are estimating the common variance. The one on
the bottom looks more reasonable to me. That's the 2.62, that's your Sp squared, the pooling
of your S squares. That one works in general. It's a good estimator. The one on top also
works well if H0's true and otherwise it tends to be really big, and 10 is looking a bit
bigger than this 2.6. Let's look at how big in ratio. We put the mean square for groups
on top, the error mean square on the bottom to get our F statistic, 4.2. We should get
an F of about 1 if H0's true, because that means the two things were about the same,
top and bottom. If H0's true, it should be the same on top as the bottom. This is a bit
more than 1 and perhaps significantly more than 1. Again, you won't find the p-value
for that 4.2 yourself, you'll look down at the SPSS output under the sig column, which
is what SPSS calls the p-value, and be able to pull off that p-value to make your decision,
which we will do together on Thursday.
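Pulling the whole drug example together, here is a short Python sketch (not the SPSS output itself) that rebuilds the ANOVA table from the rounded summaries above; expect small rounding differences from the hand calculations, and the p-value printed at the end is the number you would otherwise read from the Sig. column in SPSS.

```python
# Sketch: ANOVA table for the three-drug example, from the rounded summaries.
import numpy as np
from scipy import stats

n    = np.array([5, 7, 7])              # subjects per drug group
xbar = np.array([8.22, 9.3, 6.8])       # sample mean time to relief (days)
s2   = np.array([2.74, 2.61, 2.56])     # sample variances

N, k = n.sum(), len(n)
grand_mean = (n * xbar).sum() / N

ss_groups = (n * (xbar - grand_mean) ** 2).sum()   # about 21.98
ss_error  = ((n - 1) * s2).sum()                   # 41.98

ms_groups = ss_groups / (k - 1)        # mean square groups, about 10.99
ms_error  = ss_error / (N - k)         # mean square error, the Sp-squared idea, about 2.62

f_stat  = ms_groups / ms_error                 # about 4.2
p_value = stats.f.sf(f_stat, k - 1, N - k)     # right-tail area under F(2, 16)
print(ss_groups, ss_error, ms_groups, ms_error, f_stat, p_value)
```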
So we're going to turn to Exam 2 review. She'll have fun, fun, fun when her daddy takes her
t-test away. Take away those t-tests, okay fine.
Let's do review. If you forgot your sheet, I brought a few. Look over the front page. We're
going to go through those three today. Talk it over with your neighbor if you've already
tried it. Otherwise, please do try them now. Clickers on the true/false coming up on question
2.
We'll gather back in about two to three minutes.
So another minute or two to try the first three questions. Commit to a true/false. What
would you circle if it were the exam night on #2? And we'll go through these in just
a minute.
All right. Why don't you help me out with #1. Don't you have a homework question that's
also asking about alpha, what that means, statistically significant kind of thing. All
right. Statistically significant. We say the data, we say the results are statistically
significant at level alpha. What does that mean? In order to be statistically significant,
does your alpha need to be .05? Have you always done every test at a 5% level, alpha has to
be .05 to possibly be statistically significant? No. Alpha of 5% is your common one.
We said that if we ever forget to give you an alpha when you need to do a test, you could
use .05. Now we have learned that you usually use alpha of 10% if you're doing that Levene's
test for checking a condition of equal population variances. But 5% is a reasonable alpha, it's
just not required to be statistically significant.
Why would you like alpha to be small? You don't need the alpha small to be statistically
significant, but why is alpha being small good? Because what else does alpha represent?
It's a probability right? Probability of a?
Type 1 error. It does represent the probability of Type 1 error, alpha. So you want it to
be small. You don't want your chance of rejecting H0, going with the new theory when you shouldn't
because H0's true, to be too high. So you set that to be 1% or 5% or 10%, something
reasonably small. So yes, alpha can be 5%. You do want it to be reasonably small, but
those are not required to be statistically significant. It has something to do with your
p-value compared to alpha. And we said don't get this backwards. So stat sig means p-value
is large, larger than alpha or small?
Small. Needs to be small, less than alpha, but is that the only one?
Less than or equal to. If your alpha is set to be 5% and you do the calculations and your
p-value is .05, you just met the requirement to be able to say I have enough evidence to
reject. So you reject H0 if your p-value is less than or equal to alpha. Maybe 5%, then
you are statistically significant at that level alpha. Good.
I've had students write on the top of the exam: p-value less than or equal to alpha, reject
H0, stat sig, right? Those three things go together.
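As a tiny sketch of that rule (my own illustration), note that the boundary case counts as rejecting:

```python
# Sketch: the decision rule. Reject H0 whenever the p-value is <= alpha;
# a p-value exactly equal to alpha still counts as rejecting.
alpha = 0.05
for p_value in (0.002, 0.05, 0.20):    # illustrative p-values
    if p_value <= alpha:
        print(p_value, "reject H0: statistically significant at level alpha")
    else:
        print(p_value, "fail to reject H0")
```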
All right. How about true/false, let's try them. So click in quickly what you think are
for these three confidence interval questions. Talking about skipping breakfast regularly
for all young adults. And you're provided the 90% confidence interval for that proportion
of all young adults who skip breakfast regularly for that population proportion p.
First statement is about the margin of error. It states that this interval has a margin
of error of 8 percentage points and you are saying, there it is, most are saying false.
Good. It's not 8 percentage points, is it? There is an 8 there, 20% to 28% is 8%, but
that's the width. What is the margin of error please?
It's the half width, so 4%. It's what you go out from the middle each way to get the
ends of the interval. So your margin of error is 4 percentage points. Can anyone tell
me what must have been p-hat for this problem? What proportion in the sample of young adults
must have said I skip breakfast regularly?
24%. Why? Because that's the midpoint. And that interval was made using p-hat going plus
or minus what we call a margin of error, which might have been the conservative one or not.
It didn't say conservative here, so I'm assuming it's the regular one. But that z-star in there
times that standard error is your margin of error, which was 4%. It's always the midpoint.
So 24% of the sample must have said yes.
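Here is that arithmetic as a tiny Python sketch (just illustrating the numbers discussed):

```python
# Sketch: recover p-hat and the margin of error from the reported interval.
lower, upper = 0.20, 0.28

p_hat = (lower + upper) / 2              # the midpoint: 0.24
margin_of_error = (upper - lower) / 2    # half the width: 0.04, i.e. 4 points
print(p_hat, margin_of_error)
```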
All right. Good. Stop that. Let's look at the next one. This one is looking at a
condition needed.
True or false for B?
Every confidence interval, every test has a set of conditions and those conditions need
to be met for that interval or test result to be valid. Almost every one of them somewhere
has something about their sample or samples need to be random samples. But then there
are other ones that come into play. To be able to make this interval and have
it be valid, do we need this condition? Most of you are saying true. Guess what it is?
It's false. Now I put it in here and I believe it's in one of the old exams, because I wanted
to have us look at this again before the exam. There is normality conditions for doing some
tests, but they have been for the t-tests in terms of the populations having a normal
model. So let's think about this here. What are we trying to learn about? We're trying
to learn about the population of all young adults and the question that I'm measuring, the question
of interest is whether you skip breakfast regularly. Yes or no? So for each person in
this population, you have their response, which is a yes or no. It's a categorical response,
right? When you're trying to learn about a proportion it's because your response is categorical,
yeses and not yeses, yeses and nos. If you're comparing two proportions, it's categorical
responses. That's what my population of values look like. I could record the values of being
a 1 and 0 or something for the yeses and nos, but it's just two outcomes, categories. So
do you have a normal model to describe that population of responses? Would you have a
bell curve to fit the data that's in that population? No, it would be a bar chart or
it would be a stick graph, a discrete model with yeses and with nos. So we do not assume that
the population that we're taking our data from has a normal model for doing proportion
stuff. But you're thinking Z's, right, proportions means I do a Z interval, a z-star, or a z-test
statistic perhaps if my sample size is large enough. Z's are normal, but you get that normality
for something else than the population. What do we require? What are the two conditions?
You go to your population and you take out condition #1, that you took a sample that
is a?
Random sample. You have a random sample from our population. And it would be a bunch of
yeses and nos. And then what are we going to calculate on that random sample? If you're
trying to learn about the true proportion what is it, the true proportion of all young
adults who skip breakfast and you got some data, you're going to calculate p-hat, which
turned out to be 24% for this sample. All right. In order for you to do your z-star
in your confidence interval, the second condition that's required isn't that you have normality
for this population, because that's a categorical response, but that you took a large enough
sample size. That your sample size n is large enough. And what was the requirement for being
large for proportion problems?
That at-least-10 rule, right? The 25 or 30 or more was for means, for the Central Limit
Theorem to apply for sample means. The n being large for proportions, back pre-Exam 1 stuff,
was that at least 10. The conditions on your formula card say when you look at the normality
approximation, they write it with the n and the p. But you don't know p. And you're doing
a confidence interval so all you got is p-hat. So you check to make sure in your sample you
have at least 10 yeses and at least 10 nos in your sample.
Now if you have a large enough sample size, this says that p-hat will have as its model
approximately a normal. The normality that you get is for the statistic, this p-hat, which
is a number now. P-hat is 24% here. Do it again, you might get 27%. It varies, but it
has a distribution and that distribution is approximately normal, which is why you get
to use Z's. But the original population is not normal.
In t-tests, one of the conditions that's always there about the population or populations
if there's two, is that the model for the response is normal, the population model is
a normal model, because you're having a quantitative response there. So proportions, the original
population isn't assumed to be normal, but I do need a large enough sample size for my
random sample so that my statistic when I standardize it I can make it a Z. That p-hat
will be approximately normal.
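To see both pieces side by side, here is a small Python sketch; the poll's actual sample size is not given in the lecture, so the n below is a hypothetical value chosen only for illustration. It checks the at-least-10 counts, then uses the approximate normality of p-hat to build a 90% interval.

```python
# Sketch: large-sample check and a 90% z interval for a proportion.
# n is hypothetical; the lecture never states the poll's sample size.
from scipy import stats

n = 300          # hypothetical number of young adults surveyed
p_hat = 0.24     # sample proportion who said they skip breakfast

# Condition: at least 10 yes responses and at least 10 no responses.
print(n * p_hat >= 10 and n * (1 - p_hat) >= 10)

z_star = stats.norm.ppf(0.95)                 # about 1.645 for 90% confidence
se = (p_hat * (1 - p_hat) / n) ** 0.5         # standard error of p-hat
print(p_hat - z_star * se, p_hat + z_star * se)   # interval endpoints
```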
All right. One last true/false having to do with kind of interpreting. We made a write
up, an interpretation. This is a 90% confidence interval. So we're going to start out with
that if we repeated this poll many times, and what do you think this statement?
If it were in a list of statements that you were trying to decide if they're reasonable
to put in your report, would you include this one? Is p valid? And what do we have. Most
of you saying, not quite as many, but most of you saying false. It starts out okay, but
it slowly, quickly actually goes downhill. It is false. What does it say? If you repeated
the poll many times, all right, 90% of the time the proportion of all young adults who
skip breakfast will be between these two numbers. Is that what we can say? 90% of the time the
true p is going to be between 28% and 20%? When we're trying to interpret that 90% confidence
level we're talking about the process, we should talk about how well it performs if
you were to repeat it, but you shouldn't go back to the one interval you got the one time
you did it. Because when you repeat it, you don't get just 20% to 28% every single time.
The intervals are going to vary, because the sample results vary. And so if you remember
the picture we drew with our first confidence interval, we drew a bell curve, we said here's
the 95% range, and then we started looking at possible intervals. Or that simulation
where you pushed a button and it generated 100 intervals, some were green and some were
red. 90% of the intervals, not 90% of the time. 90% of the intervals are expected to
do what? Expected to be good intervals, by that we mean have the population proportion
of all young adults who skip breakfast falling in it. 90% of the intervals are expected to
contain the population proportion of all young adults who skip breakfast regularly.
The 20% to 28% is the one I got this one time. I'm really hoping it's one of those good intervals.
It either is a good interval or it's not. I can't tell you because I don't know the
true proportion p to know if it falls in between there or not, but this statement that's wrong
is trying to tell you that p changes and 90% of the time it's going to be between these
two numbers and 10% of the time it won't be, but p doesn't change. P is a population proportion
that's fixed, that you're trying to learn what it is. It's the intervals that would
change from one sample to the next, but most of them will be good ones.
All right. Good. Let's take a look at this last question, question 3. And question 3
has to do with what? Looking at weekday lunch customers at a restaurant, how quickly they
can be served their meal. Looking at the population of all weekday lunch customers, the claim
is that the mean for that population is 15 minutes. That the standard deviation for that
population is 2 minutes. That's the mean and the standard deviation for the population
of all customers' times for being served their weekday lunch.
So I'm imagining this population of all customer times for being served their lunch and we're
told the mean of that population happens to be 15 minutes, that's the claim anyway. The
standard deviation supposedly is 2 minutes. That's mu and sigma for this population. What
are we going to do, we're going to go to that population and take a random sample. We're
going to take 64 customers, record how long it takes them to get their lunch and calculate
the sample mean. So I'm going to pull 64 values out of that population at random, random sample
of 64. Is 64 kind of large? Large enough to maybe help us if we need it to. 64 values out
and I'm going to calculate X-bar. Now I'm not giving you X-bar here, but I want you
to give me a picture of what I might get for X-bar. To give me a picture of the distribution
for that sample mean. It would show me what values you could get for the sample mean and
how often they tend to occur. I want the distribution and I even give you a clue it's the approximate
distribution here of the possible values for the sample mean. So where do we have sample
mean on here? We do have it on page two under population mean and sample mean there, but
we learned the foundation of doing t's, t procedures, from the sample mean distribution
on page one.
I'm actually going to be using a very famous result here to draw this picture, right? What
famous result? So important to be able to allow us to do some of our testing that we
want to do?
CLT. Why am I using the CLT? Because I didn't tell you what model this population has. I
have no idea if the customer times if I graphed them would be bell shaped, skewed, or what.
But I'm taking 64 observations from there and I'm averaging 64 numbers. What is the
distribution for the possible values of the sample mean? That tells you put the sample
mean down here as your label, because that's the distribution you're giving me the picture
for. You can write it as X-bar or sample mean, sample mean values. That's the label for your
x axis. And what kind of model do I get for averages? Averages start to look approximately
normal.
Approximately normal. Where would it be centered at? Sample means should vary around the true
mean mu. The true mean mu, which is supposedly 15 minutes. What about the standard deviation?
Do I draw my tick marks going out by twos? Yes or no?
Two is a standard deviation, but the 2 up here is the standard deviation for individual
times, for individual customers. It's not what the variability would be for averages
of 64 customers. Averages vary less. It is on your formula card, bottom corner, that
says oh yeah standard deviation for an X-bar, for averages, is not just sigma, but it's
sigma over square root of n. So I need what there, 2 over well 64 is a nice number. 2
over 8, one-fourth, .25 minutes.
Now I can draw a few more values. I would go from 15 minutes as my mean to 15.25 to
14.75 for one standard deviation below, and I could put a few more there too. That's the
model. What's the name of the result that lets you draw this picture?
CLT. Now on the review on Sunday, very similar question, it was question 3 on that review
too. I did not have you draw the picture for the model for X-bar, but I went right to saying
calculate a probability that you might get a sample mean and in that problem it was as
high as 53 or higher. I could ask you part B here. So how likely would it be that the
average time for those 64 customers could be as high as 15.5 minutes or higher? How
likely could it be that it might be 15.5 minutes or higher? Would you be able to give me an
answer to that pretty quickly without looking up or calculating a Z score, because what
is the Z score at 15.5?
You're not one, but you are two, two standard deviations away. I'm thinking, how much is in between
minus two and plus two?
95%. So what would be an approximate probability of being in that tail?
About 2.5%, and I would accept that answer without going to Table A.1 if that were your
Z score. We're doing Z's here why? It's means. Aren't you supposed to do t's for means?
But t's are for testing hypotheses about an unknown mean that you don't know what it is.
And when you don't know sigma for that population either, and you're trying to do a test to find
out what the mean is based on just a sample. Here, this is about Z's because the model for X-bar
is approximately normal by the CLT.
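Here is that calculation as a short Python sketch of the numbers just discussed:

```python
# Sketch: P(sample mean >= 15.5 minutes) for n = 64 customers, using the CLT.
from scipy import stats

mu, sigma, n = 15, 2, 64
se = sigma / n ** 0.5        # 2 / 8 = 0.25 minutes, the SD of the sample mean
z = (15.5 - mu) / se         # (15.5 - 15) / 0.25 = 2

print(z, stats.norm.sf(z))   # about 0.0228, close to the quick 2.5% answer
```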
All right. Thanks for coming. See you on Weds, on Thursday.