All right everyone. Hello and welcome. We're ready to get started.
If you weren't able to go to the review on Sunday, that review is on Blue Review.
Today in class, let's get started, please.
Today we do have a little bit of new material, but there's a lot of review within it. Some
of the questions I ask you in fact will list conditions that you would have to know for
Thursday night right at the start. So there will be some good review in that. And then
I do hope to get to your Exam 2 lecture review and do the first three out of the four questions
with you.
There is a prelab that's open, but you probably won't work on it until after the exam. But
it is for next Monday. Homework 7 is due tomorrow, not Thursday. So you can have the solutions
open for you. That homework 7 was a little shorter. Homework 8 is going to open up, but
it's not due until the Monday before Thanksgiving. So not due next week right away. That way
we don't have a homework due on Thanksgiving week except that Monday. So homework 8 is
still open, so you can start to work on it over the weekend, but you have next week and
the weekend after that too.
All right. Do you have a general question or comment at all before we do a clicker question
towards the beginning here? That's just your exam information, where you go. Exam is sent
off for copying. It is a little longer in terms of number of pages, because of output
that's provided and adequate space for answering questions, graphs, and things too.
Same as Exam 1, you're free to leave when you're done with the exam. Make sure you've
got a picture ID. Come on in and sign up. We'll have those exams graded again over the
weekend and back to you next Monday.
Clicker question right at the beginning. It's an easy one.
So why is this an easy question?
Because no matter which answer you give, they're all right. Mm-hmm.
[Laughing]
Pick your favorite parameter. Which is your favorite? [Laughs]
All right. But just to remind you that those, that very first parameter p still could be
on the exam. And even p1 minus p2, we covered that right before Exam 1, that was that week
of Exam 1. Also any of those five parameters, how they would be estimated with the corresponding
statistic. Right, these are the five parameters you could think about what are the individual
statistics that would estimate those quantities, p-hats and X-bars type of thing.
Confidence interval. Either make it or here is a confidence interval and interpret it.
Use it to make a hypothesis test decision seeing if 0 is in your interval or not, those
ideas, or conducting a hypothesis test about that parameter. Now with five parameters,
five confidence intervals, five tests, I can't have you do all of them. I'm not going
to have you do every one of those scenarios, both confidence interval and testing, as a complete
problem with the summaries where you work through the whole thing. That would take too
long.
So there's some that are on the hypothesis test side, there's some that might be on the
confidence interval side instead. You might have to do all five steps of the hypothesis
test or you may have to just take that test statistic that's reported and find the p-value
and make your decision. So there will be some partial ones too.
As always, show all your work so we can give you consistency points: if you've
got a p-value and you're not sure it's right, still make your decision with it,
and you can get credit for that decision. Things like that.
All right. We are going to cover a few pages of notes. It's a technique that's called Analysis
of Variance. But I'm going to be reviewing with you for a good part of this time that
two-sample pooled t-test, because that's all we're going to do here is an extension of
that.
So in labs this week you're doing that lab 10, two independent samples, two means.
Today we're going to see a technique that allows us to extend that two-sample pooled
t-test, which compares two population means, mu1 and mu2, to be able to have an experiment
or observational study that lets you compare three means or more. Analysis of Variance,
ANOVA. Analysis of Variance allows us to compare means of two or more normal populations. That
was the condition about our populations, and that you take independent samples. We are
going to extend the pooled version, which is when we assume the population variances are
equal. Now on Thursday night, if I ask you to write an H0 for the two independent sample
problem with means comparing two population means, what would your H0 look like? It would
be a mu1 not an X-bar1 and you could write it as what? Mu1 equals mu2. Or the equivalent
way to write it would be what? Mu1 minus mu2 is equal to 0. Either one is fine. And we
might have you look at the output and decide whether you can do the pooled technique or
not. The pooled technique is reasonable, then you'll calculate a t statistic or maybe pull
it from the output. Equal variance is assumed. And that t statistic would follow a t distribution
with what degrees of freedom? In the pooled t-test you get to use all of the degrees of
freedom from both samples put together and that was what? N1 plus n2 minus 2. If we're
going to extend this technique to more than two populations, then we're going to have
more than two samples and we'll have say n1 plus n2 plus n3 minus 3, if there's three
groups. So we're going to extend this technique. We're going to first remember the assumptions.
Your picture that we're going to start filling in a little bit and those four bullets at
the bottom are really the assumptions for your two-sample pooled t-tests or confidence
interval. Four assumptions, two are about the samples and two are about the populations.
For ANOVA, we're just extending it from one and two populations up to some number of populations
k. K represents that number of populations. It might be your new treatment, the standard
treatment that's being compared to, and then a placebo treatment all in the same experiment
so you can make multiple comparisons.
So what do we have? We have that each population of responses, which is some quantitative response,
has a normal model, that's one of our conditions. For each population the model for the response
is normal. So we have a normal model that describes the scores or the times or whatever
it is that we're measuring in each population. They might have different population means,
that's what we're trying to learn about. Are the population means all the same, mu1 and
mu2 and so on, or are they different? So we still have to keep the 1 and the 2 that notation
for representing the means, because they could be different. That's what we're trying to
assess. But what else do we assume on the population side? That the population variances
or the population standard deviations are all equal. Equal population variances or equal
population standard deviations. So the standard deviation sigma that I write to represent
the model for each population is just a sigma, without a 1 or a 2, because the populations
all have the same variability. The test scores vary the same amount in each of my three teaching
methods, even though the means might be different.
And then the two about the data: you go to each population and you take a
representative sample, a random sample, from that population of some size, n1 for the
first. Take another random sample, the size might be different or it might be balanced
and the same, but of size n2. We'll do that for all of the populations. And then we do
assume that these samples, these sets of data that we have, that they're not matched or
paired in any way, that they are independent.
So if you just take the first two populations and instead of writing k in these bullets
you write 2. You have exactly the assumptions to do the pooled two-sample t-test or t interval.
And that very last assumption is the one that would not be needed if you're doing the general
or unpooled technique.
All right, so we're extending it to more than two populations. So we just wrote out the
H0 when it is just two populations to be compared, mu1 and mu2 are the same. What will the H0
and the Ha look like to do more than two? Well H0 is pretty easy. H0 is always that
there's no difference on average, no effect, so all of those population means should all
be equal. Mu1 is equal to mu2 and that's equal to mu3. There are three groups or however
many groups. So the H0 is still equality of the population means. What will Ha be? On
Thursday night, you could write between your two population means a greater than, a less
than, or a not equal to, depending on what the researcher is trying to establish. When
you have three groups, then there's lots of possibilities. It might be that they're all
different. It could be that the first two are really the same, but the last one is different.
So there's lots of possibilities. So in writing out the Ha here we actually take care of all
those possible options by saying at least one. At least one population mean, population
mean response, it's our mu. At least one of those population means is different. So we
will be testing the hypothesis that they are all the same, that there is no difference between
those population means, versus not H0. At least one of them is different. They're not all
equal. Somewhere there's something that's different and it could be that they're all
different. There's just lots of possibilities.
So we'll be doing a test that first establishes whether there's no effect across all those
populations or there is something going on somewhere. And then we'll figure out where
it is. There's a couple pictures of what the model for your response would look like under
H0. H0 already has the means being equal. The standard deviations were equal. They all
had a normal model. So it's really the same model for everybody. And here's a possibility
for what could happen under an alternative. That all the populations have slightly different
means. Two are sort of closer together than the other, but that's just one possible picture.
All right. So we're doing this thing called ANOVA. It's going to help us decide whether
the population means are all equal, no difference on average or not. But we're calling it Analysis
of Variance, which seems like a strange name for a hypothesis test that's about means.
Well it turns out to be able to decide between those different hypotheses about the means,
we're going to actually compare two estimators of that common population variance. So we're
going to use our data and in two ways we're going to come up with a way to estimate the
variability. They're called mean squares. Probably the first formula you might have
had to look at on your formula card was one for the standard deviation for a set of data.
We call that S, right. S is your sample standard deviation. And we went through how to compute
this measure of spread back in Chapter 2. We took every value, we subtracted the mean,
to look at the distance from the mean. To get rid of the cancellation we squared those
distances and added them all up. And then what is the divisor in that standard deviation
S? It's n minus 1, which you learned is also now called the?
Degrees of freedom for a single set of data. So that looks like to me a sums of squares
on the top, you're summing up a bunch of squared terms on the top and you divide by a degrees
of freedom. So this kind of looks like what we'll call the sum of squares over a degrees
of freedom. And that's really the variance that I'm talking about here. So that's S squared
and we're going to refer to this as being a mean square. You're averaging these squared
terms.
So we are estimating variances in that same way. We're taking a sums of squares over a
degrees of freedom. We're going to use the data we have and come up with two similar
sums of squares over degrees of freedom, two mean squares, and be able to compare them
because of their properties.
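As an aside for anyone reading these notes (not something shown in lecture), here is a minimal Python sketch of that idea: the sample variance is just a sum of squared deviations divided by its degrees of freedom. The data values are made up.

```python
# Sketch: the sample variance as a sum of squares over degrees of freedom.
# The values below are made up for illustration.
data = [7.3, 8.1, 9.0, 7.9, 8.4]

n = len(data)
xbar = sum(data) / n
sum_of_squares = sum((x - xbar) ** 2 for x in data)   # squared deviations from the mean
s_squared = sum_of_squares / (n - 1)                  # divide by the degrees of freedom
print(s_squared)                                      # this "mean square" is S squared
```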
So one of them is called the mean square within, or due to error. And that mean square is going to
turn out to be a good, or unbiased, estimator of that common population variance sigma squared.
It does a good job. Whether H0's true or not, it does a nice job. In fact, that mean square
error is really going to be something you have seen before: that Sp squared, that pooled variance
estimate. You pooled things between two groups. We're going to pool between three groups or
how many groups we have. But that ends up being a really good estimate to use in your
standard error term, so we'll have that one to use. It's a good unbiased estimator.
And the other mean square that we'll end up calculating also from our data is looking
at variability between the groups, between the groups, and it also is a good or unbiased
estimator of that common population variance sigma squared, but only if H0's true. If the population means
really are all equal, this one does a good job too. It should be equal to that variance
sigma squared on average. It does a good job. If H0's not true, this particular way of calculating
a variance estimate ends up blowing up, getting really, really large. So otherwise it tends
to be too big.
So we're going to take our data and calculate two ways to estimate this variance. That's
the same in all of our populations and we've got these samples to use. One of them does
a really nice job in general. The other one does a good job if H0's true. If H0's not
true, it tends to be much bigger than what it should be. So we can compare these two.
These should both be about the same if H0's true. If H0's not true, then I should expect
to see the top one being much larger than the bottom. So we're going to compare these
two in a ratio rather than just side by side. That ratio of these two variance estimates
is called an F statistic. So we have Z statistics, we have t statistics, and here is our first
F statistic. The ratio of these two mean squares. Now you actually have seen an F in output
before. In your output for the two independent samples, that Levene's test has an F sitting
there, but we said don't worry about the test statistic; the p-value that's sitting next
to it, you know how to interpret the p-value. But that test actually is comparing those two S squareds,
those two variance estimates, to each other in a ratio. So it's very similar to this kind
of statistic here.
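Purely as a reference (the lecture reads Levene's result off SPSS output), a Levene-style comparison of sample variances can also be run in Python with scipy; the two samples below are hypothetical.

```python
# Sketch: Levene's test for equal variances on two hypothetical samples.
# SPSS reports the same kind of F statistic and a "Sig." (p-value) column.
from scipy import stats

group1 = [7.3, 8.1, 9.0, 7.9, 8.4]
group2 = [9.5, 8.8, 10.1, 9.2, 9.9, 8.7, 9.4]

f_stat, p_value = stats.levene(group1, group2)
print(f_stat, p_value)   # compare the p-value to alpha = 0.10 for this condition check
```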
So we're going to take a look at these two variance estimates, put them in a ratio. The
mean square that's on the top is the one that's only good if H0's true. Otherwise, it gets
to be too big. The one that's on the bottom is the good one that I would typically use
no matter what. So with that ratio formed that way, when should I reject H0? When
I look at that F statistic, what kind of values for that F would lead me to say maybe reject
H0? If it's what, too... which way, too big or too small?
Too big. The one on the top should be about the same as the one on the bottom if H0's
true, so the ratio should be about 1. If H0's not true, the one on the top gets to be really
big, so this ratio would be big, much more than 1. And that would lead us to rejecting
the null hypothesis. So we'll figure out how big is big enough, but that gives us a
frame of reference.
All right. There is a nice write up for the logic behind ANOVA. I would request that you
read that little bit that's already completed for you sometime over the weekend before you
go to lab next week. And when you do that prelab over the weekend, you get to play around
with buttons on a simulation that basically demonstrates this logic idea. So that's something
to look at and go through Friday, Saturday, Sunday, Monday before your lab.
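If you want a preview of what that simulation demonstrates, here is a small Python sketch of the same logic (my own illustration, not the prelab itself): draw three samples from identical normal populations, so H0 is true, and compute the F ratio many times. The F values cluster around 1, and large values are rare.

```python
# Sketch: under H0 (identical populations), the ANOVA F ratio hovers near 1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
f_values = []
for _ in range(5000):
    # Three samples from the SAME normal population, so H0 really is true.
    groups = [rng.normal(loc=15, scale=2, size=7) for _ in range(3)]
    f_stat, _ = stats.f_oneway(*groups)
    f_values.append(f_stat)

print(np.mean(f_values))                  # typically near 1
print(np.mean(np.array(f_values) > 4))    # only a small fraction are that large
```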
What we are going to do is show you how to get that F statistic today. We are going to
look at some data and the graph that I have to show you, something that I could ask you
Thursday night. So here's the steps to get that F statistic. It's a bit of work, which
is why we really like SPSS output.
So the steps are laid out with generic data. There's your data, but there will be numbers
there so you don't have to really worry about the 1s and the 2s or the notation. That's
just going to be a set of numbers, your samples from your different populations. Since you're
trying to learn about the population means, those parameters, the mu1, the mu2, the mu3,
I would hope that it makes sense that you would calculate the sample means. The X-bar1,
the X-bar2, to take a look at them to see if they're looking similar or not. So that's
step 1 and usually that summary's going to be provided for you, because I don't need
you to take three sets of data and calculate X-bar three times and calculate the S's. That
summary is usually provided even with two samples for two-sample t-tests you'll get
that summary. We'll get the sample means and the sample variances for each group.
Then we'll calculate in step 2 the overall sample mean. That's adding everything up and
dividing by the total sample size. The total sample size is represented by this big N,
that's the way the textbook uses and represents the overall sample size. So there's the overall
sample mean, which would only really make sense if H0's true, because then the population
means are all the same so you could combine all the data to come up with an overall estimate.
With those two summaries, we then have the two steps that get us the sums of squares.
So we can divide them by the degrees of freedom and get these mean squares that we form our
F.
So here are these formulas for those sums of squares. Sums of squares for the groups.
Sums of squares due to error. They are a sum of squared terms. A sum of some squared terms.
And these two sums of squares for the groups in error are really not that bad. They don't
look nice in formula form, but they're a summation over the groups. So if you have treatment
one, treatment two, and treatment three, three groups you're comparing, there's only three
things you have to calculate and add up, it's just three terms. So it won't be too bad.
And it involves the X-bars and the S's. Likewise, the sum of squares for error is over the three
groups, so there's just three things you have to plug numbers into and add them up. This
optional sums of squares for total, not very fun to find, because if I had three groups
with 25 subjects in each, I'd have 75 terms I'd have to work out and add up, because that's
over each of the individual values. So that one we usually say no thank you to and we
just find it by the total of the others. But there is a third way you can find it to check
your steps three and four calculations out if you wanted.
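To make those formulas concrete, here is a short Python sketch (with made-up summary numbers) of the group-level versions: SS Groups adds up n_i times (x-bar_i minus the overall mean) squared, SS Error adds up (n_i minus 1) times s_i squared, and the two together give SS Total.

```python
# Sketch: ANOVA sums of squares computed from group summaries.
# The sample sizes, means, and variances below are made-up placeholders.
import numpy as np

n    = np.array([25, 25, 25])           # group sample sizes n_i
xbar = np.array([72.0, 78.0, 75.0])     # group sample means
s2   = np.array([30.0, 28.0, 33.0])     # group sample variances s_i squared

N = n.sum()                             # total sample size
grand_mean = (n * xbar).sum() / N       # overall sample mean

ss_groups = (n * (xbar - grand_mean) ** 2).sum()   # between the groups
ss_error  = ((n - 1) * s2).sum()                   # within the groups (error)
ss_total  = ss_groups + ss_error                   # no need to add up 75 separate terms
print(ss_groups, ss_error, ss_total)
```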
So we'll be able to take each of our sums of squares, add them up to get the total.
It is these two sums of squares that we need to start to form our last step of filling
in a table. A table called the ANOVA table, which is a pretty standard output that's provided,
the full ANOVA table, which really just leads you to that F statistic at the end.
The variability that's in your data, the fact that all those responses are not all the same,
usually comes from two sources. The differences between the groups because they all had different
teaching methods, so there's differences that way. And then there is still differences within,
because students that had the same teaching method still have different scores. So: between
the groups, and within the groups, which is the part due to error. So we'll find these two sums of squares and then
we have to work out the degrees of freedom.
So the very first degrees of freedom you saw which is for your one-sample t-test is just
n minus 1, right. However many things you have minus 1. I need the degrees of freedom
for groups. So if I have three groups, how many degrees of freedom for groups will I
have? Number of things you have minus 1. So number of groups minus 1 is my degrees of
freedom here. K represents the number of groups. Number of populations you're comparing. So
my degrees of freedom for the group's component is just how many groups you had minus 1. You
lose one degree of freedom for estimating that variability.
The sums of squares for error. This is the one that's like that Sp squared idea. So when
you did n1 plus n2 minus 2 and we said we're going to extend it to three groups, it would
be n1 plus n2 plus n3 minus 3, which is generically your total sample size minus however many
groups you have. So that's the degrees of freedom here. Your total sample size minus
the number of groups, that is, n1 plus n2 minus 2 if it were two groups. What do those
two add up to be for your total degrees of freedom for the entire data set, throwing
it all together? N minus k plus k minus 1 gets you back to N minus 1.
So with these degrees of freedom we can now compute these two variance estimates, that
first one called the mean square for groups, which takes the sums of squares between the
groups and divides by the appropriate degrees of freedom and that's a variance estimate
that works if H0's true. Otherwise, this one tends to be really big by the design and how
it's calculated. The other mean square, mean square for error, again takes the sums of
squares for error over the degrees of freedom, n minus k, and that's the variance estimate
that does a good job generally. It's the one we will use when we do confidence intervals
next week, maybe Thursday but most likely next week. This is really Sp squared extended
to three groups or four groups or however many groups you have. So we've got these two
variance estimates that can be worked out then.
The test statistic is an F statistic. It's the ratio of those two variance estimates,
the mean square between the groups on top, the error one on the bottom. The one on the
bottom does a good job generally. The one on the top does a good job, should be about
the same as the one on the bottom if H0's true, but otherwise the top one gets to be
really big. You add up a whole bunch of other positive stuff that gets added in. So I'll
be able to look at that ratio and see whether it looks like I should reject or not. Now
for you, you've done z-test and t-tests and you have a sense now that if you calculate
a t statistic or a Z and you got a test statistic of 3.8, would that seem extreme to you? Because
you're talking about being like almost four standard errors away from your H0. But I don't
know what's extreme for an F statistic yet. I need to know the distribution of the F statistic
so I can have a feel for what values you might see under H0 and which ones are starting to
be extreme or unusual. So your Z statistics have what model that you use to find the p-values?
The model for H0 for your test statistic. Z's, z-values, bell curve, what kind of bell
curve? The normal (0,1), otherwise known as standard normal. Every t statistic you have,
if you're finding the p-value you're drawing the picture of a distribution for that t statistic,
which is always a t distribution with a certain degrees of freedom. What kind of distribution
do you think we'll have for an F statistic under the H0 being true?
An F distribution. Clever names. An F distribution indexed by the pair of degrees of freedom
rather than just one set of degrees of freedom, because there's a numerator and a denominator.
Now these F distributions they don't look like the other distributions you've been working
with. The F distribution is definitely not symmetric around 0. It's what, skewed to the,
which way?
Skewed to the right. You can't get variances or standard deviations that are negative.
So when you put a ratio of two variance estimates, you can't get any negative numbers. So that's
why it starts out at 0. It is skewed to the right. In order to pull off numbers from an
F distribution and probabilities under our curve for an F distribution, we need big tables.
If I were to put the F tables on your formula card, we would be able to make a little house
around you to be able to take your exam in. The F tables are long. So we're not going
to have you look up on all these different tables the bounds for p-values. We're just
going to pull the p-value off the output that is provided by SPSS. But I do want you to
be able to know what the distribution looks like and the value of 1 of course is what
you'd expect to get if H0's true. That's sort of your expected value or your balancing point,
because 1 would be the ratio being the same on top and bottom. That's exactly what you
should see if H0's true. If I got a value of say 3 for my F statistic, I would find
the p-value as the probability of getting what you got or more extreme, assuming H0's
true, assuming this is the model I should be using then, the H0 model. More extreme
for F-test though are always if it's too big, to the right. So this little tail area here
for example might represent my p-value for a test.
But instead of having to try to find some bounds for it from a table that's really big
to work with, we'll report it from SPSS. All right.
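For reference, that right-tail area can also be computed directly; here is a minimal sketch in Python with scipy, using a hypothetical F of 3 with 2 and 16 degrees of freedom (on the exam and homework, you simply read the Sig. value from the SPSS output).

```python
# Sketch: the p-value for an F test is the area to the right of the observed F.
# The F value and degrees of freedom below are just illustrative.
from scipy import stats

f_stat    = 3.0     # hypothetical observed F statistic
df_groups = 2       # k - 1
df_error  = 16      # N - k

p_value = stats.f.sf(f_stat, df_groups, df_error)   # P(F >= f_stat) under H0
print(p_value)
```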
So you have a section on your formula card that is the ANOVA section. Do you need to
look at it or reference it on Thursday night? No. I've already had a student from yesterday's
class say do I have to know that F stuff. No, you don't have to know the F stuff for
the exam. Eventually you do for, it's on the final for sure with a good number of points
behind it. But the F-test we're not going to have on the exam. It is so for comparing
many means and I just want to show you that you have lots of formulas, but you don't have
to memorize any of them. And when you have SPSS give you the output, you hardly have
to use any of them that are there.
But let's take a look at that, some data then. Let's start looking at not just formulas,
but actually some actual values which if I were giving it to you on Thursday night it
wouldn't have three different drugs, it would have two drugs to be compared. The background
of this experiment for comparing three different drugs for treating some condition. We are
going to measure a quantitative response. It has something to do with time to relief.
How quickly did you feel relief from your condition or symptoms? I think this is in
days. And so we have 19 patients in this clinical trial. We randomly assigned them to one of
three different treatment groups. The randomization of your subjects to one of three treatment
groups without matching or pairing them up in anyway means you have independent random
samples.
19 patients means I can't balance the design perfectly. I do have five subjects that were
given drug one, seven subjects given drug two, and seven given drug three. So those
are my sample sizes. My total sample size is 19. And I'm comparing three different drugs
simultaneously in this experiment, so my k, my number of groups or populations I'm comparing
are three. So there's my data. It's a quantitative response. So what's the first thing we should
do with any quantitative response? Well you've got a clue what's right underneath it. You
should graph your data in appropriate ways. What is the data trying to tell you descriptively?
Not just with numerical summaries like just calculate averages right away, but look at
the data. So this question I could ask you on Thursday night. I would ask it in the two-sample
problem with two groups. The box plots side-by-side could be given to you.
And these are the four assumptions again for doing both ANOVA here, but the two-sample
t-test. What assumption is best checked with this particular graph?
This was on the Sunday review a little bit. We commented on it there, as well as last
week.
Not too bad.
If you have to write out the conditions for doing a two-sample t-test, you might have
your checklist, but this is written out in a little bit more form. Because if you just
say normal as a condition, I'm going to want to know what's normal. What are you saying
has a normal model? One normal model, two normal models, right? If you just say equal
variance, that's not enough either, because what do you need to be talking about? Not
just equal variance for your samples, but equal variance for your populations. Your
populations have the same standard deviation or the same variance. That's how you would
state that last condition. As far as which of these conditions is best checked with this
graph, indeed most of you are picking the equal variances for the populations. Good.
Populations have equal variances being checked by this graph. I had some students say you're
checking that your three samples are independent, because there's three separate box plots there.
So they must be independent. But couldn't you make the box plots side-by-side quite nicely
of the before measurements and after measurements in a paired design, to see whether things went
up generally? You would really be looking at differences, but you could make before
and afters. That doesn't make those two sets of data independent anymore. The independence
is mainly by the design that they randomly allocated subjects to groups. The randomness
of each sample has to do with can you say that those 19 subjects really reflect the
general subject pool? The normality would not be checked here. It would be better checked
with what?
A Q-Q plot or three Q-Q plots?
Three. Mm-hmm.
All right. So equal variances in the populations. I'm not looking at the population variances.
I'm looking at the sample IQRs, the interquartile ranges here. But I can use that to assess.
What would you conclude about that condition? Does that assumption seem reasonable based
on this so far?
Are the IQRs exactly the same? No. But these are sample results, just like your S's when
we look at them in a minute don't have to be exactly the same. I think they look pretty
good, pretty comparable. I also don't see outliers, which is nice. But I still would
want to see three Q-Q plots to check the normality, especially with my sample sizes here being
quite small, 5, 7, and 7, would definitely need to check that.
All right. Which drug right now would you like to get?
Lower responses are better, because then you would be feeling better more quickly. So drug
three seems to be the winner descriptively. We'll decide through a test whether or not
that is the case. Is drug three really the overall winner? Are drugs one and two equally
as ineffective or is maybe drug one a little better than drug two? We'll check that out.
All right. Let's write the hypotheses out. I only have three populations I'm comparing
three population means, so I'm going to write mu1 equals mu2, and I go one more, equals mu3. Don't
write it out generically to muk when you actually have a problem you're working on. And remember
how we write out the alternative, you have to say it some words, because you can't really,
you don't want to have to list all the possibilities. At least one population mean response, mean
time to cure, is different. You're not saying that they're all different. It's not required
that they all have to be distinct from each other, just that at least one of them
will be different from the others, and it might be that they all are. We will see.
All right. There's our hypotheses, so step #1. I know you could do this. You could calculate
the X-bars. And I'm not going to have you sit here and get your calculators out and
do that. I'm going to give you the X-bars, but you would average those first five numbers
in your calculator and get the overall average, which would tend to be what? About 8, 8.22
days. The average of those seven observations for the seven subjects in drug two, their
average time to cure was the higher one, 9.3. And finally drug three which we saw descriptively
looks best in the box plot, and indeed it has the smallest sample mean. The average
time to cure for the student or subjects that had drug three only 6.8 days. Would you be
able to get the S's if you had to if it were a homework question? Could you throw the five
numbers maybe in your calculator or work out the couple of squared terms? I would do it
with a calculator or computer. But those S's could be found. How would that first S be
found? Just to remind us. First observation, 7.3, how far away is it from the sample mean
for that group? 8.22. Square that. Do that for all five terms. So the last one there
is 9.5. Especially with these not so nice numbers, I would not have you do this on an
exam. You would get the S provided for you. You divide by the degrees of freedom for that
sample. There are what? One, two, three, four, five subjects so 5 minus 1, and here is that
S squared, 2.74.
And I'm going to give you the other two sample variances, 2.61, 2.56, for those seven and
seven observations. What are some observations that you see now with those three numbers
there, those S squares?
Bless you.
Are those S squares the same?
They're not exactly the same, but are they comparable? Same kind of thing you'd look
for Thursday night with two sample variances or two sample standard deviations. I didn't
put units here because these are the sample variances. If I were going to put units on
each of those, they would be not days, but days squared, which seems kind of strange.
Good. All right. We'll comment then that these are similar. What does that support? It supports
the equal population variance assumption that is needed for ANOVA, and that would also be needed for
a two-sample pooled t-test.
All right. So, one set of notes had the sums of squares. On a homework, your homework
8, you might have to calculate one sums of squares somewhere. Otherwise, most of the
time this is done through SPSS. We'll get the X-bar overall so we can get those sums
of squares. I need to average all 19 numbers. 7.3 plus all the way down to the 5.2. All
19 values averaged out here end up being about 8.09; I'm going to carry it out a bit,
8.0947. What would be a check to make sure that that looks reasonable? What are you comparing
that number to?
How about those three X-bars that you just calculated? And it should be somewhere in
between.
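As a quick check (my own sketch), the overall mean can be recovered, at least approximately, as a weighted average of the rounded group means from step 1, and it should indeed land between the smallest and largest group means.

```python
# Sketch: check the overall sample mean against the group means.
# Group means are the rounded values reported in step 1, so this is approximate.
import numpy as np

n    = np.array([5, 7, 7])           # subjects per drug group
xbar = np.array([8.22, 9.3, 6.8])    # group mean times to relief, in days

grand_mean = (n * xbar).sum() / n.sum()
print(grand_mean)                    # about 8.09, between 6.8 and 9.3
```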
All right. Now we have everything we need to get our sums of squares. So as said, these
sums of squares are not too bad. There's really only three things we got to do here. Let's
take the first group. How many observations in that first group? How many subjects in
that first drug group?
Five.
What was the average for that first drug group? 8.22. How far away is it from that overall
average we just found of 8.0947? There's one term. Can you write out term #2?
Move on to the second group with seven people. A higher sample mean.
And the third and final term.
And so if I gave you steps 1 and 2 summarized, you could find a sums of squares pretty easily,
a little bit of plugging in. We get a total of 21, 21.98 is this first sums of squares
between the groups. Each sample mean for a group compared to the overall mean, between
the groups. One more sums of squares and then we get to throw it in the ANOVA table and
we'll analyze that, those results next time.
So the next sums of squares, three terms. Each term looks like this. First one then
should take the sample size minus 1 degrees of freedom. I'm going to put next to it the
2.74. Do I need to square the 2.74?
It says S squared here. But in step 1 we calculated S squared, right. We didn't take the square
root and report it as a standard deviation, so that is the sample variance. So let's do
the next term of 7 and a really similar S squared. And the final sample had a 2.56 as
that sample variance. And these three terms added up give a 41.98.
Now I might often on an exam give you one of those, then you just have to find one other
one that you like the best, and get the other by subtraction. Because once you have two
of them, there is a step 5 that says to calculate this total sums of squares with all 19 terms
added up, but we'll just say no thank you. I don't want to do that. We would much rather
just add the two up and put it into our ANOVA table.
So our last step today of material, we're going to plug in these numbers into our ANOVA
table to get that first F statistic and analyze those results next time.
Sums of squares for groups was the 21. And the sums of squares for error, and don't cheat
and look at the table right below it. Cover that one up.
With our calculations rounding a little bit here, our two sums of squares add
up to 63.96. The generic table is on your formula card, so it would tell you that these
degrees of freedom right here are k minus 1. What's k again?
The number of groups you're comparing. We've got three drugs. So this is 3 minus 1 or 2
degrees of freedom. Your formula card would remind you that the degrees of freedom for
the within, or error, is your N minus k. So that's your total sample size. How many subjects
all together?
19. And then we had three groups. So 19 minus 3, 16 degrees of freedom. Then just add those
two up, 16 plus 2. And 18 makes sense as an overall degrees of freedom because I had 19
observations all together, 18 degrees of freedom.
Now the mean squares are just your sums of squares over the right degrees of freedom.
We're going to form our two variance estimates by taking the sums of squares over the degrees
of freedom. So I just cut that 21.98 in half. So almost 11, 10.99. There's the first mean
square, the first estimate of a variance that's common. Here's the next one, 41.98 but I get
to divide that by the 16 degrees of freedom, and that's about 2.6, 2.62.
Now remember both of those mean squares are estimating the common variance. The one on
the bottom looks more reasonable to me. That's the 2.62, that's your Sp squared, the pooling
of your S squares. That one works in general. It's a good estimator. The one on top also
works well if H0's true and otherwise it tends to be really big, and 10 is looking a bit
bigger than this 2.6. Let's look at how big in ratio. We put the mean square for groups
on top, the error mean square on the bottom to get our F statistic, 4.2. We should get
an F of about 1 if H0's true, because that means the two things were about the same,
top and bottom. If H0's true, it should be the same on top as the bottom. This is a bit
more than 1 and perhaps significantly more than 1. Again, you won't find the p-value
for that 4.2 yourself, you'll look down at the SPSS output under the sig column, which
is what SPSS calls the p-value, and be able to pull off that p-value to make your decision,
which we will do together on Thursday.
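Pulling the whole drug example together, here is a short Python sketch (not the SPSS output itself) that rebuilds the ANOVA table from the rounded summaries above; expect small rounding differences from the hand calculations, and the p-value printed at the end is the number you would otherwise read from the Sig. column in SPSS.

```python
# Sketch: ANOVA table for the three-drug example, from the rounded summaries.
import numpy as np
from scipy import stats

n    = np.array([5, 7, 7])              # subjects per drug group
xbar = np.array([8.22, 9.3, 6.8])       # sample mean time to relief (days)
s2   = np.array([2.74, 2.61, 2.56])     # sample variances

N, k = n.sum(), len(n)
grand_mean = (n * xbar).sum() / N

ss_groups = (n * (xbar - grand_mean) ** 2).sum()   # about 21.98
ss_error  = ((n - 1) * s2).sum()                   # 41.98

ms_groups = ss_groups / (k - 1)        # mean square groups, about 10.99
ms_error  = ss_error / (N - k)         # mean square error, the Sp-squared idea, about 2.62

f_stat  = ms_groups / ms_error                 # about 4.2
p_value = stats.f.sf(f_stat, k - 1, N - k)     # right-tail area under F(2, 16)
print(ss_groups, ss_error, ms_groups, ms_error, f_stat, p_value)
```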
So we're going to turn to Exam 2 review. She'll have fun, fun, fun when her daddy takes her
t-test away. Take away those t-tests, okay fine.
Let's do review. If you forgot your sheet, I brought a few. Look over the front page. We're
going to go through those three today. Talk it over with your neighbor if you've already
tried it. Otherwise, please do try them now. Clickers on the true/false coming up on question
2.
We'll gather back in about two to three minutes.
So another minute or two to try the first three questions. Commit to a true/false. What
would you circle if it were the exam night on #2? And we'll go through these in just
a minute.
All right. Why don't you help me out with #1. Don't you have a homework question that's
also asking about alpha, what that means, statistically significant kind of thing. All
right. Statistically significant. We say the data, we say the results are statistically
significant at level alpha. What does that mean? In order to be statistically significant,
does your alpha need to be .05? Have you always done every test at a 5% level, alpha has to
be .05 to possibly be statistically significant? No. Alpha of 5% is your common one.
We said that if we ever forget to give you an alpha when you need to do a test, you could
use .05. Now we have learned that you usually use alpha of 10% if you're doing that Levene's
test for checking a condition of equal population variances. But 5% is a reasonable alpha, it's
just not required to be statistically significant.
Why would you like alpha to be small? You don't need the alpha small to be statistically
significant, but why is alpha being small good? Because what else does alpha represent?
It's a probability right? Probability of a?
Type 1 error. It does represent the probability of Type 1 error, alpha. So you want it to
be small. You don't want your chance of rejecting H0, going with the new theory when you shouldn't
because H0's true, to be too high. So you set that to be 1% or 5% or 10%, something
reasonably small. So yes, alpha can be 5%. You do want it to be reasonably small, but
those are not required to be statistically significant. It has something to do with your
p-value compared to alpha. And we said don't get this backwards. So stat sig means p-value
is large, larger than alpha or small?
Small. Needs to be small, less than alpha, but is that the only one?
Less than or equal to. If your alpha is set to be 5% and you do the calculations and your
p-value is .05, you just met the requirement to be able to say I have enough evidence to
reject. So you reject H0 if your p-value is less than or equal to alpha. Maybe 5%, then
you are statistically significant at that level alpha. Good.
I've had students write on the top of the exam: p-value less than or equal to alpha, reject
H0, stat sig, right? Those three things go together.
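As a tiny sketch of that rule (my own illustration), note that the boundary case counts as rejecting:

```python
# Sketch: the decision rule. Reject H0 whenever the p-value is <= alpha;
# a p-value exactly equal to alpha still counts as rejecting.
alpha = 0.05
for p_value in (0.002, 0.05, 0.20):    # illustrative p-values
    if p_value <= alpha:
        print(p_value, "reject H0: statistically significant at level alpha")
    else:
        print(p_value, "fail to reject H0")
```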
All right. How about true/false, let's try them. So click in quickly what you think are
for these three confidence interval questions. Talking about skipping breakfast regularly
for all young adults. And you're provided the 90% confidence interval for that proportion
of all young adults who skip breakfast regularly for that population proportion p.
First statement is about the margin of error. It states that this interval has a margin
of error of 8 percentage points and you are saying, there it is, most are saying false.
Good. It's not 8 percentage points, is it? There is an 8 there, 20% to 28% is 8%, but
that's the width. What is the margin of error please?
It's the half width, so 4%. It's what you go out from the middle each way to get the
ends of the interval. So your margin of error is 4 percentage points. Can anyone tell
me what must have been p-hat for this problem? What proportion in the sample of young adults
must have said I skip breakfast regularly?
24%. Why? Because that's the midpoint. And that interval was made using p-hat going plus
or minus what we call a margin of error, which might have been the conservative one or not.
It didn't say conservative here, so I'm assuming it's the regular one. But that z-star in there
times that standard error is your margin of error, which was 4%. It's always the midpoint.
So 24% of the sample must have said yes.
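Here is that arithmetic as a tiny Python sketch (just illustrating the numbers discussed):

```python
# Sketch: recover p-hat and the margin of error from the reported interval.
lower, upper = 0.20, 0.28

p_hat = (lower + upper) / 2              # the midpoint: 0.24
margin_of_error = (upper - lower) / 2    # half the width: 0.04, i.e. 4 points
print(p_hat, margin_of_error)
```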
All right. Good. Stop that. Let's look at the next one. This one is looking at a
condition needed.
True or false for B?
Every confidence interval, every test has a set of conditions and those conditions need
to be met for that interval or test result to be valid. Almost every one of them somewhere
has something about their sample or samples need to be random samples. But then there
are other ones that come into play. To be able to make this interval and have
it be valid, do we need this condition? Most of you are saying true. Guess what it is?
It's false. Now I put it in here and I believe it's in one of the old exams, because I wanted
to have us look at this again before the exam. There is normality conditions for doing some
tests, but they have been for the t-tests in terms of the populations having a normal
model. So let's think about this here. What are we trying to learn about? We're trying
to learn about the population of all young adults and the question that I'm measuring, the question
of interest is whether you skip breakfast regularly. Yes or no? So for each person in
this population, you have their response, which is a yes or no. It's a categorical response,
right? When you're trying to learn about a proportion it's because your response is categorical,
yeses and not yeses, yeses and nos. If you're comparing two proportions, it's categorical
responses. That's what my population of values look like. I could record the values of being
a 1 and 0 or something for the yeses and nos, but it's just two outcomes, categories. So
do you have a normal model to describe that population of responses? Would you have a
bell curve to fit the data that's in that population? No, it would be a bar chart or
it would be a stick graph, a discrete model with yeses and with nos. So we do not assume that
the population that we're taking our data from has a normal model for doing proportion
stuff. But you're thinking Z's, right, proportions means I do a Z interval, a z-star, or a z-test
statistic perhaps if my sample size is large enough. Z's are normal, but you get that normality
for something else than the population. What do we require? What are the two conditions?
You go to your population and you take out condition #1, that you took a sample that
is a?
Random sample. You have a random sample from our population. And it would be a bunch of
yeses and nos. And then what are we going to calculate on that random sample? If you're
trying to learn about the true proportion what is it, the true proportion of all young
adults who skip breakfast and you got some data, you're going to calculate p-hat, which
turned out to be 24% for this sample. All right. In order for you to do your z-star
in your confidence interval, the second condition that's required isn't that you have normality
for this population, because that's a categorical response, but that you took a large enough
sample size. That your sample size n is large enough. And what was the requirement for being
large for proportion problems?
That at-least-10 rule, right? The 25 or 30 or more was for means, for the Central Limit
Theorem to apply for sample means. The n being large for proportions, back pre-Exam 1 stuff,
was that at least 10. The conditions on your formula card say when you look at the normality
approximation, they write it with the n and the p. But you don't know p. And you're doing
a confidence interval so all you got is p-hat. So you check to make sure in your sample you
have at least 10 yeses and at least 10 nos in your sample.
Now if you have a large enough sample size, this says that p-hat will have as its model
approximately a normal. The normality that you get is for the statistic, this p-hat, which
is a number now. P-hat is 24% here. Do it again, you might get 27%. It varies, but it
has a distribution and that distribution is approximately normal, which is why you get
to use Z's. But the original population is not normal.
In t-tests, one of the conditions that's always there about the population or populations
if there's two, is that the model for the response is normal, the population model is
a normal model, because you're having a quantitative response there. So proportions, the original
population isn't assumed to be normal, but I do need a large enough sample size for my
random sample so that my statistic when I standardize it I can make it a Z. That p-hat
will be approximately normal.
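To see both pieces side by side, here is a small Python sketch; the poll's actual sample size is not given in the lecture, so the n below is a hypothetical value chosen only for illustration. It checks the at-least-10 counts, then uses the approximate normality of p-hat to build a 90% interval.

```python
# Sketch: large-sample check and a 90% z interval for a proportion.
# n is hypothetical; the lecture never states the poll's sample size.
from scipy import stats

n = 300          # hypothetical number of young adults surveyed
p_hat = 0.24     # sample proportion who said they skip breakfast

# Condition: at least 10 yes responses and at least 10 no responses.
print(n * p_hat >= 10 and n * (1 - p_hat) >= 10)

z_star = stats.norm.ppf(0.95)                 # about 1.645 for 90% confidence
se = (p_hat * (1 - p_hat) / n) ** 0.5         # standard error of p-hat
print(p_hat - z_star * se, p_hat + z_star * se)   # interval endpoints
```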
All right. One last true/false having to do with kind of interpreting. We made a write
up, an interpretation. This is a 90% confidence interval. So we're going to start out with
that if we repeated this poll many times, and what do you think this statement?
If it were in a list of statements that you were trying to decide if they're reasonable
to put in your report, would you include this one? Is p valid? And what do we have. Most
of you saying, not quite as many, but most of you saying false. It starts out okay, but
it slowly, quickly actually goes downhill. It is false. What does it say? If you repeated
the poll many times, all right, 90% of the time the proportion of all young adults who
skip breakfast will be between these two numbers. Is that what we can say? 90% of the time the
true p is going to be between 28% and 20%? When we're trying to interpret that 90% confidence
level we're talking about the process, we should talk about how well it performs if
you were to repeat it, but you shouldn't go back to the one interval you got the one time
you did it. Because when you repeat it, you don't get just 20% to 28% every single time.
The intervals are going to vary, because the sample results vary. And so if you remember
the picture we drew with our first confidence interval, we drew a bell curve, we said here's
the 95% range, and then we started looking at possible intervals. Or that simulation
where you pushed a button and it generated 100 intervals, some were green and some were
red. 90% of the intervals, not 90% of the time. 90% of the intervals are expected to
do what? Expected to be good intervals, by that we mean have the population proportion
of all young adults who skip breakfast falling in it. 90% of the intervals are expected to
contain the population proportion of all young adults who skip breakfast regularly.
The 20% to 28% is the one I got this one time. I'm really hoping it's one of those good intervals.
It either is a good interval or it's not. I can't tell you because I don't know the
true proportion p to know if it falls in between there or not, but this statement that's wrong
is trying to tell you that p changes and 90% of the time it's going to be between these
two numbers and 10% of the time it won't be, but p doesn't change. P is a population proportion
that's fixed, that you're trying to learn what it is. It's the intervals that would
change from one sample to the next, but most of them will be good ones.
All right. Good. Let's take a look at this last question, question 3. And question 3
has to do with what? Looking at weekday lunch customers at a restaurant, how quickly they
can be served their meal. Looking at the population of all weekday lunch customers, the claim
is that the mean for that population is 15 minutes. That the standard deviation for that
population is 2 minutes. That's the mean and the standard deviation for the population
of all customers' times for being served their weekday lunch.
So I'm imagining this population of all customer times for being served their lunch and we're
told the mean of that population happens to be 15 minutes, that's the claim anyway. The
standard deviation supposedly is 2 minutes. That's mu and sigma for this population. What
are we going to do, we're going to go to that population and take a random sample. We're
going to take 64 customers, record how long it takes them to get their lunch and calculate
the sample mean. So I'm going to pull 64 values out of that population at random, random sample
of 64. Is 64 kind of large? Large enough to maybe help us if we need it to. 64 values out
and I'm going to calculate X-bar. Now I'm not giving you X-bar here, but I want you
to give me a picture of what I might get for X-bar. To give me a picture of the distribution
for that sample mean. It would show me what values you could get for the sample mean and
how often they tend to occur. I want the distribution and I even give you a clue it's the approximate
distribution here of the possible values for the sample mean. So where do we have sample
mean on here? We do have it on page two under population mean and sample mean there, but
we learned the foundation of doing t's, t procedures, from the sample mean distribution
on page one.
I'm actually going to be using a very famous result here to draw this picture, right? What
famous result? So important to be able to allow us to do some of our testing that we
want to do?
CLT. Why am I using the CLT? Because I didn't tell you what model this population has. I
have no idea if the customer times if I graphed them would be bell shaped, skewed, or what.
But I'm taking 64 observations from there and I'm averaging 64 numbers. What is the
distribution for the possible values of the sample mean? That tells you put the sample
mean down here as your label, because that's the distribution you're giving me the picture
for. You can write it as X-bar or sample mean, sample mean values. That's the label for your
x axis. And what kind of model do I get for averages? Averages start to look approximately
normal.
Approximately normal. Where would it be centered at? Sample means should vary around the true
mean mu. The true mean mu, which is supposedly 15 minutes. What about the standard deviation?
Do I draw my tick marks going out by twos? Yes or no?
Two is a standard deviation, but the 2 up here is the standard deviation for individual
times, for individual customers. It's not what the variability would be for averages
of 64 customers. Averages vary less. It is on your formula card, bottom corner, that
says oh yeah standard deviation for an X-bar, for averages, is not just sigma, but it's
sigma over square root of n. So I need what there, 2 over well 64 is a nice number. 2
over 8, one-fourth, .25 minutes.
Now I can draw a few more values. I would go from 15 minutes as my mean to 15.25 to
14.75 for one standard deviation below, and I could put a few more there too. That's the
model. What's the name of the result that lets you draw this picture?
CLT. Now on the review on Sunday, very similar question, it was question 3 on that review
too. I did not have you draw the picture for the model for X-bar, but I went right to saying
calculate a probability that you might get a sample mean and in that problem it was as
high as 53 or higher. I could ask you part B here. So how likely would it be that the
average time for those 64 customers could be as high as 15.5 minutes or higher? How
likely could it be that it might be 15.5 minutes or higher? Would you be able to give me an
answer to that pretty quickly without looking up or calculating a Z score, because what
is the Z score at 15.5?
You're not one, but you are two, two standard deviations away. I'm thinking, how much is in between
minus two and plus two?
95%. So what would be an approximate probability of being in that tail?
About 2.5%, and I would accept that answer without going to Table A.1 if that were your
Z score. We're doing Z's here why? It's means. Aren't you supposed to do t's for means?
But t's are for testing hypotheses about an unknown mean that you don't know what it is.
And when you don't know sigma for that population either, and you're trying to do a test to find
out what the mean is based on just a sample. Here, this is about Z's because the model for X-bar
is approximately normal by the CLT.
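Here is that calculation as a short Python sketch of the numbers just discussed:

```python
# Sketch: P(sample mean >= 15.5 minutes) for n = 64 customers, using the CLT.
from scipy import stats

mu, sigma, n = 15, 2, 64
se = sigma / n ** 0.5        # 2 / 8 = 0.25 minutes, the SD of the sample mean
z = (15.5 - mu) / se         # (15.5 - 15) / 0.25 = 2

print(z, stats.norm.sf(z))   # about 0.0228, close to the quick 2.5% answer
```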
All right. Thanks for coming. See you on Weds, on Thursday.