Hello, friends.
We will continue our journey through comparative analysis
looking at effect size and power.
Though I have chosen to discuss effect size and power
under our ANOVA videos, I want to point out that, much
like kurtosis and skewness, these concepts reach well
beyond ANOVA. They apply to the t-test, ANOVA, MANOVA,
or any of a group of other procedures.
Effect size is the distance of the actual value that we
obtain for the mean from the data set from the anticipated
value for the mean.
Consider the following.
We might have a group of cars, and we expect that group
of cars to have a mean of 30.5 miles per gallon.
We take a sample, and we find that the mean for
the sample is 28.3.
So we have a distance between what was anticipated and what
the actual value really is.
We call this the effect size.
And the effect size in this case would be
2.2 miles per gallon.
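That arithmetic can be sketched in a couple of lines of Python; the numbers are the ones from the example above.

```python
# A minimal sketch of the miles-per-gallon example.
expected_mpg = 30.5   # anticipated population mean
sample_mean = 28.3    # mean observed in our sample

# The raw effect size here is simply the distance between the two means.
effect = abs(expected_mpg - sample_mean)
print(f"Effect size: {effect:.1f} mpg")  # 2.2 mpg
```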
Now the effect size is the distance between the means of
the variables.
You will notice that in ANOVA, we have three groups, and we
have a distribution of the value that we're looking at
for each of those groups, so we would have a difference
between the mean of each of these.
The effect size is the distance of the actual value
from the anticipated value.
Again, you want to consider the following.
An effect size may be strong, moderate, or weak.
So those are the three levels of effect size
that we can have: strong, moderate, or weak.
Now, we will use a coefficient known as the partial eta
squared to discuss effect size.
And of course, we in math and statistics love these little
Greek symbols.
Eta squared will stand for the coefficient that gives us the
effect size.
Now, a strong effect size, which is an eta squared greater
than 0.14, means that if we go into the samples, and we're
comparing two or three groups, and we select a value
at random, it is very likely that we can look at that
data point based on its value and determine which group
it belongs to, because the distances between the groups
are so profound.
A moderate effect size is between 0.06 and 0.14.
And that means if we randomly select a data point, it might
be identified as to the group which it belongs to
based on its value.
And weak is between 0.01 and 0.06, and that means that
it's not likely that we would be able to take that data
point and look at its value and determine the group that
it belongs to.
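Those cutoffs can be captured in a small helper function. This is a sketch in Python rather than SPSS; the function name, and the "negligible" label for values below 0.01, are my own additions.

```python
def effect_size_label(partial_eta_sq):
    """Classify a partial eta squared using the cutoffs from the text:
    weak 0.01-0.06, moderate 0.06-0.14, strong above 0.14."""
    if partial_eta_sq > 0.14:
        return "strong"
    if partial_eta_sq >= 0.06:
        return "moderate"
    if partial_eta_sq >= 0.01:
        return "weak"
    return "negligible"   # below 0.01: effectively no effect

print(effect_size_label(0.167))  # strong
```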
Now, I've done some very clever things for you here.
In discussing effect size, I have drawn you a parallel:
the effect size tells us, if we randomly select a data
point, how likely it is that we would be able to predict
the group that it belongs to.
That's what effect size really is all about.
Now, I want to give you an example of where you might
have a strong significance but a very weak effect size.
In the 1970s, the analysis of GRE scores indicated that men
scored higher than women.
Well, you know the guys jumped all over that and said, that
means we're smarter than women.
Well, I'm not going to make that statement.
I don't believe that at all.
Sometimes my wife scares me to death.
But what we actually had in the GRE test at that time, and
what you still have today, is a bias toward the engineering
fields, the math fields.
And in the 1970s, of course, those were dominated by men.
But they found a significant difference between the scores.
Now, the difference in the averages
was very, very minute.
However, the number in the sample was enormous.
So this made that small difference significant.
You remember from your introductory statistics that the
difference gets divided by s over the square root of n?
So the more that you had in the sample, the smaller that
denominator becomes, and you divide it in.
Man, it makes a big z score, and lo and behold, you've got
great significance with that.
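The point can be sketched numerically: hold a tiny mean difference and the standard deviation fixed, and the z score grows with the square root of the sample size. The numbers here are illustrative, not the actual GRE figures.

```python
import math

# Hold the mean difference and standard deviation fixed and
# watch the z statistic grow as the sample size grows.
diff = 0.5    # tiny difference between the group means
s = 10.0      # sample standard deviation

for n in (25, 2500, 250000):
    z = diff / (s / math.sqrt(n))   # z = difference / (s / sqrt(n))
    print(f"n={n:>6}  z={z:.2f}")
# The same 0.5-point difference goes from nowhere near
# significant to overwhelmingly "significant" as n grows.
```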
Well, the fact is that randomly selecting a
participant yielded no real likelihood of predicting the
group based on the scores.
I mean, the values differed only minutely.
But because the number was so high, it made that little bit
of difference significant.
Well in fact, that was a very, very weak effect size, which
meant that it really didn't have any meaning.
You can have significance and lack any meaning whatsoever in
the results that you have.
Now, power is about the probability that the test will
reject the null hypothesis when the null
hypothesis is false.
Power analysis can be utilized to calculate the minimal
sample size required so that one can be reasonably likely
to detect an effect of a given size.
Power can also be used to compare different statistical
testing procedures.
We might compare a parametric design to a non-parametric
design, and find that one has more power than the other, the
power being the probability that we will reject the null
hypothesis when the null hypothesis is false.
I want to share this table with you for just a minute.
It might make the options for evaluating the null
hypothesis a little bit more clear.
Now, the possibilities are that the null hypothesis is
true or it's false.
And our actions could be that we do not reject it or
that we reject it.
So it's either true or false.
We're either not going to reject it or reject it.
If we have a true null hypothesis and we do not
reject, we made a correct decision.
And that's where alpha comes in.
Alpha is equal to the significance, the likelihood,
the little error that we're willing to live with.
If we reject the null hypothesis when it's true,
this is called a type I error.
And if we fail to reject the null hypothesis when it's
false, that's a type II error.
Now, the probability of a type I error is alpha, so the
probability of that correct non-rejection is 1 minus alpha.
And the probability of a type II error is beta, and
1 minus beta is the power.
If the null hypothesis is false and we reject it, that's
a correct decision.
That's where power comes in.
Power is 1 minus beta: the probability that we will
reject the null hypothesis when it's false, where beta is
the probability of a type II error.
Alpha is the probability that we will reject the
null hypothesis when it is true.
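Here is a small simulation of that table, assuming a two-sided z-test on normal data with known sigma; the sample size, trial count, and means are illustrative choices of mine, not anything from the video's data.

```python
import math
import random
from statistics import NormalDist, mean

# How often a two-sided z-test rejects when the null is true
# (should be about alpha) versus when it is false (that rate
# is the power).
random.seed(42)
alpha = 0.05
crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
n, trials = 50, 4000

def reject_rate(true_mean):
    hits = 0
    for _ in range(trials):
        xs = [random.gauss(true_mean, 1.0) for _ in range(n)]
        z = mean(xs) / (1.0 / math.sqrt(n))  # test H0: mu = 0, sigma = 1 known
        if abs(z) > crit:
            hits += 1
    return hits / trials

type1 = reject_rate(0.0)   # null true: rejections are type I errors
power = reject_rate(0.5)   # null false: rejections are correct
print("Type I error rate:", type1)   # near alpha = 0.05
print("Power at mu = 0.5:", power)   # high for this n
```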
Well, my friends, I will show you how to do
effect size and power.
You'll recognize the data set again, the percent
women and the group.
We'll go up to Analyze.
Now, if you were doing ANOVA, you'd go to Compare Means and
One-Way ANOVA.
We're not doing ANOVA.
Let's go to General Linear Model.
Let's go to Univariate, because we have one
independent variable, group.
We will move percent women into the dependent
variable box, and group into the fixed factor box.
We will go to the Options.
In the Options, we will select Estimates of Effect Size
and Observed Power.
And you see we could put a lot of other things in there.
And we want to display those for overall.
Here we go.
And let's say git 'r done.
And here comes our analysis, just that quickly.
Well, let's take this SPSS readout now, and let's go
through and see if we can find what we look for.
Now, keep in mind that we did some statistics that we really
didn't have to do.
We have descriptives and a Levene's test for homogeneity
of variance.
We didn't need that.
What we want to look at, though, is this thing, Tests
of Between-Subjects Effects, and this little area, partial
eta squared--
0.167 for the corrected model.
That is a strong, strong partial eta squared, and a
strong effect size.
That means that these variables in the groups differ
so much that if we randomly pick one, we're very likely to
be able to tell which group that it goes to--
we'll look at the post hoc test shortly--
one set or the other.
Then also, when we come in here, we want to look at the
power. Let's look at the observed power:
0.995.
This is a very powerful test.
We've done well to get this far.
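For reference, partial eta squared comes from the sums of squares in that same table: SS_effect divided by (SS_effect + SS_error). A minimal sketch, using made-up sums of squares (not the values from the video's data set) that happen to land on the same 0.167:

```python
# Partial eta squared from an ANOVA table's sums of squares.
def partial_eta_squared(ss_effect, ss_error):
    return ss_effect / (ss_effect + ss_error)

# Illustrative numbers only: 20 / (20 + 100)
eta_sq = partial_eta_squared(ss_effect=20.0, ss_error=100.0)
print(round(eta_sq, 3))   # 0.167, a strong effect by the > 0.14 rule
```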
Well, my friends, I would never want to close one of
these videos without thanking you for your patronage.
Your patronage keeps my family fed.
Enjoy watching these videos.
You take care of me and I'll take care of you.
May the odds be ever in your favor.