Hypothesis Testing, Probability, and Distribution of Sample Means - Part A

>>This narrated PowerPoint is designed to help integrate your understanding of probability, the distribution of sample means, and hypothesis testing. This presentation will make the most sense if you've already had the opportunity to read the workbook on these topics. To help us see how these topics are related, I'd like us to address the question: does lead impair IQ? Here's the dilemma. If you want to find out whether lead impairs IQ, let's say your theory is that it will affect brain development, there's an ethical issue. It would not be appropriate to give people, or children, lead to see if it harms their IQ, right? We wouldn't want to do a treatment or manipulation that would actually cause harm to someone. And yet, it may be very important to us to find out if lead impairs cognitive development, because if it does, it would be very important to take substances that could give off lead dust into the air out of classrooms. Okay, so we have a situation where we really would want to know the answer to the question, but it would be unethical to actually test people to see if it has an effect. Our solution: find children who were inadvertently exposed to lead, and compare their mental ability to the population. And, because paints used to be lead-based, this would actually not be as difficult to do as you might think. Research has now shown that lead does negatively affect brain development. There was lead paint on blinds that kids would suck on as little kids, and there was lead dust in the air where lots of lead paint was used, and this has actually been shown to decrease IQ. So, let's say our researcher has found a group of children who were in an environment that had a lot of lead present, and that these children, for all other purposes, were normal except for this one unusual difference, that they were in a lead-based environment.
When you're ready to develop your hypothesis concerning the relationship between lead and mental development, you have to decide whether you're going to have a directional hypothesis, which is known as a one-tailed test, or a nondirectional hypothesis, which is known as a two-tailed test. With a directional hypothesis, you are specifying whether the sample mean should be below or above the population mean, and so you see here two illustrations of the distribution of sample means. In the top case, the right tail is shaded in. This would be a directional hypothesis where you would expect the sample mean to be greater than the population mean. For example, if we want to test whether a food diet will actually improve weight gain for sumo wrestlers, you know, some major carbo diet to really blow them up, you would do a directional hypothesis where you expect the sample mean to be above the population mean. Take a look at the bottom illustration of the normal distribution, and there you will see that the left tail is shaded. That directional hypothesis corresponds to where you would expect the sample mean to be below the population mean. That might be a diet where you are actually hoping to lose weight, and we would hope that the sample mean is below the population mean. So what the researcher is going to do, when he or she writes up the research hypothesis, is say: do I expect the sample mean to be above the population mean or below it? In our case, the research hypothesis is going to be that children who inhale dust, lead dust that is, will have lower IQ scores than the general population. So we're going to have a directional hypothesis; specifically, we're going to predict that our sample mean will be below the population mean. Now, in addition to a one-tailed test, that is, a directional test, you can also do a two-tailed test, which is known as a nondirectional hypothesis.
A nondirectional hypothesis is where you want to cover both bets. You say, hey, I think this sample mean is going to be different from the population mean; maybe it'll be above, maybe it'll be below, I don't know, I just think it will be different. Okay, so that would be a nondirectional hypothesis, otherwise known as a two-tailed test. In our case, we're going with the one-tailed hypothesis, that is, a directional hypothesis: we're going to predict that the sample mean will be below the population mean. We have, in essence, placed our bets for what the results will look like. We have decided to do that over the other possibilities: saying that lead would improve IQ, if we did the other one-tailed hypothesis test, or saying, hey, I think lead would just have an effect, I don't know if it will improve it or harm it, and that would be the two-tailed hypothesis test. So we're going with the one-tailed hypothesis that lead would harm IQ. Okay, now, what we're going to be doing is comparing a sample to a population mean, so we need to pick the right inferential statistical test, so we can make this comparison and see what the probability is of getting our outcome due to chance. For our particular research design, our dependent variable is IQ score, which is a way to measure intelligence; we believe it will be affected by lead. IQ score is a scale variable; it is an interval scale variable, so it counts as scale. It's normally distributed, that is true, and we know its standard deviation. The people who design IQ tests actually designed them and set the standard deviation to be a particular value; for example, 16.
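To make the idea of "placing our bets" a bit more concrete, here is a minimal sketch in Python of how the same result leads to different probabilities depending on which tail, or tails, you bet on. The function name `tail_probabilities` and the value -1.5 are hypothetical, chosen only for illustration; the sketch assumes the result has already been expressed as a Z value on a normal distribution.

```python
from statistics import NormalDist


def tail_probabilities(z):
    """Return the chance probability for each kind of hypothesis.

    z is a hypothetical standardized result (an assumption for this sketch).
    """
    std_normal = NormalDist()          # mean 0, standard deviation 1
    lower = std_normal.cdf(z)          # directional: sample mean BELOW population mean
    upper = 1.0 - lower                # directional: sample mean ABOVE population mean
    two_tailed = 2.0 * min(lower, upper)  # nondirectional: different in either direction
    return {"lower": lower, "upper": upper, "two_tailed": two_tailed}


# Hypothetical result: the sample mean sits 1.5 standard errors below the population mean.
probs = tail_probabilities(-1.5)
for name, p in probs.items():
    print(f"{name}: {p:.4f}")
```

Notice that the two-tailed probability is double the smaller one-tailed probability, which is the price of covering both bets. For the lead example, the bet is on the lower tail.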
And that is going to allow us to do a Z test. We've done many of these in the past, only it was known as a Z score, when you're comparing a single value to a population mean. Now we're, kind of, going up to the next level and we're saying, okay, let's compare a sample mean to a population mean; let's find out what's the probability of a whole sample having some average value compared to the population. If we're going to do a Z test, there are four basic requirements. Number one, are you comparing a single sample to the population? If so, you're at the right place. Number two, is your dependent variable scale? It has to be scale because that way you can see what the shape of the distribution is. If it is nominal or ordinal, then we're just dealing with categories that, at most, can be arranged in order, so we actually need a scale variable where there are equal intervals between the values. Number three, when we look at our distribution, the distribution of scores must be normal; or, if the shape of the distribution of scores is unknown, or definitely not normal, we're still okay, as long as our sample has at least 1000 people in it. Okay, so if the distribution of population scores is normal, we're good to go, and our sample size doesn't really matter. On the other hand, if we know that the distribution of our individual scores isn't normal, then we need a sample size of at least 1000, and then we're okay to use the Z test. And finally, number four, the standard deviation for the population also needs to be known. And you may say, why? Why are all these requirements in place?
Well, remember that with the Z test, we're going to be looking up our proportion using a Z table, and just as with scores, so with sample means: the Z table is only going to give us useful information if the distribution is normal. If it is not normal, we can still look up something in the Z table, and it will give us a proportion, but it would be wrong, right, because the Z table is based upon the assumption that we have a normal distribution. We also need to know the standard deviation to know how wide our distribution is. Interestingly, you don't always have to know the population mean; in fact, later on this semester, we'll talk about when we're going to compare a sample mean against a hypothesized population mean. But don't worry about that. Just know that for our Z test requirements, you've got to have a normal distribution, you need to know the standard deviation for it, and if you've got those two, you're well on your way. Also, of course, we'd only do a Z test if we're comparing a single sample to the population, and likewise if our dependent variable is scale, so that covers all four requirements. When I was talking about the Z test requirements, the illustration on the left would be a distribution of sample means, so this is a distribution where each value, instead of being a single score, actually represents a sample. Okay, now you may wonder: if the Z table requires the distribution to be normal, why did I previously say, you know what, it doesn't matter what the shape of the distribution of individual scores is, you're still okay if the sample size is 1000 or more? Why is that true?
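Here is a rough sketch of what the Z test looks like as a calculation, written in Python, with the normal distribution function standing in for the Z table. The standard deviation of 16 comes from the IQ example above; everything else is an assumption made up for illustration: a population mean of 100, a sample mean of 96, a sample of 64 children, and the function name `z_test_lower_tail`. The sketch uses the usual formula for a sample mean, Z = (M - mu) / (sigma / sqrt(n)).

```python
from math import sqrt
from statistics import NormalDist


def z_test_lower_tail(sample_mean, pop_mean, pop_sd, n):
    """One-tailed Z test predicting the sample mean falls BELOW the population mean."""
    # Standard deviation of the distribution of sample means (the standard error).
    standard_error = pop_sd / sqrt(n)
    # How many standard errors the sample mean is from the population mean.
    z = (sample_mean - pop_mean) / standard_error
    # Proportion of the normal distribution at or below z (what the Z table gives us).
    p = NormalDist().cdf(z)
    return z, p


# pop_sd=16 is from the lecture; the other three values are hypothetical.
z, p = z_test_lower_tail(sample_mean=96.0, pop_mean=100.0, pop_sd=16.0, n=64)
print(f"z = {z:.2f}, lower-tail probability = {p:.4f}")
```

With these made-up numbers the standard error is 16 / 8 = 2, so the sample mean is 2 standard errors below the population mean. The proportion is only meaningful if the distribution of sample means really is normal, which is exactly why the requirements matter.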
Well, there is one key theorem covered in statistics, and anyone who takes a statistics class is expected to know it. It is known as the central limit theorem, and you can read about it in the workbook, but the main idea behind it is that it doesn't matter what the shape of the distribution of individual scores looks like. If you look, for example, at the bimodal distribution that's shown at the top: if you randomly sample 1000 people from that bimodal distribution and you plot a single point representing their average, then you sample another 1000 people and plot another single point representing their average, and you keep doing this, sampling 1000 people and recording a single point representing their average, then by the time you're done with hundreds and hundreds of samples, your distribution of all the sample means will appear normal, guaranteed every time. And that's why you can use the Z table, because you are guaranteed that the distribution of sample means will be normal. So if the individual scores are already normal, you're good to go; you don't need a sample size of 1000 or more. If the individual scores are normal, the distribution of sample means will be normal. But if your distribution of individual scores is not normal, and your sample size is at least 1000, you're still okay: the distribution of sample means will be normal.
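If you'd like to see the sampling procedure just described in action, here is a small simulation sketch in Python. The bimodal population is invented for illustration (two humps centered at 70 and 130, each with a spread of 5), as are the names `draw_bimodal_score` and `sample_mean_of` and the choice of 300 samples; only the sample size of 1000 comes from the lecture. It is a rough check, not a proof of the theorem.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable


def draw_bimodal_score():
    """Draw one score from a made-up bimodal population (an assumption)."""
    center = random.choice([70.0, 130.0])  # pick one of the two humps
    return random.gauss(center, 5.0)


def sample_mean_of(n):
    """Sample n individual scores and return the single point: their average."""
    return fmean(draw_bimodal_score() for _ in range(n))


SAMPLE_SIZE = 1000   # people per sample, as in the lecture
NUM_SAMPLES = 300    # "hundreds" of samples (hypothetical count)

sample_means = [sample_mean_of(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

grand_mean = fmean(sample_means)   # center of the distribution of sample means
spread = pstdev(sample_means)      # its standard deviation (the standard error)

# In a normal distribution, roughly 95% of values fall within 2 standard deviations.
within_two = sum(abs(m - grand_mean) <= 2 * spread for m in sample_means) / NUM_SAMPLES

print(f"mean of sample means: {grand_mean:.2f}")
print(f"standard deviation of sample means: {spread:.3f}")
print(f"proportion within 2 standard deviations: {within_two:.3f}")
```

Even though the individual scores pile up in two separate humps, the sample means cluster tightly around the population mean, and the proportion within two standard deviations comes out close to what a normal distribution would give.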