Sampling Distribution - Quantitative data

>> All right, here's a quick tutorial on sampling distributions. I've got two problems here because there are two different types, at least in our class, that we want to look at: one where we have quantitative data and one where we have qualitative data. This first example deals with quantitative data. In this case, we're told that the batting average for an outfielder has a mean of .270, so we're given this value for the mean. We use the Greek letter mu to represent the mean; this is the mean of the parent distribution. We're also given a standard deviation of .45, and we use the Greek letter sigma for the standard deviation of the parent distribution. Then it says to find the probability that, in a sample of 81 randomly chosen outfielders, the average batting average is between .275 and .285. That number 81 will be key; it's the sample size, which we use n to represent.
So the key point, the thing you want to notice, is that we're no longer asking what's the probability that one person, one outfielder, has a batting average between .275 and .285. We're now asking what's the probability that the average of 81 outfielders is between .275 and .285. In other words, we're trying to figure out the probability that x bar (not x, but x bar, the sample average) is between .275 and .285. That's what separates the problems we've done up to this point in our class from all the problems we'll do going forward: our random variable is now x bar, the sample mean. It's the average of these 81 people, not just one person. To figure this out, we need to know three things: the shape, the center and the spread. And now we have to be a little more specific about what we're talking about: the shape, center and spread of our sampling distribution.
So the problem gives us a mean of .270 and a standard deviation of .45; those are the center and the spread for the parent distribution, for a single outfielder. That's not what we want to know. We want the shape, center and spread for the sampling distribution, for the average of these 81 outfielders. The shape is going to be approximately normal; that will always be the answer in our class. However, we can now justify why it's approximately normal. In the past, we were just told it's approximately normal. If you read this problem, nowhere does it say the distribution is approximately normal. However, we have this thing called the central limit theorem, and what it tells you is that regardless of the shape of the parent distribution, when n gets large enough, the shape of the sampling distribution becomes approximately normal. The cutoff our book uses is 30, so since our sample size of 81 is greater than or equal to 30, we can say that our sampling distribution has a shape that is approximately normal, even though that's not stated in the problem.
The two other things we need are the center and the spread. For the center we get a new symbol: instead of just mu, it's mu with a subscript x bar, which means the mean of the sampling distribution. The nice thing is that it's exactly the same as mu, which in this case is .270. That will always be true. I can even put the key formulas down here: the mean of the sampling distribution is the same as the mean of the parent distribution. However, the standard deviation of the sampling distribution is not the same as the standard deviation of the parent distribution; it's that divided by the square root of n.
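A minimal Python sketch of the two key formulas and the central limit theorem claim above. It rests on assumptions that are not part of the original problem: the parent distribution is taken to be uniform on [0, 1] (clearly not normal), and the function name `simulate_sample_means`, the seed, and the 5,000 trials are illustrative choices only.

```python
import math
import random
import statistics

def simulate_sample_means(n, trials, seed=0):
    """Draw `trials` samples of size n from Uniform(0, 1); return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]

# Parent distribution (assumed for illustration): Uniform(0, 1)
PARENT_MEAN = 0.5
PARENT_SD = math.sqrt(1 / 12)

n = 81  # same sample size as in the outfielder problem
means = simulate_sample_means(n, trials=5000)

observed_center = statistics.fmean(means)   # should be close to mu
observed_spread = statistics.stdev(means)   # should be close to sigma / sqrt(n)
predicted_spread = PARENT_SD / math.sqrt(n)

print(f"center: observed {observed_center:.4f}, predicted {PARENT_MEAN:.4f}")
print(f"spread: observed {observed_spread:.4f}, predicted {predicted_spread:.4f}")
```

If the formulas hold, the observed center should land near the parent mean and the observed spread near the parent standard deviation divided by 9, and a histogram of `means` should look roughly bell-shaped even though the parent is flat.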
So the center, mu sub x bar, is .270; it's the same as mu. However, sigma sub x bar is not just .45; it's .45 divided by the square root of n, and n in this case is 81. The square root of 81 is 9, so it's .45 over 9, which is .05. Now that you know the shape, the center and the spread, this problem becomes a lot like a chapter 7 problem. I can draw my distribution here, or at least I can attempt to. Here's a roughly normal-looking distribution; I mean for it to be symmetric. I've got the middle as .270, and then I can count up and down by standard deviations, which are .05 in this case, so .320 is one standard deviation over, .370 is two standard deviations over, and .420 is three standard deviations over. Similarly, in the other direction we've got .220, .170 and .120. Note that I didn't count over by the standard deviation of the parent distribution; I counted by the standard deviation of the sampling distribution.
What this problem wants me to answer is: what's the probability that we're between .275 and .285, so what's the probability that we end up in this shaded region just to the right of the center? We figure that out just as we did in chapter 7: we can use the normal CDF function on our calculator. The normal CDF function takes four arguments. The first argument is your lower endpoint, which in this case is .275; the second argument is your upper endpoint, .285; the third argument is your mean, .270; and the fourth argument is your spread, your standard deviation. Be careful here: it's not .45, it's .05. If you put in .45, you'll get the probability that a single outfielder has an average between .275 and .285, but by putting in .05, we get the probability that the average of the 81 outfielders in our sample is between .275 and .285. If it seems like I'm beating that to death, I am, because it's really the most important concept with sampling distributions.
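The spread calculation above can be sketched in a few lines of Python. The helper name `standard_error` is a hypothetical name chosen for this illustration; the values .45 and 81 are the ones given in the problem, and the smaller sample sizes are added only to show how the spread shrinks as n grows.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma divided by the square root of n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)

sigma = 0.45  # standard deviation of the parent distribution (from the problem)

# n = 1 is a single outfielder; n = 81 is the sample in the problem.
for n in (1, 9, 81):
    print(f"n = {n:2d}: sigma_xbar = {standard_error(sigma, n):.3f}")
```

With n = 1 the spread is just the parent's .45; with n = 81 it drops to .05, which is the value that belongs in the fourth argument of the normal CDF function.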
So to get to normal CDF, that's under distributions, so I'll hit second and then the variables key to get there; the second option here is normal CDF. Then I type in all those values: .275, .285, my mean of .270, and my standard deviation of .05. It gives me a probability of .078, so the probability that x bar is between these two values is .078, or if you prefer percentages, that shaded area is 7.8%.
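For readers without the calculator, here is a rough Python equivalent of the four-argument normal CDF call, using the standard library's `statistics.NormalDist`. The wrapper name `normal_cdf_between` is made up for this sketch and is not the calculator's own function. The single-outfielder comparison also assumes the parent distribution is normal, which the problem never states.

```python
from statistics import NormalDist

def normal_cdf_between(lower, upper, mean, sd):
    """Area under a normal curve with the given mean and sd between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Sampling distribution of x bar: center .270, spread .45 / sqrt(81) = .05
p_sample_mean = normal_cdf_between(0.275, 0.285, 0.270, 0.05)

# Common mistake: using the parent spread .45, which describes one outfielder
# (and only if the parent distribution is itself normal, an assumption here).
p_single = normal_cdf_between(0.275, 0.285, 0.270, 0.45)

print(f"average of 81 outfielders: {p_sample_mean:.3f}")
print(f"single outfielder (assumed normal parent): {p_single:.4f}")
```

The first value should agree with the .078 from the calculator; the second is much smaller, which shows how much the answer depends on using the right standard deviation.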