Sampling Distribution Tutorial - Qualitative data

>> All right, moving on. The second problem is very similar. We're still talking about sampling distributions here, but now we're talking about qualitative data. We have a category, a characteristic. In this case, it's being left handed: we're told that 30 percent of pitchers are left handed, and the problem wants to know the probability that in a sample of a hundred pitchers, less than 25 percent are left handed. So note, there's nothing in here about standard deviation. It just gives us these percentages. So a few things. Instead of mu, we use P, and this is the population proportion. [ Pause ] Let's see some other things that we need to know. Instead of X bar, we use P hat, and what P hat is, is the sample proportion. So it's kind of annoying that we're introducing new symbols, but we kind of have to because we don't have a population mean and a sample mean. We have the proportion of people that have a given characteristic. We're talking about qualitative data, not quantitative data, so we don't have means and standard deviations. So how do we do this problem? Okay, so first of all, what's the problem asking for? It wants to know the probability that in a sample of 100 pitchers, so there's our value for N, our sample size, less than 25 percent are left handed, and what's given in the problem is that 30 percent of all pitchers are left handed. In other words, our population proportion, the thing that we'll use the letter P for, is 30 percent. And what the question is asking us to find is: what's the probability that P hat, our sample proportion, is less than 25 percent? So we want to know this thing right here, this is what we want to solve. This is the analog to this statement. Kind of annoying that we have the letter P used for multiple things. P right here is the population proportion, P right here is the probability.
It's just coincidence that they're the same letter, and they mean very different things. It's kind of annoying, but such is life. This is saying: what's the probability that P hat, your sample proportion, is less than 25 percent? We could figure that out if we could find the shape, the center and the spread. And fortunately, we can figure out those things. Shape: it's going to end up being approximately normal, but we have to justify that. Maybe I can continue with what I'm doing over here. For our shape, instead of needing N to be greater than or equal to 30 for it to be approximately normal, we need N times P times 1 minus P to be greater than or equal to 10. So a whole new criterion here. We used to need N to be greater than or equal to 30, now we need this product to be greater than or equal to 10. So we should probably check that. The value of N is a hundred, P is 0.3, and 1 minus P is 1 minus 0.3, which is 0.7. If you multiply these all out, N times P times 1 minus P, you get 21, and 21 is greater than or equal to 10. So it fits this criterion. Our distribution is approximately normal because of this. That's the only criterion I care about. A point worth noting here is that your book also wants N, our sample size, to be less than or equal to 0.05 times your population size, capital N. I don't care about this. In our examples, our population is always going to be so big, this capital N will be such a huge number, that our N will always be smaller than 0.05 times this huge number. So this part won't matter at all, but I just mention it for the homework.
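The shape check described above is just one multiplication and one comparison, so here is a minimal Python sketch of it. The function name `is_approximately_normal` and its `threshold` parameter are made up for illustration; only the numbers (N = 100, P = 0.30) and the cutoff of 10 come from the problem.

```python
def is_approximately_normal(n, p, threshold=10):
    """Check the rule of thumb n * p * (1 - p) >= threshold.

    Hypothetical helper: the name and the threshold parameter are
    assumptions for this sketch; the cutoff of 10 is from the tutorial.
    """
    return n * p * (1 - p) >= threshold


n = 100   # sample size (number of pitchers in the sample)
p = 0.30  # population proportion of left-handed pitchers

product = n * p * (1 - p)  # 100 * 0.3 * 0.7, which the tutorial works out as 21
print("n * p * (1 - p) =", product)
print("approximately normal?", is_approximately_normal(n, p))
```

The book's extra condition (sample size at most 0.05 times the population size) is left out here because the problem never gives a population size.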
So anyways, we've got our shape as approximately normal. Our center, and what's kind of nice is that our center will always be given by P, is P equals 0.30. And then we can figure out what our spread is. First of all, a new symbol. Up here we used sigma to represent the standard deviation of the parent distribution, and sigma sub X bar for the standard deviation of X bar. Down here, instead of X bar, we use P hat, so what we want is sigma sub P hat. But that's just notation, don't let that throw you off. It's just saying that the thing that we'll use for spread in these types of problems, we get by doing P times 1 minus P, dividing that by N and taking the square root of the entire thing. So in this case, P is 0.3 and 1 minus P is 0.7. If I multiply those, divide by a hundred and take the square root, I'll get what the spread is. That's not something I can do in my head, but fortunately I've got a calculator: the square root of 0.3 times 0.7 divided by 100, and that gives me about 0.046. So now we have a center and a spread, so we can answer the question. We can draw our picture and answer the question. These are a little bit trickier because you get all these decimals and they kind of throw people off, but don't let it throw you off. Think about it like the above example. Here's our center, here's our spread, and we want the probability that we're less than this number. So we can still draw the exact same looking picture. We get our approximately normal distribution, and there's a poor drawing of it. Our center is 0.30, so we still put that right in the middle, and now our spread is 0.046. So we're going to go up and down by 0.046. That's not a very round number, so I'm not even going to bother doing that, I'm just going to put 0.25 in my picture. 0.25 is less than 0.30, it's about one standard deviation below.
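To double-check the spread, and the claim that 0.25 sits about one standard deviation below the center, the arithmetic can be sketched in Python. The variable names are my own; the values of N, P and the cutoff 0.25 come from the problem.

```python
import math

n = 100        # sample size
p = 0.30       # population proportion (also the center of the distribution)
cutoff = 0.25  # the sample proportion we are comparing against

# Spread of p-hat: square root of p * (1 - p) / n
sigma_p_hat = math.sqrt(p * (1 - p) / n)

# How many standard deviations the cutoff is from the center
z = (cutoff - p) / sigma_p_hat

print("sigma of p-hat, rounded to 3 places:", round(sigma_p_hat, 3))
print("z-score of 0.25:", round(z, 2))
```

The rounded spread should agree with the 0.046 from the calculator, and the z-score should come out a little past negative one, which is why 0.25 gets drawn roughly one standard deviation to the left of 0.30.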
Right about there would be a good place to put it. And what we want to know is: what's the probability that we're less than this point, that we're less than 0.25? So that's this shaded area right here, good enough. So all we've got to do is find this area. I know the numbers look very different and the process feels different, but it's not. We're going to do the exact same thing. To find this area, we're going to use the same function as we did above, normalcdf. And the arguments will be the same: the left bound, the right bound, the center and the spread, that is, the mean and the standard deviation. We don't have a left bound, so I'll put in some large negative number, maybe negative 99999, that should be plenty negative. The right end point is 0.25, the center is 0.30 and the standard deviation is this 0.046. And if we type all that into a calculator, we'll have our answer and we'll be done with this thing. Again, we've got to get normalcdf, which is under our distribution menu, and then enter all those inputs. For negative 99999, make sure you use the negative key, not the subtraction sign. Then we put a comma in, 0.25 is our right end point, 0.30 is our mean and 0.046, be careful with all the decimals, is our spread. You hit enter, and it gives you 0.1385. In other words, the probability that in a sample of a hundred pitchers, less than 25 percent of them are left handed is 13.85 percent. So that was a long tutorial on sampling distributions; I didn't mean for it to be quite this long. There are two different types that we'll see in this class. The first one was when we have quantitative data and this second one is when we have qualitative data, so I hope that helps.
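If you don't have the calculator handy, the same area can be approximated in Python. This is only a sketch: the `normalcdf` function below is a hypothetical stand-in written to mimic the calculator's argument order (left bound, right bound, mean, standard deviation), built on the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean, sd):
    """Area under a normal curve between lower and upper.

    Hypothetical helper that imitates the calculator's normalcdf
    argument order; it is not the calculator's own implementation.
    """
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Same inputs as in the tutorial: a very negative left bound,
# right end point 0.25, center 0.30, rounded spread 0.046.
prob = normalcdf(-99999, 0.25, 0.30, 0.046)
print("P(p-hat < 0.25) is about", round(prob, 4))
```

With the rounded spread of 0.046 this should land very close to the calculator's 0.1385. Using the unrounded spread instead would shift the answer slightly, so small differences in the third or fourth decimal are nothing to worry about.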