>> All right, moving on. The second problem is very similar. We're still talking about
sampling distributions here, however, now we're talking about qualitative data. We have
a category, a characteristic. In this case, it's being left handed, and we're told that 30
percent of pitchers are left handed, and the problem wants to know the probability that in a sample
of a hundred pitchers, less than 25 percent are left handed. So note, there's nothing
in here about standard deviation. It just gives us these percentages. So a few things,
instead of mu--whoa, turned out weird, instead of mu, we use P and this is the population
proportion.
[ Pause ]
Let's see some other things that we need to know. I guess for now, that's really--well,
that's not true. And instead of X bar, we use P hat and what P hat is, is the sample
proportion. So it's kind of annoying that we're introducing new symbols, but we kind
of have to because we don't have a population mean and a sample mean. We kind of have the
proportion of people that have a given characteristic. We're talking about qualitative data not quantitative
data. So we don't have means and standard deviations. So how do we do this problem?
Okay, so first of all, what's the problem asking for? It wants to know the probability
that in a sample of 100 pitchers, so there's our value for N, our sample size, less than
25 percent are left handed and what's given in the problem is that 30 percent of all pitchers
are left handed. In other words, our population proportion, the thing that we'll use the letter
P for is 30 percent. And what the question is asking us to find is what's the probability
that P hat, our sample proportion, is less than 25 percent? So we want to know this thing
right here, this is what we want to solve. This is the analog to this statement. Kind
of annoying that we have the letter P used for multiple things. P right here is the population
proportion, P right here is the probability. It's just coincidence that they're the same letter
and they mean very different things. It's kind of annoying, but such is life. This is
saying, what's the probability that P hat, your sample proportion, is less than 25 percent?
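Written out in symbols, the setup so far looks roughly like this. The notation is a summary of what was just said, with lowercase n standing for the sample size; it is not copied from the board.

```latex
p = 0.30, \qquad n = 100, \qquad \text{find } P(\hat{p} < 0.25)
```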
We could figure that out if we could find the shape, the center and the spread. And
fortunately, we can figure out those things. Shape, it's going to end up being approximately
normal, but we have to justify that. Maybe I can continue with what I'm doing over here.
For our shape, we used to need, so I'll write down here, instead of needing N to be greater
than or equal to 30 for it to be approximately normal, we need P, hurry up, times 1 minus
P times N. So maybe I'll say NP times 1 minus P to be greater than or equal to 10. So a
whole new criterion here. We used to need N to be greater than or equal to 30, now we
need this product to be greater than or equal to 10. So we should probably check that, approximately
normal because 10 times P which is 0.3 times 1 minus 0.3, 1 minus 0.3 is 0.7. Sorry, this
isn't 10, this is a hundred. The value of N is a hundred, N times P times 1 minus P.
If you multiply these all out, you get 21 and 21 is greater than or equal to 10. So
it fits these criteria here. Our distribution is approximately normal because of this. That's
the only criterion I care about. A point worth noting here is that your book also cares that
your value for N here is at most 0.05 times the size of your population. So lowercase N is
our sample size, and the book also wants it to be less than or equal to 0.05 times your
population size, capital N. I don't care about this. In our examples, our population is always
going to be so big, this capital N will be such a huge number that our N will always
be smaller than 0.05 times this huge number. So this part won't matter at all, but I just
mentioned it for the homework. So anyways, we got our shape as approximately normal,
our center--what's kind of nice is our center will always be given by P, so P equals 0.30,
and then we can figure out what our spread is. First of all, a new symbol, instead of
sigma, which we've already used up here to represent the standard deviation of the parent
distribution and then sigma sub X bar is the standard deviation of X bar, kind of down
here instead of X bar, we'll use P hat. So what we want is sigma sub P hat. But that's
just notation, don't let that throw you off. It's just saying that the thing that we'll
use for spread down here in these types of problems we get by doing P times 1 minus P,
dividing that by N and taking the square root of the entire thing. So in this case, we would
say, P is 0.3, 1 minus P is 0.7. If I divide that number by a hundred and I take the square
root, I'll get what the spread is, but that's not something I can do in my head but fortunately,
I got a calculator, the square root of 0.3 times 0.7 divided by 100 and that gives me
this number of 0.046, 0.046. So now, we have a center and a spread so we can answer the question
that it's looking for. We can draw our picture and we could answer the question. These are
a little bit trickier because you get all these decimals and they kind of throw people
off, but don't let it throw you off. Think about it like the above example. Here's our
center, here's our spread, we want the probability that we're less than this number. So we could
still draw the exact same looking picture, we get our approximately normal distribution,
which there's a poor drawing of it. We still have--our center is 0.30, so we still put
that right in the middle and now our spread is 0.046. So we're going to go up and down
by 0.046. That's not a very round number so I'm not even going to bother doing that, I'm
just going to put 0.25 in my picture. 0.25 is less than 0.30, it's about one standard
deviation below. Right about there would be a good place to put it. And what we want to
know is what's the probability that we're less than this point, that we're less than
0.25. So that's this shaded area right here, sloppily shaded area, good enough. So all we
got to do is find this area. I know the numbers look very different, the process feels different,
but it's not. We're going to do the exact same thing. To find this area, we're going
to use the same function as we did above, normal CDF. And the arguments will be the
same, the left bound, the right bound, the center and then this spread, the mean and
the standard deviation. Left bound, we don't have a left bound so I'll put in some large
negative number, maybe negative 9999, that should be plenty large enough negative. The
center is point--or the right end point is 0.25, the center is 0.30 and the standard
deviation is this 0.046. And if we type all that into a calculator, we'll have our answer
and we'll be done with this thing. Again, we got to get normal CDF which is under our
distribution menu and then if you enter all those inputs, negative 99999, make sure you
use this for the negative, not this subtraction sign. And then we put comma in, 0.25 is our
right end point, 0.30 is our mean and 0.046, be careful with all the decimals, is our spread.
You hit enter, and it gives you 0.1385. In other words, the probability that in a sample of a hundred
pitchers, less than 25 percent of them are left handed is 13.85 percent. So that was
a long tutorial on sampling distributions; I didn't mean for this to be quite this long.
There are two different types that we'll see in this class. The second one was when we
have qualitative data and this first one is when we have quantitative data, so I hope
that helps.
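
For anyone who wants to check the calculator work on a computer, here is a minimal Python sketch of the same steps: the shape check, the center and spread, and the normal CDF area. The function names `normalcdf` and `sample_proportion_model` are made up for this illustration (the first just mimics the calculator's argument order), and it assumes Python 3.8 or later for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def normalcdf(lower, upper, mean, sd):
    """Area under a normal curve between lower and upper.

    Hypothetical helper that mirrors the calculator's argument order.
    """
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)


def sample_proportion_model(p, n):
    """Return (center, spread) for p-hat after checking n*p*(1-p) >= 10."""
    if n * p * (1 - p) < 10:
        raise ValueError("n*p*(1-p) is below 10, so the normal shape is not justified")
    center = p                      # center is the population proportion
    spread = sqrt(p * (1 - p) / n)  # standard deviation of p-hat
    return center, spread


center, spread = sample_proportion_model(p=0.30, n=100)
print(round(spread, 3))  # 0.046

# Same inputs as the lecture: spread rounded to 0.046,
# and -9999 standing in for "no left bound".
lecture_answer = normalcdf(-9999, 0.25, 0.30, 0.046)
print(round(lecture_answer, 4))

# Keeping the unrounded spread shifts the answer slightly.
unrounded_answer = normalcdf(-9999, 0.25, center, spread)
print(round(unrounded_answer, 4))
```

With the spread rounded to 0.046, this should land very close to the 0.1385 from the calculator. Using the unrounded spread gives a slightly smaller value, around 0.138, which is only a rounding difference.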