Bias Is Awful, but Random Selection Is Not Perfect Either

>> Getting a representative sample is not easy, and it's not even guaranteed. That's why, with inferential statistics, we can never actually say we proved a hypothesis, as much as we'd like to. Any time we select a sample, there's always the possibility that it might not be representative.
Now let's say that I decide to personally pick a sample, let's say of size one, to get an idea of the typical college student IQ, and so I go ahead and pick someone. It's very possible that unconscious bias influenced who I picked: why this person and not this person, or this person here? Maybe there's something going on that not even I'm aware of that's influencing who I pick, and that is going to get in the way of my getting a good representative sample.
So if I want a representative sample, and let's keep it at the simplest level, size one, then to get rid of this bias the next step up is random selection, where every person has an equal chance of being included in my sample. So let's say that I go with a random selection of one student to get an idea of the typical college student IQ.
We'll say that the IQ of college students is normally distributed and, for the sake of discussion, we'll go with a mean IQ of 116 and a standard deviation of 16. With a mean of 116 and a standard deviation of 16, we have our measure of central tendency right here at 116, with most college students right there. That's our highest frequency. As you go to higher IQs, there are fewer students, and as you go higher still, fewer yet, and the same with lower IQs.
Well, at what point would we consider a college student not representative of the general college student? Let's just say that we put that at plus and minus two standard deviations. With that definition, if you're more than two standard deviations above the mean, you're not representative.
And if you're more than two standard deviations below the mean, you're not representative. Two standard deviations is 2 times 16, or 32 points, so the cutoffs are 116 minus 32, which is 84, and 116 plus 32, which is 148. We would say that if you happen to randomly pick a college student whose IQ is between 84 and 148, that student is at least somewhat representative of all other college students. But if the college student's IQ is above 148, let's say 149, or below 84, say 83, or some other value here in the shaded region, that college student is not representative.
Well, in that case, what's the probability, even though we are using random selection, that our sample of size one is not representative? To answer that question we go to a Z table. A Z table is a table where you look up how many standard deviations a value is away from the mean, and it gives you the corresponding area under the curve. So 84 is two standard deviations below the mean, so it's a Z score of negative two. 148 is two standard deviations above the mean, and that corresponds to a Z score of positive two.
So here we're looking at a Z table, and the shaded area is the proportion that can be looked up in the table. We'll look up a Z score of 2.00. So here we go: here's a Z score of 2.00. That proportion is .023. That is, the shaded area beyond Z is .023. So of the people we might randomly pick who would not be representative, we have .023 for those who are more than two standard deviations above the mean, and .023 for those who are more than two standard deviations below the mean. If we add up those two proportions, .023 plus .023, our combined area, these two red areas, where we would say someone is not representative, is .046. That is, about 4.6 percent of the time we might randomly select someone who is not representative. Thus, the probability of not being representative, as we've defined it here, is .046. Okay.
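
The Z-table lookup above can also be checked numerically. The following is a minimal sketch, not part of the lecture, that assumes the same normal model (mean 116, standard deviation 16) and the same two-standard-deviation cutoff. It uses `NormalDist` from Python's standard `statistics` module, and the variable names are illustrative only. Because the table value .023 is already rounded, the unrounded total comes out slightly lower, at roughly .0455 instead of .046.

```python
from statistics import NormalDist

# Assumed model from the lecture: college student IQ ~ Normal(116, 16).
MEAN = 116.0
SD = 16.0
CUTOFF_IN_SDS = 2.0  # "not representative" = more than 2 SDs from the mean

iq = NormalDist(mu=MEAN, sigma=SD)

# IQ cutoffs that bound the "representative" range.
lower_cutoff = MEAN - CUTOFF_IN_SDS * SD  # 84
upper_cutoff = MEAN + CUTOFF_IN_SDS * SD  # 148

# Z scores: how many standard deviations each cutoff is from the mean.
z_lower = (lower_cutoff - MEAN) / SD
z_upper = (upper_cutoff - MEAN) / SD

# Area in each tail (what the Z table reports as "beyond Z").
lower_tail = iq.cdf(lower_cutoff)
upper_tail = 1.0 - iq.cdf(upper_cutoff)

# Probability that one randomly selected student is not representative.
p_not_representative = lower_tail + upper_tail

print(f"Z scores: {z_lower:+.2f} and {z_upper:+.2f}")
print(f"Area in each tail: {lower_tail:.4f} and {upper_tail:.4f}")
print(f"P(not representative): {p_not_representative:.4f}")
```

Changing `CUTOFF_IN_SDS` shows how the answer depends on where the "not representative" line is drawn.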
I hope this helped you think about how random selection does not always give you a representative sample. We'll learn more about how this applies to larger sample sizes. Take care.