I want to go over the extra credit problem before we start talking about the course material proper. I put up a webpage with the five calculations, and I'll post it to our discussion board so that you can find it. But if you want to look for it from my homepage, it's under Preprints, and then accrostingvito09.htm.
Some of this is just to make it a self-contained document on the web. This is how some academics amuse themselves, right? You take something silly like this and turn it into a document. So we have the preamble. Let me open this in another window: this is from the Miami Herald. Steven Devlin, chair of the Math Department at USF, told the San Francisco Weekly that the chance the letters would come out that way is one in ten million. Okay, that's a number, but behind that number are some assumptions, and we saw last time that if you make different assumptions, you get different numbers out the end. In fact, the numbers you get out of the five approaches we've been talking about range over a factor of more than 100 billion. So it's an enormous range of numbers, depending on the assumptions you're willing to make, and none of them is really particularly compelling from my point of view.

We talked last time about the chance that you would get the letters in that order if basically what was going on was monkeys banging on typewriters. That's the kind of number it looks like he computed: one in 26 to the seventh power. Then we talked about what would happen if you took into account the fact that these words are in English, and that we're looking at the initial letters of words in English. If you thought that the words were drawn at random from the Gutenberg corpus, you would get an even smaller number: one in 486 billion, rather than one in about eight billion. So that's a rather big factor. Then we talked about what happens if we let the governor keep his words and keep his sentences, but simply shuffle the seven lines. That gives us a number that's only one in about 2,500.

Then we get to what the extra credit assignment was. The idea of the first
part of the extra credit assignment was that you let the governor keep his words
but not the order of the words. And it's as if you wrote each of the words in those
seven lines on a slip of paper, shuffled those 85 slips of paper together, and started drawing slips and letting that comprise the message. So after the sixteenth word you put in a line break, after the twenty-ninth word you put in another line break, and so on: you keep the line breaks where they are, but you draw the words out of a hat in a random order. If you're doing that, then in order to get the acrostic, what matters is the first letter on each line, and you need to know how many words start with each first letter. There were eight words that started with the letter C, three started with F, and one started with K, etcetera. So what do you need to
do? Well, you can imagine starting by picking the first word of the first line, then the first word of the second line, etcetera, on down to the first word of the seventh line, and then letting the other words fall where they may. So how many ways are there to pick the first word of the first line so that it starts with an F? You have three choices, right? Then, for each of those three choices, how many choices are left for getting a U as the first letter of the first word on the second line? Two, because there are two words that start with the letter U. And for each of those, you have eight choices for which word starts the third line to give you the C, etcetera. The only one that's kind of confusing is U, because U occurs twice in the acrostic. You have two choices for which one comes at the beginning of the second line, and then, having done that, there's only one choice left for which one comes at the beginning of the seventh line. There are no more choices to make: you're out of U's. So,
having specified the first words of the seven lines, there are 78 words left that can be in any order in the remaining 78 places. And by assumption, when you're drawing the words out of a hat, there are 85 factorial different sequences in which you could get those slips of paper. Now, some people worried about the fact that some of these sequences aren't distinguishable from each other, because some words are repeated. If you took that into account on both ends of the problem, I would have given you credit, but nobody did. Some people worried about the fact that "unnecessary" was the word that occurred twice that started with the letter U, but they didn't worry about the fact that the rest of the count then needs to be adjusted as well. So nobody found themselves in quite that situation. And the last one: I was really
pleased with the number of people who got the last approach right. The idea here is, we keep the words in the order in which they actually occur in the message, but we throw in carriage returns at random, in such a way as to end up with seven lines, none of which is blank. So each line has to have at least one word.
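As an aside, the word-shuffle calculation just described can be sketched in a few lines of Python. The counts for F, U, C, and K come from the lecture; the counts for Y and O are not stated above, so the 1's used for them here are hypothetical placeholders rather than the actual counts.

```python
from math import factorial

# Word-shuffle model: the 85 words are drawn from a hat in random order,
# with the original line breaks kept, so only the 7 words that land at
# the starts of lines matter.  The acrostic down the first letters is
# F, U, C, K, Y, O, U.
#
# Counts of words starting with each letter.  F=3, U=2, C=8, K=1 are from
# the lecture; Y and O are placeholder guesses.
counts = {"F": 3, "U": 2, "C": 8, "K": 1, "Y": 1, "O": 1}
acrostic = ["F", "U", "C", "K", "Y", "O", "U"]

# Multiply the number of remaining choices for each line's first word.
# Each use of a letter leaves one fewer word with that letter, which is
# why the second U contributes a factor of 1, not 2.
remaining = dict(counts)
favorable_starts = 1
for letter in acrostic:
    favorable_starts *= remaining[letter]
    remaining[letter] -= 1

print(favorable_starts)   # 3*2*8*1*1*1*1 = 48 with these placeholder counts

# The other 78 words can fall in any order in the remaining 78 slots,
# out of 85! equally likely orderings of all the slips of paper.
probability = favorable_starts * factorial(78) / factorial(85)
print(1 / probability)    # the "one in ..." form of the answer
```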
So how many ways are there of inserting six line breaks in such a way that you end up with seven lines? Well, you can't put a line break before the first word, because then you'd have a blank line. But you could put a line break before the second word, the third, the fourth, on up to before the 85th word. So there are 84 places you could put line breaks. From those 84 places you need to pick six in order to end up with seven lines, and the assumption I asked you to make was that all ways of doing that are equally likely. So there are 84 choose 6 equally likely ways to break this thing into lines. Then it's just a question of counting how many of those result in having the initial letters right, and there some people came really close but didn't count correctly. There are twelve ways in all, and here's where that's coming from. You're stuck with the F, and you're actually stuck with the U, but now what do you have as choices for the C? It could be this, this, or this. So there are three choices for the C, right? Then there's only one choice for the K. Then what about the Y? There are one, two that work. And then for the O you have one, two choices. And there's only one for the last U. So three times two times two is twelve possible ways to keep the words in the same order but have the lines start with the right letters. So the probability of this would be twelve over 84 choose 6, which is about one in 33 million. So this is partly just amusement, but it's also partly
to get you to think about what happens when you see somebody quoting the odds of something in the newspaper. We managed to get numbers that ranged from one in 2,500 to one in 487 billion, right, just by making slightly different assumptions. Which assumption is right? Well, I don't know. It's all concocted, it's all made up. Nobody really thinks that anybody types a veto letter by choosing words at random from the Gutenberg Corpus; that's not a realistic way to do this. So what I think these numbers speak to is that it would be kind of hard for this to happen accidentally. But exactly how to put a number on that is very complicated. It's very difficult to do in a reasonable way, and you can get pretty much any answer you want. I think you'd have to tell a pretty concocted story to get the number up to, say, ten percent; but to get the number up to one percent, you could probably do that pretty easily. Questions about this?

So if you go online to check your homework scores, and you submitted the extra credit, you will have a score for "extra one," which is your number of points, which ranges from zero to three. Half a dozen or so people got three points on it. I think the smallest nonzero credit that I gave was half a point. Some people didn't get any, alas. Questions? Yeah. Is this the only ...? I don't know.
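As a footnote to the discussion above, three of the five numbers can be reproduced in a few lines of Python; the Gutenberg-corpus figure and the word-shuffle figure depend on word counts that aren't given in the lecture. The factor of 2 in the line-shuffle count is an inference: two of the seven lines start with U, so swapping them also produces the acrostic, which matches the "one in 2,500" quoted above.

```python
from math import comb, factorial

# "Monkeys banging on typewriters": each of the 7 initial letters is one
# of 26 letters, chosen uniformly at random and independently.
monkeys = 26 ** 7
print(monkeys)            # 8031810176, about one in 8 billion

# Shuffling the seven lines: 7! equally likely orderings, of which 2 give
# the acrostic (assuming the two lines starting with U can be swapped).
line_shuffle = factorial(7) // 2
print(line_shuffle)       # 2520, about one in 2,500

# Random line breaks: choose 6 of 84 break positions; 12 choices work.
line_breaks = comb(84, 6) // 12
print(line_breaks)        # 33873462, about one in 33 million
```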
Alright, it stopped. Okay. We're now going to start thinking about what happens in the long run. Excuse me. We've been talking about probabilities of events, and we've been talking about random variables, in particular random variables that come from drawing tickets out of a box at random and looking at the numbers on the tickets that we get. And we want to start looking at what long-run regularities there are, in order to be able to use these box models to draw inferences about the world. One of the most fundamental notions that applies to random variables is the expected value, and this chapter is about that, but in order to get there I'm going to talk about the law of large numbers, which is a key concept in probability. Let me write this with chalk, because it's easier to talk it through.

So let's suppose that we have independent trials that have the same probability of success. We can imagine tossing a coin repeatedly, really vigorously, in such a way that whether the coin lands heads each time is independent of the other times we tossed the coin. Okay? So n is the number of times I toss the coin, and p is the chance of heads; if the coin is fair, then by definition the chance that the coin lands heads is 50 percent. Okay? Now, suppose we look at the
difference between the number of heads and... boy, that's really illegible. Okay, so this is the number of heads we get in this many independent tosses of the coin, divided by the number of tosses. This is a random variable: the number of heads you get in a certain number of tosses is random, so this ratio is random. It depends on what actually happens. Suppose we look at the difference between that ratio and 50 percent, which is the probability that the coin lands heads in a single toss (by definition, we're talking about a fair coin). This ratio is rarely going to be exactly equal to 50 percent; every once in a while it might happen. So let's look at the difference between the empirical fraction of successes, the empirical fraction of heads, and the probability of success, the probability of heads. This difference could be big or it could be small. It could so happen that the fraction of heads in seventeen tosses is 90 percent, and 90 percent minus 50 percent is a big number: 40 percent. It could also happen that this difference is really small: the fraction of heads in twenty tosses could be exactly 50 percent, and then the difference is zero. So this difference is random. Let's ask, though: what's the chance that this difference is less than some positive number epsilon? We're going to look at the probability that the difference between the fraction of successes and the theoretical probability of success is less than epsilon. The law of large numbers says that this probability goes to one as the number of tosses goes to infinity. So the more
trials you have, the more likely it is that the fraction of successes you see is arbitrarily close to the probability of success. This is true for any fixed positive number epsilon. So for any fixed epsilon greater than zero, the chance that the theoretical probability differs from the empirical fraction of successes by less than that threshold value epsilon gets bigger and bigger the more trials you have. This is the law of large numbers.

Everybody remembers what a limit is? Yeah, okay. This is a funny notion of a limit, because it's a limit of a probability. It's still possible that the fraction of successes differs from the probability of success by more than epsilon, right? It could still happen that you toss a coin a million times and get heads a million times; it's just very, very, very unlikely. So the chance that these things differ by very much gets really, really small the more trials you have, and the chance that the difference is really small gets big, the more trials you have. You have independent trials, with the same probability p of success in each trial. So here it is: the chance that the fraction of successes in n trials differs from the probability of success by less than any positive tolerance epsilon goes to 100 percent as the number of trials increases. Okay. What does this mean
about drawing tickets out of a box? Okay, so let's suppose I have five tickets in a box, and I draw one ticket from the box, with replacement, over and over and over again. What's the probability that I get the ticket labeled A on each draw? It's one fifth, right? The probability that I get the ticket labeled A is twenty percent on each draw. So the law of large numbers says that as I draw from the box more and more and more times, the fraction of the time that I get the ticket labeled A is increasingly likely to be close, arbitrarily close, to a fifth. I won't get it exactly a fifth of the time, but the more times I draw, the greater the probability that it's going to be arbitrarily close to a fifth. So if I drew from this box a zillion times, I would see the ticket labeled A about twenty percent of the time, the ticket labeled B about twenty percent of the time, the ticket labeled C about twenty percent of the time, etc. With high probability, right: I'm not guaranteed to see it exactly twenty percent of the time, but I am very likely to see it nearly twenty percent of the time. So what
would happen if I added up the numbers on all the tickets that I saw in a very, very large number of draws from the box and took the average? Say I draw from the box n times, where n is really, really big, and I average the results of those n draws. Well, I would have some random list of labels. I might get A, A, E, D, D, B, C, B, etcetera: some string of labels that I get as I draw from this thing repeatedly, with a random number of each label in a random order. But if I have n of these in all, then with high probability the number of A's is going to be about n divided by five, the number of B's is going to be about n divided by five, the number of C's is going to be about n divided by five, and so on. So if I average the whole list, what am I going to get? I'm going to get something like A plus B plus C plus D plus E over five, approximately, because each of them is going to occur about the same number of times in the n trials: I see each of them about a fifth of the time.

This number is the expected value of a draw from this box. The frequency interpretation is: if I did the experiment over and over and over again, the probability that the average of the draws would differ from this number by more than a little bit gets really, really small. That is, the probability that the average of the draws is close to this number gets really, really big; it approaches 100 percent as the number of draws gets big. Good time to ask questions
if it's confusing you. All right, let me back up, because we have a little applet to show the law of large numbers here. I'll just shrink this down a little bit.
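If you don't have the applet handy, a quick simulation sketch in Python shows the same behavior: the gap between the observed fraction of heads and 50 percent tends to shrink as the number of tosses grows. (The seed is arbitrary, just to make the run reproducible.)

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def head_fraction(n, p=0.5):
    """Toss a p-coin n times and return the observed fraction of heads."""
    return sum(random.random() < p for _ in range(n)) / n

# The absolute difference from 50% tends to get smaller as n grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, abs(head_fraction(n) - 0.5))
```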
Okay, what is this plot? This is simulating tosses of a coin with a 50 percent chance of heads, and it's looking at the difference between the observed fraction of heads and the probability of heads; in this case, it's 800 trials. What tends to happen as the number of trials gets bigger and bigger is that the difference settles down to zero. Initially, for a small number of trials, there's a pretty good chance that the theoretical chance of success and the empirical rate of success differ by a lot, but as the number of trials gets bigger and bigger, the probability that they differ by a lot gets smaller and smaller. Let's do another one. Okay, so now here's the story
for a coin that has an 11.6 percent chance of heads, and we're taking the difference between the observed percentage of heads and 11.6 percent, the theoretical probability of heads. Again, it wanders around a while, but eventually it settles down and gets very close to zero. So this is the difference between the percentage of heads and the probability of heads for this biased coin that has an 11.6 percent chance of heads. What happens if, instead of looking at the difference between the percentage and 11.6 percent, we look at the difference between the number of successes and 11.6 percent of the number of trials? The law of large numbers doesn't say that that gets small. So I just toggle this: now, instead of plotting the difference between the percentage of successes and 11.6 percent, I'm plotting the difference between the number of successes and 11.6 percent of the number of trials. This tends to actually grow, the more trials you have. So the difference between the number of successes and the probability times the number of trials tends to get big, while the difference between the fraction of successes and the probability of success tends to get small.
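The same point can be seen in a simulation sketch that tracks both differences along one long sequence of fair-coin tosses: the count difference typically drifts, while the fraction difference shrinks.

```python
import random

random.seed(2)
p = 0.5
heads = 0
for n in range(1, 1_000_001):
    heads += random.random() < p
    if n in (100, 10_000, 1_000_000):
        # |number of heads - n*p| tends to grow (roughly like sqrt(n)),
        # while |fraction of heads - p| tends to shrink.
        print(n, abs(heads - p * n), abs(heads / n - p))
```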
Okay. How can those things simultaneously be true? Here it is for a fair coin: the plot starts off as the difference between the number of successes and 50 percent of the number of trials, and it just wanders off. If you look at the difference between the percentage of successes and 50 percent, it tends to get smaller. What's the relationship between the number of successes and the percentage of successes? The number of successes tends to grow, but the probability of success times the number of trials also tends to grow, so we're comparing two things that are growing. Mathematically, the percentage of successes is the number of successes divided by the number of trials; that's the relationship between the two. They're related through the number of trials. So if the difference between the number of successes and p times the number of trials grew at the rate of n, the number of trials, or faster, then the percentage of successes wouldn't get closer to the probability of success. But if the difference between the number of successes and p times the number of trials grows slower than the number of trials, then when you divide by the number of trials, that difference is going to get closer and closer to zero. So, in fact,
what happens is that the difference between the number of successes and the number of trials times the probability of success tends to grow like the square root of the number of trials; it grows slower than the number of trials. So when you divide it by the number of trials, it ends up getting smaller and smaller. Even though it grows, it grows slower than the number of trials, so dividing by the number of trials makes the result smaller and smaller. Does this make sense? If I have something that looks like a constant times the square root of n, and I divide that by n, what happens as n gets big? It's a constant over the square root of n, which gets small as n gets big. So even though the numerator is growing, it's not growing as quickly as the denominator, and as a result the quotient gets smaller and smaller. That's what's going on here: the difference between the number of successes and 50 percent of the number of trials tends to grow at the rate of the square root of n, but when you divide it by n to look at the difference between the fraction of successes and the probability of success, that tends to shrink at the rate of one over the square root of n. Any questions
about this at the moment? Okay. This applet we saw fairly recently, when we were drawing from a box of tickets where the tickets were labeled zero and one. Here we have four numbers in the box: zero, one, one, and four. Now, each time I draw from the box, if I draw a sample of size one, I'm equally likely to get any of those four tickets, right? There's a 25 percent chance of getting any one of the four. If I draw one sample over and over and over again, I'm going to get a ticket labeled zero about a quarter of the time, a ticket labeled one about half the time, and a ticket labeled four about a quarter of the time. So if I just do this experiment: that time I got a ticket labeled one; that time, the ticket labeled four; that time, the ticket labeled four again; another ticket labeled one; another ticket labeled one; a ticket labeled zero; etcetera. We could just do this.
Now we can speed it up to take, let's say, a thousand samples at a time, just to see what happens if we do this many, many times. Okay: we've now taken 10,000 samples of size one from this box of numbers. What we would expect to see is that the area of this bin should be about 25 percent, the area of this bin about 50 percent, and the area of that bin about 25 percent. So we can actually check it: it turned out to be 25.4 percent in those 10,000 trials. If we did it more, it would be likely to be even closer to 25 percent if we keep going.
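The applet run can be mimicked with a short simulation sketch (the seed is arbitrary):

```python
import random

random.seed(3)
box = [0, 1, 1, 4]

# 10,000 draws of size one, with replacement.
draws = [random.choice(box) for _ in range(10_000)]

print(draws.count(0) / len(draws))   # near 0.25
print(draws.count(1) / len(draws))   # near 0.50
print(sum(draws) / len(draws))       # near 1.5, the expected value
```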
What's the average of the values that we saw in those 10,000 draws? That's this line, the mean of the values: the average was 1.497. If we had seen the ticket labeled zero exactly a quarter of the time, a ticket labeled one exactly half the time, and a ticket labeled four exactly a quarter of the time, what would that average have been? 1.5, right? One and a half: we've got zero times a quarter, plus one times a half, plus four times a quarter. The expected value of a draw from this box is one and a half. So if we imagine doing the experiment longer and longer, more and more draws of size one, then, with high probability, closer and closer to a quarter of the numbers in that list would be zero, closer and closer to a half would be one, and closer and closer to a quarter would be four. Not guaranteed, but with increasingly high probability. And so when we average the list, we'd have the average of a list that was a quarter zeroes, a half ones, and a quarter fours, and the average of that list would be 1.5. Make sense? That's the expected value.

Notice the expected value is one and a half, and yet you would be shocked to draw a ticket labeled one and a half from this box: there are no tickets labeled one and a half in the box. "Expected value" is a term of art. It is not what you expect to see; it is the long-run average of what you get if you do things over and over again. It doesn't have to be a possible value to be the expected value. All good? Okay. So, in general,
how do we define the expected value of a discrete random variable, if we know its probability distribution? Imagine what happens when you draw over and over and over again: the frequency with which I get a particular value is going to approach, with high probability, the probability of getting that value. So in this case, this is supposed to be the probability distribution of the sum of two draws from that box of tickets we were just looking at. Let's verify that it's right. I'm going to draw twice from that box of tickets and calculate the sum of the two draws; then I'm going to draw twice again and calculate the sum of those two draws; and I'm going to do that over and over. So I have a new experiment. The experiment is not drawing a sample of size one, it's drawing a sample of size two, but I'm going to repeat that experiment many, many times and look at the values that I get. And I'm interested in the average outcome for that experiment, the long-run average outcome.

So, what's the probability that the sum of two draws is equal to zero? Well, how does the sum of two draws end up being zero? I have to get a zero on the first draw and a zero on the second draw. The two draws are independent; I'm drawing with replacement.
So the chance I get a zero on the first draw is a quarter, and the chance I get a zero on the second draw, given that I got a zero on the first draw, is also a quarter, because they're independent. So the probability that I get a zero on both is a quarter times a quarter, or one sixteenth. Okay, what's the chance that the sum of the two draws is one? Well, I can either get a zero and then a one, or a one and then a zero. Those two possibilities are disjoint: if I get a zero on the first draw and a one on the second, I didn't get a one on the first draw and a zero on the second. So I can find the probability that the sum is one by finding the probabilities of those two possibilities and adding them together, because they're disjoint and exhaustive; they're a partition of the event that I get a one as the sum of the two draws. So what's the chance I get a one and then a zero? Well, there's a 50 percent chance I get a one, because there are two ones in the box, and then, given that I got a one, there's a 25 percent chance that I get a zero on the second draw, because the draws are independent and 25 percent of the tickets in the box are zeros. So I get a half times a quarter, which is an eighth. And then the other way around: what's the chance I first get a zero and then get a one? That's a quarter times a half, which is another eighth. An eighth plus an eighth is a quarter, and that's the probability that the sum is equal to one.

What are the ways of getting two? I was about to count a one and a one, a zero and a two, and a two and a zero, but wait: there is no two in the box. Keep your eyes on the box. To get two, you have to get a one and a one; that's the only way you can do it. And the chance you get a one and a one is a half times a half, which is a quarter, alright? Okay.
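The whole distribution can be checked by brute force, enumerating all 16 equally likely ordered pairs of draws:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

box = [0, 1, 1, 4]

# Each ordered pair of tickets is equally likely: 4 * 4 = 16 outcomes.
dist = Counter()
for a, b in product(box, repeat=2):
    dist[a + b] += Fraction(1, 16)

print({value: str(p) for value, p in sorted(dist.items())})
# {0: '1/16', 1: '1/4', 2: '1/4', 4: '1/8', 5: '1/4', 8: '1/16'}

expected = sum(value * p for value, p in dist.items())
print(expected)   # 3, twice the expected value of a single draw
```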
So we can work this out and find the whole probability distribution for the sum of two draws from this box. These are the possible values (those are the only sums you can get), and these are the probabilities of those values. So if you repeated this experiment of drawing twice from the box over and over again, each time adding the results, you would get eight about a sixteenth of the time, five about a quarter of the time, and so on. The expected value is going to be zero times a sixteenth, plus one times a quarter, plus two times a quarter, plus four times an eighth, plus five times a quarter, plus eight times a sixteenth, because we'll see each of these values about that often. It turns out that the expected value is three, which is twice what the expected value was for one draw from the box. That turns out to be true in great generality: the expected
value of the sum of n draws from a box is equal to n times the expected value of one draw from the box, and the expected value of one draw from the box is the average of the numbers on the tickets, where you take repeats into account. So in this case, it's the average of zero, one, one, and four, which turned out to be one and a half. The expected value of the sum of 37 draws from this box would be 37 times one and a half. So the experiment is: pull 37 tickets out of the box with replacement and add them. Do it again: pull 37 tickets out of the box with replacement, add them; you get a number each time. Imagine doing that over and over again, an enormous number of times, where each experiment consists of pulling 37 tickets out of the box with replacement. What would the average of that list of sums be? Well, with increasingly high probability, the average of that list would be increasingly close to 37 times one and a half. 37 times one and a half is
the expected value. In general, the expected value of a discrete random variable is a weighted sum of the possible values the variable takes. So we have a random variable X; it can take values x1, x2, x3, and so on. In this case, we're talking about a single draw: the random variable is the number you see on the ticket. What are the possible values? The possible values were zero, one, and four. What's the expected value? It's a weighted sum of the possible values, where the weights are the probabilities that the random variable takes each value, because the random variable is going to take this value about that often, and that value about that often, in repeated trials.

I'm just going to state without proof the thing that I said a moment ago: if I draw n times at random from a box of tickets and add the numbers on the tickets that I see, the expected value of that sum is n times the average of the labels on the tickets in the box. If I draw once, the expected value is the average; if I draw twice, it's twice the average; etcetera, alright?
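That claim is easy to check numerically for the box we've been using (a simulation sketch: 37 draws per experiment, repeated many times):

```python
import random

random.seed(4)
box = [0, 1, 1, 4]
n = 37   # tickets drawn, with replacement, in each experiment

def sum_of_draws():
    return sum(random.choice(box) for _ in range(n))

# Repeat the experiment many times and average the sums.  The long-run
# average should be close to n times the box average: 37 * 1.5 = 55.5.
trials = 100_000
average = sum(sum_of_draws() for _ in range(trials)) / trials
print(average)   # close to 55.5
```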
It turns out that this is true even if I draw without replacement from the box. Of course, if I draw without replacement, I can't pull more tickets out than are in the box in the first place, so little n needs to be no more than the number of tickets in the box. But it works out that the expected value of the sum of n draws from a box of numbered tickets is n times the average of the numbers on the tickets, whether you draw with or without replacement.

Alright, let's look at some special cases. We were talking about 0-1 boxes for the last couple of lectures. What is the average of the numbers in a 0-1 box? Suppose I have a box with capital N tickets in all, and a fraction p of them are labeled one and a fraction 1 - p are labeled zero. What's the average of the labels? Well, the sum of all the labels is N times p ones plus N times (1 - p) zeros, which is N times p, and I have N tickets in all, so the average of the labels on all the tickets in the box is just p, okay? Now, remember that the binomial distribution is the distribution of the sum of little n draws with replacement
from a box of tickets like this. Yes? The question is, what's the difference between
the average of the labels and the expected value. The labels are just some
particular list of numbers, and we can calculate the average of that list. What
is the expected value defined to be? We now have a random
variable: the number you get when you pull a ticket at random from the
box. The expected value is defined to be the
sum of the possible numbers you can get on the ticket, times the probabilities of
those numbers. So, expected value is a concept that applies
to random variables; average is a number that applies to lists. They're analogous:
the expected value of a random variable is a weighted sum of its possible values. It
turns out that, numerically, the expected value of the number on a draw at random
from this box is equal to the average of the labels on the tickets
in the box. So that's a result rather than a definition, alright? The definition,
for a random variable, is the sum of the possible values, each times the probability
of that value. That's the definition. And then it turns
out that that's exactly equal to the average of the labels on the tickets in
the box, if the experiment is to pull a ticket at random from the box. The
expected value of the number that you get when you draw a ticket at random from a
box is always equal to the average of the labels on the tickets in the box. And
indeed, the expected value of the sum of the labels on n
draws from the box is equal to n times the average of the labels on the tickets in
the box. And that's true whether you're drawing with or without replacement. And
again, it's a result, not a definition. Okay, so remember, the binomial
distribution is the distribution of the number of tickets labeled one we get if
we draw with replacement from this box little n times and count how many tickets
labeled one we got. And I mentioned a couple of times that counting the number of
tickets labeled one is the same as adding the labels on the tickets that you get,
because when you add a string of 0's and 1's, what do you get? How many 1's there
were: the 0's don't contribute anything, and the 1's each contribute one. It's
just counting how many 1's there are. Okay, so the probability distribution of
the number of tickets labeled one you get in little n draws with replacement
from this box is binomial with parameters
little n and little p, alright? So, we're going to
look at the expected value of a binomial
distribution in two different ways. One is starting with the definition, and the other
is using the fact that we have a box model that leads to the binomial distribution.
So we start with the definition, which says the expected value is
defined to be the sum of the possible values times the probabilities of those
values. Then what piece of math do we need to write down to calculate the
expected value of a binomial distribution? Suppose we have a random variable
that has a binomial distribution with parameters n and p. Let me back up a couple of steps. The
expected value of a random variable just depends on its probability distribution,
right? Because it's just a weighted sum of the possible
values times the probabilities of those values, okay? It doesn't depend on the
outcome of some experiment; it just depends on the probability distribution.
So we can talk about the expected value of a probability
distribution or the expected value of a random variable; it makes sense
either way, because the expected value of a random variable just depends on its
probability distribution. Okay, so let's suppose we have a random variable that
has a binomial distribution with parameters n and p. That is, it's like the sum
of the labels on little n draws with replacement from that 0-1 box. Okay. So what's the expected
value? It would be the sum of possible values times the probabilities of those
values. What are the possible values? When we draw little n times and add the
numbers on the tickets, what are the possible values of the sum?
We might get a ticket labeled zero every time, right? Or we might get a ticket
labeled one every time. The possible values range from zero, one, two, three,
etcetera, on up to n. Those are the possible values for the sum of the n
draws. So let's say this is the sum from j = 0 to n, where j now stands for the
possible values, okay? It might be zero. What's the probability that it's
zero? This is the binomial distribution. So we have j as the
possible value, and this is the probability that we get j successes: that
we get j tickets labeled one when we draw from the box. We have to get a
ticket labeled one j times; each time, there's probability p that that happens,
and the draws are independent, so we multiply. And we also have to get a ticket
labeled zero n minus j times. And there are n choose j different ways that can
happen in the n trials, right? This should ring a bell by now. So the expected
value is the sum over j of j times (n choose j) p^j (1-p)^(n-j).
We could do this sum, and it would take us the rest of the lecture.
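Rather than grinding through that sum by hand, here's a quick check (with made-up values n = 10, p = 0.3) that the definition really does give n times p:

```python
from math import comb

# Hypothetical parameters for the check.
n, p = 10, 0.3

# Expected value straight from the definition: sum over j of j * P(X = j).
ev_definition = sum(j * comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1))

print(ev_definition, n * p)   # both are 3.0, up to rounding
```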
Okay, or we can use the fact that we know what has a binomial
distribution: the sum of little n draws with replacement from that box of tickets.
And what do we know about the
expected value of the sum of n draws, with or without replacement, from a box of
numbered tickets? It's little n times the average of the values on the
tickets. And we know that the average of the values on the tickets in this case
is p. So this whole sum is just going to be equal to n times p, if we work it
out. Yes? Right. So this is like
zero times the probability that you get zero, plus one times the probability that you
get one, plus two times the probability you get two, on up to n times the probability you
get n tickets labeled one. Sorry, why did j disappear? Well, we're summing over
all of its possible values; j is a dummy variable, just inside the sum. This
sum has n + 1 terms. In one of those terms, j is zero: we get zero times
something, which doesn't contribute anything. In another term, j is one, so we
get one times n choose one, which is n, times p^1 * (1-p)^(n-1). That's the next
term. Then j is two: we get two times n choose two times p^2 * (1-p)^(n-2), etc.
If we summed all that up, the answer would be n*p. Okay. So using our result
that the expected value of the
sum of n draws from a box of numbered tickets is just n times the average of the
numbers on the tickets in the box, we can shortcut a very difficult
calculation and just write down the answer. So far so good. The average of the
numbers on the tickets in the box is little p, and we're drawing little n times
from the box, because we're asking about a binomial distribution with parameters
n and p. So it's like the sum of the tickets we get on little n draws from a
box where a fraction little p of the tickets are labeled one and the
rest are labeled zero. Okay, so we've just seen that the expected value of a
random variable that has a binomial distribution with parameters n and p is
just n*p. Sometimes when we draw from the box little n times, we're not going to
get any tickets labeled one, sometimes we're going to get one ticket labeled one,
sometimes we're going to get seventeen, sometimes we're going to get ten, alright?
On average, how many tickets labeled one do we get if we draw little n times
from the box? On average, we get n*p. That might not be an integer, alright? It
might not be a possible number of tickets; it's what would happen on average in the
long run. Are we all together? Okay. I asserted without proof that it doesn't
make any difference whether you draw with replacement or without replacement: the
expected value is still the number of draws times the average of the
numbers on the tickets in the box. Okay, we have another random variable that's
like drawing without replacement from this box of tickets. What
distribution is that? If I draw little n times without replacement from a
0-1 box? Hypergeometric, okay. For the hypergeometric, I would call this
number G: G is the number of tickets in the box that are labeled one,
and there are capital N tickets in the box in all. Okay. What is the average of the labels on
the tickets in the box here? It's still p, but I can write it in terms of the
parameters of the hypergeometric. In terms of
G and N, what is p? The number of tickets labeled one, divided by the total number
of tickets: G over N. Okay. So suppose that I draw at random without
replacement little n times from this box and add the numbers on the tickets that I
see. That's like counting how many ones I get.
What's the expected value of that sum? Yeah, it
would be little n times big G over big N, provided little n is no bigger than big N.
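A quick numerical check, with hypothetical parameters N = 20, G = 8, n = 5, that the definition agrees with the shortcut n·G/N:

```python
from math import comb

# Hypothetical parameters: N tickets, G labeled one, n draws without replacement.
N, G, n = 20, 8, 5

# Expected value from the definition: sum of j times the hypergeometric chance of j.
ev = sum(j * comb(G, j) * comb(N - G, n - j) / comb(N, n)
         for j in range(max(0, n - (N - G)), min(n, G) + 1))

print(ev, n * G / N)   # both 2.0
```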
You can't pull more tickets out of the box than there are in the box if you're
sampling without replacement. If I wanted to compute that from the definition of the
expected value, what would I have to calculate? Suppose I have a
hypergeometric distribution with parameters N, G, and little n. How
would I calculate its expected value if I didn't know the shortcut? Yeah, I'd be
doing a sum that involves taking j times the probability that the random variable
is equal to j. What's the probability that the random variable is equal to j? We have
this sort of max-and-min thing we have to worry about for the possible values.
But for a feasible term, what does it look like? From the good tickets,
I need to choose j; from the tickets labeled zero, I need to choose n minus j;
and from all the tickets, I need to choose n. So the probability is
(G choose j)(N-G choose n-j)/(N choose n). Alright, so I would be looking
at the sum of j times this combinatorial ratio, adding the
terms for all the possible values of j. The possible values of j depend
on how many tickets I pull out of the box, etcetera; they're integers, but they
might not start at zero, depending on the sample size. Okay, so this mess is
going to end up reducing to little n times big G over big N, which is the analog of
little n times little p for the binomial. The expected value for the binomial has the
same form as the expected value for the hypergeometric: it is the number of draws
times the fraction of ones in the box. So far so good? Okay, the
expected value satisfies some algebraic identities that make it
easier to calculate expected values in complicated situations by reducing them to
simpler problems. First of all, the expected value of a constant is
just that constant: if I have an experiment that, no matter what happens, returns
the value one, then in the long run the average value I get is going to be one,
right? The expected value of a
sum of random variables is always the sum of their expected values. This is
incredibly powerful, this little linearity property. And then
the final thing that's extremely useful: the expected value of a constant
times a random variable is that constant times the expected value. So if I
pull one person at random from the room and calculate their
height in meters, and I ask what's the expected height in meters of someone drawn
at random from the room, and then I ask, okay, what's the expected value in inches
instead of meters? It's just going to be the conversion factor between meters and inches
times the expected value in meters. It scales. Okay, alright. So, we're going to
use this last fact, that the expected value of a constant times a random
variable is the constant times the expected value of the random variable, to figure out
the expected value of the sample mean from what we know about the expected value of
the sample sum, okay? We've been talking about pulling tickets at random
and adding the values on the tickets: that's the sample sum of n draws from the
box. Suppose we pull little n tickets at random and then, instead of
adding them, we average them. That's the sample mean, okay? What's the expected
value of the sample mean? The sample mean of n tickets drawn at random from a
box is equal to the sample sum of the n tickets divided by n, right? You add 'em
together and divide by how many you have. So you can write it as one
over n times the sample sum, yes? That's a constant times the sample sum. So the
expected value of the sample mean is that constant times the expected value of the
sample sum, right? That's this rule: the expected value of a constant times a
random variable is the constant times the expected value, so the expected value of
one over n times the sample sum is one over n times the expected value of the
sample sum. Yes? Can I explain in a different way what I mean by sample mean?
Okay, say we have a 0-1 box, and suppose I pull five
tickets at random out of the box. I might get the tickets zero, one, one, zero, one,
okay? The sample sum is 0 + 1 + 1 + 0 + 1 = 3. The sample mean is
(0 + 1 + 1 + 0 + 1)/5. Okay? So suppose that I pull five tickets at
random out of this box with replacement. Then the number of tickets labeled one I
get is going to be random. On average, the fraction of tickets labeled one in the
sample is going to be little p, and the sum of the labels on the tickets that I get is
going to be little n times little p on average. That's the expected value. So the
expected value of the sample sum is going to be n*p. The
expected value of the sample mean is equal to one over n times the expected
value of the sample sum, which is (1/n) times n*p, which is p. So if the
expected value of the sample sum of n tickets drawn at random from a box is n
times the average of the numbers on the tickets in the box,
then once I take n times that
average and divide it by n, I get back the average. The expected value of the
sample mean is the mean of the numbers on the tickets in the box. So the expected
value of the sample mean is the population mean, for the population of tickets in the box.
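Here's a small simulation sketch (the box contents are made up) checking that the expected value of the sample mean matches the population mean, whether we draw with or without replacement:

```python
import random

# A hypothetical 0-1 box; the population mean (fraction of ones) is 0.6.
box = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
n, trials = 4, 100_000
random.seed(1)

# Average of many sample means, drawing with and then without replacement.
mean_with = sum(sum(random.choices(box, k=n)) / n for _ in range(trials)) / trials
mean_without = sum(sum(random.sample(box, n)) / n for _ in range(trials)) / trials

print(mean_with, mean_without)   # both close to 0.6
```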
Does this make sense? Okay. And that's true whether I'm drawing with replacement or
without replacement from the box: the expected value of the sample mean is
still the population mean, the mean of all the numbers on the tickets in the
box. That's going to turn out to be very important. The sample mean in
general is: you add the labels on the things that you get in your sample and
divide by how many there were. In the special case that you're drawing from a
0-1 box, the sample mean is the fraction of 1's in the sample. So the
sample percentage is a special case of the sample mean, when you're drawing from a
0-1 box. All you can get are 0's and 1's; if the things are red or
green and you're interested in the fraction of red ones, then the number of
red ones divided by the sample size is the percentage of red ones. So the
sample percentage is a special case of the sample mean. Alright, so,
the expected value of the sample mean of n random draws, with or without replacement,
from a box of numbered tickets is the average of the numbers on the tickets in
the box. A special case: if the tickets are all labeled zero or one,
then the expected value is the fraction of the tickets in the box labeled one. So
far, so good? Okay. Alright. What motivated so much of probability in the
first place is gambling, right? How do you figure out what's a fair bet, what's not
a fair bet, how you should place your money? So expected value is intimately
tied to whether a bet is fair or not. The expected amount of money you
lose from each dollar you bet in repeated play is called the house edge. A bet is
fair if the house edge is zero, that is, if on average you would expect to break
even in the long run. If on average you expect to lose money in the long run, the
house has an edge. And in all casino games, there is a house edge: if you play
long enough, you will lose all your money. That's just how it's built, unless
you're cheating, right? Okay, alright, so let's look at the house edge for a
bet on red in roulette. The way a roulette wheel works is there's
a wheel, and it has 38 positions. Eighteen of them are red, eighteen of them are
black, and two of them are green, if I'm remembering correctly, okay? Can we go
to Vegas? Alright.
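Assuming the standard even-money payoff on a bet on red (win a dollar with chance 18/38, lose your dollar otherwise), the expected payoff per dollar, and hence the house edge, works out as a one-liner:

```python
# Sketch of the house edge for a bet on red, assuming the standard
# even-money payoff: win $1 with chance 18/38, lose $1 with chance 20/38.
p_win = 18 / 38
expected_payoff = p_win * 1 + (1 - p_win) * (-1)
print(expected_payoff)   # about -0.0526: a bit more than a nickel lost per dollar
```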
Okay. The payoff for winning a bet on red is even money: you double your money.
You get your dollar back and another dollar if the ball lands on red, okay?
Otherwise, you lose your money. So if the chance that
the ball landed on red were 50%, there would be no house edge. But the chance
that the ball lands on red is less than 50%: it's eighteen out of 38, rather than
nineteen out of 38. It would be break-even at nineteen, but only eighteen of the
38 positions are red. So out of each dollar you bet, you
expect to lose a little more than a nickel. And in the long run, if you keep
playing over and over again, with very high probability
your losses are going to grow until you have no more money. What's a fair
bet? A fair bet is one where the expected payoff, the expected amount of
money you get back, is zero after you account for how much money you have to ante up to
play, okay? If your expected payoff is positive, the bet favors you. If the
expected payoff is negative, the bet favors whoever you're playing
against, the house. Okay, let's go over some of this stuff and hint at how you get
some of these results. We already figured out that the expected value of a
binomial random variable with parameters n and p is n times p. It's the
sample sum of the numbers on the tickets that you get if you draw independently with
replacement, little n times, from a 0-1 box that has a fraction p of tickets labeled
one. Okay? Expected value of a geometric random variable:
what's a geometric? You're drawing from a 0-1 box with replacement,
independently, until the first time you see a ticket labeled one. Alright. Why does
it make sense for the expected value to be one over p? I'll give you a heuristic;
I'll spare you the proof, because you really don't want to see it.
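In place of the proof, here's a simulation sketch (p = 0.25 is a made-up value) suggesting that the average wait really is about 1/p:

```python
import random

# Hypothetical parameter; the claim is E[wait] = 1/p = 4 here.
p, trials = 0.25, 100_000
random.seed(2)

def geometric(p):
    """Number of draws up to and including the first ticket labeled one."""
    draws = 1
    while random.random() >= p:   # chance p of success on each draw
        draws += 1
    return draws

avg = sum(geometric(p) for _ in range(trials)) / trials
print(avg, 1 / p)   # avg should be close to 4
```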
The heuristic is: if a fraction p of the tickets are labeled one, then every time you
draw, you get p successes per draw, on average. Does that make sense?
And if you get p successes per draw, then how many draws per success are there? One over
p, loosely speaking. So on average, how many times would you have to draw to get
one success? One over p. That's a loose heuristic; it's not completely
rigorous, and the proof that that's the expected value is a little longer, so you
don't have to know that. Okay, but let's
take it as given that the expected value of a geometric random
variable with parameter p is one over p. From that we can actually derive the fact
that the expected value of a negative binomial random
variable with parameters p and r is r over p. So let's reason that out,
because this reasoning is actually helpful. Remind me, what
kind of thing has a negative binomial distribution? Okay, we've got a 0-1
box with a fraction p of tickets labeled one, and we draw with replacement
until the r-th time we get a ticket labeled one, okay? Let's think about it as
occurring in stages. We draw until the first time we get a ticket labeled one.
Then we draw until the next time we get a ticket labeled one, and then the next,
until we've done that r times. Okay?
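The staging just described can be simulated directly: add up r independent geometric waiting times and compare the average to r/p (the parameters here are made up):

```python
import random

# Hypothetical parameters: wait for the r-th success, chance p per draw.
p, r, trials = 0.25, 3, 100_000
random.seed(3)

def geometric(p):
    """Draws up to and including the first ticket labeled one."""
    draws = 1
    while random.random() >= p:
        draws += 1
    return draws

# A negative binomial draw is the sum of r independent geometric waits.
avg = sum(sum(geometric(p) for _ in range(r)) for _ in range(trials)) / trials
print(avg, r / p)   # avg should be close to 12
```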
What's the distribution of the number of times I have to draw until I get the first
ticket labeled one? It's geometric, right? Now, starting there, what's the
distribution of the number of times I have to draw until I get another ticket labeled
one? It's again geometric, right? I'm waiting from then until the first time I
get a ticket labeled one, and once that's happened, I'm waiting from there
until the next time I get a ticket labeled one. And I have to do that a total
of r times. So I can actually write a negative binomial random variable as the
sum of r geometric random variables. If X1, X2, and so forth on up to Xr are
all independent random variables with the geometric
distribution with parameter p, and we define X to be X1 + ... + Xr, then the
distribution of X is negative binomial with parameters r and p, okay? I wait
until the first time, then I wait until the second time, then I wait
until the third time. Each time, I am drawing with replacement from the same box
of tickets, with the same fraction p of tickets labeled one, and everything is independent,
right? So I can think of waiting for the seventh ticket labeled one as waiting for
the first ticket labeled one and then starting over, and
doing that seven times. Okay, so the
distribution of the sum of r independent geometrics is negative binomial with
parameters r and p, okay? Now, we have that little rule that says the expected
value of a sum is the sum of the expected values. What's the expected value of each
geometric random variable? We've said it's one over p, without
proving it, yeah? The expected value of this one is one over p, et cetera, and the
expected value of this one is one over p. So what's the expected value of their sum?
The expected value of their sum is the sum of their expected values: one
over p, plus one over p, and so on, which adds up to r over p. So, knowing
the geometric result, it's easy to prove the negative binomial result. Okay. The
hypergeometric we already did: that's n draws without replacement from a 0-1 box,
and the average of the labels on the tickets in the box is
G over N, the fraction of tickets labeled one in the box. In general, if we
draw little n times, with or without replacement, from a box of numbered tickets
and add the results that we see, the expected value of that sample
sum is n times the average of the labels on the tickets in the box. The expected
value of the sample mean is that divided by n, which is just the average of the labels
on the tickets in the box. Of course, if we're drawing without replacement, we can't
draw more times than there are tickets in the box. Alright, let's do an example. Okay, so
we've got a loaded die. We're going to roll it six times, and we want to know
the expected number of spots that show in six rolls of this die. We can think of
each roll as giving us a random number of spots, right? And the total number of
spots we see in six rolls is the sum of six random variables that all have the
same distribution: each time you roll it, you get a new number. So each
time we roll the die, it's like drawing from a box
of numbered tickets, where 0.028 times the number of tickets in the box
are labeled one, 0.305 times N of the tickets are labeled six, and each of
two, three, four, and five is on a sixth of the N tickets. Everyone knows
that on a die, the spots on opposite faces add up to
seven? Which is why I've tweaked one and
six here: they're on opposite sides. Okay. Alright, so if I
draw from this box once, what's the expected value of one draw?
It's going to be the average of the numbers on the tickets in the box. 0.028
N of them are labeled one, 0.305 N of them are labeled six, etc. If I wrote them
all out, added them up, and divided by N, what would I get?
1 × 0.028 + 6 × 0.305 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6, right? That would be the
average of the labels on the capital N tickets there are in this box, right?
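Carrying out that arithmetic (using the chances as stated in lecture, which sum to one up to rounding):

```python
# Chances for the loaded die as stated in lecture (they sum to 1 up to rounding):
# one with chance 0.028, six with chance 0.305, two through five each with chance 1/6.
chance = {1: 0.028, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 0.305}

ev_one_roll = sum(face * p for face, p in chance.items())  # average of the labels
ev_six_rolls = 6 * ev_one_roll                             # expected total in six rolls
print(round(ev_one_roll, 3), round(ev_six_rolls, 3))       # about 4.191 and 25.148
```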
There aren't the same number of tickets with every number: two through five
appear equally often, but one and six differ from each other and from those. Right.
Okay, so if I do this sum, that's the expected value of one draw from the box. What's
the expected value of six draws from the box? Six times whatever that gives us,
okay? That's the expected value of the sum of the numbers we get in six draws from
the box, and that's true whether we do it with or without replacement. We've got
like two minutes. Let's just hint at the beginning
of exercise 18-7. I'm going to roll a fair die. If the die shows three spots, I'm
then going to roll three more dice; if the die shows one spot, I'm going to roll one
more die. Okay: I'm going to roll as many dice as show the first time that I roll.
So I'm rolling a random number of dice, right? What's the expected
value of the total number of spots rolled in the second part of the
experiment, not including the roll that determined how many dice we're
going to roll? How do I do this?
Well, the first roll might give me one spot. What's the chance it gives me
one spot? One-sixth, okay. If it does give me one spot, then I'm going to roll one
die. If I roll one die, I could get any number of spots between one and
six: probability a sixth that I get one spot, probability a sixth that I
get two spots, etcetera. It's like one draw from a box of tickets numbered
one through six. Okay? So if I knew that I was only rolling one die, the
expected value of what I would get would be (1 + 2 + 3 + 4 + 5 + 6)/6, which is
three and a half, okay? Remember, opposite faces sum to seven. Okay. Suppose
instead that I get two spots; the probability of that is a sixth. If I
got two spots, I'd be rolling two dice. What's the expected number of spots I
would see if I did that? It would be two times three and a half, which is seven.
Right? Etcetera. Okay, so what's the expected number of spots that show on the
die or dice rolled the second time? There's a one-sixth chance of
getting three and a half, a one-sixth chance of getting seven, a one-sixth chance of
getting ten and a half, and so on, so we
add these numbers and divide by six. There's a little subtlety here,
because we're conditioning on how many spots show up the
first time we roll the die; the detailed solution makes it a little more
rigorous. Alright, I should let you guys go.
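The exercise-18-7 calculation hinted at above can be written out: conditioning on the first roll showing k spots, the expected total on the k additional fair dice is 3.5k, and we average those over k. A quick sketch:

```python
# Expected spots when rolling k fair dice is 3.5 * k; the first roll
# shows each k from one to six with chance 1/6.
ev = sum((1 / 6) * (3.5 * k) for k in range(1, 7))
print(ev)   # 12.25
```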