I want to go over the extra credit problem before we start talking about the course material proper. I put up a webpage with the five calculations, and I'll post it to our discussion board so that you can find it. But if you want to look for it from my homepage, it's under Preprints, and then accrostingvito09.htm.
Some of this is just to make it a self-contained document on the web. This is how some academics amuse themselves, right? You take something silly like this and turn it into a document. So we have the preamble. Let me open this in another window: this is from the Miami Herald. Steven Devlin, chair of the Math Department at USF, told the San Francisco Weekly that the chance the letters would come out that way is one in ten million. Okay, that's a number, but behind that number are some assumptions, and we saw last time that if you make different assumptions, you get different numbers out the end. In fact, the numbers you get out of the five approaches we've been talking about range over a factor of more than 100 billion. So it's an enormous range of numbers, depending on the assumptions you're willing to make, and none of them is really particularly compelling from my point of view.

We talked last time about the chance that you would get the letters in that order if basically what was going on was monkeys banging on typewriters. That's the kind of number it looks like he computed: one in 26 to the seventh power. Then we talked about what would happen if you took into account the fact that these words are in English, and that we're looking at the initial letters of words in English. If you thought that the words were drawn at random from the Gutenberg corpus, you would get an even smaller number: one in 486 billion, rather than one in about eight billion. So that's a rather big factor. Then we talked about what happens if we let the governor keep his words and keep his sentences, but simply shuffle the seven lines. That gives us a number that's only one in about 2,500.

Then we get to what the extra credit assignment was. The idea of the first
part of the extra credit assignment was that you let the governor keep his words
but not the order of the words. And it's as if you wrote each of the words in those
seven lines on a slip of paper, shuffled those 85 slips of paper together, and started drawing slips and letting that comprise the message. So after the sixteenth word you put in a line break, after the twenty-ninth word you put in another line break, and so on: you keep the line breaks where they are, but you draw the words out of a hat in a random order. If you're doing that, then in order to get the acrostic, what matters is the first letter on each line, and you need to know how many words start with each first letter. There were eight words that started with the letter C, three started with F, and one started with K, etcetera. So what do you need to
do? Well, you can imagine starting by picking the first word of the first line, then the first word of the second line, etcetera, on down to the first word of the seventh line, and then letting the other words fall where they may. So how many ways are there to pick the first word of the first line so that it starts with an F? You have three choices, right? Then, for each of those three choices, how many choices are left for getting a U as the first letter of the first word on the second line? Two, because there are two words that start with the letter U. And for each of those, you have eight choices for which word starts the third line to give you the C, etcetera. The only one that's kind of confusing is U, because U occurs twice in the acrostic. You have two choices for which one comes at the beginning of the second line, and then, having done that, there's only one choice left for which one comes at the beginning of the seventh line. There are no more choices to make: you're out of U's. So,
having specified the first words of the seven lines, there are 78 words left that can be in any order in the remaining 78 places. And by assumption, when you're drawing the words out of a hat, there are 85 factorial different sequences in which you could get those slips of paper. Now, some people worried about the fact that some of these sequences aren't distinguishable from each other, because some words are repeated. If you took that into account on both ends of the problem, I would have given you credit, but nobody did. Some people worried about the fact that "unnecessary" was the word that occurred twice that started with the letter U, but they didn't worry about the fact that the rest of the count then needs to be adjusted as well. So nobody found themselves in quite that situation. And the last one: I was really
pleased with the number of people who got the last approach right. The idea here is, we keep the words in the order in which they actually occur in the message, but we throw in carriage returns at random, in such a way as to end up with seven lines, none of which is blank. So each line has to have at least one word.
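As an aside, the word-shuffle calculation just described can be sketched in a few lines of Python. The counts for F, U, C, and K come from the lecture; the counts for Y and O are not stated above, so the 1's used for them here are hypothetical placeholders rather than the actual counts.

```python
from math import factorial

# Word-shuffle model: the 85 words are drawn from a hat in random order,
# with the original line breaks kept, so only the 7 words that land at
# the starts of lines matter.  The acrostic down the first letters is
# F, U, C, K, Y, O, U.
#
# Counts of words starting with each letter.  F=3, U=2, C=8, K=1 are from
# the lecture; Y and O are placeholder guesses.
counts = {"F": 3, "U": 2, "C": 8, "K": 1, "Y": 1, "O": 1}
acrostic = ["F", "U", "C", "K", "Y", "O", "U"]

# Multiply the number of remaining choices for each line's first word.
# Each use of a letter leaves one fewer word with that letter, which is
# why the second U contributes a factor of 1, not 2.
remaining = dict(counts)
favorable_starts = 1
for letter in acrostic:
    favorable_starts *= remaining[letter]
    remaining[letter] -= 1

print(favorable_starts)   # 3*2*8*1*1*1*1 = 48 with these placeholder counts

# The other 78 words can fall in any order in the remaining 78 slots,
# out of 85! equally likely orderings of all the slips of paper.
probability = favorable_starts * factorial(78) / factorial(85)
print(1 / probability)    # the "one in ..." form of the answer
```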
So how many ways are there of inserting six line breaks in such a way that you end up with seven lines? Well, you can't put a line break before the first word, because then you'd have a blank line. But you could put a line break before the second word, the third, the fourth, on up to before the 85th word. So there are 84 places you could put line breaks. From those 84 places you need to pick six in order to end up with seven lines, and the assumption I asked you to make was that all ways of doing that are equally likely. So there are 84 choose 6 equally likely ways to break this thing into lines. Then it's just a question of counting how many of those result in having the initial letters right, and there some people came really close but didn't count correctly. There are twelve ways in all, and here's where that's coming from. You're stuck with the F, and you're actually stuck with the U, but now what do you have as choices for the C? It could be this, this, or this. So there are three choices for the C, right? Then there's only one choice for the K. Then what about the Y? There are one, two that work. And then for the O you have one, two choices. And there's only one for the last U. So three times two times two is twelve possible ways to keep the words in the same order but have the lines start with the right letters. So the probability of this would be twelve over 84 choose 6, which is about one in 33 million. So this is partly just amusement, but it's also partly
to get you to think about what happens when you see somebody quoting the odds of something in the newspaper. We managed to get numbers that ranged from one in 2,500 to one in 487 billion, right, just by making slightly different assumptions. Which assumption is right? Well, I don't know. It's all concocted, it's all made up. Nobody really thinks that anybody types a veto letter by choosing words at random from the Gutenberg Corpus; that's not a realistic way to do this. So what I think these numbers speak to is that it would be kind of hard for this to happen accidentally. But exactly how to put a number on that is very complicated. It's very difficult to do in a reasonable way, and you can get pretty much any answer you want. I think you'd have to tell a pretty concocted story to get the number up to, say, ten percent; but to get the number up to one percent, you could probably do that pretty easily. Questions about this?

So if you go online to check your homework scores, and you submitted the extra credit, you will have a score for "extra one," which is your number of points, which ranges from zero to three. Half a dozen or so people got three points on it. I think the smallest nonzero credit that I gave was half a point. Some people didn't get any, alas. Questions? Yeah. Is this the only ...? I don't know.
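As a footnote to the discussion above, three of the five numbers can be reproduced in a few lines of Python; the Gutenberg-corpus figure and the word-shuffle figure depend on word counts that aren't given in the lecture. The factor of 2 in the line-shuffle count is an inference: two of the seven lines start with U, so swapping them also produces the acrostic, which matches the "one in 2,500" quoted above.

```python
from math import comb, factorial

# "Monkeys banging on typewriters": each of the 7 initial letters is one
# of 26 letters, chosen uniformly at random and independently.
monkeys = 26 ** 7
print(monkeys)            # 8031810176, about one in 8 billion

# Shuffling the seven lines: 7! equally likely orderings, of which 2 give
# the acrostic (assuming the two lines starting with U can be swapped).
line_shuffle = factorial(7) // 2
print(line_shuffle)       # 2520, about one in 2,500

# Random line breaks: choose 6 of 84 break positions; 12 choices work.
line_breaks = comb(84, 6) // 12
print(line_breaks)        # 33873462, about one in 33 million
```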
Alright, it stopped. Okay. We're now going to start thinking about what happens in the long run. Excuse me. We've been talking about probabilities of events, and we've been talking about random variables, in particular random variables that come from drawing tickets out of a box at random and looking at the numbers on the tickets that we get. And we want to start looking at what long-run regularities there are, in order to be able to use these box models to draw inferences about the world. One of the most fundamental notions that applies to random variables is the expected value, and this chapter is about that, but in order to get there I'm going to talk about the law of large numbers, which is a key concept in probability. Let me write this with chalk, because it's easier to talk it through.

So let's suppose that we have independent trials that have the same probability of success. We can imagine tossing a coin repeatedly, really vigorously, in such a way that whether the coin lands heads each time is independent of the other times we tossed the coin. Okay? So n is the number of times I toss the coin, and p is the chance of heads; if the coin is fair, then by definition the chance that the coin lands heads is 50 percent. Okay? Now, suppose we look at the
difference between the number of heads and... boy, that's really illegible. Okay, so this is the number of heads we get in this many independent tosses of the coin, divided by the number of tosses. This is a random variable: the number of heads you get in a certain number of tosses is random, so this ratio is random. It depends on what actually happens. Suppose we look at the difference between that ratio and 50 percent, which is the probability that the coin lands heads in a single toss (by definition, we're talking about a fair coin). This ratio is rarely going to be exactly equal to 50 percent; every once in a while it might happen. So let's look at the difference between the empirical fraction of successes, the empirical fraction of heads, and the probability of success, the probability of heads. This difference could be big or it could be small. It could so happen that the fraction of heads in seventeen tosses is 90 percent, and 90 percent minus 50 percent is a big number: 40 percent. It could also happen that this difference is really small: the fraction of heads in twenty tosses could be exactly 50 percent, and then the difference is zero. So this difference is random. Let's ask, though: what's the chance that this difference is less than some positive number epsilon? We're going to look at the probability that the difference between the fraction of successes and the theoretical probability of success is less than epsilon. The law of large numbers says that this probability goes to one as the number of tosses goes to infinity. So the more
trials you have, the more likely it is that the fraction of successes you see is arbitrarily close to the probability of success. This is true for any fixed positive number epsilon. So for any fixed epsilon greater than zero, the chance that the theoretical probability differs from the empirical fraction of successes by less than that threshold value epsilon gets bigger and bigger the more trials you have. This is the law of large numbers.

Everybody remembers what a limit is? Yeah, okay. This is a funny notion of a limit, because it's a limit of a probability. It's still possible that the fraction of successes differs from the probability of success by more than epsilon, right? It could still happen that you toss a coin a million times and get heads a million times; it's just very, very, very unlikely. So the chance that these things differ by very much gets really, really small the more trials you have, and the chance that the difference is really small gets big, the more trials you have. You have independent trials, with the same probability p of success in each trial. So here it is: the chance that the fraction of successes in n trials differs from the probability of success by less than any positive tolerance epsilon goes to 100 percent as the number of trials increases. Okay. What does this mean
about drawing tickets out of a box? Okay, so let's suppose I have five tickets in a box, and I draw one ticket from the box, with replacement, over and over and over again. What's the probability that I get the ticket labeled A on each draw? It's one fifth, right? The probability that I get the ticket labeled A is twenty percent on each draw. So the law of large numbers says that as I draw from the box more and more and more times, the fraction of the time that I get the ticket labeled A is increasingly likely to be close, arbitrarily close, to a fifth. I won't get it exactly a fifth of the time, but the more times I draw, the greater the probability that it's going to be arbitrarily close to a fifth. So if I drew from this box a zillion times, I would see the ticket labeled A about twenty percent of the time, the ticket labeled B about twenty percent of the time, the ticket labeled C about twenty percent of the time, etc. With high probability, right: I'm not guaranteed to see it exactly twenty percent of the time, but I am very likely to see it nearly twenty percent of the time. So what
would happen if I added up the numbers on all the tickets that I saw in a very, very large number of draws from the box and took the average? Say I draw from the box n times, where n is really, really big, and I average the results of those n draws. Well, I would have some random list of labels. I might get A, A, E, D, D, B, C, B, etcetera: some string of labels that I get as I draw from this thing repeatedly, with a random number of each label in a random order. But if I have n of these in all, then with high probability the number of A's is going to be about n divided by five, the number of B's is going to be about n divided by five, the number of C's is going to be about n divided by five, and so on. So if I average the whole list, what am I going to get? I'm going to get something like A plus B plus C plus D plus E over five, approximately, because each of them is going to occur about the same number of times in the n trials: I see each of them about a fifth of the time.

This number is the expected value of a draw from this box. The frequency interpretation is: if I did the experiment over and over and over again, the probability that the average of the draws would differ from this number by more than a little bit gets really, really small. That is, the probability that the average of the draws is close to this number gets really, really big; it approaches 100 percent as the number of draws gets big. Good time to ask questions
if it's confusing you. All right, let me back up, because we have a little applet to show the law of large numbers here. I'll just shrink this down a little bit.
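If you don't have the applet handy, a quick simulation sketch in Python shows the same behavior: the gap between the observed fraction of heads and 50 percent tends to shrink as the number of tosses grows. (The seed is arbitrary, just to make the run reproducible.)

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def head_fraction(n, p=0.5):
    """Toss a p-coin n times and return the observed fraction of heads."""
    return sum(random.random() < p for _ in range(n)) / n

# The absolute difference from 50% tends to get smaller as n grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, abs(head_fraction(n) - 0.5))
```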
Okay, what is this plot? This is simulating tosses of a coin with a 50 percent chance of heads, and it's looking at the difference between the observed fraction of heads and the probability of heads; in this case, it's 800 trials. What tends to happen as the number of trials gets bigger and bigger is that the difference settles down to zero. Initially, for a small number of trials, there's a pretty good chance that the theoretical chance of success and the empirical rate of success differ by a lot, but as the number of trials gets bigger and bigger, the probability that they differ by a lot gets smaller and smaller. Let's do another one. Okay, so now here's the story
for a coin that has an 11.6 percent chance of heads, and we're taking the difference between the observed percentage of heads and 11.6 percent, the theoretical probability of heads. Again, it wanders around a while, but eventually it settles down and gets very close to zero. So this is the difference between the percentage of heads and the probability of heads for this biased coin that has an 11.6 percent chance of heads. What happens if, instead of looking at the difference between the percentage and 11.6 percent, we look at the difference between the number of successes and 11.6 percent of the number of trials? The law of large numbers doesn't say that that gets small. So I just toggle this: now, instead of plotting the difference between the percentage of successes and 11.6 percent, I'm plotting the difference between the number of successes and 11.6 percent of the number of trials. This tends to actually grow, the more trials you have. So the difference between the number of successes and the probability times the number of trials tends to get big, while the difference between the fraction of successes and the probability of success tends to get small.
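The same point can be seen in a simulation sketch that tracks both differences along one long sequence of fair-coin tosses: the count difference typically drifts, while the fraction difference shrinks.

```python
import random

random.seed(2)
p = 0.5
heads = 0
for n in range(1, 1_000_001):
    heads += random.random() < p
    if n in (100, 10_000, 1_000_000):
        # |number of heads - n*p| tends to grow (roughly like sqrt(n)),
        # while |fraction of heads - p| tends to shrink.
        print(n, abs(heads - p * n), abs(heads / n - p))
```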
Okay. How can those things simultaneously be true? Here it is for a fair coin: the plot starts off as the difference between the number of successes and 50 percent of the number of trials, and it just wanders off. If you look at the difference between the percentage of successes and 50 percent, it tends to get smaller. What's the relationship between the number of successes and the percentage of successes? The number of successes tends to grow, but the probability of success times the number of trials also tends to grow, so we're comparing two things that are growing. Mathematically, the percentage of successes is the number of successes divided by the number of trials; that's the relationship between the two. They're related through the number of trials. So if the difference between the number of successes and p times the number of trials grew at the rate of n, the number of trials, or faster, then the percentage of successes wouldn't get closer to the probability of success. But if the difference between the number of successes and p times the number of trials grows slower than the number of trials, then when you divide by the number of trials, that difference is going to get closer and closer to zero. So, in fact,
what happens is that the difference between the number of successes and the number of trials times the probability of success tends to grow like the square root of the number of trials; it grows slower than the number of trials. So when you divide it by the number of trials, it ends up getting smaller and smaller. Even though it grows, it grows slower than the number of trials, so dividing by the number of trials makes the result smaller and smaller. Does this make sense? If I have something that looks like a constant times the square root of n, and I divide that by n, what happens as n gets big? It's a constant over the square root of n, which gets small as n gets big. So even though the numerator is growing, it's not growing as quickly as the denominator, and as a result the quotient gets smaller and smaller. That's what's going on here: the difference between the number of successes and 50 percent of the number of trials tends to grow at the rate of the square root of n, but when you divide it by n to look at the difference between the fraction of successes and the probability of success, that tends to shrink at the rate of one over the square root of n. Any questions
about this at the moment? Okay. This applet we saw fairly recently, when we were drawing from a box of tickets where the tickets were labeled zero and one. Here we have four numbers in the box: zero, one, one, and four. Now, each time I draw from the box, if I draw a sample of size one, I'm equally likely to get any of those four tickets, right? There's a 25 percent chance of getting any one of the four. If I draw one sample over and over and over again, I'm going to get a ticket labeled zero about a quarter of the time, a ticket labeled one about half the time, and a ticket labeled four about a quarter of the time. So if I just do this experiment: that time I got a ticket labeled one; that time, the ticket labeled four; that time, the ticket labeled four again; another ticket labeled one; another ticket labeled one; a ticket labeled zero; etcetera. We could just do this.
Now we can speed it up to take, let's say, a thousand samples at a time, just to see what happens if we do this many, many times. Okay: we've now taken 10,000 samples of size one from this box of numbers. What we would expect to see is that the area of this bin should be about 25 percent, the area of this bin about 50 percent, and the area of that bin about 25 percent. So we can actually check it: it turned out to be 25.4 percent in those 10,000 trials. If we did it more, it would be likely to be even closer to 25 percent if we keep going.
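The applet run can be mimicked with a short simulation sketch (the seed is arbitrary):

```python
import random

random.seed(3)
box = [0, 1, 1, 4]

# 10,000 draws of size one, with replacement.
draws = [random.choice(box) for _ in range(10_000)]

print(draws.count(0) / len(draws))   # near 0.25
print(draws.count(1) / len(draws))   # near 0.50
print(sum(draws) / len(draws))       # near 1.5, the expected value
```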
What's the average of the values that we saw in those 10,000 draws? That's this line, the mean of the values: the average was 1.497. If we had seen the ticket labeled zero exactly a quarter of the time, a ticket labeled one exactly half the time, and a ticket labeled four exactly a quarter of the time, what would that average have been? 1.5, right? One and a half: we've got zero times a quarter, plus one times a half, plus four times a quarter. The expected value of a draw from this box is one and a half. So if we imagine doing the experiment longer and longer, more and more draws of size one, then, with high probability, closer and closer to a quarter of the numbers in that list would be zero, closer and closer to a half would be one, and closer and closer to a quarter would be four. Not guaranteed, but with increasingly high probability. And so when we average the list, we'd have the average of a list that was a quarter zeroes, a half ones, and a quarter fours, and the average of that list would be 1.5. Make sense? That's the expected value.

Notice the expected value is one and a half, and yet you would be shocked to draw a ticket labeled one and a half from this box: there are no tickets labeled one and a half in the box. "Expected value" is a term of art. It is not what you expect to see; it is the long-run average of what you get if you do things over and over again. It doesn't have to be a possible value to be the expected value. All good? Okay. So, in general,
how do we define the expected value of a discrete random variable, if we know its probability distribution? Imagine what happens when you draw over and over and over again: the frequency with which I get a particular value is going to approach, with high probability, the probability of getting that value. So in this case, this is supposed to be the probability distribution of the sum of two draws from that box of tickets we were just looking at. Let's verify that it's right. I'm going to draw twice from that box of tickets and calculate the sum of the two draws; then I'm going to draw twice again and calculate the sum of those two draws; and I'm going to do that over and over. So I have a new experiment. The experiment is not drawing a sample of size one, it's drawing a sample of size two, but I'm going to repeat that experiment many, many times and look at the values that I get. And I'm interested in the average outcome for that experiment, the long-run average outcome.

So, what's the probability that the sum of two draws is equal to zero? Well, how does the sum of two draws end up being zero? I have to get a zero on the first draw and a zero on the second draw. The two draws are independent; I'm drawing with replacement.
So the chance I get a zero on the first draw is a quarter, and the chance I get a zero on the second draw, given that I got a zero on the first draw, is also a quarter, because they're independent. So the probability that I get a zero on both is a quarter times a quarter, or one sixteenth. Okay, what's the chance that the sum of the two draws is one? Well, I can either get a zero and then a one, or a one and then a zero. Those two possibilities are disjoint: if I get a zero on the first draw and a one on the second, I didn't get a one on the first draw and a zero on the second. So I can find the probability that the sum is one by finding the probabilities of those two possibilities and adding them together, because they're disjoint and exhaustive; they're a partition of the event that I get a one as the sum of the two draws. So what's the chance I get a one and then a zero? Well, there's a 50 percent chance I get a one, because there are two ones in the box, and then, given that I got a one, there's a 25 percent chance that I get a zero on the second draw, because the draws are independent and 25 percent of the tickets in the box are zeros. So I get a half times a quarter, which is an eighth. And then the other way around: what's the chance I first get a zero and then get a one? That's a quarter times a half, which is another eighth. An eighth plus an eighth is a quarter, and that's the probability that the sum is equal to one.

What are the ways of getting two? I was about to count a one and a one, a zero and a two, and a two and a zero, but wait: there is no two in the box. Keep your eyes on the box. To get two, you have to get a one and a one; that's the only way you can do it. And the chance you get a one and a one is a half times a half, which is a quarter, alright? Okay.
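The whole distribution can be checked by brute force, enumerating all 16 equally likely ordered pairs of draws:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

box = [0, 1, 1, 4]

# Each ordered pair of tickets is equally likely: 4 * 4 = 16 outcomes.
dist = Counter()
for a, b in product(box, repeat=2):
    dist[a + b] += Fraction(1, 16)

print({value: str(p) for value, p in sorted(dist.items())})
# {0: '1/16', 1: '1/4', 2: '1/4', 4: '1/8', 5: '1/4', 8: '1/16'}

expected = sum(value * p for value, p in dist.items())
print(expected)   # 3, twice the expected value of a single draw
```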
So we can work this out and find the whole probability distribution for the sum of two draws from this box. These are the possible values (those are the only sums you can get), and these are the probabilities of those values. So if you repeated this experiment of drawing twice from the box over and over again, each time adding the results, you would get eight about a sixteenth of the time, five about a quarter of the time, and so on. The expected value is going to be zero times a sixteenth, plus one times a quarter, plus two times a quarter, plus four times an eighth, plus five times a quarter, plus eight times a sixteenth, because we'll see each of these values about that often. It turns out that the expected value is three, which is twice what the expected value was for one draw from the box. That turns out to be true in great generality: the expected
value of the sum of n draws from a box is equal to n times the expected value of one draw from the box, and the expected value of one draw from the box is the average of the numbers on the tickets, where you take repeats into account. So in this case, it's the average of zero, one, one, and four, which turned out to be one and a half. The expected value of the sum of 37 draws from this box would be 37 times one and a half. So the experiment is: pull 37 tickets out of the box with replacement and add them. Do it again: pull 37 tickets out of the box with replacement, add them; you get a number each time. Imagine doing that over and over again, an enormous number of times, where each experiment consists of pulling 37 tickets out of the box with replacement. What would the average of that list of sums be? Well, with increasingly high probability, the average of that list would be increasingly close to 37 times one and a half. 37 times one and a half is
the expected value. In general, the expected value of a discrete random variable is a weighted sum of the possible values the variable takes. So we have a random variable X; it can take values x1, x2, x3, and so on. In this case, we're talking about a single draw: the random variable is the number you see on the ticket. What are the possible values? The possible values were zero, one, and four. What's the expected value? It's a weighted sum of the possible values, where the weights are the probabilities that the random variable takes each value, because the random variable is going to take this value about that often, and that value about that often, in repeated trials.

I'm just going to state without proof the thing that I said a moment ago: if I draw n times at random from a box of tickets and add the numbers on the tickets that I see, the expected value of that sum is n times the average of the labels on the tickets in the box. If I draw once, the expected value is the average; if I draw twice, it's twice the average; etcetera, alright?
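That claim is easy to check numerically for the box we've been using (a simulation sketch: 37 draws per experiment, repeated many times):

```python
import random

random.seed(4)
box = [0, 1, 1, 4]
n = 37   # tickets drawn, with replacement, in each experiment

def sum_of_draws():
    return sum(random.choice(box) for _ in range(n))

# Repeat the experiment many times and average the sums.  The long-run
# average should be close to n times the box average: 37 * 1.5 = 55.5.
trials = 100_000
average = sum(sum_of_draws() for _ in range(trials)) / trials
print(average)   # close to 55.5
```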
It turns out that this is true even if I draw without replacement from the box. Of course, if I draw without replacement, I can't pull more tickets out than are in the box in the first place, so little n needs to be no more than the number of tickets in the box. But it works out that the expected value of the sum of n draws from a box of numbered tickets is n times the average of the numbers on the tickets, whether you draw with or without replacement.

Alright, let's look at some special cases. We were talking about 0-1 boxes for the last couple of lectures. What is the average of the numbers in a 0-1 box? Suppose I have a box with capital N tickets in all, and a fraction p of them are labeled one and a fraction 1 - p are labeled zero. What's the average of the labels? Well, the sum of all the labels is N times p ones plus N times (1 - p) zeros, which is N times p, and I have N tickets in all, so the average of the labels on all the tickets in the box is just p, okay? Now, remember that the binomial distribution is the distribution of the sum of little n draws with replacement
from a box of tickets like this. Yes? The question is, what's the difference between
the average of the labels and the expected value. The labels are just some
particular list of numbers, and we can calculate the average of that list. What
is the expected value defined to be? We now have a random
variable: the number you get when you pull a ticket at random from the
box. The expected value is defined to be the
sum of the possible numbers you can get on the ticket, times the probabilities of
those numbers. So, expected value is a concept that applies
to random variables; average is a number that applies to lists. They're analogous:
the expected value of a random variable is a weighted sum of its possible values. It
turns out that, numerically, the expected value of the number on a draw at random
from this box is equal to the average of the labels on the tickets
in the box. So that's a result rather than a definition, alright? The definition,
for a random variable, is the sum of the possible values, each times the probability
of that value. That's the definition. And then it turns
out that that's exactly equal to the average of the labels on the tickets in
the box, if the experiment is to pull a ticket at random from the box. The
expected value of the number that you get when you draw a ticket at random from a
box is always equal to the average of the labels on the tickets in the box. And
indeed, the expected value of the sum of the labels on n
draws from the box is equal to n times the average of the labels on the tickets in
the box. And that's true whether you're drawing with or without replacement. And
again, it's a result, not a definition. Okay, so remember, the binomial
distribution is the distribution of the number of tickets labeled one we get if
we draw with replacement from this box little n times and count how many tickets
labeled one we got. And I mentioned a couple of times that counting the number of
tickets labeled one is the same as adding the labels on the tickets that you get,
because when you add a string of 0's and 1's, what do you get? How many 1's there
were: the 0's don't contribute anything, and the 1's each contribute one. It's
just counting how many 1's there are. Okay, so the probability distribution of
the number of tickets labeled one you get in little n draws with replacement
from this box is binomial with parameters
little n and little p, alright? So, we're going to
look at the expected value of a binomial
distribution in two different ways. One is starting with the definition, and the other
is using the fact that we have a box model that leads to the binomial distribution.
So we start with the definition, which says the expected value is
defined to be the sum of the possible values times the probabilities of those
values. Then what piece of math do we need to write down to calculate the
expected value of a binomial distribution? Suppose we have a random variable
that has a binomial distribution with parameters n and p. Let me back up a couple of steps. The
expected value of a random variable just depends on its probability distribution,
right? Because it's just a weighted sum of the possible
values times the probabilities of those values, okay? It doesn't depend on the
outcome of some experiment; it just depends on the probability distribution.
So we can talk about the expected value of a probability
distribution or the expected value of a random variable; it makes sense
either way, because the expected value of a random variable just depends on its
probability distribution. Okay, so let's suppose we have a random variable that
has a binomial distribution with parameters n and p. That is, it's like the sum
of the labels on little n draws with replacement from that 0-1 box. Okay. So what's the expected
value? It would be the sum of possible values times the probabilities of those
values. What are the possible values? When we draw little n times and add the
numbers on the tickets, what are the possible values of the sum?
We might get a ticket labeled zero every time, right? Or we might get a ticket
labeled one every time. The possible values range from zero, one, two, three,
etcetera, on up to n. Those are the possible values for the sum of the n
draws. So let's say this is the sum from j = 0 to n, where j now stands for the
possible values, okay? It might be zero. What's the probability that it's
zero? This is the binomial distribution. So we have j as the
possible value, and this is the probability that we get j successes: that
we get j tickets labeled one when we draw from the box. We have to get a
ticket labeled one j times; each time, there's probability p that that happens,
and the draws are independent, so we multiply. And we also have to get a ticket
labeled zero n minus j times. And there are n choose j different ways that can
happen in the n trials, right? This should ring a bell by now. So the expected
value is the sum over j of j times (n choose j) p^j (1-p)^(n-j).
We could do this sum, and it would take us the rest of the lecture.
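Rather than grinding through that sum by hand, here's a quick check (with made-up values n = 10, p = 0.3) that the definition really does give n times p:

```python
from math import comb

# Hypothetical parameters for the check.
n, p = 10, 0.3

# Expected value straight from the definition: sum over j of j * P(X = j).
ev_definition = sum(j * comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1))

print(ev_definition, n * p)   # both are 3.0, up to rounding
```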
Okay, or we can use the fact that we know what has a binomial
distribution: the sum of little n draws with replacement from that box of tickets.
And what do we know about the
expected value of the sum of n draws, with or without replacement, from a box of
numbered tickets? It's little n times the average of the values on the
tickets. And we know that the average of the values on the tickets in this case
is p. So this whole sum is just going to be equal to n times p, if we work it
out. Yes? Right. So this is like
zero times the probability that you get zero, plus one times the probability that you
get one, plus two times the probability you get two, on up to n times the probability you
get n tickets labeled one. Sorry, why did j disappear? Well, we're summing over
all of its possible values; j is a dummy variable, just inside the sum. This
sum has n + 1 terms. In one of those terms, j is zero: we get zero times
something, which doesn't contribute anything. In another term, j is one, so we
get one times n choose one, which is n, times p^1 * (1-p)^(n-1). That's the next
term. Then j is two: we get two times n choose two times p^2 * (1-p)^(n-2), etc.
If we summed all that up, the answer would be n*p. Okay. So using our result
that the expected value of the
sum of n draws from a box of numbered tickets is just n times the average of the
numbers on the tickets in the box, we can shortcut a very difficult
calculation and just write down the answer. So far so good. The average of the
numbers on the tickets in the box is little p, and we're drawing little n times
from the box, because we're asking about a binomial distribution with parameters
n and p. So it's like the sum of the tickets we get on little n draws from a
box where a fraction little p of the tickets are labeled one and the
rest are labeled zero. Okay, so we've just seen that the expected value of a
random variable that has a binomial distribution with parameters n and p is
just n*p. Sometimes when we draw from the box little n times, we're not going to
get any tickets labeled one, sometimes we're going to get one ticket labeled one,
sometimes we're going to get seventeen, sometimes we're going to get ten, alright?
On average, how many tickets labeled one do we get if we draw little n times
from the box? On average, we get n*p. That might not be an integer, alright? It
might not be a possible number of tickets; it's what would happen on average in the
long run. Are we all together? Okay. I asserted without proof that it doesn't
make any difference whether you draw with replacement or without replacement: the
expected value is still the number of draws times the average of the
numbers on the tickets in the box. Okay, we have another random variable that's
like drawing without replacement from this box of tickets. What
distribution is that? If I draw little n times without replacement from a
0-1 box? Hypergeometric, okay. For the hypergeometric, I would call this
number G: G is the number of tickets in the box that are labeled one,
and there are capital N tickets in the box in all. Okay. What is the average of the labels on
the tickets in the box here? It's still p, but I can write it in terms of the
parameters of the hypergeometric. In terms of
G and N, what is p? The number of tickets labeled one, divided by the total number
of tickets: G over N. Okay. So suppose that I draw at random without
replacement little n times from this box and add the numbers on the tickets that I
see. That's like counting how many ones I get.
What's the expected value of that sum? Yeah, it
would be little n times big G over big N, provided little n is no bigger than big N.
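A quick numerical check, with hypothetical parameters N = 20, G = 8, n = 5, that the definition agrees with the shortcut n·G/N:

```python
from math import comb

# Hypothetical parameters: N tickets, G labeled one, n draws without replacement.
N, G, n = 20, 8, 5

# Expected value from the definition: sum of j times the hypergeometric chance of j.
ev = sum(j * comb(G, j) * comb(N - G, n - j) / comb(N, n)
         for j in range(max(0, n - (N - G)), min(n, G) + 1))

print(ev, n * G / N)   # both 2.0
```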
You can't pull more tickets out of the box than there are in the box if you're
sampling without replacement. If I wanted to compute that from the definition of the
expected value, what would I have to calculate? Suppose I have a
hypergeometric distribution with parameters N, G, and little n. How
would I calculate its expected value if I didn't know the shortcut? Yeah, I'd be
doing a sum that involves taking j times the probability that the random variable
is equal to j. What's the probability that the random variable is equal to j? We have
this sort of max-and-min thing we have to worry about for the possible values.
But for a feasible term, what does it look like? From the good tickets,
I need to choose j; from the tickets labeled zero, I need to choose n minus j;
and from all the tickets, I need to choose n. So the probability is
(G choose j)(N-G choose n-j)/(N choose n). Alright, so I would be looking
at the sum of j times this combinatorial ratio, adding the
terms for all the possible values of j. The possible values of j depend
on how many tickets I pull out of the box, etcetera; they're integers, but they
might not start at zero, depending on the sample size. Okay, so this mess is
going to end up reducing to little n times big G over big N, which is the analog of
little n times little p for the binomial. The expected value for the binomial has the
same form as the expected value for the hypergeometric: it is the number of draws
times the fraction of ones in the box. So far so good? Okay, the
expected value satisfies some algebraic identities that make it
easier to calculate expected values in complicated situations by reducing them to
simpler problems. First of all, the expected value of a constant is
just that constant: if I have an experiment that, no matter what happens, returns
the value one, then in the long run the average value I get is going to be one,
right? The expected value of a
sum of random variables is always the sum of their expected values. This is
incredibly powerful, this little linearity property. And then
the final thing that's extremely useful: the expected value of a constant
times a random variable is that constant times the expected value. So if I
pull one person at random from the room and calculate their
height in meters, and I ask what's the expected height in meters of someone drawn
at random from the room, and then I ask, okay, what's the expected value in inches
instead of meters? It's just going to be the conversion factor between meters and inches
times the expected value in meters. It scales. Okay, alright. So, we're going to
use this last fact, that the expected value of a constant times a random
variable is the constant times the expected value of the random variable, to figure out
the expected value of the sample mean from what we know about the expected value of
the sample sum, okay? We've been talking about pulling tickets at random
and adding the values on the tickets: that's the sample sum of n draws from the
box. Suppose we pull little n tickets at random and then, instead of
adding them, we average them. That's the sample mean, okay? What's the expected
value of the sample mean? The sample mean of n tickets drawn at random from a
box is equal to the sample sum of the n tickets divided by n, right? You add 'em
together and divide by how many you have. So you can write it as one
over n times the sample sum, yes? That's a constant times the sample sum. So the
expected value of the sample mean is that constant times the expected value of the
sample sum, right? That's this rule: the expected value of a constant times a
random variable is the constant times the expected value, so the expected value of
one over n times the sample sum is one over n times the expected value of the
sample sum. Yes? Can I explain in a different way what I mean by sample mean?
Okay, say we have a 0-1 box, and suppose I pull five
tickets at random out of the box. I might get the tickets zero, one, one, zero, one,
okay? The sample sum is 0 + 1 + 1 + 0 + 1 = 3. The sample mean is
(0 + 1 + 1 + 0 + 1)/5. Okay? So suppose that I pull five tickets at
random out of this box with replacement. Then the number of tickets labeled one I
get is going to be random. On average, the fraction of tickets labeled one in the
sample is going to be little p, and the sum of the labels on the tickets that I get is
going to be little n times little p on average. That's the expected value. So the
expected value of the sample sum is going to be n*p. The
expected value of the sample mean is equal to one over n times the expected
value of the sample sum, which is (1/n) times n*p, which is p. So if the
expected value of the sample sum of n tickets drawn at random from a box is n
times the average of the numbers on the tickets in the box,
then once I take n times that
average and divide it by n, I get back the average. The expected value of the
sample mean is the mean of the numbers on the tickets in the box. So the expected
value of the sample mean is the population mean, for the population of tickets in the box.
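Here's a small simulation sketch (the box contents are made up) checking that the expected value of the sample mean matches the population mean, whether we draw with or without replacement:

```python
import random

# A hypothetical 0-1 box; the population mean (fraction of ones) is 0.6.
box = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
n, trials = 4, 100_000
random.seed(1)

# Average of many sample means, drawing with and then without replacement.
mean_with = sum(sum(random.choices(box, k=n)) / n for _ in range(trials)) / trials
mean_without = sum(sum(random.sample(box, n)) / n for _ in range(trials)) / trials

print(mean_with, mean_without)   # both close to 0.6
```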
Does this make sense? Okay. And that's true whether I'm drawing with replacement or
without replacement from the box: the expected value of the sample mean is
still the population mean, the mean of all the numbers on the tickets in the
box. That's going to turn out to be very important. The sample mean in
general is: you add the labels on the things that you get in your sample and
divide by how many there were. In the special case that you're drawing from a
0-1 box, the sample mean is the fraction of 1's in the sample. So the
sample percentage is a special case of the sample mean, when you're drawing from a
0-1 box. All you can get are 0's and 1's; if the things are red or
green and you're interested in the fraction of red ones, then the number of
red ones divided by the sample size is the percentage of red ones. So the
sample percentage is a special case of the sample mean. Alright, so,
the expected value of the sample mean of n random draws, with or without replacement,
from a box of numbered tickets is the average of the numbers on the tickets in
the box. A special case: if the tickets are all labeled zero or one,
then the expected value is the fraction of the tickets in the box labeled one. So
far, so good? Okay. Alright. What motivated so much of probability in the
first place is gambling, right? How do you figure out what's a fair bet, what's not
a fair bet, how you should place your money? So expected value is intimately
tied to whether a bet is fair or not. The expected amount of money you
lose from each dollar you bet in repeated play is called the house edge. A bet is
fair if the house edge is zero, that is, if on average you would expect to break
even in the long run. If on average you expect to lose money in the long run, the
house has an edge. And in all casino games, there is a house edge: if you play
long enough, you will lose all your money. That's just how it's built, unless
you're cheating, right? Okay, alright, so let's look at the house edge for a
bet on red in roulette. The way a roulette wheel works is there's
a wheel, and it has 38 positions. Eighteen of them are red, eighteen of them are
black, and two of them are green, if I'm remembering correctly, okay? Can we go
to Vegas? Alright.
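Assuming the standard even-money payoff on a bet on red (win a dollar with chance 18/38, lose your dollar otherwise), the expected payoff per dollar, and hence the house edge, works out as a one-liner:

```python
# Sketch of the house edge for a bet on red, assuming the standard
# even-money payoff: win $1 with chance 18/38, lose $1 with chance 20/38.
p_win = 18 / 38
expected_payoff = p_win * 1 + (1 - p_win) * (-1)
print(expected_payoff)   # about -0.0526: a bit more than a nickel lost per dollar
```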
Okay. The payoff for winning a bet on red is even money: you double your money.
You get your dollar back and another dollar if the ball lands on red, okay?
Otherwise, you lose your money. So if the chance that
the ball landed on red were 50%, there would be no house edge. But the chance
that the ball lands on red is less than 50%: it's eighteen out of 38, rather than
nineteen out of 38. It would be break-even at nineteen, but only eighteen of the
38 positions are red. So out of each dollar you bet, you
expect to lose a little more than a nickel. And in the long run, if you keep
playing over and over again, with very high probability
your losses are going to grow until you have no more money. What's a fair
bet? A fair bet is one where the expected payoff, the expected amount of
money you get back, is zero after you account for how much money you have to ante up to
play, okay? If your expected payoff is positive, the bet favors you. If the
expected payoff is negative, the bet favors whoever you're playing
against, the house. Okay, let's go over some of this stuff and hint at how you get
some of these results. We already figured out that the expected value of a
binomial random variable with parameters n and p is n times p. It's the
sample sum of the numbers on the tickets that you get if you draw independently with
replacement, little n times, from a 0-1 box that has a fraction p of tickets labeled
one. Okay? Expected value of a geometric random variable:
what's a geometric? You're drawing from a 0-1 box with replacement,
independently, until the first time you see a ticket labeled one. Alright. Why does
it make sense for the expected value to be one over p? I'll give you a heuristic;
I'll spare you the proof, because you really don't want to see it.
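In place of the proof, here's a simulation sketch (p = 0.25 is a made-up value) suggesting that the average wait really is about 1/p:

```python
import random

# Hypothetical parameter; the claim is E[wait] = 1/p = 4 here.
p, trials = 0.25, 100_000
random.seed(2)

def geometric(p):
    """Number of draws up to and including the first ticket labeled one."""
    draws = 1
    while random.random() >= p:   # chance p of success on each draw
        draws += 1
    return draws

avg = sum(geometric(p) for _ in range(trials)) / trials
print(avg, 1 / p)   # avg should be close to 4
```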
The heuristic is: if a fraction p of the tickets are labeled one, then every time you
draw, you get p successes per draw, on average. Does that make sense?
And if you get p successes per draw, then how many draws per success are there? One over
p, loosely speaking. So on average, how many times would you have to draw to get
one success? One over p. That's a loose heuristic; it's not completely
rigorous, and the proof that that's the expected value is a little longer, so you
don't have to know that. Okay, but let's
take it as given that the expected value of a geometric random
variable with parameter p is one over p. From that we can actually derive the fact
that the expected value of a negative binomial random
variable with parameters p and r is r over p. So let's reason that out,
because this reasoning is actually helpful. Remind me, what
kind of thing has a negative binomial distribution? Okay, we've got a 0-1
box with a fraction p of tickets labeled one, and we draw with replacement
until the r-th time we get a ticket labeled one, okay? Let's think about it as
occurring in stages. We draw until the first time we get a ticket labeled one.
Then we draw until the next time we get a ticket labeled one, and then the next,
until we've done that r times. Okay?
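The staging just described can be simulated directly: add up r independent geometric waiting times and compare the average to r/p (the parameters here are made up):

```python
import random

# Hypothetical parameters: wait for the r-th success, chance p per draw.
p, r, trials = 0.25, 3, 100_000
random.seed(3)

def geometric(p):
    """Draws up to and including the first ticket labeled one."""
    draws = 1
    while random.random() >= p:
        draws += 1
    return draws

# A negative binomial draw is the sum of r independent geometric waits.
avg = sum(sum(geometric(p) for _ in range(r)) for _ in range(trials)) / trials
print(avg, r / p)   # avg should be close to 12
```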
What's the distribution of the number of times I have to draw until I get the first
ticket labeled one? It's geometric, right? Now, starting there, what's the
distribution of the number of times I have to draw until I get another ticket labeled
one? It's again geometric, right? I'm waiting from then until the first time I
get a ticket labeled one, and once that's happened, I'm waiting from there
until the next time I get a ticket labeled one. And I have to do that a total
of r times. So I can actually write a negative binomial random variable as the
sum of r geometric random variables. If X1, X2, and so forth on up to Xr are
all independent random variables with the geometric
distribution with parameter p, and we define X to be X1 + ... + Xr, then the
distribution of X is negative binomial with parameters r and p, okay? I wait
until the first time, then I wait until the second time, then I wait
until the third time. Each time, I am drawing with replacement from the same box
of tickets, with the same fraction p of tickets labeled one, and everything is independent,
right? So I can think of waiting for the seventh ticket labeled one as waiting for
the first ticket labeled one and then starting over, and
doing that seven times. Okay, so the
distribution of the sum of r independent geometrics is negative binomial with
parameters r and p, okay? Now, we have that little rule that says the expected
value of a sum is the sum of the expected values. What's the expected value of each
geometric random variable? We've said it's one over p, without
proving it, yeah? The expected value of this one is one over p, et cetera, and the
expected value of this one is one over p. So what's the expected value of their sum?
The expected value of their sum is the sum of their expected values: one
over p, plus one over p, and so on, which adds up to r over p. So, knowing
the geometric result, it's easy to prove the negative binomial result. Okay. The
hypergeometric we already did: that's n draws without replacement from a 0-1 box,
and the average of the labels on the tickets in the box is
G over N, the fraction of tickets labeled one in the box. In general, if we
draw little n times, with or without replacement, from a box of numbered tickets
and add the results that we see, the expected value of that sample
sum is n times the average of the labels on the tickets in the box. The expected
value of the sample mean is that divided by n, which is just the average of the labels
on the tickets in the box. Of course, if we're drawing without replacement, we can't
draw more times than there are tickets in the box. Alright, let's do an example. Okay, so
we've got a loaded die. We're going to roll it six times, and we want to know
the expected number of spots that show in six rolls of this die. We can think of
each roll as giving us a random number of spots, right? And the total number of
spots we see in six rolls is the sum of six random variables that all have the
same distribution: each time you roll it, you get a new number. So each
time we roll the die, it's like drawing from a box
of numbered tickets, where 0.028 times the number of tickets in the box
are labeled one, 0.305 times N of the tickets are labeled six, and each of
two, three, four, and five is on a sixth of the N tickets. Everyone knows
that on a die, the spots on opposite faces add up to
seven? Which is why I've tweaked one and
six here: they're on opposite sides. Okay. Alright, so if I
draw from this box once, what's the expected value of one draw?
It's going to be the average of the numbers on the tickets in the box. 0.028
N of them are labeled one, 0.305 N of them are labeled six, etc. If I wrote them
all out, added them up, and divided by N, what would I get?
1 × 0.028 + 6 × 0.305 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6, right? That would be the
average of the labels on the capital N tickets there are in this box, right?
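Carrying out that arithmetic (using the chances as stated in lecture, which sum to one up to rounding):

```python
# Chances for the loaded die as stated in lecture (they sum to 1 up to rounding):
# one with chance 0.028, six with chance 0.305, two through five each with chance 1/6.
chance = {1: 0.028, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 0.305}

ev_one_roll = sum(face * p for face, p in chance.items())  # average of the labels
ev_six_rolls = 6 * ev_one_roll                             # expected total in six rolls
print(round(ev_one_roll, 3), round(ev_six_rolls, 3))       # about 4.191 and 25.148
```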
There aren't the same number of tickets with every number: two through five
appear equally often, but one and six differ from each other and from those. Right.
Okay, so if I do this sum, that's the expected value of one draw from the box. What's
the expected value of six draws from the box? Six times whatever that gives us,
okay? That's the expected value of the sum of the numbers we get in six draws from
the box, and that's true whether we do it with or without replacement. We've got
like two minutes. Let's just hint at the beginning
of exercise 18-7. I'm going to roll a fair die. If the die shows three spots, I'm
then going to roll three more dice; if the die shows one spot, I'm going to roll one
more die. Okay: I'm going to roll as many dice as show the first time that I roll.
So I'm rolling a random number of dice, right? What's the expected
value of the total number of spots rolled in the second part of the
experiment, not including the roll that determined how many dice we're
going to roll? How do I do this?
Well, the first roll might give me one spot. What's the chance it gives me
one spot? One-sixth, okay. If it does give me one spot, then I'm going to roll one
die. If I roll one die, I could get any number of spots between one and
six: probability a sixth that I get one spot, probability a sixth that I
get two spots, etcetera. It's like one draw from a box of tickets numbered
one through six. Okay? So if I knew that I was only rolling one die, the
expected value of what I would get would be (1 + 2 + 3 + 4 + 5 + 6)/6, which is
three and a half, okay? Remember, opposite faces sum to seven. Okay. Suppose
instead that I get two spots; the probability of that is a sixth. If I
got two spots, I'd be rolling two dice. What's the expected number of spots I
would see if I did that? It would be two times three and a half, which is seven.
Right? Etcetera. Okay, so what's the expected number of spots that show on the
die or dice rolled the second time? There's a one-sixth chance of
getting three and a half, a one-sixth chance of getting seven, a one-sixth chance of
getting ten and a half, and so on, so we
add these numbers and divide by six. There's a little subtlety here,
because we're conditioning on how many spots show up the
first time we roll the die; the detailed solution makes it a little more
rigorous. Alright, I should let you guys go.
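The exercise-18-7 calculation hinted at above can be written out: conditioning on the first roll showing k spots, the expected total on the k additional fair dice is 3.5k, and we average those over k. A quick sketch:

```python
# Expected spots when rolling k fair dice is 3.5 * k; the first roll
# shows each k from one to six with chance 1/6.
ev = sum((1 / 6) * (3.5 * k) for k in range(1, 7))
print(ev)   # 12.25
```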