Tip:
Highlight text to annotate it
X
In this segment, we're gonna continue with a few more tools from discrete
probability, and I want to remind everyone that if you wanna read more about this,
there's more information in wiki books article that is linked over here. So first
let's do a quick recap of where we are. We said that the discrete probability is
always defined over a finite set, which we're gonna denote by U, and typically for
us, U is going to be the set of all N bit binary strings, which we denote by zero
130 N. Now a probability distribution P over this universe U is basically a
function that assigns to every element in the universe a weight in the interval zero
to one, such that if we sum the weight of all these elements, the sum basically sums
up to one. Now we have said that subset of the universe is what called an event, and
we said that probability of an event is basically the sum of all the weights of
all the elements in the event and we said that probability of an event is some real
numbers in the interval zero to one And I would remind everyone the basically
probability of the entire universe is basically the one by the fact that sum of
all the weights sums up to one. Then we define what a random variable is Formally,
a random variable is a, is a function from the universe of some other sets But the
thing that I want you to remember is that the random variable takes, values in some
set v And, in fact, the random variable defines the distribution on this set v. So
the next concept we need is what's called independence And I'm gonna very briefly
define this If you want to read more about independence, please go ahead and look at
the wiki books article. But essentially we say that two events A and B are
independent of one another if When you know that event A happens, that tells you
nothing about whether event B actually happened or not. Formally, the way we
define independence is to say that, the probability of A and B, namely, that both
events happened, is actually=to the probability of event A the probability of
event B So mult iplication, in some sense, the fact that probabilities multiply under
conjunction, captures the fact that these events are independent And as I said, if
you wanna read more about that, please take a look at the background material.
Now the same thing can be said for random variables. So suppose we have two random
variables x and y. They take values in some set v. Then we say that these random
variables are independent if the probability that x = a, and y = b is equal
to the product of these two probabilities. Basically what this means is, even if you
know that x = a, that tells you nothing about the value of y. Okay, that, that's
what this multiplication means And again this needs to hold for all A and B in the
space of values of these random variables So, just again to job your memory If
you've seen this before, a very quick example. Suppose we look at the, set of,
of two bit strings So, zero, zero, zero, one, one zero and, one, one And suppose we
choose a random, from this set. Okay so we randomly choose one of these four elements
with equal probability. Now let's define two random variables. X is gonna be the
least significant bit that was generated, and Y is gonna be the most significant bit
generated. So I claim that, these random variables, x and y, are independent of one
another. And the way to see that intuitively, is to realize that choosing r
uniformly, from the set of four elements is basically the same as flipping a coin
An unbiased coin twice. The first bit corresponds to the outcome of the first
flip and the second bit corresponds to the outcome of the second flip And of course
there are four possible outcomes. All four outcomes are equally likely which is why
we get the uniform distributions over two bit strings Now our variables X and Y. Y
the independents Basically if I tell you result of the first flip namely I tell you
the lest signify bit of R So I tell you how the first coin you know whether it
fell on its head or fell on its tails. That tells you nothing about the result of
the second flip. Which is why intuitively, you might, you might expect these random
variables to be independent of one another. But formally, we would have to
prove that, for, all 01 pairs, the probability of, x=0 and y=0, x=1, y=1, and
so on. These probabilities multiply. Let's just do it for one of these pairs. So
let's look at the probability that x is = to zero, and y is = to zero. Well, you see
that the probability that x is equal to zero and y is equal to zero is basically
the probability that r is equal to zero, zero And what's the probability that r is
equal to zero, zero? Well, by the uniform distribution, that's basically equal to
one-fourth. What it's one over side of the set which is one fourth in this case And
well low and behold that's in fact these probably provability multiply Because
again the probability that X is equal to zero. The probability that lest signify
bit of R is equal to zero. This provability is exactly one half because
there is exactly two elements that have their, lest signify bit equals to zero.
Two out of four elements gives you a provability of one half And similarly the
probability that Y is equal to zero is also one half so in fact the provability
multiplies. Okay, so that's, this concept of independence And the reason I wanted to
show you that is because we're gonna look at an, an important property of XOR that
we're gonna use again and again. So before we talk about XOR, let me just do a very
quick review of what XOR is. So, of course, XOR of two bits means the addition
of those bits, modular two. So just too kind of, make sure everybody's on the same
page If we have two bits, so 0001, ten and eleven. Their XOR or the truth table of
the XOR is basically just the addition modular two. As you can see, one+1 is= to
two, modular two. That's=to zero. So this is the truth table for [inaudible] XOR And
I'm always going to denote XOR by the circle with a + inside And then when I
wanna apply XOR to bit strings, I apply the, addition modular two operation,
bitwise. So, for example, the XOR of these two strings would be, 110, and I guess
I'll let you fill out the rest of the XORs, just to make sure we're all on the
same page. So of course is comes out to one, one zero one. Now, we're gonna be
doing a lot of XORing in this class. In fact, there's a classical joke that the
only think cryptographers know how to do is just XOR things together But I want to
explain to you why we see XOR so frequently in cryptography. Basically, XOR
has a very important property, and the property is the following. Suppose we have
a random variable y. That's distributed arbitrarily over 01 to the n. So we know
nothing about the distribution of y But now, suppose we have an independent random
variable that happens to be uniformly distributed also over 01 to the n. So it's
very important that x is uniform. N's independent of y. So now let's define the
random variable which is the XOR of x and y. Then I claim that no matter what
distribution y started with, this z is always going to be a uniform, random
variable. So in other words if I take an arbitrarily malicious distribution and I
XOR with independent uniform random variable what I end up with is a uniform
random variable. Okay and this again is kind of a key property that makes x or
very useful for crypto. So this is actually a very simple factor to prove,
let's just go ahead and do it, let's just prove it for one bit so for n = one. Okay,
so the way we'll do it is we'll basically write out the probability distributions
for the various random variables. So let's see, for the random variable y. Well, the
random variable can be either zero or one. And let's say that P0 is the probability
that it's = to zero, and P1 is the probability that it's =to one. Okay, so
that's one of our tables. Similarly, we're gonna have a table for the variable x.
Well, the variable x is much easier. That's a uniform random variable. So the
probability that it's=to zero is exactly one half The probability that's it's=to
one is also exactly one half. Now, let's write out the probabilities for the join
di stribution. In, in other words, let's see what the probability, is for the
various, joint values of y and x. In other words, how likely is, it that y is zero,
and x is zero. Y is zero, and x is one. Y is one and x is zero, and eleven. Well, so
what we do is, basically, because we assume the variables are independent, all
we have to do is multiply the probabilities. So The probability that y
is equal to zero is p zero. Probability that x is equal to zero is one-half. So
the proximity that, we get 00 as exactly p zero over two. Similarly for zero one
we'll get p zero over two, for one zero we'll get p one over two And for 1-1,
again, the probability that y is=to one, and x is=to one, Well, that's P1 the
probability that x is=to one, which is a half, so it's P1 over two. Okay? So those
are the four, probabilities for the various options for x and y. So now, let's
analyze, what is the probability that z is equal to zero? Well, the probability that
z is=to zero is basically the same as the probability that, let's write it this way,
xy. Is=to 00. Or xy is=to, eleven. Those are the two possible cases that z is=to
zero Because z is the XOR of x and y. Now, these events are disjoint, so, this
expression can simply be written as the sum of the two expressions given above. So
basically, it's the probability that xy is=to 00, plus the probability that xy,
is=to eleven. So now we can simply look up these probabilities in our table. So the
probability that xy is equal to 00 is simply p zero over two, and the
probability that xy is equal to one, one is simply p one over two. So we get P0
over two +P1 over two But what do we kn-, what do we know about P0 and P1? Well,
it's a probability distribution. Therefore, P0+P1 must =one And therefore,
this fraction here must= to a half. P0+P1 is =to one. So therefore, the sum of these
two terms must = a half And we're done. Basically, we proved that the probability
that z is = to zero. Is one half, therefore the probability that z is equal
to one is also one half. Therefore z is a unif orm random variable. So the simple
theorem is the main reason why x o is so useful in cartography. The last thing I
wanna show you about discreet probability is what's called the birthday paradox And
I'm gonna do it really fast here Because we're gonna come back later on, and talk
about this in more detail. So, the birthday paradox says the following
suppose I choose n random variables in our universe u. Okay And it just so happens
that these variables are independent of one another. They actually don't have to
be uniform. All we need to assume is that they're distributed in the same way. The
most important property though is that they're independent of one another. So the
theorem says that if you choose roughly the square root of the size of u elements,
we're kind of this one point two here, it doesn't really matter. But if you choose
square root of the size of u elements, then basically there's a good chance that
there are two elements that are the same. In other words, if you sample about square
root a few times, then it's likely that two of your samples. [inaudible] will be
equal to each other. And by the way, I should point out that this inverted e,
just means exists. So there exists in [inaudible] I and j such that r I is equal
to r j. So here's a concrete example. We'll actually see many, many times.
Suppose our universe consist of all strings of length of one hundred and
twenty eight bits. So the size of you is gigantic it's actually two to the hundred
and twenty eight. It's a very, very large set But it so happens if you sample say
around two the sixty four times from the set. This is about the square root of U
then is very likely that two of your sample messages will actually be the same.
So why is, this called the paradox? Well, traditionally it's described in terms of
people's birthdays. So you would think that each of these samples would be
someone's birthday And so the question is how many people need to get together so
that there are two people that have the same birthday? So, just as a simple cal
culation you can see there are 365 days in the year, so you would need about 1.2
times the square root of 365 people until the probability that two of them have the
same birthday is more than a half. This if I'm not mistaken is about 24, which means
that if 24 random people get together in a room it's very likely that two of them
will actually have the same birthday. This is why it's called a paradox, because 24
supposedly is a smaller number than you would expect. Interestingly, people's
birthdays are not actually uniform across all 365 days of the year. There's actually
a bias towards December. >> But, I guess that's not, that's not relative to the
discussion here. >> The last thing I wanted to do is just show you the birthday
paradox a bit more concretely. So, suppose we have a universe of about a million
samples, then you can see that when we sample roughly 1200 times, the probability
that we get, we sample the same number, the same element twice is roughly half But
the probability of sampling the same number twice actually converges very
quickly to one. You can see that if we about 2200 items, then the probability
that two of those items are the same, already is 90 percent and You know, 3000
then it's basically one. So this conversion is very quickly to one as soon
as he goes beyond the square root of the size of the universe. So we're gonna come
back and study the birthday paradox in more detail later on, but I just for now,
wanted you to know what it is. So that's the end of this segment, and then in the
next segment we'll start with our first example of encryption systems. [sound]