1 5 Discrete probability crash course cont 14 min

In this segment, we're gonna continue with a few more tools from discrete probability, and I want to remind everyone that if you wanna read more about this, there's more information in wiki books article that is linked over here. So first let's do a quick recap of where we are. We said that the discrete probability is always defined over a finite set, which we're gonna denote by U, and typically for us, U is going to be the set of all N bit binary strings, which we denote by zero 130 N. Now a probability distribution P over this universe U is basically a function that assigns to every element in the universe a weight in the interval zero to one, such that if we sum the weight of all these elements, the sum basically sums up to one. Now we have said that subset of the universe is what called an event, and we said that probability of an event is basically the sum of all the weights of all the elements in the event and we said that probability of an event is some real numbers in the interval zero to one And I would remind everyone the basically probability of the entire universe is basically the one by the fact that sum of all the weights sums up to one. Then we define what a random variable is Formally, a random variable is a, is a function from the universe of some other sets But the thing that I want you to remember is that the random variable takes, values in some set v And, in fact, the random variable defines the distribution on this set v. So the next concept we need is what's called independence And I'm gonna very briefly define this If you want to read more about independence, please go ahead and look at the wiki books article. But essentially we say that two events A and B are independent of one another if When you know that event A happens, that tells you nothing about whether event B actually happened or not. Formally, the way we define independence is to say that, the probability of A and B, namely, that both events happened, is actually=to the probability of event A the probability of event B So mult iplication, in some sense, the fact that probabilities multiply under conjunction, captures the fact that these events are independent And as I said, if you wanna read more about that, please take a look at the background material. Now the same thing can be said for random variables. So suppose we have two random variables x and y. They take values in some set v. Then we say that these random variables are independent if the probability that x = a, and y = b is equal to the product of these two probabilities. Basically what this means is, even if you know that x = a, that tells you nothing about the value of y. Okay, that, that's what this multiplication means And again this needs to hold for all A and B in the space of values of these random variables So, just again to job your memory If you've seen this before, a very quick example. Suppose we look at the, set of, of two bit strings So, zero, zero, zero, one, one zero and, one, one And suppose we choose a random, from this set. Okay so we randomly choose one of these four elements with equal probability. Now let's define two random variables. X is gonna be the least significant bit that was generated, and Y is gonna be the most significant bit generated. So I claim that, these random variables, x and y, are independent of one another. And the way to see that intuitively, is to realize that choosing r uniformly, from the set of four elements is basically the same as flipping a coin An unbiased coin twice. The first bit corresponds to the outcome of the first flip and the second bit corresponds to the outcome of the second flip And of course there are four possible outcomes. All four outcomes are equally likely which is why we get the uniform distributions over two bit strings Now our variables X and Y. Y the independents Basically if I tell you result of the first flip namely I tell you the lest signify bit of R So I tell you how the first coin you know whether it fell on its head or fell on its tails. That tells you nothing about the result of the second flip. Which is why intuitively, you might, you might expect these random variables to be independent of one another. But formally, we would have to prove that, for, all 01 pairs, the probability of, x=0 and y=0, x=1, y=1, and so on. These probabilities multiply. Let's just do it for one of these pairs. So let's look at the probability that x is = to zero, and y is = to zero. Well, you see that the probability that x is equal to zero and y is equal to zero is basically the probability that r is equal to zero, zero And what's the probability that r is equal to zero, zero? Well, by the uniform distribution, that's basically equal to one-fourth. What it's one over side of the set which is one fourth in this case And well low and behold that's in fact these probably provability multiply Because again the probability that X is equal to zero. The probability that lest signify bit of R is equal to zero. This provability is exactly one half because there is exactly two elements that have their, lest signify bit equals to zero. Two out of four elements gives you a provability of one half And similarly the probability that Y is equal to zero is also one half so in fact the provability multiplies. Okay, so that's, this concept of independence And the reason I wanted to show you that is because we're gonna look at an, an important property of XOR that we're gonna use again and again. So before we talk about XOR, let me just do a very quick review of what XOR is. So, of course, XOR of two bits means the addition of those bits, modular two. So just too kind of, make sure everybody's on the same page If we have two bits, so 0001, ten and eleven. Their XOR or the truth table of the XOR is basically just the addition modular two. As you can see, one+1 is= to two, modular two. That's=to zero. So this is the truth table for [inaudible] XOR And I'm always going to denote XOR by the circle with a + inside And then when I wanna apply XOR to bit strings, I apply the, addition modular two operation, bitwise. So, for example, the XOR of these two strings would be, 110, and I guess I'll let you fill out the rest of the XORs, just to make sure we're all on the same page. So of course is comes out to one, one zero one. Now, we're gonna be doing a lot of XORing in this class. In fact, there's a classical joke that the only think cryptographers know how to do is just XOR things together But I want to explain to you why we see XOR so frequently in cryptography. Basically, XOR has a very important property, and the property is the following. Suppose we have a random variable y. That's distributed arbitrarily over 01 to the n. So we know nothing about the distribution of y But now, suppose we have an independent random variable that happens to be uniformly distributed also over 01 to the n. So it's very important that x is uniform. N's independent of y. So now let's define the random variable which is the XOR of x and y. Then I claim that no matter what distribution y started with, this z is always going to be a uniform, random variable. So in other words if I take an arbitrarily malicious distribution and I XOR with independent uniform random variable what I end up with is a uniform random variable. Okay and this again is kind of a key property that makes x or very useful for crypto. So this is actually a very simple factor to prove, let's just go ahead and do it, let's just prove it for one bit so for n = one. Okay, so the way we'll do it is we'll basically write out the probability distributions for the various random variables. So let's see, for the random variable y. Well, the random variable can be either zero or one. And let's say that P0 is the probability that it's = to zero, and P1 is the probability that it's =to one. Okay, so that's one of our tables. Similarly, we're gonna have a table for the variable x. Well, the variable x is much easier. That's a uniform random variable. So the probability that it's=to zero is exactly one half The probability that's it's=to one is also exactly one half. Now, let's write out the probabilities for the join di stribution. In, in other words, let's see what the probability, is for the various, joint values of y and x. In other words, how likely is, it that y is zero, and x is zero. Y is zero, and x is one. Y is one and x is zero, and eleven. Well, so what we do is, basically, because we assume the variables are independent, all we have to do is multiply the probabilities. So The probability that y is equal to zero is p zero. Probability that x is equal to zero is one-half. So the proximity that, we get 00 as exactly p zero over two. Similarly for zero one we'll get p zero over two, for one zero we'll get p one over two And for 1-1, again, the probability that y is=to one, and x is=to one, Well, that's P1 the probability that x is=to one, which is a half, so it's P1 over two. Okay? So those are the four, probabilities for the various options for x and y. So now, let's analyze, what is the probability that z is equal to zero? Well, the probability that z is=to zero is basically the same as the probability that, let's write it this way, xy. Is=to 00. Or xy is=to, eleven. Those are the two possible cases that z is=to zero Because z is the XOR of x and y. Now, these events are disjoint, so, this expression can simply be written as the sum of the two expressions given above. So basically, it's the probability that xy is=to 00, plus the probability that xy, is=to eleven. So now we can simply look up these probabilities in our table. So the probability that xy is equal to 00 is simply p zero over two, and the probability that xy is equal to one, one is simply p one over two. So we get P0 over two +P1 over two But what do we kn-, what do we know about P0 and P1? Well, it's a probability distribution. Therefore, P0+P1 must =one And therefore, this fraction here must= to a half. P0+P1 is =to one. So therefore, the sum of these two terms must = a half And we're done. Basically, we proved that the probability that z is = to zero. Is one half, therefore the probability that z is equal to one is also one half. Therefore z is a unif orm random variable. So the simple theorem is the main reason why x o is so useful in cartography. The last thing I wanna show you about discreet probability is what's called the birthday paradox And I'm gonna do it really fast here Because we're gonna come back later on, and talk about this in more detail. So, the birthday paradox says the following suppose I choose n random variables in our universe u. Okay And it just so happens that these variables are independent of one another. They actually don't have to be uniform. All we need to assume is that they're distributed in the same way. The most important property though is that they're independent of one another. So the theorem says that if you choose roughly the square root of the size of u elements, we're kind of this one point two here, it doesn't really matter. But if you choose square root of the size of u elements, then basically there's a good chance that there are two elements that are the same. In other words, if you sample about square root a few times, then it's likely that two of your samples. [inaudible] will be equal to each other. And by the way, I should point out that this inverted e, just means exists. So there exists in [inaudible] I and j such that r I is equal to r j. So here's a concrete example. We'll actually see many, many times. Suppose our universe consist of all strings of length of one hundred and twenty eight bits. So the size of you is gigantic it's actually two to the hundred and twenty eight. It's a very, very large set But it so happens if you sample say around two the sixty four times from the set. This is about the square root of U then is very likely that two of your sample messages will actually be the same. So why is, this called the paradox? Well, traditionally it's described in terms of people's birthdays. So you would think that each of these samples would be someone's birthday And so the question is how many people need to get together so that there are two people that have the same birthday? So, just as a simple cal culation you can see there are 365 days in the year, so you would need about 1.2 times the square root of 365 people until the probability that two of them have the same birthday is more than a half. This if I'm not mistaken is about 24, which means that if 24 random people get together in a room it's very likely that two of them will actually have the same birthday. This is why it's called a paradox, because 24 supposedly is a smaller number than you would expect. Interestingly, people's birthdays are not actually uniform across all 365 days of the year. There's actually a bias towards December. >> But, I guess that's not, that's not relative to the discussion here. >> The last thing I wanted to do is just show you the birthday paradox a bit more concretely. So, suppose we have a universe of about a million samples, then you can see that when we sample roughly 1200 times, the probability that we get, we sample the same number, the same element twice is roughly half But the probability of sampling the same number twice actually converges very quickly to one. You can see that if we about 2200 items, then the probability that two of those items are the same, already is 90 percent and You know, 3000 then it's basically one. So this conversion is very quickly to one as soon as he goes beyond the square root of the size of the universe. So we're gonna come back and study the birthday paradox in more detail later on, but I just for now, wanted you to know what it is. So that's the end of this segment, and then in the next segment we'll start with our first example of encryption systems. [sound]