Hi there, thanks for joining me. So today I wanted to talk about Markov's inequality,
and we're actually going to go ahead and prove Markov's inequality. But
just to provide you with some context as to why I am introducing Markov's inequality:
it's because we are actually going to use it to prove another inequality in statistics,
called Chebyshev's inequality,
which in turn we can use to prove the Law of Large Numbers.
Ok, so what does Markov's inequality actually say? Well, stated mathematically,
it says that the probability that some non-negative random variable x is greater than or equal to some
constant a
is less than or equal to the expected value of x,
our random variable, divided by a, the
constant.
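Written out in standard notation (the non-negativity of x and positivity of a are the usual hypotheses of the inequality, made explicit here), the statement is:

```latex
P(x \ge a) \;\le\; \frac{\mathbb{E}[x]}{a},
\qquad x \ge 0,\; a > 0.
```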
So, how are we going to go ahead and prove this? Well, we need to go ahead and create
a function, and
our function is going to look something like this.
For values of x which are less than a,
our function is going to take on the value zero.
So it is just going to run along the floor of the x-axis,
and then
for values of x which are greater than or equal to a,
it is actually going to take on the value one.
So it looks something like this. This function
is a step function, and it is actually called
the 'indicator function'. We write it as a one with the condition as a subscript: 1_{x ≥ a}.
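As a worked equation, the indicator function just described is:

```latex
\mathbf{1}_{x \ge a} \;=\;
\begin{cases}
0, & x < a,\\[2pt]
1, & x \ge a.
\end{cases}
```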
Ok, so that's our indicator function.
What happens to that indicator function if we multiply it by a constant a?
(This 'a' is just the constant we are thinking about.)
Well, it doesn't change the value of the function where x is less than a:
it still runs along the x-axis,
it's still equal to zero.
But for values of x which are greater than or equal to a,
our function no longer takes the value one; it now takes the value 'a'.
So it still looks very much like a step function, it's just a step function with the value 'a'
for x greater than or equal to 'a'.
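In the same notation, the scaled function is:

```latex
a\,\mathbf{1}_{x \ge a} \;=\;
\begin{cases}
0, & x < a,\\[2pt]
a, & x \ge a.
\end{cases}
```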
Ok, so we can think about this function as having two distinct regions.
In the first region we have that x is less than a,
and in the second region we have that x is greater than or equal to a.
Why I have introduced these two regions should become apparent in the next part of this proof.
Ok, so if we think about region one,
what value does our function take? Well, our function,
a times 1_{x ≥ a},
takes the value zero.
And,
since we know x is
greater than or equal to zero in this region (remember, x is a non-negative random variable),
we know that this is less than or equal to x.
Ok, so that was simple enough, because we know that here, for example,
x might be equal to one, whereas a times our indicator function is equal to zero
in this region.
Ok,
and then in the second region we have that x is greater than or equal to a. Well,
a times our indicator function
is no longer zero: in this case it is equal to a.
And what values does x take on? Well, at x = a it takes the value a,
and for any
value of x greater than a, it is obviously greater than a.
So, in fact, in the second region we also have that a is less than or equal to x.
Ok, so in both regions we have shown that
x is greater than or equal to a times our indicator function.
Ok, so why have we done that?
Well, it becomes apparent when we think about
writing out our
inequality for all regions.
Since we have proved it for both regions, we can write that a times our indicator function
is always less than or equal to x.
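Putting the two regions together, we get the key pointwise inequality: the left-hand side equals 0 ≤ x in region one and a ≤ x in region two, so

```latex
a\,\mathbf{1}_{x \ge a} \;\le\; x
\qquad \text{for all } x \ge 0.
```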
Ok,
so
this is starting to look somewhat similar to what we have above,
but I've got this indicator function here rather than a probability.
Well, that's easily turned into a probability.
If I just take the expectation of both sides, then a times the
expected value of our indicator function
has got to be less than or equal to the expected value
of x.
But in fact, this expectation of our indicator function is
really the definition of a probability.
Because we know the indicator only takes on two values: it takes the value one
if x is greater than or equal to a, and the value zero otherwise.
So you can think of the expected value
of our indicator function as some number between
zero and one; it can't be anything else.
And in fact it is exactly the probability that x is greater than or equal to a. So
we can write this now
as: a
times the probability that x is greater than or equal to a
has got to be less than or equal to the expected value of
x.
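As a worked chain of equations (using linearity of expectation, and the fact that the expectation of an indicator is the probability of its event), this step is:

```latex
\mathbb{E}\!\left[a\,\mathbf{1}_{x \ge a}\right]
\;=\; a\,\mathbb{E}\!\left[\mathbf{1}_{x \ge a}\right]
\;=\; a\,P(x \ge a)
\;\le\; \mathbb{E}[x].
```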
Then, if we just divide both sides by our constant a, we get that
the probability that x is greater than or equal to a
is less than or equal to the expected value of x
divided by a, and that is identical to what we had at the top. So we have gone ahead and
proven Markov's inequality.
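As a quick numerical sanity check (this is not from the video; the exponential distribution, seed, and sample size are my own illustrative choices), we can verify empirically that the bound holds:

```python
import random

# Empirically check Markov's inequality, P(X >= a) <= E[X] / a,
# using an exponential random variable (non-negative, as the inequality requires).
random.seed(0)
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]  # E[X] = 1 for rate 1

mean = sum(samples) / n
for a in (0.5, 1.0, 2.0, 4.0):
    prob = sum(1 for x in samples if x >= a) / n  # empirical P(X >= a)
    bound = mean / a                              # Markov bound E[X] / a
    print(f"a={a}: P(X >= a) ~ {prob:.4f} <= bound {bound:.4f}")
```

Note the bound is often loose (for a = 0.5 it exceeds 1, which is trivially true of any probability); its value is that it holds for every non-negative random variable using only the mean.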
In the next-but-one video we are going to go ahead and apply Markov's inequality,
and it will enable us to prove Chebyshev's inequality.
And in the video after that, we are going to use Chebyshev's inequality to prove
the Law of Large Numbers.
I'll see you then.