In the last lecture, I introduced certain characteristics of probability distributions,
such as the expected value, that is the mean, the variance, and also some higher order moments.
The mean of a distribution is a measure of central tendency, or
the measure of location, for the distribution; the variance or the standard deviation tells
us about the variability of the values of the distribution.
We may also be interested in some further characteristics of the probability distribution
such as its skewness.
Let us define what skewness is, so consider a distribution of this type. Let us consider
another distribution, and let us consider a third one. Now, if we compare the shapes of the curves,
the first reaction after looking at the first curve is that it is symmetric about a certain
axis, say mu. If we look at the second curve, then there is a lot of concentration of probability
on the left hand side and a long tail on the right side; that means there
is a long tail to the right of the mean of the distribution. Whereas, if we look at the
third curve, then there is a long tail to the left; that means there is more
concentration of values on the right side and a large variation towards the
left of the mean. So we will call the first one a symmetric curve. We considered the definition
of a symmetric distribution earlier: for symmetry about a point mu, the probability that X is less than or equal
to mu minus x equals the probability that X is greater than or equal to mu plus x, for every x. In particular, if
the distribution is symmetric about 0, then the probability that X is less than or equal to minus x equals the probability
that X is greater than or equal to x. If it is not symmetric, we will call it skewed; the second curve will be
called positively skewed, and the third one negatively skewed. A measure for this can be defined in terms of
say, beta 1, which is equal to mu 3 divided by mu 2 to the power 3 by 2.
We divide by mu 2 to the power 3 by 2, that is, sigma cube, where
sigma denotes the standard deviation of the distribution; this is to make the coefficient free from
the units of measurement. So, if beta 1 is 0 we have symmetry, if it is greater than
0 the distribution is positively skewed, and if it is less than 0 it is negatively skewed.
We also define another characteristic called the peakedness of a distribution. Compare the
three curves: one has a high peak, one is somewhere in the middle, or average
or normal, and one is a rather flat curve. We call this property kurtosis. So,
the middle one we call a normal peak, the high-peaked one is called leptokurtic, and the flat one is
called platykurtic. A measure of kurtosis or peakedness is defined
to be beta 2 equal to mu 4 divided by mu 2 square, minus 3; so if it is 0 we have a normal peak,
if it is greater than 0 the distribution is leptokurtic, and if it is less than 0 we call it platykurtic.
The normal distribution, which will be defined later on, has the coefficient
beta 2 equal to 0; so the peak of any distribution is in effect compared with the peak of a normal
distribution.
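As an aside (a sketch added for illustration, not part of the lecture), these two coefficients can be estimated from a sample using the sample central moments; the function name and the use of NumPy are my own choices.

```python
import numpy as np

def skewness_kurtosis(x):
    """Estimate beta_1 = mu_3 / mu_2^(3/2) and beta_2 = mu_4 / mu_2^2 - 3
    from the sample central moments."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    m2 = np.mean((x - mu) ** 2)   # second central moment (variance)
    m3 = np.mean((x - mu) ** 3)   # third central moment
    m4 = np.mean((x - mu) ** 4)   # fourth central moment
    beta1 = m3 / m2 ** 1.5        # skewness: 0 for a symmetric distribution
    beta2 = m4 / m2 ** 2 - 3      # excess kurtosis: 0 for a normal peak
    return beta1, beta2

rng = np.random.default_rng(0)
print(skewness_kurtosis(rng.normal(size=100000)))       # both close to 0
print(skewness_kurtosis(rng.exponential(size=100000)))  # positively skewed, leptokurtic
```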
Now, we have already seen that sometimes moments of a distribution may not exist, or a lower order
moment may exist while higher order moments do not. We have a general result in this direction.
If the moment of order t, where t is greater than 0, exists, then the moment of order s,
where 0 is less than s is less than t, also exists for the given random variable X. So, if a positive
order moment exists, then all lower order positive moments exist for the given
random variable. Let us look at the proof of this result. For
convenience, let me take X to be a continuous random variable with, say, probability density function
f(x). Let us write down the expectation of modulus x to the power s; this is equal to the integral
from minus infinity to infinity of modulus x to the power s times f(x) d x. We split this into two
regions: modulus x less than or equal to 1, and modulus x greater than 1. In the region where modulus x
is less than or equal to 1, I can replace modulus x to the power s by 1, so that piece is less than or
equal to the integral over modulus x less than or equal to 1 of f(x) d x, which is nothing but the
probability that modulus x is less than or equal to 1. In the region where modulus x is greater than 1,
if I replace the power s by the power t, I get a bigger quantity, so that piece is at most the integral
of modulus x to the power t times f(x) d x, which is in turn at most the expectation of modulus x to the
power t. So the first term is less than or equal to 1, and the second term is at most the expectation of
modulus x to the power t.
Since we are assuming that the moment of order t exists, this is finite, and therefore the expectation
of modulus x to the power s is finite; that means the moment of order s exists, this being
the condition for existence of the moment of order s.
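For reference, the chain of inequalities in this proof can be written compactly as follows (this is only a restatement of the argument above):

```latex
E|X|^s = \int_{|x|\le 1} |x|^s f(x)\,dx + \int_{|x|>1} |x|^s f(x)\,dx
       \le P(|X|\le 1) + \int_{|x|>1} |x|^t f(x)\,dx
       \le 1 + E|X|^t < \infty .
```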
Now, sometimes when the moments do not exist, it may be difficult to find the measures
of central tendency or location, the measures of variability, or, say, the measures
of symmetry or kurtosis, etcetera. So, we may instead look at points on the distribution
itself which divide the curve into regions with certain proportions; these are
called quantiles of the distribution.
To explain the concept, let us consider some distribution with a particular shape. Suppose
I have a point here, let us call it a, and the probability that X is less than
or equal to a is equal to p; that means this area is p and this area is 1 minus
p. Then a is called the pth quantile. It is easy to explain this through the concept of the median:
the median divides the distribution into two parts, the probability is half in
this portion and half in that portion. So, roughly speaking, the pth quantile
is the point up to which the probability of the random variable taking a value is p, and the
probability beyond that is 1 minus p. However, to take care of discrete distributions,
we give the formal definition of a quantile as follows: A number, let me call it Q p satisfying
probability x less than or equal to Q p greater than or equal to p, and probability x greater
than or equal to Q p greater than or equal to 1 minus p, for 0 less than p less than
1 is called p th quantile or quantile of order p of the distribution of x.
So, obviously, if F is an absolutely continuous distribution function, then you will have
F of Q p equal to p, and there will be a unique quantile. Q half is called
the median of X; we also use the notation m for it. Q 1 by 4, Q half, and Q 3 by 4 are called
the quartiles of X.
We also define, say, Q 1 by 10, Q 2 by 10, and so on up to Q 9 by 10; these are called
deciles. Similarly, Q 1 by 100, Q 2 by 100, and so on are called percentiles.
That means, whether we want to divide the distribution into ten parts, into four
parts, or into a hundred parts, etcetera, we have different notations.
In various problems, we are interested in different kinds of quantiles. For example,
in various studies we may be interested in the percentage of people living below the poverty
line, etcetera; this is some particular percentile. But suppose we say that 25 percent of the
people lie below a certain value, or that 75 percent of the items are above something; then
we are talking about quartiles. Let us explain through certain examples. Let
us consider f x is equal to 1 by pi 1 by 1 plus x square, minus infinity less than x
less than infinity. We have seen that for this distribution the mean does not exist;
therefore, there is no question of higher order moments existing either. However, if we
look at F(x), then it is equal to 1 by pi times (tan inverse x plus pi by 2). So, if we solve
F(m) equal to half, this corresponds simply to m equal to 0, which is clear
if we plot this distribution, since it is symmetric about the point 0. So, if the distribution
is symmetric about a given point, then that point will actually be the median of the
distribution. Now, we can also calculate the other quantiles here.
Suppose we set F(Q 1) equal to 1 by 4; then this gives 1 by pi times (tan inverse
x plus pi by 2) equal to 1 by 4, which means tan inverse x is equal to minus pi by 4, that
is, x is equal to minus 1. So Q 1, the first quartile of this distribution, is minus 1,
and the second quartile, that is the median, is 0. In a similar way, if I set F(Q 3) equal
to 3 by 4, then this gives Q 3 equal to plus 1. So, we are able to determine these measures
on the curve, so it roughly tells that 25 percent of the observations lie below minus
1 and 25 percent of the observations lie between minus 1 and 0, 25 percent of the observations
lie between 0 and 1, and 25 percent of the observations lie beyond 1. So, it has a very
long tail because, in fact 50 percent of the probability is between minus 1 to 1 and rest
50 percent is dispersed over minus infinity to minus 1 and 1 to infinity.
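As a quick numerical check (a sketch added here, not from the lecture; the helper names are my own), the quartiles can be obtained by inverting F(x) = (1/pi)(arctan x + pi/2) directly:

```python
import math

def cauchy_cdf(x):
    """Standard Cauchy c.d.f.: F(x) = (1/pi) * (arctan(x) + pi/2)."""
    return (math.atan(x) + math.pi / 2) / math.pi

def cauchy_quantile(p):
    """Invert F directly: Q(p) = tan(pi * (p - 1/2)) for 0 < p < 1."""
    return math.tan(math.pi * (p - 0.5))

for p in (0.25, 0.5, 0.75):
    q = cauchy_quantile(p)
    print(p, q, cauchy_cdf(q))
# Q(1/4) = -1, Q(1/2) = 0, Q(3/4) = 1, as computed in the lecture.
```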
Let us consider, say, the probability that X equals minus 2 and the probability that X equals 0
to be 1 by 4 each, the probability that X equals 1 to be 1 by 3, and the probability
that X equals 2 to be 1 by 6; this is a discrete distribution concentrated
on the 4 points minus 2, 0, 1, and 2. If we apply the definition of the median,
probability x less than or equal to m is greater than or equal to half, and probability x greater
than or equal to m is also greater than or equal to half, so median must satisfy this
condition. So, you look at which points satisfy this
condition. Now, here the probability of x being less than or equal to 0 is half because,
probability of x equal to minus 2 and probability x equal to 0 both are 1 by 4. So, as soon
as we approach 0 if we look at the up to at minus 2 you have 1 by 4 at 0, you have 1 by
4 at 1, you have 1 by 3, and at 2 you have 1 by 6. So, any point after 0 this will have
the condition probability x less than or equal to m greater than or equal to half satisfied.
If we look at the second condition here, the probability is 1 by 3. Here, the probability
is 1 by 1 by 6 and 1 by 3, so if you add this becomes half. So, if I consider m to be any
point before 1 then, probability that x greater than or equal to m is greater than or equal
to half is satisfied. In fact, if I consider probability x greater
than or equal to 1 then, it is equal to probability x plus x equal to 1 plus probability x equal
to 2, which is equal to half. So, any point which is less than or equal to 1 will satisfy
the second condition. Any point which is greater than or equal to 0 will satisfy the first
condition. So any m, such that 0 less than or equal to m less than or equal to 1 satisfies
the two conditions. Hence, m belonging to 0 to 1 is a median, so this is a case where
the median is not unique. So, in particular, in discrete distributions, we may not have
a unique quantile; in the continuous random variable case there will be a unique quantile.
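The defining conditions can also be checked mechanically; the following short Python sketch (my own, using exact fractions) confirms that points inside the interval from 0 to 1 are medians of this distribution:

```python
from fractions import Fraction as F

# Discrete distribution from the lecture: P(X=-2)=P(X=0)=1/4, P(X=1)=1/3, P(X=2)=1/6.
pmf = {-2: F(1, 4), 0: F(1, 4), 1: F(1, 3), 2: F(1, 6)}

def is_median(m):
    """Check the defining conditions P(X <= m) >= 1/2 and P(X >= m) >= 1/2."""
    left = sum(p for x, p in pmf.items() if x <= m)
    right = sum(p for x, p in pmf.items() if x >= m)
    return left >= F(1, 2) and right >= F(1, 2)

for m in [-1, 0, 0.5, 1, 1.5, 2]:
    print(m, is_median(m))
# True exactly for the test points with 0 <= m <= 1, so the median is not unique here.
```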
There is another function, called the moment generating function, which tells us something about the
distribution. Let X be a random variable; the function M X (t),
defined to be the expectation of e to the power t X, is called the moment generating function
of the random variable X, provided the right hand side exists for some
t not equal to 0. As you can see, at t equal to 0 this expectation
always exists; so we require it to exist for some t not equal to 0, that is, in a neighborhood
of the origin. If it does, we say that the moment generating function is well defined.
We may have cases where the moment generating function does not exist.
Let us consider, suppose you take f x is equal to 1 by pi 1 by 1 plus x square. So, if I
look at expectation of e to the power t x, then it is equal to minus infinity to infinity
1 by pi 1 by 1 plus x square e to the power t x d x. If you look at this integral,
it does not exist for any t not equal to 0, because in the numerator you have an exponential term and
in the denominator you have only a polynomial. In fact, we have seen that the mean itself
does not exist; that means even if I put x here in place of e to the power t x, the integral
does not exist. Let us take another example: say, f x is equal
to half e to the power minus x by 2, for x greater than 0 and 0 for x less than or equal
to 0. Let us consider m x t, so it is equal to integral 0 to infinity half e to the power
t x e to the power minus x by 2 d x. Now, this you can combine, so it becomes 0 to infinity
half e to the power minus half (1 minus 2 t) x d x, which is equal to 1 by (1 minus 2 t), for
t less than half, so here the moment generating function exists in a neighborhood of 0.
The reason why we are interested in a function called the moment generating function
is that, first of all, it uniquely determines the distribution, and also it gives a lot
of information about the moments; that is why the name moment generating function is
used. Let us look at this.
So, we have the following result. The moment generating function uniquely determines a
c d f, and conversely, if the moment generating function exists, it is unique. If the moment
generating function M X (t) exists for modulus t less than t naught, that is, in a neighborhood
of 0, then the derivatives of all orders exist at t equal to 0 and can be evaluated by differentiating
under the integral sign, or under the summation sign, depending upon whether
the distribution is continuous or discrete.
So, the derivative of order k of the moment generating function, evaluated at t equal to 0, gives the
kth non-central moment; that is why it is called the moment generating function. You
can see this fact as follows: if the moment generating function exists, I can expand
e to the power t X in a Maclaurin series as 1 plus t X by 1 factorial plus t square X square
by 2 factorial and so on, so its expectation equals 1 plus t by 1 factorial times mu 1 prime plus t square
by 2 factorial times mu 2 prime, etcetera. That means the coefficient of t to the power k by
k factorial is the kth order non-central moment, for k equal to 1, 2, and so on.
Let us consider the earlier example, where M X (t) is equal to 1 divided by 1 minus 2 t, for
t less than half.
Let us take the derivative of this: it is equal to minus 1 times 1 by (1 minus 2 t) squared
times minus 2, that is, 2 divided by (1 minus 2 t) squared; if we put t equal to 0 here,
we get 2. Let us check this directly from the distribution: if I calculate the expectation
of X, it is equal to the integral from 0 to infinity of x by 2 times e to the power minus x
by 2 d x. If we integrate this by parts, or if we use the gamma function,
then it is half times gamma 2 divided by half squared, which is equal to 2. So the first
derivative of the moment generating function at 0 indeed gives the mean.
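A small computational cross-check (my own sketch, assuming SymPy and SciPy are available) differentiates M(t) = 1/(1 - 2t) symbolically and compares M'(0) with a direct numerical evaluation of E(X):

```python
import numpy as np
import sympy as sp
from scipy.integrate import quad

t = sp.symbols('t')
M = 1 / (1 - 2 * t)                           # moment generating function from the lecture
first_moment = sp.diff(M, t).subs(t, 0)       # M'(0) = E[X]
second_moment = sp.diff(M, t, 2).subs(t, 0)   # M''(0) = E[X^2]

# Direct computation of E[X] from the density f(x) = (1/2) e^{-x/2}, x > 0.
EX, _ = quad(lambda x: x * 0.5 * np.exp(-0.5 * x), 0, np.inf)

print(first_moment, second_moment, EX)        # 2, 8, approximately 2.0
```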
Let mu k prime, for k equal to 1, 2, and so on, be the moment sequence of a random variable X.
If the series sum over k of mu k prime t to the power k by k factorial converges absolutely for
some t greater than 0, then the sequence mu k prime uniquely determines the c d f F of the
random variable X. Sometimes, we have only partial information about
the probability distribution of a random variable. So, we may not have substantial knowledge
about probabilities of various intervals or random variable taking value less than something
or greater than something. So, in such cases, we have certain probability inequalities which
are useful if we know only a certain moment say, mean or variance or one particular moment.
These are known as probability inequalities; one of the first results in this direction is called
Chebyshev's inequality.
Let x be a random variable with mean mu, and variance sigma square then, for any k positive
probability of modulus x minus mu greater than or equal to k is less than or equal to
sigma square by k square. You can see that this bounds the probability of X lying outside a certain
interval, even though we have no information about the
random variable except its mean and variance. To prove this, let us take X to be continuous with
a certain p d f f(x), and consider the expression for the variance: it is equal to the expectation
of (X minus mu) squared, which is equal to the integral of (x minus mu) squared times f(x) d x.
Now, this integral is greater than or equal to the integral over the region where modulus of x minus mu
is greater than or equal to k; this is because the integrand is non-negative, so if we reduce the region
of integration, the value becomes smaller. Now, on this region (x minus mu) squared is greater than or
equal to k squared, so we can replace it by k squared, and the remaining integral is nothing but the
probability of modulus X minus mu greater than or equal to k. So, the variance is at least k squared
times this probability, and as a consequence the probability of modulus X minus mu greater than or
equal to k is less than or equal to sigma square by k square. You can also write down alternative
forms of this inequality by taking the complementary event.
So, you will have 1 minus probability of modulus x minus mu greater than or equal to k is greater
than or equal to 1 minus sigma square by k square, or you can write probability of modulus
x minus mu less than k is greater than or equal to 1 minus sigma square by k square.
Sometimes, the form is written in this fashion; probability of modulus x minus mu less than
k sigma is greater than or equal to 1 minus 1 by k square or probability of modulus x
minus mu greater than or equal to k sigma is less than or equal to 1 by k square.
A more general inequality of the same type is called Markov's inequality. Let X be
a random variable and g a non-negative, even, non-decreasing function of modulus x; then the probability
of modulus X greater than or equal to k is less than or equal to the expectation of g(X) divided
by g(k). You can see that if we replace X by X minus mu and take g to be the square
function, that is g(x) equal to x square, then Markov's inequality gives exactly Chebyshev's
inequality; so this is a more general inequality of the same type.
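To see the bound in action, here is a short Monte Carlo sketch (my own illustration, with an arbitrarily chosen exponential distribution) comparing the empirical tail probability with the Chebyshev bound 1 by k square:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200000)   # any distribution with finite variance works
mu, sigma = x.mean(), x.std()

for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)   # P(|X - mu| >= k*sigma)
    print(k, empirical, "<=", 1 / k**2)                # Chebyshev bound 1/k^2
```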
Let us take an example to explain this. The number of customers who visit a store
every day is a random variable X with mean, say, 18 and standard deviation equal to
2.5. With what probability can we assert that there will be between 8 and 28 customers? That
means we are interested in an estimate of the probability that the number of customers
is between 8 and 28. If we want to utilize Chebyshev's inequality here, the mean
is given to be 18, so this becomes the probability of X minus 18 lying between minus 10 and 10,
that is, the probability of modulus X minus 18 less than or equal to 10. By Chebyshev's
inequality this is greater than or equal to 1 minus sigma square by 100, which equals
1 minus 6.25 by 100, or 15 by 16. You can see that this is a very high probability for
this particular event. So, although we do not have full information about
the probability distribution of the random variable, we can still say something about this probability.
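The arithmetic of this bound, written out as a tiny Python snippet (added for completeness):

```python
mean, sd = 18.0, 2.5
k = 10.0                       # half-width of the interval (8, 28) around the mean
bound = 1 - sd**2 / k**2       # Chebyshev lower bound for P(|X - 18| <= 10)
print(bound)                   # 0.9375, that is 15/16
```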
Show that for 40,000 flips of a fair coin, the probability is at least 0.99 that the
proportion of heads will be between 0.475 and 0.525. Here, if we consider X to be
the number of heads, then X follows a binomial distribution with n equal to 40000 and p equal to half.
We are interested in the probability of X by n lying between 0.475 and 0.525; since n is 40000,
this is the probability that X lies between 19000 and 21000.
Now, the mean of this distribution is n p, that is 20000, and the standard deviation is the square
root of n p q, which is 100. So the required probability is the probability that modulus of X minus
20000 is less than or equal to 1000, which by Chebyshev's inequality is greater than or equal to
1 minus sigma square by k square, that is 1 minus 1 by 100, or 99 by 100. So, for a fair coin, the
probability that the proportion of heads in 40000 tosses lies between 0.475, that is 47.5 percent,
and 0.525, that is 52.5 percent, is at least 0.99. Here we could even compute the exact value of this
probability, but that is complicated; so this is a simple solution for a complex-looking
situation.
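Out of curiosity, the "complicated" exact value can be computed numerically; this sketch of mine (assuming SciPy is available) compares it with the Chebyshev bound:

```python
from scipy.stats import binom

n, p = 40000, 0.5
exact = binom.cdf(21000, n, p) - binom.cdf(18999, n, p)   # P(19000 <= X <= 21000)
chebyshev_bound = 1 - (n * p * (1 - p)) / 1000**2          # 1 - sigma^2 / k^2 = 0.99
print(exact, chebyshev_bound)   # the exact value is essentially 1, well above 0.99
```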
Independent observations are available from a population with mean mu and variance 1.
How many observations are needed in order that probability is at least 0.9 that the
mean of observations differs from mu by not more than 1? If we look at the expectation
of x bar, it is equal to 1 by n times the expectation of (x 1 plus x 2 plus and so on up to x n),
which is mu. If we look at the variance of x bar, that is the variance of (x 1 plus x 2 plus and
so on up to x n) divided by n, it is equal to 1 by n square times the sum of the variances of the
x i; each of these is 1, so the variance of x bar is 1 by n. The event that the mean of the
observations differs from mu by not more than 1 is the event that modulus of x bar minus mu is less
than 1, and by Chebyshev's inequality its probability is greater than or equal to 1 minus 1 by n.
Now, we want this to be at least 0.9, which means n should be at least 10; so we need at least 10
observations in order that, with probability at least 0.9, the mean of the observations differs
from mu by not more than 1. Let us look at some examples
of calculation of certain distributions and the moments and other characteristics that
we have discussed so far. So, let us consider one example.
Let X be a continuous random variable with the probability density function given by
1 by beta times (1 minus modulus of (x minus alpha) by beta), for modulus of (x minus alpha)
less than beta, and 0 otherwise. We will analyze various properties of this distribution.
So, let us look at the values of alpha and beta for which this is a valid
probability distribution. The first thing we observe is that, in order for this to be a
non-negative function, beta should be positive; and since modulus of (x minus alpha) by beta
is less than 1 on the support, the factor 1 minus modulus of (x minus alpha) by beta is always
non-negative there. Therefore beta has to be positive in order that
this is a density. Now, we look at the integral of the density
over its support. To evaluate this in a simple way, we can consider the transformation y equal
to (x minus alpha) by beta. Since the integrand is then a symmetric function of y, the integral
becomes twice the integral from 0 to 1 of (1 minus y) d y, which is twice times half, and this is
simply equal to 1. Therefore, there is no restriction on alpha; alpha can
be any real number, and beta should be a positive real number, in order that this is a valid
probability density function. If we want to look at the shape of this distribution, in
fact we can see from here that, for y positive, the density in terms of y is 1 minus y, and for
y negative it becomes 1 plus y; so the value at y equal to plus 1 and at y equal to minus 1 is 0,
and at y equal to 0 it is 1. So, if we mark the centre point as alpha, the left endpoint is
alpha minus beta and the right endpoint is alpha plus beta, and the shape of the distribution
is triangular; this is basically the triangular distribution. Therefore, you can easily see by
symmetry that both the mean and the median of this distribution must be alpha: the expectation of X
is alpha and the median of X is alpha. Therefore, we can consider higher order central moments;
let us consider, say, the variance. So, the variance of X is equal to the expectation of (X minus mu)
square, that is, the expectation of (X minus alpha) square.
So, this is equal to integral alpha minus beta to alpha plus beta x minus alpha square
1 by beta 1 minus modulus x minus alpha by beta d x. So, in order to evaluate this, we
can consider the same transformation y is equal to x minus alpha by beta, so after substitution
this turns out to be integral from minus 1 to 1 beta square y square into 1 minus modulus
y d y; as this is an even function, this becomes 2 beta square times the integral from 0 to 1 of y square into
1 minus y d y. So, the integral of this is equal to twice
beta square 1 by 3 minus 1 by 4. So, that is equal to after simplification beta square
by 6, so the variance of this distribution is beta square by 6. You can see that if beta is
small, the variability is low, and if beta is larger, the variability is more; this is
obvious because the distribution is concentrated on the interval from alpha minus beta to
alpha plus beta, so as beta becomes large the spread of the curve increases, and as beta
becomes smaller the variability becomes less.
We may also look at, say, the quantiles of order
1 by 4 and 3 by 4. For this, let us first find the cumulative
distribution function. If we calculate F(x), then naturally for x less than alpha minus
beta it should be 0, and for x greater than alpha plus beta it should be 1. So, we need
to concentrate on the integral from alpha minus beta to x of 1 by beta times (1 minus modulus of
(t minus alpha) by beta) d t, for x lying between alpha minus beta and alpha plus beta. By
considering the transformation (t minus alpha) by beta equal to y, this becomes the integral
from minus 1 to (x minus alpha) by beta of (1 minus modulus of y) d y.
Now, here there are two cases. First, suppose (x minus alpha) by beta is less than 0, which
basically means x is less than alpha, that is, we are before the point of symmetry. In that
case the integrand 1 minus modulus of y equals 1 plus y on the whole range of integration, so
the integral from minus 1 to (x minus alpha) by beta of (1 plus y) d y is simply (1 plus y)
whole square by 2, evaluated from minus 1 to (x minus alpha) by beta.
So, this evaluates to half times (1 plus (x minus alpha) by beta) whole square, and this holds
for alpha minus beta less than x less than or equal to alpha. If you look at the value at x equal
to alpha, then (x minus alpha) by beta becomes 0, so the value becomes exactly half. Next, if I
choose a point x greater than or equal to alpha but less than alpha plus beta, then F(x) is half
plus the integral from alpha to x of 1 by beta times (1 minus (t minus alpha) by beta) d t, that
is, half plus the integral from 0 to (x minus alpha) by beta of (1 minus y) d y. This equals half
plus (1 minus y) whole square by 2, with a minus sign, evaluated from 0 to (x minus alpha) by
beta; at 0 that term contributes half, so altogether we get 1 minus half times
(1 minus (x minus alpha) by beta) whole square.
Therefore, we can write the complete description of the c d f: it is 0 for x less than or
equal to alpha minus beta; it is equal to half times (1 plus (x minus alpha) by beta) whole square
for alpha minus beta less than x less than or equal to alpha; it is equal to 1 minus half times
(1 minus (x minus alpha) by beta) whole square for alpha less than x less than alpha plus beta; and
it is 1 for x greater than or equal to alpha plus beta. As a check, we can see that the values at the
end points of each interval match, because the function is continuous. In fact, the function
is absolutely continuous. If we look at the value at x equal to alpha minus beta in the second
expression, then (x minus alpha) by beta becomes minus 1, so 1 minus 1 gives 0,
which is the same as the value for x less than or equal to alpha minus beta. If we
look at the value at x equal to alpha, then in the second expression (x minus alpha) by beta is 0,
so we get half, and if we put x equal to alpha in the third expression we get 1 minus half, which
is also half, so the values match. If we look at the value at x equal to alpha plus beta in the third
expression, then 1 minus (x minus alpha) by beta is 0, which means the value is 1, and the value for
x greater than or equal to alpha plus beta is also 1. So, this is satisfying the conditions for a c d f.
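To make the piecewise c d f concrete, here is a small Python sketch of my own (the alpha and beta values are arbitrary) that implements it and checks the values at the breakpoints:

```python
def triangular_cdf(x, alpha=0.0, beta=2.0):
    """C.d.f. of the triangular density f(x) = (1/beta)(1 - |x - alpha|/beta)."""
    u = (x - alpha) / beta
    if u <= -1:
        return 0.0
    if u <= 0:
        return 0.5 * (1 + u) ** 2
    if u < 1:
        return 1 - 0.5 * (1 - u) ** 2
    return 1.0

alpha, beta = 0.0, 2.0
for x in (alpha - beta, alpha, alpha + beta):
    print(x, triangular_cdf(x, alpha, beta))   # 0.0, 0.5, 1.0 at the breakpoints
```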
If we look for the point where the c d f value becomes 1 by 4, then we need to use the second
expression, because the probability up to x equal to alpha is half, so the point with value 1 by 4
will naturally lie in that interval. This means that half times (1 plus (Q 1 minus alpha) by beta)
whole square is equal to 1 by 4. Simplifying, 1 plus (Q 1 minus alpha) by beta is equal to 1 by
root 2, that means (Q 1 minus alpha) by beta is equal to 1 by root 2 minus 1, so Q 1 becomes alpha
minus beta into (1 minus 1 by root 2). In a similar way, we can calculate Q 3, which by symmetry is
alpha plus beta into (1 minus 1 by root 2); Q 2 is of course alpha, that is the median of this distribution.
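Finally, a numerical cross-check of my own (again with arbitrary alpha and beta, and assuming SciPy is available) for the variance beta square by 6 and the quartile formulas just derived:

```python
import math
from scipy.integrate import quad
from scipy.optimize import brentq

alpha, beta = 1.0, 3.0   # arbitrary illustrative values

def pdf(x):
    """Triangular density f(x) = (1/beta)(1 - |x - alpha|/beta) on (alpha-beta, alpha+beta)."""
    return (1 / beta) * (1 - abs(x - alpha) / beta) if abs(x - alpha) < beta else 0.0

def cdf(x):
    return quad(pdf, alpha - beta, x)[0]

# Variance should equal beta^2 / 6.
var, _ = quad(lambda x: (x - alpha) ** 2 * pdf(x), alpha - beta, alpha + beta)
print(var, beta ** 2 / 6)

# Quartiles from the c.d.f. versus the closed forms alpha -/+ beta*(1 - 1/sqrt(2)).
q1 = brentq(lambda x: cdf(x) - 0.25, alpha - beta, alpha + beta)
q3 = brentq(lambda x: cdf(x) - 0.75, alpha - beta, alpha + beta)
print(q1, alpha - beta * (1 - 1 / math.sqrt(2)))
print(q3, alpha + beta * (1 - 1 / math.sqrt(2)))
```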
That is all for today's lecture; we will consider special discrete and continuous
distributions in the upcoming classes.