In the last lecture, I introduced certain characteristics of probability distributions,
such as the expected value, that is the mean, the variance, and also some higher order moments.
The mean of a distribution is a measure of central tendency, or
the measure of location, for the distribution; the variance or the standard deviation tells
us about the variability of the values of the distribution.
We may also be interested in some further characteristics of the probability distribution
such as its skewness.
Let us define what skewness is, so consider a distribution of this type. Let us consider
another distribution, and let us consider a third one. Now, if we compare the shapes of the curves,
the first reaction after looking at the first curve is that it is symmetric about a certain
axis, say mu. If we look at the second curve, then there is a lot of concentration of probability
on the left hand side and a long tail on the right side; that means there
is a long tail to the right of the mean of the distribution. Whereas, if we look at the
third curve, then there is a long tail to the left; that means there is more
concentration of values on the right side and a large variation towards the
left of the mean. So we will call the first one a symmetric curve. We considered the definition
of a symmetric distribution earlier: for symmetry about a point mu, the probability that X is less than or equal
to mu minus x equals the probability that X is greater than or equal to mu plus x, for every x. In particular, if
the distribution is symmetric about 0, then the probability that X is less than or equal to minus x equals the probability
that X is greater than or equal to x. If it is not symmetric, we will call it skewed; the second curve will be
called positively skewed, and the third one negatively skewed. A measure for this can be defined in terms of
say, beta 1, which is equal to mu 3 divided by mu 2 to the power 3 by 2.
We divide by mu 2 to the power 3 by 2, that is, sigma cube, where
sigma denotes the standard deviation of the distribution; this is to make the coefficient free from
the units of measurement. So, if beta 1 is 0 we have symmetry, if it is greater than
0 the distribution is positively skewed, and if it is less than 0 it is negatively skewed.
We also define another characteristic called the peakedness of a distribution. Compare the
three curves: one has a high peak, one is somewhere in the middle, or average
or normal, and one is a rather flat curve. We call this property kurtosis. So,
the middle one we call a normal peak, the high-peaked one is called leptokurtic, and the flat one is
called platykurtic. A measure of kurtosis or peakedness is defined
to be beta 2 equal to mu 4 divided by mu 2 square, minus 3; so if it is 0 we have a normal peak,
if it is greater than 0 the distribution is leptokurtic, and if it is less than 0 we call it platykurtic.
The normal distribution, which will be defined later on, has the coefficient
beta 2 equal to 0; so the peak of any distribution is in effect compared with the peak of a normal
distribution.
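As an aside (a sketch added for illustration, not part of the lecture), these two coefficients can be estimated from a sample using the sample central moments; the function name and the use of NumPy are my own choices.

```python
import numpy as np

def skewness_kurtosis(x):
    """Estimate beta_1 = mu_3 / mu_2^(3/2) and beta_2 = mu_4 / mu_2^2 - 3
    from the sample central moments."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    m2 = np.mean((x - mu) ** 2)   # second central moment (variance)
    m3 = np.mean((x - mu) ** 3)   # third central moment
    m4 = np.mean((x - mu) ** 4)   # fourth central moment
    beta1 = m3 / m2 ** 1.5        # skewness: 0 for a symmetric distribution
    beta2 = m4 / m2 ** 2 - 3      # excess kurtosis: 0 for a normal peak
    return beta1, beta2

rng = np.random.default_rng(0)
print(skewness_kurtosis(rng.normal(size=100000)))       # both close to 0
print(skewness_kurtosis(rng.exponential(size=100000)))  # positively skewed, leptokurtic
```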
Now, we have already seen that sometimes moments of a distribution may not exist, or a lower order
moment may exist while higher order moments do not. We have a general result in this direction.
If the moment of order t, where t is greater than 0, exists, then the moment of order s,
where 0 is less than s is less than t, also exists for the given random variable X. So, if a positive
order moment exists, then all lower order positive moments exist for the given
random variable. Let us look at the proof of this result. For
convenience, let me take X to be a continuous random variable with, say, probability density function
f(x). Let us write down the expectation of modulus x to the power s; this is equal to the integral
from minus infinity to infinity of modulus x to the power s times f(x) d x. We split this into two
regions: modulus x less than or equal to 1, and modulus x greater than 1. In the region where modulus x
is less than or equal to 1, I can replace modulus x to the power s by 1, so that piece is less than or
equal to the integral over modulus x less than or equal to 1 of f(x) d x, which is nothing but the
probability that modulus x is less than or equal to 1. In the region where modulus x is greater than 1,
if I replace the power s by the power t, I get a bigger quantity, so that piece is at most the integral
of modulus x to the power t times f(x) d x, which is in turn at most the expectation of modulus x to the
power t. So the first term is less than or equal to 1, and the second term is at most the expectation of
modulus x to the power t.
Since we are assuming that the moment of order t exists, this is finite, and therefore the expectation
of modulus x to the power s is finite; that means the moment of order s exists, this being
the condition for existence of the moment of order s.
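For reference, the chain of inequalities in this proof can be written compactly as follows (this is only a restatement of the argument above):

```latex
E|X|^s = \int_{|x|\le 1} |x|^s f(x)\,dx + \int_{|x|>1} |x|^s f(x)\,dx
       \le P(|X|\le 1) + \int_{|x|>1} |x|^t f(x)\,dx
       \le 1 + E|X|^t < \infty .
```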
Now, sometimes when the moments do not exist, it may be difficult to find the measures
of central tendency or location, the measures of variability, or, say, the measures
of symmetry or kurtosis, etcetera. So, we may instead look at points on the distribution
itself which divide the curve into regions with certain proportions; these are
called quantiles of the distribution.
To explain the concept, let us consider some distribution with a particular shape. Suppose
I have a point here, let us call it a, and the probability that X is less than
or equal to a is equal to p; that means this area is p and this area is 1 minus
p. Then a is called the pth quantile. It is easy to explain this through the concept of the median:
the median divides the distribution into two parts, the probability is half in
this portion and half in that portion. So, roughly speaking, the pth quantile
is the point up to which the probability of the random variable taking a value is p, and the
probability beyond that is 1 minus p. However, to take care of discrete distributions,
we give the formal definition of a quantile as follows: A number, let me call it Q p satisfying
probability x less than or equal to Q p greater than or equal to p, and probability x greater
than or equal to Q p greater than or equal to 1 minus p, for 0 less than p less than
1 is called p th quantile or quantile of order p of the distribution of x.
So, obviously, if F is an absolutely continuous distribution function, then you will have
F of Q p equal to p, and there will be a unique quantile. Q half is called
the median of X; we also use the notation m for it. Q 1 by 4, Q half, and Q 3 by 4 are called
the quartiles of X.
We also define, say, Q 1 by 10, Q 2 by 10, and so on up to Q 9 by 10; these are called
deciles. Similarly, Q 1 by 100, Q 2 by 100, and so on are called percentiles.
That means, whether we want to divide the distribution into ten parts, into four
parts, or into a hundred parts, etcetera, we have different notations.
In various problems, we are interested in different kinds of quantiles. For example,
in various studies we may be interested in the percentage of people living below the poverty
line, etcetera; this is some particular percentile. But suppose we say that 25 percent of the
people lie below a certain value, or that 75 percent of the items are above something; then
we are talking about quartiles. Let us explain through certain examples. Let
us consider f x is equal to 1 by pi 1 by 1 plus x square, minus infinity less than x
less than infinity. We have seen that for this distribution the mean does not exist;
therefore, there is no question of higher order moments existing either. However, if we
look at F(x), then it is equal to 1 by pi times (tan inverse x plus pi by 2). So, if we solve
F(m) equal to half, this corresponds simply to m equal to 0, which is clear
if we plot this distribution, since it is symmetric about the point 0. So, if the distribution
is symmetric about a given point, then that point will actually be the median of the
distribution. Now, we can also calculate the other quantiles here.
Suppose we set F(Q 1) equal to 1 by 4; then this gives 1 by pi times (tan inverse
x plus pi by 2) equal to 1 by 4, which means tan inverse x is equal to minus pi by 4, that
is, x is equal to minus 1. So Q 1, the first quartile of this distribution, is minus 1,
and the second quartile, that is the median, is 0. In a similar way, if I set F(Q 3) equal
to 3 by 4, then this gives Q 3 equal to plus 1. So, we are able to determine these measures
on the curve, so it roughly tells that 25 percent of the observations lie below minus
1 and 25 percent of the observations lie between minus 1 and 0, 25 percent of the observations
lie between 0 and 1, and 25 percent of the observations lie beyond 1. So, it has a very
long tail because, in fact 50 percent of the probability is between minus 1 to 1 and rest
50 percent is dispersed over minus infinity to minus 1 and 1 to infinity.
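As a quick numerical check (a sketch added here, not from the lecture; the helper names are my own), the quartiles can be obtained by inverting F(x) = (1/pi)(arctan x + pi/2) directly:

```python
import math

def cauchy_cdf(x):
    """Standard Cauchy c.d.f.: F(x) = (1/pi) * (arctan(x) + pi/2)."""
    return (math.atan(x) + math.pi / 2) / math.pi

def cauchy_quantile(p):
    """Invert F directly: Q(p) = tan(pi * (p - 1/2)) for 0 < p < 1."""
    return math.tan(math.pi * (p - 0.5))

for p in (0.25, 0.5, 0.75):
    q = cauchy_quantile(p)
    print(p, q, cauchy_cdf(q))
# Q(1/4) = -1, Q(1/2) = 0, Q(3/4) = 1, as computed in the lecture.
```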
Let us consider, say, the probability that X equals minus 2 and the probability that X equals 0
to be 1 by 4 each, the probability that X equals 1 to be 1 by 3, and the probability
that X equals 2 to be 1 by 6; this is a discrete distribution concentrated
on the 4 points minus 2, 0, 1, and 2. If we apply the definition of the median,
probability x less than or equal to m is greater than or equal to half, and probability x greater
than or equal to m is also greater than or equal to half, so median must satisfy this
condition. So, you look at which points satisfy this
condition. Now, here the probability of x being less than or equal to 0 is half because,
probability of x equal to minus 2 and probability x equal to 0 both are 1 by 4. So, as soon
as we approach 0 if we look at the up to at minus 2 you have 1 by 4 at 0, you have 1 by
4 at 1, you have 1 by 3, and at 2 you have 1 by 6. So, any point after 0 this will have
the condition probability x less than or equal to m greater than or equal to half satisfied.
If we look at the second condition here, the probability is 1 by 3. Here, the probability
is 1 by 1 by 6 and 1 by 3, so if you add this becomes half. So, if I consider m to be any
point before 1 then, probability that x greater than or equal to m is greater than or equal
to half is satisfied. In fact, if I consider probability x greater
than or equal to 1 then, it is equal to probability x plus x equal to 1 plus probability x equal
to 2, which is equal to half. So, any point which is less than or equal to 1 will satisfy
the second condition. Any point which is greater than or equal to 0 will satisfy the first
condition. So any m, such that 0 less than or equal to m less than or equal to 1 satisfies
the two conditions. Hence, m belonging to 0 to 1 is a median, so this is a case where
the median is not unique. So, in particular, in discrete distributions, we may not have
a unique quantile; in the continuous random variable case there will be a unique quantile.
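The defining conditions can also be checked mechanically; the following short Python sketch (my own, using exact fractions) confirms that points inside the interval from 0 to 1 are medians of this distribution:

```python
from fractions import Fraction as F

# Discrete distribution from the lecture: P(X=-2)=P(X=0)=1/4, P(X=1)=1/3, P(X=2)=1/6.
pmf = {-2: F(1, 4), 0: F(1, 4), 1: F(1, 3), 2: F(1, 6)}

def is_median(m):
    """Check the defining conditions P(X <= m) >= 1/2 and P(X >= m) >= 1/2."""
    left = sum(p for x, p in pmf.items() if x <= m)
    right = sum(p for x, p in pmf.items() if x >= m)
    return left >= F(1, 2) and right >= F(1, 2)

for m in [-1, 0, 0.5, 1, 1.5, 2]:
    print(m, is_median(m))
# True exactly for the test points with 0 <= m <= 1, so the median is not unique here.
```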
There is another function, called the moment generating function, which tells us something about the
distribution. Let X be a random variable; the function M X (t),
defined to be the expectation of e to the power t X, is called the moment generating function
of the random variable X, provided the right hand side exists for some
t not equal to 0. As you can see, at t equal to 0 this expectation
always exists; so we require it to exist for some t not equal to 0, that is, in a neighborhood
of the origin. If it does, we say that the moment generating function is well defined.
We may have cases where the moment generating function does not exist.
Let us consider, suppose you take f x is equal to 1 by pi 1 by 1 plus x square. So, if I
look at expectation of e to the power t x, then it is equal to minus infinity to infinity
1 by pi 1 by 1 plus x square e to the power t x d x. If you look at this integral,
it does not exist for any t not equal to 0, because in the numerator you have an exponential term and
in the denominator you have only a polynomial. In fact, we have seen that the mean itself
does not exist; that means even if I put x here in place of e to the power t x, the integral
does not exist. Let us take another example: say, f x is equal
to half e to the power minus x by 2, for x greater than 0 and 0 for x less than or equal
to 0. Let us consider m x t, so it is equal to integral 0 to infinity half e to the power
t x e to the power minus x by 2 d x. Now, this you can combine, so it becomes 0 to infinity
half e to the power minus half (1 minus 2 t) x d x, which is equal to 1 by (1 minus 2 t), for
t less than half, so here the moment generating function exists in a neighborhood of 0.
The reason why we are interested in a function called the moment generating function
is that, first of all, it uniquely determines the distribution, and also it gives a lot
of information about the moments; that is why the name moment generating function is
used. Let us look at this.
So, we have the following result. The moment generating function uniquely determines a
c d f, and conversely, if the moment generating function exists, it is unique. If the moment
generating function M X (t) exists for modulus t less than t naught, that is, in a neighborhood
of 0, then the derivatives of all orders exist at t equal to 0 and can be evaluated by differentiating
under the integral sign, or under the summation sign, depending upon whether
the distribution is continuous or discrete.
So, the derivative of order k of the moment generating function, evaluated at t equal to 0, gives the
kth non-central moment; that is why it is called the moment generating function. You
can see this fact as follows: if the moment generating function exists, I can expand
e to the power t X in a Maclaurin series as 1 plus t X by 1 factorial plus t square X square
by 2 factorial and so on, so its expectation equals 1 plus t by 1 factorial times mu 1 prime plus t square
by 2 factorial times mu 2 prime, etcetera. That means the coefficient of t to the power k by
k factorial is the kth order non-central moment, for k equal to 1, 2, and so on.
Let us consider the earlier example, where M X (t) is equal to 1 divided by 1 minus 2 t, for
t less than half.
Let us take the derivative of this: it is equal to minus 1 times 1 by (1 minus 2 t) squared
times minus 2, that is, 2 divided by (1 minus 2 t) squared; if we put t equal to 0 here,
we get 2. Let us check this directly from the distribution: if I calculate the expectation
of X, it is equal to the integral from 0 to infinity of x by 2 times e to the power minus x
by 2 d x. If we integrate this by parts, or if we use the gamma function,
then it is half times gamma 2 divided by half squared, which is equal to 2. So the first
derivative of the moment generating function at 0 indeed gives the mean.
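A small computational cross-check (my own sketch, assuming SymPy and SciPy are available) differentiates M(t) = 1/(1 - 2t) symbolically and compares M'(0) with a direct numerical evaluation of E(X):

```python
import numpy as np
import sympy as sp
from scipy.integrate import quad

t = sp.symbols('t')
M = 1 / (1 - 2 * t)                           # moment generating function from the lecture
first_moment = sp.diff(M, t).subs(t, 0)       # M'(0) = E[X]
second_moment = sp.diff(M, t, 2).subs(t, 0)   # M''(0) = E[X^2]

# Direct computation of E[X] from the density f(x) = (1/2) e^{-x/2}, x > 0.
EX, _ = quad(lambda x: x * 0.5 * np.exp(-0.5 * x), 0, np.inf)

print(first_moment, second_moment, EX)        # 2, 8, approximately 2.0
```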
Let mu k prime, for k equal to 1, 2, and so on, be the moment sequence of a random variable X.
If the series sum over k of mu k prime t to the power k by k factorial converges absolutely for
some t greater than 0, then the sequence mu k prime uniquely determines the c d f F of the
random variable X. Sometimes, we have only partial information about
the probability distribution of a random variable. So, we may not have substantial knowledge
about probabilities of various intervals or random variable taking value less than something
or greater than something. So, in such cases, we have certain probability inequalities which
are useful if we know only a certain moment say, mean or variance or one particular moment.
These are known as probability inequalities; one of the first results in this direction is called
Chebyshev's inequality.
Let x be a random variable with mean mu, and variance sigma square then, for any k positive
probability of modulus x minus mu greater than or equal to k is less than or equal to
sigma square by k square. You can see that this bounds the probability of X lying outside a certain
interval, even though we have no information about the
random variable except its mean and variance. To prove this, let us take X to be continuous with
a certain p d f f(x), and consider the expression for the variance: it is equal to the expectation
of (X minus mu) squared, which is equal to the integral of (x minus mu) squared times f(x) d x.
Now, this integral is greater than or equal to the integral over the region where modulus of x minus mu
is greater than or equal to k; this is because the integrand is non-negative, so if we reduce the region
of integration, the value becomes smaller. Now, on this region (x minus mu) squared is greater than or
equal to k squared, so we can replace it by k squared, and the remaining integral is nothing but the
probability of modulus X minus mu greater than or equal to k. So, the variance is at least k squared
times this probability, and as a consequence the probability of modulus X minus mu greater than or
equal to k is less than or equal to sigma square by k square. You can also write down alternative
forms of this inequality by taking the complementary event.
So, you will have 1 minus probability of modulus x minus mu greater than or equal to k is greater
than or equal to 1 minus sigma square by k square, or you can write probability of modulus
x minus mu less than k is greater than or equal to 1 minus sigma square by k square.
Sometimes, the form is written in this fashion; probability of modulus x minus mu less than
k sigma is greater than or equal to 1 minus 1 by k square or probability of modulus x
minus mu greater than or equal to k sigma is less than or equal to 1 by k square.
A more general inequality of the same type is called Markov's inequality. Let X be
a random variable and g a non-negative, even, non-decreasing function of modulus x; then the probability
of modulus X greater than or equal to k is less than or equal to the expectation of g(X) divided
by g(k). You can see that if we replace X by X minus mu and take g to be the square
function, that is g(x) equal to x square, then Markov's inequality gives exactly Chebyshev's
inequality; so this is a more general inequality of the same type.
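To see the bound in action, here is a short Monte Carlo sketch (my own illustration, with an arbitrarily chosen exponential distribution) comparing the empirical tail probability with the Chebyshev bound 1 by k square:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200000)   # any distribution with finite variance works
mu, sigma = x.mean(), x.std()

for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)   # P(|X - mu| >= k*sigma)
    print(k, empirical, "<=", 1 / k**2)                # Chebyshev bound 1/k^2
```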
Let us take an example to explain this. The number of customers who visit a store
every day is a random variable X with mean, say, 18 and standard deviation equal to
2.5. With what probability can we assert that there will be between 8 and 28 customers? That
means we are interested in an estimate of the probability that the number of customers
is between 8 and 28. If we want to utilize Chebyshev's inequality here, the mean
is given to be 18, so this becomes the probability of X minus 18 lying between minus 10 and 10,
that is, the probability of modulus X minus 18 less than or equal to 10. By Chebyshev's
inequality this is greater than or equal to 1 minus sigma square by 100, which equals
1 minus 6.25 by 100, or 15 by 16. You can see that this is a very high probability for
this particular event. So, although we do not have full information about
the probability distribution of the random variable, we can still say something about this probability.
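The arithmetic of this bound, written out as a tiny Python snippet (added for completeness):

```python
mean, sd = 18.0, 2.5
k = 10.0                       # half-width of the interval (8, 28) around the mean
bound = 1 - sd**2 / k**2       # Chebyshev lower bound for P(|X - 18| <= 10)
print(bound)                   # 0.9375, that is 15/16
```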
Show that for 40,000 flips of a fair coin, the probability is at least 0.99 that the
proportion of heads will be between 0.475 and 0.525. Here, if we consider X to be
the number of heads, then X follows a binomial distribution with n equal to 40000 and p equal to half.
We are interested in the probability of X by n lying between 0.475 and 0.525; since n is 40000,
this is the probability that X lies between 19000 and 21000.
Now, the mean of this distribution is n p, that is 20000, and the standard deviation is the square
root of n p q, which is 100. So the required probability is the probability that modulus of X minus
20000 is less than or equal to 1000, which by Chebyshev's inequality is greater than or equal to
1 minus sigma square by k square, that is 1 minus 1 by 100, or 99 by 100. So, for a fair coin, the
probability that the proportion of heads in 40000 tosses lies between 0.475, that is 47.5 percent,
and 0.525, that is 52.5 percent, is at least 0.99. Here we could even compute the exact value of this
probability, but that is complicated; so this is a simple solution for a complex-looking
situation.
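Out of curiosity, the "complicated" exact value can be computed numerically; this sketch of mine (assuming SciPy is available) compares it with the Chebyshev bound:

```python
from scipy.stats import binom

n, p = 40000, 0.5
exact = binom.cdf(21000, n, p) - binom.cdf(18999, n, p)   # P(19000 <= X <= 21000)
chebyshev_bound = 1 - (n * p * (1 - p)) / 1000**2          # 1 - sigma^2 / k^2 = 0.99
print(exact, chebyshev_bound)   # the exact value is essentially 1, well above 0.99
```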
Independent observations are available from a population with mean mu and variance 1.
How many observations are needed in order that probability is at least 0.9 that the
mean of observations differs from mu by not more than 1? If we look at the expectation
of x bar, it is equal to 1 by n times the expectation of (x 1 plus x 2 plus and so on up to x n),
which is mu. If we look at the variance of x bar, that is the variance of (x 1 plus x 2 plus and
so on up to x n) divided by n, it is equal to 1 by n square times the sum of the variances of the
x i; each of these is 1, so the variance of x bar is 1 by n. The event that the mean of the
observations differs from mu by not more than 1 is the event that modulus of x bar minus mu is less
than 1, and by Chebyshev's inequality its probability is greater than or equal to 1 minus 1 by n.
Now, we want this to be at least 0.9, which means n should be at least 10; so we need at least 10
observations in order that, with probability at least 0.9, the mean of the observations differs
from mu by not more than 1. Let us look at some examples
of calculation of certain distributions and the moments and other characteristics that
we have discussed so far. So, let us consider one example.
Let X be a continuous random variable with the probability density function given by
1 by beta times (1 minus modulus of (x minus alpha) by beta), for modulus of (x minus alpha)
less than beta, and 0 otherwise. We will analyze various properties of this distribution.
So, let us look at the values of alpha and beta for which this is a valid
probability distribution. The first thing we observe is that, in order for this to be a
non-negative function, beta should be positive; and since modulus of (x minus alpha) by beta
is less than 1 on the support, the factor 1 minus modulus of (x minus alpha) by beta is always
non-negative there. Therefore beta has to be positive in order that
this is a density. Now, we look at the integral of the density
over its support. To evaluate this in a simple way, we can consider the transformation y equal
to (x minus alpha) by beta. Since the integrand is then a symmetric function of y, the integral
becomes twice the integral from 0 to 1 of (1 minus y) d y, which is twice times half, and this is
simply equal to 1. Therefore, there is no restriction on alpha; alpha can
be any real number, and beta should be a positive real number, in order that this is a valid
probability density function. If we want to look at the shape of this distribution, in
fact we can see from here that, for y positive, the density in terms of y is 1 minus y, and for
y negative it becomes 1 plus y; so the value at y equal to plus 1 and at y equal to minus 1 is 0,
and at y equal to 0 it is 1. So, if we mark the centre point as alpha, the left endpoint is
alpha minus beta and the right endpoint is alpha plus beta, and the shape of the distribution
is triangular; this is basically the triangular distribution. Therefore, you can easily see by
symmetry that both the mean and the median of this distribution must be alpha: the expectation of X
is alpha and the median of X is alpha. Therefore, we can consider higher order central moments;
let us consider, say, the variance. So, the variance of X is equal to the expectation of (X minus mu)
square, that is, the expectation of (X minus alpha) square.
So, this is equal to integral alpha minus beta to alpha plus beta x minus alpha square
1 by beta 1 minus modulus x minus alpha by beta d x. So, in order to evaluate this, we
can consider the same transformation y is equal to x minus alpha by beta, so after substitution
this turns out to be integral from minus 1 to 1 beta square y square into 1 minus modulus
y d y; as this is an even function, this becomes 2 beta square times the integral from 0 to 1 of y square into
1 minus y d y. So, the integral of this is equal to twice
beta square 1 by 3 minus 1 by 4. So, that is equal to after simplification beta square
by 6, so the variance of this distribution is beta square by 6. You can see that if beta is
small, the variability is low, and if beta is larger, the variability is more; this is
obvious because the distribution is concentrated on the interval from alpha minus beta to
alpha plus beta, so as beta becomes large the spread of the curve increases, and as beta
becomes smaller the variability becomes less.
We may also look at, say, the quantiles of order
1 by 4 and 3 by 4. For this, let us first find the cumulative
distribution function. If we calculate F(x), then naturally for x less than alpha minus
beta it should be 0, and for x greater than alpha plus beta it should be 1. So, we need
to concentrate on the integral from alpha minus beta to x of 1 by beta times (1 minus modulus of
(t minus alpha) by beta) d t, for x lying between alpha minus beta and alpha plus beta. By
considering the transformation (t minus alpha) by beta equal to y, this becomes the integral
from minus 1 to (x minus alpha) by beta of (1 minus modulus of y) d y.
Now, here there are two cases. First, suppose (x minus alpha) by beta is less than 0, which
basically means x is less than alpha, that is, we are before the point of symmetry. In that
case the integrand 1 minus modulus of y equals 1 plus y on the whole range of integration, so
the integral from minus 1 to (x minus alpha) by beta of (1 plus y) d y is simply (1 plus y)
whole square by 2, evaluated from minus 1 to (x minus alpha) by beta.
So, this evaluates to half times (1 plus (x minus alpha) by beta) whole square, and this holds
for alpha minus beta less than x less than or equal to alpha. If you look at the value at x equal
to alpha, then (x minus alpha) by beta becomes 0, so the value becomes exactly half. Next, if I
choose a point x greater than or equal to alpha but less than alpha plus beta, then F(x) is half
plus the integral from alpha to x of 1 by beta times (1 minus (t minus alpha) by beta) d t, that
is, half plus the integral from 0 to (x minus alpha) by beta of (1 minus y) d y. This equals half
plus (1 minus y) whole square by 2, with a minus sign, evaluated from 0 to (x minus alpha) by
beta; at 0 that term contributes half, so altogether we get 1 minus half times
(1 minus (x minus alpha) by beta) whole square.
Therefore, we can write the complete description of the c d f: it is 0 for x less than or
equal to alpha minus beta; it is equal to half times (1 plus (x minus alpha) by beta) whole square
for alpha minus beta less than x less than or equal to alpha; it is equal to 1 minus half times
(1 minus (x minus alpha) by beta) whole square for alpha less than x less than alpha plus beta; and
it is 1 for x greater than or equal to alpha plus beta. As a check, we can see that the values at the
end points of each interval match, because the function is continuous. In fact, the function
is absolutely continuous. If we look at the value at x equal to alpha minus beta in the second
expression, then (x minus alpha) by beta becomes minus 1, so 1 minus 1 gives 0,
which is the same as the value for x less than or equal to alpha minus beta. If we
look at the value at x equal to alpha, then in the second expression (x minus alpha) by beta is 0,
so we get half, and if we put x equal to alpha in the third expression we get 1 minus half, which
is also half, so the values match. If we look at the value at x equal to alpha plus beta in the third
expression, then 1 minus (x minus alpha) by beta is 0, which means the value is 1, and the value for
x greater than or equal to alpha plus beta is also 1. So, this is satisfying the conditions for a c d f.
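To make the piecewise c d f concrete, here is a small Python sketch of my own (the alpha and beta values are arbitrary) that implements it and checks the values at the breakpoints:

```python
def triangular_cdf(x, alpha=0.0, beta=2.0):
    """C.d.f. of the triangular density f(x) = (1/beta)(1 - |x - alpha|/beta)."""
    u = (x - alpha) / beta
    if u <= -1:
        return 0.0
    if u <= 0:
        return 0.5 * (1 + u) ** 2
    if u < 1:
        return 1 - 0.5 * (1 - u) ** 2
    return 1.0

alpha, beta = 0.0, 2.0
for x in (alpha - beta, alpha, alpha + beta):
    print(x, triangular_cdf(x, alpha, beta))   # 0.0, 0.5, 1.0 at the breakpoints
```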
If we look for the point where the c d f value becomes 1 by 4, then we need to use the second
expression, because the probability up to x equal to alpha is half, so the point with value 1 by 4
will naturally lie in that interval. This means that half times (1 plus (Q 1 minus alpha) by beta)
whole square is equal to 1 by 4. Simplifying, 1 plus (Q 1 minus alpha) by beta is equal to 1 by
root 2, that means (Q 1 minus alpha) by beta is equal to 1 by root 2 minus 1, so Q 1 becomes alpha
minus beta into (1 minus 1 by root 2). In a similar way, we can calculate Q 3, which by symmetry is
alpha plus beta into (1 minus 1 by root 2); Q 2 is of course alpha, that is the median of this distribution.
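Finally, a numerical cross-check of my own (again with arbitrary alpha and beta, and assuming SciPy is available) for the variance beta square by 6 and the quartile formulas just derived:

```python
import math
from scipy.integrate import quad
from scipy.optimize import brentq

alpha, beta = 1.0, 3.0   # arbitrary illustrative values

def pdf(x):
    """Triangular density f(x) = (1/beta)(1 - |x - alpha|/beta) on (alpha-beta, alpha+beta)."""
    return (1 / beta) * (1 - abs(x - alpha) / beta) if abs(x - alpha) < beta else 0.0

def cdf(x):
    return quad(pdf, alpha - beta, x)[0]

# Variance should equal beta^2 / 6.
var, _ = quad(lambda x: (x - alpha) ** 2 * pdf(x), alpha - beta, alpha + beta)
print(var, beta ** 2 / 6)

# Quartiles from the c.d.f. versus the closed forms alpha -/+ beta*(1 - 1/sqrt(2)).
q1 = brentq(lambda x: cdf(x) - 0.25, alpha - beta, alpha + beta)
q3 = brentq(lambda x: cdf(x) - 0.75, alpha - beta, alpha + beta)
print(q1, alpha - beta * (1 - 1 / math.sqrt(2)))
print(q3, alpha + beta * (1 - 1 / math.sqrt(2)))
```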
That is all for today's lecture; we will consider special discrete and continuous
distributions in the upcoming classes.