Today, we will start looking at inference problems about the multivariate normal distribution. We have already introduced what a multivariate normal distribution is and what its characterizing parameters are. So, today we will address the problem of inference about the multivariate mean vector mu and the covariance matrix sigma.
So, we have the following setup: random sampling from a multivariate normal distribution. Suppose we have a multivariate normal population N p (mu, sigma), where mu is the mean vector and sigma is the covariance matrix; we assume that sigma is positive definite. Now, the elements of this mu vector are unknown, and so is the sigma matrix. So, what we do is take a random sample from this multivariate population and then try to build up the inference procedure concerning the unknown mean vector mu and the unknown covariance matrix sigma. Let X 1, X 2, ..., X n be a random sample from this multivariate normal population; using this random sample of size n, we will build the inference procedure about the mean vector mu and the covariance matrix sigma.
Now, given this random sample, the joint pdf of X 1, X 2, ..., X n — remember that it is a random sample and hence X 1, X 2, ..., X n are independent — is the product over i equal to 1 to n of the pdfs of the respective components, f X i (x i). So, it is equal to the product over i equal to 1 to n of (2 pi) to the power minus p by 2, determinant of sigma to the power minus half, times exponent of minus half (x i minus mu) transpose sigma inverse (x i minus mu). This can be compactly written as (2 pi) to the power minus n p by 2, determinant of sigma to the power minus n by 2, times exponent of minus half summation i equal to 1 to n (x i minus mu) transpose sigma inverse (x i minus mu). So, the first question that we are going to answer about this random sample X 1, X 2, ..., X n from the multivariate normal distribution is: what is the sufficient statistic based on this random sample?
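To make this concrete, here is a minimal numerical sketch in Python; it is not part of the lecture, and the names mu, Sigma, and X are illustrative choices. It checks that the compact form of the joint log-pdf matches the sum of the individual log-densities.

```python
# A minimal sketch (assumed setup, not from the lecture): check that the
# compact form of the joint log-pdf matches the sum of the individual
# N_p(mu, Sigma) log-densities.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
p, n = 3, 20
mu = np.array([1.0, -2.0, 0.5])
B = rng.standard_normal((p, p))
Sigma = B @ B.T + p * np.eye(p)                 # positive definite covariance
X = rng.multivariate_normal(mu, Sigma, size=n)  # random sample, rows are x_i

# Product form: sum over i of log f_{X_i}(x_i).
log_pdf_product = multivariate_normal(mu, Sigma).logpdf(X).sum()

# Compact form:
# -np/2 log(2 pi) - n/2 log|Sigma| - 1/2 sum_i (x_i - mu)' Sigma^{-1} (x_i - mu)
D = X - mu
quad = np.einsum('ij,jk,ik->', D, np.linalg.inv(Sigma), D)
log_pdf_compact = (-n * p / 2 * np.log(2 * np.pi)
                   - n / 2 * np.log(np.linalg.det(Sigma))
                   - 0.5 * quad)
assert np.isclose(log_pdf_product, log_pdf_compact)
```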
So, we move on to sufficiency. We will be deriving sufficient statistics under various possibilities. Let us consider the first case, case one, which is the most general: mu and sigma are unknown. So, both of the parameters characterizing this multivariate normal population, the mean vector mu and the covariance matrix sigma, are unknown. Now, let us consider the term which is present in the exponent, summation i equal to 1 to n (x i minus mu) transpose sigma inverse (x i minus mu), and see how we can manipulate it.

Note that this derivation of the sufficient statistic for the multivariate normal population goes along exactly the same lines as what we usually do when we have a univariate normal population. So, we introduce the x bar vector, the sample mean vector, by writing each x i minus mu as (x i minus x bar) plus (x bar minus mu), with the same adjustment in the transposed factor as well. We now split this into terms and take the sigma inverse multiplication term by term. The first term will be summation i equal to 1 to n (x i minus x bar) transpose sigma inverse (x i minus x bar). The second term is (x bar minus mu) transpose sigma inverse (x bar minus mu); that is independent of i, and hence we will have n times (x bar minus mu) transpose sigma inverse (x bar minus mu). And then there are the cross product terms: the two cross terms differ only by a transpose, and since each is a scalar they are equal, so what we will be having is 2 times summation i equal to 1 to n (x i minus x bar) transpose sigma inverse (x bar minus mu). Let me put this decomposition as equation number 1.

Now, what is that cross product expression going to give us? Note that in summation i equal to 1 to n (x i minus x bar) transpose sigma inverse (x bar minus mu), the part sigma inverse (x bar minus mu) is independent of i and hence can stay outside the summation. So, what we will be having is summation i equal to 1 to n (x i minus x bar) transpose, post multiplied by sigma inverse (x bar minus mu). Now, the summation is going to give us 0, because it is the sum of the deviations from the sample mean vector x bar: it equals n times x bar transpose minus n times x bar transpose. So, we have (n x bar transpose minus n x bar transpose) times sigma inverse (x bar minus mu), and that is equal to 0. So, the cross product term in equation number 1 is going to vanish, because it is equal to 0.
Thus, this will imply that summation i equal to 1 to n (x i minus mu) transpose sigma inverse (x i minus mu) is equal to summation i equal to 1 to n (x i minus x bar) transpose sigma inverse (x i minus x bar) plus n times (x bar minus mu) transpose sigma inverse (x bar minus mu): the first term of equation 1 stays, the second term remains as it is, and the third term vanishes. Now, the first term is a scalar quantity, and hence we can write it as the trace of itself: trace of summation i equal to 1 to n (x i minus x bar) transpose sigma inverse (x i minus x bar), leaving the second term, n times (x bar minus mu) transpose sigma inverse (x bar minus mu), as it is. Since trace of (A plus B) is trace A plus trace B, we can take the trace inside the summation, so we can write this as summation i equal to 1 to n of the trace of (x i minus x bar) transpose sigma inverse (x i minus x bar). Now, each summand is the trace of a scalar quantity, so we will use the result that trace of AB equals trace of BA, where we consider (x i minus x bar) transpose to be the first factor and sigma inverse (x i minus x bar) to be the second.
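Both trace results used here are easy to verify numerically; a quick sketch, with illustrative matrices:

```python
# A quick numerical sanity check (illustrative, not from the lecture):
# trace(AB) = trace(BA) and trace(A + B) = trace(A) + trace(B).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))   # trace(AB) = trace(BA) holds even for
B = rng.standard_normal((5, 3))   # non-square factors with matching shapes
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

C = rng.standard_normal((4, 4))
D = rng.standard_normal((4, 4))
assert np.isclose(np.trace(C + D), np.trace(C) + np.trace(D))
```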
Applying this, the first term would be given by summation i equal to 1 to n trace of sigma inverse (x i minus x bar)(x i minus x bar) transpose, with the second term left untouched: n times (x bar minus mu) transpose sigma inverse (x bar minus mu). Now, sigma inverse is independent of the index i, so we will once again make use of the fact that trace of (A plus B) is trace A plus trace B and take the trace outside the summation as well. What we have, then, is trace of sigma inverse times summation i equal to 1 to n (x i minus x bar)(x i minus x bar) transpose, plus the second term carried forward as it is, n times (x bar minus mu) transpose sigma inverse (x bar minus mu).

Now, if we look at the summation inside the trace carefully, it is nothing but a constant multiple of the sample variance covariance matrix. So, we can write the whole expression as trace of sigma inverse times (n minus 1) S, plus n times (x bar minus mu) transpose sigma inverse (x bar minus mu), where we have used the notation that (n minus 1) S, with S the sample variance covariance matrix with divisor n minus 1, equals summation i equal to 1 to n (x i minus x bar)(x i minus x bar) transpose. So, the term which was sitting in the exponent of the joint pdf has been simplified and reduced to trace of sigma inverse (n minus 1) S plus n times (x bar minus mu) transpose sigma inverse (x bar minus mu). So, we will use this form.
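Before moving on, the whole decomposition can be checked numerically; a minimal sketch under assumed illustrative values:

```python
# A minimal check (illustrative values) of the exponent decomposition:
# sum_i (x_i - mu)' Sigma^{-1} (x_i - mu)
#   = trace(Sigma^{-1} (n-1) S) + n (xbar - mu)' Sigma^{-1} (xbar - mu)
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 15
mu = rng.standard_normal(p)
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, p))
Sigma_inv = np.linalg.inv(B @ B.T + p * np.eye(p))

lhs = sum((x - mu) @ Sigma_inv @ (x - mu) for x in X)

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                      # divisor n - 1
rhs = (np.trace(Sigma_inv @ ((n - 1) * S))
       + n * (xbar - mu) @ Sigma_inv @ (xbar - mu))
assert np.isclose(lhs, rhs)
```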
So, this would imply that the joint pdf of X 1, X 2, ..., X n, the random sample of size n from the multivariate normal distribution with mean vector mu and covariance matrix sigma, is given by the following: (2 pi) to the power minus n p by 2, determinant of sigma to the power minus n by 2, times the exponent term we have just simplified, which is exponent of minus (n minus 1) by 2 trace of sigma inverse S minus n by 2 (x bar minus mu) transpose sigma inverse (x bar minus mu), where S is the matrix we defined on the last slide. Now, when the joint pdf of X 1, X 2, ..., X n is written in this particular form, it is easy to use the Neyman–Fisher factorization theorem in order to derive a set of sufficient statistics when both the parameters, the mu vector and the covariance matrix sigma, are unknown. So, I am factorizing this joint pdf in the form f(x 1, x 2, ..., x n) times g(mu, sigma; x bar, S): the first factor is a function only of x 1, x 2, ..., x n and does not involve any parameter of the distribution, and the second factor is a function of mu and sigma which depends on the random sample x 1, x 2, ..., x n only through x bar and S. So, what are these functions from the joint pdf that we have written here? Let me put the factorized pdf as equation number 2.
So, the first factor f(x 1, x 2, ..., x n) must collect the terms which depend on x 1, x 2, ..., x n only; it cannot involve mu, and it cannot involve any element of sigma. As we see, that is just the constant (2 pi) to the power minus n p by 2; all other terms present here depend on mu and sigma and cannot be separated out. And the second function that we have introduced, g(mu, sigma; x bar, S), is basically the rest, which is determinant of sigma to the power minus n by 2 times exponent of minus (n minus 1) by 2 trace of sigma inverse S minus n by 2 (x bar minus mu) transpose sigma inverse (x bar minus mu). So, this is a term which depends on sigma and on mu, and it depends on x 1, x 2, ..., x n, the random sample, only through the two statistics x bar, the sample mean vector, and S, the sample variance covariance matrix.
And hence, since this factorization is in the form required by the Neyman–Fisher factorization theorem, we have that X bar, the sample mean vector, and S are jointly sufficient for the unknown mu and sigma. So, this is a set of jointly sufficient statistics for the unknown parameters, the mean vector mu and the covariance matrix sigma. As in any other statistical inference problem, instead of carrying X 1, X 2, ..., X n, the entire random sample, we can carry this X bar vector and this S, the sample variance covariance matrix, for all the inference procedures concerning the mu vector and the covariance matrix sigma. So, this gives us the desired compression of the data without loss of information: these statistics contain all the information about the parameter vector mu and the population variance covariance matrix sigma that is present in X 1, X 2, ..., X n. So, these will be the jointly sufficient statistics if both sets of parameters are unknown; a small sketch of computing them is given below.
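As a small sketch (again in Python, with illustrative names), the jointly sufficient pair can be computed as:

```python
# A minimal sketch (illustrative) of the jointly sufficient statistics for
# case 1: the sample mean vector x_bar and the sample covariance matrix S
# (divisor n - 1), computed from a data matrix X whose rows are x_i.
import numpy as np

def sufficient_stats(X: np.ndarray):
    """Return (x_bar, S) for an n-by-p data matrix X."""
    n = X.shape[0]
    x_bar = X.mean(axis=0)
    centered = X - x_bar
    S = centered.T @ centered / (n - 1)   # same as np.cov(X, rowvar=False)
    return x_bar, S

rng = np.random.default_rng(3)
X = rng.multivariate_normal(mean=[0.0, 1.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]], size=50)
x_bar, S = sufficient_stats(X)
```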
Now, we move on to the second possibility. So, we look at case 2: sigma is known and the mu vector is unknown. This is a theoretical possibility; for all practical purposes, what we usually have is both the mean vector mu and the covariance matrix sigma unknown, but it is a theoretical possibility that the sigma matrix is known and the mu vector is unknown. Then, since mu is the only unknown parameter of this population, we will try to compress the data so that the compressed statistic contains all the information that is present in the data about this vector mu.
Now, how are we going to do that? Note that we have already written the joint pdf of X 1, X 2, ..., X n in a convenient form, so we will use that form itself and then apply the Neyman–Fisher factorization in order to get the sufficient statistic. The joint pdf of X 1, X 2, ..., X n is once again what is going to be used. As we have seen, it is (2 pi) to the power minus n p by 2, determinant of sigma to the power minus n by 2, e to the power minus (n minus 1) by 2 trace of sigma inverse S minus n by 2 (x bar minus mu) transpose sigma inverse (x bar minus mu). We now write this as a product of a function f 1(x 1, x 2, ..., x n) and a second factor; that is, we are trying to reduce it to the form of the Neyman–Fisher factorization.
Now, the only unknown parameter in case 2 is the mean vector. So, we are trying to find the function which depends on mu and which depends on x 1, x 2, ..., x n only through some statistic. We will see that the pdf can be reduced to the required form, where the first function f 1(x 1, x 2, ..., x n) is a function of x 1, x 2, ..., x n alone, and hence a known quantity once x 1, x 2, ..., x n are given. It takes the form (2 pi) to the power minus n p by 2 times determinant of sigma to the power minus n by 2; note that sigma is now known to us, since we assumed in case 2 that sigma is known, and hence this determinant of sigma is also known. And if we look at the first term in the exponent, minus (n minus 1) by 2 trace of sigma inverse S: sigma is known to us, and so is sigma inverse, and S is a matrix which is based on x 1, x 2, ..., x n only. So, the factor e to the power minus (n minus 1) by 2 trace of sigma inverse S is also absorbed into the first function. So, this entire first function is known to us once x 1, x 2, ..., x n are known, and g(mu; x bar) is the rest of it, which is e to the power minus n by 2 (x bar minus mu) transpose sigma inverse (x bar minus mu). What are the characteristics of this second function? It depends on mu, that much is clear, and it depends on x 1, x 2, ..., x n through the x bar vector. Note that once again sigma, and hence sigma inverse, is known, so this is a function which depends on the unknown parameter vector mu, and its dependence on x 1, x 2, ..., x n is compressed in terms of the vector x bar.
So, with this particular factorization, once again using the Neyman–Fisher factorization theorem, we conclude that the x bar vector is sufficient for the unknown mean vector. So, if both mu and sigma are unknown, then X bar and S are jointly sufficient for mu and sigma; if sigma, the population variance covariance matrix, is known, then X bar is sufficient for mu. Now, we look at the third possibility, which once again is a theoretical possibility.
Suppose, in the third possibility, mu is known, say equal to the vector mu naught, and the covariance matrix sigma is unknown. Then we will try to find the sufficient statistic corresponding to the unknown sigma matrix. Now, we do not need the simplified form of the joint pdf that we derived; we will take the original form. The joint pdf of X 1, X 2, ..., X n is (2 pi) to the power minus n p by 2, determinant of sigma to the power minus n by 2, times e to the power minus half summation i equal to 1 to n (x i minus mu naught) transpose sigma inverse (x i minus mu naught), because we have assumed that the mean vector mu is known to us and is equal to mu naught, so we plug in this particular value.
Now, remember that we are trying to derive the sufficient statistic under the condition that mu equal to mu naught is known to us and sigma is unknown, so we will once again have to use the Neyman–Fisher factorization theorem. Let us keep the leading factors as they are and rewrite the exponent term. Note that the quadratic form in the exponent is a scalar quantity, and hence we can write it as the trace of itself: minus half trace of summation i equal to 1 to n (x i minus mu naught) transpose sigma inverse (x i minus mu naught). Once again we will use trace of (A plus B) equals trace A plus trace B, and trace of AB equals trace of BA. So, we first take the trace inside the summation, then consider (x i minus mu naught) transpose to be one factor and sigma inverse (x i minus mu naught) to be the other, and flip the two factors, since trace of AB equals trace of BA. The exponent thus reduces to minus half trace of sigma inverse times summation i equal to 1 to n (x i minus mu naught)(x i minus mu naught) transpose, where we have taken the common factor sigma inverse outside the summation.
So, once again we write the joint pdf as a product of a function, say f 2(x 1, x 2, ..., x n), and a second factor; our objective is to use the Neyman–Fisher factorization theorem, and the unknown quantity is now sigma, so we will have to write the second factor in terms of sigma and some statistic, say T, which we will have to define. The first function f 2(x 1, x 2, ..., x n) should be a function of x 1, x 2, ..., x n and known quantities only, and if we look at the pdf, the only term which satisfies that criterion is (2 pi) to the power minus n p by 2. The second factor g(sigma; T) is then determinant of sigma to the power minus n by 2 times e to the power minus half trace of sigma inverse T, where I denote the summation matrix in the exponent by T. Now, that T matrix is a function of x 1, x 2, ..., x n only, because mu naught is known to us and hence a known quantity.
And this T(x 1, x 2, ..., x n) is nothing but the matrix summation i equal to 1 to n (x i minus mu naught)(x i minus mu naught) transpose. So, see what we have obtained: the second factor is a function of the parameter matrix sigma, and its dependence on x 1, x 2, ..., x n is entirely through the matrix we have denoted by T(x 1, x 2, ..., x n). That would imply, once again by the Neyman–Fisher factorization theorem, that T(X 1, X 2, ..., X n), which is summation i equal to 1 to n (X i minus mu naught)(X i minus mu naught) transpose, is sufficient for the unknown parameter matrix sigma.

So, these are all the three possible cases when we talk about the sufficient statistics to be derived on the basis of the random sample X 1, X 2, ..., X n. The first case, as we have seen, is the most general and in fact the most practical case: both the mean vector mu and the covariance matrix sigma are unknown to us, and in such a situation the X bar vector and the sample variance covariance matrix S are jointly sufficient for mu and sigma. In the second case we assumed that sigma is known to us and mu is unknown; in that situation the X bar vector is sufficient for the unknown mean vector mu. And in the third case we had the mean vector mu known to us, say equal to mu naught, and in such a situation the statistic T defined above is sufficient for the sigma matrix. That is all about the sufficiency possibilities when we are considering a multivariate normal distribution; a sketch of the case 3 statistic follows.
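Here is a corresponding sketch of the case 3 statistic; mu0 and X are illustrative:

```python
# A minimal sketch (illustrative) of the case 3 sufficient statistic
# T = sum_i (x_i - mu_0)(x_i - mu_0)', used when mu = mu_0 is known.
import numpy as np

def t_statistic(X: np.ndarray, mu0: np.ndarray) -> np.ndarray:
    """Return the p-by-p matrix T for an n-by-p data matrix X."""
    centered = X - mu0
    return centered.T @ centered   # equals sum_i (x_i - mu0)(x_i - mu0)'

rng = np.random.default_rng(4)
mu0 = np.array([1.0, -1.0])
X = rng.multivariate_normal(mu0, np.eye(2), size=30)
T = t_statistic(X, mu0)
```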
Now, a second point of interest is: what are the unbiased estimators corresponding to these unknown quantities? So, let us answer this question about unbiased estimators. Consider first the case where mu and sigma are both unknown, corresponding to case 1 of the sufficient statistic discussion. In such a situation, we have already derived, for the general case of random sampling from a multivariate distribution, not necessarily a multivariate normal distribution, that if X bar is the sample mean vector, then the expectation of X bar is the population mean vector mu. So, this would imply that X bar is an unbiased estimator of the population mean vector mu. Now, what about the sigma matrix? Once again, in the earlier lectures I proved that if we consider S equal to 1 upon (n minus 1) times summation i equal to 1 to n (X i minus X bar)(X i minus X bar) transpose, the sample variance covariance matrix with divisor n minus 1, then the expectation of this S matrix is nothing but the sigma matrix. That too was proved for a general case, not necessarily for a multivariate normal distribution; the result still holds true when the underlying distribution is multivariate normal. This would imply that S is an unbiased estimator of the covariance matrix sigma. So, S is going to be an unbiased estimator of sigma, and X bar is going to be an unbiased estimator of mu.
If, on the other hand, we consider the sample variance covariance matrix with divisor n, then that is definitely not going to remain unbiased: it is a biased estimator for small samples; however, for large samples it is unbiased in the limit. So, X bar is an unbiased estimator of the unknown mean vector mu, and S with divisor n minus 1 is an unbiased estimator of sigma. If instead we define S n to be 1 upon n times the same sum of cross products, then, as I just referred to, S n is a biased estimator; however, S n is unbiased in the limit. So, these are the two unbiased estimators for the more general case where mu and sigma are unknown. The unbiased estimators in the other two situations, where the sigma matrix is known to us or the mu vector is known to us, can similarly be derived; a small simulation sketch follows.
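A small simulation sketch, under assumed illustrative parameters, makes the bias discussion concrete: the average of S over many samples settles near sigma, while the average of S n settles near (n minus 1)/n times sigma.

```python
# A small simulation sketch (illustrative) of the bias discussion:
# S (divisor n-1) is unbiased for Sigma, while S_n (divisor n) is biased
# for small n, with E[S_n] = (n-1)/n * Sigma.
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[2.0, 0.7], [0.7, 1.0]])
mu, n, reps = np.zeros(2), 5, 50_000

S_sum, Sn_sum = np.zeros((2, 2)), np.zeros((2, 2))
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    C = X - X.mean(axis=0)
    S_sum += C.T @ C / (n - 1)
    Sn_sum += C.T @ C / n

print(S_sum / reps)    # close to Sigma
print(Sn_sum / reps)   # close to (n-1)/n * Sigma = 0.8 * Sigma
```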
Now, let us move on to another important estimation procedure, the maximum likelihood estimator. We will be deriving the maximum likelihood estimators corresponding to the unknown quantities mu and sigma; we take the most general case, where mu and sigma are unknown. The other cases, of mu being known or sigma being known, can be tackled in a similar way. So, suppose mu and sigma are unknown, and we still have X 1, X 2, ..., X n, a random sample from the population N p (mu, sigma). Then we will look at the likelihood function. The likelihood function is L(mu, sigma) given the observations x 1, x 2, ..., x n, and that is nothing but the joint pdf looked upon as a function of mu and sigma given x 1, x 2, ..., x n. In order to find the maximum likelihood estimators, we will try to see which mu and sigma maximize this likelihood function. As we had seen earlier, the joint pdf, which we are now taking as the likelihood function, is given by (2 pi) to the power minus n p by 2, determinant of sigma to the power minus n by 2, times e to the power minus (n minus 1) by 2 trace of sigma inverse S minus n by 2 (x bar minus mu) transpose sigma inverse (x bar minus mu). So, this is now the likelihood function of mu and sigma given x 1, x 2, ..., x n.
Now, we note the following. As a first step, fix sigma to be any particular positive definite matrix, and let us see what value of mu would maximize the likelihood with respect to the mu vector. Since sigma is positive definite, sigma inverse is also positive definite. So, for that fixed positive definite sigma, the quadratic form (x bar minus mu) transpose sigma inverse (x bar minus mu) is greater than or equal to 0, because sigma inverse is positive definite, and it is equal to 0 if and only if x bar minus mu is the null vector, that is, if mu is equal to x bar.
Now, let me label the likelihood function as equation star 1. Thus, for a fixed positive definite sigma, star 1 is maximized with respect to the mean vector mu as follows. Note that in this likelihood function, mu is present only in the exponent. So, if we are trying to maximize the likelihood function star 1 with respect to mu for a fixed value of the sigma matrix, then the likelihood function is maximized if the exponent is maximized. Within the exponent, the term minus (n minus 1) by 2 trace of sigma inverse S does not depend on mu, and hence maximizing the exponent with respect to mu amounts to maximizing only the remaining part. So, for a fixed sigma, star 1 is maximum with respect to mu if e to the power minus n by 2 (x bar minus mu) transpose sigma inverse (x bar minus mu) is maximum. So, we will be looking at the value of mu which maximizes this quantity. Now, because of the negative sign in the exponent, this is maximized precisely when the quadratic form (x bar minus mu) transpose sigma inverse (x bar minus mu) is minimized. And what value of mu minimizes that quantity? We have seen that for a fixed sigma this quantity is greater than or equal to 0, and it equals 0 if we choose mu to be x bar. So, mu equal to x bar is what minimizes this quadratic form, hence what maximizes the exponent, and hence what maximizes the likelihood.
So, what we can thus say is that mu hat equal to x bar maximizes the likelihood star 1 with respect to the mean vector mu for a fixed sigma matrix. Moreover, this maximizing value of mu does not depend on the particular value of sigma at which we have fixed it, and hence mu hat equal to x bar maximizes the likelihood with respect to mu for every possible value of sigma. So, this would imply that the maximum likelihood estimator of mu is mu hat MLE equal to the sample mean vector X bar, which incidentally is also the unbiased estimator. So, we have got the maximum likelihood estimator of the mean vector mu. Let us now try to find the maximum likelihood estimator of sigma; in order to derive that, let us look at the likelihood function at mu equal to x bar.
So, the likelihood function at mu equal to x bar, which we denote by L(mu hat, sigma; x 1, x 2, ..., x n), is equal to (2 pi) to the power minus n p by 2, determinant of sigma to the power minus n by 2, times an exponent term. What would that look like? In the likelihood function star 1, we are evaluating at the maximizing point mu hat equal to x bar; the term x bar minus mu hat is then a null vector, so there is no contribution coming from the quadratic form. So, the exponent is just minus (n minus 1) by 2 trace of sigma inverse S. Now, for convenience, let us write the whole expression as (2 pi) to the power minus n p by 2, determinant of sigma to the power minus n by 2, e to the power minus half trace of sigma inverse A, in a new notation where A equals (n minus 1) times the S matrix; that is, A is nothing but the cross product matrix summation i equal to 1 to n (x i minus x bar)(x i minus x bar) transpose. So, this is the form of the likelihood at mu equal to x bar, the maximizing point, which we denote by star 2.
Now, maximization of star 2 is equivalent to maximization of the natural logarithm of star 2, which is minus n p by 2 log of 2 pi, plus n by 2 log of determinant of sigma inverse, minus half trace of sigma inverse A. What we have done here is keep sigma inverse and take the positive sign, since determinant of sigma to the power minus n by 2 equals determinant of sigma inverse to the power n by 2. Note that the first term is a constant, and hence there is nothing to maximize there. So, we are left with n by 2 log determinant of sigma inverse minus half trace of sigma inverse A.
Now, we will do a bit of adjustment here: write the first term as n by 2 log determinant of (sigma inverse A). What we have done is introduce an A, in order to have an expression similar to the one appearing in the trace; A is a matrix which does not depend on sigma, so this is not going to matter much. Since determinant of (sigma inverse A) equals determinant of sigma inverse times determinant of A, we require a compensating term, minus n by 2 log of determinant of A, to adjust for that. Once again, this compensating term is independent of sigma, and since we are interested in finding the sigma which maximizes the likelihood star 2, it does not affect the maximization. So, in order to find the maximum likelihood estimator of sigma, we can maximize n by 2 log determinant of (sigma inverse A) minus half trace of (sigma inverse A) with respect to sigma, and that will lead us to the maximum likelihood estimator.
Now, let lambda j denote the eigenvalues of the matrix sigma inverse A. What is the dimension of sigma inverse A? Sigma comes from the p-dimensional multivariate normal distribution, so it is p by p, and A is a constant multiple of the sample variance covariance matrix, which is also p by p; hence the order is p by p. Sigma is positive definite, and with probability 1 the matrix A will also be positive definite, and hence the eigenvalues lambda j are positive, with j running from 1 up to p. You may also note the following: since the lambda j are the eigenvalues of sigma inverse A, each lambda j is also an eigenvalue of the matrix A to the power half sigma inverse A to the power half; that is easy to see from the eigenvalue equation. Let us denote the quantity to be maximized by star 3. Now, if lambda j, j equal to 1 to p, are the eigenvalues of the sigma inverse A matrix, then star 3 is equal to n by 2 log of the product of the lambda j over j equal to 1 to p, because the determinant of the matrix is the product of its eigenvalues, minus half the sum of the lambda j, because the trace of sigma inverse A is the sum of its eigenvalues. Since the log of a product is the sum of the logs, this is equal to half summation j equal to 1 to p of (n log lambda j minus lambda j), where we have taken the half outside; call this star 4. So, in order to maximize the likelihood, we can maximize this quantity with respect to the lambda j. And which lambda j maximize it? For each j, the function n log lambda minus lambda is maximized at lambda equal to n. So, star 4 is maximized if lambda j equals n for all j equal to 1 up to p. Now, the lambda j are the crucial quantities associated with the unknown covariance matrix sigma, and lambda j equal to n for j equal to 1 to p is what is going to lead us to the maximum likelihood estimator; a small numerical check follows.
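A quick numerical check of this maximization, for an illustrative value of n:

```python
# A small check (illustrative) that each summand f(lambda) = n*log(lambda) - lambda
# in star 4 is maximized at lambda = n: f'(lambda) = n/lambda - 1 = 0 gives lambda = n.
import numpy as np

n = 7.0
lam = np.linspace(0.1, 30, 10_000)
f = n * np.log(lam) - lam
assert np.isclose(lam[np.argmax(f)], n, atol=0.01)
```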
This would imply that, if all the lambda j are equal to n, then the spectral decomposition of A to the power half sigma inverse A to the power half is P D lambda P prime equal to P (n times I p) P prime, where P holds the orthonormal eigenvectors of this matrix and D lambda is the diagonal matrix of its eigenvalues. Since P is an orthogonal matrix, P (n I p) P prime equals n times I p. Now, using this particular result, it is simple to get to the maximum likelihood estimator. If the maximizing eigenvalues are all n, then A to the power half sigma inverse A to the power half, of which, as we had seen, the lambda j are also the eigenvalues, equals n times I p. Pre and post multiplying by A to the power minus half, what we will be having is sigma inverse equal to n times A inverse.
And thus this would imply that sigma equal to 1 upon n times A maximizes the likelihood function with respect to the sigma matrix. So, this would further imply that sigma hat, the maximum likelihood estimator, is 1 upon n times A; and if you remember what A is, it is that cross product matrix, summation i equal to 1 to n (X i minus X bar)(X i minus X bar) transpose. This, thus, is the maximum likelihood estimator, which in our previous notation is nothing but S n; so the sample variance covariance matrix with divisor n is the maximum likelihood estimator. Thus we have proved the following: the X bar vector is the maximum likelihood estimator of the mu vector, and S n, the sample variance covariance matrix with divisor n, is the maximum likelihood estimator of the sigma matrix.

So, X bar and S n are the two maximum likelihood estimators. Now, this also reminds us of the univariate normal maximum likelihood estimators, where X bar was the maximum likelihood estimator of the mean and the sample variance with divisor n was the maximum likelihood estimator of the sigma square quantity for a univariate normal distribution. It is just to be noted that this X bar, as we have seen earlier, is an unbiased estimator of mu; on the other hand, S n is not an unbiased estimator of the sigma variance covariance matrix, although it is unbiased in the limit. In the next lecture we will try to derive the distributions of these maximum likelihood estimators, X bar and S n, or S with divisor n minus 1.
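To close, a small sketch (illustrative, not from the lecture) that computes mu hat and S n from a simulated sample and checks that random perturbations of the pair never increase the log-likelihood:

```python
# A closing sketch (illustrative): compute the MLEs mu_hat = x_bar and
# Sigma_hat = S_n (divisor n) and check that no small random perturbation
# of them increases the multivariate normal log-likelihood.
import numpy as np
from scipy.stats import multivariate_normal

def log_lik(mu, Sigma, X):
    return multivariate_normal(mu, Sigma).logpdf(X).sum()

rng = np.random.default_rng(6)
p, n = 2, 200
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

mu_hat = X.mean(axis=0)
C = X - mu_hat
Sigma_hat = C.T @ C / n                     # S_n, the MLE of Sigma

best = log_lik(mu_hat, Sigma_hat, X)
for _ in range(100):
    dmu = 0.05 * rng.standard_normal(p)
    E = 0.05 * rng.standard_normal((p, p))
    Sigma_pert = Sigma_hat + (E + E.T) / 2  # keep the perturbation symmetric
    if np.all(np.linalg.eigvalsh(Sigma_pert) > 0):
        assert log_lik(mu_hat + dmu, Sigma_pert, X) <= best
```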