In the previous lecture, we started discussing an application of Monte Carlo simulation methods for analyzing the response of randomly driven systems. The framework of our discussion is as shown here: we have a system, typically governed by a set of stochastic differential equations, which is driven by an input f(t) that is a random process, and we would like to simulate samples of f(t) which are compatible with a prescribed probabilistic model. Suppose f(t) is a Gaussian random process with a given probability density function, say zero mean and stationary; I should be able to generate an ensemble of time histories which are compatible with the given PSD and probability density function.
Now, for each of these samples, we will integrate the governing equations of motion and obtain an ensemble of response quantities of interest; this ensemble we will process statistically and arrive at a probabilistic model for the response. Given that we are approaching the problem through numerical simulation, the scope of this method is very vast: you can apply it to any problem for which a sample calculation can be performed. One of the things we have to appreciate at the outset is that when we simulate samples of f(t) compatible with the target probabilistic model for f(t), we need to ascertain that we have succeeded in doing so, and the tool needed to address that problem is the methods of mathematical statistics.
Similarly, after we produce the ensemble of response time histories, we again need statistical methods to process them and arrive at a model for the response process. So, we have initiated a discussion on statistical methods, and we will continue that in this lecture. We are discussing the problem of estimation of parameters; one of the methods is the method of moments, where we basically find quantities like the expected value of X to the power k.
So, that would mean we can find the mean, variance, skewness, kurtosis, etcetera. The alternative method, the method of maximum likelihood, directly estimates the parameters of an assumed probability density function. Here the parameters themselves need not be moments; they will be related to these moments, but they may not be directly those quantities. Similarly, quantities like the mode, median, range, etcetera, cannot be estimated using the method of moments.
So, the maximum likelihood estimation method helps us to address some of these issues.
So, we are discussing estimation of the mean. We started by assuming that X is a random variable with probability density function p_X(x), mean mu and standard deviation sigma; we formed a sequence of iid, that is, independent and identically distributed, random variables with the common probability distribution function that agrees with the distribution of the random variable we are talking about; that is, X_i is independent of X_j for all i not equal to j, with i, j from 1 to n, and each of these X_i's has mean mu, variance sigma squared and probability density function p_X(x). In the previous lecture, I showed that theta given by (1/n) summation i = 1 to n of X_i is an unbiased estimator of mu for all n, with minimum variance, and that lowest variance is sigma squared by n. That means this is an unbiased estimator irrespective of the size of the sample, and as n becomes large the variance sigma squared by n reduces; on an average, the statistic provides an exact solution to the problem of estimating the mean of the population.
Now, it is clear that since the X_i's are random variables, and theta is a transformation of random variables, theta is also a random variable; the probability distribution of theta is known as the sampling distribution for the mean, and its standard deviation is known as the standard error. So, we are now interested in postulating a model for this sampling distribution of theta; theta is an unbiased estimator of mu with variance sigma squared by n.
Now, let us begin by considering the case in which the variance is known. If X is Gaussian, it would mean that all the X_i's are also Gaussian, because they are an iid sequence with a common pdf which is Gaussian, and since we are adding these Gaussian random variables, theta would also be Gaussian; so, in this case the sampling distribution for theta would be Gaussian with mean mu and variance sigma squared by n.
However, if X is not Gaussian, by virtue of the central limit theorem and for large n, we may still consider theta to be Gaussian; this is an approximation which we generally make. Therefore, we assume that theta is normal with mean mu and standard deviation sigma by square root of n; or, if we form a standard normal variable by removing the mean and dividing by the standard deviation, then (theta minus mu) divided by (sigma by square root of n) is normally distributed with zero mean and unit standard deviation. So, this is the sampling distribution for the mean.
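To make this concrete, here is a minimal sketch, not from the lecture, that checks this claim by repeated sampling; the choice of an exponential population (mean and standard deviation both 1) and the sizes n and n_trials are illustrative assumptions.

```python
# Sketch: empirical check that Z = (theta - mu) / (sigma / sqrt(n)) is
# approximately N(0, 1) even when the population is non-Gaussian.
# The exponential population and the sizes below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0            # exponential with rate 1: mean 1, std 1
n, n_trials = 50, 20000         # sample size, number of repeated samples

samples = rng.exponential(scale=1.0, size=(n_trials, n))
theta = samples.mean(axis=1)    # one realization of the estimator per row

z = (theta - mu) / (sigma / np.sqrt(n))
# For N(0, 1) we expect mean close to 0 and standard deviation close to 1
print(f"mean of Z = {z.mean():+.4f}, std of Z = {z.std(ddof=1):.4f}")
```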
Now, based on the idea of the sampling distribution, we can construct what is known as a confidence interval estimate. In the discussion we have had till now, if you use this for a given realization of the X_i's, you get one realization of theta; this is the point estimate. So, this is one way of answering the question, but there is another way, known as confidence interval estimation; let us discuss what it is. We have just now shown that the random variable (theta minus mu) divided by (sigma by square root of n) is normal with zero mean and unit standard deviation.
Now, let us consider a probability level 1 minus alpha associated with this random variable; for example, 1 minus alpha could be 0.95, which means alpha is 0.05. If you draw the probability density function of this random variable as shown here, we define two points such that the area between them is equal to 1 minus alpha. That is, I define two points K_(alpha/2) and K_(1 minus alpha/2) as follows: the probability that the random variable (theta minus mu)/(sigma/square root of n) lies in the interval (K_(alpha/2), K_(1 minus alpha/2)) is 1 minus alpha.
Now, K_(alpha/2) is actually minus the inverse standard normal probability distribution function evaluated at 1 minus alpha/2, that is, Phi inverse of alpha/2, and K_(1 minus alpha/2) is Phi inverse of 1 minus alpha/2. Alpha is given, so we can determine K_(alpha/2) and K_(1 minus alpha/2).
So, we now have the statement: the probability that (theta minus mu)/(sigma/square root of n) lies between K_(alpha/2) and K_(1 minus alpha/2) is 1 minus alpha. Now, I can rearrange these terms and write this as: the probability that the interval (theta + K_(alpha/2) sigma/square root of n, theta + K_(1 minus alpha/2) sigma/square root of n) contains the mean is 1 minus alpha. Here theta is the observed value of the estimator from the sample. So what we say is that this interval is the confidence interval on the population parameter mu with confidence 1 minus alpha. Suppose 1 minus alpha is 0.95; then, with 95 percent confidence, I can say that the true mean is contained in the interval (theta + K_(0.025) sigma/square root of n, theta + K_(0.975) sigma/square root of n).
So, instead of giving the estimate of the population parameter as a single number, I am now providing an interval; that means this interval encloses the true population mean with a given level of confidence. This is a much more useful way of providing the answer.
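As a sketch of how this computation looks in practice (assuming Python with numpy and scipy; the function name and the synthetic data are my own illustration, not the lecturer's):

```python
# Sketch: confidence interval for the population mean when sigma is known,
# using the Gaussian sampling distribution of the sample mean.
import numpy as np
from scipy.stats import norm

def mean_ci_known_sigma(x, sigma, alpha=0.05):
    """Return the (1 - alpha) confidence interval for the mean."""
    n = len(x)
    theta = np.mean(x)                 # point estimate of the mean
    k_lo = norm.ppf(alpha / 2)         # K_{alpha/2}, -1.96 for alpha = 0.05
    k_hi = norm.ppf(1 - alpha / 2)     # K_{1-alpha/2}, +1.96 for alpha = 0.05
    return theta + k_lo * sigma / np.sqrt(n), theta + k_hi * sigma / np.sqrt(n)

x = np.random.default_rng(1).normal(0.0, 1.0, size=10)  # illustrative data
print(mean_ci_known_sigma(x, sigma=1.0, alpha=0.05))
```

Since the half width is K_(1 minus alpha/2) sigma/square root of n, increasing n shrinks the interval, which is exactly the behavior the examples below demonstrate.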
So, we can make some remarks now. The statement that mu lies between these two numbers should be interpreted as: the probability that the random interval (theta + K_(alpha/2) sigma/square root of n, theta + K_(1 minus alpha/2) sigma/square root of n) contains the population mean mu is 1 minus alpha. Mu is not a random variable; mu is a deterministic constant, so this should not be construed as a probability statement made on mu. The random variables are the interval endpoints: each endpoint is a random variable, so together they constitute a random interval, and the probability that this random interval encloses the population mean is 1 minus alpha. So, theta = (1/n) summation i = 1 to n X_i is a point estimate, and this interval is the confidence interval estimate for the population mean.
Some quick examples: suppose I take the ten numbers displayed here and find the point estimate; you can verify that it is 0.013. Now, if alpha is 0.05, I can find K_(alpha/2) to be minus 1.96 and K_(1 minus alpha/2) to be 1.96; so, with 95 percent confidence I can say that the population mean lies between minus 0.6068 and 0.6328, whereas the point estimate of the population mean from the ten samples is 0.013. That is all I can say with the point estimate, whereas here I am able to say, with 95 percent confidence, that the interval (minus 0.6068, 0.6328) contains the mean.
You can see that there were only ten samples and this interval is fairly wide. Now, if I use 1000 samples (I have plotted these 1000 numbers here), I get the point estimate as minus 0.0446, and again with 95 percent confidence I can say that the interval (minus 0.1065, 0.0174) contains the population mean.
Now, with 10,000 samples, the answer is close to 0 and the confidence interval keeps shrinking; we are getting narrower and narrower confidence intervals because we are using a larger number of samples. That is the conclusion we can draw from this exercise.
I talked about the sampling distribution for the mean; in principle, we should be able to construct the sampling distribution for any statistic we are interested in. For example, for the variance we use the estimator s^2 = 1/(n - 1) summation i = 1 to n (X_i - X bar)^2, where X bar = (1/n) summation j = 1 to n X_j is the point estimate of the mean. Now, you find the expected value of s^2; for X bar I add and subtract mu, rewrite this in a slightly different form and expand to get three terms. If I manipulate these terms and use the facts that the X_i's are all identically distributed with mean mu and variance sigma squared, and that X bar is an unbiased estimator with known variance sigma squared by n, I can show that this estimator is unbiased. Please note that I am dividing by n - 1, not by n; that is to be expected, because X bar is a number that has itself been computed from the sample, which uses up one degree of freedom. If I use 1/n here instead of 1/(n - 1), it will be a biased estimator for the variance; the n - 1 ensures that it is unbiased.
We can show that the variance of this estimator is given by this; I leave it as an exercise. You have to understand carefully what is being said here: we are talking about the variance of a random variable X; the estimator for that is itself a random variable, whose mean agrees with the population variance, but, being a random variable, it has its own variance, which is given here. You can see that as n tends to infinity this variance comes down; therefore, this estimator is consistent.
So, s^2 = 1/(n - 1) summation i = 1 to n (X_i - X bar)^2 is an unbiased and consistent estimator for sigma squared. Now, the objective of our discussion is to arrive at the sampling distribution for the variance. So, I rearrange the terms and write (n - 1) s^2 / sigma^2 in this form; if the population is Gaussian, X_i and X bar are Gaussian, and the right hand side will then be a sum of squares of Gaussian random variables. Such sums have a distribution known as the chi-square distribution: if you add n squares of standard Gaussian random variables, the resulting random variable can be shown to have a chi-square distribution, and the form of the distribution is displayed here.
So, this is the probability density function of the chi-square distribution; here (n - 1) s^2 / sigma^2 is a chi-square random variable with n minus one degrees of freedom. This is a well-studied probability density function, its properties are tabulated, and we can use that information to analyze the properties of the estimator for the variance.
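A quick simulation check of this claim (a sketch under assumed population parameters; the comparison uses the standard chi-square moments, mean k and variance 2k for k degrees of freedom):

```python
# Sketch: verify by simulation that (n - 1) s^2 / sigma^2 behaves like a
# chi-square variable with n - 1 degrees of freedom for Gaussian data.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, n_trials = 0.0, 2.0, 12, 50000   # illustrative assumptions

x = rng.normal(mu, sigma, size=(n_trials, n))
s2 = x.var(axis=1, ddof=1)          # unbiased estimator, divides by n - 1
q = (n - 1) * s2 / sigma**2         # should be chi-square with n - 1 dof

k = n - 1                           # chi-square(k): mean k, variance 2k
print(f"mean of q = {q.mean():.3f} (theory {k}); "
      f"variance of q = {q.var():.3f} (theory {2 * k})")
```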
Now, in estimating the mean we assumed the standard deviation of the population is known, but that may not always be the case. If, while estimating the mean, you are going to estimate the standard deviation of the population from the same sample, then the sampling distribution for the mean will not be Gaussian; it has a different form, known as Student's t distribution. What that is: if you take two independent random variables X and Y, where X is normally distributed with zero mean and unit standard deviation, and Y is chi-square distributed with n degrees of freedom, and we form the ratio T = X divided by the square root of (Y/n), this random variable T has the probability density function shown here, known as Student's t probability density function. Student was the pen name under which the scientist W. S. Gosset wrote his papers, and the distribution goes by that name. Here, this is the gamma function and n appears on the right hand side; t is the state variable and takes values from minus infinity to plus infinity.
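For reference, the density being pointed to on the slide is, in standard notation, the Student's t density with n degrees of freedom:

```latex
f_T(t) \;=\; \frac{\Gamma\!\left(\frac{n+1}{2}\right)}
                  {\sqrt{n\pi}\;\Gamma\!\left(\frac{n}{2}\right)}
        \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}},
\qquad -\infty < t < \infty .
```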
Equipped with this description, we can now talk about the sampling distribution for the estimator of the mean when the variance is not known; that means the variance needed to construct the sampling distribution for the mean will be estimated from the same sample. To get the point estimate of the mean you do not need the variance, but to write the sampling distribution, and hence the confidence interval, you need it.
Now, theta given by (1/n) summation i = 1 to n X_i is an unbiased estimator of mu with variance sigma squared by n, and (n - 1) s^2 / sigma^2, where s^2 is the sample variance, has a chi-square distribution. So, I now form the ratio (theta minus mu)/(s/square root of n); the denominator involves a chi-square variable and the numerator, suitably standardized, is Gaussian; therefore, this ratio has Student's t distribution with n minus one degrees of freedom. Thus, the sampling distribution for the estimator of the mean, when the variance of the population is not known, is given by this.
Now, if you want to construct the confidence interval from this, you have to use this density function. If you recall, in the derivation of the confidence interval earlier, the curve we used was Gaussian; now this Gaussian density has to be replaced by the Student's t density while computing the confidence interval. That means, if this is the sampling probability density function, we make the statement that the random variable (theta minus mu)/(s/square root of n) lies in the interval (t_(alpha/2, n-1), t_(1 minus alpha/2, n-1)) with probability 1 minus alpha, where 1 minus alpha is the confidence level; from this we can construct the confidence interval with a given level of confidence.
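A sketch of the corresponding computation (scipy's t distribution supplies the percentile points t_(1 minus alpha/2, n-1); the data here are synthetic placeholders):

```python
# Sketch: confidence interval for the mean when sigma is estimated from the
# same sample, using Student's t with n - 1 degrees of freedom.
import numpy as np
from scipy.stats import t

def mean_ci_unknown_sigma(x, alpha=0.05):
    n = len(x)
    theta = np.mean(x)
    s = np.std(x, ddof=1)                  # sample standard deviation
    half = t.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)
    return theta - half, theta + half

x = np.random.default_rng(3).normal(0.0, 1.0, size=100)  # illustrative data
print(mean_ci_unknown_sigma(x, alpha=0.05))
```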
So, as I said, instead of using the Gaussian density function you need to use the Student's t distribution. Now, an example: I have selected 100 samples of random numbers; the point estimate of the mean is minus 0.9677 and sigma hat is 0.9777 for this sample. In this table I have shown the confidence intervals for the mean and the standard deviation at different levels of confidence; so, with 99 percent confidence the interval is this, and with 10 percent confidence the interval is this; these columns are for the standard deviation and these are for the mean.
So, you can try to simulate this and see what information it conveys. The data used in this example is provided here, so you could actually replicate this table; I leave that as an exercise for you.
In this graph, what I have shown is this: I use 5000 numbers and find the confidence intervals for different levels of alpha. See, with 100 percent confidence I can only say that the confidence interval is from minus infinity to plus infinity; as the confidence level increases, the width of the confidence band widens.
At 95 percent, the confidence interval is this: this is the upper limit of the confidence interval and this is the lower limit. This line is actually the population mean; these 5000 numbers were generated synthetically with zero mean and unit standard deviation (I will explain how that can be done in due course, but right now you can believe that these 5000 numbers are drawn from a population whose mean is zero and standard deviation is 1). So, we are getting an answer of something less than 0.1 as the point estimate; this red line is the point estimate and this blue line is the population mean, and the red line is an approximation to the blue line. That is one answer; the other answer is, you take any value of confidence, say 95 percent, and I can say that the range between these two numbers on the Y axis contains the population mean, and I am able to make that statement with 95 percent confidence.
Now, this was the result for the mean, and here is a similar result for the standard deviation; the population standard deviation, as I said, is unity, and I am getting an answer between 1.01 and 1.02 as the point estimate. The lower confidence band value at 95 percent confidence level is here and the upper one is here; so, with 95 percent confidence I can say that the population standard deviation is contained in this interval.
So, we can summarize some of the factors that influence the confidence interval: the statistic we are using as the estimator, the actual observations made, the confidence level prescribed, the sampling distribution for the statistic, and the sample size.
Now, the confidence interval is a function of the sample size, so we can ask the question: if I fix the width of the confidence interval, can I determine the number of samples needed? That problem can be addressed as shown here. Consider the estimator for the population mean with known variance, so that the sampling distribution is Gaussian with mean mu and standard deviation sigma by square root of n, where n is the sample size; equivalently, the standardized variable (theta minus mu)/(sigma/square root of n) is a zero mean, unit standard deviation normal random variable, and this is the statement that helps us define the confidence interval.
Now I define w = K_(1 minus alpha/2) sigma/square root of n as the half width of the confidence interval to be specified; that means, as a user, I say I want an estimate with this width, and ask how many samples I should use. The minimum number of samples required can be computed from this: solving for n gives n = (K_(1 minus alpha/2) sigma / w)^2, where w is the width you specified.
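A sketch of this calculation (the function and numbers are my own illustration; with sigma = 1, alpha = 0.05 and half width 0.02 it reproduces the order of 10^4 samples quoted later):

```python
# Sketch: minimum number of samples for a specified half width w of the
# confidence interval, from solving w = K_{1-alpha/2} * sigma / sqrt(n).
import math
from scipy.stats import norm

def min_samples(sigma, w, alpha=0.05):
    k = norm.ppf(1 - alpha / 2)        # K_{1-alpha/2}, 1.96 for alpha = 0.05
    return math.ceil((k * sigma / w) ** 2)

print(min_samples(sigma=1.0, w=0.02))  # about 9604 samples
print(min_samples(sigma=1.0, w=0.07))  # about 784 samples
```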
Now, what we have done in this graph is this: the Y axis has to be read on the left and on the right; on the left is the estimate of the mean, and on the right is the minimum number of samples we have to use, while on the X axis I have w, the half width of the confidence interval. That would mean, if the width of the confidence interval is 0.2, I go along this curve and read off how many samples I need to use (this is plotted on a logarithmic scale); from this you find the number of samples needed to achieve this width.
Now, if you actually perform these simulations, the blue line here is the point estimate of the mean, red is the lower limit of the confidence interval and green is the upper limit; and if you measure the width here, it meets the requirement we specified; that means, with this number of samples, the width will be as specified. So, this helps you to select the sample size: the narrower the confidence limits, the better the answer, so you need more samples if you want narrower confidence intervals.
So, this is simply a plot of the minimum samples needed against the half confidence width. If you want a narrower confidence width, you need a larger number of samples; for example, with 10 to the power of 4 samples your width will be 0.02, but if you are willing to use only 100 samples your width will be somewhere around 0.07. The former is a much better solution, but you have to pay in terms of a larger number of samples.
Now, I move on to the next topic: what is known as hypothesis testing. Hypothesis testing is a method for making decisions on properties of a population based on observed samples. Typically, we postulate two competing hypotheses: one is what is known as the null hypothesis, denoted H0; the other is known as the alternative hypothesis, denoted Ha.
Now, we can imagine a situation where there is mass production of some product, say a steel rod or a yarn, and you are looking at, say, the weight of the steel rod per one meter, and you want that to be some prescribed number; suppose you want the population mean to be 5 in some unit of some physical quantity.
Now, as the production proceeds, you draw 10 samples and find that the sample mean is 5.0038, which is different from 5.0. Why does the difference occur? There could be two reasons. One is that there are inevitable random fluctuations which are beyond our control, so nothing can be done about that; also, I am finding this sample mean with only 10 samples, so there are fluctuations due to the limited sample size as well as the inherent randomness. The other possibility is that the production could be defective; something could be going wrong, and that is what is producing this difference.
So, we are interested in knowing whether these 10 samples have actually been drawn from a population whose mean is 5.0 or not. We make the null hypothesis that mu is 5.0; that means the sample is drawn from a population whose mean is 5.0 (the word null in null hypothesis means the hypothesis of no difference). The alternative hypothesis is the negation of this: these 10 samples are not drawn from a population whose mean is 5; the mean of the population from which they are drawn is not 5.
Now, we test this hypothesis. The question we are trying to answer is: is the observed difference between the estimate and the population mean due to sampling fluctuations, that is, random causes, or due to systematic, non-random causes? In other words, is the observed variation, theta minus mu, arising due to some assignable cause or due to non-assignable causes? If it is due to an assignable cause, you need to take an action; maybe you have to stop the production and examine what is going on. If it is random fluctuation, you can continue with the production process. So, is the observed variation significant? The word significant here means that the variation is due to assignable causes; something is indeed going wrong, something significantly wrong that I have to correct. If the difference is due to random causes, no action is needed; otherwise, action is necessary. The decision we need to make is to accept or reject the null hypothesis.
Now, the errors in making these decisions. We may reject the hypothesis when it should have been accepted; this is known as a Type 1 error, an error of commission: the difference you observe is actually due to random fluctuations, but you think it is due to systematic causes and you stop production, so you are making an error. Or we may accept the hypothesis when it should have been rejected; this is a Type 2 error, an error of omission: you should actually stop the production, but you think the difference you are seeing is due to random fluctuations, so you permit production to go ahead.
In short, a Type 1 error is action when no action was needed, and a Type 2 error is inaction when action was needed.
Now, which error is more dangerous? Actually, Type 2 errors are more dangerous, because inaction when action was needed is the greater risk; if you take an unneeded action, you will come to know that the action was not needed, so it is the lesser evil.
So, for a Type 1 error, the error of commission, based on the action taken one would come to know that a wrong decision was taken, though of course you have to pay the price of stopping the production. The price you would pay for a Type 2 error is that a faulty component
would get into the final product. Now, can these errors be eliminated? There is no way to eliminate them as long as decisions are based on samples; sampling errors cannot be avoided. The only way to avoid them is to measure everything that is produced: every meter of yarn or steel rod you produce, you have to weigh and make sure it meets the criterion; then, of course, there would be no error, but you cannot be doing that. Therefore, there is no way we can eliminate the errors.
So, what can we do? The null hypothesis is H0: mu is 5, meaning the sample is drawn from a population whose mean is 5.0; the alternative hypothesis is mu not equal to 5.0, meaning the sample is drawn from a population whose mean is not 5.0. The possible situations are: H0 is true, or H0 is not true. If you accept H0 when H0 is true, it is a correct decision; if you accept H0 when H0 is not true, you are making a Type 2 error; if you reject H0 when H0 is true, you are making a Type 1 error; if you reject H0 when H0 is not true, it is a correct decision. So, there are two wrong decisions and two correct decisions. We call the probability of committing a Type 1 error alpha, and the probability of committing a Type 2 error beta. Therefore, the probability of accepting H0 when H0 is true is 1 minus alpha, and the probability of accepting the alternative hypothesis when the alternative hypothesis is true is 1 minus beta.
Now, what we do is fix alpha, the probability of committing a Type 1 error. It is difficult to fix the probability of committing a Type 2 error, that is, to assign a value for beta, since it is difficult to assess the consequences of an action which has not been taken. You can build experience in fixing alpha, but not beta. Ideally, you would try to minimize both, but if you minimize alpha, beta will increase, and if you minimize beta, alpha will increase; I am not going to show that, but that is the result.
So, equipped with this, we can now go through the steps of a hypothesis test. Step one: we formulate the null and alternative hypotheses; here, mu is equal to mu_0 and mu is not equal to mu_0. Step two: we choose alpha, which is called the level of significance. Strictly, the choice is arbitrary; we have to make it, and as a convention it is taken as 0.01, 0.05 or 0.1. And 1 minus alpha is known as the confidence level; so, if you select 0.05 as the significance level, you have 95 percent confidence in what you are saying.
Next, you have to identify the test statistic and its distribution. To test this hypothesis you need to define a statistic; since we are now talking about the mean, the test statistic would be related to the estimate of the mean, that is, theta = (1/n) summation i = 1 to n X_i. The sampling distribution for this is needed, and we know that Z = (theta minus mu)/(sigma/square root of n) is normal with zero mean and unit standard deviation; so we take Z as the test statistic. Now, we have a sample, therefore we can find a realization of Z. Finally, we define the region of rejection of the null hypothesis; how do we do that?
This is the probability density function of Z, and this area is 1 minus alpha. If the observed statistic, that is, the observed value of Z, falls in these tail regions, we reject the null hypothesis; otherwise we accept it. This is how we proceed.
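Putting the steps together, here is a sketch of the two-sided Z test with known sigma (the helper name and the synthetic data are my own illustration):

```python
# Sketch: two-sided Z test for the mean with known population sigma,
# following the steps above: choose alpha, compute Z, check the region.
import numpy as np
from scipy.stats import norm

def z_test_mean(x, mu0, sigma, alpha=0.05):
    n = len(x)
    z = (np.mean(x) - mu0) / (sigma / np.sqrt(n))  # observed statistic
    k = norm.ppf(1 - alpha / 2)                    # acceptance region (-k, k)
    return z, abs(z) <= k

x = np.random.default_rng(4).normal(0.0, 1.0, size=15)  # illustrative data
z, accept = z_test_mean(x, mu0=0.0, sigma=1.0, alpha=0.05)
print(f"Z = {z:.4f}, accept H0: {accept}")
```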
So, again, let us take a set of 15 numbers drawn from a population whose standard deviation is 1; the sample size is 15. I make the hypothesis that this sample is drawn from a population whose mean is zero; that is my null hypothesis. The alternative hypothesis is mu not equal to zero. I select alpha to be 0.05. The estimator is (1/n) summation i = 1 to n X_i, and the statistic is (theta minus mu)/(sigma/square root of n), which is normal with zero mean and unit standard deviation. Based on these numbers, I get the point estimate of the mean to be 0.1943, and substituting that and using n equal to 15, I get the realization of Z to be 0.7524. Now, look at the critical points: the first one is Phi inverse of 0.025, which is minus 1.96, and the other is Phi inverse of 1 minus 0.025, which is 1.96.
This observed value of Z is indeed contained in the interval from minus 1.96 to 1.96; so Z lies in the region of acceptance, and we accept the null hypothesis at 5 percent significance level.
Now, another example: I again take 15 numbers drawn from a population whose standard deviation is taken to be known and equal to one, and I go through this exercise again. The null hypothesis is mu equal to zero, the alternative hypothesis mu not equal to zero, and I select alpha to be 0.05; this time I get the Z value to be 4.0033, and 4.0033 is not contained in my acceptance region. So, for this set of numbers, I have to reject the null hypothesis at 5 percent significance level. The sample mean is 1.03, whereas the hypothesis we are testing is that it is zero. What I have actually done here is generate 15 random numbers from an exponentially distributed random variable (I will come to that shortly) which has mean 1; so, obviously, if I look at this data, it is clear that the mean is not zero, but is the observed difference due to sampling fluctuations or due to a systematic cause? That is what we are trying to discover. Now, for this same data, I make the null hypothesis that it is drawn from a population whose mean is 1 (the earlier hypothesis was that it is drawn from a population whose mean is zero). Since I know that I have generated the numbers from an exponentially distributed random variable whose mean is one, I can now test whether the simulation of these random numbers is correct or not.
So, the null hypothesis is mu equal to 1, the alternative hypothesis mu not equal to 1, and again the significance level is 0.05; I get the Z statistic to be 0.130, which lies in the acceptance region, and I can accept the null hypothesis that this sample is drawn from a population whose mean is 1, at 5 percent significance level.
Now, if I take 5000 samples, this is again the same exercise: the null hypothesis is that the sample is drawn from a population whose mean is zero; I get the sample statistic to be 0.8732, which leads to the conclusion that we can accept the null hypothesis at 5 percent significance level.
Similarly, for 5000 numbers drawn from an exponential random variable, the null hypothesis is that the sample is drawn from a population with mean one, and here again we are able to show that Z is 0.7040, and it passes the acceptance criterion. What is to be noted here is that the population is not Gaussian: I know that I am drawing from an exponential population, but still, by virtue of the central limit theorem, we are assuming that the sampling distribution even for that population is Gaussian; so, this result illustrates how the central limit theorem leads to what appears to be a correct answer.
Now, if the population standard deviation is not known, how do you do the hypothesis test? I make the null hypothesis that mu is mu_0 and the alternative hypothesis that mu is not equal to mu_0, and again choose the significance level and level of confidence. Now, we have to identify the test statistic and its distribution. Earlier, I assumed that the variance was known, and the sampling distribution was therefore Gaussian; now the variance is also estimated from the same sample, so the test statistic will be related to Student's t distribution. I define T = (theta minus mu)/(s/square root of n), where s is the sample standard deviation. Based on the sample, we obtain the observed value of T from the observed values of theta and s, and we define the region of rejection of the null hypothesis, again based on the t distribution. This can be done; it is tedious, but it can be done.
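For the record, scipy ships this one-sample t test; a sketch (the synthetic data are placeholders, and the p-value comparison is an equivalent way of phrasing the acceptance-region check):

```python
# Sketch: one-sample t test, T = (theta - mu0) / (s / sqrt(n)) with n - 1
# degrees of freedom, via scipy.
import numpy as np
from scipy.stats import ttest_1samp

x = np.random.default_rng(5).normal(0.0, 1.0, size=15)  # illustrative data
result = ttest_1samp(x, popmean=0.0)
alpha = 0.05
print(f"T = {result.statistic:.4f}, p = {result.pvalue:.4f}, "
      f"accept H0: {result.pvalue > alpha}")
```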
We can similarly argue out the logic, or the steps, in constructing hypothesis test procedures for other statistics. For example, I make the null hypothesis that the sample is drawn from a population whose variance is 100; the alternative hypothesis is that the variance is greater than 100. How do we test it?
It is the same story: we select the significance level and confidence level, and then identify the test statistic; the test statistic depends on what you are testing as the hypothesis. It is now the variance, so I take the estimator of the variance to be s^2 = 1/(n - 1) summation i = 1 to n (X_i - theta)^2, where theta is the unbiased estimator for the mean; this is an unbiased estimator for the variance. The test statistic we select is related to the sampling distribution of the variance, and I have shown that (n - 1) s^2 / sigma^2 is a chi-square random variable with n minus one degrees of freedom. Based on this, I calculate the sample realization of the test statistic, identify the regions of acceptance and rejection, and can then decide whether to accept or reject the hypothesis.
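A sketch of this variance test (with the one-sided alternative sigma^2 greater than the hypothesized value, matching the example; names and data are my own illustration):

```python
# Sketch: test H0: sigma^2 = sigma0^2 against sigma^2 > sigma0^2 using
# (n - 1) s^2 / sigma0^2, chi-square with n - 1 dof under H0.
import numpy as np
from scipy.stats import chi2

def variance_test(x, sigma0_sq, alpha=0.05):
    n = len(x)
    q = (n - 1) * np.var(x, ddof=1) / sigma0_sq   # observed statistic
    crit = chi2.ppf(1 - alpha, df=n - 1)          # reject H0 if q > crit
    return q, q <= crit

x = np.random.default_rng(6).normal(0.0, 10.0, size=30)  # sigma^2 = 100
q, accept = variance_test(x, sigma0_sq=100.0, alpha=0.05)
print(f"statistic = {q:.2f}, accept H0: {accept}")
```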
Now, till now I have been talking about moments; how about probability distributions? If I want to see whether a sample is drawn from a population whose probability distribution is Gaussian or not, with given mean and variance, how do we test that? We need to think of modeling probability distributions, and there are certain tools available for that; one of them is the probability paper. So, let X be a random variable with probability distribution function P_X(x), and let the lower case x_i be samples of X. The probability paper is a special plotting device in which the y axis is scaled in such a way that the probability distribution function appears as a straight line; we distort the y axis so that the probability distribution function, or its complement, becomes a straight line. For example, if I take an exponential random variable, P_X(x) = 1 minus exponential of (minus lambda x), where x takes values from zero to infinity.
Now, 1 minus P_X(x), which I call G_X(x), the complementary probability distribution function, is exponential of (minus lambda x). Taking the logarithm, log G_X(x) = minus lambda x; so, if I now distort the Y axis and, instead of plotting P_X(x), plot log G_X(x), I get a straight line for the probability distribution function. On this paper, if I now plot the observations I am making, those points will lie along this line if the numbers are drawn from an exponential random variable.
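A sketch of a home-made exponential probability paper (the plotting choices, including the i/(n + 1) plotting positions that avoid taking log of zero, are my own illustration):

```python
# Sketch: exponential probability paper. Plotting -log(1 - F_hat(x)) against
# the ordered data should give a straight line of slope lambda if the data
# are exponentially distributed.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
lam = 2.0
x = np.sort(rng.exponential(scale=1.0 / lam, size=200))

n = len(x)
f_hat = np.arange(1, n + 1) / (n + 1)   # empirical CDF at ordered points

plt.plot(x, -np.log(1.0 - f_hat), "b.", label="data")
plt.plot(x, lam * x, "r-", label="theory: slope lambda")
plt.xlabel("x")
plt.ylabel("-log G_X(x)")
plt.legend()
plt.show()
```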
To illustrate this, let us consider, I think, fifty numbers from a population whose mean is zero and standard deviation is one, and this is the probability paper for the normal probability distribution function. For every probability distribution there will be one such paper; the normal probability paper is meant for Gaussian random variables. This red straight line is actually the theoretical probability distribution function; with the y axis distorted, the probability distribution function appears as a straight line. If you plot it on an arithmetic Y axis, as we have been doing, you get the familiar curve, but now we have adjusted the Y axis so that this curve appears as a straight line.
Now, when I plot these numbers on this paper, they follow the straight line. So, this is a simple device to see whether the numbers are Gaussian or not.
Now, if I plot the same data on another probability paper, one corresponding to another random variable, for example, on Weibull probability paper, it is not a straight line; I am getting something like this, which is not a straight line, so this clearly says the numbers are not Weibull.
If I indeed simulate numbers according to the Weibull distribution, which is what has been done here, they appear along the straight line; and if you plot these numbers on the normal probability paper, they will appear distorted there.
That is what is shown here: the Weibull numbers plotted on normal probability paper, and you can see that there is a distortion at the two ends. So, the probability paper is a useful modeling tool for a quick assessment of the nature of a probability distribution function.
Now, can we formulate hypothesis testing procedures to verify such a null hypothesis? For example, H0, the null hypothesis, is that X has the specified distribution P_X(x); that means the population has this prescribed probability distribution function. You have a sample, and based on the properties of the sample you have to test this hypothesis. The alternative hypothesis is that the PDF of X is other than what is specified. This P_X(x) needs to be completely specified: the population can be Gaussian, but with specific parameters, so your null hypothesis should be, for example, that X has a normal probability distribution function with mean equal to, say, 1 and standard deviation equal to 0.5; if the sample is drawn from a population whose mean is 20 and standard deviation is 30, then you cannot accept the null hypothesis. It is not the Gaussian nature alone that we are testing; we are specifically testing for a given distribution with all the parameters at given specific values.
We choose alpha, again 0.01, 0.05 or 0.1. The test statistic here is based on the empirical probability distribution function: we rank order all the observations and assign to the i-th ordered observation x_(i) the empirical distribution value i/n. Then we define a statistic known as D2, which is the maximum of the absolute difference between the observed empirical probability distribution function and the corresponding theoretical probability distribution function, evaluated at the points x_(i).
Based on the sample, we obtain an estimate of the test statistic. Again, we have to define the region of rejection of the null hypothesis: accept H0 if D2 is less than or equal to a critical value c. To be able to do that we need the sampling distribution of the test statistic; in this test, known as the Kolmogorov-Smirnov test, the sampling distribution is tabulated, and we can refer to the tables and conduct the test.
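scipy provides this test directly; a sketch (the null distribution, N(0, 1), must be fully specified through args, echoing the point made earlier):

```python
# Sketch: Kolmogorov-Smirnov test of the sample against a fully specified
# N(0, 1) distribution.
import numpy as np
from scipy.stats import kstest

x = np.random.default_rng(8).normal(0.0, 1.0, size=100)  # illustrative data
result = kstest(x, "norm", args=(0.0, 1.0))  # D2 statistic and p value
alpha = 0.05
print(f"D2 = {result.statistic:.4f}, p = {result.pvalue:.4f}, "
      f"accept H0: {result.pvalue > alpha}")
```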
So, I will give some examples. The null hypothesis H0 is that X has the specified distribution N(0, 1), that is, Gaussian with zero mean and unit standard deviation; the alternative hypothesis is that the PDF of X is other than what is specified. We select a 5 percent significance level, go through this exercise, and I have done the plotting here.
So, you see here, the blue line is the theoretical population probability distribution function, which is normal (0, 1); the red one is the probability distribution function constructed from the data; and the green line, which is to be read in conjunction with the Y axis on the right hand side, is the difference between the two.
Now, if you take the absolute value of that difference and re-plot it, the maximum difference is the observed estimate of the statistic I am looking for, and for this given sample size I can find the critical value. The observed statistic is less than the critical value; therefore, we can accept the null hypothesis.
This is another case, with the same calculations. But here the critical value is here and the observed estimate is here, and we need to reject the null hypothesis that the sample is drawn from a population whose mean is 0 and standard deviation is 1. What I basically did was take the same random numbers used in the earlier study, add some mean artificially, and enhance the standard deviation artificially by some factor, so the numbers are still Gaussian but do not have the mean and standard deviation proposed in the null hypothesis. So, the null hypothesis has to be rejected.
There is another test, known as the chi-square test, which is again helpful in verifying the nature of the probability density function. Here the statistic is defined in terms of deviations of the histogram, not of the distribution function. We make k bins and find how many points of the sample lie in each bin, and, according to the theory, how many points we expect in each of these bins, which we can compute knowing the sample size; we then form the statistic D1 = summation i = 1 to k of (N_i - n p_i)^2 / (n p_i), and we can show that D1 has a chi-square distribution with k minus 1 degrees of freedom, where k is the number of intervals in making the histogram, the N_i are the observed frequencies, and the n p_i are the frequencies calculated from the assumed theoretical model for the PDF. The chi-square distributions are again well tabulated, and for a given level alpha you can always find the critical value; therefore, you can develop the procedure to accept or reject the null hypothesis.
Now, what I have done till now is quickly review the main results of the mathematical theory of statistics; we now need to move on to the problem of digital simulation of samples of random variables. Our basic aim is to be able to simulate samples of random quantities, which can be random variables, random processes evolving in time, or random processes evolving in space. A wind load on a chimney, for example, evolves both in space and time; and if you take ocean waves, the spatial domain is multidimensional and there is also time, and at a given point in the ocean you may look at the displacement of the wave or some other field variable like pressure; so, you can have a vector random field evolving in multiple parameters.
We have now learnt how to completely characterize and specify such stochastic quantities; the question we are now asking is how to numerically simulate samples, or realizations, of those random quantities. Suppose X is lognormal: how do I generate 100 samples from the given probability distribution function with the parameters specified? How can I generate a hundred numbers such that, if I were to empirically estimate their probability distribution function, I should be able to accept the hypothesis that these 100 numbers are drawn from a population of lognormal random variables with the prescribed mean and standard deviation? How do we achieve that? So, the question is: let X be a random variable with a prescribed probability distribution function; how do we generate samples of X on a computer so that the estimated model for the probability distribution of X from the data matches the target PDF?
This is the question we will consider now, and the starting point for that is what are known as pseudo random number generators; that is, on a computer, how can we simulate random numbers?
I will begin the discussion on this in the next lecture, and it will be the starting point for constructing samples of random variables, random processes evolving in time, random processes evolving in space and time, vector random processes, Gaussian random processes and non-Gaussian random processes; the mathematics of simulation for each of these differs, and we will see some of the details in the next lecture. So, we conclude this lecture at this stage.