In the previous lecture, we started discussing an application of Monte Carlo simulation methods for analyzing the response of randomly driven systems. The framework of our discussion is as shown here: we have a system, typically governed by a set of stochastic differential equations, which is driven by an input f(t) that is a random process, and we would like to simulate samples of f(t) which are compatible with a prescribed probabilistic model. Suppose f(t) is a Gaussian random process with a given probability density function, say zero mean and stationary; I should be able to generate an ensemble of time histories which are compatible with the given PSD and probability density function.
Now, for each of these samples, we will integrate the governing equations of motion and obtain an ensemble of response quantities of interest; this ensemble we will process statistically and arrive at a probabilistic model for the response. Given that we are approaching the problem through numerical simulation, the scope of this method is very vast: you can apply it to any problem for which a sample calculation can be performed. One of the things we have to appreciate at the outset is that when we simulate samples of f(t) compatible with the target probabilistic model for f(t), we need to ascertain that we have succeeded in doing so, and the tool needed to address that problem is the methods of mathematical statistics.
Similarly, after we produce the ensemble of response time histories, we again need statistical methods to process them and arrive at a model for the response process. So, we have initiated a discussion on statistical methods, and we will continue that in this lecture. We are discussing the problem of estimation of parameters; one of the methods is the method of moments, where we basically find quantities like the expected value of X to the power k.
So, that would mean we can find the mean, variance, skewness, kurtosis, etcetera. The alternative method, the method of maximum likelihood, directly estimates the parameters of an assumed probability density function. Here the parameters themselves need not be moments; they will be related to these moments, but they may not be directly those quantities. Similarly, quantities like the mode, median, range, etcetera, cannot be estimated using the method of moments.
So, the maximum likelihood estimation method helps us to address some of these issues.
So, we are discussing estimation of the mean. We started by assuming that X is a random variable with probability density function p_X(x), mean mu and standard deviation sigma; we formed a sequence of iid, that is, independent and identically distributed, random variables with the common probability distribution function that agrees with the distribution of the random variable we are talking about; that is, X_i is independent of X_j for all i not equal to j, with i, j from 1 to n, and each of these X_i's has mean mu, variance sigma squared and probability density function p_X(x). In the previous lecture, I showed that theta given by (1/n) summation i = 1 to n of X_i is an unbiased estimator of mu for all n, with minimum variance, and that lowest variance is sigma squared by n. That means this is an unbiased estimator irrespective of the size of the sample, and as n becomes large the variance sigma squared by n reduces; on an average, the statistic provides an exact solution to the problem of estimating the mean of the population.
Now, it is clear that since the X_i's are random variables, and theta is a transformation of random variables, theta is also a random variable; the probability distribution of theta is known as the sampling distribution for the mean, and its standard deviation is known as the standard error. So, we are now interested in postulating a model for this sampling distribution of theta; theta is an unbiased estimator of mu with variance sigma squared by n.
Now, let us begin by considering the case in which the variance is known. If X is Gaussian, it would mean that all the X_i's are also Gaussian, because they are an iid sequence with a common pdf which is Gaussian, and since we are adding these Gaussian random variables, theta would also be Gaussian; so, in this case the sampling distribution for theta would be Gaussian with mean mu and variance sigma squared by n.
However, if X is not Gaussian, by virtue of the central limit theorem and for large n, we may still consider theta to be Gaussian; this is an approximation which we generally make. Therefore, we assume that theta is normal with mean mu and standard deviation sigma by square root of n; or, if we form a standard normal variable by removing the mean and dividing by the standard deviation, then (theta minus mu) divided by (sigma by square root of n) is normally distributed with zero mean and unit standard deviation. So, this is the sampling distribution for the mean.
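To make this concrete, here is a minimal sketch, not from the lecture, that checks this claim by repeated sampling; the choice of an exponential population (mean and standard deviation both 1) and the sizes n and n_trials are illustrative assumptions.

```python
# Sketch: empirical check that Z = (theta - mu) / (sigma / sqrt(n)) is
# approximately N(0, 1) even when the population is non-Gaussian.
# The exponential population and the sizes below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0            # exponential with rate 1: mean 1, std 1
n, n_trials = 50, 20000         # sample size, number of repeated samples

samples = rng.exponential(scale=1.0, size=(n_trials, n))
theta = samples.mean(axis=1)    # one realization of the estimator per row

z = (theta - mu) / (sigma / np.sqrt(n))
# For N(0, 1) we expect mean close to 0 and standard deviation close to 1
print(f"mean of Z = {z.mean():+.4f}, std of Z = {z.std(ddof=1):.4f}")
```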
Now, based on the idea of the sampling distribution, we can construct what is known as a confidence interval estimate. In the discussion we have had till now, if you use this for a given realization of the X_i's, you get one realization of theta; this is the point estimate. So, this is one way of answering the question, but there is another way, known as confidence interval estimation; let us discuss what it is. We have just now shown that the random variable (theta minus mu) divided by (sigma by square root of n) is normal with zero mean and unit standard deviation.
Now, let us consider a probability level 1 minus alpha associated with this random variable; for example, 1 minus alpha could be 0.95, which means alpha is 0.05. If you draw the probability density function of this random variable as shown here, we define two points such that the area between them is equal to 1 minus alpha. That is, I define two points K_(alpha/2) and K_(1 minus alpha/2) as follows: the probability that the random variable (theta minus mu)/(sigma/square root of n) lies in the interval (K_(alpha/2), K_(1 minus alpha/2)) is 1 minus alpha.
Now, K_(alpha/2) is actually minus the inverse standard normal probability distribution function evaluated at 1 minus alpha/2, that is, Phi inverse of alpha/2, and K_(1 minus alpha/2) is Phi inverse of 1 minus alpha/2. Alpha is given, so we can determine K_(alpha/2) and K_(1 minus alpha/2).
So, we now have the statement: the probability that (theta minus mu)/(sigma/square root of n) lies between K_(alpha/2) and K_(1 minus alpha/2) is 1 minus alpha. Now, I can rearrange these terms and write this as: the probability that the interval (theta + K_(alpha/2) sigma/square root of n, theta + K_(1 minus alpha/2) sigma/square root of n) contains the mean is 1 minus alpha. Here theta is the observed value of the estimator from the sample. So what we say is that this interval is the confidence interval on the population parameter mu with confidence 1 minus alpha. Suppose 1 minus alpha is 0.95; then, with 95 percent confidence, I can say that the true mean is contained in the interval (theta + K_(0.025) sigma/square root of n, theta + K_(0.975) sigma/square root of n).
So, instead of giving the estimate of the population parameter as a single number, I am now providing an interval; that means this interval encloses the true population mean with a given level of confidence. This is a much more useful way of providing the answer.
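As a sketch of how this computation looks in practice (assuming Python with numpy and scipy; the function name and the synthetic data are my own illustration, not the lecturer's):

```python
# Sketch: confidence interval for the population mean when sigma is known,
# using the Gaussian sampling distribution of the sample mean.
import numpy as np
from scipy.stats import norm

def mean_ci_known_sigma(x, sigma, alpha=0.05):
    """Return the (1 - alpha) confidence interval for the mean."""
    n = len(x)
    theta = np.mean(x)                 # point estimate of the mean
    k_lo = norm.ppf(alpha / 2)         # K_{alpha/2}, -1.96 for alpha = 0.05
    k_hi = norm.ppf(1 - alpha / 2)     # K_{1-alpha/2}, +1.96 for alpha = 0.05
    return theta + k_lo * sigma / np.sqrt(n), theta + k_hi * sigma / np.sqrt(n)

x = np.random.default_rng(1).normal(0.0, 1.0, size=10)  # illustrative data
print(mean_ci_known_sigma(x, sigma=1.0, alpha=0.05))
```

Since the half width is K_(1 minus alpha/2) sigma/square root of n, increasing n shrinks the interval, which is exactly the behavior the examples below demonstrate.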
So, we can make some remarks now. The statement that mu lies between these two numbers should be interpreted as: the probability that the random interval (theta + K_(alpha/2) sigma/square root of n, theta + K_(1 minus alpha/2) sigma/square root of n) contains the population mean mu is 1 minus alpha. Mu is not a random variable; mu is a deterministic constant, so this should not be construed as a probability statement made on mu. The random variables are the interval endpoints: each endpoint is a random variable, so together they constitute a random interval, and the probability that this random interval encloses the population mean is 1 minus alpha. So, theta = (1/n) summation i = 1 to n X_i is a point estimate, and this interval is the confidence interval estimate for the population mean.
Some quick examples: suppose I take the ten numbers displayed here and find the point estimate; you can verify that it is 0.013. Now, if alpha is 0.05, I can find K_(alpha/2) to be minus 1.96 and K_(1 minus alpha/2) to be 1.96; so, with 95 percent confidence I can say that the population mean lies between minus 0.6068 and 0.6328, whereas the point estimate of the population mean from the ten samples is 0.013. That is all I can say with the point estimate, whereas here I am able to say, with 95 percent confidence, that the interval (minus 0.6068, 0.6328) contains the mean.
You can see that there were only ten samples and this interval is fairly wide. Now, if I use 1000 samples (I have plotted these 1000 numbers here), I get the point estimate as minus 0.0446, and again with 95 percent confidence I can say that the interval (minus 0.1065, 0.0174) contains the population mean.
Now, with 10,000 samples, the answer is close to 0 and the confidence interval keeps shrinking; we are getting narrower and narrower confidence intervals because we are using a larger number of samples. That is the conclusion we can draw from this exercise.
I talked about the sampling distribution for the mean; in principle, we should be able to construct the sampling distribution for any statistic we are interested in. For example, for the variance we use the estimator s^2 = 1/(n - 1) summation i = 1 to n (X_i - X bar)^2, where X bar = (1/n) summation j = 1 to n X_j is the point estimate of the mean. Now, you find the expected value of s^2; for X bar I add and subtract mu, rewrite this in a slightly different form and expand to get three terms. If I manipulate these terms and use the facts that the X_i's are all identically distributed with mean mu and variance sigma squared, and that X bar is an unbiased estimator with known variance sigma squared by n, I can show that this estimator is unbiased. Please note that I am dividing by n - 1, not by n; that is to be expected, because X bar is a number that has itself been computed from the sample, which uses up one degree of freedom. If I use 1/n here instead of 1/(n - 1), it will be a biased estimator for the variance; the n - 1 ensures that it is unbiased.
We can show that the variance of this estimator is given by this; I leave it as an exercise. You have to understand carefully what is being said here: we are talking about the variance of a random variable X; the estimator for that is itself a random variable, whose mean agrees with the population variance, but, being a random variable, it has its own variance, which is given here. You can see that as n tends to infinity this variance comes down; therefore, this estimator is consistent.
So, s^2 = 1/(n - 1) summation i = 1 to n (X_i - X bar)^2 is an unbiased and consistent estimator for sigma squared. Now, the objective of our discussion is to arrive at the sampling distribution for the variance. So, I rearrange the terms and write (n - 1) s^2 / sigma^2 in this form; if the population is Gaussian, X_i and X bar are Gaussian, and the right hand side will then be a sum of squares of Gaussian random variables. Such sums have a distribution known as the chi-square distribution: if you add n squares of standard Gaussian random variables, the resulting random variable can be shown to have a chi-square distribution, and the form of the distribution is displayed here.
So, this is the probability density function of the chi-square distribution; here (n - 1) s^2 / sigma^2 is a chi-square random variable with n minus one degrees of freedom. This is a well-studied probability density function, its properties are tabulated, and we can use that information to analyze the properties of the estimator for the variance.
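A quick simulation check of this claim (a sketch under assumed population parameters; the comparison uses the standard chi-square moments, mean k and variance 2k for k degrees of freedom):

```python
# Sketch: verify by simulation that (n - 1) s^2 / sigma^2 behaves like a
# chi-square variable with n - 1 degrees of freedom for Gaussian data.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, n_trials = 0.0, 2.0, 12, 50000   # illustrative assumptions

x = rng.normal(mu, sigma, size=(n_trials, n))
s2 = x.var(axis=1, ddof=1)          # unbiased estimator, divides by n - 1
q = (n - 1) * s2 / sigma**2         # should be chi-square with n - 1 dof

k = n - 1                           # chi-square(k): mean k, variance 2k
print(f"mean of q = {q.mean():.3f} (theory {k}); "
      f"variance of q = {q.var():.3f} (theory {2 * k})")
```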
Now, in estimating the mean we assumed the standard deviation of the population is known, but that may not always be the case. If, while estimating the mean, you are going to estimate the standard deviation of the population from the same sample, then the sampling distribution for the mean will not be Gaussian; it has a different form, known as Student's t distribution. What that is: if you take two independent random variables X and Y, where X is normally distributed with zero mean and unit standard deviation, and Y is chi-square distributed with n degrees of freedom, and we form the ratio T = X divided by the square root of (Y/n), this random variable T has the probability density function shown here, known as Student's t probability density function. Student was the pen name under which the scientist W. S. Gosset wrote his papers, and the distribution goes by that name. Here, this is the gamma function and n appears on the right hand side; t is the state variable and takes values from minus infinity to plus infinity.
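For reference, the density being pointed to on the slide is, in standard notation, the Student's t density with n degrees of freedom:

```latex
f_T(t) \;=\; \frac{\Gamma\!\left(\frac{n+1}{2}\right)}
                  {\sqrt{n\pi}\;\Gamma\!\left(\frac{n}{2}\right)}
        \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}},
\qquad -\infty < t < \infty .
```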
Equipped with this description, we can now talk about the sampling distribution for the estimator of the mean when the variance is not known; that means the variance needed to construct the sampling distribution for the mean will be estimated from the same sample. To get the point estimate of the mean you do not need the variance, but to write the sampling distribution, and hence the confidence interval, you need it.
Now, theta given by (1/n) summation i = 1 to n X_i is an unbiased estimator of mu with variance sigma squared by n, and (n - 1) s^2 / sigma^2, where s^2 is the sample variance, has a chi-square distribution. So, I now form the ratio (theta minus mu)/(s/square root of n); the denominator involves a chi-square variable and the numerator, suitably standardized, is Gaussian; therefore, this ratio has Student's t distribution with n minus one degrees of freedom. Thus, the sampling distribution for the estimator of the mean, when the variance of the population is not known, is given by this.
Now, if you want to construct the confidence interval from this, you have to use this density function. If you recall, in the derivation of the confidence interval earlier, the curve we used was Gaussian; now this Gaussian density has to be replaced by the Student's t density while computing the confidence interval. That means, if this is the sampling probability density function, we make the statement that the random variable (theta minus mu)/(s/square root of n) lies in the interval (t_(alpha/2, n-1), t_(1 minus alpha/2, n-1)) with probability 1 minus alpha, where 1 minus alpha is the confidence level; from this we can construct the confidence interval with a given level of confidence.
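A sketch of the corresponding computation (scipy's t distribution supplies the percentile points t_(1 minus alpha/2, n-1); the data here are synthetic placeholders):

```python
# Sketch: confidence interval for the mean when sigma is estimated from the
# same sample, using Student's t with n - 1 degrees of freedom.
import numpy as np
from scipy.stats import t

def mean_ci_unknown_sigma(x, alpha=0.05):
    n = len(x)
    theta = np.mean(x)
    s = np.std(x, ddof=1)                  # sample standard deviation
    half = t.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)
    return theta - half, theta + half

x = np.random.default_rng(3).normal(0.0, 1.0, size=100)  # illustrative data
print(mean_ci_unknown_sigma(x, alpha=0.05))
```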
So, as I said, instead of using the Gaussian density function you need to use the Student's t distribution. Now, an example: I have selected 100 samples of random numbers; the point estimate of the mean is minus 0.9677 and sigma hat is 0.9777 for this sample. In this table I have shown the confidence intervals for the mean and the standard deviation at different levels of confidence; so, with 99 percent confidence the interval is this, and with 10 percent confidence the interval is this; these columns are for the standard deviation and these are for the mean.
So, you can try to simulate this and see what information it conveys. The data used in this example is provided here, so you could actually replicate this table; I leave that as an exercise for you.
In this graph, what I have shown is this: I use 5000 numbers and find the confidence intervals for different levels of alpha. See, with 100 percent confidence I can only say that the confidence interval is from minus infinity to plus infinity; as the confidence level increases, the width of the confidence band widens.
At 95 percent, the confidence interval is this: this is the upper limit of the confidence interval and this is the lower limit. This line is actually the population mean; these 5000 numbers were generated synthetically with zero mean and unit standard deviation (I will explain how that can be done in due course, but right now you can believe that these 5000 numbers are drawn from a population whose mean is zero and standard deviation is 1). So, we are getting an answer of something less than 0.1 as the point estimate; this red line is the point estimate and this blue line is the population mean, and the red line is an approximation to the blue line. That is one answer; the other answer is, you take any value of confidence, say 95 percent, and I can say that the range between these two numbers on the Y axis contains the population mean, and I am able to make that statement with 95 percent confidence.
Now, this was the result for the mean, and here is a similar result for the standard deviation; the population standard deviation, as I said, is unity, and I am getting an answer between 1.01 and 1.02 as the point estimate. The lower confidence band value at 95 percent confidence level is here and the upper one is here; so, with 95 percent confidence I can say that the population standard deviation is contained in this interval.
So, we can summarize some of the factors that influence the confidence interval: the statistic we are using as the estimator, the actual observations made, the confidence level prescribed, the sampling distribution for the statistic, and the sample size.
Now, the confidence interval is a function of the sample size, so we can ask the question: if I fix the width of the confidence interval, can I determine the number of samples needed? That problem can be addressed as shown here. Consider the estimator for the population mean with known variance, so that the sampling distribution is Gaussian with mean mu and standard deviation sigma by square root of n, where n is the sample size; equivalently, the standardized variable (theta minus mu)/(sigma/square root of n) is a zero mean, unit standard deviation normal random variable, and this is the statement that helps us define the confidence interval.
Now I define w = K_(1 minus alpha/2) sigma/square root of n as the half width of the confidence interval to be specified; that means, as a user, I say I want an estimate with this width, and ask how many samples I should use. The minimum number of samples required can be computed from this: solving for n gives n = (K_(1 minus alpha/2) sigma / w)^2, where w is the width you specified.
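A sketch of this calculation (the function and numbers are my own illustration; with sigma = 1, alpha = 0.05 and half width 0.02 it reproduces the order of 10^4 samples quoted later):

```python
# Sketch: minimum number of samples for a specified half width w of the
# confidence interval, from solving w = K_{1-alpha/2} * sigma / sqrt(n).
import math
from scipy.stats import norm

def min_samples(sigma, w, alpha=0.05):
    k = norm.ppf(1 - alpha / 2)        # K_{1-alpha/2}, 1.96 for alpha = 0.05
    return math.ceil((k * sigma / w) ** 2)

print(min_samples(sigma=1.0, w=0.02))  # about 9604 samples
print(min_samples(sigma=1.0, w=0.07))  # about 784 samples
```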
Now, what we have done in this graph is this: the Y axis has to be read on the left and on the right; on the left is the estimate of the mean, and on the right is the minimum number of samples we have to use, while on the X axis I have w, the half width of the confidence interval. That would mean, if the width of the confidence interval is 0.2, I go along this curve and read off how many samples I need to use (this is plotted on a logarithmic scale); from this you find the number of samples needed to achieve this width.
Now, if you actually perform these simulations, the blue line here is the point estimate of the mean, red is the lower limit of the confidence interval and green is the upper limit; and if you measure the width here, it meets the requirement we specified; that means, with this number of samples, the width will be as specified. So, this helps you to select the sample size: the narrower the confidence limits, the better the answer, so you need more samples if you want narrower confidence intervals.
So, this is simply a plot of the minimum samples needed against the half confidence width. If you want a narrower confidence width, you need a larger number of samples; for example, with 10 to the power of 4 samples your width will be 0.02, but if you are willing to use only 100 samples your width will be somewhere around 0.07. The former is a much better solution, but you have to pay in terms of a larger number of samples.
Now, I move on to the next topic: what is known as hypothesis testing. Hypothesis testing is a method for making decisions on properties of a population based on observed samples. Typically, we postulate two competing hypotheses: one is what is known as the null hypothesis, denoted H0; the other is known as the alternative hypothesis, denoted Ha.
Now, we can imagine a situation where there is mass production of some product, say a steel rod or a yarn, and you are looking at, say, the weight of the steel rod per one meter, and you want that to be some prescribed number; suppose you want the population mean to be 5 in some unit of some physical quantity.
Now, as the production proceeds, you draw 10 samples and find that the sample mean is 5.0038, which is different from 5.0. Why does the difference occur? There could be two reasons. One is that there are inevitable random fluctuations which are beyond our control, so nothing can be done about that; also, I am finding this sample mean with only 10 samples, so there are fluctuations due to the limited sample size as well as the inherent randomness. The other possibility is that the production could be defective; something could be going wrong, and that is what is producing this difference.
So, we are interested in knowing whether these 10 samples have actually been drawn from a population whose mean is 5.0 or not. We make the null hypothesis that mu is 5.0; that means the sample is drawn from a population whose mean is 5.0 (the word null in null hypothesis means the hypothesis of no difference). The alternative hypothesis is the negation of this: these 10 samples are not drawn from a population whose mean is 5; the mean of the population from which they are drawn is not 5.
Now, we test this hypothesis. The question we are trying to answer is: is the observed difference between the estimate and the population mean due to sampling fluctuations, that is, random causes, or due to systematic, non-random causes? In other words, is the observed variation, theta minus mu, arising due to some assignable cause or due to non-assignable causes? If it is due to an assignable cause, you need to take an action; maybe you have to stop the production and examine what is going on. If it is random fluctuation, you can continue with the production process. So, is the observed variation significant? The word significant here means that the variation is due to assignable causes; something is indeed going wrong, something significantly wrong that I have to correct. If the difference is due to random causes, no action is needed; otherwise, action is necessary. The decision we need to make is to accept or reject the null hypothesis.
Now, the errors in making these decisions. We may reject the hypothesis when it should have been accepted; this is known as a Type 1 error, an error of commission: the difference you observe is actually due to random fluctuations, but you think it is due to systematic causes and you stop production, so you are making an error. Or we may accept the hypothesis when it should have been rejected; this is a Type 2 error, an error of omission: you should actually stop the production, but you think the difference you are seeing is due to random fluctuations, so you permit production to go ahead.
In short, a Type 1 error is action when no action was needed, and a Type 2 error is inaction when action was needed.
Now, which error is more dangerous? Actually, Type 2 errors are more dangerous, because inaction when action was needed is the greater risk; if you take an unneeded action, you will come to know that the action was not needed, so it is the lesser evil.
So, for a Type 1 error, the error of commission, based on the action taken one would come to know that a wrong decision was taken, though of course you have to pay the price of stopping the production. The price you would pay for a Type 2 error is that a faulty component
would get into the final product. Now, can these errors be eliminated? There is no way to eliminate them as long as decisions are based on samples; sampling errors cannot be avoided. The only way to avoid them is to measure everything that is produced: every meter of yarn or steel rod you produce, you have to weigh and make sure it meets the criterion; then, of course, there would be no error, but you cannot be doing that. Therefore, there is no way we can eliminate the errors.
So, what can we do? The null hypothesis is H0: mu is 5, meaning the sample is drawn from a population whose mean is 5.0; the alternative hypothesis is mu not equal to 5.0, meaning the sample is drawn from a population whose mean is not 5.0. The possible situations are: H0 is true, or H0 is not true. If you accept H0 when H0 is true, it is a correct decision; if you accept H0 when H0 is not true, you are making a Type 2 error; if you reject H0 when H0 is true, you are making a Type 1 error; if you reject H0 when H0 is not true, it is a correct decision. So, there are two wrong decisions and two correct decisions. We call the probability of committing a Type 1 error alpha, and the probability of committing a Type 2 error beta. Therefore, the probability of accepting H0 when H0 is true is 1 minus alpha, and the probability of accepting the alternative hypothesis when the alternative hypothesis is true is 1 minus beta.
Now, what we do is fix alpha, the probability of committing a Type 1 error. It is difficult to fix the probability of committing a Type 2 error, that is, to assign a value for beta, since it is difficult to assess the consequences of an action which has not been taken. You can build experience in fixing alpha, but not beta. Ideally, you would try to minimize both, but if you minimize alpha, beta will increase, and if you minimize beta, alpha will increase; I am not going to show that, but that is the result.
So, equipped with this, we can now go through the steps of a hypothesis test. Step one: we formulate the null and alternative hypotheses; here, mu is equal to mu_0 and mu is not equal to mu_0. Step two: we choose alpha, which is called the level of significance. Strictly, the choice is arbitrary; we have to make it, and as a convention it is taken as 0.01, 0.05 or 0.1. And 1 minus alpha is known as the confidence level; so, if you select 0.05 as the significance level, you have 95 percent confidence in what you are saying.
Next, you have to identify the test statistic and its distribution. To test this hypothesis you need to define a statistic; since we are now talking about the mean, the test statistic would be related to the estimate of the mean, that is, theta = (1/n) summation i = 1 to n X_i. The sampling distribution for this is needed, and we know that Z = (theta minus mu)/(sigma/square root of n) is normal with zero mean and unit standard deviation; so we take Z as the test statistic. Now, we have a sample, therefore we can find a realization of Z. Finally, we define the region of rejection of the null hypothesis; how do we do that?
This is the probability density function of Z, and this area is 1 minus alpha. If the observed statistic, that is, the observed value of Z, falls in these tail regions, we reject the null hypothesis; otherwise we accept it. This is how we proceed.
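Putting the steps together, here is a sketch of the two-sided Z test with known sigma (the helper name and the synthetic data are my own illustration):

```python
# Sketch: two-sided Z test for the mean with known population sigma,
# following the steps above: choose alpha, compute Z, check the region.
import numpy as np
from scipy.stats import norm

def z_test_mean(x, mu0, sigma, alpha=0.05):
    n = len(x)
    z = (np.mean(x) - mu0) / (sigma / np.sqrt(n))  # observed statistic
    k = norm.ppf(1 - alpha / 2)                    # acceptance region (-k, k)
    return z, abs(z) <= k

x = np.random.default_rng(4).normal(0.0, 1.0, size=15)  # illustrative data
z, accept = z_test_mean(x, mu0=0.0, sigma=1.0, alpha=0.05)
print(f"Z = {z:.4f}, accept H0: {accept}")
```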
So, again, let us take a set of 15 numbers drawn from a population whose standard deviation is 1; the sample size is 15. I make the hypothesis that this sample is drawn from a population whose mean is zero; that is my null hypothesis. The alternative hypothesis is mu not equal to zero. I select alpha to be 0.05. The estimator is (1/n) summation i = 1 to n X_i, and the statistic is (theta minus mu)/(sigma/square root of n), which is normal with zero mean and unit standard deviation. Based on these numbers, I get the point estimate of the mean to be 0.1943, and substituting that and using n equal to 15, I get the realization of Z to be 0.7524. Now, look at the critical points: the first one is Phi inverse of 0.025, which is minus 1.96, and the other is Phi inverse of 1 minus 0.025, which is 1.96.
This observed value of Z is indeed contained in the interval from minus 1.96 to 1.96; so Z lies in the region of acceptance, and we accept the null hypothesis at 5 percent significance level.
Now, another example: I again take 15 numbers drawn from a population whose standard deviation is taken to be known and equal to one, and I go through this exercise again. The null hypothesis is mu equal to zero, the alternative hypothesis mu not equal to zero, and I select alpha to be 0.05; this time I get the Z value to be 4.0033, and 4.0033 is not contained in my acceptance region. So, for this set of numbers, I have to reject the null hypothesis at 5 percent significance level. The sample mean is 1.03, whereas the hypothesis we are testing is that it is zero. What I have actually done here is generate 15 random numbers from an exponentially distributed random variable (I will come to that shortly) which has mean 1; so, obviously, if I look at this data, it is clear that the mean is not zero, but is the observed difference due to sampling fluctuations or due to a systematic cause? That is what we are trying to discover. Now, for this same data, I make the null hypothesis that it is drawn from a population whose mean is 1 (the earlier hypothesis was that it is drawn from a population whose mean is zero). Since I know that I have generated the numbers from an exponentially distributed random variable whose mean is one, I can now test whether the simulation of these random numbers is correct or not.
So, the null hypothesis is mu equal to 1, the alternative hypothesis mu not equal to 1, and again the significance level is 0.05; I get the Z statistic to be 0.130, which lies in the acceptance region, and I can accept the null hypothesis that this sample is drawn from a population whose mean is 1, at 5 percent significance level.
Now, if I take 5000 samples, this is again the same exercise: the null hypothesis is that the sample is drawn from a population whose mean is zero; I get the sample statistic to be 0.8732, which leads to the conclusion that we can accept the null hypothesis at 5 percent significance level.
Similarly, for 5000 numbers drawn from an exponential random variable, the null hypothesis is that the sample is drawn from a population with mean one, and here again we are able to show that Z is 0.7040, and it passes the acceptance criterion. What is to be noted here is that the population is not Gaussian: I know that I am drawing from an exponential population, but still, by virtue of the central limit theorem, we are assuming that the sampling distribution even for that population is Gaussian; so, this result illustrates how the central limit theorem leads to what appears to be a correct answer.
Now, if the population standard deviation is not known, how do you do the hypothesis test? I make the null hypothesis that mu is mu_0 and the alternative hypothesis that mu is not equal to mu_0, and again choose the significance level and level of confidence. Now, we have to identify the test statistic and its distribution. Earlier, I assumed that the variance was known, and the sampling distribution was therefore Gaussian; now the variance is also estimated from the same sample, so the test statistic will be related to Student's t distribution. I define T = (theta minus mu)/(s/square root of n), where s is the sample standard deviation. Based on the sample, we obtain the observed value of T from the observed values of theta and s, and we define the region of rejection of the null hypothesis, again based on the t distribution. This can be done; it is tedious, but it can be done.
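For the record, scipy ships this one-sample t test; a sketch (the synthetic data are placeholders, and the p-value comparison is an equivalent way of phrasing the acceptance-region check):

```python
# Sketch: one-sample t test, T = (theta - mu0) / (s / sqrt(n)) with n - 1
# degrees of freedom, via scipy.
import numpy as np
from scipy.stats import ttest_1samp

x = np.random.default_rng(5).normal(0.0, 1.0, size=15)  # illustrative data
result = ttest_1samp(x, popmean=0.0)
alpha = 0.05
print(f"T = {result.statistic:.4f}, p = {result.pvalue:.4f}, "
      f"accept H0: {result.pvalue > alpha}")
```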
We can similarly argue out the logic, or the steps, in constructing hypothesis test procedures for other statistics. For example, I make the null hypothesis that the sample is drawn from a population whose variance is 100; the alternative hypothesis is that the variance is greater than 100. How do we test it?
It is the same story: we select the significance level and confidence level, and then identify the test statistic; the test statistic depends on what you are testing as the hypothesis. It is now the variance, so I take the estimator of the variance to be s^2 = 1/(n - 1) summation i = 1 to n (X_i - theta)^2, where theta is the unbiased estimator for the mean; this is an unbiased estimator for the variance. The test statistic we select is related to the sampling distribution of the variance, and I have shown that (n - 1) s^2 / sigma^2 is a chi-square random variable with n minus one degrees of freedom. Based on this, I calculate the sample realization of the test statistic, identify the regions of acceptance and rejection, and can then decide whether to accept or reject the hypothesis.
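A sketch of this variance test (with the one-sided alternative sigma^2 greater than the hypothesized value, matching the example; names and data are my own illustration):

```python
# Sketch: test H0: sigma^2 = sigma0^2 against sigma^2 > sigma0^2 using
# (n - 1) s^2 / sigma0^2, chi-square with n - 1 dof under H0.
import numpy as np
from scipy.stats import chi2

def variance_test(x, sigma0_sq, alpha=0.05):
    n = len(x)
    q = (n - 1) * np.var(x, ddof=1) / sigma0_sq   # observed statistic
    crit = chi2.ppf(1 - alpha, df=n - 1)          # reject H0 if q > crit
    return q, q <= crit

x = np.random.default_rng(6).normal(0.0, 10.0, size=30)  # sigma^2 = 100
q, accept = variance_test(x, sigma0_sq=100.0, alpha=0.05)
print(f"statistic = {q:.2f}, accept H0: {accept}")
```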
Now, till now I have been talking about moments; how about probability distributions? If I want to see whether a sample is drawn from a population whose probability distribution is Gaussian or not, with given mean and variance, how do we test that? We need to think of modeling probability distributions, and there are certain tools available for that; one of them is the probability paper. So, let X be a random variable with probability distribution function P_X(x), and let the lower case x_i be samples of X. The probability paper is a special plotting device in which the y axis is scaled in such a way that the probability distribution function appears as a straight line; we distort the y axis so that the probability distribution function, or its complement, becomes a straight line. For example, if I take an exponential random variable, P_X(x) = 1 minus exponential of (minus lambda x), where x takes values from zero to infinity.
Now, 1 minus P_X(x), which I call G_X(x), the complementary probability distribution function, is exponential of (minus lambda x). Taking the logarithm, log G_X(x) = minus lambda x; so, if I now distort the Y axis and, instead of plotting P_X(x), plot log G_X(x), I get a straight line for the probability distribution function. On this paper, if I now plot the observations I am making, those points will lie along this line if the numbers are drawn from an exponential random variable.
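A sketch of a home-made exponential probability paper (the plotting choices, including the i/(n + 1) plotting positions that avoid taking log of zero, are my own illustration):

```python
# Sketch: exponential probability paper. Plotting -log(1 - F_hat(x)) against
# the ordered data should give a straight line of slope lambda if the data
# are exponentially distributed.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
lam = 2.0
x = np.sort(rng.exponential(scale=1.0 / lam, size=200))

n = len(x)
f_hat = np.arange(1, n + 1) / (n + 1)   # empirical CDF at ordered points

plt.plot(x, -np.log(1.0 - f_hat), "b.", label="data")
plt.plot(x, lam * x, "r-", label="theory: slope lambda")
plt.xlabel("x")
plt.ylabel("-log G_X(x)")
plt.legend()
plt.show()
```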
To illustrate this, let us consider, I think, fifty numbers from a population whose mean is zero and standard deviation is one, and this is the probability paper for the normal probability distribution function. For every probability distribution there will be one such paper; the normal probability paper is meant for Gaussian random variables. This red straight line is actually the theoretical probability distribution function; with the y axis distorted, the probability distribution function appears as a straight line. If you plot it on an arithmetic Y axis, as we have been doing, you get the familiar curve, but now we have adjusted the Y axis so that this curve appears as a straight line.
Now, when I plot these numbers on this paper, they follow the straight line. So, this is a simple device to see whether the numbers are Gaussian or not.
Now, if I plot the same data on another probability paper, one corresponding to another random variable, for example, on Weibull probability paper, it is not a straight line; I am getting something like this, which is not a straight line, so this clearly says the numbers are not Weibull.
If I indeed simulate numbers according to the Weibull distribution, which is what has been done here, they appear along the straight line; and if you plot these numbers on the normal probability paper, they will appear distorted there.
That is what is shown here: the Weibull numbers plotted on normal probability paper, and you can see that there is a distortion at the two ends. So, the probability paper is a useful modeling tool for a quick assessment of the nature of a probability distribution function.
Now, can we formulate hypothesis testing procedures to verify such a null hypothesis? For example, H0, the null hypothesis, is that X has the specified distribution P_X(x); that means the population has this prescribed probability distribution function. You have a sample, and based on the properties of the sample you have to test this hypothesis. The alternative hypothesis is that the PDF of X is other than what is specified. This P_X(x) needs to be completely specified: the population can be Gaussian, but with specific parameters, so your null hypothesis should be, for example, that X has a normal probability distribution function with mean equal to, say, 1 and standard deviation equal to 0.5; if the sample is drawn from a population whose mean is 20 and standard deviation is 30, then you cannot accept the null hypothesis. It is not the Gaussian nature alone that we are testing; we are specifically testing for a given distribution with all the parameters at given specific values.
We choose alpha, again 0.01, 0.05 or 0.1. The test statistic here is based on the empirical probability distribution function: we rank order all the observations and assign to the i-th ordered observation x_(i) the empirical distribution value i/n. Then we define a statistic known as D2, which is the maximum of the absolute difference between the observed empirical probability distribution function and the corresponding theoretical probability distribution function, evaluated at the points x_(i).
Based on the sample, we obtain an estimate of the test statistic. Again, we have to define the region of rejection of the null hypothesis: accept H0 if D2 is less than or equal to a critical value c. To be able to do that we need the sampling distribution of the test statistic; in this test, known as the Kolmogorov-Smirnov test, the sampling distribution is tabulated, and we can refer to the tables and conduct the test.
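scipy provides this test directly; a sketch (the null distribution, N(0, 1), must be fully specified through args, echoing the point made earlier):

```python
# Sketch: Kolmogorov-Smirnov test of the sample against a fully specified
# N(0, 1) distribution.
import numpy as np
from scipy.stats import kstest

x = np.random.default_rng(8).normal(0.0, 1.0, size=100)  # illustrative data
result = kstest(x, "norm", args=(0.0, 1.0))  # D2 statistic and p value
alpha = 0.05
print(f"D2 = {result.statistic:.4f}, p = {result.pvalue:.4f}, "
      f"accept H0: {result.pvalue > alpha}")
```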
So, I will give some examples. The null hypothesis H0 is that X has the specified distribution N(0, 1), that is, Gaussian with zero mean and unit standard deviation; the alternative hypothesis is that the PDF of X is other than what is specified. We select a 5 percent significance level, go through this exercise, and I have done the plotting here.
So, you see here, the blue line is the theoretical population probability distribution function, which is normal (0, 1); the red one is the probability distribution function constructed from the data; and the green line, which is to be read in conjunction with the Y axis on the right hand side, is the difference between the two.
Now, if you take the absolute value of that difference and re-plot it, the maximum difference is the observed estimate of the statistic I am looking for, and for this given sample size I can find the critical value. The observed statistic is less than the critical value; therefore, we can accept the null hypothesis.
This is another case, with the same calculations. But here the critical value is here and the observed estimate is here, and we need to reject the null hypothesis that the sample is drawn from a population whose mean is 0 and standard deviation is 1. What I basically did was take the same random numbers used in the earlier study, add some mean artificially, and enhance the standard deviation artificially by some factor, so the numbers are still Gaussian but do not have the mean and standard deviation proposed in the null hypothesis. So, the null hypothesis has to be rejected.
There is another test, known as the chi-square test, which is again helpful in verifying the nature of the probability density function. Here the statistic is defined in terms of deviations of the histogram, not of the distribution function. We make k bins and find how many points of the sample lie in each bin, and, according to the theory, how many points we expect in each of these bins, which we can compute knowing the sample size; we then form the statistic D1 = summation i = 1 to k of (N_i - n p_i)^2 / (n p_i), and we can show that D1 has a chi-square distribution with k minus 1 degrees of freedom, where k is the number of intervals in making the histogram, the N_i are the observed frequencies, and the n p_i are the frequencies calculated from the assumed theoretical model for the PDF. The chi-square distributions are again well tabulated, and for a given level alpha you can always find the critical value; therefore, you can develop the procedure to accept or reject the null hypothesis.
Now, what I have done till now is quickly review the main results of the mathematical theory of statistics; we now need to move on to the problem of digital simulation of samples of random variables. Our basic aim is to be able to simulate samples of random quantities, which can be random variables, random processes evolving in time, or random processes evolving in space. A wind load on a chimney, for example, evolves both in space and time; and if you take ocean waves, the spatial domain is multidimensional and there is also time, and at a given point in the ocean you may look at the displacement of the wave or some other field variable like pressure; so, you can have a vector random field evolving in multiple parameters.
We have now learnt how to completely characterize and specify such stochastic quantities; the question we are now asking is how to numerically simulate samples, or realizations, of those random quantities. Suppose X is lognormal: how do I generate 100 samples from the given probability distribution function with the parameters specified? How can I generate a hundred numbers such that, if I were to empirically estimate their probability distribution function, I should be able to accept the hypothesis that these 100 numbers are drawn from a population of lognormal random variables with the prescribed mean and standard deviation? How do we achieve that? So, the question is: let X be a random variable with a prescribed probability distribution function; how do we generate samples of X on a computer so that the estimated model for the probability distribution of X from the data matches the target PDF?
This is the question we will consider now, and the starting point for that is what are known as pseudo random number generators; that is, on a computer, how can we simulate random numbers?
I will begin the discussion on this in the next lecture, and it will be the starting point for constructing samples of random variables, random processes evolving in time, random processes evolving in space and time, vector random processes, Gaussian random processes and non-Gaussian random processes; the mathematics of simulation for each of these differs, and we will see some of the details in the next lecture. So, we conclude this lecture at this stage.