Mod - 01 lec - 36 sampling distribution and parameter estimation - Contd.

Welcome to the second lecture of this module on probability and statistics. In the last class we started the discussion on sampling distributions and parameter estimation. We saw the two methods of point estimation and then started interval estimation, which we completed for the mean with known variance. In today's lecture we will continue from the same point and look at the remaining estimation problems, that is, for the variance and for the proportion, covering both point estimation and interval estimation.
So, today's lecture outline is as follows. First, interval estimation of the mean. We have already worked this out for the case where the standard deviation of the population is known, and we have also discussed the theory for the case where it is not known. We will start from that second case, with an example. After that we will see interval estimation for the variance, and then both point estimation and interval estimation for the proportion. The proportion appears as a parameter in some distributions, for example the Bernoulli distribution, so we will see those estimates. If time permits, we will also discuss the test of hypothesis, which we mentioned in the last class: when we have two different samples and want to infer something about their respective populations, the test of hypothesis is important.
To start with, we take one problem on interval estimation of the mean where the standard deviation of the population is not known. A random sample of 25 concrete cubes is selected from a batch of concrete cubes prepared under a certain process. The sample mean of the 25 cubes is found to be 24 kilonewton per cubic meter and the sample standard deviation is 4 kilonewton per cubic meter. Determine the 99 percent and 95 percent confidence intervals of the mean strength of the concrete cubes. So, this is the mean density of the concrete cubes.
If you compare this problem with the one from the last class, you will see that the data, that is, the sample mean and the standard deviation, are numerically the same. The difference is this: in the last class the mean was also the sample mean, but the standard deviation was that of the population, whereas here it is the sample standard deviation. Also, the sample here has 25 concrete cubes. Earlier it was more than 30, which can be considered a large sample, as we discussed in the last lecture, particularly for estimation of the mean. When the sample size is less than 30 it is generally a small sample, and you have to estimate the standard deviation from the sample itself.
For such cases we have to use the t-distribution with n minus 1 degrees of freedom to calculate the confidence interval. Here the mean of 24 kilonewton per cubic meter and the standard deviation of 4 kilonewton per cubic meter are given. In practice, there are 25 concrete cubes, so 25 values are obtained. From those values we calculate the mean using the estimator discussed in the last class, that is, x bar equals the sum of all the values divided by n, with n equal to 25 here. Similarly, we use the estimator of the standard deviation to get s. The values are already given here because the main aim of this problem is to find the confidence interval; the 24 and the 4 are obtained from the 25 values measured in the experiment.
The sample size n is 25, and from the last lecture we know the quantity x bar minus mu, divided by s by square root n. Note that this is s. Generally, when we use English letters the quantity relates to the sample, and when we use Greek letters it relates to the population. In the earlier example we used sigma, which is from the population; here s is from the sample. So x bar minus mu, divided by s by square root n, has a t-distribution with n minus 1 degrees of freedom. Here n is 25, so there are 24 degrees of freedom.
Now, for the 99 percent confidence interval, alpha by 2 equals 0.005. As discussed in the last class, this comes from the picture of the distribution: the central area is 99 percent, that is 0.99, and whatever remains is split equally between the two tails by symmetry. So each tail is 0.005, and that makes the total equal to 1. The value at that point, multiplied by s by square root n and added to the mean, gives the upper limit; subtracted from the mean, it gives the lower limit.
So for the 99 percent confidence interval, with alpha by 2 equal to 0.005, we read the value t 0.005, 24 from the t-distribution table. This is the value at which the cumulative probability p equals 0.995, for f, the degrees of freedom, equal to 24. The support of the t-distribution also runs from minus infinity to plus infinity, and from minus infinity up to this point the total probability covered is 0.99 plus 0.005, that is 0.995. That is the value we are looking for.
The shape of the t-distribution changes with the degrees of freedom f. For 24 degrees of freedom this value is 2.797. If you recall the earlier example, for the standard normal distribution the corresponding value for the 99 percent confidence interval was 2.575. The change you see here is due to the degrees of freedom, which is 24.
The smaller the sample size, the fewer the degrees of freedom, and the larger the difference between the standard normal value and the t value. If the sample size increases beyond 30, this value and the corresponding standard normal value are essentially the same. That is why 30 can be used as a judicious cutoff: more than 30 is a large sample, and less than 30 is a small sample for which we have to use the t-distribution, as we have done here.
As just discussed, s is a sample estimate. The quantity s by square root n, multiplied by t alpha by 2, n minus 1, is 4 by square root of 25 multiplied by 2.797, which gives 2.24. The 99 percent confidence interval is then the sample mean, 24, minus and plus this quantity: 24 minus 2.24 and 24 plus 2.24, that is, 21.76 to 26.24. Note that this interval is wider than the one obtained when the standard deviation of the population was known. This is expected, because the uncertainty is greater when the standard deviation is unknown.
We also mentioned this in the last class when talking about large and small samples. If the sample size is small, your confidence to infer something is lower, so you have to specify a wider interval to state that, at the given confidence level, the actual value of the parameter is captured. Both intervals we are comparing are at the same 99 percent confidence level: one used the standard normal distribution for the large sample of more than 30, and the other is what we have just obtained with the t-distribution for the sample of size 25.
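The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is made up for this sketch, the critical value 2.797 is the table value quoted in the lecture, and the 95 percent value 2.064 is a standard t-table value for 24 degrees of freedom that the lecture itself does not quote.

```python
import math

def t_confidence_interval(sample_mean, sample_sd, n, t_crit):
    """Two-sided confidence interval for the mean when the population
    standard deviation is unknown (t-distribution, n - 1 degrees of freedom).
    t_crit is the table value t(alpha/2, n - 1), supplied by the caller."""
    half_width = t_crit * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Table values for 24 degrees of freedom.
T_0005_24 = 2.797  # quoted in the lecture (99 percent, alpha/2 = 0.005)
T_0025_24 = 2.064  # assumption: standard table value (95 percent, alpha/2 = 0.025)

low99, high99 = t_confidence_interval(24.0, 4.0, 25, T_0005_24)
low95, high95 = t_confidence_interval(24.0, 4.0, 25, T_0025_24)

print(f"99% CI: {low99:.2f} to {high99:.2f}")  # 21.76 to 26.24
print(f"95% CI: {low95:.2f} to {high95:.2f}")  # 22.35 to 25.65
```

The critical value is passed in rather than computed because the Python standard library has no t-distribution quantile function; with a statistics package available, it could be looked up instead of hard-coded.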
So this interval is wider than what we got earlier, and this is because we have a smaller sample. Our ability to pin down the value is less, so we have to give a wider range in order to capture the actual value with the same probability level. If the sample size becomes even smaller, then at the same confidence level the interval will be even wider.
So far, what we have seen is called the two-sided confidence limit of the mean. Whenever we talked about the confidence interval, it was placed symmetrically about the central value, and the area of that symmetric region was equated to the desired confidence level. In many cases we may not be interested in both sides of the confidence interval. There are many examples, which we will discuss now, where we are interested in only one side. Suppose I want to know the upper limit at a given confidence level and I am not interested in the lower side. A limit on only one side is called a one-sided confidence limit.
To present the same thing as before: I need to know the point such that, at the same confidence level, the region from minus infinity up to that point is my confidence zone, with area 0.99. In the earlier case the remaining area was divided between both sides, 0.005 here and 0.005 there. In the one-sided case the remaining area is 0.01, all on one side, when we are interested in the upper limit. If we are interested in the lower limit, the same thing can be plotted from the other side: the region from that point up to plus infinity is 0.99 and the area below it is 0.01. So this is the lower limit and that is the upper limit.
This is common in almost all fields of application, including civil engineering. In many real-life civil engineering problems, only one of the confidence limits is of concern. Take, for example, the upper limit of the mean wind velocity encountered at the top of a building. I am not interested in the lower limit of the wind velocity; I want the upper limit because I need the maximum load the building can face. Another is the upper limit of the mean traffic volume capacity of a highway. The lower side of the mean traffic volume is not of interest; I want to know the maximum mean traffic volume that can be expected on the highway. A third is the lower limit of the mean stress that can cause failure in a steel specimen. When failure is my concern, I want to know the lowest mean stress at which the specimen can fail. Here we are interested only in the lower limit; the upper limit is of no interest to us.
Another example is the lower limit of the mean dissolved oxygen, DO, in a stream for sustaining aquatic life. Here I am interested in whether the minimum requirement for aquatic life is maintained in the stream, so the upper limit of DO is not of interest; I need only one side of the confidence interval. These are some examples where we are interested in either the upper limit or the lower limit.
The change in concept is only this. For the upper limit, the region from minus infinity up to that point is the confidence level, so the remaining part is 1 minus the confidence level, 0.01 here; similarly for the lower limit. This is in contrast to the earlier case, where the confidence interval is symmetrical about the central value and the remaining probability is divided equally between the upper side and the lower side.
Let us see how to get the confidence limits in such cases. Let 1 minus alpha be the specified confidence level and let the standard deviation of the population be sigma. The random variable x bar minus mu, divided by sigma by square root n, follows the standard normal distribution. Hence the 1 minus alpha lower confidence limit of the mean mu is x bar minus z alpha sigma by square root n. Here the standard deviation of the population, sigma, is known, so the standard normal distribution applies, and instead of z alpha by 2 we use z alpha. If you are interested in the upper limit, it is x bar plus z alpha sigma by square root n, as just shown pictorially, where z alpha is obtained from the standard normal tables.
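A minimal sketch of the one-sided limits with known sigma is given below. The numbers are hypothetical: the mean of 24 and sigma of 4 echo the concrete example, but the sample size of 36 is assumed only for illustration, and the function name is not from the lecture. The value z alpha is computed with the standard library's `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist

def one_sided_limits_known_sigma(sample_mean, sigma, n, confidence):
    """Return (lower limit, upper limit) of the mean at the given
    one-sided confidence level, for known population sigma.
    Each limit is used on its own; together they are NOT a two-sided interval."""
    alpha = 1.0 - confidence
    # z_alpha: point below which the standard normal area is 1 - alpha
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    margin = z_alpha * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical data: x bar = 24, sigma = 4, n = 36, confidence 99 percent.
lower_limit, upper_limit = one_sided_limits_known_sigma(24.0, 4.0, 36, 0.99)
print(f"99% lower confidence limit: {lower_limit:.2f}")
print(f"99% upper confidence limit: {upper_limit:.2f}")
```

Because the whole of alpha sits in one tail, z alpha (about 2.33 at 99 percent) is smaller than the two-sided z alpha by 2 (about 2.58), so a one-sided limit lies closer to the sample mean than the corresponding end of the two-sided interval.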
If the variance is unknown, which is basically the small-sample case, let 1 minus alpha be the specified confidence level, n the sample size, and s the sample standard deviation, which is the sample estimate. Then x bar minus mu, divided by s by square root n, follows a t-distribution with n minus 1 degrees of freedom, and the 1 minus alpha lower confidence limit of the mean mu is x bar minus t alpha, n minus 1, times s by square root n. Where we used alpha by 2 in the earlier case, here it is t alpha, with n minus 1 degrees of freedom. Similarly, the upper confidence limit is x bar plus t alpha, n minus 1, times s by square root n. The same t alpha, n minus 1 is used in both expressions.
On the plot of the t-distribution with n minus 1 degrees of freedom, t alpha, n minus 1 is the value up to which the area is 1 minus alpha, 0.99 in our example. This value, times s by square root n, is subtracted from the mean for the lower limit and added to the mean for the upper limit. So, for the upper limit or the lower limit there is only this small change, and the respective values should be picked up from the standard table. Then we can determine the upper limit or the lower limit, whichever is desired.
Next we move to the confidence interval of the variance. We have discussed the mean; the variance is also obtained from the sample, so it too should follow some distribution. We should know which distribution it follows and what the confidence limits for the variance are.
The background assumption is a normal population. For a normal population, if the sample size n is small, the exact confidence limit of the population variance sigma square can be determined as follows. The sample variance s square is estimated as discussed in the last class: s square equals 1 by n minus 1 times the sum, for i from 1 to n, of xi minus x bar whole square. Here x bar is the sample mean and the xi are the n sample values x1, x2, x3, up to xn. This is how we get the unbiased estimate of the variance from the sample.
Multiplying by n minus 1 and doing some algebraic manipulation, n minus 1 times s square equals the sum of the square of the difference of two quantities, xi minus mu and x bar minus mu. After some steps, n minus 1 times s square divided by sigma square can be expressed as the sum, for i from 1 to n, of xi minus mu by sigma whole square, minus the square of x bar minus mu divided by sigma by square root n. As xi is normal, x bar is also normal. As discussed in the last class, x bar is also a random variable, which follows a normal distribution with mean mu and standard deviation sigma by square root n.
Even when xi is not normal, the distribution of x bar is approximately normal. But in the case of the variance, the assumption that the population is normally distributed is more crucial. Here xi is normal, so x bar is also normal.
Now look at the right-hand side of this expression. There are two terms. The first term is the sum of the squares of n independent standard normal variates: xi is normal with mean mu and standard deviation sigma, so xi minus mu by sigma is a standard normal variate. We saw in the earlier module on functions of random variables that if we square n independent standard normal variates and sum them, the result follows a chi-square distribution with n degrees of freedom. The second term is again the square of a standard normal variate, and it follows a chi-square distribution with one degree of freedom.
The sum of two independent chi-square variates with different degrees of freedom is again a chi-square variate, and the degrees of freedom add. Here the first term, with n degrees of freedom, is the sum of the left-hand side and the second term, which has one degree of freedom, so the left-hand side is a chi-square variate with n minus 1 degrees of freedom. That is why n minus 1 times s square by sigma square, where s square is the sample estimate of the variance and sigma square is the population variance, follows a chi-square distribution with n minus 1 degrees of freedom.
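The algebraic steps referred to above can be written out as follows. This is a reconstruction from the definitions used in the lecture (sample variance with divisor n minus 1, sample mean x bar), not a copy of the lecture slide.

```latex
\begin{align*}
(n-1)\,s^2 &= \sum_{i=1}^{n} (x_i - \bar{x})^2
            = \sum_{i=1}^{n} \bigl[(x_i - \mu) - (\bar{x} - \mu)\bigr]^2 \\
           &= \sum_{i=1}^{n} (x_i - \mu)^2
              - 2(\bar{x} - \mu)\sum_{i=1}^{n} (x_i - \mu)
              + n(\bar{x} - \mu)^2 \\
           &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2
              \qquad \text{since } \sum_{i=1}^{n} (x_i - \mu) = n(\bar{x} - \mu), \\[4pt]
\frac{(n-1)\,s^2}{\sigma^2}
           &= \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2
              - \left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2 .
\end{align*}
```

The first term on the right is a sum of n squared standard normal variates and the second is a single squared standard normal variate, which is where the n and 1 degrees of freedom come from.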
This is what is written here: n minus 1 times s square by sigma square has a chi-square distribution with n minus 1 degrees of freedom. Once we know the distribution, the confidence limits come from the chi-square distribution. Hence, the upper confidence limit of the population variance sigma square follows from the statement that the probability of n minus 1 times s square by sigma square being greater than or equal to c alpha, n minus 1 equals 1 minus alpha. Here 1 minus alpha is the confidence level we are looking for, and c alpha, n minus 1 is the value of the chi-square variate with n minus 1 degrees of freedom at cumulative probability alpha. These values are available from standard chi-square distribution tables, found in most textbooks. Thus, the 1 minus alpha upper confidence limit of the population variance sigma square of a normal population is n minus 1 times s square divided by c alpha, n minus 1. The denominator comes from the chi-square table and the numerator from the sample estimate, so we get the upper confidence limit at the 1 minus alpha confidence level.
We will take one example. The daily dissolved oxygen, DO, concentration at a particular location on a stream has been recorded for 20 days. The sample variance is found to be s square equal to 4.5 (milligram per liter) squared. What is the 95 percent upper confidence limit of the population variance sigma square? So, from the sample we have estimated the variance as 4.5, and the sample size is 20 days.
Remember one thing before we go to the solution. When we discussed the mean, we said that a sample size of more than 30 can be treated as a large sample.
When we are talking about the variance, even 30 is not sufficient to declare the sample large. So even when the sample size is more than 30, the degrees of freedom always stay associated with the calculation, and we read the chi-square distribution with respect to those degrees of freedom.
Here the sample size n equals 20 and the sample variance s square is 4.5 (milligram per liter) squared. So n minus 1 times s square by sigma square has a chi-square distribution with n minus 1 equal to 19 degrees of freedom. We have to refer to the chi-square table. Generally the tables are given for different degrees of freedom, starting from 1 up to at least 50 or so, and we refer to the entry for 19 degrees of freedom. Which cumulative probability level to use depends on the confidence level you are looking for.
For 95 percent confidence, alpha equals 0.05, and with n equal to 20 the degrees of freedom are 19. So we need c alpha, n minus 1, that is c 0.05, 19, and from the table its value is 10.1. Using this value, n minus 1 times s square divided by c alpha, n minus 1 equals 19 multiplied by 4.5 divided by 10.1, which is 8.47. So the 95 percent upper confidence limit of the population variance is 8.47 (milligram per liter) squared.
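As a quick check of the formula, the upper confidence limit can be computed directly from the stated inputs. This is a sketch: the function name is illustrative, and the chi-square value 10.1 is the table value quoted in the lecture, hard-coded because the Python standard library has no chi-square quantile function.

```python
def variance_upper_limit(sample_var, n, chi2_alpha):
    """(1 - alpha) upper confidence limit of the population variance
    of a normal population: (n - 1) * s^2 / c(alpha, n - 1).
    chi2_alpha is the chi-square value at cumulative probability alpha."""
    return (n - 1) * sample_var / chi2_alpha

CHI2_005_19 = 10.1  # table value quoted in the lecture (alpha = 0.05, 19 d.o.f.)

# Inputs as stated in the example: s^2 = 4.5, n = 20 daily DO records.
upper = variance_upper_limit(4.5, 20, CHI2_005_19)
print(f"95% upper confidence limit of variance: {upper:.2f}")  # 8.47
```

Note that the chi-square value sits in the denominator, so a smaller table value (lower cumulative probability) gives a larger limit for the variance.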
If we are now interested in the lower confidence limit at the 95 percent confidence level, this value will change. You have to find the chi-square value for the same degrees of freedom at cumulative probability 0.95. It will obviously be higher than 10.1, the value at the 0.05 cumulative probability level, so the lower limit will be lower than the sample estimate.
The next parameter, which is also very useful in many distributions, is the proportion. Estimation of the proportion is required where a probability is estimated as the proportion of occurrences in a Bernoulli sequence. We have seen that the parameter of the Bernoulli distribution is the probability of success, denoted p, so we need to know what the proportion is. One example is the proportion of products in a manufacturing unit that meet a specified quality standard: I can say what percentage of the products pass the quality test. A second is the proportion of traffic taking the left turn at a particular intersection. You count the total number of vehicles and how many of them turn left, and in that way estimate the proportion taking the left turn, which is needed to design the traffic arrangement at that junction.
Now consider a sequence of n Bernoulli trials X1, X2, up to Xn, where the result of every trial is either success or failure, that is, 1 or 0 respectively. The probability p of occurrence of the event in such a Bernoulli trial is the parameter of the binomial distribution.
To estimate p, note that each xi can take the value 1 or 0: success is denoted 1 and failure is denoted 0. As I have mentioned many times, success and failure are arbitrarily assigned; in tossing a coin, head may be the success and tail the failure. The outcomes form a sequence of 1s and 0s, and the arithmetic mean of those numbers gives the proportion. So the estimate p cap is the sum of all the xi divided by n.
Now, the expectation of this estimate is the expectation of 1 by n times the sum of the xi, which can be written as 1 by n times the sum of the expectations of the individual outcomes, each of which is a random variable. In Bernoulli trials every outcome is independent and has the same probability of success, so the expectation of each xi is p. There are n such expectations, so their sum is np, and the expectation of p cap is 1 by n multiplied by np, which is p.
For the variance, we get 1 by n square times the sum of the variances of the xi. The expectation of xi square is also p, so the variance of each xi is p minus p square, and the sum over n trials is n times p into 1 minus p. So the variance of the estimate of the proportion is p into 1 minus p divided by n. Thus the variance of the estimator decreases as the sample size n increases. This is the variance of p cap (not p bar), the estimated proportion, and it is centered about the population proportion p.
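Written out, the expectation and variance of the estimator follow in a few lines. This is a reconstruction using only the facts stated above: the trials are independent, each x_i is 1 with probability p and 0 otherwise, and therefore x_i squared equals x_i.

```latex
\begin{align*}
\hat{p} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \\[4pt]
E[\hat{p}] &= \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \frac{1}{n}\,(np) = p, \\[4pt]
\operatorname{Var}(x_i) &= E[x_i^2] - \bigl(E[x_i]\bigr)^2 = p - p^2 = p(1-p), \\[4pt]
\operatorname{Var}(\hat{p}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(x_i)
                             = \frac{1}{n^2}\, n\,p(1-p) = \frac{p(1-p)}{n}.
\end{align*}
```

The step from the variance of the sum to the sum of the variances is where the independence of the trials is used.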
When n is large, p cap follows the Gaussian distribution with mean p and variance p cap into 1 minus p cap by n, where p cap is the observed proportion from the sample. For the confidence interval of p, once we know p cap is normally distributed, p cap minus its mean p, divided by its standard deviation, which is the square root of p cap into 1 minus p cap by n, lies between minus z alpha by 2 and z alpha by 2 with confidence level 1 minus alpha. We discussed this in the last lecture for the standard normal distribution: z alpha by 2 is the quantile at which the cumulative probability is 1 minus alpha by 2.
So the confidence interval of the proportion follows directly. We multiply the standard deviation by z alpha by 2 and subtract it from and add it to p cap. The lower limit is p cap minus z alpha by 2 times the square root of p cap into 1 minus p cap by n, and the upper limit is p cap plus z alpha by 2 times the square root of p cap into 1 minus p cap by n.
Now take one example of the proportion and its confidence interval. During the inspection of the quality of soil compaction in a highway project, 45 out of 60 specimens inspected passed the CBR requirement. Here again this is a Bernoulli process, where a particular specimen may or may not pass the CBR test, and out of 60 specimens 45 have passed.
The first question is: what is the proportion p of the embankment that will be well compacted, that is, will pass the CBR test? This is a point estimation, and as you have seen, the point estimate is just the ratio of the number of successes to the total number of specimens. The second question is: what is the 95 percent confidence interval of p? For that we use the standard normal distribution.
The point estimate of the proportion p of the embankment that will be well compacted is p cap equal to 45 divided by 60, which is 0.75. This is straightforward. The 95 percent confidence interval of p runs from p cap minus z alpha by 2 times the standard deviation to p cap plus z alpha by 2 times the standard deviation. For the 95 percent confidence level, z alpha by 2 is 1.96, as we saw in the last lecture. So the limits are 0.75 minus 1.96 times the square root of 0.75 into 0.25 by 60 and 0.75 plus 1.96 times the same square root, the quantity under the root being the variance of the estimate. This gives 0.64 and 0.86, so the 95 percent confidence interval for the proportion p is 0.64 to 0.86.
You can also see that the interval is symmetrical about 0.75. If we increase the confidence level, say from 95 percent to 99 percent, the interval becomes wider: the lower limit becomes even lower and the upper limit even higher. At the 95 percent confidence level the confidence limits are 0.64 and 0.86.
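The CBR example can be reproduced with a short Python sketch. The function name is illustrative and not from the lecture; z alpha by 2 is computed with the standard library's `statistics.NormalDist` rather than read from a table, and the 99 percent case is included only to show the widening mentioned above.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Point estimate and two-sided large-sample confidence interval
    for a Bernoulli proportion, using the normal approximation."""
    p_hat = successes / n
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)      # z(alpha/2)
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)   # sqrt of Var(p_hat)
    return p_hat, p_hat - z * std_err, p_hat + z * std_err

# 45 of 60 specimens passed the CBR requirement.
p_hat, low95, high95 = proportion_confidence_interval(45, 60, 0.95)
_, low99, high99 = proportion_confidence_interval(45, 60, 0.99)

print(f"p_hat = {p_hat:.2f}")                   # 0.75
print(f"95% CI: {low95:.2f} to {high95:.2f}")   # 0.64 to 0.86
print(f"99% CI: {low99:.2f} to {high99:.2f}")   # wider than the 95% interval
```

The normal approximation is the large-sample result quoted in the lecture; for small n or for proportions near 0 or 1 it becomes less reliable.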
So far we have discussed point estimation, and in this lecture interval estimation, of a parameter. All of this estimation is done with respect to one data set available to us. Sometimes we may want to know whether the mean of the population can take a particular value or not. Or suppose two data sets are available: I may want to know whether the two samples come from the same population, that is, whether the population parameters implied by the two samples are the same or not. To answer this type of question we need what is known as hypothesis testing, which is what we will discuss next.
In real-life decision making it is often necessary to decide whether a statement concerning a parameter of a probability distribution is true or false. A hypothesis test is used to check the validity of such a statement. The statement is a possibility, a guess in everyday English, about the population, and it is what we call the hypothesis. The necessary decision can then be taken depending on the test result.
Take a real-life civil engineering scenario. Suppose I am estimating the strength of concrete, and for a specific requirement I need the strength to be greater than some known threshold value. How do we test that? We take a sample and get a sample mean. This sample mean may or may not cross the threshold value. It will obviously not match our expectation exactly, and as soon as we change the sample it will change too.
So we have to test whether the difference we are getting is by chance or whether there is some problem in the construction itself. Depending on the result of the hypothesis test we decide which it is. What we are trying to test is whether the strength can be assumed to exceed the threshold value. That possibility, that guess, is the hypothesis here. We test the hypothesis and, depending on the test result, take whatever decision is necessary. These decisions are based on statistics, and this is known as statistical inference, drawn from the sample data that we have.
Consider also the case where a probabilistic model is developed to describe some process. For example, a few lectures from now we will take up regression, which is one of the very important models. The question there is whether the model matches the data. It may be found that the observed data set matches the model only partly. In such cases the deviations of the observations from the model may be due to two causes: actual inadequacy of the model, or chance variation. The distinction is important. If we declare actual inadequacy of the model, then we have to think seriously about changing the model so that it explains the process better.
On the other hand, even when the model is correct, it will never match the observed data set exactly. That variation is called chance variation. To decide whether the deviation is by chance or whether there is some real inadequacy in the model, hypothesis testing is used as a statistical test of whether the devised model is adequate or not.
There are two types of errors in hypothesis testing, known as the type one error and the type two error. Suppose the hypothesis that we first assume is actually correct, and through the hypothesis test we reject it. A true hypothesis gets rejected, and that is known as the type one error. The probability of the type one error is denoted by alpha. Similarly, when the hypothesis is false but we fail to reject it, we commit another error, and that is known as the type two error. The probability of the type two error is denoted by beta.
Now look at the table. The true situation is that the hypothesis can be true or the hypothesis can be false. The decision taken after the hypothesis test is either that we fail to reject the hypothesis or that we reject it. If we fail to reject the hypothesis when the hypothesis is true, then obviously there is no error; the right decision has been taken. If we fail to reject the hypothesis and the hypothesis is false, then that is the type two error. On the other hand, if we reject the hypothesis and the hypothesis is true, that is the type one error. If we reject the hypothesis and the hypothesis is false, then again no error has occurred. The probability of the type one error is denoted by alpha and the probability of the type two error is denoted by beta.
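The two error probabilities can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the lecture: a two-sided z-test with known standard deviation, and hypothetical numbers (hypothesized mean 50, standard deviation 2, samples of size 25, alpha of 0.05). The function names are also illustrative.

```python
import random
from statistics import NormalDist, mean


def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided z-test of H0: mu = mu0 with known sigma. True means H0 is rejected."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit


def rejection_rate(true_mu, mu0=50.0, sigma=2.0, n=25, alpha=0.05,
                   trials=2000, seed=0):
    """Fraction of simulated samples (drawn with mean true_mu) for which H0 is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if rejects_h0(sample, mu0, sigma, alpha):
            rejected += 1
    return rejected / trials


# H0 is true (true mean equals mu0): every rejection is a type one error.
type1_rate = rejection_rate(true_mu=50.0)

# H0 is false (true mean is 51): every failure to reject is a type two error.
type2_rate = 1 - rejection_rate(true_mu=51.0)

print("estimated alpha:", type1_rate)
print("estimated beta: ", type2_rate)
```

With these settings the estimated type one error rate should come out close to the chosen alpha of 0.05, while the type two error rate depends on how far the true mean is from the hypothesized one.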
Now, you can see that the type one error occurs when the hypothesis is true and we reject it. From the manufacturer's point of view, this type of error is very critical, and we generally allow a very low value for the probability of the type one error. Having fixed that, we try to minimize the type two error as much as we can.
Hypothesis testing follows a set of systematic steps. The first step is the formulation of a null hypothesis and an appropriate alternative hypothesis, which is accepted if the null hypothesis has to be rejected. What the null hypothesis and the alternative hypothesis are, we will see in a minute. Only one point should be noted here. Whatever we put in the null hypothesis, we can say either that we reject the null hypothesis or that, based on the data available, we fail to reject the null hypothesis. Generally, it is wrong to state that the null hypothesis is accepted. The most we can say is that we fail to reject it. On the other hand, when we say that the null hypothesis is rejected, we can accept the alternative hypothesis. So, whatever we want to test is generally put in the alternative hypothesis. At the end of the test, if the null hypothesis is rejected, we can accept the alternative, which is what we were trying to test.
The second step is specifying the level of significance. Once we have clearly defined what the null hypothesis is and what the alternative hypothesis is, we have to define a level of significance.
That is the probability alpha of the type one error. As we just said, our goal is to keep the type one error as small as we can, so alpha is generally assigned a value of 5 percent or 1 percent, that is, alpha equal to 0.05 or alpha equal to 0.01. These are the typical values at which we fix alpha. Of course, in some cases the probability beta of the type two error may also be specified. But in general, the level of significance is predetermined before we go for the hypothesis test.
The third step is the construction of a criterion, based on the sampling distribution of a suitable statistic, for testing the null hypothesis. We have to develop that suitable statistic, and we should know its probabilistic nature. In the earlier discussion of parameter estimation we also developed some statistics and saw what the sampling distributions of the different statistics are. Similarly, here we have to construct a suitable statistic for the test and, depending on its probability distribution, find the critical value for it.
The fourth step is the calculation of the value of this statistic from the data. Whatever statistic has been decided for the hypothesis test in hand, its value should be obtained from the observed data on which we are basing the test.
The final step is the decision making, that is, rejecting the null hypothesis or saying that we fail to reject the null hypothesis.
We have been using two terms, the null hypothesis and the alternative hypothesis. The null hypothesis is any hypothesis that is tested to see if it can be rejected; it is denoted by H naught.
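To see how the level of significance fixes the critical value, here is a minimal sketch. It assumes the test statistic follows the standard normal distribution, which the lecture has not specified at this point; other sampling distributions would give different critical values. The function name `critical_value` is illustrative.

```python
from statistics import NormalDist


def critical_value(alpha, two_sided=False):
    """Critical value of a standard normal test statistic for a given alpha.

    For a two-sided test, alpha is split equally between the two tails.
    """
    tail_area = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.05, 0.01):
    one_sided = critical_value(alpha)
    two_sided = critical_value(alpha, two_sided=True)
    print(f"alpha={alpha}: one-sided {one_sided:.3f}, two-sided {two_sided:.3f}")
```

A smaller alpha pushes the critical value further into the tail, so the observed statistic has to be more extreme before the null hypothesis can be rejected.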
The alternative hypothesis is the one that is accepted if the null hypothesis has to be rejected. It is denoted by Ha. The alternative hypothesis may be one sided or two sided, and we will see here what one sided and two sided mean. As we have just discussed, when we can reject the null hypothesis we accept the alternative hypothesis. We never say that the null hypothesis is accepted, and that is why whatever our goal is, we generally put it in the alternative hypothesis.
As an example of a one-sided alternative, let us consider that a concrete mix is acceptable for a certain construction only if the mean strength is greater than 25 kilonewton per square meter. This is the threshold value, and the mix is accepted only if the mean strength is greater than it. Based on the sample data, we have to decide whether we can accept that the strength is greater than 25 kilonewton per square meter or not. That is why in the alternative hypothesis we put mu greater than 25. The remaining part, mu less than or equal to 25, is put in the null hypothesis.
Remember that the statement here is that the mix is acceptable only if the mean strength is greater than 25 kilonewton per square meter. If instead it were written as greater than or equal to 25 kilonewton per square meter, then in the one-sided alternative hypothesis we put the equality sign, that is, mu greater than or equal to 25, and in the null hypothesis we write mu less than 25. So, whatever the statement we are trying to test, it is written exactly in the alternative hypothesis.
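The steps of the test for the concrete example, with H naught as mu less than or equal to 25 and Ha as mu greater than 25, can be sketched as follows. The numbers are hypothetical and not from the lecture: a sample of 36 specimens with sample mean 26.5, and a population standard deviation of 4 that is assumed known so that a z statistic can be used. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_z_test(sample_mean, mu0, sigma, n, alpha):
    """Test H0: mu <= mu0 against Ha: mu > mu0 with known sigma.

    Returns the observed z statistic, the critical value and the decision.
    """
    # Compute the test statistic from the data.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # The critical value follows from the sampling distribution and alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Decision: reject H0 only if z falls in the upper critical region.
    decision = "reject H0" if z > z_crit else "fail to reject H0"
    return z, z_crit, decision


# Hypothetical data (assumed for illustration only).
for alpha in (0.05, 0.01):
    z, z_crit, decision = right_tailed_z_test(
        sample_mean=26.5, mu0=25.0, sigma=4.0, n=36, alpha=alpha
    )
    print(f"alpha={alpha}: z={z:.2f}, critical value={z_crit:.3f}, {decision}")
```

With these assumed numbers the statistic is 2.25, which exceeds the 5 percent critical value but not the 1 percent one, so the same data lead to rejection at alpha of 0.05 and to failure to reject at alpha of 0.01. Note that the second outcome is reported as "fail to reject", not as accepting the null hypothesis.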
Similarly, as an example of a two-sided alternative, let us consider that in a cement plant the average amount of cement in one bag should be 50 kg. We want to check that the average amount is not much greater than 50 kg, because if it is more than 50 kg that is a loss for the manufacturer. It should not be much less than 50 kg either, because that will dissatisfy the customers. So, what we are trying to check is whether the amount per bag is 50 kg or not. In the null hypothesis we write mu equal to 50, and the two-sided alternative hypothesis is mu not equal to 50. In this case, if we fail to reject the null hypothesis, then the job is done, in the sense that the manufacturer is happy: based on the sample we have taken, mu equal to 50 kg per bag cannot be rejected.
Next, the level of significance. Hypothesis testing is always associated with a level of significance. It is equal to the probability alpha of the type one error, as we have discussed earlier. The value of alpha is generally fixed at 0.05 or 0.01, but it may vary depending on the consequences of committing the type one error. If the chosen value of alpha is too small, then the probability of the type two error generally increases.
The P-value for a given test statistic and null hypothesis is the probability of obtaining a test statistic value that is equally extreme or more extreme than the observed one. For a right-sided alternative hypothesis, say Ha that mu is greater than some threshold value mu naught, only the values greater than the observed value of the test statistic are considered more extreme. For a left-sided alternative hypothesis, where mu is less than some threshold value, only the values less than the observed value of the test statistic are considered more extreme. For a two-sided alternative hypothesis, the values in both tails are considered more extreme.
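The three cases of the P-value can be sketched in code. This is a minimal illustration that assumes the test statistic follows the standard normal distribution under the null hypothesis; the function name `p_value` and the labels "greater", "less" and "two-sided" are illustrative and not from the lecture.

```python
from statistics import NormalDist


def p_value(z_obs, alternative):
    """P-value of an observed standard normal test statistic z_obs.

    alternative: "greater" (right sided), "less" (left sided) or "two-sided".
    """
    std_normal = NormalDist()
    if alternative == "greater":
        # Only values above the observed statistic are more extreme.
        return 1 - std_normal.cdf(z_obs)
    if alternative == "less":
        # Only values below the observed statistic are more extreme.
        return std_normal.cdf(z_obs)
    if alternative == "two-sided":
        # Values in both tails, beyond |z_obs|, are more extreme.
        return 2 * (1 - std_normal.cdf(abs(z_obs)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


z_obs = 1.96  # hypothetical observed value of the statistic
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(z_obs, alt), 4))
```

For an observed value of 1.96 the right-sided P-value is about 0.025 and the two-sided P-value is about 0.05, which shows how the same observation is judged differently depending on which tails count as more extreme.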
So, let us look at the P-value on the figure. Suppose this is the distribution of the statistic that we are considering. The value marked here is the critical value corresponding to the level of significance, which we generally fix at 0.05 or 0.01, as the case may be. Now suppose the observed test statistic falls here. For a one-sided test, the P-value is the area remaining in the tail beyond the observed value, the green highlighted area. This is shown for the one-sided test on the upper side; similarly, it can happen on the lower side. If it is a two-sided test, then we have to take the symmetrical value on the other side, and that area should also be included in the calculation of the P-value.
Once we set the significance level, we get the critical region, which is the blue highlighted zone. If the test statistic falls in this zone, then we should reject the null hypothesis. If it does not fall in the critical region for the chosen significance level, then we cannot reject the null hypothesis.
We will continue this discussion of hypothesis testing in the next lecture. We will also take up some problems, and various hypotheses concerning two means, one variance or two variances. These types of problems we will take up in the next lecture.