Welcome to the second lecture of this module on probability and statistics. In the last class
we started the discussion on sampling distributions and parameter estimation. We saw two
different methods of point estimation, and we started interval estimation, completing it for
the mean with known variance. In today's lecture we will continue from the same point and look
at the sampling distributions for the other parameters, namely the variance and the proportion,
and for these we will see both their point estimation and their interval estimation.
So, today's lecture outline begins with interval estimation of the mean. We started this in the
last class, in the sense that we worked it out for the case where the standard deviation of the
population is known, and we also discussed the theory for the case where the standard deviation
is not known. So, we will start from the case where the standard deviation of the population is
not known, and we will begin with an example. After that we will see interval estimation for
the variance, and then both point estimation and interval estimation for the proportion. The
proportion is generally used as a parameter of some distributions, for example the Bernoulli
distribution, so we will see those estimates. If time permits, we will also discuss the test of
hypothesis, which we mentioned in the last class: when we have two different samples and want
to infer something about their respective populations, the test of hypothesis is important.
So, to start with, we take one problem on interval estimation of the mean where the standard
deviation of the population is not known. A random sample of 25 concrete cubes is selected from
a batch of concrete cubes prepared under a certain process. The sample mean of the 25 concrete
cubes is found to be 24 kilonewton per cubic meter and the sample standard deviation is 4
kilonewton per cubic meter. Determine the 99 percent and 95 percent confidence intervals of the
mean for these concrete cubes. Note that, given the units, this is the mean density of the
concrete cubes rather than the strength.
Now, if you compare this problem with our last class problem, you will see that the data given,
that is the sample mean and the standard deviation, are numerically the same as in that problem.
The only difference is this: in the last class the mean was the sample mean but the standard
deviation was that of the population, whereas here both are from the sample, so we are using
the sample standard deviation. Also, here the random sample has 25 concrete cubes; earlier it
was more than 30, which can be considered a large sample. We discussed in the last class that
when the sample size is more than 30 we can assume it to be a large sample, particularly for
the estimation of the mean. When it is less than 30 it is generally a small sample, and the
standard deviation has to be estimated from the sample itself. In such cases we have to use the
t-distribution with n minus 1 degrees of freedom to calculate the confidence interval. So, here
the mean of 24 kilonewton per cubic meter and the standard deviation of 4 kilonewton per cubic
meter are given. Basically, there are 25 concrete cubes, so 25 values can be obtained. From
those we can calculate the mean using the estimator that we discussed in the last class, that
is, x bar equals the summation of all these values divided by n, where n is 25 here. Similarly,
we can use the estimator for the standard deviation to get s.
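As a small illustration of the two estimators just mentioned, here is a minimal Python sketch.
The density values are made up for illustration (the lecture gives only the summary values 24
and 4, not the raw data), and the helper name `sample_mean_and_std` is hypothetical.

```python
import math

def sample_mean_and_std(values):
    """Return (x_bar, s): the sample mean and the sample standard deviation.

    s uses the n - 1 divisor, i.e. it is the square root of the
    sample variance estimator.
    """
    n = len(values)
    x_bar = sum(values) / n
    s_squared = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return x_bar, math.sqrt(s_squared)

# Hypothetical densities in kN/m^3; these are NOT the lecture's data.
densities = [22.0, 26.0, 24.0, 20.0, 28.0]
x_bar, s = sample_mean_and_std(densities)
print(x_bar, s)
```

In the lecture problem the same two formulas would be applied to the 25 measured values.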
These values are already given here because in this problem we are mainly concerned with
finding the confidence interval. The background of the data given to us, the 24 as well as the
4, is that they are obtained from the 25 different values measured in the experiment.
Well, here the sample size n is 25, and the quantity of interest from our last lecture is x bar
minus mu, divided by s by square root n. Note that this is now s. Generally, when we use English
letters the quantity relates to the sample, and when we use Greek letters it relates to the
population. In the earlier example we used sigma, which is from the population; here s denotes
the estimate from the sample. So, x bar minus mu, divided by s by square root n, has a
t-distribution with n minus 1 degrees of freedom; n is 25 here, so there are 24 degrees of freedom.
Now, for the 99 percent confidence interval, alpha by 2 is equal to 0.005. We discussed in the
last class where this value comes from. If this is the distribution, then the central area is
the 99 percent, that is 0.99, and whatever remains on the two sides is the same by symmetry.
So, this is 0.005 and this is 0.005, which makes the total equal to 1. The value at this point,
multiplied by s by square root n and added to the mean, gives the upper limit, and subtracted
from the mean gives the lower limit.
So, for the 99 percent confidence interval, alpha by 2 is equal to 0.005. From the
t-distribution table we get the value of t 0.005, 24. This is the value at cumulative
probability p equal to 0.995 for f, the degrees of freedom, equal to 24. Here p equal to 0.995
means the following: the support of the t-distribution, as you know, is from minus infinity to
plus infinity, and from minus infinity to this point the total probability covered is 0.99 plus
0.005, that is 0.995. So, the value we are looking for is the one at which the cumulative
probability is 0.995.
The shape of the t-distribution changes with the degrees of freedom f. For 24 degrees of
freedom, this value is equal to 2.797. If you recall from our earlier example, for the standard
normal distribution the value at the 99 percent confidence level was 2.575. So, this has
changed, and the change is due to the degrees of freedom, which is 24. The smaller the sample
size, the fewer the degrees of freedom, and the larger the difference between the standard
normal value and the t value. If the sample size increases and goes beyond 30, then this value
and the corresponding value for the standard normal distribution are essentially the same. That
is why 30 can be used as a judicious cut-off: greater than 30 is a large sample, and less than
30 is a small sample for which we have to use the t-distribution, as we have done here. Now, as
I just discussed, s is the sample estimate. The quantity s divided by square root n, multiplied
by t alpha by 2, n minus 1, becomes 4 by square root of 25 multiplied by 2.797, which gives 2.24.
So, the 99 percent confidence interval is obtained by subtracting this quantity from, and adding
it to, the sample mean, which is 24. So, 24 minus 2.24 and 24 plus 2.24, that is, the 99 percent
confidence interval is 21.76 to 26.24. Note that this interval is wider than the one obtained
when the standard deviation of the population was known. This is expected, because the
uncertainty is greater when the standard deviation is unknown. We also mentioned this in the
last class when we were talking about large and small samples. If the sample size is small, your
confidence to infer something is lower, so you have to specify a wider interval to declare that,
at this confidence level, the actual value of the parameter is captured. Here both intervals
being compared are 99 percent confidence intervals. In one case we had a large sample, greater
than 30, and used the standard normal distribution; in the other we have a sample of size 25.
The interval now is wider than what we got earlier because we have a smaller sample. Our
information is less, so we have to give a wider range to capture the actual value with the same
probability. If the sample size becomes even smaller, then at the same confidence level the
interval will be even wider.
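The two-sided calculation for the concrete cube example can be sketched in Python as follows.
This is only an illustration: the critical value 2.797 is the table value quoted in the
lecture, the function name `two_sided_interval` is hypothetical, and the standard normal
quantile is computed only to show that it is smaller than the t value.

```python
import math
from statistics import NormalDist

def two_sided_interval(x_bar, s, n, critical_value):
    """Return (lower, upper) = x_bar -/+ critical_value * s / sqrt(n)."""
    half_width = critical_value * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Values from the lecture example.
n, x_bar, s = 25, 24.0, 4.0
t_crit = 2.797  # t_{0.005, 24}, read from a t-table as in the lecture

lower, upper = two_sided_interval(x_bar, s, n, t_crit)
print(round(lower, 2), round(upper, 2))

# Standard normal quantile at cumulative probability 0.995, for comparison.
z_crit = NormalDist().inv_cdf(0.995)
print(round(z_crit, 3), "is smaller than", t_crit)
```

Because the t critical value exceeds the normal one, the same s and n give a wider interval
under the t-distribution.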
Now, what we have seen so far is called the two-sided confidence limit of the mean. Whenever we
talked about the confidence interval, it was symmetrical with respect to the central value. The
central area is what we declared as our confidence, and it is equated to the desired confidence
level.
In many cases, we may not be interested in both sides of the confidence interval. There are many
examples, which we will discuss now, where we are interested in only one side. Suppose I want to
know the upper limit at a given confidence level, and I am not interested in the lower side.
Then only the upper side matters, and this is called the one-sided confidence limit.
So, what we have to find is the point such that, using the same confidence level, the area from
minus infinity up to that point is my confidence zone, which is 0.99. In the earlier case the
remaining area was divided between both sides, so it was 0.005 here and 0.005 here. In this
case, when it is one-sided, the whole remaining area of 0.01 lies on one side when we are
interested in the upper limit. If you are interested in the lower limit, the same thing can be
plotted from the other side: the area from that point to plus infinity is 0.99, and the
remaining area below it is 0.01. So, this is the lower limit and this is the upper limit.
This is common in almost all application fields, including civil engineering. In many real-life
civil engineering problems, only one of the confidence limits is of concern. Take, for example,
the upper limit of the mean wind velocity encountered at the top of a building. I may not be
interested in the lower limit of the wind velocity; I want to know the upper limit of the mean
wind velocity, because I need the maximum load that the building can face. Another example is
the upper limit of the mean traffic volume capacity of a highway. If the mean traffic volume is
on the lower side, that is obviously not of interest; I want to know the maximum mean traffic
volume that can be expected on the highway. Next is the lower limit of the mean stress that can
cause failure in a steel specimen. Here failure is my concern, so I want to know the lowest mean
stress at which the specimen can fail. We are interested only in the lower limit; the upper
limit is of no interest to us. Another is the lower limit of the mean dissolved oxygen, DO, in a
stream for sustaining aquatic life. Here I am interested in whether the minimum requirement for
aquatic life is maintained in the stream or not, so the upper limit of DO is not of interest.
So, I need only one side of the confidence interval. These are some examples where we are
interested in either the upper limit or the lower limit.
So, the only change in the concept is this: if you are looking for the upper limit, then the
area from minus infinity to that point is the confidence level, and the remaining part, 1 minus
the confidence level, is 0.01; similarly for the lower limit. This is in contrast to the earlier
case, where the confidence interval is symmetrical with respect to the central value and the
remaining probability is equally divided between the upper side and the lower side.
We will now see how to get the confidence limits in such cases. Let 1 minus alpha be the
specified confidence level and let the standard deviation of the population be sigma. The random
variable x bar minus mu, divided by sigma by square root n, follows the standard normal
distribution. Hence, the 1 minus alpha lower confidence limit of the mean mu is x bar minus z
alpha sigma by square root n. Here the standard deviation of the population, sigma, is known, so
the quantity follows the standard normal distribution, and instead of z alpha by 2 we are using
z alpha. If you are interested in the upper limit, it is x bar plus z alpha sigma by square root
n, as we have just shown through the pictorial representation, where z alpha is obtained from
the standard normal tables.
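A minimal Python sketch of these one-sided limits with known sigma is given below. The input
numbers are assumed purely for illustration and are not from the lecture, and the function name
`one_sided_limits_known_sigma` is hypothetical. The quantile z alpha is taken at cumulative
probability 1 minus alpha using the standard library.

```python
import math
from statistics import NormalDist

def one_sided_limits_known_sigma(x_bar, sigma, n, confidence):
    """Return (lower_limit, upper_limit); each is a separate one-sided limit.

    z_alpha is the standard normal value at cumulative probability
    `confidence` (= 1 - alpha), not at 1 - alpha/2.
    """
    z_alpha = NormalDist().inv_cdf(confidence)
    margin = z_alpha * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Illustrative inputs (assumed, not taken from the lecture).
lower_limit, upper_limit = one_sided_limits_known_sigma(24.0, 4.0, 36, 0.99)
print(round(lower_limit, 3), round(upper_limit, 3))
```

Only one of the two returned values would be reported in practice, depending on whether the
lower or the upper limit is of concern.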
If the variance is unknown, which is basically the small-sample case, let 1 minus alpha be the
specified confidence level, let the sample size be n, and let the sample standard deviation be
s. Then x bar minus mu, divided by s by square root n, where s is the sample estimate, follows
the t-distribution with n minus 1 degrees of freedom. Hence, the 1 minus alpha lower confidence
limit of the mean mu is x bar minus t alpha, n minus 1, times s by square root n. Here, instead
of alpha by 2 as in the earlier case, we use t alpha with n minus 1 degrees of freedom.
Similarly, the upper confidence limit is x bar plus t alpha, n minus 1, times s by square root
n. So, the same t alpha, n minus 1, is used in both of these expressions.
This is the value you can see here for the t-distribution with degrees of freedom, df, equal to
n minus 1. The value we are looking for is t alpha, n minus 1, and the area up to it is 1 minus
alpha, for example 0.99. This quantity, multiplied by s by square root n, is deducted from the
mean for the lower limit and added to the mean for the upper limit.
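For the unknown-variance case, the same idea can be sketched with a t-table value. As an
assumption for illustration, the concrete cube numbers (x bar 24, s 4, n 25) are reused at the
99 percent level, and 2.492 is the standard t-table value for alpha 0.01 with 24 degrees of
freedom; the function name `one_sided_t_limits` is hypothetical.

```python
import math

def one_sided_t_limits(x_bar, s, n, t_alpha):
    """Return (lower_limit, upper_limit) = x_bar -/+ t_alpha * s / sqrt(n).

    t_alpha must be the t value for tail area alpha (not alpha/2)
    with n - 1 degrees of freedom.
    """
    margin = t_alpha * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

T_ALPHA_24 = 2.492  # t_{0.01, 24} from a standard t-table
lower_limit, upper_limit = one_sided_t_limits(24.0, 4.0, 25, T_ALPHA_24)
print(round(lower_limit, 2), round(upper_limit, 2))
```

Since 2.492 is smaller than the two-sided value 2.797, each one-sided limit lies closer to the
sample mean than the corresponding end of the two-sided interval.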
So, I hope this is clear: for the upper limit or the lower limit there is just a small change,
and the respective values should be picked from the standard tables. Then we can determine the
upper limit or the lower limit, whichever is desired.
Next we move to the confidence interval of the variance. We have discussed the mean; the
variance is also obtained from the sample, so it too should follow some distribution. We should
know what distribution it follows and what the confidence limits for the variance are. The
background assumption is a normal population: for a normal population, if the sample size n is
small, the exact confidence limit of the population variance sigma square can be determined as
follows. The sample variance s square is estimated, as we discussed in the last class, as s
square equals 1 by n minus 1 times the summation, i equals 1 to n, of xi minus x bar whole
square. Here x bar is the sample mean and there are n sample values x1, x2, x3, up to xn. This
is how we get the unbiased estimate of the variance from the sample. If we multiply by n minus
1, then after some algebraic manipulation we can see that n minus 1 times s square is equal to
the summation, i equals 1 to n, of the difference of two quantities, xi minus mu and x bar minus
mu, whole square.
After some steps we can express n minus 1 times s square divided by sigma square as the
summation, i equals 1 to n, of xi minus mu by sigma whole square, minus x bar minus mu divided
by sigma by square root n, whole square. As xi is normal, x bar is also normal. We discussed in
the last class that x bar is also a random variable, which follows a normal distribution with
mean mu and standard deviation sigma by square root n.
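The decomposition stated above is a purely algebraic identity, so it can be checked
numerically for any data. The following Python sketch does this; the sample values, mu and
sigma are arbitrary assumed numbers, and the function name `decomposition_sides` is
hypothetical.

```python
import math

def decomposition_sides(values, mu, sigma):
    """Return (left, right) of the identity

    (n - 1) s^2 / sigma^2
        = sum(((x_i - mu) / sigma)^2) - ((x_bar - mu) / (sigma / sqrt(n)))^2
    """
    n = len(values)
    x_bar = sum(values) / n
    s_squared = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    left = (n - 1) * s_squared / sigma ** 2
    first_term = sum(((x - mu) / sigma) ** 2 for x in values)
    second_term = ((x_bar - mu) / (sigma / math.sqrt(n))) ** 2
    return left, first_term - second_term

# Arbitrary illustrative numbers (assumed).
left, right = decomposition_sides([4.1, 5.3, 6.2, 3.8, 5.0], mu=5.0, sigma=1.5)
print(left, right)
```

The two printed numbers agree up to floating-point rounding, whatever values of mu and sigma
are supplied.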
Even if xi is not normal, the distribution of x bar is still approximately normal. But in the
case of the variance, the assumption that the background distribution of the population is
normal is more crucial. So, here xi is normal and x bar is also normal. Now look at the
right-hand side of this expression; there are two components. The first term is the sum of the
squares of n independent standard normal variates. Since xi is normally distributed with mean mu
and standard deviation sigma, xi minus mu by sigma is a standard normal variate. This is squared
and summed over n such standard normal variates. We have seen in the earlier module on functions
of random variables that if we square n independent standard normal variates and sum them, the
resulting distribution is a chi-square distribution with n degrees of freedom. The second term
is again the square of a standard normal variate, and it follows a chi-square distribution with
one degree of freedom.
Now, the sum of two independent chi-square variates is another chi-square variate whose degrees
of freedom are the sum of the two. Here the first term, with n degrees of freedom, is the sum of
the left-hand side and the second term, which has one degree of freedom. So, what we can say is
that the full quantity on the left-hand side is a chi-square variate with n minus 1 degrees of
freedom. That is why n minus 1 times s square by sigma square, where s square is the sample
estimate of the variance and sigma square is the population variance, follows a chi-square
distribution with n minus 1 degrees of freedom.
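A quick simulation can make this result plausible. The Python sketch below is only an
illustration under assumed settings (n of 5, mu of 10, sigma of 2, a fixed seed, and the
hypothetical function name `simulate_scaled_variance`). It relies on the known facts that a
chi-square variate with k degrees of freedom has mean k and variance 2k, so the simulated
values of n minus 1 times s square by sigma square should average close to n minus 1.

```python
import random
import statistics

def simulate_scaled_variance(n, mu, sigma, trials, seed=0):
    """Simulate (n - 1) * s^2 / sigma^2 for normal samples of size n."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s_squared = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
        results.append((n - 1) * s_squared / sigma ** 2)
    return results

values = simulate_scaled_variance(n=5, mu=10.0, sigma=2.0, trials=20000)
sim_mean = statistics.fmean(values)
sim_var = statistics.pvariance(values)
print(sim_mean)  # expected to be near n - 1 = 4
print(sim_var)   # expected to be near 2 * (n - 1) = 8
```

Matching the first two moments does not prove the distribution is chi-square, but it is
consistent with n minus 1 degrees of freedom rather than n.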
This is what is written here: n minus 1 times s square by sigma square follows a chi-square
distribution with n minus 1 degrees of freedom. Once we know the distribution, the confidence
limits can be obtained from the chi-square distribution. Hence, the upper confidence limit of
the population variance sigma square is obtained from the statement that the probability of n
minus 1 times s square by sigma square being greater than or equal to c alpha, n minus 1, is
equal to 1 minus alpha. Here 1 minus alpha is the confidence level we are looking for, and c
alpha, n minus 1, is the value of the chi-square variate with n minus 1 degrees of freedom at
cumulative probability alpha.
These values can be obtained from the standard chi-square distribution tables, which are
available in most textbooks. Thus, the 1 minus alpha upper confidence limit of the population
variance sigma square of a normal population is n minus 1 times s square divided by c alpha, n
minus 1. Once we get this value from the chi-square table, and with s square from the sample
estimate, we can get the upper confidence limit at the 1 minus alpha confidence level.
We will take one example. The daily dissolved oxygen, DO, concentration at a particular location
on a stream has been recorded for 20 days. The sample variance is found to be s square equal to
4.5 (milligram per liter) squared. What is the 95 percent upper confidence limit of the
population variance sigma square? So, from the sample we have estimated the variance as 4.5
(milligram per liter) squared, and the sample size is 20. Remember one thing before I go to the
solution. When we discussed the mean, we said that a sample size of more than 30 can be treated
as a large sample. When we are talking about the variance, even 30 is not sufficient to declare
a large sample. So, even when the sample size is more than 30, the degrees of freedom are always
associated with it. In any case, we should take the chi-square distribution with the respective
degrees of freedom.
Here the sample size n is equal to 20, as we have seen in this example problem, and the sample
variance s square is 4.5 (milligram per liter) squared. So, n minus 1 times s square by sigma
square has a chi-square distribution with n minus 1 equal to 19 degrees of freedom. We have to
refer to the chi-square table for these degrees of freedom. Generally, the tables are provided
for different degrees of freedom, starting from 1 up to at least 50 or so. We have to refer to
the entry for degrees of freedom equal to 19. The cumulative probability level to use depends on
the confidence level that you are looking for. Here we are looking for 95 percent confidence, so
alpha is equal to 0.05, and with n equal to 20 the degrees of freedom are 19. So, c alpha, n
minus 1, is c 0.05, 19, and from the table its value is 10.1.
If we use this value, then n minus 1 times s square divided by c alpha, n minus 1, is equal to
19 multiplied by 4.5 divided by 10.1, which is about 8.47. So, the 95 percent upper confidence
limit of the population variance is about 8.47 (milligram per liter) squared.
If we are now interested in the lower confidence limit at the 95 percent confidence level, then
this value will change. You have to find the chi-square value for the same degrees of freedom at
cumulative probability equal to 0.95. This will obviously be higher than the 10.1 that we saw
for the 0.05 cumulative probability level, so the lower limit will be lower than the sample
estimate of the variance.
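The arithmetic for both one-sided limits of the variance can be sketched as below. The value
10.1 is the table value quoted in the lecture; 30.14 is the standard chi-square table value at
cumulative probability 0.95 with 19 degrees of freedom, which is brought in here as an
assumption since the lecture does not quote it. The function name `variance_confidence_limit`
is hypothetical.

```python
def variance_confidence_limit(n, s_squared, chi_square_value):
    """Return (n - 1) * s^2 / chi_square_value."""
    return (n - 1) * s_squared / chi_square_value

n, s_squared = 20, 4.5  # DO example: 20 days, sample variance 4.5

C_005_19 = 10.1   # chi-square value, cumulative prob. 0.05, 19 d.o.f. (lecture)
C_095_19 = 30.14  # chi-square value, cumulative prob. 0.95, 19 d.o.f. (table)

upper_limit = variance_confidence_limit(n, s_squared, C_005_19)
lower_limit = variance_confidence_limit(n, s_squared, C_095_19)
print(upper_limit, lower_limit)
```

Dividing by the smaller chi-square value gives the upper limit, and dividing by the larger one
gives the lower limit, so the sample variance 4.5 lies between the two.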
The next parameter, which is also very useful in many distributions, is the proportion.
Estimation of the proportion is required in situations where a probability is estimated as the
proportion of occurrences in a Bernoulli sequence. We have seen that in the Bernoulli
distribution the parameter is the probability of success, denoted by p, so there we need to know
what the proportion is. One example is the proportion of the products of a manufacturing unit
that meet a specified quality standard; I can say that this percentage of the products pass the
quality test. A second example is the proportion of traffic taking the left turn at a particular
intersection. You can count the total number of vehicles and, out of those, how many take the
left turn. In that way we can estimate the proportion taking the left turn at that intersection,
which is needed to design the traffic at that junction.
Now, let us consider a sequence of n Bernoulli trials X1, X2, up to Xn, where the result of
every trial can be either success or failure, that is 1 or 0 respectively. Here the probability
p of occurrence of the event in such a Bernoulli trial is the parameter of the binomial
distribution.
Suppose we have to estimate this p. Each xi can take either the value 1 or 0: whichever outcome
is the success we denote as 1, and the failure we denote as 0. As I have mentioned many times
earlier, success and failure are arbitrarily selected; in tossing a coin, head may be the
success and tail the failure. So, when we are calculating the proportion, we are working with
nothing but a sequence of 1s and 0s, that is, binary numbers.
The arithmetic mean of those numbers gives the proportion. That is the estimate here: p cap is
the summation of all the xi divided by n. Now, the expectation of this estimate is the
expectation of 1 by n times the summation of the xi, which we can write as 1 by n times the
summation of the expectation of each outcome, each of which is again a random variable. We know
that in Bernoulli trials every outcome is independent and has the same probability of success,
so the expectation of xi is equal to p. There are n such expectations, each equal to p, so their
sum is np, and the expectation of p cap is 1 by n multiplied by np. So, the expectation of this
estimate is equal to p.
For the variance, we have 1 by n square times the summation of the variance of xi. Since xi
takes only the values 1 and 0, the expectation of xi square is also p, so the variance of xi is
p minus p square, that is p into 1 minus p. So, this is 1 by n square multiplied by n times p
into 1 minus p, which gives p into 1 minus p divided by n as the variance of the estimate of the
proportion. Thus, the variance of the estimator decreases with increasing sample size n. Note
that this is the variance of p cap, the estimated proportion, and p cap is centered about the
population proportion p.
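The two results above, expectation p and variance p into 1 minus p by n, can be verified
exactly for a small case by enumerating the binomial probabilities. The sketch below assumes
illustrative values n of 10 and p of 0.3, and the function name `moments_of_p_hat` is
hypothetical.

```python
from math import comb

def moments_of_p_hat(n, p):
    """Exact mean and variance of p_hat = (number of successes) / n.

    The number of successes in n independent Bernoulli trials is
    binomial, so we sum over k = 0..n with the binomial probabilities.
    """
    probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * pr for k, pr in enumerate(probs))
    var = sum((k / n - mean) ** 2 * pr for k, pr in enumerate(probs))
    return mean, var

# Illustrative values (assumed).
n, p = 10, 0.3
mean, var = moments_of_p_hat(n, p)
print(mean, p)                # both close to 0.3
print(var, p * (1 - p) / n)   # both close to 0.021
```

Increasing n in this sketch leaves the mean unchanged and shrinks the variance in proportion
to 1 by n.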
Now, when n is large, p cap follows the Gaussian distribution with mean p and variance p cap
into 1 minus p cap by n, where p cap is the observed proportion from the sample. Once we know
that it has a normal distribution, p cap minus its mean p, divided by its standard deviation,
which is the square root of p cap into 1 minus p cap divided by n, follows the standard normal
distribution.
This standardized quantity lies between minus z alpha by 2 and z alpha by 2 with confidence
level 1 minus alpha. We discussed this in the last lecture also: for the standard normal
distribution, z alpha by 2 is the quantile at which the cumulative probability is 1 minus alpha
by 2. Thus, to get the confidence interval of p, we multiply the standard deviation by this
value and subtract it from and add it to the estimate. So, the lower limit is p cap minus z
alpha by 2 times the square root of p cap into 1 minus p cap by n, and the upper limit is p cap
plus z alpha by 2 times the square root of p cap into 1 minus p cap by n.
Now, take one example of the proportion and its confidence interval. During the inspection of
the quality of soil compaction in a highway project, 45 out of 60 specimens that were inspected
could pass the CBR requirement. Here again this is a Bernoulli process, where a particular
specimen may or may not pass the CBR test, and out of 60 specimens 45 have passed. The first
question is: what is the proportion p of the embankment that will be well compacted, that is,
will pass the CBR test? This is a point estimation, and as you have seen, the point estimate is
just the ratio of the number of successes to the total number of specimens. The second question
is: what is the 95 percent confidence interval of p? For that we will see how to use the
standard normal distribution.
Now, the point estimate of the proportion p of the embankment that will be well compacted, p
cap, is, as I said, 45 divided by 60, which is equal to 0.75. This is straightforward. The 95
percent confidence interval of p runs from this estimate minus z alpha by 2 times its standard
deviation to p cap plus z alpha by 2 times its standard deviation. For the 95 percent confidence
level, z alpha by 2, as we have seen in the last lecture, is 1.96. So, the limits are 0.75 minus
1.96 times the square root of 0.75 into 0.25 by 60, and 0.75 plus 1.96 times the same square
root, which is the standard deviation of the estimate. If we do this, the lower limit becomes
0.64 and the upper limit becomes 0.86. So, the 95 percent confidence interval for the proportion
is 0.64 to 0.86. You can also see that it is symmetrical with respect to 0.75. If we increase
the confidence level, say from 95 percent to 99 percent, then the interval will become wider:
the lower limit will become even lower and the upper limit even higher.
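The CBR example can be reproduced with a short Python sketch. This is a minimal illustration,
assuming the large-sample normal approximation described in the lecture; the function name
`proportion_confidence_interval` is hypothetical, and the quantile is computed with the
standard library rather than read from a table.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Return (p_hat, lower, upper) using the normal approximation.

    The limits are p_hat -/+ z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = successes / n
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    std_dev = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, p_hat - z * std_dev, p_hat + z * std_dev

# CBR example from the lecture: 45 of 60 specimens passed.
p_hat, lower, upper = proportion_confidence_interval(45, 60, 0.95)
print(p_hat, round(lower, 2), round(upper, 2))

# A higher confidence level gives a wider interval.
_, lower99, upper99 = proportion_confidence_interval(45, 60, 0.99)
print(round(lower99, 2), round(upper99, 2))
```

Both intervals are centered on the point estimate; only the multiplier z alpha by 2 changes
with the confidence level.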
So far we have discussed point estimation, and in this lecture we have discussed interval
estimation of the parameters. All of this estimation is done with respect to one data set that
is available to us. Sometimes we may be interested to know whether the mean of the population
can take a certain value or not. Or suppose that two data sets are available to us; I may ask
whether the two samples come from the same population or not, that is, whether the population
parameters that we are getting from those samples are the same or not.
To answer this type of question, what we need is known as hypothesis testing, which is what we
will discuss now. In real-life decision making it is often necessary to decide whether a
statement concerning a parameter of a probability distribution is true or false. A hypothesis
test is used to check the validity of a possibility or guess about the population, which is what
is known as the hypothesis, and the necessary decision can be taken depending on the test result.
For example, take one real-life civil engineering scenario. Suppose I am estimating the strength
of concrete, and for a specific requirement I need the strength to be greater than some known
threshold value. Now, how do we test that? We take a sample and get some sample mean. This
sample mean may or may not cross that threshold value, and it will obviously not match our
expectation exactly; as soon as we change the sample, it will also change. So, we have to test
whether the difference that we are getting is by chance or whether there is some problem in the
construction itself. What we are trying to test is whether the strength can be assumed to exceed
that threshold value. That possibility, that guess, is known as the hypothesis here. We have to
test that hypothesis and, depending on the test result, take the necessary decision. These
decisions are based on statistics, and this is known as the statistical inference that we can
draw from the sample data that we have.
When a probabilistic model is developed to describe some process, say, for example,
regression, which we will take up after a few lectures and which
is one of the very important models, we want to know whether that
model exactly matches the data or not. It may be found that the observed data set
matches only partly with the model that we have developed.
In such cases, the deviations of the observations
from the model may be due to two causes: one is actual inadequacy of the model, and the other
is chance variation. These two are important. If we declare actual inadequacy of the model,
then we have to seriously think about changing the model,
so that we can better explain the process.
On the other hand, even when the model is correct, it will never match the observed
data set exactly. That variation is called the chance variation. So the question is whether
the data are varying by chance or there are some real inadequacies in the model. For that case,
hypothesis testing is used as a statistical test to determine whether the devised model
is adequate or not.
There are two types of errors in hypothesis testing. These are known as the type one error
and the type two error. Suppose the hypothesis that we are
first assuming is actually correct and, through the hypothesis test, we
reject it; that is, a true hypothesis gets rejected. Then we are committing one type of error,
known as the type one error. The probability of the type one error is denoted by alpha.
Similarly, when the hypothesis is false but we fail to reject it, we are committing
another error, known as the type two error. The probability of the type two error is
denoted by beta. If I go through this table: in the true situation, the hypothesis
can be true or the hypothesis can be false, and the decision taken after the hypothesis
test can be that we fail to reject the hypothesis or that we reject it.
If we fail to reject the hypothesis when the hypothesis is true, then obviously
there is no error; the right decision has been taken. If we fail to reject the hypothesis
and the hypothesis is false, then that is the type two error, as we have explained.
On the other hand, if we reject the hypothesis and the hypothesis is true, then this is
the type one error. If we reject the hypothesis and the hypothesis is false, then
again no error has occurred. The probability of the type one error is denoted
by alpha and the probability of the type two error is denoted by beta.
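The two error types can be illustrated with a small simulation. This is only a sketch under stated assumptions, not something taken from the lecture slides: it assumes a right-sided z-test with a known population standard deviation, and the numbers (threshold 25, standard deviation 2, sample size 25, true mean 26 for the false-hypothesis case) as well as the function names are hypothetical.

```python
import random
from statistics import NormalDist, mean

def reject_h0(sample, mu0, sigma, alpha):
    """Right-sided z-test with known sigma (an assumption of this sketch).

    Rejects the hypothesis 'mu <= mu0' when the standardized sample mean
    exceeds the critical value for the chosen alpha.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return z > z_crit

def rejection_rate(true_mu, mu0, sigma, n, alpha, trials, rng):
    """Fraction of simulated samples for which the hypothesis is rejected."""
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if reject_h0(sample, mu0, sigma, alpha):
            rejections += 1
    return rejections / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
mu0, sigma, n, alpha = 25.0, 2.0, 25, 0.05  # hypothetical values

# Hypothesis true (true mean equals mu0): every rejection is a type one error.
type1_rate = rejection_rate(mu0, mu0, sigma, n, alpha, 20000, rng)

# Hypothesis false (true mean is 26): every failure to reject is a type two error.
type2_rate = 1 - rejection_rate(26.0, mu0, sigma, n, alpha, 20000, rng)

print(f"estimated type one error rate: {type1_rate:.3f}")
print(f"estimated type two error rate: {type2_rate:.3f}")
```

The estimated type one error rate should come out close to the chosen alpha, while the type two error rate depends on how far the true mean is from the threshold.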
Now, you can see that the type one error occurs when the hypothesis is true and we reject the
hypothesis. From the manufacturer's point of view, this type of error is very critical, and we
generally allow a very low value for the probability of the type one error. Having fixed that,
we try to minimize the probability of the type two error as much as we can.
In hypothesis testing there are the following systematic steps that we can follow.
The first step is the formulation of a null hypothesis and an appropriate alternative
hypothesis, which is accepted if the null hypothesis has to be rejected. The definitions,
what a null hypothesis is and what an alternative hypothesis is,
we will see in a minute. Only one point we should note here: for whatever we put
in the null hypothesis, we can either reject that null hypothesis or
say that, based on the data available, we fail to reject the null hypothesis.
Generally, it is wrong to state that the null hypothesis is accepted. As we cannot
declare that the null hypothesis is accepted, the most we can say is that we fail to reject
it. On the other hand, you can say that the null hypothesis is rejected, and when
the null hypothesis is rejected, we can accept the alternative hypothesis.
So, in a hypothesis test, whatever we want to test is generally put in the
alternative hypothesis. At the end of the test, if the null hypothesis
is rejected, we can say that the alternative, which is basically what we are trying to test,
can be accepted.
Specifying the level of significance is the second
step. Once we have clearly defined what my null hypothesis is and what my alternative
hypothesis is, I have to define a level of significance. That is the probability alpha of the
type one error. As we said just now, our goal is to keep the type one error as small
as we can, so generally alpha is assigned a value of 5 percent or
1 percent, that is, alpha equal to 0.05 or alpha equal to 0.01. These are the typical values
of alpha that we can fix. Of course, in some cases, the probability beta
of the type two error may also be specified. But in the general case,
the level of significance is predetermined before I go for the hypothesis test.
The third step is the construction of a criterion, based on the sampling distribution of a
suitable statistic, for testing the null hypothesis. We have to develop that suitable statistic
and we should know its probabilistic nature. In the earlier discussion of
parameter estimation also, we developed some statistics and saw what
the sampling distributions of different statistics are. Similarly, here also we have to
construct a suitable statistic for the test and, depending on its probability distribution,
find out the critical value for it.
The fourth step is the calculation of the value of this statistic from the data: whatever
statistic you have decided on for the hypothesis test in hand, its value should be
obtained from the observed data on which we are basing the test.
After that, the final step is the decision making: rejecting the null hypothesis,
or saying that we fail to reject the null hypothesis. Now, as we are using
the two terms, null hypothesis and alternative hypothesis: the null hypothesis
is any hypothesis that is tested to see if it can be rejected; it is denoted by H naught.
The alternative hypothesis is the one that is accepted if the null hypothesis has to be
rejected. It is denoted by Ha. The alternative hypothesis may be one sided or two sided. We will
see what is one sided and what is two sided here. As we have just discussed, in case
we can reject the null hypothesis, we accept the alternative hypothesis. We never
say that the null hypothesis is accepted, and that is why whatever our goal is, we generally
put it in the alternative hypothesis.
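The five steps above can be sketched in code. This is a minimal illustration under assumptions that are not from the lecture: the statistic is a z-statistic with a known population standard deviation, the alternative is right sided, and the readings, the threshold of 25 and the standard deviation of 1.5 are made-up values.

```python
from statistics import NormalDist, mean

def right_sided_z_test(sample, mu0, sigma, alpha):
    """Walk through the five steps for H0: mu <= mu0 versus Ha: mu > mu0.

    Assumes a known population standard deviation sigma, so the test
    statistic is standard normal when mu equals mu0.
    """
    # Step 1: the hypotheses are fixed by the caller's choice of mu0
    #         (H0: mu <= mu0, Ha: mu > mu0).
    # Step 2: the level of significance alpha is chosen before looking at data.
    # Step 3: criterion from the sampling distribution of the statistic:
    #         reject H0 when Z exceeds the upper-alpha critical value.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Step 4: compute the value of the statistic from the data.
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    # Step 5: decision. Note the wording: never "accept H0".
    decision = "reject H0" if z > z_crit else "fail to reject H0"
    return z, z_crit, decision

# Hypothetical strength readings, for illustration only.
readings = [26.1, 25.4, 27.0, 24.8, 26.5, 25.9, 26.3, 25.2, 26.8]
z, z_crit, decision = right_sided_z_test(readings, mu0=25.0, sigma=1.5, alpha=0.05)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}: {decision}")
```

With the same data, tightening alpha to 0.01 raises the critical value, so the same observed statistic may no longer fall in the rejection zone.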
Now, for an example of the one sided alternative, let us consider
that a mix of concrete is acceptable for a certain construction only if the mean strength
is greater than 25 kilonewtons per square meter. So, this is the threshold value, and
only if the mean strength is greater than this is the mix accepted.
Whether, based on the sample data, we should accept that yes, the strength
is greater than 25 kilonewtons per square meter, that is what we will see. That is why in
the alternative hypothesis we put that mu is greater than 25.
In the null hypothesis, the remaining part,
that is, mu less than or equal to 25, is put.
Remember that here it is written that the mix is acceptable only if the mean strength is
greater than 25 kilonewtons per square meter. If it were written that it is greater than or
equal to 25 kilonewtons per square meter, then in this one sided alternative hypothesis we
would put the equality sign, that is, mu greater than or equal to 25, and in the null hypothesis
we would write mu less than 25. So, depending on the statement, what we are trying to test will
be written exactly in the alternative hypothesis.
Similarly, for an example of a two sided alternative, let us consider that in a cement plant
the average amount of cement in one bag should be 50 kg. Now, we want to check that
the average amount is not much greater than 50 kg, because if it is more than 50 kg, this will
be a loss for the manufacturer. It should not be much less than 50 kg as well,
because this will dissatisfy the customers. So, what we are trying to test is whether
the amount per bag is 50 kg or not. In the null
hypothesis, it is written that mu is equal to 50, and the two sided alternative hypothesis that
we write is that mu is not equal to 50. In this case, if we fail to reject the null
hypothesis, then the job is done, in the sense that the manufacturer is happy: from whatever
sample we have taken, mu equal to 50 kg, that is, 50 kg per bag, cannot be rejected.
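For the cement bag example, a two sided check might look like the following. It is only a sketch: it assumes a z-test with a known standard deviation, and the sample means, the sample size of 16 and the standard deviation of 0.4 kg are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def two_sided_z_test(sample_mean, n, mu0, sigma, alpha):
    """Test H0: mu = mu0 against Ha: mu != mu0 with known sigma (assumed).

    Returns the statistic, the two sided critical value and whether H0 is
    rejected. Deviations in either direction count against H0, so alpha is
    split equally between the two tails.
    """
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit

# Hypothetical sample means from 16 bags each; sigma = 0.4 kg is assumed.
for xbar in (50.1, 49.7):
    z, z_crit, reject = two_sided_z_test(xbar, n=16, mu0=50.0, sigma=0.4, alpha=0.05)
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"sample mean {xbar} kg: z = {z:.2f}, critical = {z_crit:.2f}, {verdict}")
```

Both an overfilled and an underfilled sample can lead to rejection here, which is exactly what makes the alternative two sided.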
Now, the level of significance. Hypothesis testing is always associated with a level of
significance. It is equal to the probability alpha of the type one error, as we have discussed
earlier. The value of alpha is generally fixed at 0.05 or 0.01, but may vary depending on the
consequence of committing the type one error. If the chosen value of alpha is too small, then
the probability of the type two error generally increases.
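The trade-off between alpha and beta can be seen numerically. The sketch below assumes a right-sided z-test with known standard deviation and one specific true mean under the alternative; all numbers and the function name are hypothetical, chosen only to show the direction of the effect.

```python
from statistics import NormalDist

def type_two_error_probability(alpha, mu0, true_mu, sigma, n):
    """Beta for a right-sided z-test of H0: mu <= mu0 when the mean is true_mu.

    Assumes known sigma. We fail to reject when Z <= z_crit; under the true
    mean, Z is normal with mean (true_mu - mu0) / (sigma / sqrt(n)) and unit
    variance, so beta is a standard normal CDF evaluated at the shifted cutoff.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = (true_mu - mu0) / (sigma / n ** 0.5)
    return std_normal.cdf(z_crit - shift)

alphas = [0.10, 0.05, 0.01, 0.001]
betas = [
    type_two_error_probability(a, mu0=25.0, true_mu=26.0, sigma=2.0, n=25)
    for a in alphas
]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:<5} -> beta = {b:.3f}")
```

For a fixed sample size, each reduction in alpha pushes the critical value further out, so beta grows; this is why alpha is not simply set as small as possible.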
The P-value for a given test statistic and null hypothesis is the probability of obtaining
a test statistic value that is equally extreme or more extreme than the observed value,
assuming the null hypothesis is true.
For a right sided alternative hypothesis Ha, say mu is greater than some threshold
value mu naught, only the values greater than the observed value of the test statistic
are considered more extreme. For a left sided alternative hypothesis, when mu is less than
some threshold value, only the values less than the observed value of the test statistic
are considered more extreme. For a two sided alternative hypothesis, the values on both
the tails are considered more extreme.
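The three cases can be written as one small function. This is a sketch that assumes the test statistic follows a standard normal distribution under the null hypothesis; for other statistics the same logic applies with a different distribution. The function name and the labels for the alternatives are made up for this illustration.

```python
from statistics import NormalDist

def p_value(z_observed, alternative):
    """P-value for an observed z-statistic (standard normal under H0, assumed).

    alternative is one of:
      "greater"   -> right sided: only larger values are more extreme
      "less"      -> left sided: only smaller values are more extreme
      "two-sided" -> values in both tails are more extreme
    """
    phi = NormalDist().cdf
    if alternative == "greater":
        return 1 - phi(z_observed)
    if alternative == "less":
        return phi(z_observed)
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z_observed)))
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("greater", "less", "two-sided"):
    print(f"z = 2.0, {alt}: p = {p_value(2.0, alt):.4f}")
```

Comparing the P-value with alpha gives the same decision as comparing the statistic with the critical value: the null hypothesis is rejected when the P-value is smaller than alpha.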
So, basically, if you look at this P-value: suppose this is the distribution of the statistic
that we are considering. If the test statistic comes to this
value, then this tail area is the level of significance of the test, which we generally fix
at 0.05 or 0.01, as the case may be. Now, suppose that the observed statistic falls in this zone;
then, for the one sided test, the P-value is whatever area remains beyond it, this
green highlighted area. This area is known as the P-value. This is for the one sided test
on the upper side; similarly, it can happen on the lower side also. If it is a two sided
test, then we have to take the symmetrical value on the other side, and that area should also
be included in the calculation of the P-value. Once we set the significance level,
we get the corresponding zone. If the test statistic falls in this zone, then we should reject
the null hypothesis. If it does not fall in this critical region, the blue highlighted region
being the critical region for the chosen significance level, then we cannot
reject the null hypothesis. We will continue
this discussion of hypothesis testing in the next lecture. We
will also take up some problems and various hypotheses concerning two means, one variance
or two variances. These types of problems we will take up in the next lecture.