Good morning and welcome to the fifth lecture in the course Stochastic Hydrology.
If you recall, in the last lecture we introduced the moments of a distribution. First we took
the moments about the origin, and then we started taking moments about the expected
value or the mean value. Then we introduced measures of central tendency, specifically
the mean, the mode and the median; then measures of dispersion, namely the standard deviation,
the variance and the coefficient of variation C v; measures of symmetry, where we introduced
the coefficient of skewness; and the measure of peakedness, the kurtosis. Then we also
went on to discuss the normal distribution; I just introduced the normal distribution.
So, today we will progress further on the normal distribution. To make sure
that you understand one distribution correctly, we will solve several numerical examples related
to the normal distribution before going on to other distributions such as the log normal, the gamma
distribution, etcetera.
So, in the last class we mentioned that if X has a normal distribution, a linear function
Y is equal to a plus b X also has a normal distribution, and its parameters are as follows:
Y will have a mean of a plus b mu, and it will have a variance
of b square sigma square. We use this result to look at the so-called standard normal distribution.
Z is equal to X minus mu over sigma; as you can see, this is a linear transformation
on X, with a equal to minus mu over sigma and b equal to 1 over sigma. Therefore Z will also
have a normal distribution, and we can see that Z has a
normal distribution with mean 0 and unit variance. This result becomes extremely handy in dealing
with the normal distribution. Because Z has a variance of 1 and a mean of 0, the pdf of Z
turns out to be f of z is equal to 1 over root 2 pi, e to the power minus z square by 2.
The cdf of Z, which is the probability of Z being less than or equal to a given value z,
is F of z is equal to 1 over root 2 pi, integral between
minus infinity and z of e to the power minus z square by 2 d z. As mentioned last time, this
expression cannot be integrated analytically
by the usual methods that we have, and therefore we adopt numerical integration
for this.
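The numerical integration mentioned above can be sketched as follows. This is only an
illustration, not the procedure used to build any particular table: the function names and the
choice of Simpson's rule are assumptions, and the error function from the Python standard
library is used purely as an independent check.

```python
import math

def std_normal_pdf(z):
    """pdf of the standard normal: exp(-z^2 / 2) / sqrt(2 pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def area_zero_to_z(z, n=1000):
    """Area under the pdf between 0 and z by Simpson's rule (n even)."""
    if n % 2:
        n += 1
    h = z / n
    total = std_normal_pdf(0.0) + std_normal_pdf(z)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * std_normal_pdf(i * h)
    return total * h / 3.0

def std_normal_cdf(z):
    """P(Z <= z) through the error function, used here only as a check."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area up to z = 1 is 0.5 (left of zero) plus the integral from 0 to 1.
print(0.5 + area_zero_to_z(1.0), std_normal_cdf(1.0))
```

Doubling area_zero_to_z(1.0) gives the area within plus or minus one standard deviation.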
So, f of z is referred to as the standard normal density function. It has a 0 mean,
and the distribution is symmetric about z is equal to 0. We also saw that
minus 1 to plus 1 standard deviation contains about 68 percent
of the area, 2 standard deviations, that is minus 2 to plus 2, contain about 95 percent
of the area, and minus 3 to plus 3 contain about 99 percent of the area, which means
that about 99 percent of the area is contained within a deviation of plus or minus 3 sigma,
because sigma is 1 here. The values of capital F of z, which is the cdf of Z, are tabulated.
So, we use numerical integration and then tabulate these F of z values, so that we can use
the tabulated values when we are talking about the probabilities associated with any value
of z.
So, most of the text books and the tables that are available in the reference books
provide the area to the right of 0, that is, for all positive values of z they provide
the area between 0 and z. So, what does this mean? Let us say we get this area from the table for a
given value of z; you need to add 0.5 to that, so that you get the area between minus infinity
and z. So, whenever you are using normal distribution tables, please ensure that you are looking
at this particular area before following the problems that we are discussing in this lecture.
Some of the books provide the total area, that is the area between minus
infinity and plus z; they include this 0.5 area also, in which case you do not have to add
0.5. What F of z indicates here is the probability of Z being
less than or equal to z. So, you are looking at the total area, and this total
area includes 0.5, which is the area up to z is equal to 0, plus the area as obtained from your
standard tables. So, in this course we will adopt this particular method, where the areas
are provided for positive values of z.
So, we will see an example of how the normal distribution tables look. In fact,
these normal distribution tables can be readily obtained from standard software like
Microsoft Excel and so on. So, I encourage you to experiment with these and generate
these tables yourselves. Essentially, for a given value of z, the table integrates
the pdf of Z between 0 and z, and then provides the associated value
of the area. So, for example, let us say you are looking at z is equal to 0.24.
What is it that we are looking at? We
are looking at the area under the standard normal distribution up to 0.24. From the
table you get the area between 0 and 0.24 as 0.0948. To that you add 0.5,
so you get 0.5 plus 0.0948, which is 0.5948.
Now, that gives you the area up to z is equal to 0.24. So, essentially,
for positive values of z you read the area up to that point and then add 0.5
to that to get the probability of Z being less than or equal to z. It goes on like
this for various values of z; you are enumerating the integral values here and you get the associated
tables. As you can see from here, the tabulated area approaches 0.5. If you add the 0.5
that lies to the left of z is equal to 0, about 99 percent of the area is contained
in about z is equal to minus 3 to plus 3, and when you go up to 4 the total area is almost equal to
1.
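As a small sketch of generating such a table yourself, the lines below use Python's
statistics.NormalDist in place of a spreadsheet. The helper name table_area and the particular z
values printed are illustrative assumptions; the tabulated quantity is the area between 0 and z,
as adopted in this course.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def table_area(z):
    """Area under the standard normal pdf between 0 and z (for z >= 0)."""
    return STD_NORMAL.cdf(z) - 0.5

print("   z   area 0..z   P(Z <= z)")
for z in (0.24, 1.0, 2.0, 3.0, 4.0):
    area = table_area(z)
    print(f"{z:4.2f}   {area:8.4f}   {0.5 + area:9.4f}")
```

The first row corresponds to the z is equal to 0.24 reading discussed above.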
We will now see how we obtain the probability for z taking on a negative value. Let us say you are interested
in getting the probability of Z being less than or equal to minus 0.7. In all the problems dealing
with the normal distribution we must always remember that the standard normal distribution is symmetrically
distributed about z is equal to 0. So, we use that fact, obtain the areas from
the tables and convert them into the probabilities that we are interested in. Let us say we are
interested in the probability that Z is less than or equal to a certain value
on the negative side, minus z. That means you are looking at the area to the left of minus z, and the total
area up to z is equal to 0 is 0.5. What we do is read the area from
the table for the positive value of z. In this case, for minus 0.7, you read the
area corresponding to plus 0.7. What is the area that the table gives? The table gives the
area between 0 and plus z. Because the total area to the right of 0 under the curve is 0.5, the
area to the right of plus z is 0.5 minus A 1, where A 1 is the area that you just read
from the tables, and by symmetry the area to the left of minus z is equal to the area to the right
of plus z. So, 0.5 minus A 1, which is the area to the right of plus z, is the same as
the area to the left of minus z, and this
defines the probability that Z is less than or
equal to a negative value of z. So, for example, take the probability of Z being less than or equal
to minus 0.7. From the table we read for z is equal to plus 0.7,
that is 0.70, and it comes out to be 0.258.
Then we take 0.5 minus A 1, which is 0.5 minus 0.258, which is 0.242. So, the probability
of Z being less than or equal to minus 0.7 is equal to 0.242. So, in all these problems
we must first understand what is the area under the standard normal curve that we are
looking at, and then use the fact that the standard normal distribution is symmetrical
about z is equal to 0, pick up the associated areas from the tables and then convert them
into the associated probabilities.
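The symmetry argument above can be written down as a short sketch. The function names are
hypothetical, and the table lookup is replaced here by statistics.NormalDist from the Python
standard library, which stands in for reading the printed table.

```python
from statistics import NormalDist

def table_area(z):
    """Stand-in for the printed table: area between 0 and z, for z >= 0."""
    return NormalDist().cdf(z) - 0.5

def prob_z_le(z):
    """P(Z <= z) using only table areas for positive z and symmetry."""
    a1 = table_area(abs(z))
    if z >= 0:
        return 0.5 + a1   # area left of zero plus the tabulated area
    return 0.5 - a1       # tail to the right of +|z| equals tail left of -|z|

print(round(prob_z_le(-0.7), 3))
print(round(prob_z_le(0.24), 4))
```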
We will do several examples on the normal distribution today, so that you are well versed with the usage
of the tables for the standard normal distribution. We will first start with getting
the area between z is equal to minus 0.78 and z is equal to 0. So, we are interested
in the probability of Z lying between minus 0.78 and 0. This is the area that we are interested
in. If we were to use numerical integration, we would integrate the pdf
of Z, the probability density function, which is e to the power of minus z square by 2 divided
by root 2 pi, between z is equal to minus
0.78 and 0 to obtain this area. If we do this numerically, we get an area of 0.2823.
So, we write this integration as minus 0.78 to 0, which by symmetry we write as 0 to 0.78,
and we integrate it numerically using some standard software like MATLAB or something,
and then get this probability. So, the area turns out to be 0.2823.
Now, from the tables, because z is negative, what we do is, corresponding to z is equal
to plus 0.78, read the area, which comes out to
be 0.2823. So, this is the area between 0 and plus 0.78, which is 0.2823, and which by symmetry
is also the area between minus 0.78 and 0. So, that is how we obtain the
area between z is equal to minus 0.78 and 0.
Now let us look at the area under the standard normal curve for Z being less than or equal to
minus 0.98, that is, we are looking at the area to the left of minus 0.98. As said, this is the same
as the area to the right of plus 0.98. So, look
up for 0.98; you get 0.3365. This is the area
between 0 and plus 0.98. We are interested in the area to the left of minus 0.98, which is
also equal to the area to the right of plus 0.98. This area
is 0.5 minus 0.3365, which is 0.1635. That is how we obtain the area that is required
in this particular case, Z less than or equal to minus 0.98.
Now we will do a different type of example, where the probability of
Z being less than or equal to z has been specified as 0.879, and we are interested in getting the
associated value of z. So, we are asking the question: what is that value of z for which
the probability of Z being less than or equal to z is equal to 0.879? You must remember,
because this probability is greater than 0.5,
we are looking at the positive side, that is, to the right of z is equal
to 0. So, the total area has been specified to be 0.879. From
the tables, we look for the area between 0 and z, which is
0.879 minus 0.5, that is 0.379. So, you go to the table and look for the value
that is closest to 0.379; you can also do numerical interpolation if you want exact values.
In this particular case it turns out to
be 1.17. So, for the area of 0.379 we get a
z value of 1.17. We will now do an example where we are looking at a given sample value;
let us say you have stream flows at a particular location.
We denote that as a random variable X, which has an estimated mean of 100
and a variance of 2500 square, and we are
interested in getting the probability of X being
less than or equal to 75. So, we convert X into Z by using Z is equal to X minus mu over sigma,
and therefore the right hand side, which is the specified value of x, is also
converted into the associated specified value of z as z is equal to x minus mu over sigma.
So, for 75 we take 75 minus 100, which is the mean, divided by 2500, which is sigma; 2500 square
is sigma square, so 2500 is sigma. So, z is equal to minus 0.01. Because
you are looking at minus 0.01, from the table
you get the area associated with z is equal to plus 0.01, which is
0.004. Because you are looking at the area to the left of minus 0.01,
this area becomes 0.5 minus 0.004, which is 0.496. So, that is how you get the probability
of X being less than or equal to 75, which is the same as Z being less than or equal to minus 0.01, which
is equal to 0.496. We will do another example with mu x and sigma x specified. This is similar to the
earlier example, except that we are looking at the probability of X being greater than or equal
to x, and we are asking what is that particular value of x for which the probability of X being
greater than or equal to x is 0.73.
In fact, these kinds of problems come up quite often in water resources, where we talk about
the flow value that is exceeded with 70 percent probability, 75 percent probability
and so on. So, we are interested in that
particular value of x which is exceeded, in
this particular case, 73 percent of the time, or the probability that that particular value is
exceeded is equal to 0.73. From the samples we have estimates for the mean and the standard deviation;
in this particular case mu x is equal to 650 and the estimated sigma x
is 200. So, as you can see, the probability of X being
greater than or equal to x being 0.73 converts itself into the probability of X being less than or equal to
x being 1 minus 0.73, which is 0.27. So, you are looking at an area of 0.27; because
it is less than 0.5, it has to lie on the negative side, on the left side of 0. So, we are
looking at minus z here. The area between minus z and 0 is 0.5 minus 0.27, which is 0.23, and it
is equal to the area between 0 and
plus z. So, you look up the tables and go
to the area which is closest to 0.23; you can do the numerical interpolation, and
you get z is equal to minus 0.613. Once you get z, remember that you are actually looking for
the value of x here; you are not interested in the z value itself. So, once you get the value
of z, which is in this case minus 0.613, we use the fact that z is equal to x minus mu
over sigma and get the associated value of x as 650 minus 0.613 into 200; in this case it turns out
to be about 527. Now, another similar type of example, where
we are dealing with X which is normally distributed
and we have two probabilities given.
The probability of X being less than or equal to 50 is given as 0.106, and the probability of X being less
than or equal to 250 is given as 0.894. We are interested in getting the standard deviation
and the mean of X. So, what are we given? We are given the probability of X being less than or equal to 50 as
0.106. We first convert this into the probability of Z being less than or
equal to z being 0.106. Since the probability is less than 0.5, the z that
we are talking about has to be a negative value. The area between this z and 0
is 0.5 minus 0.106, which is 0.394. From the tables, corresponding
to an area of 0.394 you get a z value of 1.25, and we
convert it to minus 1.25 here, because we are interested in the area on the left side. So,
z is equal to minus 1.25, that is, 50 minus mu over sigma is equal to minus 1.25. So, we write one
equation, mu is equal to 50 plus 1.25 sigma,
from this expression.
Similarly, we use the other condition,
that is, the probability of X being less than or equal
to 250 is equal to 0.894. So, what are we given? We are given that the area
up to this point, including the area to the left of z is equal to 0, is 0.894. So, the
area to the right of 0 will be 0.394, and therefore we go back to the tables and look for the
area which is 0.394. Again we get 1.25. Remember, because we are
talking about the area being greater than 0.5, we are looking at the right side of z is equal
to 0. So, z becomes equal to plus 1.25, that is, 250 minus mu over sigma is equal to 1.25. Using
mu is equal to 50 plus 1.25 sigma from the earlier expression, we obtain sigma as 80 and mu as 150. One more example
we will do, where we are considering the annual rainfall in a particular basin; this is normally
distributed with a mean of 1000 mm and a standard deviation of 400 millimeters.
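Before taking up the rainfall example, the inverse type problems solved above (finding z for a
given area, finding the flow exceeded with a given probability, and finding mu and sigma from
two probabilities) can be checked with a short sketch. It assumes Python's
statistics.NormalDist, whose inv_cdf plays the role of reading the table backwards; the helper
name solve_mean_sd is illustrative only. Because no rounding to the nearest table entry takes
place, the results differ slightly from the table values.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

# z for which P(Z <= z) = 0.879 (the table gave about 1.17)
z_879 = STD_NORMAL.inv_cdf(0.879)

# Flow exceeded with probability 0.73 when mu = 650 and sigma = 200
z_27 = STD_NORMAL.inv_cdf(1.0 - 0.73)   # negative, left of zero
x_73 = 650.0 + 200.0 * z_27             # x = mu + z * sigma

def solve_mean_sd(x1, p1, x2, p2):
    """Solve x1 = mu + z1*sigma and x2 = mu + z2*sigma for mu and sigma."""
    z1 = STD_NORMAL.inv_cdf(p1)
    z2 = STD_NORMAL.inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma

mu, sigma = solve_mean_sd(50.0, 0.106, 250.0, 0.894)
print(round(z_879, 2), round(x_73), round(mu), round(sigma))
```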
Now, we have an expression which relates the runoff R with the
rainfall P, given by R is equal to 0.5 P minus 150. P is normally distributed, and
we are interested in getting the mean and standard deviation of the annual runoff. Because
this is a linear function and P is normally distributed, R is also normally distributed,
and we can obtain the mean and standard deviation of R. From there we can get the probabilities
associated with the annual runoff exceeding any given value, for example, in this case
600 millimeters.
So, first we will start with this: R is equal to minus 150 plus 0.5 P, which is a linear function
of P. Since P follows a normal distribution with mean 1000 and variance 400 square,
R follows a normal distribution with mean minus 150 plus 0.5 into 1000, which is 350, and standard
deviation 0.5 into 400, which is 200. So, we
obtain the mean and standard deviation by simply using the fact that a linear function
defined on a normal random variable also follows a normal
distribution, with mean given by a plus b mu and variance given by b square sigma square.
So, we obtain the mean and standard deviation as 350 and 200 millimeters in this particular
case. Next, we are interested in the probability of R being greater than or equal to 600.
We have said that R follows a normal distribution
with the parameters mean as 350 and standard
deviation as 200.
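The runoff example can be sketched in a few lines. The helper linear_transform is a hypothetical
name; it simply applies the result that a plus b X is normal with mean a plus b mu and standard
deviation equal to the absolute value of b times sigma. The exceedance probability is computed
with statistics.NormalDist rather than read from the table.

```python
from statistics import NormalDist

def linear_transform(dist, a, b):
    """Distribution of a + b*X when X is normal: mean a + b*mu, sd |b|*sigma."""
    return NormalDist(a + b * dist.mean, abs(b) * dist.stdev)

rainfall = NormalDist(1000.0, 400.0)               # P, in mm
runoff = linear_transform(rainfall, -150.0, 0.5)   # R = 0.5 P - 150

z_600 = (600.0 - runoff.mean) / runoff.stdev       # standardised value of 600 mm
p_exceed_600 = 1.0 - runoff.cdf(600.0)             # P(R >= 600)

print(runoff.mean, runoff.stdev, round(z_600, 2), round(p_exceed_600, 4))
```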
We can now start talking about probabilities associated with the variable R, which is the runoff in this
case. So, we obtain the probability of R being greater than or equal
to 600 as 1 minus the probability of R being less
than or equal to 600. We use the same
procedure: z is equal to 600 minus 350 over 200, which is 1.25, and from the tables we obtain the
probability of the runoff
in the basin being greater than or equal to 600 as 0.1056. So, in this example what did we
demonstrate? We demonstrated the use of linear functions defined on a random variable which
follows a normal distribution with known parameters mu and sigma. Now, why is the normal distribution
so popular? Not only in hydrology, but in many other applications and many other scientific
fields, the normal distribution is extremely popular. In fact, as a first cut analysis you generally
use the normal distribution when you do not have any other inferences available to you.
That is mainly because of the central limit theorem. The central limit theorem states that
if X 1, X 2, etcetera are independent and identically distributed random variables with mean mu
and variance sigma square, then the sum defined by S n is equal to X 1 plus X 2 plus X 3 etcetera,
up to X n, where you have n such random variables, approaches a normal distribution
with mean n mu and variance n sigma square as n tends to infinity. That is,
we state that S n follows a normal distribution with parameters mean n mu
and variance n sigma square. Now, this is an important result, and we use the abbreviation
iid to indicate that the random variables are independent and identically distributed.
Now look at the implications of this. It does not put any restriction on whether X 1, X
2, X 3, etcetera have to follow a normal distribution; they can follow any distribution as long as
they are independent random variables and as long as all of them follow the same distribution.
Let us say they are following the exponential distribution with the same mean. So, they
should all have the same mean and the same standard deviation. By iid we mean independent
and identically distributed. As long as you satisfy these conditions, the sum of the random
variables X 1 plus X 2 plus etcetera, X n follows a normal distribution with
mean given by n mu and variance given by n sigma square. In many situations we can
approximate a particular random variable as being constituted of a sum of several
random variables, and if we can also make an assumption that they are independent and identically
distributed, then this result becomes very handy. Let us say that you are looking
at the stream flow in a particular month, say the month of June, and we are
interested in the distribution of this. Can we approximate this to be a normal distribution?
We consider the stream flow in the month of June as being constituted of several random
variables, in this particular case 30 random variables, X
1 plus X 2 plus X 3 plus etcetera, where the X i are the stream flows on each day. So, the monthly
stream flow can be looked at as a sum of the daily stream flows in that particular month. Now,
if the daily stream flows can be assumed to be independent and identically distributed,
then we can say that the stream flow in that particular month follows a normal distribution
with the mean given by 30 into mu and the variance given by 30 into sigma square, where
mu and sigma square are the mean and the variance of the individual daily random variables.
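The statement of the central limit theorem can be explored with a small simulation. This is only
a sketch under assumed values: 30 daily terms, an exponential distribution with a mean of 2 units
for each term, 20000 simulated sums and a fixed seed are all illustrative choices, not data from
any river.

```python
import random
import statistics

def simulate_sums(n_terms, n_sums, mean, seed=0):
    """Return n_sums sums, each of n_terms iid exponential variables."""
    rng = random.Random(seed)
    rate = 1.0 / mean
    return [sum(rng.expovariate(rate) for _ in range(n_terms))
            for _ in range(n_sums)]

n = 30
mu, sigma_sq = 2.0, 4.0      # exponential with mean 2 has variance 2^2 = 4
sums = simulate_sums(n, 20000, mean=mu)

sample_mean = statistics.mean(sums)
sample_var = statistics.variance(sums)
sd = sample_var ** 0.5
within_one_sd = sum(abs(s - sample_mean) <= sd for s in sums) / len(sums)

print("mean", round(sample_mean, 2), "expected", n * mu)
print("variance", round(sample_var, 1), "expected", n * sigma_sq)
print("fraction within one standard deviation", round(within_one_sd, 3))
```

If the sums are close to normal, the last fraction should be near the 68 percent quoted earlier
for the normal distribution, even though each individual term is strongly skewed.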
Now, the requirement that these be identically distributed becomes slightly restrictive
for most hydrological applications. Under
some general conditions, if the X i are
all independent, with the expected value of X i given by mu i and the variance of X i given by
sigma i square, then a similar result holds. That means what we are doing
now is relaxing the requirement that
they are all identically distributed. All of them have some distribution: X 1 has a distribution,
X 2 has its own distribution with different parameters, X 3 has its own distribution with
different parameters, and so on. But as long as we can assume X 1, X 2, X 3 etcetera, X
n to be independent, the sum S n is equal to X 1 plus X 2 plus X 3 etcetera, X n approaches a normal distribution
with the expected value of S n given by the sum of mu i over i equal to 1 to n, and the variance of S n given by the sum
of sigma i square over i equal to 1 to n. Now, one condition for this generalized central
limit theorem is that each of the X i that we are considering here has a very limited or
negligible effect on the total distribution of S n itself. So, individually they are not
contributing significantly to the distribution of S n, but together they make sure
that it approaches a normal distribution. Now, this generalized result
becomes extremely handy in hydrologic applications, where we will be dealing with, as just mentioned,
several random variables which can be taken as sums of individual random variables. Again,
for example, take the seasonal rainfall or the seasonal stream flow
at a particular location. This seasonal stream flow can be taken
as being constituted of daily stream flows. Let us say we are talking about the stream
flow in a monsoon period which has 4 months, approximately 120 days. So, there are X 1, X 2 etcetera,
up to X 120, several individual
random variables. As long as you can take them
as independent random variables, you can
approximate the seasonal stream flow with
a normal distribution, once you know the
mean and standard deviations of the individual
random variables X 1, X 2, X 3, etcetera, up to X 120.
Then you can also obtain the mean and standard deviation of the seasonal flow. Now we will
see why the normal distribution cannot be very generally applied in most hydrologic
situations, although the normal distribution is extremely elegant and extremely useful in
many applications. You can recall that the normal distribution is defined from minus infinity
to plus infinity. So, irrespective of how high your mean is, how far to the right of
0 your mean is, there is always a finite probability associated with negative
values. So, even if you have the mean to the extreme right, that is, an extremely high value of the
mean, there is always a probability associated with a negative value of the particular random
variable. In most hydrologic situations we are dealing with non negative variables.
For example, stream flow cannot be negative, rainfall cannot be negative,
and so on. So, most of the hydrologic variables are non negative, unless you are talking about
temperature as one of the variables, where with the scale you are looking at you may have
negative values of temperature, or reservoir levels measured around a particular threshold value,
which can be negative. So, only in very specific applications do you come across
negative variables. It is for this reason that the normal distribution has a limitation, in the sense
that there is a finite probability associated with negative values when we are dealing with
the normal distribution. Therefore, when you
generate samples of a
particular variable using the normal distribution, there will always be some negative values that
are generated, which we will see in subsequent classes of this particular course.
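The size of this probability of negative values can be sketched as follows. The function name
prob_negative and the means and standard deviations used are purely hypothetical values chosen
for illustration; the probability is P(X less than 0), computed with statistics.NormalDist.

```python
from statistics import NormalDist

def prob_negative(mean, sd):
    """P(X < 0) for a normal variable X with the given mean and sd."""
    return NormalDist(mean, sd).cdf(0.0)

# Hypothetical flows with mean 100 and increasing coefficient of variation
for sd in (25.0, 50.0, 100.0):
    print(f"C v = {sd / 100.0:.2f}  P(X < 0) = {prob_negative(100.0, sd):.5f}")
```

The probability is never exactly zero in theory, and it grows quickly as the coefficient of
variation increases.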
The other property of the normal distribution is that it is perfectly symmetrical. It
is symmetrical about x is equal to mu, but most of the variables that we deal with in hydrology,
for example rainfall, or the time between two
critical events, or the flood flows
at a particular location etcetera, generally follow
skewed distributions, with the skewness coefficient gamma typically being positive in most situations.
So, whenever we have a significant skew we cannot use the normal distribution. Both
these limitations of the normal distribution lead us to use the log normal distribution, which
we will introduce now. If Y is equal to l n X, that is Y is equal
to the natural log of X, follows a normal distribution, then X is said to follow the
log normal distribution, as simple as that. That is, we take the transformation Y is equal
to l n X, and if Y is equal to l n X follows a normal distribution then X follows the log normal distribution.
Now, the probability density function of the log normal distribution
is obtained from the fact that Y is equal to l n X follows a normal distribution;
as you can see, this is the normal distribution for l n x. It is given by f of x is equal to 1 over root
2 pi x sigma x, e to the power minus l n x minus mu x whole square divided by 2 sigma
x square, and this is defined for x greater than 0, mu x greater than 0 and sigma x greater
than 0. So, all these are positive quantities. Now, this has the property that the skewness
coefficient gamma s is given by 3 C v plus C v cube,
where C v is the coefficient of variation
of X, which as you can recall is simply sigma
x divided by mu x. So, as C v increases, the skewness gamma s
increases for the log normal distribution.
So, while the normal distribution was a symmetrical
distribution, the log normal distribution
has a positive skew, as you can see here; this
has a positive skew.
This holds in most situations, because in hydrologic applications we are dealing
with x bar as positive, and s x, being the standard deviation, is always non negative. Now, the
parameters of Y is equal to l n X may be estimated as follows. This is specific to hydrologic applications,
where zhou and han have demonstrated that the parameters of Y is equal to l n X may
be approximated as mu y is equal to half of l n of x bar square by 1 plus C v square, and sigma
y square is equal to l n of 1 plus C v square, where C v is the coefficient of variation
of X. Let us say you are given a sample, for example 50 years of stream flows at
a particular location. This constitutes a sample of the random variable X, where the
random variable is the stream flow at that particular location.
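The two expressions above, together with the skewness relation gamma s is equal to 3 C v plus
C v cube, can be sketched as small functions. The function names and the sample statistics used
(x bar of 100 and s x of 50) are hypothetical and only serve to show the arithmetic.

```python
import math

def lognormal_params_from_moments(x_bar, s_x):
    """Mean and variance of Y = ln X from the mean and sd of X."""
    cv = s_x / x_bar
    mu_y = 0.5 * math.log(x_bar ** 2 / (1.0 + cv ** 2))
    var_y = math.log(1.0 + cv ** 2)
    return mu_y, var_y

def lognormal_skewness(cv):
    """Skewness coefficient of a log normal variable with given C v."""
    return 3.0 * cv + cv ** 3

mu_y, var_y = lognormal_params_from_moments(100.0, 50.0)  # hypothetical sample
print(round(mu_y, 4), round(var_y, 4), lognormal_skewness(0.5))

# Consistency check: the mean of X implied by mu_y and var_y is exp(mu_y + var_y / 2)
print(round(math.exp(mu_y + 0.5 * var_y), 6))
```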
From the stream flow sample you can estimate x bar and the standard deviation s x. If
X follows the log normal distribution, then Y is
equal to l n X follows a normal distribution,
and the parameters of Y is equal to l n X are given by the expressions for mu y and sigma
y square. From the sample
you would have estimated x bar as well as
s x, and therefore you know C v, and using this you can write mu y and sigma y square. Once
you specify mu y and sigma y square, Y is equal to l n X, which follows a normal distribution,
is completely defined, and therefore you can start talking about probabilities on Y, because
it follows a normal distribution and its parameters are determined as mu y and sigma
y square. Another way of doing this in applications is as follows. Let us say you are given a sample on X,
say stream flows at a particular location
for the last 50 years, with monthly values of stream flow available, which means
50 into 12, that is 600, values are available. You want to approximate this with a log
normal distribution. Then an easy, although slightly inelegant, way of doing this will
be to take the log of X corresponding to each of the values. For example,
let us say you have the observed values X 1, X 2, X 3,
and so on. You simply take y is equal to l n x for each of them; if the observed value is 50,
you take the log of 50 and associate that with y. So, you generate another series Y is equal to
l n X. If X follows the log normal distribution, then Y is equal to l n
X follows a normal distribution, and therefore you work with the series on Y and associate
the probabilities with X. This is one easy way of doing this. So,
if you have a sample on X which follows the log normal distribution, simply take Y is equal
to l n of X and generate another sample; this sample now follows a normal distribution. You
can estimate its mean and standard deviation from this sample and then start talking about
probabilities on Y, which are also related to probabilities on X.
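This second approach, working directly with the log transformed series, can be sketched as
below. The flow values are made up for illustration and are not observations from any station;
the function name prob_flow_le is likewise an assumption.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical observed flows (same units throughout); replace with real data.
flows = [120.0, 85.0, 240.0, 60.0, 150.0, 95.0, 310.0, 70.0, 180.0, 110.0]

# Transformed series y = ln x, assumed to follow a normal distribution
logs = [math.log(x) for x in flows]
y_bar = statistics.mean(logs)
s_y = statistics.stdev(logs)

def prob_flow_le(x):
    """P(X <= x) = P(Y <= ln x), with Y normal with mean y_bar and sd s_y."""
    return NormalDist(y_bar, s_y).cdf(math.log(x))

print(round(y_bar, 3), round(s_y, 3), round(prob_flow_le(200.0), 3))
```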
Now, just look at how the log normal distribution appears for different parameters. For example, for
mu x is equal to 0.3 and sigma x square is
equal to 1, this is the shape that it takes.
Now, C v is sigma by mu, and from the
previous expression, 3 C v plus C v cube, as your C v increases
the skewness increases. So, just look at these curves. For the same mu x of 1.5,
the C v is increasing between this curve and this curve,
because sigma is smaller here and sigma is larger
here. As C v increases
the skewness increases; that means you will have a longer tail to the right. So, the log normal
distribution is positively skewed, with a long exponential tail on the right like this. The
log normal distribution has many applications in hydrology; it is typically used for monthly
stream flows, monthly or seasonal precipitation, evapotranspiration, hydraulic conductivity
in a porous medium and so on. So, the log normal distribution is very popularly used in several
hydrologic applications. Let us consider one example now. Let us say you are talking about
the annual peak runoff in a river, and this is modeled by a log normal distribution.
It has a mean of 5.00 and a standard deviation of 0.683. We want the probability that
the annual peak runoff exceeds 300 meter cube per second. So, we are interested in X being greater
than 300. The values of the mean and the standard
deviation that we have been given are for Y, that
is Y is equal to l n X. Because Y is equal to l n X follows a normal
distribution, you can convert the probability of X being greater than 300 into the probability
of Z being greater than log of 300 minus 5, which is the
mean, divided by 0.683, which is the standard
deviation; this comes to about 1.03. Then we use the normal distribution table
as we did in the previous examples and get the probability
of X being greater than 300 as 0.1515. Similarly, we will now consider X bar is equal to 135
million cubic meters, where we are talking about the stream flow at a location, and the standard
deviation is 23.8 million cubic meters.
And from this we get Cv as S by X bar, which is 23.8 divided by 135; that comes out to
be 0.176. Now, if X follows a log normal distribution, we are interested in getting the
probability of X being greater than or equal to 150. So, we use the fact that Y is
equal to ln X follows a normal distribution. We first estimate Y bar and S y square, which
are the mean of Y and the variance of Y, using the expressions that we just introduced: Y bar is
half of ln of X bar square divided by Cv square plus 1, and S y square is ln of Cv square plus 1.
Remember, both X bar and Cv relate to the original variable X, and we are getting the mean and
the variance of the transformed variable Y is equal to ln X.
So, we get the mean as 4.89 and the variance as 0.0305, or the standard deviation
as 0.1747. Once we determine the mean and the standard
deviation, and given the fact that Y follows a normal distribution, we can talk about
X being greater than or equal to 150. The probability of X being greater than or equal to 150
means the probability of Y being greater than or equal to ln of 150, because Y is equal to ln X.
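The parameter conversion just described can be checked with a short script. This is only a sketch: the function name `lognormal_params_from_moments` is made up for illustration, and the inputs are the sample values quoted in the example (X bar of 135 and S of 23.8).

```python
import math

def lognormal_params_from_moments(x_bar, s_x):
    """Return (mean, variance) of Y = ln X from the mean and standard deviation of X.

    Uses Cv = S / X bar, S_y^2 = ln(Cv^2 + 1) and
    Y bar = 0.5 * ln(X bar^2 / (Cv^2 + 1)).
    """
    cv = s_x / x_bar
    var_y = math.log(cv ** 2 + 1.0)
    mean_y = 0.5 * math.log(x_bar ** 2 / (cv ** 2 + 1.0))
    return mean_y, var_y

# Stream flow example: X bar = 135, S = 23.8 (million cubic meters)
mean_y, var_y = lognormal_params_from_moments(135.0, 23.8)
std_y = math.sqrt(var_y)
print(round(mean_y, 2), round(var_y, 4), round(std_y, 4))
```

The script carries Cv at full precision, so the variance and standard deviation can differ in the last digit or so from the hand calculation, which rounds Cv to 0.176 first.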
So, because Y follows a normal distribution, we simply use the fact that z is equal to
y minus Y bar over S y. Here y is ln 150, which is 5.011, Y bar is 4.89, the mean of Y
that we obtained just now, and the standard deviation is 0.1747, so z is about 0.69. Using that,
we get the probability of Y being greater than or equal to ln 150 from the standard normal
tables as about 0.245. So, the log normal distribution is mostly used, as said, for monthly
stream flows, hydraulic conductivity, evapotranspiration and so on, but there are many situations
where we would be looking at the time between critical events.
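Before turning to such variables, the two log normal probabilities worked out above can be checked numerically. The sketch below replaces the table look-up with the error function from Python's standard library; the helper names `standard_normal_cdf` and `lognormal_exceedance` are illustrative only. The mean of Y in the first example is taken as 5, the value used in the expression ln 300 minus 5.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_exceedance(x, mean_y, std_y):
    """P(X > x) when Y = ln X is normal with the given mean and standard deviation."""
    z = (math.log(x) - mean_y) / std_y
    return 1.0 - standard_normal_cdf(z)

# Annual peak runoff: parameters of Y = ln X are 5.00 and 0.683
p_runoff = lognormal_exceedance(300.0, 5.0, 0.683)

# Stream flow: parameters of Y = ln X estimated as 4.89 and 0.1747
p_flow = lognormal_exceedance(150.0, 4.89, 0.1747)

print(round(p_runoff, 3), round(p_flow, 3))
```

With these inputs the script gives roughly 0.151 for the runoff exceeding 300 and roughly 0.245 for the stream flow exceeding 150. Small differences from table values come from rounding z to two decimals in the table.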
Let us say the time that has elapsed between high intensity rainfalls, or the time that
has elapsed between two critical floods of a given magnitude. Whenever we are talking
about such variables, the log normal distribution is not generally suitable; we have the
exponential distribution, which is ideally suited for such purposes. Now, we will introduce the
exponential distribution. The probability density function of the exponential distribution is
given simply by f of x is equal to lambda e to the power minus lambda x, defined for x greater
than 0 and lambda greater than 0. This is a single parameter distribution, where lambda
is the only parameter. You can easily verify that the integral between 0 and infinity of
lambda e to the power minus lambda x turns out to be 1; indeed, in the first few classes,
when we introduced the pdf, we would have considered a similar example. The expected value
of X can be shown to be 1 over lambda. So, mu is equal to 1 over lambda, or lambda
can be estimated as 1 over mu, and the variance of X, which follows an exponential distribution,
can be shown to be 1 over lambda square. The exponential distribution pdf looks like
this: it is positively skewed, there is a long tail to the right, and it approaches
the x axis asymptotically as x goes to infinity. The cdf, F of x, which by definition is the
integral between 0 and x of f of x d x, will turn
out to be 1 minus e to the power minus lambda x, defined again for x greater than
0 and lambda greater than 0. So, once you define the cdf you can talk about the associated
probabilities, the probability of X being less than or equal to a given value x, and so on.
Remember, both the normal distribution and the log normal distribution had 2 parameters,
mu and sigma, and the exponential distribution has only 1 parameter, lambda.
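For reference, the results quoted above for the exponential distribution follow directly from the pdf. The steps are written out below as a sketch; t is just a dummy variable of integration, and the second moment is obtained by integrating by parts twice. Note that carrying out the integration gives the cdf as 1 minus e to the power minus lambda x, with no lambda multiplying the exponential term.

```latex
\begin{align*}
\int_0^\infty \lambda e^{-\lambda x}\,dx
  &= \left[-e^{-\lambda x}\right]_0^\infty = 1, \\
F(x) &= \int_0^x \lambda e^{-\lambda t}\,dt
  = \left[-e^{-\lambda t}\right]_0^x = 1 - e^{-\lambda x},
  \qquad x > 0,\ \lambda > 0, \\
E[X] &= \int_0^\infty x\,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}, \\
E[X^2] &= \int_0^\infty x^2\,\lambda e^{-\lambda x}\,dx = \frac{2}{\lambda^2}, \\
\operatorname{Var}(X) &= E[X^2] - \left(E[X]\right)^2
  = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.
\end{align*}
```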
So, if you are given a sample that follows an exponential distribution, from the sample you can
estimate the sample mean, and from the sample mean you can estimate the parameter lambda,
because the mean is equal to 1 over lambda. Once you estimate this parameter, your pdf
is completely defined; then you can obtain the cdf and start talking about the probabilities
of the random variable taking on certain values. Also, it is easy to integrate this particular
pdf, lambda e to the power minus lambda x. You can integrate it and obtain 1
minus e to the power minus lambda x. Typically, when we are talking about time to
failure, we use the exponential distribution. In many industries they talk about failures
of components: what is the time to failure of, let us say, a bulb or a machine
component? That is, you start using the component and you estimate the expected value of
the time to failure of that particular component.
But in hydrology and water resources we do not so much talk about component failures;
we talk about functional failures. For example, we may be interested in hydro power
generation at a particular location, and whenever it falls below a threshold hydropower
we call it a failure. Then we will be interested in the distribution of the time between two
failures. Let us say that in this particular month we could not generate the power; the time
that elapses until the next such failure occurs is what we will be
interested in, and such random variables are generally modeled using the exponential
distribution. Another example is that we may be interested in the time between two critical
events. Let us say we are interested in flow below a threshold value, and we are calling
it a low flow. Then we will be interested in the time
that elapses between two such low flows, or the time between two flooding events. So, whenever
we are talking about intervals between two critical events, and the interval is a random
variable, we generally use the exponential distribution. So, let us do an example on this. As
said, the exponential distribution is a positively skewed distribution, and it is used for the
time between two critical events, such as floods of a given magnitude, or the time to failure of
components of hydrologic and water resource systems, and so on.
Again, by components we do not mean the physical components; you may have functional components,
that is, the ability of the system to meet a certain demand, let us say, in terms of hydro power,
in terms of irrigation, and so on. Whenever it fails to achieve that objective, we
count it as a failure, and then we are interested in the time between two such failures. We will
take a simple example here.
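Before the numerical example, here is a minimal sketch of how the time between such functional failures might be pulled out of a record and used to estimate lambda from the sample mean. Everything in it is hypothetical: the monthly failure record is invented for illustration, and the helper names are not from the lecture. Counting gaps in whole months is also only an approximation, since the exponential distribution is continuous.

```python
import math
from statistics import mean

def inter_event_times(event_flags):
    """Gaps, in time steps, between successive flagged events in a record."""
    positions = [i for i, flagged in enumerate(event_flags) if flagged]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]

def estimate_lambda(times):
    """Estimate lambda as 1 over the sample mean, since the mean is 1 over lambda."""
    return 1.0 / mean(times)

def exponential_cdf(x, lam):
    """F(x) = 1 - exp(-lambda * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

# Hypothetical monthly record: True means hydropower fell below the threshold
failures = [True, False, False, True, False, True, False,
            False, False, False, True, False, True]

gaps = inter_event_times(failures)   # months between successive failures
lam_hat = estimate_lambda(gaps)      # per month
p_within_3 = exponential_cdf(3, lam_hat)
print(gaps, lam_hat, p_within_3)
```

Once lambda is estimated this way, the cdf gives the probability that the next failure occurs within any chosen number of months.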
The mean time between high intensity rainfall events, that is, events with a rainfall intensity
above a specified threshold, occurring during a rainy season is 4 days.
Assuming that the time between such events follows an exponential distribution, obtain the
probability of a high intensity rainfall repeating within the next 3 to 5 days, and within the
next 2 days. This kind of application typically comes up when we are dealing with urban flooding.
Let us say we are interested in high intensity, short duration rainfalls at a particular location,
and we say, for example, that whenever the intensity of rainfall exceeds 90 millimeters
per day we call it a high intensity rainfall; or, for design purposes, we may be
interested in 9 centimeters per hour in certain situations where you are talking about very
short durations of 15 minutes and so on. So, we are interested in very high intensities
of rainfall and the time between such events. In this
particular case the mean time between such events is given to be 4 days, and it follows
an exponential distribution. Then we are interested in getting, once the
event has occurred already, the probability that it will repeat within the next
3 to 5 days, or within the next 2 days. So, we are interested in the probability that
X lies between 3 and 5 days, where X is the time between one event and the next, and the
probability that X is at most 2 days. We estimate lambda, which is the parameter required for
the exponential distribution, as 1 over mu, which is 1 by 4. Once you get lambda, the
exponential distribution is completely defined, and the probability
that X takes on a value between 3 and 5 is given by F of 5 minus F of 3, from your fundamentals.
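The set-up above can be sketched in a few lines of Python. The helper name `exponential_cdf` is illustrative, and the cdf is used in the form F of x equal to 1 minus e to the power minus lambda x.

```python
import math

def exponential_cdf(x, lam):
    """F(x) = 1 - exp(-lambda * x) for x > 0, and 0 otherwise."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-lam * x)

mean_gap_days = 4.0          # mean time between high intensity rainfall events
lam = 1.0 / mean_gap_days    # lambda = 1 / mu = 1/4 per day

f5 = exponential_cdf(5, lam)
f3 = exponential_cdf(3, lam)
p_3_to_5 = f5 - f3                      # P(3 <= X <= 5)
p_within_2 = exponential_cdf(2, lam)    # P(X <= 2)

print(round(f5, 4), round(f3, 4), round(p_3_to_5, 4), round(p_within_2, 4))
```

With lambda equal to 1 by 4, the probability of a repeat within 3 to 5 days comes out at about 0.186, and the probability of a repeat within the next 2 days at about 0.393.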
So, F of 5 is 1 minus e to the power minus 5 by 4, that is, we are using
F of x is equal to 1 minus e to the power minus lambda x. This
you get as 0.7135, and similarly F of 3 is 1 minus e to the power minus 3 by 4, which is 0.5276.
Then we get the probability of X lying between 3 and 5 as
0.1859. There is a correction here: note that the cdf is F of x is equal to 1 minus e to
the power minus lambda x, without a lambda multiplying the exponential term. Using this
expression we get the associated probabilities. So, for today we
will close at this point. What we started off with today is the normal distribution: we defined
the standard normal density function and then solved several numerical examples dealing
with the standard normal distribution. We then went on to the log normal distribution.
As mentioned, the normal distribution is a very commonly used distribution; however,
there are two limitations that the normal distribution has for hydrologic applications, namely
that there is a finite probability associated with negative values, and that the normal
distribution is a perfectly symmetrical distribution. For most hydrologic applications these
become real limitations, and therefore we generally use the log normal distribution. Now, the
log normal distribution is a positively skewed distribution, and if X follows a log normal
distribution then Y is equal to ln of X follows a normal distribution. We saw the methods of
estimating the parameters of Y given the parameters of X, and then started talking about the
probabilities on X, because we know that Y is equal to ln X follows a normal distribution. Then
we also introduced the exponential distribution and solved a numerical example. So, thank you
for your attention; we will continue the discussion in the next class. Thank you.