Good morning and welcome to lecture number 30 of the course Stochastic Hydrology.
In the previous lectures we have been discussing intensity duration frequency relationships,
that is, IDF relationships, which are essentially used for flood control designs, typically
for urban drainage and so on. From these relationships you pick up the design intensity
associated with a given return period and a particular design duration. We also
saw the methods by which we can construct the IDF relationships, and we considered an
example of the Bangalore city rainfall, using which we constructed the IDF relationships
for a given rainfall data series. In today's lecture we will take this discussion
forward and look at what we do with the design intensities
so obtained from the IDF relationships. That is, we fix a return period based on the
type of design that we would like to make, and then, corresponding to that return period
and for a given design duration, we pick the intensity of the rainfall from the IDF relationships.
Remember that IDF relationships are obtained using extreme value distributions; typically
Gumbel's extreme value type I distribution is what we used.
Now, starting with the intensities of the rainfall, we need to form what are called
design hyetographs, which give the distribution of the rainfall over the design
duration. Do not lose sight of what we are doing: starting from the design duration, we
went to the IDF relationship for the given return period and picked up the intensity of
the rainfall. Now we are coming back and distributing that rainfall over the design
duration, because the intensity that we picked is, in fact, the maximum intensity. How
this maximum intensity of rainfall is distributed over the design duration is what we will look
at now. This is the procedure by which we construct
the design hyetographs. Most of you who have gone through a basic course in hydrology
will know the difference between hyetographs and hydrographs. A hyetograph is the time
distribution of the rainfall intensities, whereas a hydrograph is the time distribution
of discharge, that is, a graph of discharge versus time. So, we will start with the IDF relationship
and look at how we obtain the design hyetographs corresponding to the design intensities;
these hyetographs are, in fact, used to construct the hydrographs, and from the hydrographs
we obtain the design dimensions.
So, from the IDF curves we would now like to obtain the design precipitation
hyetographs, or simply design hyetographs. As I mentioned, the hyetograph
typically shows the rainfall intensity (or depth) on the y axis and time on the x axis.
We use what is called the alternating block method, which is one of the methods to obtain
the hyetograph. It is a method for developing a design
hyetograph from an IDF curve or IDF (intensity duration frequency) relationship.
It specifies the precipitation depth occurring in n successive time intervals, each of
duration Δt. So, let us say this is 10 minutes, this is 10 minutes, 10 minutes
and so on. For these equal intervals Δt, it tells us how the precipitation depth is distributed
in time over the total duration of the rainfall event. So, we are
talking about a storm of duration from this point to this point; let us say it is of 1
hour duration, or of 50 minutes duration, and so on. We have obtained the maximum
intensity, or the design intensity, of the rainfall from the IDF relationship. How we distribute
this across time over the duration of the rainfall is what we will see now. The particular
method we are discussing is called the alternating block method.
What we do in this method is the following. We know the maximum intensity of rainfall, and from the intensity we
get the precipitation depth corresponding to the duration t_d as i × t_d, where i is the intensity,
in millimeters per hour or, in general, units of length per time, and t_d is the duration. So, as
we progress in time, let us say 10 minute duration, 20 minute duration,
30 minute duration etcetera, we get the depth associated with each duration, and this
depth we distribute over the time intervals Δt. The precipitation occurring
in any one interval is simply the cumulative precipitation at the end of the interval minus the
cumulative precipitation at its start; that is what we do here.
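In symbols, the two steps just described can be written compactly as follows. The symbols P_j (cumulative depth after j intervals), ΔP_j (depth in the j-th interval), N and T_d (total storm duration) are introduced here only for illustration.

```latex
\begin{aligned}
P_j &= i(j\,\Delta t)\cdot j\,\Delta t, && j = 1,\dots,N,\quad N = T_d/\Delta t,\\
\Delta P_1 &= P_1, \qquad \Delta P_j = P_j - P_{j-1}, && j = 2,\dots,N,\\
\textstyle\sum_{j=1}^{N} \Delta P_j &= P_N .
\end{aligned}
```

The last line is a useful check: the incremental depths always add back up to the cumulative depth for the full storm duration.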
In this way we get the precipitation in each interval: in the first 10 minutes there is a certain
precipitation, in the next 10 minutes there is another precipitation, and so on.
We then arrange these precipitation depths such that the maximum precipitation
depth occurs at the center of the total duration. Say the total duration is
one hour and Δt is 10 minutes, so we have 10 minutes, 20 minutes, 30 minutes, 40
minutes and so on up to 60 minutes. We obtain the incremental precipitation depths by taking
differences of the cumulative depths at successive durations t_d1, t_d2 and so on. We then arrange
the precipitation so that the maximum incremental depth occurs at the center of the total rainfall
duration: we first fix the maximum value at the
center. Then, on either side, we place the remaining precipitation depths in decreasing order,
putting one value on this side, the next value on the other side, the next on this side, and so on, so that
the depths keep decreasing in both directions away from the center. So,
for a 1 hour duration, with 10 minutes, 20 minutes, 30 minutes, 40 minutes, 50 minutes and 60 minutes,
we put the maximum value of the precipitation at around 30 minutes and then keep decreasing in this direction and in that direction.
That is how we obtain the precipitation hyetograph, or design hyetograph.
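A minimal sketch of the arrangement step, assuming the incremental depths are already computed. The lecture does not fix which side of the maximum receives the second-largest value, nor which of the two middle blocks counts as "the center" when the number of blocks is even; here we assume the peak goes in block (n - 1) // 2 and the next values go right, then left, alternately. The function name `arrange_alternating_blocks` is hypothetical.

```python
def arrange_alternating_blocks(increments):
    """Place the largest increment at the centre block, then the remaining
    increments in decreasing order, alternating right and left of the peak."""
    n = len(increments)
    ranked = sorted(increments, reverse=True)
    hyetograph = [0.0] * n
    centre = (n - 1) // 2          # assumed centre; for even n this is the earlier middle block
    hyetograph[centre] = ranked[0]
    left, right = centre - 1, centre + 1
    for k, value in enumerate(ranked[1:]):
        # Alternate sides, starting on the right; fall back to whichever side still has room.
        go_right = (k % 2 == 0 and right < n) or left < 0
        if go_right:
            hyetograph[right] = value
            right += 1
        else:
            hyetograph[left] = value
            left -= 1
    return hyetograph


print(arrange_alternating_blocks([5, 4, 3, 2, 1]))  # [1, 3, 5, 4, 2]
```

Sorting first means the function works whether or not the incremental depths arrive in decreasing order; for a typical IDF curve they usually do.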
Let us look at one example to drive home this point. It is a very simple procedure,
but do not forget that the basis for all of this is the design duration, from
which we pick up the design intensity. So, let us look at a 2 hour storm
in 10 minute increments for the Bangalore city rainfall, which I discussed earlier;
we will again use the same rainfall data, with a 10 year return period. So, you have 10 minute
intervals and a 2 hour storm, that is, 120 minutes is the total duration of the storm,
and a 10 year return period. We will just use the empirical formula here;
you can also pick the values up from the IDF relationship that we derived in the previous lecture
for a 10 year return period and the required durations, from 10 minutes up to 2 hours.
Here, however, we will use the empirical relationship i = K T^a / (t + b)^n, which I discussed in the
previous lecture; for Bangalore city the constants K, a, b and n have been given. We will use those
constants exactly the same way as we did in the previous class and obtain the intensity for a return
period T = 10 years and a duration t = 10 minutes, which is 0.167 hours; remember that
the t we use here must be in hours. We get i = 13.251 centimeters
per hour. In this case the units are centimeters per hour because the constants of this empirical relationship fix them.
So, we must always be careful about the units that we are using.
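As a quick check of the empirical formula, here is a minimal Python sketch. The Bangalore constants are not restated in this lecture, so the values below (K = 6.275, a = 0.126, b = 0.5, n = 1.128) are an assumption on our part: they are used because they reproduce the 13.251 cm/h quoted above and the cumulative depths quoted later (2.208, 3.434, 4.194 and 4.699 cm). Substitute the constants from the previous lecture if they differ. The function name `idf_intensity` is hypothetical.

```python
# Assumed constants for i = K * T**a / (t + b)**n  (i in cm/h, T in years, t in hours).
K, A_EXP, B_SHIFT, N_EXP = 6.275, 0.126, 0.5, 1.128


def idf_intensity(return_period_years, duration_hours):
    """Rainfall intensity in cm/h from the empirical IDF formula."""
    return K * return_period_years**A_EXP / (duration_hours + B_SHIFT) ** N_EXP


i10 = idf_intensity(10, 10 / 60)   # 10-minute duration, converted to hours
print(round(i10, 3))               # 13.251
```

Note that the duration is passed in hours; passing 10 instead of 10/60 would silently give a far smaller intensity.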
Now, starting with this intensity, what do we do? We took the 10 minute
duration and obtained the intensity for it: 13.251 centimeters per hour is the intensity
for this 10 minute duration. Similarly, for a 20 minute duration you again obtain
the intensity, and so on. So, for every duration in steps of 10 minutes you calculate the precipitation
intensity. The precipitation depth is obtained simply by multiplying intensity by
duration: 13.251 centimeters per hour multiplied by the 10 minute interval, 1/6 of an hour (about 0.167 hours), gives 2.208 centimeters.
There is a point that you need to note here. When you go to the 20 minute duration,
you put t equal to 20 minutes, and the depth will not be uniformly
distributed, because the expression is non-linear. Therefore, you keep putting
t equal to 10 minutes first, then 20 minutes, 30 minutes etcetera, and you keep getting
the intensities and, from them, the corresponding accumulated precipitation depths. So, for
10 minutes you got a precipitation depth of 2.208 centimeters; that means in the first 10
minutes the precipitation is 2.208 centimeters. Now we go to t equal to 20 minutes:
using the intensity for 20 minutes and multiplying it by the duration, 20 minutes, you
get a precipitation depth of 3.434 centimeters. This means that during the first 20 minutes the total
rainfall is 3.434 centimeters, but during the first 10 minutes 2.208 centimeters has already occurred; therefore,
in the remaining 10 minutes you get a rainfall of 1.226 centimeters.
Like this we keep computing the incremental rainfall that occurs during every
10 minutes in this particular case. These are the intensities: 13.25 for the 10 minute duration,
about 10.30 for the 20 minute duration, and so on. How did we get these? From the empirical relationship,
by putting the duration equal to 10 minutes, 20 minutes, 30 minutes etcetera; that is, t_d is varied
like this and we obtain the intensities. Then, corresponding to these intensities, we get the cumulative
depths, associated with the first 10 minutes, the first 20 minutes, the first 30 minutes and so on.
From the cumulative depths we get the incremental
depths, that is, what has happened during each 10 minute interval. So, for 0 to 10
minutes the incremental depth is 2.208; the next is 1.226, which is this minus this;
the next is this minus this, and so on. For example, 4.699
minus 4.194 gives the incremental depth for 30 to 40 minutes. Like this you get all the incremental
depths. Once we have these, look at the center point of the duration and put the
maximum incremental value there. Here the largest incremental depth is the first one, 2.208,
and we put it at the center of the duration.
Then we pick the next value, 1.226, and put it next to the maximum. The next smallest
value, 0.760, goes on the other side, and like this, one by one, you keep distributing the values alternately
on either side of the maximum until you reach the end points of the duration.
I repeat: the maximum value you put at the center, and then you keep distributing the
remaining values in decreasing order on either side of the maximum; this
is how you get the hyetograph. For the first interval, 0 to 10 minutes, you
get a precipitation of 0.069 centimeters, and, as you can see, the precipitation depth
slowly increases, reaches the maximum, then starts decreasing and reaches the
minimum at the end. That is how you obtain the design hyetograph.
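Putting the pieces together, here is a minimal sketch of the whole Bangalore example: intensities for durations of 10 to 120 minutes, cumulative depths, incremental depths and the alternating arrangement. The IDF constants are the same assumed values as before (chosen because they reproduce 13.251, 2.208, 3.434, 4.194 and 4.699 from the lecture), and the placement convention (peak in block (n - 1) // 2, that is, 50 to 60 minutes, then right, left, right, ...) is our assumption; with it the first block comes out at about 0.069 cm, matching the value quoted above. The function names are hypothetical.

```python
# Assumed IDF constants for i = K * T**a / (t + b)**n  (i in cm/h, t in hours).
K, A_EXP, B_SHIFT, N_EXP = 6.275, 0.126, 0.5, 1.128


def idf_intensity(T, t_hours):
    return K * T**A_EXP / (t_hours + B_SHIFT) ** N_EXP


def design_hyetograph(T=10, storm_minutes=120, dt_minutes=10):
    durations = range(dt_minutes, storm_minutes + 1, dt_minutes)
    # Cumulative depth P(t_d) = i(t_d) * t_d, in cm.
    cumulative = [idf_intensity(T, d / 60) * d / 60 for d in durations]
    # Incremental depth in each interval = difference of successive cumulative depths.
    increments = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
    # Alternating block arrangement (assumed: peak at (n - 1) // 2, then right, left, ...).
    n = len(increments)
    ranked = sorted(increments, reverse=True)
    blocks = [0.0] * n
    centre = (n - 1) // 2
    blocks[centre] = ranked[0]
    left, right = centre - 1, centre + 1
    for k, value in enumerate(ranked[1:]):
        if (k % 2 == 0 and right < n) or left < 0:
            blocks[right] = value
            right += 1
        else:
            blocks[left] = value
            left -= 1
    return cumulative, increments, blocks


cumulative, increments, blocks = design_hyetograph()
for j, depth in enumerate(blocks):
    print(f"{j * 10:3d}-{(j + 1) * 10:3d} min: {depth:.3f} cm")
```

The printed table should rise from about 0.069 cm in the first block to the 2.208 cm peak in the 50 to 60 minute block and fall off again; the block depths always sum to the 2 hour cumulative depth.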
This method is called the alternating block method because we alternate,
placing one value on this side, the next on that side, the next on this side, and so on.
It is by far the simplest method of distributing the
design intensity that you obtain from the IDF relationship.
So, this is the hyetograph that you get, with precipitation in centimeters on the y axis and
time in minutes on the x axis; the total duration is 2 hours. Around 60 minutes, the middle
of the storm, you get the maximum value, 2.208 centimeters, and the other values are distributed on either side.
Essentially, this is the design hyetograph that you will use, corresponding
to the design intensity of the rainfall. So, this is what we do with the IDF relationships.
Let us quickly recapitulate the significance of the IDF relationship, how we
obtain it, and what its use is. IDF relationships are
formed based on extreme value distributions, which means you are looking at
maximum values: the maximum precipitation depths, maximum flood volumes, peak flood discharges and so
on. These maximum values are used to construct the hydrographs; by some procedure you construct
the hydrographs and then route them through your channel systems or river systems
to get hydrographs at various locations. So, any hydrologic design essentially
requires the hydrographs and the peak flows. Now, from the IDF relationships
you do not get the flows; what you get is the design intensity of the rainfall.
How did we get it? We fix the design storm duration and, corresponding to the
return period, we pick up the intensity.
From the intensity of the rainfall we now also know how to distribute the rainfall
over the duration. Let us say the storm duration was one hour and you picked up an
intensity of 30 millimeters per hour or some such value; you then use the alternating block
method to construct the hyetograph, and from the hyetograph you can move to the hydrograph
and so on. How to go from the hyetograph to the hydrograph is
not within the scope of this course on stochastic hydrology; here we are looking only at
how to use probability concepts in hydrologic designs. So, we will leave that part aside;
it will be covered in your basic hydrology courses.
So, from the IDF relationship you get the peak intensities of the rainfall; from the peak
intensities you construct the design hyetographs; from the
hyetographs you move on to hydrographs; the hydrographs you route through your channel networks and
then arrive at your design criteria using your regular routing procedures and so on. Now we will
come back to some of the topics that we discussed earlier. For the time being we leave
aside the IDF relationships; we have completed that portion. Some time
ago, in lectures 21, 22 or thereabouts, we discussed
dependent and independent variables in hydrology and how to formulate
relationships between the two types of variables. For example, you may be considering runoff
at a particular location, which depends on the rainfall over the entire catchment. We
introduced the concept of linear regression at that time, where we built linear relationships
between the dependent variable, runoff in that case, and the independent variable, rainfall.
Typically we constructed equations of the type y = a x + b, where x
is the independent variable and y is the dependent variable; y can be runoff at a location
and x can be rainfall in the catchment.
That was simple linear regression: simple because there is only one independent variable
and one dependent variable, and linear because we fit linear equations. Now,
in hydrology there are situations where the dependent variable depends not on one but
on several independent variables. Let us look at runoff itself. We have always talked
about runoff as being dependent on the rainfall. Say you are looking at a catchment here and
you are interested in the runoff at this outlet; we said that the runoff depends on the rainfall
falling on this catchment, so we used runoff as the dependent variable and rainfall as the
independent variable. It is known from basic hydrology, however, that
the runoff depends not only on the rainfall but also on the soil conditions, in terms of, say,
the antecedent moisture content (how much rainfall has already fallen and how much soil moisture
exists), on the kind of vegetation, on the slope, and also on the evapotranspiration that takes place,
and so on. Simplistically, we related the runoff at this location to only one variable x,
and for that we were justified in using simple linear regression. Once we start looking at the details,
we know that the runoff depends not only on the rainfall but on several other variables.
For example, x_1 may be rainfall, x_2 may be soil moisture,
x_3 may be temperature, in as much as it affects the evapotranspiration,
x_4 may be the slope of the catchment, and so on. So, y does not depend on only one
variable but on several variables. The simple linear regression
that we introduced earlier, namely an equation of the form y = a x + b, we now
generalize to multiple linear regression, where the dependent variable y depends not
on one but on several variables x_1, x_2, x_3 etcetera. So, we may write
y = β_1 x_1 + β_2 x_2 + ... + β_p x_p, where there are p variables on which the
dependent variable depends; these are the p independent variables. From
simple regression we are now graduating to multiple regression, with p independent variables,
but we still retain the linear structure of the regression equation. Since β_1 to β_p are all
constants, y = β_1 x_1 + ... + β_p x_p is a linear equation, and therefore we now have
multiple linear regression: multiple because there are multiple independent variables,
linear because the equation is linear. That is what we will study now.
So, starting from this, we go to multiple linear regression. As I mentioned,
you have one dependent variable and several independent variables. We
use all these independent variables to model the dependent
variable at this location, say the runoff, which depends on several
variables x_1, x_2, x_3, x_4 and so on. So, we develop a regression
equation for y in terms of x_1, x_2, x_3, x_4 etcetera up to x_p.
How did we do linear regression? Let us recapitulate. You had the observed data
on x and y, and we wanted to fit an equation of the type ŷ_i = a + b x_i;
we put a cap (hat) on y_i to show that it is a predicted value. There are observed
data x_i and y_i, let us say rainfall and runoff. From these
observed data you want to fit a best-fit line
and get a and b; your objective there was to obtain the parameters a and b.
How did we do that? We formulated the error: y_i is the observed value and
ŷ_i is the estimated value, so the error is observed minus estimated. We considered the
squared errors and wrote them in terms of y_i − ŷ_i, using the expression ŷ_i = a + b x_i.
We then minimized M, the sum of squared errors, with respect to the parameters a and b;
that is, we differentiated M with respect to a and with respect to b, set both derivatives
equal to zero, and solved for a and b. That is what we did in simple linear regression.
Please look up your earlier lecture notes and lecture slides, where we discussed
simple linear regression along with some examples. When we go to multiple linear regression
the principle remains the same, except that we now work with matrices and vectors.
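For reference, here is a minimal sketch of the simple-regression recap: setting ∂M/∂a = 0 and ∂M/∂b = 0 for ŷ_i = a + b x_i gives the familiar closed form b = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and a = ȳ − b x̄. The function name is hypothetical, and the data in the usage line are made-up numbers lying exactly on y = 2 + 3x, used only to check the code.

```python
def fit_simple_regression(x, y):
    """Least-squares intercept a and slope b for y_hat = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-product and squared deviations about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = fit_simple_regression([1, 2, 3, 4], [5, 8, 11, 14])
print(a, b)  # 2.0 3.0
```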
Y is the observed dependent variable; let us say you are looking at runoff at a particular location
and you have the observed series of runoff values. Then you have x_1, x_2, x_3 etcetera,
the independent variables, for which observed values are also available; let us say
x_1 is rainfall, x_2 is soil moisture, x_3 is temperature and so on. For all these
independent variables you also have observed data.
You then need to estimate the parameters β_1, β_2 etcetera up to β_p such that
the errors between the observed values and the predicted values, predicted from the
linear equation, are minimized in some overall sense; that is where we consider the squared
errors. Precisely what we did in simple linear regression we repeat
for multiple linear regression: the principle remains the same, and the mathematics
is also more or less the same.
So, we write the general linear model for multiple regression as
y = β_1 x_1 + β_2 x_2 + ... + β_p x_p, where x_1, x_2, x_3 etcetera up to x_p are the independent
variables: rainfall, soil moisture and so on. Rainfall and soil moisture may, of course, be related;
how to handle such correlated variables we will consider in a later topic. Right now we will assume
that they are all uncorrelated. So, x_1, x_2, x_3 etcetera are all independent variables, let us say
rainfall, temperature, catchment characteristics in terms of vegetation, and so on.
β_1, β_2 etcetera up to β_p are the unknown parameters of this
model. When we use the model to predict the dependent variable we put a cap on it, so we write
ŷ; without the cap, y denotes the observed values. So, we
have observed values for runoff, and we also have observed values for x_1, which is
rainfall, x_2, which is vegetation or, let us say, the catchment area, x_3, which is temperature,
and so on. We identify the independent variables that influence
the dependent variable, and we have historical or observed data on these variables as well as
on the dependent variable y.
Our task is to estimate the coefficients β_1, β_2 etcetera up to
β_p. For every variable, y, x_1, x_2 etcetera, we have n observations
available; let us say that for runoff at a particular location we have all the values for the
last 30 years. Similarly, concurrent values of rainfall are available, concurrent
values of catchment characteristics are available, and so on. So, for whatever
independent variables we use, we have observed values concurrent with those of the dependent variable.
From these observed values, because there are n of them,
we write n equations: for y_1, y_2 etcetera up to y_n. What do we mean by
that? y_1 may be the runoff during the first month of the first year, y_2 the runoff during the
second month of the first year, and so on. If you have 50 years
of monthly data, you have 12 × 50 = 600 values, so n becomes 600. Corresponding
to each of these values we also have observed values of the independent variables: x_11
is the first value of the first variable, x_21 the second value of the first variable, x_n1 the n-th
value of the first variable; similarly x_12 is the first value of the second variable, x_22 the second
value of the second variable, and so on, up to x_1p ... x_np for the p-th variable.
So, if you are looking at 50 years of observed runoff data, you
have 600 values of monthly runoff; x_11 is the first value of the first variable,
say rainfall, x_21 the second value of rainfall, and x_600,1 the six hundredth value of rainfall;
x_12 is the first value of, say, temperature, x_22 the second value of temperature, and so on. In
this way you get the values for the different variables and write n equations
of this form. What we need now is to obtain the p
parameters β_1, β_2 etcetera up to β_p from this set of equations.
n must be at least equal to p; p, the number of parameters, is generally much
smaller than n. A general guideline is that n should be at least three to four times
larger than p. In our hydrologic applications we may typically be talking about
two or three parameters, whereas n, the number of data
values, is generally significantly larger.
We can write this more elegantly using summation notation: the i-th equation is
y_i = Σ_{j=1}^{p} β_j x_ij. We then use
matrix notation, because converting the set
of equations into matrix form makes it convenient to derive the expressions for the constants
β_1, β_2 etcetera up to β_p. Look at the vector Y, which has n values:
it is a column vector of size n × 1. This is equal to X, in which each
of the p variables has n values, so X is a matrix of size n × p, multiplied by capital β,
which I write as B: a column vector with p rows and one column. The product X B is then
n × 1, matching Y. So, we write Y = X B, where capital B denotes
the vector of β's.
This is the expression we use as the regression relationship. Written out
in long form, Y = [y_1 ... y_n]' is an n × 1 vector, X is an n × p matrix, and
B = [β_1 ... β_p]' is a p × 1 vector. Remember that Y contains the observed values of the
dependent variable, X contains the observed values of the independent variables, and B
contains the parameters β_1, β_2 etcetera up to β_p.
When we use these equations we obtain the predicted values. The
predicted value from the same equation is ŷ_i = Σ_{j=1}^{p} β̂_j x_ij; that
is, when we use the fitted expression we get the predicted value corresponding to each observation.
So, there is an observed value and a predicted value, and therefore the error is e_i = y_i − ŷ_i.
This is the error for the i-th value.
We then consider the sum of squared errors. In matrix notation, Y holds the observed values
and the predicted values are X B̂. The error vector is E = Y − X B̂, the difference between the
observed and estimated values, and since we are squaring, we take E'E, where the prime denotes
the transpose: Σ e_i² = E'E = (Y − X B̂)'(Y − X B̂). Here Y is the observed value and X B̂ is
the predicted value, so this is the sum of squared errors. We differentiate the sum of squared
errors with respect to the vector B and equate it to 0. Because we are working in matrix
notation, we do not need a separate condition for each j, as we would if we differentiated
with respect to the individual β_j's. Differentiating and equating to 0, we get
−2 X'(Y − X B̂) = 0, which we write as X'Y = X'X B̂.
This is what we get from the requirement that d/dB Σ e_i² = 0, with Σ e_i² written in matrix form as shown.
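A worked version of the differentiation step, using the standard matrix-calculus identities ∂(B'c)/∂B = c and ∂(B'AB)/∂B = 2AB for a symmetric matrix A (here A = X'X); the cross terms combine because Y'XB is a scalar and therefore equals its transpose B'X'Y.

```latex
\begin{aligned}
S(B) &= (Y - XB)'(Y - XB) = Y'Y - 2\,B'X'Y + B'X'X\,B,\\
\frac{\partial S}{\partial B} &= -2\,X'Y + 2\,X'X\,B = -2\,X'(Y - XB),\\
\left.\frac{\partial S}{\partial B}\right|_{B=\hat B} = 0
  \;&\Longrightarrow\; X'X\,\hat B = X'Y .
\end{aligned}
```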
Now look at this expression, X'Y = X'X B̂. We are interested
in getting B̂, so we bring this expression to a more convenient
form by pre-multiplying both sides by (X'X)^(-1). Why? Because on the right-hand side
I have X'X, and multiplying by its inverse turns it into the identity matrix. Pre-multiplying both
sides by (X'X)^(-1), I get (X'X)^(-1) X'Y = (X'X)^(-1) X'X B̂, from which
B̂ = (X'X)^(-1) X'Y.
The capital B here indicates that β is a vector.
X'X is a p × p matrix, and its rank must be p for it to be invertible; since we need
(X'X)^(-1), its rank must be p. To form X'X, we take the transpose of the n × p matrix X,
which is p × n, and multiply it by the original n × p matrix. Its elements are sums over the
observations: the (1,1) element is Σ_{i=1}^{n} x_i1², the (2,1) element is Σ_{i=1}^{n} x_i2 x_i1,
and in general the (j,k) element is Σ_{i=1}^{n} x_ij x_ik. These are the terms
that you get for X'X. Once you have the X'X matrix you can obtain B̂, which
is given by (X'X)^(-1) X'Y.
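A minimal NumPy sketch of B̂ = (X'X)^(-1) X'Y. Rather than forming the inverse explicitly, it solves the normal equations X'X B̂ = X'Y with `numpy.linalg.solve`, which gives the same B̂ when X'X has full rank p and is numerically preferable; `numpy.linalg.matrix_rank` checks the rank condition mentioned above. The function name and the made-up, noise-free test data are illustrative only.

```python
import numpy as np


def fit_multiple_regression(X, Y):
    """Least-squares estimate B_hat solving X'X B = X'Y (X is n x p, Y has length n)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    p = X.shape[1]
    XtX = X.T @ X                      # p x p matrix of sums x_ij * x_ik
    if np.linalg.matrix_rank(XtX) < p:
        raise ValueError("X'X is singular: its rank must equal p")
    XtY = X.T @ Y                      # p-vector of sums x_ij * y_i
    return np.linalg.solve(XtX, XtY)


# Made-up data generated exactly from y = 1.5*x1 - 2*x2 + 0.5*x3 (no noise).
X = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 1.0],
              [0.0, 3.0, 2.0],
              [4.0, 1.0, 3.0],
              [3.0, 2.0, 1.0]])
Y = X @ np.array([1.5, -2.0, 0.5])
print(np.round(fit_multiple_regression(X, Y), 6))
```

With noise-free data the fit recovers the generating coefficients; with real hydrologic data B̂ is the least-squares compromise across all n equations.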
Recall that in simple linear regression we also measured the goodness of
fit of the line by R², which is the ratio of the sum of squares due to regression
to the sum of squares about the mean. We get the same in multiple regression. We define
R² = (B̂'X'Y − n ȳ²) / (Y'Y − n ȳ²), where ȳ is the mean of the observed y data
(this form assumes the model includes an intercept term). X'Y we have already computed
while estimating B̂; ȳ is the mean of the n × 1 column vector Y of observations; and Y'Y
is simply the sum of squares of the observed values, Σ y_i². R² is a scalar, say 0.8,
0.9 and so on. It indicates the goodness of fit in some sense: the higher the R²
value, the better the fit using those parameters. So, we now know how to estimate B̂,
the set of parameters, and how to measure the goodness of fit in terms of
R². This completes the expressions for multiple linear regression. We use multiple
linear regression when there are multiple independent variables all affecting a single
dependent variable. You have a single dependent variable y and independent variables
x_1, x_2 etcetera up to x_p, and the expression ŷ_i = β̂_1 x_i1 + β̂_2 x_i2 + ... + β̂_p x_ip.
Like this we formulate the expressions and then estimate β_1, β_2 etcetera up to β_p.
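A minimal sketch of the R² formula above, computed both as (B̂'X'Y − nȳ²)/(Y'Y − nȳ²) and as 1 − SSE/SST to show that they agree; the equivalence relies on X containing an intercept column of ones. The function name and the noisy data are made up for illustration.

```python
import numpy as np


def r_squared(X, Y, B_hat):
    """R^2 = (B'X'Y - n*ybar^2) / (Y'Y - n*ybar^2); assumes X has an intercept column."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    y_bar = Y.mean()
    regression_ss = B_hat @ X.T @ Y - n * y_bar**2   # sum of squares due to regression
    total_ss = Y @ Y - n * y_bar**2                  # sum of squares about the mean
    return regression_ss / total_ss


# Made-up data: intercept column, one predictor, and a noisy response.
X = np.column_stack([np.ones(6), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
residuals = Y - X @ B_hat
alt = 1 - (residuals @ residuals) / ((Y - Y.mean()) @ (Y - Y.mean()))
print(r_squared(X, Y, B_hat), alt)   # the two values agree
```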
To understand this better, we will solve a simple example with only two
independent variables; the procedure generalizes to any number p of independent variables.
We look at the mean annual flood in watersheds. What I mean is, let
us say we have collected data from several watersheds on the maximum discharge that has
occurred in the past. We pick up the maximum discharge and want to
relate it to the area of the watershed and the rainfall. Remember that these are different
watersheds, and these are the rainfalls that caused the discharges in the respective
watersheds. For example, a 43 centimeter rainfall on a watershed of 324 hectares
has caused a peak discharge of 0.44 cubic meters per second, and so on.
That is how these data are to be interpreted.
So, for these different watersheds we are trying to relate the peak discharge to the area of the watershed and the rainfall in centimeters. The rainfall and the area are the independent variables, and the peak discharge is the dependent variable. We formulate the expression $Q = \beta_1 + \beta_2 A + \beta_3 R$, where Q is the peak discharge in cubic meters per second, A is the area in hectares and R is the rainfall in centimeters. So, Q is the dependent variable and A and R are the independent variables.
So, I will express Q as a function of A and R. To obtain an intercept, we make the first variable equal to one. In the long form we wrote $\beta_1 x_1 + \beta_2 x_2 + \dots$; if you set $x_1 = 1$ for all n values, the first term is simply $\beta_1$, which is the intercept. So we get $Q = \beta_1 + \beta_2 A + \beta_3 R$, and the task now is to obtain the values of $\beta_1$, $\beta_2$ and $\beta_3$. We express this in our usual notation, with Y as the dependent variable, X as the matrix of independent variables and B, the capital beta, as the parameter vector. These are the observed values: this is y, this is $x_1$ (the area) and this is $x_2$ (the rainfall), and that is what we write here.
These are the observed values of y and, as I said, we make the first variable 1. So, we are now essentially writing in terms of three variables: the first variable is 1, so that $\beta_1$ is the intercept itself, the second variable is the area in this case, and the third variable is the rainfall. So we write $Y = XB$, where Y is 12 by 1, X is 12 by 3 and B is 3 by 1.
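A small sketch of how the observations might be arranged into the vector Y and the matrix X, with the first column of X fixed at 1 so that $\beta_1$ acts as the intercept. The helper names are hypothetical, and apart from the 324 ha, 43 cm, 0.44 m³/s watershed quoted in the lecture, the rows used here are placeholders rather than the lecture's data.

```python
def design_matrix(area_ha, rain_cm):
    """Build the n x 3 matrix X whose rows are [1, A_i, R_i].

    The leading 1 is the "first variable" set to one, so that its
    coefficient beta_1 becomes the intercept of Q = b1 + b2*A + b3*R.
    """
    if len(area_ha) != len(rain_cm):
        raise ValueError("A and R must have the same number of observations")
    return [[1.0, float(a), float(r)] for a, r in zip(area_ha, rain_cm)]


def column_vector(values):
    """Arrange the n observed peak discharges as an n x 1 column vector Y."""
    return [[float(v)] for v in values]


# Only the first watershed (324 ha, 43 cm, 0.44 m^3/s) is from the lecture;
# the other two rows are hypothetical placeholders to show the shapes.
X = design_matrix([324, 500, 150], [43, 60, 30])
Y = column_vector([0.44, 0.70, 0.20])
print(len(X), len(X[0]), len(Y), len(Y[0]))  # prints 3 3 3 1
```

With the lecture's twelve watersheds the same construction gives the 12 by 3 X and 12 by 1 Y.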
So, looking at this expression, we need $\hat{B} = (X'X)^{-1}X'Y$. First, let us obtain each of these matrices independently, starting with $X'X$. When you do this multiplication, each element can be written in summation form: the first element is $\sum_{i=1}^{n} x_{i1}^2$, the next is $\sum_{i=1}^{n} x_{i1} x_{i2}$, where you multiply the second variable by the first, then $\sum_{i=1}^{n} x_{i1} x_{i3}$, where you multiply the third variable by the first, and so on.
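To see that the summation form and the matrix product give the same $X'X$, here is a sketch with a made-up design matrix whose first column is all ones; the function names are illustrative. The first row comes out as n followed by the column sums, which is why an $X'X$ built with a column of ones starts with the number of observations.

```python
def xtx_by_sums(X):
    """Entry (j, k) of X'X is the sum over observations i of x_ij * x_ik."""
    p = len(X[0])
    return [[sum(row[j] * row[k] for row in X) for k in range(p)]
            for j in range(p)]


def xtx_by_product(X):
    """The same matrix formed as (X transpose) times X."""
    cols = list(zip(*X))  # columns of X = rows of X transpose
    return [[sum(a * b for a, b in zip(cj, ck)) for ck in cols] for cj in cols]


# Made-up design matrix: a column of ones, then two variables.
X = [[1, 2, 3],
     [1, 4, 5],
     [1, 6, 2]]
print(xtx_by_sums(X))  # [[3, 12, 10], [12, 56, 38], [10, 38, 38]]
```

A spreadsheet does the same thing column by column; the loop form simply makes each summation explicit.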
Like this you get the summations; you can use any spreadsheet program to obtain them and form the $X'X$ matrix. For this particular case, $X'X$ begins with the values 12, 5277 and so on; the first entry is 12 because the first variable is 1 for all 12 watersheds, and the second entry is the sum of the areas. Then we take the inverse, $(X'X)^{-1}$; you can use a MATLAB program to obtain it.
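The lecture uses MATLAB for the inverse. As a language-neutral sketch, Gauss-Jordan elimination on the augmented matrix $[X'X \mid I]$ gives $(X'X)^{-1}$; the function below and its absolute singularity tolerance of 1e-12 are assumptions for illustration, and the test matrix is arbitrary rather than the lecture's $X'X$.

```python
def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination on [a | I]."""
    n = len(a)
    m = [[float(v) for v in a[i]] + [1.0 if j == i else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:  # absolute, hence scale-dependent, check
            raise ValueError("matrix is singular (or nearly so)")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]  # right half now holds the inverse


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[2, 1, 1], [1, 3, 2], [1, 0, 0]]
A_inv = inverse(A)
check = matmul(A, A_inv)  # should be (numerically) the identity
```

A singular $X'X$, for example when one independent variable is an exact multiple of another, has no inverse, which is one symptom of the dependence among variables discussed later in this lecture.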
Then you get $X'Y$. Similar to $X'X$, it can be expressed in terms of summations: the first element is $\sum_{i=1}^{n} y_i$, the second is $\sum_{i=1}^{n} x_{i2} y_i$ and the third is $\sum_{i=1}^{n} x_{i3} y_i$. In this case, $X'Y$ turns out to be (8.06, 10642, 417).
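The same summation idea gives $X'Y$; a minimal sketch with made-up numbers follows (the function name is illustrative). Since the first column of X is all ones, the first element of $X'Y$ is $\sum y_i$, so the lecture's 8.06 divided by n = 12 gives the mean $\bar{y} \approx 0.672$.

```python
def xty_by_sums(X, y):
    """Entry j of X'Y is the sum over observations i of x_ij * y_i."""
    p = len(X[0])
    return [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]


# Made-up data with a leading column of ones.
X = [[1, 2, 3],
     [1, 4, 5],
     [1, 6, 2]]
y = [1, 2, 4]
print(xty_by_sums(X, y))  # [7, 34, 21]; the first entry is sum(y)

# With the lecture's quoted first entry of X'Y and n = 12 watersheds:
ybar_lecture = 8.06 / 12  # about 0.672
```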
Once you have both $(X'X)^{-1}$ and $X'Y$, you get $\hat{B}$, the parameter vector, as $\hat{B} = (X'X)^{-1}X'Y$. Here $(X'X)^{-1}$ is a 3 by 3 matrix and $X'Y$ is a 3 by 1 vector, so you get a 3 by 1 vector. So, $\hat{B}$ is obtained as $\beta_1 = 0.0351$, $\beta_2 = 0.0014$ and $\beta_3 = 5.0135 \times 10^{-5}$.
So, in terms of Q, A and R, the expression is $Q = \beta_1 + \beta_2 A + \beta_3 R$, where R is the rainfall and A is the area in hectares. From this expression we can estimate Q for any given A and R. Now, using this expression, let us examine the errors we get for the individual observations.
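Once $\hat{B}$ is known, the fitted equation can be evaluated for any A and R. This sketch hard-codes the rounded coefficients as they are read from the lecture; taking $\beta_3 = 5.0135 \times 10^{-5}$ is an interpretation of the spoken value, so treat the output as approximate. The function name is hypothetical.

```python
# Rounded coefficients as read from the lecture; beta_3 = 5.0135e-5 is an
# interpretation of the spoken value, so results are approximate.
BETA_1 = 0.0351     # intercept, m^3/s
BETA_2 = 0.0014     # m^3/s per hectare of watershed area
BETA_3 = 5.0135e-5  # m^3/s per centimetre of rainfall


def predict_peak_discharge(area_ha, rain_cm):
    """Hypothetical helper: Q = beta_1 + beta_2 * A + beta_3 * R, in m^3/s."""
    return BETA_1 + BETA_2 * area_ha + BETA_3 * rain_cm


q = predict_peak_discharge(324, 43)
print(round(q, 4))  # about 0.4909
```

For the 324 ha, 43 cm watershed this gives about 0.49 m³/s against the observed 0.44 m³/s.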
Remember that when we apply the regression equations we are interested in the overall error, which is why we look for the best fit, the one that minimizes the sum of squared errors. But let us also apply this equation to the actual data and see how much error we are getting. This is the observed value and this is the predicted value; by predicted value I mean the value I get by applying this equation to the given A and R. And these are the errors in the peak discharge Q that we get if we use the expression we just derived.
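The individual errors and the sum of squared errors that least squares minimizes can be sketched as follows. The names are hypothetical; the single observed/predicted pair reuses the 0.44 m³/s watershed with a prediction of about 0.4909 m³/s (obtained from the rounded coefficients as read from the lecture, so only approximate), and the second call uses toy numbers.

```python
def residuals(observed, predicted):
    """Error for each observation: observed value minus predicted value."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [o - p for o, p in zip(observed, predicted)]


def sum_of_squared_errors(errors):
    """Overall criterion that least squares makes as small as possible."""
    return sum(e * e for e in errors)


# Observed 0.44 m^3/s vs. about 0.4909 m^3/s predicted (approximate).
errs = residuals([0.44], [0.4909])
# Toy errors, only to show the squaring and summing.
sse = sum_of_squared_errors([1.0, -2.0, 0.5])
print(round(errs[0], 4), sse)
```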
Next, we look at the goodness of fit, that is, how good this particular regression is. We obtain $R^2$ as $R^2 = \frac{B'X'Y - n\bar{y}^2}{Y'Y - n\bar{y}^2}$, where $\bar{y}$ is the mean of the observed values of the dependent variable; in this case it is 0.672, and n is 12, the number of values, so you can compute $n\bar{y}^2$. We have obtained B as the vector beginning with 0.0351, and B' is its transpose. $X'Y$ we obtained earlier, and $Y'Y$ is simply Y transpose times Y, which comes out to be a scalar quantity, 15.77. So we know all of these values: multiplying B' and $X'Y$ gives 15.64, and $n\bar{y}^2$ gives 5.42. So $R^2$ turns out to be about 0.99. Remember, the closer $R^2$ is to one, the better the fit. So, in this particular case a good linear fit exists between the peak discharge, the area of the watershed and the rainfall in the watershed. This is how we fit multiple linear regression equations.
Typically these situations arise when we are dealing with a number of variables. For example, the rainfall at a particular location may depend on several climatic variables: it may depend on the mean sea level pressure, on the geopotential height, on the land pressure, on the wind speed and so on. We relate hydrologic variables to climatic variables quite often when we are dealing with climate change impacts.
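The $R^2$ arithmetic for the watershed example can be reproduced from the rounded values quoted in the lecture (n = 12, $\bar{y}$ = 0.672, $B'X'Y$ = 15.64, $Y'Y$ = 15.77). Because those inputs are rounded, this sketch is a consistency check rather than an exact recomputation.

```python
n = 12          # number of watersheds
ybar = 0.672    # mean of the observed peak discharges (8.06 / 12)
bxy = 15.64     # B'X'Y, as quoted
yty = 15.77     # Y'Y, as quoted

n_ybar_sq = n * ybar ** 2                    # n * ybar^2
r2 = (bxy - n_ybar_sq) / (yty - n_ybar_sq)   # goodness of fit
print(round(n_ybar_sq, 2), round(r2, 3))
```

It gives $n\bar{y}^2 \approx 5.42$ and $R^2 \approx 0.987$, which rounds to the 0.99 quoted.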
We need to relate the hydrologic variables to the climate variables, and when we start looking at the climatic variables, there are a large number of them affecting the hydrologic variables. Therefore, we need statistical relationships between the hydrologic variables and the climate variables, and that is where we typically use multiple linear regression, along with some non-linear transforms. Towards the end of this course, I will hopefully give some background on downscaling of climatic variables; in that lecture we will discuss how we use multiple linear regression to relate the rainfall at a particular location to the climatic variables.
So, these are the situations where we use multiple linear regression, and this is the background for it: the way we obtain the parameters and the way we assess how good the fit is. Now a question arises when we are dealing with a large number of variables in multiple linear regression. Let us say p is quite large, say ten, fifteen or twenty variables, and some of them may be correlated with each other. For example, one of the variables you are using may be soil moisture and another may be evapotranspiration. Evapotranspiration and soil moisture cannot be treated as independent: evapotranspiration in fact depends on the soil moisture, and the more evapotranspiration takes place, the more the soil moisture is depleted. So, they are mutually dependent, and if you use such variables there is a correlation between them. Also, you would not like to handle very large problem sizes. What do I mean by that? Let us say you have fourteen or fifteen variables and each of them has data for hundreds of years; then the size of the problem you have to deal with in the regression will be quite large. So, first to handle the dependence among the variables and next to reduce the size of the problem, we devise certain methods that make both possible. One of the methods that I will be discussing in the next lecture is principal component analysis, a very powerful method for multiple linear regression problems where you are interested in reducing the size of the problem as well as addressing the dependence among several variables.
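A quick way to see the kind of dependence described here, such as that between soil moisture and evapotranspiration, is to look at the pairwise correlation coefficients among the independent variables before fitting. This is a simple diagnostic sketch, not part of the lecture's procedure and not principal component analysis itself; the function names are illustrative and the series are hypothetical numbers rather than observed data.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise correlations among independent variables (one list each)."""
    return [[pearson_r(a, b) for b in columns] for a in columns]


# Hypothetical series: x2 is an exact linear function of x1, x3 is not.
x1 = [1, 2, 3, 4, 5]
x2 = [2 * v + 1 for v in x1]
x3 = [3, 1, 4, 1, 5]
C = correlation_matrix([x1, x2, x3])
for row in C:
    print([round(v, 3) for v in row])
```

A correlation close to +1 or -1 between two independent variables signals that they carry largely the same information.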
So, in today's lecture we essentially saw how to obtain the design rainfall hyetographs starting from the IDF relationships. From the IDF relationships you obtain the intensity, and then we use the alternating block method to distribute this rainfall across the design duration. We then went on to the next topic, multiple linear regression: starting with linear regression, we introduced several independent variables into the regression expression, saw a method by which we obtain the parameters $\beta_1, \beta_2, \dots, \beta_p$, and also saw how to obtain $R^2$, which gives the goodness of fit for the equation. In the next lecture we will continue this discussion on multiple linear regression and introduce principal component analysis, by which we will address the dependence among several variables and also the issue of the size of the problem itself: can we reduce the size of the regression equations? So, thank you very much for your attention; we will continue the discussion next time.