Mod - 07 lec - 30 multiple linear regression

Good morning and welcome to lecture number 30 of the course Stochastic Hydrology. In the previous lectures we discussed intensity-duration-frequency (IDF) relationships. These are used in flood control design, typically for urban storm drainage, where we pick the design intensity for a given return period and a particular design duration. We also saw the methods for constructing IDF relationships, and we worked through the example of Bangalore city rainfall, constructing the IDF relationships for a given rainfall data series. Today we will take this discussion forward and look at what we do with the design intensities obtained from the IDF relationships. We fix a return period based on the type of design we want to make. Then, for that return period and a given design duration, we pick the rainfall intensity from the IDF relationship. Remember that IDF relationships are obtained using extreme value distributions; typically we used the Gumbel extreme value type I distribution. Starting from the rainfall intensity, we now need to form what are called design hyetographs, which give the distribution of the rainfall over the design duration. So do not lose sight of what we are doing. From the design duration we went to the IDF relationship and, for the given return period, picked the rainfall intensity. Now we come back and distribute that rainfall over the design duration, because the intensity we picked is, in fact, the maximum average intensity for that duration. How this rainfall is distributed over the design duration is what we will look at now.
This is the procedure by which we construct design hyetographs. Those of you who have taken a basic course in hydrology will know the difference between a hyetograph and a hydrograph. A hyetograph is the time distribution of rainfall intensity, whereas a hydrograph shows discharge versus time. We start with the IDF relationship and obtain the design hyetograph corresponding to the design intensity. These hyetographs are in fact used to construct hydrographs, and from the hydrographs we obtain the design dimensions. So, from the IDF curves we now want to obtain design precipitation hyetographs, or simply design hyetographs. As I mentioned, a hyetograph typically shows rainfall intensity (or depth) on the y axis and time on the x axis. We use what is called the alternating block method, which is one of the methods for developing a design hyetograph from an IDF curve or IDF relationship. It specifies the precipitation depth occurring in n successive time intervals, each of duration Δt: say 10 minutes, 10 minutes, 10 minutes and so on. So, over equal intervals Δt, it tells us how the precipitation depth is distributed in time over the total duration of the rainfall event. We are talking about a storm of some total duration, say 1 hour or 50 minutes, and we have obtained the maximum or design intensity of the rainfall from the IDF relationship. How we distribute this over the duration of the rainfall is what we will see now, using the alternating block method.
In this method, for each duration T_d we know the design intensity i from the IDF relationship. From it we get the precipitation depth for that duration as P = i × T_d, where i has units of length per time (for example millimetres per hour) and T_d is the duration. As we progress in time, say through durations of 10, 20, 30 minutes and so on, we get the depth associated with each duration and then distribute these depths over the intervals Δt. The precipitation occurring within a given interval is simply the cumulative depth at the end of that interval minus the cumulative depth at its beginning. In this way we get the incremental precipitation for each interval: a certain depth in the first 10 minutes, another depth in the next 10 minutes and so on. We then arrange these incremental depths so that the maximum depth occurs at the centre of the total duration. Suppose the total duration is one hour and Δt is 10 minutes, so the intervals end at 10, 20, 30, 40, 50 and 60 minutes. We obtain the incremental depths from the cumulative depths at successive durations (the depth at T_d2 minus the depth at T_d1, and so on). We then place the maximum value at the centre of the total rainfall duration. Taking the remaining values in decreasing order, we place them alternately on either side of the centre: one value on one side, the next value on the other side, and so on. As a result, the depths keep decreasing in both directions.
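The arrangement step just described can be sketched in Python as below, assuming the incremental depths have already been computed. This is a minimal illustration, not the lecture's own code, and the function name `alternating_block` is hypothetical. Two conventions are assumptions that the lecture does not fix explicitly. First, for an even number of intervals the largest block goes to index (n - 1) // 2, the interval ending at the midpoint of the storm. Second, the second-largest block goes immediately to its right, the third to its left, and so on.

```python
def alternating_block(increments):
    """Arrange incremental depths by the alternating block method.

    The largest depth is placed in the central interval. The remaining depths,
    in decreasing order, are placed alternately to the right and to the left.
    Assumed convention: for an even count the centre is index (n - 1) // 2.
    """
    n = len(increments)
    ordered = sorted(increments, reverse=True)  # largest depth first
    centre = (n - 1) // 2
    hyetograph = [0.0] * n
    for k, depth in enumerate(ordered):
        if k == 0:
            pos = centre                  # peak block at the centre
        elif k % 2 == 1:
            pos = centre + (k + 1) // 2   # odd ranks go to the right
        else:
            pos = centre - k // 2         # even ranks go to the left
        hyetograph[pos] = depth
    return hyetograph


print(alternating_block([5, 4, 3, 2, 1]))  # [1, 3, 5, 4, 2]
```

Swapping both conventions (peak at index n // 2, second block to the left) produces the mirror image of the same pattern. Either way the result is an alternating-block hyetograph, so it is worth checking which convention a given design manual uses.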
So, for a 1-hour storm with intervals ending at 10, 20, 30, 40, 50 and 60 minutes, we put the maximum precipitation value around the 30-minute mark and let the values decrease in both directions. That is how we obtain the design precipitation hyetograph. Let us look at an example to drive home this very simple procedure. Just do not forget that the basis for all of this is the design duration, from which we pick the design intensity. Consider a 2-hour storm in 10-minute increments for Bangalore city, using the same rainfall data discussed earlier, with a 10-year return period. So we have 10-minute intervals, a total storm duration of 120 minutes and a return period of 10 years. You could pick the intensities from the IDF relationship derived in the previous lecture for the 10-year return period and the required durations (10 minutes up to 2 hours). Here, however, we will use the empirical relationship i = K T^a / (t + b)^n, which I discussed in the previous lecture, with the constants given for Bangalore city, exactly as we did in the previous class. For T = 10 years and t = 10 minutes, which is 0.167 hours (remember that t in this formula must be in hours), we get i = 13.251 cm/h. Note that the units here are centimetres per hour, because this is an empirical relationship, so we must always be careful about the units we use.
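The empirical IDF formula can be evaluated directly, as sketched below. The constants K = 6.275, a = 0.126, b = 0.5 and n = 1.128 are assumed values for Bangalore. They come from the previous lecture and are not restated here, but with them the function reproduces the 13.251 cm/h quoted above. The function name `idf_intensity` is hypothetical.

```python
# Assumed Bangalore constants for i = K * T**a / (t + b)**n (i in cm/h, t in hours).
# They are not restated in this lecture; substitute the values from the previous one.
K, A_EXP, B_HOURS, N_EXP = 6.275, 0.126, 0.5, 1.128


def idf_intensity(return_period_years, duration_hours,
                  k=K, a=A_EXP, b=B_HOURS, n=N_EXP):
    """Empirical IDF intensity i = k * T**a / (t + b)**n in cm/h; t must be in hours."""
    return k * return_period_years ** a / (duration_hours + b) ** n


# 10-year return period, 10-minute duration (10/60 h, i.e. about 0.167 h).
i10 = idf_intensity(10, 10 / 60)
print(f"i = {i10:.3f} cm/h")
```

Passing the duration in minutes instead of hours (10 instead of 10/60) silently returns about 0.59 cm/h. This is exactly the unit trap mentioned above.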
Starting with this intensity: we took the 10-minute duration and obtained its intensity, 13.251 cm/h. Similarly, we obtain the intensity for the 20-minute duration, and so on; that is, we calculate the intensity for every duration in steps of 10 minutes. The precipitation depth is then simply intensity multiplied by duration: 13.251 cm/h × 0.167 h = 2.208 cm. One point to note here. When you go to the 20-minute duration, you put t = 20 minutes in the formula. The depth does not scale uniformly with duration, because the expression is non-linear in t. So you keep putting t = 10, 20, 30 minutes and so on, obtain the intensities, and from them the accumulated precipitation depths. For 10 minutes we got a depth of 2.208 cm, which means the precipitation in the first 10 minutes is 2.208 cm. For t = 20 minutes, multiplying the 20-minute intensity by the 20-minute duration gives a depth of 3.434 cm. So the total rainfall during the first 20 minutes is 3.434 cm. However, 2.208 cm has already occurred in the first 10 minutes, so the rainfall in the second 10 minutes is 3.434 - 2.208 = 1.226 cm. In this way we keep computing the incremental rainfall that occurs during every 10-minute interval.
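The cumulative-to-incremental step can be tabulated for the whole 2-hour storm as below. The sketch uses the same assumed Bangalore constants (not restated in this lecture) and hypothetical function names. Durations run from 10 to 120 minutes in 10-minute steps. Each cumulative depth is intensity times duration, and each incremental depth is the difference between successive cumulative depths. With these constants the first cumulative depths come out as 2.208, 3.434, 4.194 and 4.699 cm, matching the values quoted in the text.

```python
K, A_EXP, B_HOURS, N_EXP = 6.275, 0.126, 0.5, 1.128  # assumed Bangalore constants


def idf_intensity(T, t_hours):
    """i = K * T**a / (t + b)**n, in cm/h for t in hours."""
    return K * T ** A_EXP / (t_hours + B_HOURS) ** N_EXP


def depth_table(T, storm_minutes, step_minutes):
    """Cumulative and incremental depths (cm) for durations step, 2*step, ..., storm."""
    durations = list(range(step_minutes, storm_minutes + 1, step_minutes))
    # Cumulative depth for each duration: intensity (cm/h) x duration (h).
    cumulative = [idf_intensity(T, d / 60) * d / 60 for d in durations]
    # Depth falling within each interval: difference of successive cumulative depths.
    incremental = [c - prev for prev, c in zip([0.0] + cumulative[:-1], cumulative)]
    return durations, cumulative, incremental


durations, cumulative, incremental = depth_table(T=10, storm_minutes=120, step_minutes=10)
for d, c, inc in zip(durations, cumulative, incremental):
    print(f"{d:4d} min  cumulative {c:6.3f} cm  incremental {inc:6.3f} cm")
```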
This gives us the table. The intensities are 13.25 cm/h for the 10-minute duration, about 10.30 cm/h for 20 minutes, and so on. We obtain them from the empirical relationship by setting the duration T_d = 10, 20, 30 minutes and so on. Corresponding to these intensities we get the cumulative depths for durations of 10, 20, 30 minutes and so on. From the cumulative depths we get the incremental depths, that is, what happened during each 10-minute interval. The incremental depth is 2.208 cm for 0 to 10 minutes, 1.226 cm for 10 to 20 minutes, and so on. Each is the difference between successive cumulative depths; for example, the interval from 30 to 40 minutes gets 4.699 - 4.194. Once we have the incremental depths, we look at the centre of the duration and place the maximum incremental depth, here 2.208 cm, there. The next value, 1.226 cm, goes next to it, and the next largest value, 0.760 cm, goes on the other side. We continue distributing the values one by one on either side of the maximum until we reach the end points of the duration. To repeat: the maximum value is placed at the centre, and the remaining values are distributed in decreasing order alternately on either side of it. This is how you get the hyetograph. For the first interval, 0 to 10 minutes, the precipitation is 0.069 cm. As you can see, the precipitation depth slowly increases, reaches the maximum and then decreases again towards the minimum.
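Putting the steps together, the sketch below goes from the empirical IDF formula to the arranged design hyetograph for the 10-year, 2-hour Bangalore storm. It rests on the same assumptions as before. The Bangalore constants are assumed (not restated here) and the helper names are hypothetical. For an even number of blocks, the peak is placed in the interval ending at the storm's midpoint, with the second-largest block to its right. With these choices the peak (2.208 cm) falls in the 50 to 60 minute interval and the first interval receives about 0.069 cm, consistent with the numbers quoted in the lecture.

```python
K, A_EXP, B_HOURS, N_EXP = 6.275, 0.126, 0.5, 1.128  # assumed Bangalore constants


def idf_intensity(T, t_hours):
    """Empirical IDF intensity in cm/h (t in hours)."""
    return K * T ** A_EXP / (t_hours + B_HOURS) ** N_EXP


def alternating_block(increments):
    """Peak in the centre, remaining blocks alternately right and left."""
    n = len(increments)
    centre = (n - 1) // 2  # assumed centre convention for an even number of blocks
    blocks = [0.0] * n
    for k, depth in enumerate(sorted(increments, reverse=True)):
        if k == 0:
            pos = centre
        elif k % 2 == 1:
            pos = centre + (k + 1) // 2   # right of the peak
        else:
            pos = centre - k // 2         # left of the peak
        blocks[pos] = depth
    return blocks


def design_hyetograph(T, storm_minutes, step_minutes):
    """IDF formula -> cumulative depths -> incremental depths -> arranged blocks (cm)."""
    durations = range(step_minutes, storm_minutes + 1, step_minutes)
    cumulative = [idf_intensity(T, d / 60) * d / 60 for d in durations]
    incremental = [c - p for p, c in zip([0.0] + cumulative[:-1], cumulative)]
    return alternating_block(incremental)


hyetograph = design_hyetograph(T=10, storm_minutes=120, step_minutes=10)
for j, depth in enumerate(hyetograph):
    print(f"{10 * j:3d}-{10 * (j + 1):3d} min: {depth:.3f} cm")
```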
That is how you obtain the design hyetograph. The method is called the alternating block method because we alternate, placing one value on one side, the next on the other side, and so on. It is by far the simplest method of distributing the design intensity obtained from the IDF relationship. The resulting hyetograph has precipitation in centimetres on the y axis and time in minutes on the x axis, over a total duration of 2 hours. The maximum value, 2.208 cm, occurs in the interval ending at 60 minutes, and the remaining values are distributed around it. This is the design hyetograph that you will use, corresponding to the design intensity of the rainfall. So this is what we do with the IDF relationships. Let us quickly recapitulate the significance of the IDF relationship, how we obtain it and what it is used for. IDF relationships are formed using extreme value distributions, which means we are looking at maximum values: maximum precipitation depths, maximum flood volumes, peak flood discharges and so on. Through some procedure, these maximum values are used to construct hydrographs. The hydrographs are then routed through channel or river systems to get hydrographs at various locations. Essentially any hydrologic design requires hydrographs and the peak flows associated with them. From the IDF relationship, however, you do not get flows; you get the design intensity of the rainfall. We fix the design storm duration and the return period, and for these we pick the intensity.
We now also know how to distribute the rainfall over the duration. Say the storm duration is one hour and you picked an intensity of 30 mm/h or some such value. You then use the alternating block method to construct the hyetograph, and from the hyetograph you can move to the hydrograph and so on. How to go from the hyetograph to the hydrograph is not within the scope of this course on stochastic hydrology, where we look only at how to use probability concepts in hydrologic design. We will therefore leave that part aside; it is covered in basic hydrology courses. To summarize: from the IDF relationship you get the peak rainfall intensities, and from these you construct the design hyetographs. From the hyetographs you move on to hydrographs, which you route through your channel networks to get your design criteria using the regular routing procedures. We have now completed the portion on IDF relationships, so for the time being we will leave them aside and return to some topics we discussed earlier. Some time ago, perhaps in lectures 21 and 22, we discussed dependent and independent variables in hydrology and how to formulate relationships between the two types of variables. For example, you may be considering the runoff at a particular location, which depends on the rainfall over the entire catchment. At that time we introduced the concept of linear regression, in which we build linear relationships between the dependent variable (runoff, in that case) and the independent variable (rainfall).
Typically we constructed equations of the type y = ax + b, where x is the independent variable and y is the dependent variable. For example, y can be the runoff at a location and x the rainfall in the catchment. That was simple linear regression: simple because there is only one independent variable and one dependent variable, and linear because we fit a linear equation. In hydrology, however, there are situations where the dependent variable depends not on one but on several independent variables. Take runoff itself. Suppose you are looking at a catchment and are interested in the runoff at a particular location. We said that the runoff depends on the rainfall falling on the catchment, so we used runoff as the dependent variable and rainfall as the independent variable. It is known from basic hydrology that runoff depends not only on rainfall but also on the soil conditions. These include the antecedent moisture content (how much rainfall has already fallen and how much soil moisture exists), the kind of vegetation, the slope, and the evapotranspiration that takes place. Simplistically, we related runoff to only one variable x, which justified the use of simple linear regression. Once we look at the details, however, we know that runoff depends on several other variables besides rainfall. For example, x_1 may be rainfall, x_2 soil moisture, x_3 temperature (insofar as it affects evapotranspiration), and so on.
x_4 may be the slope of the catchment itself, and so on. So y depends not on one variable but on several. We now generalize the simple linear regression introduced earlier, an equation of the form y = ax + b, to multiple linear regression. In multiple linear regression the dependent variable y depends on several variables x_1, x_2, x_3 and so on. We may write y = β_1 x_1 + β_2 x_2 + ... + β_p x_p, where there are p independent variables on which the dependent variable depends. So we are graduating from simple regression to multiple regression with p independent variables, but we still retain the linear structure of the regression equation. Because β_1 to β_p are all constants, this is a linear equation. Hence the name multiple linear regression: multiple because there are several independent variables, and linear because the equation is linear. As I mentioned, you have one dependent variable and several independent variables. We use all the independent variables to model the dependent variable, for example the runoff at a location, which depends on x_1, x_2, x_3, x_4 and so on. So we develop a regression equation for y in terms of x_1, x_2, ..., x_p. Let us first recapitulate how we did simple linear regression. We had the observed data on x and y.
We wanted to fit an equation of the type ŷ_i = a + b x_i, where the hat on y_i shows that it is a predicted value. The observed data x_i and y_i (say rainfall and runoff) are both available, and from them we want to fit the best-fit line. Our objective there was to obtain the parameters a and b. How did we do that? We formulated the error: y_i is the observed value and ŷ_i is the estimated value, so the error is observed minus estimated. We considered the squared errors, writing the sum of squared errors as M = Σ (y_i - ŷ_i)² = Σ (y_i - a - b x_i)². We minimized M with respect to the parameters a and b. To do this, we differentiated M with respect to a and with respect to b, set both derivatives equal to zero, and solved for a and b. Please look up your earlier lecture notes and slides, where we discussed simple linear regression along with some examples. When we go to multiple linear regression the principle remains the same, except that we now work with matrices and vectors. Y is the observed dependent variable: say the runoff at a particular location, for which you have an observed series of values. We also have x_1, x_2, x_3 and so on, the independent variables (say x_1 is rainfall, x_2 is soil moisture, x_3 is temperature). For these independent variables, observed data are also available.
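For reference, setting ∂M/∂a = 0 and ∂M/∂b = 0 gives the familiar closed-form solution b = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)² and a = ȳ - b x̄. A minimal sketch follows; the function name is hypothetical and the example data are made up.

```python
def simple_linear_regression(x, y):
    """Least-squares a, b for y_hat = a + b*x (from dM/da = 0 and dM/db = 0)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


# Made-up points lying exactly on y = 2 + 3x.
print(simple_linear_regression([0, 1, 2, 3], [2, 5, 8, 11]))
```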
We then need to estimate the parameters β_1, β_2, ..., β_p. The goal is that the error between the observed values and the values predicted by the linear equation is minimized in some overall sense, which is why we consider the squared errors. So we repeat for multiple linear regression precisely what we did in simple linear regression; the principle remains the same and the mathematics also remains more or less the same. We write the general linear model for multiple regression as y = β_1 x_1 + β_2 x_2 + ... + β_p x_p, where x_1, x_2, ..., x_p are the independent variables: rainfall, soil moisture and so on. Rainfall and soil moisture may in fact be related. We will consider how to handle such correlated variables in a later topic; for now we assume that the independent variables are all uncorrelated. So x_1, x_2, x_3 and so on are independent variables, say rainfall, temperature, and catchment characteristics such as vegetation. β_1, β_2, ..., β_p are the unknown parameters of the model. When we use the model to predict the dependent variable, we put a hat on it and write ŷ; without the hat, y denotes the observed values. So we have observed values for runoff, and also observed values for x_1 (rainfall), x_2 (say the catchment area), x_3 (temperature) and so on. We identify the independent variables that influence the dependent variable. We have historical or observed data on these variables, as well as on the dependent variable y. Our task now is to estimate the coefficients β_1, β_2, ..., β_p.
For every variable, y, x_1, x_2 and so on, we have n observations available. For example, we may have runoff at a particular location for the last 30 years, together with concurrent values of rainfall, catchment characteristics and so on. In other words, for each independent variable we have observed values concurrent with those of the dependent variable. Because there are n observed values, we can write n equations, one each for y_1, y_2, ..., y_n. Here y_1 may be the runoff during the first month of the first year, y_2 the runoff during the second month of the first year, and so on. With 50 years of monthly data we have 12 × 50 = 600 values, so n = 600. Corresponding to each of these values we also have observed values of the independent variables, with x_ij denoting the i-th observation of the j-th variable. Thus x_11 is the first value of the first variable, x_21 the second value of the first variable, up to x_n1, the n-th value of the first variable. Similarly, x_12, x_22, ... are the values of the second variable, and so on up to x_1p, ..., x_np. For the 50 years of monthly runoff, x_11 would be the first value of rainfall, x_21 the second, and x_600,1 the six-hundredth value of rainfall. Likewise x_12 would be the first value of the second variable, say temperature, and so on. In this way we write n equations of the form y_i = β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip. We need to obtain the p parameters β_1, β_2, ..., β_p from this set of equations. n must be at least equal to p, the number of parameters, which is generally much smaller than n.
A general guideline is that n should be at least three to four times p. Here p is the number of parameters, typically two or three in our hydrologic applications, whereas n is the number of data values, which is generally significantly larger. We can write the equations more compactly using summation notation; the i-th equation is y_i = Σ_{j=1}^{p} β_j x_ij. We then switch to matrix notation, because it is convenient for deriving the expressions for the constants β_1, β_2, ..., β_p. Y is the column vector of the n observed values of the dependent variable, of size n × 1. X is the matrix of observed values of the independent variables, of size n × p, since each of the p variables has n values. B, which I use for capital beta, is the parameter vector, a column vector with p rows and one column. The regression relationship is then Y = XB, with Y of size n × 1, X of size n × p and B of size p × 1, so that XB is also n × 1. Remember that Y contains the observed values of the dependent variable, X contains the observed values of the independent variables, and B contains the parameters β_1, β_2, ..., β_p. When we use these equations, we obtain the predicted values.
So we write the predicted values from the same equation: ŷ_i = Σ_{j=1}^{p} β̂_j x_ij. There is an observed value and a predicted value, so the error for the i-th value is e_i = y_i - ŷ_i, and we consider the sum of squared errors. In matrix notation, Y is the vector of observed values and the predicted values are XB̂, where B̂ is the vector of estimated parameters. The error vector is therefore E = Y - XB̂. Because we are squaring the errors, we take E'E = (Y - XB̂)'(Y - XB̂), where the prime denotes the transpose; this is the sum of squared errors. We differentiate this with respect to the parameter vector and set the result to zero. In matrix notation this single condition covers all the parameters at once. If we wrote it in terms of the individual β_j, we would need one such equation for every j. Differentiating and setting the result to zero gives -2X'(Y - XB̂) = 0, which we write as X'Y = X'X B̂. This follows from requiring that the derivative of Σ e_i², written in matrix form as above, with respect to B̂ be zero. Now look at this expression, X'Y = X'X B̂; we are interested in obtaining B̂.
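The matrix derivation above can be written out in full as follows. The middle step, expanding the quadratic form using the fact that the scalar Y'XB̂ equals B̂'X'Y, is added here for completeness; the lecture does not spell it out.

```latex
\begin{aligned}
\hat{y}_i &= \sum_{j=1}^{p} \hat{\beta}_j x_{ij}, \qquad e_i = y_i - \hat{y}_i,\\
\sum_{i=1}^{n} e_i^2 &= E'E = (Y - X\hat{B})'(Y - X\hat{B})
  = Y'Y - 2\hat{B}'X'Y + \hat{B}'X'X\hat{B},\\
\frac{\partial}{\partial \hat{B}} \sum_{i=1}^{n} e_i^2 &= -2X'Y + 2X'X\hat{B}
  = -2X'(Y - X\hat{B}) = 0
  \;\Longrightarrow\; X'X\hat{B} = X'Y .
\end{aligned}
```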
From this we write an expression for B̂. To do that, we pre-multiply both sides by (X'X)^-1, because the right-hand side contains X'X and multiplying by its inverse reduces it to the identity. Pre-multiplying both sides by (X'X)^-1 gives (X'X)^-1 X'Y = (X'X)^-1 X'X B̂, from which B̂ = (X'X)^-1 X'Y. The capital B here indicates that β is a vector. X'X is a p × p matrix, and since we need its inverse, its rank must be p for it to be invertible. To form X'X, we multiply the transpose of X (p × n) by X (n × p). The element in row j and column k is Σ_{i=1}^{n} x_ij x_ik. The diagonal terms are therefore Σ x_i1², Σ x_i2² and so on, and the off-diagonal terms are products such as Σ x_i2 x_i1. Once you have the X'X matrix, you obtain B̂ = (X'X)^-1 X'Y. Recall that in simple linear regression we also measured the goodness of fit of the line by R², the ratio of the sum of squares due to regression to the total sum of squares about the mean. We do the same in multiple regression. Defining R² in terms of these sums of squares, it can be shown to be R² = (B̂'X'Y - n ȳ²) / (Y'Y - n ȳ²), where ȳ is the mean of the observed y data.
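The estimator B̂ = (X'X)^-1 X'Y can be computed without any external library, as in the sketch below. This is an illustrative, dependency-free version with hypothetical helper names. The inverse is formed by Gauss-Jordan elimination, and the singular-matrix check mirrors the requirement that X'X have rank p. The data are made up and noise-free, so the known coefficients should be recovered. In practice, a numerical library's least-squares solver is preferable to forming the explicit inverse, for numerical stability.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def inverse(m, tol=1e-12):
    """Gauss-Jordan inverse with partial pivoting.

    Raises ValueError when a pivot is (numerically) zero, i.e. rank(m) < p.
    """
    size = len(m)
    # Augment m with the identity matrix: [m | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(m)]
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < tol:
            raise ValueError("X'X is singular (rank < p); it cannot be inverted")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pivot_value = aug[c][c]
        aug[c] = [v / pivot_value for v in aug[c]]
        for r in range(size):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    # The right half of [I | m^-1] is the inverse.
    return [row[size:] for row in aug]


def fit_mlr(X, Y):
    """Least-squares parameters B_hat = (X'X)^-1 X'Y.

    X: n rows of p values; Y: n rows of one value (n x 1). Returns p x 1.
    """
    Xt = transpose(X)
    return matmul(inverse(matmul(Xt, X)), matmul(Xt, Y))


# Hypothetical noise-free data from y = 1 + 2*x1 + 3*x2 (first column of ones = intercept).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 3]]
Y = [[1], [3], [4], [6], [14]]
B_hat = fit_mlr(X, Y)
print([round(b[0], 6) for b in B_hat])
```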
X'Y has already been computed while obtaining B̂. ȳ is the mean of the observed data in Y, the n × 1 column vector, and Y'Y is simply the sum of the squared observations, Σ y_i². R² is a scalar, with a value such as 0.8 or 0.9, and it indicates the goodness of fit: the higher the R² value, the better the fit with those parameters. So we now know how to estimate B̂, the set of parameters, and how to measure the goodness of fit in terms of R². This completes the expressions for multiple linear regression. We use multiple linear regression when several independent variables all affect a single dependent variable. We have a single dependent variable y and independent variables x_1, x_2, ..., x_p, with the fitted expression ŷ_i = β̂_1 x_i1 + β̂_2 x_i2 + ... + β̂_p x_ip, and we estimate β_1, β_2, ..., β_p. To understand this better, let us solve a simple example with only two independent variables; the procedure generalizes to any number p of independent variables. We look at the mean annual flood in watersheds. Say we have collected data from several watersheds on the maximum discharge that has occurred in each, and we want to relate this maximum discharge to the area of the watershed and the rainfall.
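R² as defined above needs only scalar quantities: B̂'X'Y, n, ȳ and Y'Y. A minimal sketch follows; the function name is hypothetical and the example data are made up. Reading this expression as "regression sum of squares over total sum of squares about the mean" assumes two things. B̂ must be the least-squares estimate, and the model must include an intercept, i.e. a column of ones in X, as in the watershed example.

```python
def r_squared(B_hat, X, Y):
    """R^2 = (B'X'Y - n*ybar^2) / (Y'Y - n*ybar^2).

    Equals regression SS / total SS about the mean when B_hat is the
    least-squares estimate and X contains a column of ones (an intercept).
    """
    y = [row[0] for row in Y]
    n = len(y)
    ybar = sum(y) / n
    p = len(X[0])
    XtY = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    bXtY = sum(b[0] * v for b, v in zip(B_hat, XtY))  # scalar B'X'Y
    YtY = sum(v * v for v in y)                         # scalar Y'Y
    return (bXtY - n * ybar ** 2) / (YtY - n * ybar ** 2)


# Made-up example: the least-squares line through (0, 0), (1, 2), (2, 1) is y = 0.5 + 0.5x.
X = [[1, 0], [1, 1], [1, 2]]
Y = [[0], [2], [1]]
print(r_squared([[0.5], [0.5]], X, Y))
```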
Remember that these are different watersheds, and the rainfall values are those that caused the discharges in the respective watersheds. For example, a 43 cm rainfall over a watershed of 324 hectares caused a peak discharge of 0.44 m³/s, and so on; that is how the data are interpreted. So, across different watersheds, we are trying to relate the peak discharge to the watershed area and the rainfall in centimetres. Rainfall and area are the independent variables, and the peak discharge is the dependent variable. We formulate the expression Q = β_1 + β_2 A + β_3 R, where Q is the peak discharge, A is the area in hectares and R is the rainfall in centimetres; that is, we express Q as a function of A and R. To obtain an intercept, we make the first variable equal to 1. In the long form we wrote y = β_1 x_1 + β_2 x_2 + ... . If we set x_1 = 1 for all n observations, the parameter β_1 associated with the first variable becomes the intercept, and we get Q = β_1 + β_2 A + β_3 R. The task now is to obtain β_1, β_2 and β_3. We express this in our usual notation, with Y as the dependent variable vector, X as the independent variable matrix and B (capital beta) as the parameter vector. The observed discharges form Y, and the observed values of the independent variables form the columns of X. We are now writing the model in terms of three variables, with the first one set to 1, so the first column of X is all 1s.
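Adding the column of ones is a one-line operation, as the sketch below shows (hypothetical helper name). The only row used is the watershed quoted in the text (A = 324 ha, R = 43 cm), since the full 12-row table is not reproduced here.

```python
def design_matrix(*columns):
    """Return X as rows [1, x_i2, ..., x_ip]; the leading 1 makes beta_1 the intercept."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("every independent variable needs the same number of observations")
    return [[1.0] + [col[i] for col in columns] for i in range(n)]


# One row only: the watershed quoted in the text (area 324 ha, rainfall 43 cm).
area_ha = [324]
rain_cm = [43]
print(design_matrix(area_ha, rain_cm))
```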
This way $\beta_1$ is the intercept itself; the second variable is the area, and the third variable is the rainfall. So we are writing $Y = XB$, where $Y$ is $12 \times 1$, $X$ is $12 \times 3$ and $B$ is $3 \times 1$. Now we need $\hat{B} = (X'X)^{-1}X'Y$, so let us first obtain each of these matrices separately. When you carry out the multiplication, each element of $X'X$ can be written as a summation over $i = 1$ to $n$: the first element is $\sum_{i=1}^{n} x_{i1}^2$, the next is $\sum_{i=1}^{n} x_{i2}x_{i1}$ (the second variable multiplied by the first), then $\sum_{i=1}^{n} x_{i3}x_{i1}$ (the third variable multiplied by the first), and so on. You can use any spreadsheet program to compute these summations and obtain the $X'X$ matrix. For this particular case $X'X$ has the values 12, 5277 and so on; because $x_{i1} = 1$, the first element is simply $n = 12$ and the second is the sum of the areas. We then take the inverse $(X'X)^{-1}$, for which you can use a MATLAB program. Similarly, $X'Y$ can be expressed in terms of summations, $\sum_{i=1}^{n} x_{ij}y_i$, whose first element is $\sum_{i=1}^{n} y_i$; in this case $X'Y$ turns out to be (8.06, 10642, 417). Once we have both $(X'X)^{-1}$ and $X'Y$, we get $\hat{B}$, the parameter vector, as the product of the $3 \times 3$ matrix $(X'X)^{-1}$ and the $3 \times 1$ vector $X'Y$, which gives a $3 \times 1$ vector.
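The summation form of $X'X$ and $X'Y$ described above can be sketched as follows. The functions `normal_equation_terms` and `fit_beta` are hypothetical names, and the five watersheds are made-up values, not the lecture's 12 observations. The code builds each element as an explicit sum over $i$, then solves $(X'X)\hat{B} = X'Y$. Solving the linear system is a design choice here, swapped in for the explicit inverse used in the lecture; it gives the same $\hat{B}$ but is numerically more robust.

```python
import numpy as np

def normal_equation_terms(X, Y):
    """Build X'X and X'Y element by element as sums over i = 1..n."""
    n, p = X.shape
    XtX = np.zeros((p, p))
    XtY = np.zeros(p)
    for j in range(p):
        # element j of X'Y: sum_i x_ij * y_i
        XtY[j] = sum(X[i, j] * Y[i] for i in range(n))
        for k in range(p):
            # element (j, k) of X'X: sum_i x_ij * x_ik
            XtX[j, k] = sum(X[i, j] * X[i, k] for i in range(n))
    return XtX, XtY

def fit_beta(X, Y):
    """Least-squares estimate B_hat from the normal equations (X'X) B = X'Y."""
    XtX, XtY = normal_equation_terms(X, Y)
    # Solving the linear system avoids forming (X'X)^-1 explicitly
    return np.linalg.solve(XtX, XtY)

# Hypothetical data (NOT the lecture's 12 watersheds): A in ha, R in cm, Q in m^3/s
A = np.array([324.0, 410.0, 520.0, 615.0, 700.0])
R = np.array([43.0, 39.0, 47.0, 41.0, 50.0])
Q = np.array([0.44, 0.60, 0.75, 0.88, 1.02])
X = np.column_stack([np.ones_like(A), A, R])

XtX, XtY = normal_equation_terms(X, Q)
beta_hat = fit_beta(X, Q)
print(XtX[0, 0], XtX[0, 1], XtY[0])   # n, sum of A, sum of Q
print(beta_hat)
```

As in the lecture's example, the first row of $X'X$ starts with $n$ and the sum of the areas, and the first element of $X'Y$ is the sum of the observed discharges.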
The estimate $\hat{B}$ is obtained as $\hat{\beta}_1 = 0.0351$, $\hat{\beta}_2 = 0.0014$ and $\hat{\beta}_3 = 5.0135 \times 10^{-5}$. So, in terms of $Q$, $A$ and $R$, the expression is $Q = \hat{\beta}_1 + \hat{\beta}_2 A + \hat{\beta}_3 R$, where $R$ is the rainfall and $A$ is the area in hectares, and from this expression we can estimate $Q$ for any given $A$ and $R$. Let us now use this expression to examine the errors we get for the individual observations. Remember that when we apply the regression equations we are interested in the overall error; that is why we look for the best fit, the one that minimizes the sum of squared errors. But let us also apply this equation to the actual data and see how much error we get. For each watershed we have the observed value and the predicted value, that is, the value obtained by applying this equation to the given $A$ and $R$, and the difference between them is the error. These are the kinds of errors in the peak discharge $Q$ that we get if we use the expression we just derived. Next, we look at the goodness of fit, that is, how good this particular regression is. We obtain $R^2 = (\hat{B}'X'Y - n\bar{y}^2)/(Y'Y - n\bar{y}^2)$, where $\bar{y}$ is the mean of the observed values of the dependent variable; in this case $\bar{y} = 0.672$ (equal to the first element of $X'Y$ divided by $n$, that is, $8.06/12$), and $n = 12$. We have already obtained $\hat{B}$ as the vector beginning with 0.0351, and $\hat{B}'$ is its transpose. $X'Y$ we obtained earlier, and $Y'Y$ is simply $Y$ transposed multiplied by $Y$, which comes out to be a scalar quantity, 15.77.
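To illustrate the observed-versus-predicted comparison, here is a minimal sketch that applies the fitted equation to the one watershed quoted in the lecture (324 ha, 43 cm, 0.44 m³/s). The coefficients $\hat{\beta}_1 = 0.0351$ and $\hat{\beta}_2 = 0.0014$ are as stated; $\hat{\beta}_3 = 5.0135 \times 10^{-5}$ is our reading of a garbled passage in the recording and should be treated as an assumption. The function name `predict_peak_discharge` and the sign convention (error = observed minus predicted) are also assumptions.

```python
import numpy as np

# Coefficients as read from the lecture; BETA[2] (5.0135e-5) is our reading
# of a garbled passage and is an assumption.
BETA = np.array([0.0351, 0.0014, 5.0135e-5])

def predict_peak_discharge(area_ha, rainfall_cm, beta=BETA):
    """Q = beta1 + beta2*A + beta3*R (Q in m^3/s, A in ha, R in cm)."""
    area_ha = np.asarray(area_ha, dtype=float)
    rainfall_cm = np.asarray(rainfall_cm, dtype=float)
    return beta[0] + beta[1] * area_ha + beta[2] * rainfall_cm

# Watershed quoted in the lecture: A = 324 ha, R = 43 cm, observed Q = 0.44 m^3/s
q_obs = 0.44
q_pred = predict_peak_discharge(324.0, 43.0)
error = q_obs - q_pred          # convention assumed here: observed - predicted
print(round(float(q_pred), 4), round(float(error), 4))
```

Because the function accepts arrays, the same call applied to all 12 watersheds would give the full column of predicted values and errors.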
So we now know all of these values. Multiplying $\hat{B}'$ and $X'Y$ gives 15.64, and $n\bar{y}^2$ gives 5.42, so $R^2$ turns out to be 0.99. Remember, the closer $R^2$ is to one, the better the fit; so in this particular case a good linear fit exists between the peak discharge, the area of the watershed and the rainfall in the watershed. This is how we fit multiple linear regression equations. Such situations typically arise when we are dealing with a number of variables; for example, the rainfall at a particular location may depend on several climatic variables, such as the mean sea level pressure, the geopotential height, the land pressures, the wind speed and so on. The need to relate hydrologic variables to climatic variables arises quite often when we deal with climate change impacts. When we start looking at climatic variables, a large number of them affect the hydrologic variables, and therefore we need statistical relationships between the hydrologic variables and the climate variables. That is where we typically use multiple linear regression, and also some non-linear transforms. Hopefully, towards the end of this course, I will give some background on downscaling of climatic variables; in that lecture we will discuss how we use multiple linear regression to relate the rainfall at a particular location to the climatic variables. So these are the situations where we use multiple linear regression, and this is the background for it: the way we obtain the parameters and the way we assess how good the fit is. Now a question arises when we are dealing with a large number of variables in multiple linear regression.
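The $R^2$ arithmetic quoted for the peak-discharge example can be checked directly from the numbers given in the lecture ($\hat{B}'X'Y = 15.64$, $Y'Y = 15.77$, $n = 12$, $\bar{y} = 0.672$):

$$
R^2 = \frac{\hat{B}'X'Y - n\bar{y}^2}{Y'Y - n\bar{y}^2}
    = \frac{15.64 - 12\,(0.672)^2}{15.77 - 12\,(0.672)^2}
    = \frac{15.64 - 5.42}{15.77 - 5.42}
    = \frac{10.22}{10.35} \approx 0.99
$$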
Let us say $p$ is quite large, say ten, fifteen or twenty variables, and some of them may be correlated with each other. For example, suppose one of the variables you are using is soil moisture and another is evapotranspiration. Evapotranspiration and soil moisture cannot be treated as independent: the evapotranspiration in fact depends on the soil moisture, and the more evapotranspiration takes place, the greater the depletion of soil moisture. So they are mutually dependent, and when you use such variables there is a correlation between them. You would also not like to handle very large problem sizes: if you have fourteen or fifteen variables, each with data for hundreds of years, the size of the problem you have to deal with in the regression will be quite large. So, first to handle the dependence among the variables and next to reduce the size of the problem, we devise certain methods that make both of these possible. One of these methods, which I will discuss in the next lecture, is principal component analysis. It is a very powerful method for most multiple linear regression applications where you are interested in reducing the size of the problem as well as addressing the dependence among the several variables. So, in today's lecture we essentially saw how to obtain the rainfall hyetographs starting with the IDF relationships: from the IDF relationships we obtain the intensity, and then we use the alternating block method to distribute this intensity across the duration. We then went on to the next topic, multiple linear regression, where, starting with simple linear regression, we introduced several independent variables into the linear regression expression.
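Before fitting, one simple way to spot the kind of mutual dependence described above is to inspect the pairwise correlations among the candidate independent variables. This is a minimal sketch, not the principal component method itself: the helper `correlated_pairs`, the cutoff of 0.8 and the soil moisture, evapotranspiration and rainfall series are all hypothetical values chosen for illustration.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for predictor pairs with |r| above threshold."""
    # rowvar=False: each column of X is one variable, each row one observation
    r = np.corrcoef(X, rowvar=False)
    pairs = []
    p = X.shape[1]
    for i in range(p):
        for j in range(i + 1, p):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

# Hypothetical illustration data (NOT from the lecture)
soil_moisture = np.array([0.10, 0.15, 0.22, 0.30, 0.35, 0.41])
evapotranspiration = 3.0 * soil_moisture + np.array([0.01, -0.02, 0.00, 0.02, -0.01, 0.00])
rainfall = np.array([40.0, 55.0, 38.0, 60.0, 45.0, 50.0])

X = np.column_stack([soil_moisture, evapotranspiration, rainfall])
names = ["soil_moisture", "evapotranspiration", "rainfall"]
flagged = correlated_pairs(X, names)
print(flagged)
```

With these made-up series only the soil moisture and evapotranspiration pair is flagged, which is the situation where a transformation such as principal component analysis becomes useful.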
We saw a method by which we obtain the parameters $\beta_1, \beta_2, \ldots, \beta_p$, and how to obtain $R^2$, which gives the goodness of fit of the equation. In the next lecture we will continue this discussion on multiple linear regression and introduce principal component analysis, by which we will address the dependence among several variables and also the question of whether we can reduce the size of the regression problem itself. Thank you very much for your attention; we will continue the discussion next time.