Module 07 - Lesson 35 - Multivariate Stochastic Models - III

Welcome to lecture number 35 of the course Stochastic Hydrology. In the previous few lectures, we have been discussing multivariate stochastic models, which are useful for generating stream flows in a catchment where you are interested in generating flows in several streams simultaneously. The streams that we are considering have their own correlation structure, so you would like to preserve the correlation between, say, stream a and stream b in the same catchment. These models are useful when you are planning large-scale water resources development in a particular catchment. What I mean by that is this: let us say you have a huge river basin, such as the Narmada river basin or the Brahmaputra river basin, where several tributaries join. You may also have a hydrologically homogeneous region in which there is a correlation structure among several streams. You would like to develop this entire catchment by putting, let us say, reservoirs on several streams. In such situations, you would like to synthetically generate stream flows on all of these streams simultaneously. What do I mean by simultaneously? Earlier, we studied models such as the Thomas-Fiering model, which is a single-site model. Single-site models preserve the statistics of one particular stream. Multi-site models, on the other hand, apart from preserving the statistics of the individual streams, also preserve the cross correlations among the different streams. Let us understand this correctly. Let us say that we have a catchment something like this.
Then, I am interested in generating flows at this location, and also at this other location. Let us call them a and b. We may also be interested in generating flows at this location c, taking into account the correlation structure among a, b and c. With single-site models, what would we have done? We would have taken the flow record at site c and used a single-site model such as the Thomas-Fiering model to generate flows that preserve the mean, the variance and the lag 1 correlation of the flows at that site. In multi-site generation, however, when generating the flows at site c, we also take into account, and indeed preserve, the correlations that the flows at site c have with the flows at site b and at site a. Similarly, when we generate flows at site b, we preserve the correlations of the flows at site b with those at site a and at site c. So, there is a spatial correlation as well as a temporal correlation. The temporal correlation is the auto correlation. When we focus on the flows at site c, for example, the model will not only preserve the auto correlation of these flows across time, it will also preserve the cross correlations with the flows at b and at a. So, when you suspect that the flows are cross correlated and you would like to generate flows at sites a, b, c and so on, you need to look at multi-site models. We are essentially generating flows simultaneously at site a, site b, site c and so on.
These models are very useful in large-scale water resources development, where we are interested in simultaneous generation of flows so that we can examine the implications of putting in several reservoirs. Let us say that I would like to put a reservoir here, a reservoir here, a reservoir here and so on. You can then examine the implications of developing such watersheds or catchments simultaneously. So, what specifically did we see in the last class? In lecture number 34, starting with the single-site model, we went on to the multi-site Markov model. The structure of the model is $X_{t+1} = E X_t + G \varepsilon_{t+1}$. Remember, E is not the expectation here. It is a p by p diagonal matrix, $X_t$ is a p by 1 vector, and p is the number of sites. Let us say you are generating flows at site number 1, site number 2, site number 3 and so on, up to p sites. At every site you have a time series of observed flows available. $X_t$ is the p by 1 vector of standardized values of the variable at time t, and we write the expression for time t plus 1. So $X_{t+1}$, the flows at all p sites at time t plus 1, is generated from the flows at time t plus a random component, and E and G are coefficient matrices. In this model, E is a p by p diagonal matrix whose j-th diagonal element is $\rho_j(1)$, the lag 1 serial correlation at site j.
Similarly, G is a p by p diagonal matrix whose j-th diagonal element is $\sqrt{1 - \rho_j^2(1)}$, where $\rho_j(1)$ is the lag 1 serial correlation at site j. A more general form, which is also very useful, is the Matalas model, which we introduced towards the end of the last lecture. Starting with this kind of structure, Matalas introduced a more general form of the multi-site model. In it, we do not assume that the process is Markov, although we still relate $X_{t+1}$ to $X_t$. The coefficient matrices A and B reflect the cross correlation structure in the multi-site situation. The Matalas model is $X_{t+1} = A X_t + B \varepsilon_{t+1}$, which is similar to the Markov model we discussed earlier, except that $\varepsilon_{t+1}$ is now taken to be independent of $X_t$. We will discuss that more as we progress. We are talking about this model in the context of synthetically generating flows at several sites simultaneously. These models are typically used for large time intervals, let us say seasonal or annual flows, which are useful for planning purposes. The form given by Matalas is a multi-site normal generation model, and it preserves the mean, the variance, the lag 1 serial correlation, the lag 1 cross correlation and the lag 0 cross correlation. Distinguish between the serial correlation and the cross correlation. I keep repeating this because both are important in multi-site models: the model has to preserve the structure of the time series at each location, and it must also preserve the cross correlation with the other time series.
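To see what the multi-site Markov model recalled above does, here is a minimal Python sketch. It takes the j-th diagonal element of G as $\sqrt{1 - \rho_j^2(1)}$, the value that keeps the standardized flows at unit variance. The correlations in `rho1`, the number of sites and the function name `markov_step` are made up for illustration; they are not from the lecture.

```python
import numpy as np

def markov_step(x_t, rho1, rng):
    """One step of the multi-site Markov model on standardized flows."""
    E = np.diag(rho1)                       # lag 1 serial correlation at each site
    G = np.diag(np.sqrt(1.0 - rho1**2))     # scales the noise so the variance stays 1
    eps = rng.standard_normal(len(rho1))    # independent N(0, 1) deviate at every site
    return E @ x_t + G @ eps

rng = np.random.default_rng(0)
rho1 = np.array([0.3, 0.5, 0.2])            # hypothetical lag 1 correlations, p = 3
x = np.zeros(3)                             # start from the mean (standardized value 0)
series = []
for _ in range(20000):
    x = markov_step(x, rho1, rng)
    series.append(x)
series = np.array(series)                   # shape (20000, 3), one column per site
```

Because E and G are diagonal, each site is generated from its own previous value only, so this form preserves the serial correlation at each site but carries no cross correlation between sites.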
Now, we write $X_{t+1} = A X_t + B \varepsilon_{t+1}$. Let us go through this model in detail, although I introduced it in the last class. $X_t$ and $X_{t+1}$ are p by 1 vectors representing the standardized data at the p sites at time steps t and t plus 1, respectively. Remember, p is the number of sites. When we talked about the Thomas-Fiering model and the ARMA type of models, I mentioned that these models are always written for standardized data. By standardization, I mean subtracting the mean and dividing by the standard deviation. For an observed value $x_t$, the standardized value is $x_t^{s} = (x_t - \bar{x})/s$, where $\bar{x}$ is the mean and $s$ is the standard deviation. So, from the observed values you first standardize the data, and then you apply this kind of model. Now, there is an important assumption here: the model is multivariate normal. It means that all the flow variables, the $x_t$'s, are normally distributed random variables. That is why we use these models essentially for large time steps, for example annual or seasonal stream flows, where the assumption of normality is justified. If you use them for processes such as daily rainfall, which cannot be approximated by a normal distribution, your model will be wrong.
So, you must keep in mind that the multivariate stochastic models I am discussing here, especially the Matalas model in this particular form, should be used for long time interval processes such as seasonal flows, annual flows and so on. Now, $\varepsilon_{t+1}$ in this expression is a p by 1 vector, because you have p sites, and each of its elements is a normally distributed random variate with zero mean and unit variance, that is, N(0, 1). So, at every site you have a standard normal deviate. An additional assumption is that $\varepsilon_{t+1}$ is independent of $X_t$. The structure has both $X_t$ and $\varepsilon_{t+1}$ on the right-hand side, and this independence will be useful when we derive expressions for A and B, as we will presently see. A and B are coefficient matrices, both of size p by p. In $X_{t+1} = A X_t + B \varepsilon_{t+1}$, A is p by p and $X_t$ is p by 1, B is p by p and $\varepsilon_{t+1}$ is p by 1, so $X_{t+1}$ is p by 1. What does this mean? If you specify A and B, then, since $\varepsilon_{t+1}$ is obtained from standard normal deviate tables and $X_t$ is the vector of previous values, the model is completely specified. So, our objective now is to specify A and B in terms of the data. If you have the observed flows at all p sites for time periods t = 1, 2, ..., n, you have an observed time series at each site, and we use these observations to determine A and B. That is the objective with which we will proceed now.
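Since everything that follows works on standardized data, a short Python sketch of the standardization step may help. The flow values in `q` are made up, and the function names are only illustrative. The standard deviation here divides by n, which is an assumption; a particular data set may call for n minus 1 instead.

```python
import numpy as np

def standardize(q):
    """q: (n, p) array of observed flows, n time steps by p sites."""
    q_bar = q.mean(axis=0)      # mean flow at each site
    s = q.std(axis=0)           # standard deviation at each site (divides by n)
    return (q - q_bar) / s, q_bar, s

def destandardize(x, q_bar, s):
    """Invert the standardization to get flows back in original units."""
    return x * s + q_bar

# Made-up annual flows at two sites for four years.
q = np.array([[120.0, 80.0], [150.0, 95.0], [90.0, 70.0], [130.0, 85.0]])
x, q_bar, s = standardize(q)
```

Each column of `x` now has zero mean and unit standard deviation, and `destandardize(x, q_bar, s)` returns the original flows.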
In the last class, I introduced $M_0$, defined as $M_0 = E[X_t X_t']$, where the prime denotes the transpose. This is the cross correlation matrix at lag 0, because there is no $X_{t+k}$ here; you are correlating $X_t$ with itself. Since the data are standardized, this is in fact the covariance of the standardized flows. In terms of the observed flows, the (i, j)-th element of $M_0$ is $m_0(i,j) = \frac{1}{n}\sum_{t=1}^{n} \frac{(q_{i,t}-\bar{q}_i)}{s_i}\,\frac{(q_{j,t}-\bar{q}_j)}{s_j}$, where $q_{i,t}$ is the observed flow at site i in time period t, $\bar{q}_i$ is the mean of the flows at site i (the sum over all time periods divided by the number of time periods), and $s_i$ is the standard deviation of the flows at site i, and similarly for site j. Next, we define another matrix, $M_1$. $M_0$ and $M_1$ together will be useful in determining A and B; the purpose of introducing them is to arrive at elegant expressions for A and B. I defined $M_0$ by taking the expected value of $X_t X_t'$. Now, I lag it by one time period and define the p by p matrix $M_1 = E[X_t X_{t-1}']$. As with $M_0$, the (i, j)-th element of $M_1$ is simply $E[x_{i,t}\, x_{j,t-1}]$. We are able to write this because the expected value of a matrix is the matrix of expected values of its individual elements.
Much the same way as we wrote $m_0(i,j)$, we now write $m_1(i,j)$. We take the flow at the i-th site in time period t and the flow at the j-th site in time period t minus 1. Because we are lagging by one time step, the summation starts from t = 2. With i = 1 to p and j = 1 to p, we get all the elements from this expression. Therefore, $M_1$ is defined completely, and $M_0$ is defined completely. For $M_0$ I wrote the expression in terms of q, the flow; here I have written x, but x means the same flows as the q we used earlier. Remember that $M_0$ and $M_1$ are completely defined once we have the data; you can form both matrices just from the observed data. What form does the data take? At each of the p sites, we have a time series for t = 1 to n. Let us say you have annual flows at each of the p sites for forty years. So, you have t = 1, 2, 3 and so on up to 40 at site number 1, at site number 2 and so on, up to site number p. This is how you define $M_0$ and $M_1$. Once they are defined, we go back to the original expression, $X_{t+1} = A X_t + B \varepsilon_{t+1}$, and see how A and B can be determined using $M_0$ and $M_1$. So, this is $M_1$. The way I wrote it here, the x's are the standardized flows; in terms of the original flows you write $(q_{i,t} - \bar{q}_i)/s_i$ and so on, much the same way as we did for $M_0$. Remember that the two flows in each product belong to two different sites, which is the case with $M_0$ also. q is the original random variable; that is, we may be talking about annual stream flow or some such thing.
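The two matrices can be estimated from an observed record in a few lines. The sketch below is only illustrative: the flows are randomly generated stand-ins for a real record, and dividing the lag 1 sum by n minus 1 (the number of available pairs) is an assumption, since only the starting index t = 2 is stated above.

```python
import numpy as np

def lag_matrices(q):
    """q: (n, p) array of observed flows. Returns sample estimates of M0 and M1."""
    n, p = q.shape
    x = (q - q.mean(axis=0)) / q.std(axis=0)   # standardized flows
    M0 = x.T @ x / n                           # element (i, j): average of x_i,t * x_j,t
    M1 = x[1:].T @ x[:-1] / (n - 1)            # element (i, j): average of x_i,t * x_j,t-1
    return M0, M1

rng = np.random.default_rng(1)
q = 100.0 + 20.0 * rng.standard_normal((40, 3))   # made-up record: 40 years, 3 sites
M0, M1 = lag_matrices(q)
```

`M0` is symmetric with ones on its diagonal. `M1` is in general not symmetric, because the correlation of site i at time t with site j at time t minus 1 need not equal that of site j at time t with site i at time t minus 1.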
From this, you can see that $M_1$ is the cross correlation matrix at lag 1, and $M_0$ is the cross correlation matrix at lag 0. All right. Now, look at the model again: $X_{t+1} = A X_t + B \varepsilon_{t+1}$. $X_{t+1}$ is p by 1, A is p by p, $X_t$ is p by 1, B is p by p and $\varepsilon_{t+1}$ is p by 1. We want expressions for A and B in terms of $M_0$ and $M_1$. Do not lose sight of the fact that $M_0$ and $M_1$ are completely determined from the data: the moment you have data, you can determine them. Therefore, we should be able to express A and B in terms of $M_0$ and $M_1$. To do that, we post-multiply by $X_t'$ and then take the expectation. Post-multiplying all through by $X_t'$ gives $X_{t+1} X_t' = A X_t X_t' + B \varepsilon_{t+1} X_t'$. Taking the expectation, and noting that A and B are constants and come out, $E[X_{t+1} X_t'] = A\,E[X_t X_t'] + B\,E[\varepsilon_{t+1} X_t']$. Now, look at $E[\varepsilon_{t+1} X_t']$. We said right at the outset that $\varepsilon_{t+1}$ is independent of $X_t$, so this expectation must be zero. Why? For two independent random variables x and y, $E[xy] = E[x]\,E[y]$, and here one of them has zero expected value, because $\varepsilon_{t+1}$ is N(0, 1). So, the second term vanishes. In the first term, $E[X_t X_t']$ is $M_0$ by definition, so that term is $A M_0$. Now, what is the left-hand side?
It is $E[X_{t+1} X_t']$. We defined $M_1$ as $E[X_t X_{t-1}']$. Shifting the time index by one step, this is the same as $E[X_{t+1} X_t']$, so the left-hand side is $M_1$, the first term on the right is $A M_0$, and the second term is zero. So, $M_1 = A M_0$, and therefore $A = M_1 M_0^{-1}$. Once you have the observed data, $M_0$ and $M_1$ are defined; from $M_0$ you get $M_0^{-1}$, and so A is completely fixed. Just from the data, without much effort, you get the coefficient matrix A. And because A is determined from $M_0$, the cross correlation at lag 0, and $M_1$, the cross correlation at lag 1, it incorporates the cross correlation structure. Now, let us see what we do with the coefficient matrix B. Our model is $X_{t+1} = A X_t + B \varepsilon_{t+1}$. This time we post-multiply both sides by $X_{t+1}'$ and take the expectation. Post-multiplying gives $X_{t+1} X_{t+1}' = A X_t X_{t+1}' + B \varepsilon_{t+1} X_{t+1}'$, and taking the expectation, $E[X_{t+1} X_{t+1}'] = A\,E[X_t X_{t+1}'] + B\,E[\varepsilon_{t+1} X_{t+1}']$. What is the left-hand side? We defined $M_0$ as $E[X_t X_t']$, where both vectors are at the same time step. Therefore, $E[X_{t+1} X_{t+1}']$ is also $M_0$.
Then, we have $E[X_t X_{t+1}']$, and we have another expectation, $E[\varepsilon_{t+1} X_{t+1}']$. If we had $X_t'$ in the second one, it would have been zero, because $\varepsilon_{t+1}$ is independent of $X_t$. But $E[\varepsilon_{t+1} X_{t+1}']$ is not zero. Remember this: $\varepsilon_{t+1}$ is independent of $X_t$ in this structure, but it need not be independent of $X_{t+1}$, so this expectation is not necessarily zero. Similarly, we defined $M_1$ as $E[X_t X_{t-1}']$. But $E[X_t X_{t+1}']$ is not $M_1$, because here $X_t$ is multiplied by $X_{t+1}'$. Therefore, we need to evaluate both expectations, $E[X_t X_{t+1}']$ and $E[\varepsilon_{t+1} X_{t+1}']$, separately. The left-hand side is $M_0$. Let us start with $E[X_t X_{t+1}']$. We defined $M_1 = E[X_t X_{t-1}']$. This is a matrix, and I take its transpose: $M_1' = E[(X_t X_{t-1}')']$. I will be able to write this as $E[X_{t-1} X_t']$. Why? From matrix algebra, you know that the transpose of a product is the product of the transposes in reverse order, $(AB)' = B'A'$. Here the product is $X_t X_{t-1}'$, so its transpose is $(X_{t-1}')'\,X_t'$, and the transpose of $X_{t-1}'$ is $X_{t-1}$ itself. That gives $X_{t-1} X_t'$.
So, the rule is $(AB)' = B'A'$, and that is how we get $X_t'$ on the right here. From this, $M_1' = E[X_{t-1} X_t']$, and shifting the time index by one step, $M_1' = E[X_t X_{t+1}']$. Why do I need this expectation? In our expression, I need $E[X_t X_{t+1}']$, and I can now write it as $M_1'$. Next, we focus on $E[\varepsilon_{t+1} X_{t+1}']$. From the model, $X_{t+1} = A X_t + B \varepsilon_{t+1}$, so $\varepsilon_{t+1} X_{t+1}' = \varepsilon_{t+1} (A X_t + B \varepsilon_{t+1})'$. Using $(AB)' = B'A'$ again, this becomes $\varepsilon_{t+1} X_t' A' + \varepsilon_{t+1} \varepsilon_{t+1}' B'$. Now we take the expectation on both sides: $E[\varepsilon_{t+1} X_{t+1}'] = E[\varepsilon_{t+1} X_t' A'] + E[\varepsilon_{t+1} \varepsilon_{t+1}' B']$. In the first term, $A'$ is a constant and comes out, leaving $E[\varepsilon_{t+1} X_t']\,A'$. Because $\varepsilon_{t+1}$ and $X_t$ are independent, and the expected value of $\varepsilon_{t+1}$ is zero, this expectation is zero, in much the same way as when we determined A. So, the first term becomes zero. Now, look at the second term.
This term is $E[\varepsilon_{t+1} \varepsilon_{t+1}' B']$. $B'$ is a constant and comes out, so I write it as $E[\varepsilon_{t+1} \varepsilon_{t+1}']\,B'$. The expectation of $\varepsilon_{t+1}$ with itself is its variance, and because $\varepsilon_{t+1}$ is a vector of standard normal deviates, one for each site and independent of one another, this is the identity matrix I. So, the term is $I B'$, which is simply $B'$. That is, $E[\varepsilon_{t+1} X_{t+1}'] = B'$. Now, we use this and rewrite our expression, $E[X_{t+1} X_{t+1}'] = A\,E[X_t X_{t+1}'] + B\,E[\varepsilon_{t+1} X_{t+1}']$. The left-hand side is $M_0$, the first expectation on the right is $M_1'$, and the second is $B'$. So, $M_0 = A M_1' + B B'$, in which we have already determined A as $M_1 M_0^{-1}$. Substituting, $M_0 = M_1 M_0^{-1} M_1' + B B'$, and therefore $B B' = M_0 - M_1 M_0^{-1} M_1'$. This is how you get $B B'$. Remember, all of this exercise is to specify the coefficient matrices A and B. Once A and B are fixed, the model is completely fixed, and we want to determine them from the data. To do that, we defined $M_0$ and $M_1$, both of which can be determined completely from the data, from the cross correlations at lag 0 and lag 1. We now have $B B'$ in terms of $M_0$ and $M_1$, and next we want to get B itself from it.
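To collect the derivation in one place, the steps can be written out as follows. This is only a compact restatement of what was just derived, using $M_0 = E[X_t X_t']$, $M_1 = E[X_t X_{t-1}']$, $E[\varepsilon_{t+1} X_t'] = 0$ and $E[\varepsilon_{t+1}\varepsilon_{t+1}'] = I$. It assumes, as the derivation does implicitly, that these expectations do not change when the time index is shifted.

```latex
\begin{aligned}
X_{t+1} &= A X_t + B\,\varepsilon_{t+1} \\
E[X_{t+1} X_t'] &= A\,E[X_t X_t'] + B\,E[\varepsilon_{t+1} X_t']
  \;\Rightarrow\; M_1 = A M_0 \;\Rightarrow\; A = M_1 M_0^{-1} \\
E[\varepsilon_{t+1} X_{t+1}'] &= E[\varepsilon_{t+1} X_t']\,A'
  + E[\varepsilon_{t+1}\varepsilon_{t+1}']\,B' = B' \\
E[X_{t+1} X_{t+1}'] &= A\,E[X_t X_{t+1}'] + B\,E[\varepsilon_{t+1} X_{t+1}']
  \;\Rightarrow\; M_0 = A M_1' + B B' \\
B B' &= M_0 - M_1 M_0^{-1} M_1'
\end{aligned}
```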
So, $B B' = M_0 - M_1 M_0^{-1} M_1'$ is the expression that we got. Let us write $C = B B'$, which is a p by p matrix, p being the number of sites. So, $C = M_0 - M_1 M_0^{-1} M_1'$. From this you can see that the matrix B does not have a unique solution: there are several alternative matrices B, all of which satisfy this expression. One way forward is to assume a particular structure for B. In the Matalas model, we typically assume B to be a lower triangular matrix. What is a lower triangular matrix? All the elements above the principal diagonal, that is, above and to the right of it, are zero, and only the elements on and below the principal diagonal can be non-zero. If B is lower triangular, then $B'$ is upper triangular, with non-zero elements only on and above the principal diagonal. Then, you write out $B B'$ as the product of the matrix B and the matrix $B'$. B is p by p and $B'$ is p by p, so $C = B B'$ is also p by p. We write the elements of C as $c_{11}, c_{12}, \ldots, c_{1p}$ in the first row and so on, down to $c_{p1}, c_{p2}, \ldots, c_{pp}$ in the last row. The matrix C is completely determined from the data, because we know that $C = M_0 - M_1 M_0^{-1} M_1'$.
$M_0$ is completely determined from the data, and so is $M_1$. Therefore, $M_0^{-1}$ and $M_1'$ can be obtained, and the matrix C is completely defined. The moment you have determined the cross correlation matrices $M_0$ and $M_1$, C is defined. Using C, we now have to determine B, and that is the exercise we are doing now. Look at the first element: $c_{11} = b_{11} \times b_{11} = b_{11}^2$. So, $b_{11} = \sqrt{c_{11}}$. First I determine $b_{11}$, then I go to $b_{21}$, and then I determine $b_{22}$. Similarly, I determine $b_{31}$, $b_{32}$, $b_{33}$, and then $b_{41}$, $b_{42}$, $b_{43}$, $b_{44}$, and so on. I determine all the elements on and below the principal diagonal one by one. This can be done by long-hand multiplication of the matrices; we will not go into that in this lecture, and I will just give you the expressions that come out of it. All of them arise from $C = B B'$. For the first column, $b_{k1} = c_{k1}/b_{11}$. So, after $b_{11}$ you determine $b_{21} = c_{21}/b_{11}$, and you use $b_{21}$ to get the diagonal element $b_{22} = \sqrt{c_{22} - b_{21}^2}$, since $c_{22} = b_{21}^2 + b_{22}^2$. Once you have determined $b_{22}$, you have reached the diagonal element in the second row. Then, you go to the third row.
You determine $b_{31}$, then $b_{32}$, and then the diagonal element $b_{33}$. In this way, starting with the top-left element $b_{11}$, you go to the second row and determine $b_{21}$, $b_{22}$, then $b_{31}$, $b_{32}$, $b_{33}$, then $b_{41}$, $b_{42}$, $b_{43}$, $b_{44}$, and so on: in every row you start with the left-most element and determine the elements one by one until you reach the diagonal. There is one expression for the diagonal elements and one for the other elements in the k-th row, for k = 1, 2, ..., p, and together they give all the non-zero elements of the p by p matrix, so B is defined completely. What is the assumption? Only that B is a lower triangular matrix. So, what did we do? We started with a particular structure of the model, and then we determined the matrices A and B. The structure was $X_{t+1} = A X_t + B \varepsilon_{t+1}$, where $X_{t+1}$ is a p by 1 vector and p is the number of sites. We are talking about multi-site generation, so $X_{t+1}$ holds the value of the flow at each of the p sites in time period t plus 1. In synthetic generation, what is it that you want to do? You want to determine the flows at time t plus 1 at each of the sites 1, 2, 3, ..., p simultaneously, using the flows at time period t. That is, when you generate the flow at a particular site, you use the information on the flows during the previous time period at all the sites. The coefficient matrices A and B both use the cross correlation structure as well as the auto correlation structure.
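The row-by-row procedure for B can be sketched in Python as below. The lecture states the expressions for $b_{11}$, $b_{k1}$ and $b_{22}$ only; the general recursion used here is the standard Cholesky factorization, which reduces to those expressions and is an assumption about the form of the omitted formulas. The matrix `C` is made up, and the sketch needs C to be symmetric and positive definite, otherwise a square root of a negative number appears.

```python
import numpy as np

def lower_triangular_b(C):
    """Return lower triangular B with B @ B.T == C (C symmetric positive definite)."""
    p = C.shape[0]
    B = np.zeros((p, p))
    for k in range(p):                  # work through the rows from the top
        for j in range(k):              # left-most element first, up to the diagonal
            B[k, j] = (C[k, j] - B[k, :j] @ B[j, :j]) / B[j, j]
        # Diagonal element: c_kk minus the squares of the elements already found in row k.
        B[k, k] = np.sqrt(C[k, k] - B[k, :k] @ B[k, :k])
    return B

# Made-up symmetric positive definite matrix standing in for C = M0 - M1 M0^{-1} M1'.
C = np.array([[4.0, 2.0, 0.4],
              [2.0, 5.0, 1.0],
              [0.4, 1.0, 3.0]])
B = lower_triangular_b(C)
```

For this `C`, the first two rows follow the expressions given above: $b_{11} = \sqrt{4} = 2$, $b_{21} = 2/2 = 1$ and $b_{22} = \sqrt{5 - 1^2} = 2$. In practice `np.linalg.cholesky(C)` returns the same matrix.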
So, this is how the model is. It preserves the correlation structure both in space and in time. In time, at the same site, we are talking about the auto correlation; in space, we are talking about the cross correlation from site a to site b, at the same time step and also lagged by one time step. That means it preserves the cross correlation at lag 0 as well as the cross correlation at lag 1, and the auto correlation at lag 1. It also preserves the mean and the standard deviation at each site. This is very useful, and when we are talking about planning models, it is an extremely elegant way of preserving the correlation structure across multiple sites. To summarize the procedure: we first calculate the lag 0 and lag 1 correlations, which define the matrices $M_0$ and $M_1$, and from these we get A and B. We determine them from the observed data, so if you are talking about annual flows, you must have annual flows at each of the p sites. You must be reasonably justified in assuming that these flows are normally distributed at all the p sites; multivariate normality is the assumption involved. Then, you obtain $\varepsilon_{t+1}$ from standard normal deviate tables, and $\varepsilon_{t+1}$ must be independent of $X_t$; this assumption must be valid. Let us now look at an example to make this procedure clear. We take the annual flows at two sites, p and q, for 19 years. We will demonstrate the procedure by generating just two values, but you can carry the same procedure forward to generate any number of values.
As I mentioned in one of my earlier lectures, whenever we are talking about synthetic generation of data, the data must be generated for a fairly long period of time. Typically, when we are planning water resources systems using, let us say, 30 years or 50 years of data, you should be able to generate 50 years, 100 years or 200 years of data, because the systems that we are putting in place are supposed to serve for the next 100 years. Even if they are not physically meant to serve for the next 100 years, you still need to generate flows for several years, maybe in excess of 100 years, to examine the possible performance of that system in future. In fact, when we talk about reliability of the system, resiliency of the system and so on, we generate data for as long as 500 years or 1000 years, essentially to generate sequences which follow the same statistical properties as the observed historical flows, and then determine the implications of these flows on the water resources system. Therefore, whenever we talk about synthetic generation of data, you must remember that you must necessarily simulate or generate the flows for a fairly long period, typically of the order of 100 years, 150 years and so on. In this example, I will just demonstrate how to generate the first two values. Using the same structure and the same procedure, you can generate any number of values. As I mentioned, the data is used primarily to define $M_0$ and $M_1$. Once you define $M_0$ and $M_1$, you discard the data and straightaway start working with $M_0$ and $M_1$. Also, the model is for standardized flows. Which means, the $X_t$ values here are the flows, and the flows have to be standardized first and then the standardized flows are used in the model.
Then, when you get $X_{t+1}$ from the model, you can obtain the original values of the generated flows by destandardizing. That is, the standardized value is $z_t = (x_t - \bar{x})/s$, where $\bar{x}$ is the mean and $s$ is the standard deviation, and from this you should be able to get back your flow values as $x_t = \bar{x} + s z_t$. So, let us look at how we do this. We get the mean and the standard deviation from the data, and then we get $M_0$, which is defined as the lag 0 cross correlation matrix. So, you take the cross correlation at lag 0, which for standardized flows is essentially the covariance. You define $r_{pq}(0) = \frac{\sum_{i=1}^{n} (x_{p,i} - \bar{x}_p)(x_{q,i} - \bar{x}_q)}{n\, s_p s_q}$, where $x_{p,i}$ is the flow in time step i at site p, $\bar{x}_p$ is the mean flow at site p, and similarly for site q, and $s_p$ and $s_q$ are the standard deviations at site p and site q. Using this, you get $M_0 = \begin{bmatrix} 1 & 0.796 \\ 0.796 & 1 \end{bmatrix}$. Then, $M_1$ is similarly defined as the lag 1 cross correlation matrix between p and q, with elements pp, pq, qp and qq. Using the same expression with a lag of 1, you get $M_1$ with elements 0.302, 0.164, etcetera. From $M_0$, you can determine $M_0^{-1}$, whose elements are 2.73, minus 2.17, etcetera. $M_1$ is given and $M_0^{-1}$ is given, and therefore you can get $A$. Your $A$ is $M_1 M_0^{-1}$, and therefore you get $A = \begin{bmatrix} 0.47 & -0.21 \\ 0.31 & -0.37 \end{bmatrix}$. Then, we go to $B$. We know that to determine $B$, you need the matrix $C$, and $C = M_0 - M_1 M_0^{-1} M_1'$. $M_1$ is given here, so you get its transpose $M_1'$. $M_0$ is given, and $M_0^{-1}$ has been obtained.
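The sequence of calculations just described, from the observed flows to $M_0$, $M_1$, $A$ and $C$, can be sketched as follows for two sites. This is only an illustrative sketch: the two flow series are made-up numbers and not the 19 years of data used in the lecture, the helper names are hypothetical, and dividing the lag 1 sum by $n$ (as in the lag 0 expression) is an assumption about the estimator.

```python
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    # Population standard deviation (divide by n), so that r_pp(0) = 1.
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

def cross_corr(x, y, lag):
    """Correlation between x at step i + lag and y at step i."""
    n = len(x)
    mx, my = mean(x), mean(y)
    total = sum((x[i + lag] - mx) * (y[i] - my) for i in range(n - lag))
    return total / (n * std(x) * std(y))  # assumption: divide by n for any lag

def corr_matrix(series, lag):
    # Element (i, j): site i at time t + lag against site j at time t.
    return [[cross_corr(x, y, lag) for y in series] for x in series]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

# Made-up annual flows at two sites (illustration only).
flows_p = [120.0, 95.0, 140.0, 110.0, 80.0, 130.0, 105.0, 90.0]
flows_q = [60.0, 50.0, 75.0, 52.0, 45.0, 70.0, 58.0, 40.0]

m0 = corr_matrix([flows_p, flows_q], 0)   # lag 0 cross correlation matrix
m1 = corr_matrix([flows_p, flows_q], 1)   # lag 1 cross correlation matrix
a = matmul(m1, inverse_2x2(m0))           # A = M1 M0^{-1}
a_m1t = matmul(a, transpose(m1))
c = [[m0[i][j] - a_m1t[i][j] for j in range(2)] for i in range(2)]  # C = M0 - M1 M0^{-1} M1'

# Inverse of the M0 quoted in the lecture, for comparison with 2.73 and -2.17.
m0_inv_lecture = inverse_2x2([[1.0, 0.796], [0.796, 1.0]])
```

For more than two sites, the 2 by 2 inverse would have to be replaced by a general matrix inverse; the rest of the steps stay the same.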
Therefore, you obtain $C$. All of these are 2 by 2 matrices, and you get $C = \begin{bmatrix} 0.89 & 0.76 \\ 0.76 & 0.95 \end{bmatrix}$. Once you get $C$, you have the elements $c_{11}$, $c_{12}$, $c_{22}$ and so on. Then, we start with $b_{11}$, that is, the top-left element. So, $b_{11} = c_{11}^{1/2}$, which is 0.94. Then, you go to $b_{21}$, which is $c_{21}/b_{11}$. I am using the expressions for the diagonal elements as well as for the k-th row elements here. I can use these expressions because $C$ has been completely defined, and we use the elements that we have just determined to determine the next element. So, start with $b_{11}$ and then go to $b_{21}$. Then, go to the diagonal element of that row, then go to the next row and define $b_{31}$ and so on, if you have 3 sites. In this particular case, we have only two sites. So, you have defined $b_{11}$, $b_{21}$ and $b_{22}$, and since it is a lower triangular matrix, all the elements above the diagonal are 0. So, we get the $B$ matrix with $b_{11} = 0.94$, $b_{21} = 0.81$ and $b_{22} = 0.54$, and the element above the principal diagonal is 0. This is how you define the matrix $B$. Once you define the matrix $B$, you can determine the flows for the next time period. So, $X_{t+1} = A X_t + B \varepsilon_{t+1}$. $A$ has been defined, $B$ has been defined, and $X_t$ is the vector of flows during the time period t. To begin with, you can assume $X_t$ to be zeros, and $\varepsilon_{t+1}$ are the standard normal deviates, which you obtain for both of these sites. So, $A$ is this matrix and $B$ is this matrix. To start the simulation, you need the vector $X_t$. You typically assume all of its elements to be zeros and then generate the values for t plus 1. That is the idea.
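As a check on the arithmetic, the three non-zero elements of $B$ can be reproduced from the rounded values of $C$ quoted in the lecture. Carrying the intermediate results to two decimals, as done below, is an assumption about how the slide values were obtained; working with unrounded values changes the last digit slightly.

```latex
\begin{aligned}
b_{11} &= \sqrt{c_{11}} = \sqrt{0.89} \approx 0.94,\\
b_{21} &= \frac{c_{21}}{b_{11}} = \frac{0.76}{0.94} \approx 0.81,\\
b_{22} &= \sqrt{c_{22} - b_{21}^{2}} = \sqrt{0.95 - 0.81^{2}} \approx 0.54,\\
B &\approx \begin{bmatrix} 0.94 & 0 \\ 0.81 & 0.54 \end{bmatrix}.
\end{aligned}
```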
So, you are generating at both site p and site q for the time period t plus 1, using the flows at sites p and q in the time period t. Assuming the initial values to be zeros, we can determine $x_{p,1}$ and $x_{q,1}$. The values $e_{p,1}$ and $e_{q,1}$ are obtained from your standard normal deviate tables. Because the assumed initial values are zeros, the first term vanishes and only the second term remains. That is, if you put zeros at t equal to 0, you get $x_{p,1} = -0.126$ and $x_{q,1} = -0.254$. Remember, these are standardized values and therefore they can be negative. Then, we use these values to get $x_{p,2}$ and $x_{q,2}$ with the same expression. Since $x_{p,1}$ and $x_{q,1}$ are now defined, you get $x_{p,2}$ and $x_{q,2}$. Like this, you can go to $x_{p,3}$ and $x_{q,3}$ using these two values, then to $x_{p,4}$ and $x_{q,4}$ using the next values, and so on. So, this is how we generate the standardized values of the flows for the next few time periods. As I mentioned, t can run to 100, 150 and so on. From the standardized values, you can get back your flow values using the mean and the standard deviation. So essentially, in today's lecture, we have completed the discussion on the Matalas model. It is a multi-site generation model. Typically, it is used for annual flows, seasonal flows and perhaps, in some cases, for monthly flows also. The focus here is to preserve the cross correlation structure among several streams in the same catchment or in adjoining catchments, as well as the auto correlation structure. When we talk about the cross correlation structure, it is the cross correlation at lag 0, which means at the same time, as well as at lag 1. So, we are talking about cross correlations at lag 0 as well as lag 1, and auto correlations at lag 0 as well as lag 1.
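The generation loop described above can be sketched in Python as follows. The matrices `A` and `B` are the rounded values quoted in the lecture; everything else is an assumption made for illustration: the function names, the use of Python's `random.gauss` in place of standard normal deviate tables, the seed, and the mean and standard deviation used for destandardizing, which are made-up numbers and not the statistics of the lecture's data.

```python
import random

# Rounded matrices quoted in the lecture for the two sites p and q.
A = [[0.47, -0.21], [0.31, -0.37]]
B = [[0.94, 0.0], [0.81, 0.54]]

def step(a, b, x, eps):
    """One step of the model: x_next = a x + b eps (2-site vectors)."""
    return [
        sum(a[i][j] * x[j] for j in range(2))
        + sum(b[i][j] * eps[j] for j in range(2))
        for i in range(2)
    ]

def generate(a, b, n_steps, rng):
    """Generate n_steps standardized flow vectors, starting from zeros."""
    x = [0.0, 0.0]  # assumed initial standardized flows
    out = []
    for _ in range(n_steps):
        # Independent standard normal deviates for the two sites.
        eps = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        x = step(a, b, x, eps)
        out.append(x)
    return out

def destandardize(z, mean, std):
    """Convert a standardized value back to a flow value."""
    return mean + std * z

rng = random.Random(42)               # fixed seed so the run is repeatable
series = generate(A, B, 100, rng)     # 100 time periods at both sites

# Hypothetical mean and standard deviation for site p (illustration only).
MEAN_P, STD_P = 1000.0, 250.0
flows_p = [destandardize(x[0], MEAN_P, STD_P) for x in series]
```

Because the initial vector is zero, the first generated vector is simply $B \varepsilon_1$, which is the point made above about the first term vanishing.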
The model also preserves the mean and the standard deviation, along with these two types of correlation, that is, the cross correlations as well as the auto correlations. This model is extremely useful for multi-site generation, which is typically used in cases where an entire catchment with several streams has to be developed and you would like to generate data on several streams. So, we will continue this discussion in the next class. Thank you very much for your attention.