Welcome to lecture number 35 of the course Stochastic Hydrology. Now, in the previous
few lectures, we have been discussing multivariate stochastic models, which are essentially
useful in generating stream flows in, let us say, a catchment where you are interested in
generating flows in several streams simultaneously. The streams that we are considering
do, in fact, have their own correlation structures.
So, you would like to preserve the correlation structure between a stream a and a stream
b in the same catchment. These types of models are essentially useful when you are planning
for large scale water resources development in a particular catchment. What I mean by
that is, let us say that you have a huge river basin, such as the Narmada river basin or the Brahmaputra
river basin, where several tributaries are joining. Also, you may have a hydrologically
homogeneous region in which there is a correlation structure among several streams. Then,
you would like to develop this entire catchment by putting, let us say, reservoirs on several
streams and so forth. In such situations, you would like to generate
flows on all of these streams, that is synthetically generate stream flows on all of these streams
simultaneously. What do I mean by simultaneously? We have studied in our earlier courses models
such as the Thomas-Fiering model, for example, which is a single-site model. The
single-site models preserve the statistics of that particular stream, if we are talking
about stream flow generation.
The multi-site models, on the other hand, apart from preserving the statistics of the individual
streams, also preserve the correlations, that is, the cross correlations among these different
streams. Let us say I have a catchment in which we have two streams. Let us understand
this correctly. Let us say that we have a catchment something like this. Then, I am interested
in generating flows at this location, and I am also interested in generating flows at
this particular location. So, let us say, I have a and b.
Then, we may also have a situation where I may be interested in generating flows at this
location c, by taking into account the correlation structure among a, b, as well as c. In our
single-site models, what would we have done? We would have taken the flow records at this
particular site c and then used a single-site model, such as the Thomas-Fiering
model, to generate the flows, which would have preserved
the mean, the variance, as well as the lag 1 correlation of the flows at this site.
However, when we come to the multi-site generation, when generating the flows at site c, we would
also take into account and indeed preserve the correlations that the flows at site c
have with flows at site b as well as flows at site a. Similarly, when we are generating
flows at site b, we will preserve the correlations of the flows at site b with respect to the
flows at site a, as well as with respect to site c.
So, there is a spatial correlation as well as a temporal correlation. The temporal
correlation is, in fact, the auto correlation that we are talking about. So, when we are
focusing our attention on c, for example, on the flows at site c, the model will not only preserve
the auto correlations of the flows with respect to themselves across time, but it will also
preserve the cross correlations with respect to flows
at b, as well as with respect to flows at a.
So, when you suspect that the flows have cross correlations and you would like to generate
flows at sites a and b, a and c, etcetera, then you need to look at multi-site
models. So, we are essentially generating flows simultaneously at site a, site b,
site c and so on. Now, these kinds of models will be very useful when we are talking about
large scale water resources development, in which situations we will be
interested in simultaneous generation of flows, so that you can examine the implications
of putting in several reservoirs. Let us say that I would like to put a reservoir
here, a reservoir here, a reservoir here and so on. So, you can examine the implications of
developing these kinds of watersheds or catchments simultaneously. So, in the last
class, specifically what did we see? In lecture number 34, starting with the single-site
model, we went on to the multi-site model, typically the Markov model. That is, the multi-site Markov
model is what we discussed in the last class.
The structure of the model looks like this: x t plus 1 is equal to e x t plus g into epsilon t plus 1.
Now, this is the multi-site Markov model that we have discussed in the previous class. Remember,
e is not the expectation here. It is a p by p diagonal matrix, x t is a p by 1 vector
and p is the number of sites. So, let us say that you have a case where you are talking
about generation of flows at site number 1, site number 2, site number 3 etcetera. So,
like that you have p sites.
So, at every site you have a time series of
flows available. These are the observed flows. That is what defines the vector x t here.
x t is a p by 1 vector of standardized values of the variable generated at time t, and we
are writing this expression for time t plus 1. So, we generate x t plus 1, that is,
the flows at time t plus 1 at all of these p sites, using the flows at time t plus some
random component, and e and g are coefficient matrices. Now, in this model that we have
already discussed, e is a p by p diagonal matrix whose j-th diagonal element
is rho j 1. What is rho j 1? It is the lag 1 serial correlation at site j.
Similarly, g is a p by p diagonal matrix whose j-th diagonal element is given by the
square root of 1 minus rho j 1 square, where rho j 1 is the lag 1 serial correlation at site
j. A more general form, which is also very useful, is the Matalas model. This is what
we introduced in the last class. Starting with this kind of a structure, Matalas has
introduced a more general form of the multi-site model. In fact, in this, we do not assume
that it is a Markov process, although we relate x t plus 1 with x t here. The coefficient
matrices a and b, in fact, reflect the cross correlation structure in the multi-site situation.
So, this is what we were talking about in the last lecture towards the end. So, we introduced
the Matalas model, x t plus 1 is equal to a x t plus b epsilon t plus 1, which is similar
to the Markov model that we have discussed earlier, except that epsilon t plus 1
is taken to be independent of x t in this case. As we progress, we will discuss that more.
So, we are talking about this particular model in the context of simulating or synthetically
generating flows at several sites simultaneously. These kinds of models are typically used for
large time intervals, let us say seasonal flows or annual flows, which are essentially
useful for planning purposes. Now, this is the form that was given by Matalas. This is
a multi-site normal generation model and it preserves the mean, the variance, the lag 1 serial
correlation, the lag 1 cross correlation and the lag 0 cross correlation.
Distinguish between the serial correlation and the cross correlation. I keep repeating
this because serial correlation remains important when we are talking about multi-site models,
in the sense that the model has to preserve the structure of the time series at that particular
location, and it must also preserve the cross correlation with the other time series. Therefore,
both serial correlations and cross correlations become important here.
Now, we write x t plus 1 is equal to a x t plus b epsilon t plus 1. Let us go through
this model in detail now, although I introduced it in the last class. Now,
x t and x t plus 1 are p by 1 vectors, where p is the number of sites, representing standardized
data corresponding to the p sites at time steps t and t plus 1, respectively. When talking
about the Thomas-Fiering model and the other ARMA type of models, I have told you earlier that
we always write these models for standardized data.
So, x t and x t plus 1 are, in fact, the standardized data. Recall that, by standardization, I mean
deducting the mean and dividing by the standard deviation. For example,
when I have a value x t and I want to write the standardized value of x t, I can
write this as x t s is equal to the observed value
x t minus x bar, which is the mean, divided by the standard deviation. So, all of these
types of models are written for standardized data.
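As a small illustration of the standardization just described, here is a minimal sketch in Python with NumPy. The function name `standardize`, the array layout (rows are time steps, columns are sites) and the flow values are all made up for illustration; the use of the divisor n in the standard deviation is also an assumption, since the lecture does not specify it.

```python
import numpy as np

def standardize(flows):
    """Standardize an n-by-p array of flows (rows = time steps, columns = sites).

    Returns the standardized values together with the site means and
    standard deviations, which are needed later to convert back to flows.
    """
    flows = np.asarray(flows, dtype=float)
    mean = flows.mean(axis=0)          # q bar at each site
    std = flows.std(axis=0, ddof=0)    # divisor n is an assumption
    return (flows - mean) / std, mean, std

# Hypothetical annual flows at two sites (made-up numbers).
q = np.array([[120.0, 80.0],
              [150.0, 95.0],
              [ 90.0, 60.0],
              [135.0, 88.0]])
x, q_bar, s = standardize(q)
```

After this step, each column of `x` has zero mean and unit standard deviation, which is the form in which the model is written.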
So, from the observed values, you standardize the data and then apply these kinds of models.
Now, there is an important assumption here. That is, the model is multivariate normal.
It means that all the variables that you are using, all the flow variables, the x t's,
are normally distributed random variables. So, we use this assumption, and that is why
we essentially use these kinds of models for large time steps, for example,
annual stream flows, seasonal stream flows etcetera, where the assumption of these variables
being normal is justified. If you use, for example, daily rainfall and
such processes, which cannot be approximated by a normal distribution, then
your models would be wrong. So, you must keep in mind that these kinds of models, the multivariate
stochastic models that I am discussing here, especially the Matalas model in this
particular form, must be used essentially for long time interval processes, such as
seasonal flows, annual flows and so on.
Now, the epsilon t plus 1 here, in this expression, is a normally distributed random variate with
0 mean and unit variance. So, it is N(0, 1). It is a p by 1 vector because you have p sites.
So, at every site you have a standard
normal deviate. An additional assumption here is that epsilon t plus 1 is independent of
x t. Your structure requires x t as well as epsilon t plus 1 on the right side. So, the
epsilon t plus 1 here is independent of x t.
This will be a useful assumption when we derive expressions for a and b, as we will presently
see. Now, a and b are coefficient matrices and both of these are of size p by p. As you
can see here, a is p by p and x t is p by 1, b is p by p and epsilon t plus 1 is p by 1. So,
a x t is p by 1, b epsilon t plus 1 is p by 1, and x t plus 1 will be a p by 1 vector.
Now, what does this mean? In this model, if you specify a and b, which are the coefficient
matrices, then epsilon t plus 1 is obtained from your standard normal deviate tables and
x t is the vector of previous values, so
your model is completely specified. So, our objective now is to specify a and b in terms
of the data. That means you
have the observed flows at all of these p
sites for some time period, let us say t is equal to 1, t is equal to 2 etcetera, up to t
equal to n. So, at each of the p sites, you have the observed time series
of flows. We would use these available observations
to determine a and b. So, that is the objective with which we will proceed now.
Now, in the last class, I introduced m naught and we defined m naught as the expected value
of x t x t prime, where x t prime is the transpose of x t. So, the expected value of x t x t prime is nothing
but the cross correlation matrix of lag 0, because you do not have an x t plus k here, only
x t x t prime. Therefore, you are talking about the cross correlation of x t with
itself, which is the cross correlation at lag 0 or, in fact, the covariance.
Now, the i j-th element of this matrix m naught can therefore be written
in terms of the data, that is, the observed values of flows. So, the standardized value x i t
is q i t minus q i bar by s i, where q i t is the observed flow at site i in time
period t. q i bar is the mean of the flows at site i, which you get simply by summing over all
the time periods and dividing by the number of time periods. s i is the
standard deviation of the flows at site i, and similarly for site j.
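Written out in symbols, the description above corresponds to a sample estimate of the following form. This is a sketch reconstructed from the spoken description; in particular, the divisor n in the average is an assumption, since the lecture only describes the expression in words.

```latex
m_0(i,j) = E\left[x_{i,t}\,x_{j,t}\right]
\approx \frac{1}{n}\sum_{t=1}^{n}
\left(\frac{q_{i,t}-\bar{q}_i}{s_i}\right)
\left(\frac{q_{j,t}-\bar{q}_j}{s_j}\right),
\qquad
\bar{q}_i = \frac{1}{n}\sum_{t=1}^{n} q_{i,t}
```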
So, this is m naught, that is, the i j-th element of the matrix m naught. Now, similarly,
we will define another matrix m 1. Together,
m naught and m 1 will be useful in determining a and b. So,
our objective in introducing these is finally to write some useful, elegant
expressions for determining a and b.
The way I defined m naught was, I took the expected value of x t x t prime. Now, I will lag it
by 1 time period and write the expected value of x t with x t minus 1 prime. So, I will
define m 1, which is a p by p matrix, as the expected value of x t x t minus 1 prime. So, I am lagging
it by 1 time step and then taking the expectation. Similar to what we did for the m naught matrix,
the i j-th element of the m 1 matrix will be simply the expected value of x i t x j
t minus 1. We are able to write this because the
expected value of a matrix is, in fact, the matrix of expected values of the individual elements.
So, that is how we write it. Much the same way as we wrote m naught i j, we now write m 1
i j. So, we take the flow at the i-th site and the flow at the j-th site, the first in time period
t and the second in time period t minus 1. Then, we write this expression and, because we are lagging
it by 1, we start the summation from t is equal to 2.
So, for i equal to 1 to p and j equal to 1 to p,
we get all the elements from this expression. Therefore, m 1 is defined completely and m
naught is defined completely. Now here, I have written in terms of q, which is the flow, and here
I have written x, which is the standardized flow. So, x corresponds to the q that we have written
earlier. Remember that m 1 as well as m naught are completely defined once we have the data.
That means you will be able to form the matrices m naught as well as m 1 just based on the
observed data. What is the form of the data that we have?
At each of the p sites, we have a time series, t is equal to 1 to n. Let us say you have annual
flows at each of the p sites and let us say you have forty years of flows. So, t is equal
to 1, t is equal to 2, t is equal to 3 etcetera, up to forty years at site number 1, at site
number 2 etcetera, up to site number p. So, this is how you define m naught as well as
m 1.
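The two matrices can be formed from a record of flows along the following lines. This is a minimal sketch in Python with NumPy, not the lecture's own computation: the function name `lag0_lag1_matrices`, the divisors n and n minus 1, and the randomly generated demonstration record are all assumptions made for illustration.

```python
import numpy as np

def lag0_lag1_matrices(q):
    """Estimate M0 = E[x_t x_t'] and M1 = E[x_t x_{t-1}'] from an n-by-p flow record."""
    q = np.asarray(q, dtype=float)
    n = q.shape[0]
    # Standardize each site (column): subtract the mean, divide by the std.
    x = (q - q.mean(axis=0)) / q.std(axis=0)
    # Lag 0: element (i, j) averages x[t, i] * x[t, j] over t = 1..n.
    M0 = x.T @ x / n
    # Lag 1: element (i, j) averages x[t, i] * x[t-1, j] over t = 2..n.
    M1 = x[1:].T @ x[:-1] / (n - 1)
    return M0, M1

# Made-up record: 40 years at 3 sites, used only to exercise the function.
rng = np.random.default_rng(1)
q_demo = 100.0 + 20.0 * rng.standard_normal((40, 3))
M0, M1 = lag0_lag1_matrices(q_demo)
```

With this choice of divisors, M0 is symmetric with ones on its diagonal, while M1 is in general not symmetric.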
Now, once we define m naught and m 1, we will go back to the original expression, where
we wrote x t plus 1 is equal to a x t plus b epsilon t plus 1, and then see how we can
determine a and b using m naught and m 1. So, this is m 1. The way I wrote it here,
these are, in fact, the standardized flows. So, I will write it in terms of the original
flows, which is q i t minus q i bar etcetera, much the same way we did for m naught.
You must remember here that these are for two different sites, which, in fact, is
the case with m naught also. q is the original random variable; that means we may be talking
about annual stream flow or some such thing.
From this, you can see that m 1 is, in fact, the cross correlation matrix of lag 1. At
lag 1, you take the cross correlation, and m naught is at lag 0. All right. Now, we will
look at this model: x t plus 1 is equal to a x t plus b epsilon t plus 1. x t plus 1 is
a p by 1 vector; a is p by p, x t is p by 1, b is p by p, and epsilon t plus 1 is p by 1. Now, what we want is
to get expressions for a and b in terms of m naught and m 1. Do
not lose sight of the fact that m naught and m 1 are completely determined
based on the data. So, the moment you have data, you can determine
m naught and m 1. Therefore, we should be able to express a and b in terms of m naught
and m 1. To do that, we will post multiply by x
t prime and then take the expectation. So, x t plus 1 x t prime is
equal to a x t x t prime plus b epsilon t plus 1 x t prime. I am post multiplying all
through by x t prime. Then, we will take the expectation.
So, expectation of x t plus 1 x t prime is equal to a, which comes out because it is a constant, into the expected
value of x t x t prime, plus b into the expected value of epsilon t plus 1 x t prime. Now, look at
this, the expected value of epsilon t plus 1 x t prime. We said, right at the outset, that
epsilon t plus 1 is independent of x t. Therefore, the expected value of epsilon t plus 1 x t
prime must be equal to 0. Why? The expected value of the product of two random variables x and y,
if they are independent, is equal to the expected value of x into the expected value of
y, and the expected value of one of the random variables is 0, because we said epsilon t
plus 1 is N(0, 1). Therefore, its expected value is 0. Therefore, this expectation becomes
0. So, the second term vanishes here. We get
a into the expected value of x t x t prime. What is the expected value of x t x t prime? It is m naught. We have
defined m naught to be the expected value of x t x t prime. Therefore, this term is nothing
but a m naught. What is the left hand side, the expected value of x t plus 1 x t prime? We have defined
m 1 as the expected value of x t x t minus 1 prime. If I shift this by 1 time step, it is the same as the expected
value of x t plus 1 x t prime, since the expectation does not depend on the particular time step. Therefore, the expected value of x t plus 1 x
t prime will be written as m 1, this is a, this is m
naught, and the last term becomes 0. So, I will be able to write m 1 is equal to a m naught. Therefore,
I should be able to get a as m 1 m naught inverse.
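The steps just described can be collected in one place as follows. This is only a restatement of the spoken derivation in symbols; here 0 denotes the p by p zero matrix, and the last line assumes that M_0 is invertible.

```latex
\begin{aligned}
X_{t+1} &= A X_t + B\,\varepsilon_{t+1} \\
X_{t+1} X_t' &= A X_t X_t' + B\,\varepsilon_{t+1} X_t' \\
E[X_{t+1} X_t'] &= A\,E[X_t X_t'] + B\,E[\varepsilon_{t+1} X_t'] \\
M_1 &= A M_0 + B \cdot 0 \\
A &= M_1 M_0^{-1}
\end{aligned}
```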
So, once you have the observed data, you have defined m naught
and you have defined m 1. From m naught, you get m naught inverse, and m 1 is defined. Therefore,
you should be able to get a. So, a is completely fixed. So, just based on the data, without
much further work, you will be able to get the coefficient matrix a.
But the coefficient matrix a itself, because it is determined from m naught and m 1, will
incorporate the cross correlation structure, because we defined m naught as the cross correlation
at lag 0, which is, in fact, the covariance, and m 1 as the cross correlation at lag 1. Both
of these are included in the coefficient matrix a.
Now, let us see what we do with the coefficient matrix b. So, this is our model, x t plus 1 is equal
to a x t plus b epsilon t plus 1. Now, we will post multiply by x t plus 1 prime on
both sides and take the expectation. So, x t plus
1 into x t plus 1 prime is equal to a x t into x t
plus 1 prime plus b epsilon t plus 1 into
x t plus 1 prime. Now, we take the expectation. So, I will write this as expected value of
x t plus 1 x t plus 1 prime is equal to a into expected value of x t x t plus 1 prime
plus b into expected value of epsilon t plus 1 x t plus 1 prime.
What is this now? We defined m naught to be the expected value of x t x t prime, that is, both
at the same time step. Therefore, I will be able to write the left hand side, the expected
value of x t plus 1 x t plus 1 prime, as m naught. Then, we have the expected value of x t x t plus 1 prime.
We also have the expected value of epsilon t plus 1 x t plus 1 prime. In this expectation,
if we had x t prime here, it would have
been 0, because we said epsilon t plus 1 is independent of x t. But when you take the expectation
of epsilon t plus 1 x t plus 1 prime, that will not be 0. Remember this,
because epsilon t plus 1 need not be independent of x t plus 1. Epsilon t plus 1 is independent
of x t in this structure and therefore, this expectation is not necessarily zero.
Similarly, we had defined the expected value
of x t with x t minus 1 prime as m 1. But
this is not m 1, because you have x t with x
t plus 1 prime. Therefore, we need to evaluate both these expectations, namely the expectation
of x t x t plus 1 prime and that of epsilon t plus 1 x t plus 1 prime, separately.
So, let us determine these two expectations. The left hand side is m naught. So, we will
start with this expectation, the expectation of x t x t plus 1 prime. We defined m 1 to be the
expected value of x t x t minus 1 prime. So, m 1 prime will be the transpose of the expected value of x t x
t minus 1 prime. This is a matrix, so I am taking the transpose of that.
Now, I will be able to write
this as the expected value of x t minus 1 x t prime. Why? From your matrix algebra, you should
know that the transpose of a product a b is b
transpose into a transpose. Here, I have x t into x t minus 1
transpose, and the transpose of that whole product. Therefore,
I write the transpose of x t minus 1 transpose, which is x t minus 1, followed by x t
transpose. So, I will be able to get from here,
m 1 dash is equal to expected value of x t minus 1 into x t transpose. Shifting by one time step, I can also
write it as m 1 dash is equal to expected value of x t x t plus 1 dash. Why do I need this
expectation? In our expression, I need the expected value of x t x t plus 1 dash, and
therefore I will be able to write this as m 1 dash. So, m 1 dash is the expected value of
x t x t plus 1 dash.
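In symbols, the chain of equalities used here is the following. It relies on the rule that the transpose of a product reverses the order, and the last step assumes, as the lecture does implicitly, that the expectation does not change when both time indices are shifted by one step.

```latex
M_1' = \left(E[X_t X_{t-1}']\right)'
     = E\left[(X_t X_{t-1}')'\right]
     = E[X_{t-1} X_t']
     = E[X_t X_{t+1}']
```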
We are now focusing on the expected value of epsilon t plus 1 x t plus 1 prime. From our
model, x t plus 1 is a x t plus b epsilon t plus 1. So, I write x t plus 1 prime as the
transpose of a x t plus b epsilon t plus 1.
Again, we use the rule that the transpose of a product a b is b dash a dash. Therefore, this
will be epsilon t plus 1 into x t prime into a prime, plus epsilon
t plus 1 into epsilon t plus 1 prime into
b prime. So, this is what we get as epsilon
t plus 1 x dash t plus 1. Now, we take the expectation on both sides. So, the expected
value of epsilon t plus 1 x dash t plus 1, I write now as the expected value of epsilon t plus
1 x dash t a dash, plus the expected value of epsilon t plus 1 epsilon t plus 1 dash b dash.
Now, what is the first term? This a dash is a constant, so it comes out.
So, I will take the expected value of epsilon t plus 1 x dash t. Because
epsilon t plus 1 and x t are independent, this expected value will become 0, because the
expected value of epsilon t plus 1 is 0, much the same way as we did for determining
a from m 1. So, this term becomes 0.
Now, look at the second term. This term has the expected value of epsilon t plus 1 epsilon
t plus 1 dash b dash. Now, b dash is a constant, so it comes out on the right, and I am left with the expected
value of epsilon t plus 1 epsilon t plus 1 dash, which is essentially the variance, the expected
value of epsilon t plus 1 with itself. Because epsilon t plus 1 is a vector of independent
standard normal deviates, this is the identity matrix i. So, I write the term as i b dash, which is simply
b dash.
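Collecting the two terms, the expectation works out as follows. This restates the spoken steps in symbols; the identity matrix I appears under the assumption that the p components of the noise vector are mutually independent standard normal deviates.

```latex
\begin{aligned}
\varepsilon_{t+1} X_{t+1}'
  &= \varepsilon_{t+1}\,(A X_t + B\,\varepsilon_{t+1})'
   = \varepsilon_{t+1} X_t' A' + \varepsilon_{t+1}\varepsilon_{t+1}' B' \\
E[\varepsilon_{t+1} X_{t+1}']
  &= E[\varepsilon_{t+1} X_t']\,A' + E[\varepsilon_{t+1}\varepsilon_{t+1}']\,B'
   = 0 \cdot A' + I\,B' = B'
\end{aligned}
```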
Therefore, the second expectation that we discussed in the earlier expression, namely the
expected value of epsilon t plus 1 x t plus 1 prime, can be written as simply equal to b dash.
Now, we use this and rewrite our expression. So, what was our expression? The left hand side
is m naught, the first expectation we have determined as m 1 dash, and
the second is b dash. So, we write this particular expression again as m
naught is equal to a m 1 dash plus b into b dash.
So, m naught is equal to a m 1 dash plus b b dash, in which we have already determined
a, which is m 1 m naught inverse. So, a has been completely determined.
Then, using a is equal to m 1 m
naught inverse, I write m naught is equal to m 1 m naught inverse into m
1 prime plus b b prime. Therefore, b b prime is equal to m naught minus m 1 m naught inverse
m 1 prime. So, this is how you get b b prime. Remember,
all of this exercise we have been doing is to determine or specify the coefficient matrices
a and b. Once the coefficient matrices a and b are fixed, your model is completely
fixed, and we want to determine a and b from the data. To do that, we have defined m naught
and m 1, both of which can be determined completely from the data, from the cross correlations
at lag 0 and lag 1. Now, we are trying to get an expression for b, also in terms
of m naught and m 1. So, b b dash is equal to m naught minus m 1 m naught inverse m 1 dash; this
is the expression that we got. Let us say that we write c is equal to b b dash. This is a
p by p matrix, p being the number of sites.
So, I write c is equal to m naught minus m 1 m naught inverse m 1 dash. Now, from here
you can see that, if we write it like this, the matrix b does not have a unique solution. That means
there can be several alternate solutions, all of which will satisfy
this particular expression.
So, one of the ways is to assume a particular structure for the matrix b. In this particular
model, the Matalas model, we typically assume b to be a lower triangular matrix.
What is a lower triangular matrix?
The elements on the principal diagonal and those below it
can be non zero, and all the elements above and to the right
of the principal diagonal are 0.
That is a lower triangular matrix. So, if you assume b to be a lower triangular
matrix, then b dash will become an upper triangular matrix, where only the elements on and above the
principal diagonal can be non zero. Then, you write b b dash as the matrix b
multiplied by the matrix b dash. The matrix b is a p by p matrix and the matrix b dash is also a
p by p matrix. Therefore, c, which is equal to b b dash, is also a p by p matrix. Now,
the elements of c, we write as c 1 1, c 1 2 etcetera, c 1 p, and so on up to c p 1,
c p 2 etcetera, c p p. So, the matrix c is completely
determined from the data, because we know c as m naught minus m 1 m naught inverse
m 1 prime. m naught is completely determined from the data and m 1 is completely determined.
Therefore, m naught inverse can be got and m 1 prime can be got. Therefore, the matrix
c is completely determined. The moment you have determined the cross correlation
matrices m naught and m 1, the matrix c is completely defined. Therefore, using the c
matrix, we now have to determine the matrix b, and that is the exercise we are doing now.
Now, look at this. The first element c 1 1 will be equal to b 1 1 into b 1 1, which is
b 1 1 square. So, I will be able to determine b 1 1 as the square root of c 1 1. So, we will
write this as b 1 1 is equal to c 1 1 to the power half. So, first I determine b 1 1 and
then, I go to b 2 1 and then, I determine b 2 2. Similarly, I determine b 3 1,
b 3 2, b 3 3 and then come to b 4 1, b 4 2, b 4 3, b 4 4 and so on. All the elements
on and to the lower left of the principal diagonal, I will determine
one by one using this expression.
It can be done using a long hand multiplication of the matrices. However, we will not go into
that in this lecture. I will give you the expressions which come out of that. So b 1
1, you straight away determine as the square
root of c 1 1. Let us say we first define all the diagonal elements, that is, b 1 1, b 2 2 etcetera, b p p.
b 1 1 is given like this, and b 2 2 is the square root of c 2 2 minus b 2 1
square. So, you need b 2 1. For b k 1, we have an expression: b k 1 is equal to
c k 1 by b 1 1. All of these arise from the expression b b dash is equal to c, or
c is equal to b b dash. When you multiply out, these are the
expressions that you get. First, you
determine the first diagonal element b 1 1
and then go to b 2 1. Determine b 2 1 and use
b 2 1 to get b 2 2, which is the square root of c 2 2 minus b
2 1 square. Once you determine b
2 2, you have reached the diagonal
element in the second row. Then, you go to the third row. You determine b 3 1, then b 3 2,
and then you go to b 3 3, which is
the diagonal element. Like this,
starting with the top left element b 1 1,
you go to the second row and determine b 2 1,
b 2 2, and then go to b 3 1, b 3 2, b 3 3,
b 4 1, b 4 2, b 4 3, b 4 4 and so on. In every row, you keep on determining
the elements one by one, starting with the left most element, until you reach the diagonal.
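The row by row procedure described above can be sketched in Python with NumPy as follows. The function name `lower_triangular_root` and the matrix `C` are made up for illustration. The general expressions in the comments are the standard ones for this kind of lower triangular (Cholesky) factorization; they reduce to b 1 1 equal to the square root of c 1 1, b k 1 equal to c k 1 by b 1 1, and b 2 2 equal to the square root of c 2 2 minus b 2 1 square, as stated in the lecture.

```python
import numpy as np

def lower_triangular_root(C):
    """Return lower triangular B with B @ B.T equal to C, filled row by row."""
    C = np.asarray(C, dtype=float)
    p = C.shape[0]
    B = np.zeros((p, p))
    for k in range(p):
        # Elements to the left of the diagonal in row k:
        # b_kj = (c_kj - sum over m < j of b_km * b_jm) / b_jj
        for j in range(k):
            B[k, j] = (C[k, j] - B[k, :j] @ B[j, :j]) / B[j, j]
        # Diagonal element of row k:
        # b_kk = sqrt(c_kk - sum over j < k of b_kj squared)
        B[k, k] = np.sqrt(C[k, k] - B[k, :k] @ B[k, :k])
    return B

# Hypothetical symmetric, positive definite C for p = 3 sites (made-up numbers).
C = np.array([[0.75, 0.40, 0.20],
              [0.40, 0.80, 0.30],
              [0.20, 0.30, 0.90]])
B = lower_triangular_root(C)
```

For a symmetric positive definite `C`, the result agrees with `np.linalg.cholesky(C)`. If `C` is not positive definite, the square root step fails, which is a sign that the estimated m naught and m 1 are not mutually consistent.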
So, all of these expressions can be used to determine all the elements of the p by p matrix. So, this
is for the diagonal elements and this is for the other elements in the k-th row, where k
is equal to 1, 2 etcetera, up to p. This defines the matrix b completely.
What are the assumptions? The only assumption is that the b matrix is a lower triangular
matrix. So, what did we do? We started with a particular
structure of the model and then determined the matrix a as well as the matrix b. So, our
structure was x t plus 1 is equal to a x t plus b epsilon t plus 1, where x t plus
1 is a p by 1 vector and p is the number of sites. You are talking about multi-site generation
now. So, p is the number of sites and x t plus 1 is the vector of the values of the flow at each of these
sites in time period t plus 1.
So, in the synthetic generation situation, what is it that you
want to do? You want to determine the flows x t plus 1 at each of the sites, site
1, 2, 3 etcetera, up to p, using the flows in time period
t at all the sites simultaneously. That means, when you are determining the flow at a particular site,
you are using the information on the flows during the previous time period
at all the sites. The matrix a, which is a coefficient matrix in this particular case,
and the matrix b both use the cross correlation structure as well as the auto
correlation structure. So, this is how the model preserves the correlation
structure, both space wise and time wise. When we are
talking about the time correlation, we are talking about the auto correlation at the same site, and space wise
we are talking about the cross correlation from a site a to a site
b, including that lagged by 1 time step. That means the cross correlation at lag 0 as well as
the cross correlation at lag 1. Similarly, the auto correlation at lag 0 as well
as lag 1. Then, it also preserves the mean and the standard deviation at each
site. This is very useful and, in fact, in implementation, when we are talking about
planning models and so on, it is an extremely elegant way of preserving the correlation
structure of multiple sites. Let us look at an example now. But first, just to
summarize this particular model, essentially what we do is, we first calculate the lag
1 correlations as well as the lag 0 correlations and then define
the matrices m naught as well as m 1. How do we determine? We determine based on
the observed data. So, you must have, if you are talking about annual flows, you must have
annual flows at each of the p sites. You must be reasonably justified in using the assumption
that these flows are all normally distributed at all the p sites. Multivariate normality is
the assumption that is involved. Then, you obtain epsilon t plus 1 from the standard
normal deviate tables, and epsilon t plus 1 must be independent of x t. So, this assumption
must be valid.
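In practice, the standard normal deviates are usually drawn by a program rather than read from tables. Here is a minimal sketch of that substitution, assuming NumPy's seeded generator in place of the tables; the seed, the number of steps and the variable names are all hypothetical choices made for this illustration.

```python
import numpy as np

# Assumption: a seeded NumPy generator stands in for the printed tables.
rng = np.random.default_rng(seed=35)

p = 2            # number of sites, as in the two-site example
n_steps = 10000  # hypothetical number of time steps, chosen only for this check

# One row per time step, one column per site; entries are independent N(0, 1).
eps = rng.standard_normal(size=(n_steps, p))

sample_mean = eps.mean(axis=0)                      # should be close to 0
sample_std = eps.std(axis=0)                        # should be close to 1
cross_corr = np.corrcoef(eps, rowvar=False)[0, 1]   # should be close to 0
print(sample_mean, sample_std, cross_corr)
```

Because each epsilon t plus 1 is drawn afresh, it does not depend on x t, which is the independence that the model requires.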
Let us look at an example now, through which all of these procedures that I just mentioned become clear.
We will take the annual flow at two sites, p and q. There are 19 years of flows. So,
you have annual flow at site p and annual flow at site q for 19 years. Now, we will demonstrate
it by just generating two values. But you can carry the same procedure forward and
generate any number of values. As I mentioned in one of my earlier lectures, whenever
we are talking about synthetic generation of data, the data must be generated for a
fairly long period of time. Typically, you know, when we are talking about planning for
water resource systems and so on, using let us say 50 years of data or 30 years of data, you
should be able to generate 100 years of data, 50 years of data, 200 years of data etcetera,
because the water resource systems that we are putting in place are supposed to serve for the next 100 years.
Even if they are not physically meant to serve the next 100 years, you still need to generate
flows for several years, maybe in excess of 100 years, to examine the performance of that
system in future, that is, the possible performance of the system in future. In fact, when we
talk about reliability of system, resiliency of system and so on, we generate the data
for as long as 500 years, 1000 years and so on, essentially to generate sequences which
follow the same statistical properties as the observed historical flows and then, determine
the implications of these flows on the water resource system.
Therefore, whenever we talk about synthetic generation of data, you must remember that,
you must necessarily simulate or generate the flows for fairly long period. Typically,
of the order of 100 years, 150 years and so on. In this example, I will just demonstrate
how to generate the first two values. With the same structure and the same procedure,
you can generate any number of values. As I mentioned, the data is used primarily
to define m naught and m 1. Once you define m naught and m 1, you discard the data and
then, straightaway start working with m naught and m 1. Also, the model is for standardized
flows. Which means, the x t values here are the flows, and the flows have to be standardized
before you use them in the model. Then, when you get x t plus
1 from the model, you can obtain the original values of the generated
flows by destandardizing. That is, the standardized x t
is equal to x t minus the mean, divided by the standard deviation. From this, you should be able to
get back your x t values. So, let us look at how we do this.
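The standardizing and the getting back can be sketched as a pair of functions. This is only an illustration: the flow values below are hypothetical and are not the 19-year record used in the lecture, and the function names are made up here.

```python
import numpy as np

def standardize(flows, mean, std):
    """Return (flows - mean) / std, column by column (one column per site)."""
    return (flows - mean) / std

def destandardize(z, mean, std):
    """Invert standardize(): recover flows from standardized values."""
    return z * std + mean

# Hypothetical annual flows (rows = years, columns = sites p and q).
flows = np.array([[120.0, 80.0],
                  [ 95.0, 60.0],
                  [140.0, 90.0],
                  [110.0, 75.0]])

mean = flows.mean(axis=0)
std = flows.std(axis=0)   # divisor n, an assumption made for this sketch

z = standardize(flows, mean, std)        # what goes into the model
recovered = destandardize(z, mean, std)  # what comes back out
print(np.allclose(recovered, flows))
```

The same mean and standard deviation must be used in both directions, which is why they are kept even after the rest of the data has served its purpose.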
So, we get the mean and the standard deviation and then, we get m naught. The mean and standard
deviation, I got from the data, and m naught is defined as the lag 0 cross
correlation matrix. So, you take the cross correlation at lag 0, which is essentially
the covariance of the standardized flows. So, you define r p q 0 as the sum over i of x p i, which is the flow in time step i at site
p, minus the mean of the flow at site p, multiplied by the same term at site q, and divided by n s p s q, where
s p is the standard deviation at site p and s q is the standard deviation at site q.
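The expression for r p q 0 can be sketched in code as follows. The flows are hypothetical, the function names are made up for this illustration, and the lag argument assumes the convention that site p is taken one step later than site q, which is how the lag 1 matrix is taken to be defined here.

```python
import numpy as np

def cross_corr(xp, xq, lag=0):
    """Sample cross-correlation between xp at step i + lag and xq at step i.

    Uses the divisor n * s_p * s_q, with s computed with divisor n
    (an assumption made here so that r_pp(0) comes out as exactly 1).
    """
    xp = np.asarray(xp, dtype=float)
    xq = np.asarray(xq, dtype=float)
    n = len(xp)
    dp = xp - xp.mean()
    dq = xq - xq.mean()
    # Pair xp[i + lag] with xq[i]; for lag = 0 this is the full series.
    num = np.sum(dp[lag:] * dq[:n - lag])
    return num / (n * xp.std() * xq.std())

def corr_matrix(flows, lag=0):
    """p-by-p matrix whose (j, k) entry is cross_corr(site j, site k, lag)."""
    p = flows.shape[1]
    return np.array([[cross_corr(flows[:, j], flows[:, k], lag)
                      for k in range(p)] for j in range(p)])

# Hypothetical flows, not the 19-year record from the lecture.
flows = np.array([[120.0, 80.0], [95.0, 60.0], [140.0, 90.0],
                  [110.0, 75.0], [130.0, 70.0]])
M0 = corr_matrix(flows, lag=0)
M1 = corr_matrix(flows, lag=1)
print(np.round(M0, 3))
print(np.round(M1, 3))
```

The lag 0 matrix is symmetric with ones on the diagonal, whereas the lag 1 matrix in general is not symmetric.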
Using this, you get m naught with rows 1, 0.796 and 0.796, 1. Then, m 1 similarly, we
define as the lag 1 cross correlation matrix between p and q. So, its elements are p p, p q, q p
and q q. Using the same expression with a lag of 1, you
get m 1 is equal to 0.302, 0.164 etcetera. So, you define m 1. From m naught, you can
determine m naught inverse, because m naught has been defined here.
So, m naught inverse is 2.73, minus 2.17 etcetera.
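The inverse can be checked quickly from the m naught values quoted above. This is a small sketch; the variable names are chosen here for illustration.

```python
import numpy as np

# Lag 0 correlation matrix as read out in the lecture.
M0 = np.array([[1.0, 0.796],
               [0.796, 1.0]])

M0_inv = np.linalg.inv(M0)
print(np.round(M0_inv, 2))
```

Rounded to two decimals, the diagonal elements come out as 2.73 and the off-diagonal elements as minus 2.17, which agrees with the values quoted.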
m 1 is known and m naught inverse is known and therefore, you can get a. So, your a is
m 1 times m naught inverse and therefore, you get a is equal to 0.47, minus 0.21, 0.31, minus 0.37.
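Only the first two elements of m 1 are read out, so a full check of a is not possible from the transcript alone. A partial check is still possible, assuming that 0.302 and 0.164 are the p p and p q elements, that is, the first row of m 1. The variable names are hypothetical.

```python
import numpy as np

M0 = np.array([[1.0, 0.796],
               [0.796, 1.0]])

# Assumed to be the first row (p p, p q) of m 1; the second row is not read out.
m1_row_p = np.array([0.302, 0.164])

# Row p of a = (row p of m 1) times m naught inverse.
a_row_p = m1_row_p @ np.linalg.inv(M0)
print(np.round(a_row_p, 2))
```

Under that assumption, the first row of a rounds to 0.47 and minus 0.21, matching the first two values quoted for a.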
Then, we go to b. We know that to determine b, you need the matrix c, which is b times b prime. So, you know that
c is equal to m naught minus m 1 m naught inverse m 1 prime. m 1 is given here,
so m 1 prime is its transpose. m naught is given, and m naught inverse
has already been obtained. So, m naught, m 1, m naught inverse
and m 1 prime are all available. Therefore, you obtain c. All
of these are two by two matrices.
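The expression for c can be recovered from the model itself. Here is a short derivation, assuming standardized flows with lag 0 matrix m naught and lag 1 matrix m 1, and epsilon t plus 1 with identity covariance and independent of x t. The bold notation is chosen here for illustration.

```latex
\begin{align*}
\mathbf{M}_1 &= E[\mathbf{X}_{t+1}\mathbf{X}_t^{\prime}]
  = \mathbf{A}\,E[\mathbf{X}_t\mathbf{X}_t^{\prime}] = \mathbf{A}\mathbf{M}_0
  \;\Rightarrow\; \mathbf{A} = \mathbf{M}_1\mathbf{M}_0^{-1}, \\
\mathbf{M}_0 &= E[\mathbf{X}_{t+1}\mathbf{X}_{t+1}^{\prime}]
  = \mathbf{A}\mathbf{M}_0\mathbf{A}^{\prime} + \mathbf{B}\mathbf{B}^{\prime}, \\
\mathbf{C} &= \mathbf{B}\mathbf{B}^{\prime}
  = \mathbf{M}_0 - \mathbf{A}\mathbf{M}_0\mathbf{A}^{\prime}
  = \mathbf{M}_0 - \mathbf{M}_1\mathbf{M}_0^{-1}\mathbf{M}_1^{\prime}.
\end{align*}
```

The last step uses the symmetry of m naught, so that a m naught a prime reduces to m 1 m naught inverse m 1 prime.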
So, you get c as 0.89, 0.76, 0.76, 0.95. Once you get c, that means, you have the elements
c 1 1, c 1 2, c 2 1 and c 2 2. Then, we first start with b 1 1. That is the top left element, the first element.
So, b 1 1 is c 1 1 to the power half, which is 0.94. Then, you go to b 2 1. b 2 1 is c
2 1 by b 1 1. I am using the expressions that we have for b: b 1 1 is equal to
c 1 1 to the power half, and so on for the diagonal elements as well as the other elements of the k-th
row. I can use these expressions because c has been completely defined, and
each new element uses the elements that we have just determined.
So, start with b 1 1 and then, go to b 2 1. Then, go to the diagonal
element of that row and then, go to the next row and define b 3 1 and so on, if you have
3 sites. In this particular case, we have only two sites. So, you have defined b 1 1,
b 2 1 and b 2 2, and it is a lower triangular matrix. Therefore, all the elements above the diagonal
are 0. So, we get the b matrix as 0.94, 0.81, 0.54 and the element which is above
the principal diagonal is 0. This is how you define the matrix b.
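The row-by-row procedure for b is what is commonly called a Cholesky factorization. Here is a minimal sketch of it, starting from the rounded c values quoted above; the function name is hypothetical, and the result is compared against NumPy's own routine as a cross-check.

```python
import math
import numpy as np

def lower_triangular_root(C):
    """Return lower-triangular B with B @ B.T == C, built row by row."""
    n = C.shape[0]
    B = np.zeros((n, n))
    for k in range(n):
        # Off-diagonal elements of row k use the rows already computed.
        for j in range(k):
            B[k, j] = (C[k, j] - np.dot(B[k, :j], B[j, :j])) / B[j, j]
        # Diagonal element of row k.
        B[k, k] = math.sqrt(C[k, k] - np.dot(B[k, :k], B[k, :k]))
    return B

# c as read out in the lecture (rounded to two decimals).
C = np.array([[0.89, 0.76],
              [0.76, 0.95]])
B = lower_triangular_root(C)
print(np.round(B, 2))
print(np.allclose(B, np.linalg.cholesky(C)))
```

Starting from the two-decimal c, the last element comes out near 0.55 rather than 0.54; that small difference is most likely because the lecture values were computed from an unrounded c.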
Once you define the matrix b, you can determine the flows for the next time period. So, x
t plus 1 is equal to a x t plus b epsilon t plus 1. a has been defined and b has been defined
and x t is the vector of flows during the time period t. To begin with, you can assume this to be
zeros, and epsilon t plus 1 are the standard normal deviates. So, you get epsilon t
plus 1 at both of these sites. So, a is this matrix and b is this matrix. To
start the simulation, you need this vector x t. You typically assume all of its elements to be zeros
and then, generate the flows for t plus 1. That is the idea.
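The recursion can be sketched as a short loop. This is only an illustration under stated assumptions: the a and b values are the rounded ones quoted in the lecture, with the second row of a read as 0.31 and minus 0.37; the deviates come from a seeded NumPy generator instead of tables; and the function name and seed are hypothetical.

```python
import numpy as np

# Matrices as read out in the lecture (rounded to two decimals).
A = np.array([[0.47, -0.21],
              [0.31, -0.37]])
B = np.array([[0.94, 0.00],
              [0.81, 0.54]])

def generate_standardized(A, B, n_steps, rng):
    """Generate n_steps vectors from x[t+1] = A x[t] + B eps[t+1], x[0] = 0."""
    p = A.shape[0]
    x = np.zeros(p)                   # initial values assumed to be zeros
    out = np.empty((n_steps, p))
    for t in range(n_steps):
        eps = rng.standard_normal(p)  # replaces the standard normal tables
        x = A @ x + B @ eps
        out[t] = x
    return out

rng = np.random.default_rng(seed=0)   # hypothetical seed
series = generate_standardized(A, B, n_steps=100, rng=rng)
print(series[:2])
```

Each row of the output is one time period and each column is one site. These are standardized values, so they still have to be converted back to flows using the mean and the standard deviation at each site.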
So, you are generating at both site p as well as q for the time period t plus 1, using the
flows at site p and q at the time period t. So, assuming these to be zeros, we can determine
x p 1 and x q 1. The initial values are taken as zero and the deviates are obtained from your standard
normal deviate tables. So, e p 1 and e q 1 are obtained. Because the assumed initial
values are zeros, the first term vanishes here and only the second term
remains.
So, x p 1 and x q 1, these are obtained. That is, at t is equal to 0, if you put
zeros, you get x p 1 and x q 1 as minus 0.126 and minus 0.254. Remember, these are
standardized values and therefore, they can be negative. Then, we use these values now
to get x p 2 and x q 2. We use the same expression.
Now, x p 1 and x q 1 are defined and therefore, you get x p 2 and x q 2. Like this, you can
go to x p 3 and x q 3, using these two values and x p 4 and x q 4, using the next values
and so on. So, this is how we generate the standardized
values of the flows for the next few time periods. So, this t can go up to 100,
150 and so on as I mentioned. From the standardized values, you can get back your
flow values using the mean and the standard deviation. So essentially then, in today’s
lecture, we have completed discussion on the Matalas model. It is a multi-site generation
model. Typically, it is used for annual flows, seasonal flows and perhaps, in some cases
for monthly flows also. But, the focus here is to maintain or preserve
the cross correlation structure among several streams in the same catchment or in adjoining
catchments as well as the auto correlation structure. Even when we talk about cross correlation
structure, it is cross correlations at lag 0, which means at the same time as well as
at lag 1. So, we are talking about cross correlations at lag 0 as well as lag 1 and the auto correlations
at lag 0 as well as lag 1. The model also preserves the mean, the standard
deviation and these two correlations, that is, cross correlation as well as auto correlations.
This model is extremely useful for multi-site generation, which is, you know, typically
used in cases where development of an entire catchment has to take place, where there are
several streams and then, you would like to generate data on several streams.
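The claim that the model preserves the lag 0 and lag 1 correlations can be checked directly from the numbers in the example, without generating any flows. This is a rough sketch using the rounded values quoted in the lecture, with the second row of a read as 0.31 and minus 0.37, so agreement is expected only to about two decimals.

```python
import numpy as np

M0 = np.array([[1.0, 0.796],
               [0.796, 1.0]])
A = np.array([[0.47, -0.21],
              [0.31, -0.37]])
B = np.array([[0.94, 0.00],
              [0.81, 0.54]])

# Lag 0: if x[t] has correlation matrix M0, x[t+1] has A M0 A' + B B'.
next_lag0 = A @ M0 @ A.T + B @ B.T

# Lag 1: E[x[t+1] x[t]'] = A M0; its first row should be near 0.302, 0.164.
implied_M1 = A @ M0

print(np.round(next_lag0, 2))
print(np.round(implied_M1[0], 2))
```

The first matrix comes back close to m naught and the first row of the second comes back close to the quoted m 1 values, which is what preserving the correlation structure means here.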
So, we will continue this discussion in the next class. Thank you very much for your attention.