Mod-04 Lec-12 Time Series Analysis - III

Good morning, and welcome to this, the twelfth lecture of the course Stochastic Hydrology. If you recall, in the last lecture we discussed methods of data generation and forecasting. For forecasting, we discussed the methods based on averages, specifically the moving average and the double moving average. In the moving average, the time window is kept constant; you keep computing the average over that window, and the window itself keeps shifting across the data. In the double moving average, we take the moving averages of the first order and then take the averages of those moving averages themselves over a fixed time window. We discussed the example of a moving average of order three and a double moving average of order three by three, that is, the second-order moving average, again of order three. Then we discussed the data generation methods. In one of the earlier lectures, we had discussed generation of uncorrelated data, where we use the distribution of the particular data and generate values from that distribution by equating the cdf value to a uniformly distributed random number. Then, in the last lecture, we also discussed data generation methods for serially correlated data. As I mentioned, hydrologic data are many times serially correlated. For example, the July flow may be correlated with the June flow, and also with the July flow of the previous year. To account for such serial dependence in the data, we discussed the first-order Markov model. For generating annual stream flows, the first-order Markov model is often called the Thomas-Fiering model in the hydrologic literature, after the hydrologists who proposed it.
So, we will continue the discussion today on serially correlated data. Remember the assumptions that we made in the first-order Markov model as it was presented in the last lecture: the flows are normally distributed, and the model is stationary, in the sense that the mean, the standard deviation and the lag-one correlation, all of which appear in the model, are constant across time. We wrote the first-order Markov model as X_{j+1} = mu_x + rho_1 (X_j - mu_x) + t_{j+1} sigma_x sqrt(1 - rho_1^2), where mu_x is the stationary mean, sigma_x is the standard deviation, rho_1 is the lag-one correlation and t_{j+1} is a standard normal deviate. The model is stationary with respect to the mean, the variance and the lag-one correlation. What we do in using this model, if you recall, is estimate the parameters from the data: we estimate the moments mu_x and sigma_x, and we also estimate the lag-one correlation rho_1. We use these sample estimates in the model. To start the model, we assume a value for X_1 and then generate X_2. For convenience, we often assume X_1 to be mu_x itself, so that the term rho_1 (X_1 - mu_x) vanishes. Starting with that, we generate a large sequence of values from the model and discard the first few. Let’s say you generated 200 years; then you discard about the first 50 to 60 values to make sure that the effect of the assumed initial value dies down. So, after assuming the value of the first flow X_1, we generate X_2, X_3 and so on, a large number of such values.
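The generation procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the argument names and the numbers in the usage line are assumptions, not values from the lecture, and the warm-up years are generated in addition to the requested years rather than taken out of them.

```python
import math
import random


def generate_annual_flows(mu, sigma, rho1, n_years, burn_in=50, seed=None):
    """Stationary first-order Markov model:

    X[j+1] = mu + rho1 * (X[j] - mu) + t[j+1] * sigma * sqrt(1 - rho1**2)

    Returns n_years values after discarding burn_in warm-up values.
    """
    rng = random.Random(seed)
    noise_scale = sigma * math.sqrt(1.0 - rho1 ** 2)
    x = mu  # assumed initial value: the mean, so the persistence term vanishes
    flows = []
    for _ in range(n_years + burn_in):
        t = rng.gauss(0.0, 1.0)  # standard normal deviate
        x = mu + rho1 * (x - mu) + t * noise_scale
        flows.append(x)
    return flows[burn_in:]  # drop the values influenced by the initial guess


# Hypothetical parameters, for illustration only.
flows = generate_annual_flows(mu=100.0, sigma=25.0, rho1=0.3, n_years=200, seed=1)
```

Passing a different seed gives a different, equally likely sequence with the same target mean, standard deviation and lag-one correlation.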
When you generate values using this model, the generated data will have approximately the same mean as the historical mean, the same standard deviation as the historical standard deviation, and the same lag-one correlation as the historical lag-one correlation. So, once we generate the data, we compare the historical mean, standard deviation and lag-one correlation with those of the generated data. These three comparisons must be acceptable. If the model does not perform well, either in terms of the standard deviation or in terms of the lag-one correlation, it means that the assumptions we made in building the model, namely that it is a stationary model and that the flows follow a normal distribution, may not be valid for the data that you are using. So, a test for the model to be valid is that all three, namely the mean, the standard deviation and the lag-one correlation, must compare well between the historical data and the generated data. Now, we will start relaxing the requirement of stationarity a bit. What did we assume in the Markov model that we just presented? That the mean remains the same; that means between the month of June, the month of July and so on, we are not changing the mean. We have a time series, let’s say flows collected over the last fifty years, and you have one mean for the entire time series, one standard deviation and so on. So, these are a stationary mean, a stationary standard deviation and so on. But as we are well aware, hydrologic time series exhibit non-stationarity, especially when you are talking about stream flows. The stream flow of June will have its own mean, which will be much different from that of the flows during April or February, especially in a monsoon climate.
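To make the three-way comparison concrete, here is a small sketch that computes the mean, the standard deviation and the lag-one correlation of any sequence, so that the same function can be applied to the historical and the generated flows. The lecture does not say which estimator of the lag-one correlation is used; the one below (lag-one autocovariance divided by lag-zero autocovariance, both about the overall mean) is one common choice and is an assumption here.

```python
import math


def sample_stats(x):
    """Return (mean, standard deviation, lag-one correlation) of a sequence."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)  # sum of squared deviations
    std = math.sqrt(c0 / (n - 1))         # sample standard deviation
    # Sum of products of successive deviations, divided by c0.
    c1 = sum((x[k] - mean) * (x[k + 1] - mean) for k in range(n - 1))
    return mean, std, c1 / c0


mean, std, r1 = sample_stats([1.0, 2.0, 3.0, 4.0, 5.0])
```

Calling `sample_stats` once on the observed series and once on the generated series gives the three pairs of numbers to be compared.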
So, there are many situations where the mean, the standard deviation and the lag-one correlation will be significantly different from one month to another, and therefore it is essential that we build this variation, this non-stationarity in the moments, into the generating model. So, we will now consider the first-order Markov model with non-stationarity. The model that we considered earlier was meant for annual flows. We now relax the requirement that the moments be stationary and write the model for seasonal flows. The seasons can be months, in which case we will have twelve periods; they can be monsoon and non-monsoon, in which case we will have two seasons; or monsoon, summer and winter, three seasons. We may consider ten-day durations, in which case we may have 36 or 37 time periods, depending on how we define them. Like this, we consider the intra-year periods over which we are interested in generating the flows. So, we now generalize the same model to account for non-stationarity, and this non-stationarity essentially arises from periodicity. As I just mentioned, the June flow of this year may be correlated with the June flow of the previous year. So, there may be a twelve-month periodicity, a six-month periodicity, a two-year periodicity and so on, depending on the type of data that we consider. This kind of periodicity introduces non-stationarity in the data, and we will now build this non-stationarity into the first-order Markov model. A main application of the Markov model with non-stationarity is monthly stream flow generation; it has been used very effectively for generating monthly stream flows in situations where there is a pronounced periodicity or seasonality.
And the periodicity, as you know now, will affect not only the mean and the standard deviation but also the lag-one correlations, all of which appear in our Thomas-Fiering model, the first-order Markov model. So, from the stationary model, we start introducing the non-stationarity in the mean, the lag-one correlation and the standard deviation by introducing one more index: i is the year and j is the month. So, we are generating for the i-th year and the (j+1)-th month, or the (j+1)-th season if you like. What was mu_x, which means you had one mean for the entire sequence, we now convert into the mean of the particular season for which you are generating the data, mu_{j+1}. Instead of calling the lag-one correlation rho_1, we denote it by rho_j, where rho_j indicates the lag-one correlation between the flows of month j and the flows of month j+1. Because we are generating for month j+1, rho_j indicates the dependence of the (j+1)-th month’s flow on its previous month’s flow. Similarly, sigma_j indicates the standard deviation of month j. The random component will be different for every value you generate, so it has the same indices as the flow to be generated: for X_{i,j+1} you have t_{i,j+1}, where t_{i,j+1} is drawn from N(0, 1), that is, the standard normal distribution. So, using this expression, we should be able to generate monthly flows from the historical flows available. We first estimate the moments mu_j for j = 1 to m, where m is the number of seasons that we are considering.
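Written out with both indices, the seasonal model takes the following form. This is a reconstruction from the spoken description and from the ratio of standard deviations used in the worked example that follows in the lecture, so the exact notation is an assumption:

```latex
X_{i,j+1} = \mu_{j+1}
          + \rho_j \,\frac{\sigma_{j+1}}{\sigma_j}\,\bigl(X_{i,j} - \mu_j\bigr)
          + t_{i,j+1}\,\sigma_{j+1}\sqrt{1 - \rho_j^{2}},
\qquad t_{i,j+1} \sim N(0,1)
```

Here i is the year and j = 1, ..., m is the season. For j = m, the season j+1 is the first season of year i+1. Setting every mu_j, sigma_j and rho_j to a single common value gives back the stationary model for annual flows.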
If it is a monthly model, m will be equal to 12. So, for each of the months, we have the mean, the standard deviation and the lag-one correlation with the next month. With the data ready, we start with an assumed initial value, which is typically the mean itself, so that the persistence term vanishes, exactly the same way as we did for the annual flows. We will consider an example now. We have stream flows at a particular river for 29 years; 12 years of data are shown here. The data runs June, July and so on up to May, starting from 1979-80, and goes on for 29 years. Remember, if you want to model it using a non-stationary model, you will compute the mean for each of these months: the June mean using the 29 values of June, the July mean using the 29 values of July, and so on. So, you will have means for all twelve months, and similarly standard deviations. Similarly, you take the pairs June-July and get rho_1, July-August and get rho_2, and so on; then May and June give rho_12. So, when you are considering the last month, the twelfth month, the correlation is with respect to the following month, which will be June. You estimate all the parameters based on this data and use them in the model. This is the time series of the flows for twenty years; the same data is shown in a figure here, and from this data we now compute the mean, the standard deviation and the lag-one correlation. These values here provide the mean, standard deviation and lag-one correlation. I repeat, the lag-one correlation that we are writing here is with respect to the next month; that is, this one indicates the lag-one correlation between the flows of June and the flows of July. Similarly, this one indicates the lag-one correlation between the flows of May and the flows of June.
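The parameter estimation described here can be sketched as follows, assuming the flows are held as a list of rows, one row per year and one column per season. The function names are hypothetical, and the correlation is the ordinary Pearson correlation of the paired values, which is an assumption since the lecture does not name the estimator. Note the wrap-around for the last season, which pairs the last season of year i with the first season of year i+1.

```python
import math


def _correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    sab = sum((a - ma) * (b - mb) for a, b in pairs)
    saa = sum((a - ma) ** 2 for a, _ in pairs)
    sbb = sum((b - mb) ** 2 for _, b in pairs)
    return sab / math.sqrt(saa * sbb)


def seasonal_parameters(flows):
    """Estimate mu[j], sigma[j], rho[j] from flows[i][j] (year i, season j).

    rho[j] is the correlation between season j and the following season.
    """
    n_years, m = len(flows), len(flows[0])
    mu = [sum(row[j] for row in flows) / n_years for j in range(m)]
    sigma = [
        math.sqrt(sum((row[j] - mu[j]) ** 2 for row in flows) / (n_years - 1))
        for j in range(m)
    ]
    rho = []
    for j in range(m):
        if j < m - 1:
            pairs = [(row[j], row[j + 1]) for row in flows]
        else:
            # Last season of year i paired with first season of year i + 1.
            pairs = [(flows[i][j], flows[i + 1][0]) for i in range(n_years - 1)]
        rho.append(_correlation(pairs))
    return mu, sigma, rho
```

With 29 years of monthly data, rho_1 to rho_11 would each use 29 pairs, while rho_12 would use 28 pairs, because the last May has no following June in the record.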
Once we are ready with this, we start generating the data. We assume the first value, let’s say X_{1,1}, to be the same as the mean of the first month, which is 117.49. Sigma_1 is given here as 52.24, and you want to generate the second month’s flow using the first month’s flow, so your correlation will be rho_1 = 0.348. Because you are generating the second month’s flow, you will use mu_2 and sigma_2: mu_2 is 474.5 and sigma_2 is 150.18. We use these values and write X_{1,2}. What does this mean? The flow for the first year, for the second month. We have assumed X_{1,1} = mu_1, and starting with that, we write X_{1,2} = 474.5 + 0.348 x (150.18 / 52.24) x (X_{1,1} - mu_1) plus the random term. I just want to explain one thing in the expression here. When we went from the stationary first-order model to the non-stationary model, we introduced the ratio sigma_{j+1} / sigma_j in the second term. You can see that (X_{i,j} - mu_j) / sigma_j is nothing but the standardized value, and similarly on the other side you get (X_{i,j+1} - mu_{j+1}) / sigma_{j+1}. That is why we introduce the ratio sigma_{j+1} / sigma_j, and that is what we are using here. Because we are assuming X_{1,1} to be mu_1 itself, this term goes to zero. The random term is a standard normal deviate, picked up from the table or otherwise, multiplied by the standard deviation of the second month into the root of 1 minus rho_1 squared. So, you get 521.67. Now, we use this 521.67 to generate X_{1,3}, for which we also require mu_3, sigma_3 and rho_2. We use these values and generate the next value. There is another small mistake here: this is not rho_3.
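The single step worked out above can be checked with a short sketch. The function name is hypothetical. The lecture does not state which standard normal deviate was drawn, so the code backs it out from the quoted result X(1,2) = 521.67; that implied deviate is an inference, not a value given in the lecture.

```python
import math


def next_flow(x_prev, mu_prev, sigma_prev, mu_next, sigma_next, rho, t):
    """One step of the seasonal first-order Markov model."""
    persistence = rho * (sigma_next / sigma_prev) * (x_prev - mu_prev)
    random_part = t * sigma_next * math.sqrt(1.0 - rho ** 2)
    return mu_next + persistence + random_part


# Values quoted in the lecture for months 1 and 2.
mu1, sigma1 = 117.49, 52.24
mu2, sigma2 = 474.5, 150.18
rho1 = 0.348

# Deviate implied by the quoted result (an inference, see the note above).
t_implied = (521.67 - mu2) / (sigma2 * math.sqrt(1.0 - rho1 ** 2))

# With X(1,1) = mu1 the persistence term is zero, so only the random part remains.
x_1_2 = next_flow(mu1, mu1, sigma1, mu2, sigma2, rho1, t_implied)
print(round(t_implied, 3), round(x_1_2, 2))
```

The same function, called with X(1,2), the parameters of months 2 and 3, rho_2 and a fresh deviate, gives X(1,3).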
So, we get the value as 474.64. This was 0.348 that we used from here, so the next value that we will be using is 0.154. This is not rho_3 but rho_2, because it is the correlation between month 2 and month 3. So, X_{1,3} is what we are getting, and we generate it to be equal to 474.64. Like this, from the third value we generate the fourth value and so on. This is a monthly model, so we generate the first month, the second month and so on up to the twelfth month. Once we reach the twelfth month, we go to the next year’s flows. We write this as X_{2,1}, the second year’s first month’s flow, and there we will be using mu_1 and rho_12, because rho_12 gives the dependence of the first month’s flow on the previous month’s flow, which is the twelfth month’s flow. We write the expression for the second year’s first month’s flow, then carry on to the second year’s second month’s flow and so on. Like this, you generate for 50 years, 100 years, 150 years and so on, depending on the need, and obtain the time series of monthly flows. In this particular case, we generated for a hundred years: X_{2,1} up to X_{2,12}, X_{3,1} up to X_{3,12}, and so on up to X_{100,1} to X_{100,12}, that is, all twelve months of the hundredth year. We generate the values and then compute the mean, the standard deviation and the lag-one correlation of the generated data. Typically, when we do this, we discard the first few values before computing them. These should be approximately the same as the mean, standard deviation and lag-one correlation of the historical data. We make this comparison by drawing bar charts.
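The month-by-month, year-by-year generation described above can be sketched as a loop. The function and argument names are assumptions, the warm-up is expressed in whole years for convenience, and the parameter values in the usage lines are made up for a three-season illustration. Nothing here prevents negative flows, which the normal assumption allows.

```python
import math
import random


def generate_seasonal_flows(mu, sigma, rho, n_years, burn_in_years=5, seed=None):
    """Generate rows[i][j] for n_years years and m = len(mu) seasons.

    rho[j] links season j to season j + 1; rho[-1] links the last season
    of one year to the first season of the next year.
    """
    rng = random.Random(seed)
    m = len(mu)
    x = mu[0]   # assumed initial value: the mean of the first season
    rows = [[x]]
    j = 0       # season index of the most recent value
    total = (n_years + burn_in_years) * m
    for _ in range(total - 1):
        nxt = (j + 1) % m  # wraps from the last season back to the first
        t = rng.gauss(0.0, 1.0)
        x = (mu[nxt]
             + rho[j] * (sigma[nxt] / sigma[j]) * (x - mu[j])
             + t * sigma[nxt] * math.sqrt(1.0 - rho[j] ** 2))
        if nxt == 0:
            rows.append([])  # a new year starts
        rows[-1].append(x)
        j = nxt
    return rows[burn_in_years:]  # discard the warm-up years


# Hypothetical three-season parameters, for illustration only.
rows = generate_seasonal_flows(
    mu=[10.0, 50.0, 20.0], sigma=[2.0, 10.0, 4.0], rho=[0.5, 0.3, 0.2],
    n_years=100, seed=7,
)
```

Each row of the result is one generated year, so season-wise statistics can be computed column by column and set against the historical ones.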
So, this is the general procedure: you generate the data for a sufficiently long sequence, discard the first few values, and then, using the remaining data, compute the mean, the standard deviation and the lag-one correlation and compare these between the observed, that is, the historical data, and the generated data. In this particular example, the means match fairly well; this is the observed data, and this is the generated data. Similarly for the standard deviation: this is the observed data, and this is the generated data. So, the mean and the standard deviation compare fairly well. Let’s see what happens to the lag-one correlation. Here also we have the observed values and the generated values. Sometimes, the lag-one correlations do not perform really well. For example, here you had an observed lag-one correlation of around minus 0.2, whereas the generated one has reached almost minus 0.4. If this repeats many times within the twelve-month period, then you should be concerned. In this particular case, you also see that at one point the observed lag-one correlation was on the positive side whereas the generated one is on the negative side. As long as both of them are statistically insignificant, you do not have to worry much about whether it is on the positive side or the negative side, because it is insignificant anyway. But if they are statistically significant lag-one correlations and they show different signs, then you should be concerned about the generated data. This is the significance band here. So, this one is slightly above the significance band, whereas this one is in the insignificant band.
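The lecture does not say how the significance band is computed. A commonly used approximation, assumed here and not taken from the lecture, is that for an uncorrelated series of n values the sample correlation lies within about plus or minus 1.96 / sqrt(n) at the 95% level. A minimal sketch under that assumption, with hypothetical function names:

```python
import math


def significance_limit(n, z=1.96):
    """Approximate 95% limit on a correlation estimated from n values."""
    return z / math.sqrt(n)


def is_significant(r, n, z=1.96):
    """True if |r| lies outside the approximate band for sample size n."""
    return abs(r) > significance_limit(n, z)


# With 29 years of record, the band is roughly plus or minus 0.36.
limit_29 = significance_limit(29)
```

Under this approximation, a correlation of minus 0.2 estimated from 29 values falls inside the band, while minus 0.4 estimated from 100 generated years falls outside it. The band narrows as the number of values grows, so the observed and the generated correlations are judged against different limits.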
So, there is a cause for concern here that the model is not really performing well in terms of the lag-one correlation, whereas in terms of the mean and the standard deviation the model is fairly acceptable. If you have applications where preservation of the lag-one serial correlation is extremely important and you do not want to sacrifice it for these two months, here the months of July and January, then you may have to start looking at other possibilities or at improvements to this model. Otherwise, this is a fairly acceptable situation. We will also have situations where the original data may not be acceptable under the assumption of a normal distribution, and then you may get unacceptable results, in that the generated mean, standard deviation or lag-one correlation do not compare well with those of the observed data. Then what you must try is the logarithms of the flows. Your original data may not be normally distributed, but it is possible that the logarithms of the flows are normally distributed, which means that the flows can be approximated by a log-normal distribution. In that case, we write exactly the same model, but in terms of the logarithms of the flows. We use the transformation Y_{i,j+1} = log X_{i,j+1}: simply convert the flows into logarithms and then write the same model in terms of the logarithms. Remember here that mu_{Y,j+1} is with respect to the logarithms. So, all these moments, mu_{Y,j}, sigma_{Y,j} and rho_{Y,j}, refer respectively to the mean, the standard deviation and the lag-one correlation of the logarithms of the original data.
So, if you apply the model to the original data and find that the comparison of the generated data with the observed data is not acceptable, then you may try the same model written in terms of the log-transformed flows. Let’s examine the same data that was given in the previous example, but converted into logarithms of the flows. Again, 12 years are shown, but we have 29 years of data. When you are converting into logarithms of flows, you generally face the difficulty that some of the flows may be zero, and log 0 is not defined. What we generally do, if you have zero values, is set them to a very small value, let’s say 0.005 or some such thing, very small compared to the other values in the series, so that you can use the log transformation. Because you are talking about a log transformation, if you have values less than one, you will get negative values here; that is perfectly fine. So, we use the log-transformed values and get the mean, standard deviation and lag-one correlation associated with them. We use these in the model that we just defined and generate values of Y; remember, we are now generating log-transformed data. When we do that and compare the mean, standard deviation and so on, this is how it looks: the standard deviations appear like this, and the lag-one correlations appear like this. For the lag-one correlations, again there are some months in which they do not seem to compare well, but as long as they are statistically insignificant and that is acceptable for those particular months in the application that you are talking about, it is fine.
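The log transformation with the zero-flow adjustment can be sketched as below. The lecture does not say whether natural or base-10 logarithms are used; natural logarithms are assumed here. The back-transformation to flows by exponentiation is implied by generating Y = log X but is not spelled out in the lecture, and the function names are hypothetical.

```python
import math


def log_transform(flows, small_value=0.005):
    """Natural log of each flow, with zero flows replaced by small_value."""
    return [math.log(x if x > 0 else small_value) for x in flows]


def back_transform(log_flows):
    """Convert generated log-flows Y back to flows X = exp(Y)."""
    return [math.exp(y) for y in log_flows]


# Hypothetical flows: a zero, a value below one and two ordinary values.
y = log_transform([0.0, 0.5, 1.0, 120.0])
x = back_transform(y)
```

The zero flow comes back as 0.005 rather than 0, which is the price of the adjustment, and the value below one gives a negative logarithm, as noted in the lecture. A side effect of working with logarithms is that the back-transformed flows are always positive.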
With this methodology, we will now demonstrate flow generation for a shorter time period. As I said, this is a seasonal Thomas-Fiering model, where the seasons can be monsoon, non-monsoon, summer and so on, or the seasons can be the months January, February and so on. In many applications, especially for irrigation reservoirs, hydropower reservoirs and so on, you may be talking about other durations, for example ten-day, weekly and in certain cases daily durations. But when we apply the Thomas-Fiering model, the first-order Markov model, with the assumption of a normal distribution for the flows, we must be alert to the fact that as you start reducing the time duration, the assumption of a normal distribution may not be strictly valid. Therefore, you should not use this model for very small durations like daily flows, six-day flows and so on. It has often been used successfully for ten-day flows, and we demonstrate one such example here: ten-day flows to the Sardar Sarovar Reservoir in India. This data is available for a fairly long time, at a ten-day time period. So, a year is divided into 36 time intervals. Just as you had 12 intervals in the case of the monthly time period, you have 36 time intervals during a year here, and the y-axis shows the flow in million cubic meters. So, this is just a time series plot. With this time series, we use the Thomas-Fiering model, the first-order Markov model, which I show here again just for completeness: i is the year and j is the season, and in this particular case j varies from 1 to 36. For all these 36 time periods, we would have computed the mean, the standard deviation and the lag-one correlation. We use these for the 36 time periods, obtain the generated data, and compare the generated data with the observed data.
So, when we do that, we compare the mean flows from the generated and the historical data like this. As mentioned here, July 1, July 2 and July 3 are the three time periods of the first month, which is July in this case: July 1 to 10, 11 to 20 and so on. Similarly, August 1, August 2 and August 3 are the three ten-day time periods in the month of August, and so on. The figure extends further, but for lack of space I have shown it here up to the first ten-day period of February. So, we compare the mean flow, which is fairly acceptable in this case; similarly, we compare the standard deviation, and then the lag-one correlations. For the lag-one correlation, again there is a problem with the second time period of August, but as long as it is statistically insignificant, we can take it as acceptable. We use this type of generation in applications such as reservoir operation. What we did in this particular case is take the flows into the Sardar Sarovar Reservoir from the historically available data; let’s say we had 40 years of ten-day data at the reservoir. We generate for, let’s say, 100 years, 200 years and so on, several such sequences. How do we generate several such sequences? By using different sequences of the random numbers that appear in the model. By generating different sequences of random numbers, we generate different sequences of flows. Like this, let’s say you have several sequences of 50 years of ten-day data. Let’s say you are operating the reservoir for hydropower. We use these sequences in the simulation of reservoir operation, generate several levels of hydropower resulting from each inflow sequence, and then start talking about how the system performs for the generated data.
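The idea of obtaining several flow sequences from different random number sequences can be sketched with a small wrapper that gives every sequence its own seeded random stream. The names `generate_ensemble` and `toy_sequence` are hypothetical, and the toy generator only stands in for a real flow model such as the seasonal one described earlier.

```python
import random


def generate_ensemble(make_sequence, n_sequences, base_seed=0):
    """Call make_sequence(rng) once per sequence, each with its own stream."""
    return [make_sequence(random.Random(base_seed + k))
            for k in range(n_sequences)]


def toy_sequence(rng, length=5):
    """Placeholder generator: independent standard normal deviates."""
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


ensemble = generate_ensemble(toy_sequence, n_sequences=3)
```

Each member of the ensemble would then be fed to the reservoir simulation in turn. Fixing the seeds keeps the experiment reproducible, which helps when different operating policies are compared on the same set of inflow sequences.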
Because the generated data has the same statistical properties as the observed data, instead of dealing with just one sequence of observed data, we now deal with several sequences of generated data. So, this is one direct application; in subsequent lectures, we will see several others. When we are building a stochastic model of flows, there are several issues that we need to consider. Essentially, you have an observed sequence of data, and from it you want to generate another sequence, or several such sequences, of, let’s say, stream flow data that have the same statistical properties as your observed data. Take the model that we just introduced, namely the first-order stationary or non-stationary Markov model. Suppose you use this model and observe that the peak flows that were present in your historical data are not reproduced well; then you cannot blame the model, because the model is not meant to generate the peak flows. The model preserves the historical mean, the historical standard deviation and the lag-one correlation. There is no feature built into the model to preserve the peak flows, let’s say the annual maximum flows, or the annual minimum flows. So, if you are interested in either the maximum or the minimum, you should not use a model like this. When you want to select a stochastic model, the purpose for which you want to use the model is extremely important. Therefore, we address several issues before we go into the model selection itself. The first issue that we address is: is it necessary to model peak flows? If you want to model the peak flows, then you have to adopt those particular models which will preserve the peak flows.
Let’s say you are talking about the stream flows to a reservoir for the purpose of reservoir operation for irrigation, hydropower, municipal water supply and so on. In such situations, you are not really concerned about preserving the peak flows themselves. You are interested in how the system behaves on average, in which case you are not really interested in the peak floods or in the other extreme of droughts. So, the peak flows are not important. In situations where you would like to have the peak flows modeled as well, the next question you would ask is whether it is important to also consider the time during which the peak flow occurs. For example, the maximum flows may have occurred during the month of August; is it important that this fact also be built into the model? Then, are we also interested in the volume of flow, or is just the fact that the flow has exceeded a particular threshold enough, especially when we are modeling the peak flows? So, is the volume of the flow important? Then, is the duration of flow to be considered important? What I mean by that is whether you want to model daily flows, weekly flows, monthly flows and so on. Obviously, depending on the duration of flow that you want to consider, the type of model that you would like to fit can be quite different. From annual to seasonal to monthly, you can still use the first-order Markov model, either the stationary version or the non-stationary version. But suppose you start reducing your time periods and come down to, let’s say, a weekly or a daily time period.
Then, the assumption of normality of flows will be violated, and therefore you may not be able to use such a model; you will have to go for different types of models, which we will discuss subsequently in this course. Then we also ask: is the dependence of the flow from one time period to another important? In the Markov model, what did we do? We introduced the lag-one correlation coefficient. If this dependence is important, then we may need to introduce correlation coefficients, and it need not be only the lag-one correlation. It can be the correlation with respect to, let’s say, twelve months behind, that is, the dependence of the flow during June of this year on the flow during June of the previous year, which means we may be considering a lag of twelve. So, we need to understand the issues that are important for the specific application that we have in mind, and also the structure of the data itself. If there is a significant dependence of a particular month’s flow on another month’s flow at a certain lag, then that correlation has to be built into the model. Another important question that we need to address upfront is whether the time series, the data that we have, is stationary. There are ways of assessing whether a time series is stationary, which we will discuss subsequently. But this is a very critical question, because if the time series is non-stationary, then we need to address the non-stationarity, as we just did with the first-order non-stationary Markov model, in which we built in the non-stationarity due to the flows having different moments during different time periods, specifically months.
Then, is there evidence of jumps or trends in the data? Does the data show a continuously increasing trend or a continuously decreasing trend, or is there a significant jump, where it was operating at a certain level and suddenly starts operating at a different level? So, is there evidence of trends or jumps in the data? A most critical issue that we need to be aware of is the quality and quantity of the data available. We may have, for example, flows at a particular location for the last twelve to thirteen years; if you have only 12 to 15 years of data, the quantity will be quite small for any meaningful model. Typically, for the models to be useful, you should have at least about 30 years of data; then you can rely on the results that you get. When these models were developed, somewhere around the 1960s and 1970s, they were used for data lengths as small as 15 years, 12 years and so on. But we expect that you have about 30 years of data to get meaningful results out of such models. Then there is the quality of the data itself. In many situations you may have data, but just by looking at it you can say that it is of very poor quality, in the sense that there may be missing values, or there may be values that repeat, which obviously points to errors and indicates that the data has not been collected from reliable sources. So, we must be alert to situations where the data is of poor quality. Quantity as well as quality of data is important. What we have listed here are the issues that we need to be alert to before we actually choose a particular stochastic model. Now, we progress to another important topic, which also leads to the development of stochastic models.
So far, what we were doing was expressing the time series X t as a deterministic component plus a stochastic component, X t is equal to d t plus epsilon t (or e t, as we wrote earlier), and all the analysis that we were doing was in the time domain. So, this is called analysis of data in the time domain. There is another elegant way of doing this, which is to convert the data from the time domain into the frequency domain. We write the time series in terms of frequencies, specifically sine waves and cosine waves of varying frequencies, and then we start looking at the time series in the frequency domain. That is called analysis in the frequency domain. In today's class I will just introduce what we mean by analysis in the frequency domain; the details we will discuss in the next lecture. Before going into the analysis in the frequency domain, let us recapitulate what we did in the time domain. In the time domain, we expressed the time series X t as consisting of a deterministic component d t and a stochastic component epsilon t, and it was our aim to capture these two components. If you recall, the deterministic component d t can be a trend, or a long term mean around which there are stochastic perturbations, or a periodicity that the data exhibits, or a jump or a drop; these are the deterministic components. Then there is the stochastic component: we need to capture the essence of the stochastic component and build it into the model. So, this is essentially the principle of analysis in the time domain.
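As an illustration of splitting X t into d t plus e t, the sketch below fits one possible deterministic component by least squares and treats the residual as the stochastic component. The particular form of d t (a constant, a linear trend and a single harmonic of period 12), the function name and the synthetic series are assumptions chosen for this example, not the lecture's own model.

```python
import numpy as np


def fit_deterministic_component(x, period=12):
    """Least-squares fit of
        d_t = a + b*t + c*cos(2*pi*t/period) + s*sin(2*pi*t/period).

    Returns (d_t, e_t), where e_t = x_t - d_t is the residual,
    taken here as the stochastic component.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size, dtype=float)
    w = 2.0 * np.pi * t / period
    design = np.column_stack([np.ones_like(t), t, np.cos(w), np.sin(w)])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    d = design @ coef
    return d, x - d


# Synthetic series: long term mean + trend + annual periodicity + noise.
rng = np.random.default_rng(1)
t = np.arange(240)
true_d = 50.0 + 0.05 * t + 20.0 * np.sin(2.0 * np.pi * t / 12.0)
x = true_d + rng.normal(0.0, 5.0, t.size)

d_hat, e_hat = fit_deterministic_component(x)
print(f"std of x = {x.std():.2f}, std of residual e_t = {e_hat.std():.2f}")
```

The residual has a much smaller spread than the original series because the trend and the periodicity have been removed; it is this residual whose structure a stochastic model then has to capture.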
In the frequency domain, what we do is write the time series X t as a combination of sine and cosine waves of several different frequencies, and then start looking at the periodicities inherent in the data that come out as significant. The frequency domain analysis is essentially used in hydrologic applications to capture the significant periodicities. As you recall, when we discussed the correlograms, or the autocorrelations, we saw that the autocorrelations also give an indication of whether the process is periodic or not, but they do not exactly indicate whether the periodicities suggested by the autocorrelation function are in fact significant. If, along with the autocorrelation function, we also use analysis in the frequency domain, then we will be able to pinpoint the specific periodicities that need to be considered in our models, in our generation models or forecasting models and so on. This is what I just mentioned: you may have a correlogram which exhibits a certain degree of periodicity. If you have a periodic process, the correlogram will appear something like this. Specifically, if you have monthly flows, the correlogram can be like this, and it slowly shows a decay. As I said, in the time domain we write X t is equal to d t plus e t, and then we capture the deterministic component d t. So, periodicities in data can be determined by analyzing the time series in the frequency domain. Along with the information that you generate using the correlogram, you also generate information using analysis in the frequency domain to capture those periodicities. The frequency domain analysis is also called spectral analysis; in this, the time series is represented in the frequency domain instead of in the time domain.
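One common way to look for periodicities in the frequency domain is a raw periodogram computed with a fast Fourier transform. The sketch below assumes that approach; the lecture defers its own formulation to the next lecture, so the function names, the scaling of the ordinates and the synthetic monthly flows here are illustrative assumptions only.

```python
import numpy as np


def periodogram(x):
    """Return (frequencies in cycles per time step, raw periodogram ordinates).

    The mean is removed first and the zero-frequency term is dropped.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x - x.mean())
    freqs = np.fft.rfftfreq(n, d=1.0)
    power = np.abs(spectrum) ** 2 / n
    return freqs[1:], power[1:]


def dominant_period(x):
    """Period (in time steps) at which the raw periodogram is largest."""
    freqs, power = periodogram(x)
    return float(1.0 / freqs[np.argmax(power)])


# Hypothetical monthly flows: a strong 12-month cycle, a weaker 6-month
# cycle, and random noise.
rng = np.random.default_rng(2)
t = np.arange(360)
flows = (100.0
         + 40.0 * np.sin(2.0 * np.pi * t / 12.0)
         + 15.0 * np.sin(2.0 * np.pi * t / 6.0)
         + rng.normal(0.0, 10.0, t.size))

print(f"dominant period = {dominant_period(flows):.1f} months")
```

A raw periodogram only shows where the variance is concentrated; deciding whether a given peak is statistically significant needs a separate test, which this sketch does not attempt.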
The underlying principle here is that the observed time series is a random sample of a process over time, which is made up of oscillations of all possible frequencies. So, we convert the time information into frequency information and, as I just mentioned, the spectral analysis or the frequency domain analysis is used to identify the frequencies or periodicities inherent in the data. Just a preliminary recap of what we mean by this frequency domain analysis: we have the concepts of wavelength, amplitude and periodicity, and we express the time series X t as a combination of cosine waves and sine waves, with a random component associated with it. Here n is the sample size; I will discuss this in detail in the next lecture. But just to give you a flavor of what we do in the frequency analysis: we express the time series X t as consisting of cosine and sine waves of different harmonics, where f k is the k-th harmonic of the natural frequency. Then we do the analysis on X t by determining alpha naught, alpha k and beta k, and capture how much of the variance is present in a given frequency interval. So, essentially the idea is to find how much of the variance in the data can be explained by different bands of frequencies. This is the basic principle of the frequency analysis; we will deal with it in detail in the next lecture. So, let us recapitulate what we covered in this lecture, lecture number twelve. We started with the stationary first order Markov model for generation of data. This is specifically used for generation of annual flows; the moments that it preserves are the mean, the standard deviation and the lag one correlation. These are assumed to be stationary; that is, the model is stationary with respect to the mean, standard deviation and lag one correlation.
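As a side note to the harmonic representation described above, it can be written out in the following standard form. This is a reconstruction under stated assumptions rather than the lecture's own slide: the upper limit M on the number of harmonics, the choice f_k = k/n, and the least-squares expressions for the coefficients (valid for k strictly less than n/2) are assumed here, and the detailed treatment is left to the next lecture.

```latex
% Harmonic representation of the series (assumed standard form)
X_t = \alpha_0 + \sum_{k=1}^{M} \left[ \alpha_k \cos(2\pi f_k t)
      + \beta_k \sin(2\pi f_k t) \right] + \varepsilon_t ,
\qquad f_k = \frac{k}{n}, \quad t = 1, \dots, n

% Least-squares estimates of the coefficients, for 1 \le k < n/2
\alpha_0 = \frac{1}{n} \sum_{t=1}^{n} X_t , \qquad
\alpha_k = \frac{2}{n} \sum_{t=1}^{n} X_t \cos(2\pi f_k t) , \qquad
\beta_k  = \frac{2}{n} \sum_{t=1}^{n} X_t \sin(2\pi f_k t)

% Variance carried by the k-th harmonic (a sinusoid of amplitude A has variance A^2/2)
\sigma_k^2 = \frac{\alpha_k^2 + \beta_k^2}{2}
```

Summing the variance contributions over a band of harmonics gives the share of the variance explained by that band of frequencies, which is the quantity the frequency analysis is after.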
Then, we relaxed this and built a non-stationary first order Markov model, which is typically used for generation of monthly flows; often it is also used for smaller time periods, like 10-day periods, and for larger time periods, such as seasonal flows. We saw three examples of using the non-stationary model. First we used the monthly data as given, and then we converted it into log transformed data; the same non-stationary Markov model can also be used for log transformed data, if the log transformed data can be approximated using a normal distribution. Then we saw an example of an Indian case study, where we considered the flows into the Sardar Sarovar Reservoir; the time duration that we were interested in in that particular case was 10 days, so a year was divided into 36 time periods. We generated several sequences of large lengths of data and then compared the mean, standard deviation and lag one correlation of the generated data with those of the observed historical data. Then, we considered several issues that we need to be alert to when choosing a stochastic model, especially those dealing with peak flows, the time during which the peak flow occurs, and so on. Towards the end of the lecture, I introduced the frequency domain analysis; that means we convert the data from the time domain into the frequency domain and then start looking at the data in the frequency domain, essentially to identify the periodicities inherent in the data. The purpose of all these methodologies that we are introducing in this course is to learn from the data. The data that is actually observed at a particular location is telling us some story about what has happened, and we want to extract this information so that we can model the data for several applications. So, we will continue this discussion in the next lecture, in which I will introduce in detail the spectral analysis or the frequency domain analysis. Thank you for your attention.