Good morning and welcome to lecture number 17 of the course Stochastic Hydrology.
If you recall, in the last lecture we discussed the ARIMA models and, specifically, how the AR and MA components of the model behave. For example, we considered the AR 1 model, formulated a theoretical AR 1 model and saw how its correlogram behaves, how its spectrum behaves and how the PAC function, that is the Partial Auto Correlation function, behaves. Then we also looked at two examples of AR 2 models and similarly, for the MA 1 process, we saw how the autocorrelation function, the partial autocorrelation function and the spectral density function behave. So, we essentially identify the number of AR terms and the number of MA terms in an ARIMA model by looking at the PAC function as well as the correlogram. But as I said in the last lecture, when you have both AR and MA terms in the model and they are of slightly higher order, let us say that you are talking about ARMA 4, 2 or ARMA 4, 1, these kinds of models where the number of AR terms is quite large and the MA terms are also reasonably large in number, then identification becomes quite difficult.
And therefore, what we do is formulate several candidate models, and then for the candidate models we estimate the parameters and examine whether the model is valid or not. We also examined in the last lecture the procedure for parameter estimation. As I mentioned, there are several algorithms available for parameter estimation, and we introduced the MATLAB function armax, which can be readily used so that the parameters for any given ARIMA type of model can be estimated. Towards the end of the lecture, we examined the maximum likelihood criterion for selection of the models. The specific problem is that
we have a number of candidate models, we have estimated all the parameters for the models
based on the data, please do not lose sight of the hydrologic aspects of this, we have
observed data, for example the stream flow at a particular location. And then on this
observed data, we are doing all of these exercise so that we can fit a model for the time series
that has been observed. So, once we formulate the candidate models and estimate the parameters for each of these models, we have to choose which among these candidate models is best suited for the observed data. This we do by two methods, as I mentioned in the last lecture. One is the maximum likelihood criterion, which we use when the application is long term simulation of the data. What we did there is define a likelihood function, the log likelihood function to be precise, and estimate its value for each of the models; so each model will have one log likelihood value, and then we pick up that particular model which results in the maximum value of the log likelihood function
value. Now, we will go to the second part of the applications of the models, where the models are meant not so much for long term synthetic data generation but for real time, one-time-step-ahead forecasting. As I mentioned, these applications will be, for example, monthly forecasting of the stream flows, which may be useful for real time operation of reservoirs, where you are operating the reservoir for irrigation, hydropower, etcetera; depending on the storage level and depending on the forecast of the flows, you would like to operate the gates. This also becomes important when we are looking at smaller time periods, like 10 day time periods, for real time irrigation scheduling and so on; when we look at the applications, these points will become clear. But essentially, we use the time series models to develop forecasts for the reservoir inflows, evapotranspiration, rainfall and so forth.
So, when our applications are for short term forecasting, then the maximum likelihood criterion
is not the criterion that we use, we must then go for the minimum mean square error
criterion. Now, that is what we will discuss in the lecture today, how we develop the minimum
mean square error values for the models and then pick the best model that gives the minimum
mean square error.
So, the minimum mean square error criterion, or the MSE criterion in short, is also called the prediction approach. What we do in this is as follows: let us say you have 50 years of monthly stream flow data; then we use the first 25 years or so of the data to develop the model and the remaining data to calculate the mean square error.
Let us look at this sketch here, let us say that your time is on this scale and you have
N values of the time series, we use the first N by 2 values typically for developing the
model and then the remaining N by 2, we use for calculating the MSE. So, we calculate
the mean square error with the remaining N by 2 values.
What do I mean by that? Let us say that you have fit an AR 1 type of model using this particular data. When I say you have fit the model, it means that the parameters have been estimated for this particular model. How do I write this? We write this in our notation as X t is equal to phi 1 X t minus 1 plus epsilon t. Now, this is the model that we have written, and we have estimated the parameter phi 1 using one of the algorithms.
Now, when we want to use this model for forecasting, what does the forecast mean? Remember that this epsilon t is a noise term, or a residual term, which has a zero mean, and the epsilon t are all uncorrelated. When we want to use this particular model for forecasting, the forecast is the expected value for the next time period. Let us say that you are standing in time period t and you want to forecast for time period t plus 1. So, you are here now, you know the information that is available up to this particular point, and you want to forecast X t plus 1. Let us say that I write X t plus 1 is equal to phi 1 X t plus epsilon t plus 1; I am writing it for t plus 1 now. Then, whenever we say we want to forecast using this particular model, what is it that we are doing? We want to get the expected value of X t plus 1 given the entire history up to time period t; that is the problem. So, we want to see what is the expected value of the flow for time period t plus 1 given the values up to X t.
Now, to do this, I will take the expected value of X t plus 1. So, taking the expected value of this particular expression, expected value of X t plus 1 is equal to phi 1 into expected value of X t plus expected value of epsilon t plus 1. Now, this is the forecast, and we generally denote the forecast by putting a cap on that particular variable. So, X t plus 1 cap, which is the forecast for the time period t plus 1, is equal to phi 1 expected value of X t plus expected value of epsilon t plus 1. Now, what is this last term? The epsilon t, remember, is a sequence which has a mean of 0. So, when you take the expected value of epsilon t or epsilon t plus 1, that is 0, and this is simply phi 1 expected value of X t. Similarly, you can look at any model; let
us say that you want to take the ARMA 1, 1 model here and then use it for forecasting. Again we look at two adjacent time periods, time t and time t plus 1; you are here now and you want to forecast for the next time period. To keep the relevance with hydrology, let us say that you are standing in the month of June and you have the flow information up to the month of June, and you would like to forecast the flow during the next month, if you are looking at the forecast for flows.
So, we would like to write X t plus 1 cap, but let me first write the model here. We write the ARMA 1, 1 model for X t plus 1 as follows: I have one AR term, so I write phi 1 into X t; I have one MA term, so I write theta 1 into e t; plus e t plus 1, because I am writing for time period t plus 1.
Now, you would like to use this model for forecasting. As I said, what do we mean by forecast? The forecast is the expected value of the particular variable, stream flow in this case for example, for the next time period t plus 1.
So, we take the expected value of X t plus 1, I will write that as expected value of
X t plus 1 is equal to phi 1 expected value of X t plus theta 1 expected value of e t
plus expected value of e t plus 1. Remember here, when we are writing an ARMA 1, 1 model, the e t term is the residual arising out of the application of this model in time period t. So, when you write the expected value of e t in this particular term, the e t value is a constant; it is not like e t plus 1, whose expected value becomes 0. For example, we may be talking about a value of five units, and therefore the expected value of e t is e t itself. So, I will write this as X t plus 1 cap is equal to phi 1 expected value of X t plus theta 1 e t plus expected value of e t plus 1, which is 0.
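The two one-step-ahead forecast formulas just derived can be sketched in code like this. This is a minimal illustration, not part of the lecture: the parameter values phi 1 and theta 1 and the data values are hypothetical placeholders, and I follow the sign convention used on the board, with a plus sign on the MA term.

```python
# One-step-ahead forecasts for AR(1) and ARMA(1,1), a minimal sketch.
# phi1, theta1 and the data values below are hypothetical.

def ar1_forecast(x_t, phi1):
    # E[X_{t+1} | history] = phi1 * X_t, since E[eps_{t+1}] = 0
    return phi1 * x_t

def arma11_forecast(x_t, e_t, phi1, theta1):
    # E[X_{t+1} | history] = phi1 * X_t + theta1 * e_t;
    # e_t is the actual residual of period t, hence a known constant,
    # while E[e_{t+1}] = 0 and drops out of the forecast.
    return phi1 * x_t + theta1 * e_t

x_hat = ar1_forecast(100.0, 0.6)                # -> 60.0
y_hat = arma11_forecast(100.0, 5.0, 0.6, 0.3)   # -> 61.5
```

Note how the MA term keeps the known residual e t, whereas the future noise term vanishes on taking expectations, exactly as in the derivation above.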
What is the difference here? What I did is this: e t is the actual value of the residual that resulted from applying this particular model for the time period t. There was an actual value available and a forecasted value available; the actual value minus the forecasted value gave me this e t, and therefore, when I take the expected value, it remains e t itself. So, I will write this as phi 1 expected value of X t plus theta 1 e t, and the last term becomes 0, because it is a noise term; therefore you obtain the forecasted value for time period t plus 1. Essentially then, we are getting the forecasted
values. As I mentioned, you have N by 2 values here and another N by 2 values there. So, what we did is develop the model using the first N by 2 values; this is for model development, and by model development I mean we used all of these values to obtain the parameters of the candidate models.
candidate models. We had let us say for the forecasting, we
had the models AR 1, AR 2, ARMA 1, 1, ARMA 1, 2 and so on. As I mentioned in the last
class, when we are talking about the forecasting models, typically the lower order models in
terms of the number of parameters will suffice. In fact, when we see in the applications most
of one, the AR 1 or AR 1, AR 2 model themselves will be sufficient for one type step away
forecasting, especially when we are talking about smoothen processes like monthly stream
flow, seasonal flow flows and so on. So, we choose the candidate models which may
be different from the candidate models that you would have chosen for long term synthetic generation of the data. We chose those and then developed the model using the first N by 2 values. Let us say you have 50 years of values; you would have 50 into 12, that is 600 values in the data, where 50 is the number of years and 12 is the number of months. So, you would have 600 values; use the first 25 years of data, which is N by 2, to develop the model and then calculate the errors of using this model for one-time-step-ahead forecasting on the remaining part of the data. What do I mean by that? Let us say that you chose your AR 1 model.
So, in the AR 1 model, the forecast for time period t plus 1 you would have written as X t plus 1 cap is equal to phi 1 into expected value of X t.
So, this phi 1 you would estimate based on the first N by 2 values of the data and then start applying the model. When you start applying it, let us say that you are now applying it for the remaining N by 2 values, and this is one-time-step-ahead forecasting. So, given X t, which is known, you want to apply this to get X t plus 1 cap. Typically, let us say you are standing at the end of one month, the month of June; the flow during the month of June is known, and you want to forecast the flow during the month of July. Then we apply this: phi 1 is known, since it is estimated already. So, we apply this as X t plus 1 cap is equal to phi 1 into expected value of X t, which becomes X t itself, because it is known. So, from this you get X t plus 1 cap; but because the data for this N by 2 is already available, X t plus 1 cap is the forecasted value and X t plus 1 is the actual known value of the flow. So, I will take the error of forecast.
So, what will be the error of forecast? X t plus 1 is your known data value and X t plus 1 cap is your forecasted value. So, the error I will write for the time period t plus 1 is equal to X t plus 1 minus X t plus 1 cap, which is what is forecasted from the model; like this I get the error.
Now, we are writing this for the remaining N by 2 time periods. So, for the next time step, let us say that I want to write for X t plus 2 now; I have finished my forecasting problem for t plus 1 and I come to t plus 2. When I am writing for the next time period, we use the actual value that is known, X t plus 1, not the forecasted value, remember, to get the forecast for X t plus 2. Because this data is already available, the question that we are asking is: standing at this point, knowing the values up to this point, what is my forecast for the next time period? So, X t plus 2 cap I will write as phi 1 into the expected value of X t plus 1, which becomes X t plus 1 itself, because it is known, plus the expected value of the error term, which is 0. So, I will simply get X t plus 2 cap based on this. Then again I calculate the error corresponding to X t plus 2, because you know the actual X t plus 2 value itself. And therefore, you get the errors corresponding to t plus 1, t plus 2 and so on. So, you formulate the error sequence, and using the error sequence you can get the mean square error.
So, this is the procedure that we use to obtain the errors of forecast. The same thing is summarized here: using a portion of the available data, which is typically N by 2, estimate the parameters of the different candidate models. Use the forecasting models so developed, all the candidate models one by one, to get the series of one-time-step-ahead forecasts. Corresponding to each of these forecasts you know the error term; get the mean square error and then pick up that particular model which results in the minimum mean square error.
It is written more formally like this: the one-time-step-ahead forecast for an ARIMA p, q model can now be written as X t plus 1 cap is equal to the sum over the p AR terms, j equal to 1 to p, of phi j X t plus 1 minus j, plus the sum over the q moving average terms, j equal to 1 to q, of theta j e t plus 1 minus j. The noise term has mean 0 and therefore vanishes when you take the expected value. Then the error for the one-time-step-ahead forecast is e t plus 1 is equal to X t plus 1, which is the known value, minus the forecasted value.
Once you know e t plus 1, you get the mean square error: because you have used N by 2 values, you sum the squares of all these errors and then divide by N by 2 to get the mean square error. And from among the candidate models, choose that particular model which results in the minimum mean square error.
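The whole selection procedure can be sketched as follows. This is an illustration only, with hypothetical data and hypothetical candidate models (two AR 1 forecasters with different phi 1 estimates standing in for the full candidate set); the point is the rolling one-step forecast, always from actual observed values, and the choice of the minimum-MSE model.

```python
# Minimal sketch of minimum-MSE model selection on the validation half.
# The validation series and candidate parameter values are hypothetical.

def one_step_mse(series, forecast_fn):
    """Mean square error of one-step-ahead forecasts on `series`.

    forecast_fn(history) returns the forecast for the next value; the
    history always contains actual observations, never earlier forecasts.
    """
    errors = []
    for t in range(1, len(series)):
        x_hat = forecast_fn(series[:t])
        errors.append(series[t] - x_hat)
    return sum(e * e for e in errors) / len(errors)

# Hypothetical candidates: AR(1) forecasters with different phi1 values.
validation = [10.0, 8.0, 9.0, 7.5, 8.5, 9.5, 8.0]
candidates = {
    "AR(1), phi1=0.9": lambda h: 0.9 * h[-1],
    "AR(1), phi1=0.5": lambda h: 0.5 * h[-1],
}
mse = {name: one_step_mse(validation, f) for name, f in candidates.items()}
best = min(mse, key=mse.get)  # model with the minimum mean square error
```

In a real study each candidate would be a fitted AR, MA or ARMA model, but the bookkeeping of errors and the final minimum-MSE choice are exactly as above.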
Now, what did we do? Whether it is for long term simulation of the data or for short term, one-time-step-ahead forecasting, we use the ARIMA models, and the integration refers to differencing: typically we difference the series first and then apply the model. So, whenever I say ARIMA models, the integration is implied, that is, the differencing is already done and on that we are applying the model. In either case, whether the model is being used for long term synthetic generation or for short term forecasting, we use part of the data for calibration and parameter estimation, and the remaining part we use for validation. Let us say you are using it for long term simulation and it is an ARIMA type of model, ARIMA 1, 1 or ARIMA 2, 1 etcetera, so there is also an MA term available.
Then what we do is apply the model for X t plus 1 to get X t plus 1 cap; you already have the known X t plus 1. So, the residual term e t plus 1 is obtained as the available data value minus the model value, the same principle as we used for forecasting. Corresponding to this, you generate the residual series, the e t series. Similarly, if you are using the model for forecasting, the error of the forecast that we just obtained becomes the residual series. So, for the test data you have generated e t plus 1, e t plus 2, e t plus 3, etcetera, for all the remaining values, and you essentially have the residual series available with you after you apply the model. Now, for the validation of the model, we test
this residual series that you so obtained. That means you applied the model for the remaining N by 2 values, or whatever number of values you have chosen for validation, and corresponding to each of the terms you obtain either the residual or the forecast error. So, the e t sequence is known, and on this e t sequence we now carry out all the tests.
What are the tests? If you recall, when we formulated these models, we wrote the residual or noise term e t and said that the noise term should have a zero mean, that is the first assumption of the model; next, that it should be devoid of periodicities; and that it should be uncorrelated. So, we do three primary tests on the series that we have generated, namely the e t series: that the series has a zero mean, that the series is devoid of any periodicities, and that the series is uncorrelated.
So, we perform the tests to examine whether these assumptions that we have made hold, that is, the residual series has a zero mean, no significant periodicities are present in the residual series, and the residual series is uncorrelated. How do we formulate the residual series? As I said, the residual e t is equal to X t, which is the known data, minus the term corresponding to the value simulated from the model, that is, the sum of phi j X t minus j plus the sum of theta j e t minus j. So, this is how you calculate the e t term, and you have the sequence of e t's now.
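The residual sequence just described can be generated recursively, since each e t feeds into the model value for the next period. Here is a minimal sketch for an ARMA 1, 1 model; the parameter values and the data are hypothetical, and the initial residual is simply assumed to be zero, which is a common practical starting choice rather than anything prescribed in the lecture.

```python
# Residual series of an ARMA(1,1) model on known data:
# e_t = x_t - (phi1 * x_{t-1} + theta1 * e_{t-1}).
# phi1, theta1 and the data are hypothetical; e_0 is assumed 0.

def arma11_residuals(series, phi1, theta1):
    residuals = [0.0]  # assumed starting residual
    for t in range(1, len(series)):
        model_value = phi1 * series[t - 1] + theta1 * residuals[-1]
        residuals.append(series[t] - model_value)
    return residuals[1:]  # drop the assumed initial value

res = arma11_residuals([10.0, 9.0, 8.0, 9.5], phi1=0.8, theta1=0.2)
```

All three validation tests below are then carried out on a sequence obtained in this way.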
On this sequence of e t, we do the following validation tests. One is the significance of the residual mean. What is it that we want to test here? That the residuals we so obtained have a mean of zero. Obviously, it will not be exactly equal to 0; therefore, the mean should not be far away from 0, and we test that the mean is not significantly different from 0. Similarly, the significance of periodicities: you may still have periodicities present in the residuals, and when you do the spectral analysis, spikes may still appear. But the periodicities that you identify on the residual series when you carry out the spectral analysis must be insignificant; all of them must be insignificant. For this, we do the cumulative periodogram test, and then we also do the white noise test to make sure that the series is uncorrelated. In the white noise test, we will carry out Whittle's test and the Portmanteau test. Typically, for most of these tests, we formulate an appropriate statistic and then, knowing that the statistic follows either an F distribution or a t distribution, we compare it with the critical values of F and t to decide whether the particular series passes the test or not. The exception to that is the cumulative periodogram test, where all the periodicities are tested at once, in one go. We will see the details of this now.
So, the test for the significance of the residual mean examines the validity of the assumption that the error series e t has a zero mean, or that the mean of e t is not significantly different from zero. Here I follow Kashyap and Rao's book; the reference is given here. We form a statistic eta e as N to the power half into e bar divided by rho cap to the power half. Remember, rho cap here is not a correlation; it is the residual variance. So, you have the residual series e t, which is given now; e bar is the mean of this e t, rho cap is the variance of this e t, and N is the data sample length. So, you get eta e.
Once you get eta e: eta e is known to be approximately distributed as t alpha, N minus 1, where alpha is the significance level, for example 95 percent or 99 percent, and N is the number of data points. If the value of eta e that you so compute is less than the critical t value corresponding to the level of significance that you choose, then the mean of the residual series is not significantly different from 0, and we say that the series passes the test.
It is pretty simple, as you can see: the entire residual series e t is considered, and for the entire series you formulate one value of eta e, which is your statistic. Then you compare the eta e so computed with the critical value of the t distribution at the specified alpha, and if the computed eta e is in fact less than the critical t value, we say the series passes. So, that was the test for significance of the mean.
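As a small sketch, the statistic quoted above from Kashyap and Rao can be computed directly. The residual series below is hypothetical, and the critical t value is an illustrative number standing in for a proper table lookup at the chosen significance level, not a value from the lecture.

```python
import math

# Significance of the residual mean:
# eta_e = sqrt(N) * e_bar / sqrt(rho_hat), rho_hat = residual variance.
# The residuals and the critical t value here are hypothetical.

def residual_mean_statistic(residuals):
    n = len(residuals)
    e_bar = sum(residuals) / n
    rho_hat = sum((e - e_bar) ** 2 for e in residuals) / n  # variance
    return math.sqrt(n) * e_bar / math.sqrt(rho_hat)

residuals = [0.2, -0.1, 0.05, -0.15, 0.1, -0.05, 0.0, 0.12]
eta_e = residual_mean_statistic(residuals)

# Compare |eta_e| with the critical t_{alpha, N-1} from standard tables
# (the number below is illustrative, not an actual table value).
t_critical = 2.36
passes = abs(eta_e) < t_critical
```

If the computed statistic stays below the critical t value, the residual mean is not significantly different from zero and the series passes this test.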
Now, the mean may be 0, but you may still have periodicities present in the residuals, and the residual series that you obtain from the model should be devoid of any periodicities; therefore, we look for the significance of the periodicities. The residual series should not have any significant periodicities present in it. Now, I will discuss two tests; both of the tests are valid, but one is slightly superior to the other, as we will presently see. So, in the first test, again you formulate a
statistic eta e is equal to gamma k square into N minus 2, divided by 4 rho 1 cap. Remember, we are doing the test for periodicities here, and we pick one periodicity at a time; the test is conducted separately for different periodicities. Let us say that the residual series that you get has certain periodicities present, like this. So, we pick the periodicity corresponding to this particular omega value first and then carry out the test. The k that I am mentioning here corresponds to the particular periodicity that you want to test; we are testing one at a time. So, corresponding to that particular value of k, you calculate gamma k square, and rho 1 cap is computed from the residual series. We compute gamma k square simply as alpha k square plus beta k square, as we did in our spectral analysis, and N is the total number of values that you have.
So, gamma k square is equal to alpha k square plus beta k square for the particular value of k; as I mentioned, gamma k corresponds to the periodicity being tested. In this case we may be testing for this particular value, with the omega k and I k notation that we used in our spectral analysis. We pick up that particular omega k and then, for that particular k, we compute gamma k. Then rho 1 cap, which I earlier loosely called the variance, is computed based on your e t: it is the average over t of e t minus alpha k cap cos omega k t minus beta k cap sin omega k t, whole square, for that particular omega k. And alpha k we know, with the factor 2 by N, from our spectral analysis; so alpha k and beta k you get directly from the spectral analysis. So, rho 1 cap is obtained from this for that
particular periodicity which we are examining now. I repeat again: this test tests one periodicity at a time. So, we pick that particular omega k and then, corresponding to that k, we calculate gamma k, and similarly rho 1 cap is known. See, here all these values are known; for completeness' sake, let me make the correction that these are alpha k cap and beta k cap, for that particular k, and then you calculate rho 1 cap. So, in this statistic, gamma k is known and rho 1 cap is known, and therefore you can compute the statistic. Now, the periodicity for which the test is
being carried out is 2 pi by omega k; let us say you obtain a 12 month periodicity even in the residuals, then the corresponding value of omega k is 2 pi divided by that particular periodicity. The statistic eta e is approximately distributed as F with (2, N minus 2) degrees of freedom, where alpha is the significance level. The F distribution tables, if you look them up in any standard text book, will give the critical values for each significance level and number of degrees of freedom. So, let us say you choose alpha as 0.95; corresponding to 95 percent significance you compute these. In fact, this is just the same test that we did in our spectral analysis to identify the significance of periodicities. Then you take the value of the statistic that you have calculated, eta e; if this is less than F alpha (2, N minus 2), then the periodicity is not significant. Let us say that your spectral analysis shows, like this, two or three different periodicities; first you test for this periodicity. If this is not significant, then naturally the other periodicities may not be significant, if they are all in decreasing order like this. However, you may have a case where one of them is significant; then you have to test for the next periodicity also. So, you keep testing the periodicities until you are satisfied that all the periodicities thrown up by the spectral analysis of the residual series are insignificant; then the series passes the test. Even if one periodicity is significant, the particular model does not pass the test.
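The single-periodicity statistic can be sketched as follows. The residual series and the choice of a 12-period cycle are hypothetical; alpha k and beta k are the usual spectral-analysis estimates with the 2 by N factor, and the resulting eta e would be compared against F alpha (2, N minus 2) from standard tables.

```python
import math

# Test of one periodicity in the residuals, a sketch of
# eta_e = gamma_k^2 * (N - 2) / (4 * rho1_hat).
# The residual series below is hypothetical.

def periodicity_statistic(e, omega_k):
    n = len(e)
    # Spectral-analysis estimates of the harmonic at omega_k
    alpha_k = (2.0 / n) * sum(e[t] * math.cos(omega_k * (t + 1)) for t in range(n))
    beta_k = (2.0 / n) * sum(e[t] * math.sin(omega_k * (t + 1)) for t in range(n))
    gamma_sq = alpha_k ** 2 + beta_k ** 2
    # rho1_hat: mean squared residual after removing this harmonic
    rho1_hat = sum(
        (e[t] - alpha_k * math.cos(omega_k * (t + 1))
              - beta_k * math.sin(omega_k * (t + 1))) ** 2
        for t in range(n)
    ) / n
    return gamma_sq * (n - 2) / (4.0 * rho1_hat)

# Hypothetical residuals; testing a 12-period cycle, omega = 2*pi/12.
e = [0.3, -0.2, 0.1, 0.0, -0.1, 0.2, -0.3, 0.1, 0.0, -0.2, 0.3, -0.1,
     0.2, -0.3, 0.1, 0.0, -0.1, 0.2, -0.2, 0.1, 0.0, -0.3, 0.2, -0.1]
eta_e = periodicity_statistic(e, 2.0 * math.pi / 12.0)
# eta_e is then compared with F_alpha(2, N - 2) from tables.
```

Each suspected periodicity requires one such computation and one table comparison, which is why the cumulative periodogram test below is the more convenient alternative.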
Now, for the significance of periodicities, we also have another test, which is called the cumulative periodogram test or Bartlett's test. The advantage of this Bartlett's test is that, unlike the previous test that I just explained, it examines all the periodicities at one time rather than going from one periodicity to another, and therefore it is computationally convenient. So, the test is more convenient and it is preferred because of its ability to test all the periodicities at one time.
So, essentially what we do in this test is form a cumulative periodogram as follows. We define gamma k square for k equal to 1 to N by 2; as I mentioned, this is a validation test, so this is the validation period, and for the N by 2 values you calculate gamma k square. Corresponding to each k you know omega k from your spectral density, which you would have computed, and e t is the residual series; you sum over t equal to 1 to N, and therefore all the terms are known here. So, for k equal to 1 you get gamma 1 square, for k equal to 2 you get gamma 2 square, and so on up to k equal to N by 2. Then you determine g k as the summation, j equal to 1 to k, of gamma j square: from 1 to k you are adding up gamma j square, so g 1 has one term, g 2 has two terms, g 3 has three terms and so on. This you divide by the sum over the entire period, k equal to 1 to N by 2, of gamma k square.
So, the plot of g k versus k is called the cumulative periodogram. You know how to compute g k: all the values are known here, so you can calculate gamma k square; once you know gamma k square, you sum up to k and then get g k by normalizing with respect to the entire sum. Because of this, g k varies from 0 to 1; the maximum value it can take is 1.
Then we plot k on the x axis and g k on the y axis; now I will explain this with respect to this figure. So, you computed g k and you have the k values; this black line you are seeing is the plot of g k versus k for the residual series. Now, we have used N by 2 values and the maximum value of g k is 1. So, draw a line between (0, 0) and (N by 2, 1); this corresponds to a frequency of 0.5, so (0.5, 1) on the frequency diagram. This is the red line here. On either side of this red line, you draw the confidence bands; the confidence band is lambda divided by root of N by 2 on either side of the line so drawn. So, like this you get a band now.
If any part of the cumulative periodogram that you have drawn goes beyond the band, then the series does not pass the test. In fact, it means that, if it goes beyond the band at a particular k value, the periodicity corresponding to that particular k value in the residual series is significant. If the periodogram lies completely within the band, then no periodicity is significant; this is how we examine the significance of the periodicities. So, let me summarize: you first draw the cumulative periodogram by constructing g k for each of the k values; you have N by 2 values and therefore you construct g k versus k for N by 2 values.
And then, on the cumulative periodogram, two confidence limits given by lambda by root of N by 2 are drawn on either side of the line joining (0, 0) and (N by 2, 1). What is the 1? It is the maximum value of g k, and N by 2 corresponds to your maximum k, that is, the maximum lag; in fact, this corresponds to a frequency of 0.5.
Then, what is this lambda value? The value of lambda for the 95 percent confidence limit is 1.35, and for the 99 percent confidence limit it is 1.65. This test also I have taken from Kashyap and Rao, so you can refer to Kashyap and Rao; these are the values prescribed there. So, you have drawn the cumulative periodogram and you know how to draw the bounds. Now, if all the g k values lie within the significance band, there is no significant periodicity present in the series, which means that the series passes the test. If a particular value of g k lies outside the significance band, the periodicity corresponding to that value of g k is significant, and therefore the series does not pass the test. So, this is how we carry out Bartlett's test or the cumulative periodogram test.
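The whole test can be sketched in a few lines. The residual series here is hypothetical; lambda equal to 1.35 is the 95 percent value quoted above from Kashyap and Rao, and the harmonic estimates use the same 2 by N convention as in the spectral analysis.

```python
import math

# Cumulative periodogram (Bartlett's) test, a minimal sketch:
# gamma_k^2 for k = 1..N/2, g_k as the normalized cumulative sum,
# then a band of width lambda/sqrt(N/2) around the line (0,0)-(N/2,1).
# The residual series below is hypothetical.

def cumulative_periodogram(e):
    n = len(e)
    gamma_sq = []
    for k in range(1, n // 2 + 1):
        omega_k = 2.0 * math.pi * k / n
        a = (2.0 / n) * sum(e[t] * math.cos(omega_k * (t + 1)) for t in range(n))
        b = (2.0 / n) * sum(e[t] * math.sin(omega_k * (t + 1)) for t in range(n))
        gamma_sq.append(a * a + b * b)
    total = sum(gamma_sq)
    g, running = [], 0.0
    for gs in gamma_sq:
        running += gs
        g.append(running / total)  # g_k in [0, 1], last value is 1
    return g

def passes_bartlett(g, lam=1.35):  # lam = 1.35 at 95 percent (Kashyap and Rao)
    half = len(g)
    band = lam / math.sqrt(half)
    # compare g_k against the straight line joining (0, 0) and (N/2, 1)
    return all(abs(g[k] - (k + 1) / half) <= band for k in range(half))

e = [0.3, -0.2, 0.1, 0.0, -0.1, 0.2, -0.3, 0.1, 0.0, -0.2, 0.3, -0.1]
g = cumulative_periodogram(e)
```

One pass over the series tests all periodicities at once, which is exactly the convenience argued for above.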
In most cases, if the models are acceptable, the residual series typically lies well within the bounds that we have drawn; it hardly ever comes very close to these bounds. In fact, when I discuss the case studies this will become clear. However, just for your curiosity, you can also draw the cumulative periodogram for the original data, and then you will see that there are several periodicities lying well outside this particular band, on one side or the other, indicating that the periodicities are significant.
Now, we will see the white noise test. But first, to recap: for the periodicities we now have two tests, and typically we prefer the Bartlett's test because of its ability to test all the periodicities in one go, and also because it is computationally very simple. You simply calculate g k, much like your correlogram: you formulate the periodogram with respect to lag k, calculate the g k values, plot them, and the significance comes out immediately. Whereas in the first test that we had, we formulated a statistic corresponding to each periodicity that we suspected to be significant and had to carry out that test separately; therefore, the Bartlett's test is preferred for testing periodicities.
Now, we will test for white noise, that is, the assumption we made that the residual series e t is uncorrelated. For this we have the Whittle's test for white noise. In the Whittle's test we again formulate the covariance matrix; remember, we are dealing with the e t series, that is, the residual series. So, the covariance r k at lag k of the error series e t is calculated as r k equal to 1 by (N minus k) times the sum, over j from k plus 1 to N, of e j into e j minus k, where k goes from 0 to k max, and k max is typically taken as 0.15 N.
Then, once you know the covariances r k, you formulate the covariance matrix. The covariance matrix is of size k max by k max; it is symmetric, with entries r 0, r 1, r 2, etcetera, built from the r k values. This covariance matrix we denote by tau n 1, and from it we formulate a statistic.
So, essentially we determined the covariances and formulated the covariance matrix. Using the covariance matrix, we define the statistic eta e equal to (N by n 1) into (rho naught by rho 1 cap, minus 1). Here N is the number of values, n 1 is k max, rho naught is the lag-zero correlation, which is 1, and rho 1 cap is defined as the determinant of tau n 1 divided by the determinant of tau n 1 minus 1. tau n 1 is the covariance matrix given above; its determinant goes in the numerator. tau n 1 minus 1 is constructed by eliminating the last row and the last column from the tau n 1 matrix; that is, you take the matrix, delete its last row and last column, and the determinant of what remains goes in the denominator. So you can formulate rho 1 cap.
So, N is known, n 1 is known, rho naught is 1, and rho 1 cap is calculated from the ratio of the two determinants; therefore, you can calculate this particular statistic.
Once that statistic is known, note that it is approximately distributed as F with (n 1, N minus n 1) degrees of freedom, where n 1 is your k max, typically taken as 15 percent of N. So, as we did for earlier tests using the F distribution, you fix your confidence level, typically 95 percent or 99 percent; n 1 is known, and capital N, the number of values, is known, and therefore you know the critical value F alpha.
If the value of eta e that you calculate is less than F alpha, then the residual series is uncorrelated. So, this is what we do for the white noise test; white noise means the series is uncorrelated. This is called the Whittle's test.
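Putting the Whittle's test together, here is an illustrative sketch, not the lecture's own code: the exact form of the statistic, eta e = (N/n1)(rho 0/rho 1 hat − 1), and the use of a correlation Toeplitz matrix follow my reading of the lecture's description of Kashyap and Rao's test, so treat those details as assumptions.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.stats import f as f_dist

def whittle_test(e, alpha=0.05):
    """Sketch of the Whittle's whiteness test on a residual series e.

    Builds the lag-k covariances r_k, the n1 x n1 correlation (Toeplitz)
    matrix tau_n1, and the statistic eta_e, which is compared against the
    F(alpha; n1, N - n1) critical value.
    """
    e = np.asarray(e, dtype=float)
    N = len(e)
    n1 = max(2, int(0.15 * N))                  # k_max, about 15% of N
    # r_k = (1/(N-k)) * sum_{j=k+1}^{N} e_j * e_{j-k}
    r = np.array([np.dot(e[k:], e[:N - k]) / (N - k) for k in range(n1)])
    T = toeplitz(r / r[0])                      # correlations, so rho_0 = 1
    # rho_1_hat = det(tau_n1) / det(tau_n1 with last row and column removed)
    rho1_hat = np.linalg.det(T) / np.linalg.det(T[:-1, :-1])
    eta = (N / n1) * (1.0 / rho1_hat - 1.0)     # rho_0 = 1 in the numerator
    crit = f_dist.ppf(1 - alpha, n1, N - n1)
    return eta, crit, bool(eta < crit)
```

For a white residual series eta e hovers around 1, well below the critical value; strong autocorrelation inflates it far past F alpha.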
In fact, we have another test for examining whether the series e t is white noise or not; it is called the portmanteau test. This test is also carried out to test the absence of correlation in the series, and it also uses the r k values, that is, the covariances as we have defined them earlier. So, you know how to formulate the covariance matrix; from the covariances we define another statistic for the portmanteau test.
So, using the covariances r k, the statistic eta e is defined as N into the sum, over k equal to 1 to n 1, of (r k by r naught) squared. Here N is the number of values, n 1 is your maximum lag, r k is the covariance of order k, and r naught is the covariance of order 0. So, you know how to calculate the statistic value.
Now, this statistic is approximately distributed as a chi-square distribution with n 1 degrees of freedom. So, you fix alpha and n 1; n 1 is your k max, the maximum lag up to which you have gone, chosen again as 0.15 N. From the chi-square tables you get the critical value chi-square of (alpha, n 1). If the eta e that you have calculated is less than this critical value, then the residual series is uncorrelated. So, for the white noise test, that means to
examine whether the series is uncorrelated or not, we have two tests, namely the Whittle's test and the portmanteau test. In both of them we use the covariance matrix of the residual series and then formulate a statistic.
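The portmanteau version can be sketched as below. This is again illustrative, not the lecture's code: the statistic eta e = N times the sum of (r k / r naught) squared is the standard Box-Pierce form, which matches the quantities named in the lecture, but the naming and normalisation here are my assumptions.

```python
import numpy as np
from scipy.stats import chi2

def portmanteau_test(e, alpha=0.05):
    """Sketch of the portmanteau (Box-Pierce-type) whiteness test.

    eta_e = N * sum_{k=1}^{n1} (r_k / r_0)^2, compared against the
    chi-square critical value with n1 degrees of freedom, n1 ~ 0.15 N.
    """
    e = np.asarray(e, dtype=float)
    N = len(e)
    n1 = max(1, int(0.15 * N))
    # Lag-k covariances r_k = (1/(N-k)) * sum e_j e_{j-k}
    r = np.array([np.dot(e[k:], e[:N - k]) / (N - k) for k in range(n1 + 1)])
    eta = N * np.sum((r[1:] / r[0]) ** 2)
    crit = chi2.ppf(1 - alpha, df=n1)
    return eta, crit, bool(eta < crit)
```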
Now, between these two, Kashyap and Rao have proved, in the same textbook that I have been mentioning, that the portmanteau test is uniformly inferior to the Whittle's test, and we therefore prefer the latter for applications. So, while there are two tests that I have mentioned, if you are in a fix about which one to use, you always go for the Whittle's test, as shown by Kashyap and Rao.
Now, before we go to the applications: essentially, what we have done is to formulate the ARIMA type of models and carry out the tests, and we now start applying them to different case studies. So, let us summarize what we did on the choice of the ARIMA models; this covers not just this lecture but also the previous two lectures.
We formulate the ARIMA models, that is, autoregressive integrated moving average models, after differencing the series. The order of differencing, whether first order or second order, depends on the non-stationarity present in the data. You do the differencing essentially to convert the data into stationary data; once you have the data as a stationary time series, you apply the ARIMA type of models.
Now, in the ARIMA models, you know how to estimate the number of AR parameters and the number of MA parameters. In the event that you are not very clear about how many AR terms, how many MA terms, etcetera, to use, you form a large number of candidate models, apply these candidate models to the observed time series, and estimate the parameters. You calibrate the models using part of the series, typically half the length of the data, and use the models for validation on the remaining part of the data. When you apply these models to the remaining part of the data, you get the residual series, that is, the e t series.
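The calibrate-on-one-half, validate-on-the-other workflow can be sketched as follows. This is illustrative only: it fits a simple AR(1) model by least squares, whereas the lecture's models are general ARIMA models estimated with, for example, MATLAB's armax; all function and variable names here are my own.

```python
import numpy as np

def fit_ar1(x):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + e[t] (zero-mean x)."""
    x = np.asarray(x, dtype=float)
    return np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])

def residuals_ar1(x, phi):
    """One-step-ahead residuals e[t] = x[t] - phi * x[t-1]."""
    x = np.asarray(x, dtype=float)
    return x[1:] - phi * x[:-1]

def calibrate_validate(series):
    """Calibrate on the first half of the data; return the residual series
    obtained on the remaining half, which the validation tests then examine."""
    series = np.asarray(series, dtype=float)
    half = len(series) // 2
    phi = fit_ar1(series[:half])            # parameter estimation (calibration)
    e = residuals_ar1(series[half:], phi)   # residual series e_t for validation
    return phi, e
```

The residual series e returned here is what the zero-mean, periodicity, and white noise tests below are applied to.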
We do the tests of validation on the residual series; essentially three different tests. The first is to test whether the residual series has a zero mean, that is, whether the mean of the residual series is not significantly different from zero. Then we test whether the residual series that we have obtained by applying the model is devoid of any significant periodicities; so, you may suspect some periodicities, and you test for them. We saw two tests for the significance of periodicities, and we normally prefer the cumulative periodogram test.
Then we also examine whether the series e t that we have obtained is in fact white noise, that is, uncorrelated. For that, again, we had two tests, the portmanteau test and the Whittle's test, and it is shown in the standard textbook by Kashyap and Rao that the Whittle's test is better than the portmanteau test.
So, we apply the models, carry out all these tests, and make sure that the model that you have chosen, either based on the maximum likelihood criterion or the minimum mean square error criterion, passes all these tests; then you choose that particular model for application. We will continue the discussion, and specifically we will see how to apply these models; whatever procedures I have explained so far, we will see their applications in the coming lectures. Thank you for your attention.