Today's lecture is on the lattice formulation of LPC coefficients. Over the past few classes we have been discussing the different approaches to computing the LPC coefficients. We posed the problem a few classes back, and we had seen that the challenge essentially lies in trying to obtain the optimal set of coefficients based on the observed speech. We had been discussing methods such as the covariance method and the autocorrelation based method, and there we had seen that the approach becomes an iterative one, so that one can compute in an efficient way all these coefficients alpha_k, where k varies from 1 to p. Essentially we obtain a first set of estimates, then update our estimates over the iterations, and at the end of the pth iteration we get the final coefficients which we have to use.
Now, as I said, this involves two basic operations: first, we have to determine the correlation from the observed speech waveform, and second, we have to do the matrix operation in order to determine the LPC coefficients. These two stages are involved.
Now there is a methodology, which we are going to call the lattice formulation, where these two stages are treated in an integrated manner. That is to say, we will not have any explicit computation of the correlation; rather, we will determine the LPC coefficients in terms of the prediction errors, and we will describe the technique pertaining to that very shortly. So, before we begin the lattice formulation,
let us go back to what we did in Durbin's algorithm for the autocorrelation method. Remember that in Durbin's algorithm the set of coefficients we were calling alpha_j, and we were writing it with a superscript (i), the superscript indicating the iteration number. So we had the set alpha_j^(i), with j in the range 1 to i, and the significance is that these coefficients form the ith-order optimal linear predictor.

So in the ith iteration we have only an ith-order predictor, with i past coefficients. In the end we arrive at i equal to p, and there we have alpha_j^(p) as the final estimate. We have already seen that. And using these ith-order predictor coefficients it is possible for us to design a filter. Let us give the filter expression like this: we call the filter in the z domain A^(i)(z), and we write it as

A^(i)(z) = 1 - sum over k from 1 to i of alpha_k^(i) z^(-k).
So essentially what we have written is that, again with this ith-order predictor, we are expressing the filter. This is just a filter. Let us imagine that somehow we are able to realize a filter like this using these prediction parameters alpha_k.
Now, if we give a speech segment as an input to this filter, what would you expect at the output of the filter? This is a filter realized out of the prediction coefficients. So the input to the filter is the speech segment, and we write it as s_n(m). You know what we mean by this expression: the subscript n and the variable (m) in parentheses mean that we go over to the nth sample and take (m) as our running index, and that segment we write as s(n + m) w(m), where w(m) are the windowing coefficients. So essentially s_n(m) is the windowed version of the segment, and again for simplicity, instead of writing the segment with the subscript n every time, we will write it as s(m), simply dropping the subscript n.
Now if s(m) is the input to the filter as defined above, what would you expect at the output of the filter? Any answer? See, what is this filter? This filter is realized out of the predictor coefficients. Have you come across this filter expression in one of the previous lessons? I believe you have; in the lesson where we started linear predictive coding, we had come across an expression like this.
Correct, that is it. Whenever we give s(m) as an input to this filter, we get the prediction error at the output. So we can write the prediction error in the time domain. In fact, we will normally write the prediction error as e_n^(i)(m), and again for simplicity we will drop the subscript n and simply write it as e^(i)(m), which is the ith-order prediction error. Whenever I write the prediction error with the superscript, that means the prediction error is iterative.
Hence, the output of the filter will be the prediction error, and we write it like this. Let us number the equations. The filter that we realize out of the predictor coefficients, the z-transform expression above, we call equation number 1, and as equation number 2 we write the expression for the error term:

e^(i)(m) = s(m) - sum over k from 1 to i of alpha_k^(i) s(m - k).

This directly follows. Just have a look at the expression: equation 1 is in the z-transform domain, which means that when you go back to the time domain, the z^(-k) term introduces k units of delay, which gets expressed as s(m - k). So this is the prediction error, and we call it equation number 2.
Now let us take the z transform of equation number 2. What are we going to write? Taking the z transform, we express it as E^(i)(z). In fact, we did not have to derive this at all, because we know very well that E^(i)(z) is going to be A^(i)(z), the filter, times S(z). That is to say, the filter we have is simply this: at the input we have s(m), then we have a filter whose transfer function is A^(i)(z), and the output is e^(i)(m); or, in the z-transform domain, the input is S(z) and the output is E^(i)(z). So E^(i)(z) is nothing but A^(i)(z) S(z), and this becomes our expression 3.
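To make equations 1 to 3 concrete, here is a minimal sketch (my own illustration, not from the lecture; the function name and numbers are made up) of applying the inverse filter, i.e. computing e(m) = s(m) - sum of alpha_k s(m - k):

```python
def prediction_error(s, alpha):
    """Forward prediction error: e(m) = s(m) - sum_{k=1}^{p} alpha[k-1]*s(m-k),
    treating samples before the start of the segment as zero."""
    p = len(alpha)
    return [s[m] - sum(alpha[k] * s[m - 1 - k]
                       for k in range(p) if m - 1 - k >= 0)
            for m in range(len(s))]
```

For example, with a first-order predictor alpha = [0.5], the segment [1.0, 2.0, 3.0] yields the residual [1.0, 1.5, 2.0].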
Now again, our objective is to express the relationship in an iterative form. In order to do that, let us go back to Durbin's relation, which we discussed in the last class. As per Durbin's relation, remember that we had written this kind of expression:

alpha_j^(i) = alpha_j^(i-1) - k_i alpha_(i-j)^(i-1).

This is the recursive relation we got for the alphas.
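As a sketch of the recursion just quoted (this is my own illustration, not the lecturer's code; the toy autocorrelation values are made up), Durbin's algorithm builds the predictor order by order:

```python
def durbin(R, p):
    """Levinson-Durbin recursion: from autocorrelation values R[0..p],
    return the predictor coefficients alpha_1..alpha_p and the
    intermediate coefficients k_1..k_p."""
    E = R[0]                           # zeroth-order prediction error energy
    alpha = [0.0] * (p + 1)            # alpha[j] holds alpha_j^(i); alpha[0] unused
    ks = []
    for i in range(1, p + 1):
        # k_i from the residual correlation
        k = (R[i] - sum(alpha[j] * R[i - j] for j in range(1, i))) / E
        ks.append(k)
        new = alpha[:]
        new[i] = k                     # alpha_i^(i) = k_i
        for j in range(1, i):
            new[j] = alpha[j] - k * alpha[i - j]   # the recursion quoted above
        alpha = new
        E *= (1.0 - k * k)             # updated error energy
    return alpha[1:], ks
```

With the toy values R = [1.0, 0.5, 0.25] and p = 2, this gives alpha = [0.5, 0.0] and k = [0.5, 0.0].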
Now what we are going to do is to put this expression for alpha into equation number 1. Of course, equation number 1 is written in the z-transform form, so in place of alpha_k^(i) we will put this expression, which means that the ith predictor will be expressed in terms of the (i-1)th predictor and this residual term. And again, let me tell you: although I will not be doing it in detail, I leave it to you to work out.

What you can do with equation number 1 is to express it as 1 minus the sum over k from 1 to i-1 of alpha_k^(i-1) z^(-k), and then you get another term, the last term of the expression, which is alpha_i^(i) z^(-i); just replacing k by i you get the last term. And this alpha_i^(i) that you are getting, to express it you make use of the recursion above; you get the previous alphas, and to get those you again substitute the (i-2)th predictor, and so on. So in that process, after doing the manipulations accordingly, you have to play with the recursive relation; actually you have to know the art of playing with recursive relationships, and some students are quite good at that.
So if you can play well with those recursions, ultimately what you are going to get is a recursive relation like this:

A^(i)(z) = A^(i-1)(z) - k_i z^(-i) A^(i-1)(z^(-1)).

We can immediately see how the z^(-i) term comes: after isolating the last term in the summation we are definitely going to get a z^(-i) term, and then, by properly identifying terms, you will find that what multiplies it becomes A^(i-1) with argument z^(-1). This is very interesting.

Here is a z^(-1) term as the argument of A. Here it is A(z), here it is A(z), but here it is A(z^(-1)). By working it out properly this is what you will obtain, and this will become very interesting; even more interesting will be the time domain. Very intelligent students may be able to guess what it is leading to.

Can anybody prematurely guess what this z^(-1) is leading to? Okay, then let us wait for a while. We will see the great thing that this z^(-1) is going to do.
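This z-domain recursion can be checked numerically. The sketch below (my own illustration, with made-up coefficient values) builds alpha^(i) from alpha^(i-1) by the Durbin update and verifies that both sides of the recursion agree at a test point z:

```python
def A(alpha, z):
    """A(z) = 1 - sum_k alpha_k z^(-k), with alpha[0..i-1] = alpha_1..alpha_i."""
    return 1.0 - sum(a * z ** -(k + 1) for k, a in enumerate(alpha))

old = [0.4, -0.2]      # alpha^(i-1) for i-1 = 2 (illustrative numbers)
k = 0.3                # k_i for i = 3 (illustrative)
i = 3

# Durbin update: alpha_j^(i) = alpha_j^(i-1) - k_i alpha_{i-j}^(i-1); alpha_i^(i) = k_i
new = [old[j] - k * old[i - 2 - j] for j in range(i - 1)] + [k]

z = 1.7                # arbitrary test point
lhs = A(new, z)
rhs = A(old, z) - k * z ** -i * A(old, 1.0 / z)
assert abs(lhs - rhs) < 1e-12   # the recursion holds term by term
```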
Now this is expression number 4, and substituting 4 into equation number 3 gives a very simple equation. Equation number 3 is the simplest equation, which just expresses the error in terms of A(z) and S(z), and if I substitute equation number 4 into it, what results is: E^(i)(z) = A^(i)(z) S(z); in place of A^(i)(z) we simply write the recursion, so what results is A^(i-1)(z) S(z) minus the residual term times S(z).

Let us write it down. Substituting 4 in 3:

E^(i)(z) = A^(i-1)(z) S(z) - k_i z^(-i) A^(i-1)(z^(-1)) S(z).

This is the expression, and let us call it equation number 5.
Now what is the first term of this expression? The first term is nothing but E^(i-1)(z), because it is simply the prediction error of the (i-1)th predictor. But it is not just the (i-1)th prediction error; there is also a term with that interesting z^(-1). Let us define another filter expression, say B^(i)(z), which we define as

B^(i)(z) = z^(-i) A^(i)(z^(-1)) S(z).

What remains over here is this expression, and this is what we are calling B^(i)(z). Of course, in equation 5 the superscript is (i-1), so that term will become B^(i-1)(z), which means there is an extra factor there, and that will be z^(-1), one unit of delay.

Anyway, let us call this definition equation 6, and taking the inverse transform of 6, on the left-hand side we will have its inverse transform, small b^(i), and now instead of (z) we write (m) because we are back in the time domain:

b^(i)(m) = s(m - i) - sum over k from 1 to i of alpha_k^(i) s(m + k - i).

A non-causal system; yes, good observation. Any other observation? A delayed version; would you really call it a delayed version, or would you call it an advanced version? Delayed version is of course very understandable.
When we were using the alpha_k's and generating the error, that error was a prediction error which we call the forward prediction error. Now there is a great deal of similarity between equation 7, what we wrote over here, and equation 2; let us have a look at equation 2.

Try to find the similarity between equation 2 and equation 7. Both are quite similar expressions: there is an s term and there is a prediction term, the sum over k from 1 to i with alpha_k; those things remain the same. The only difference is that in equation 2 there is s(m), and the samples being used for prediction are s(m - k), where k is a positive quantity, which means we are going back; we are taking the past samples to predict s(m). So we are using the i past samples to predict s(m), and in that process generating a forward prediction error e^(i)(m).

What we are doing over here is just the opposite. Here it is s(m - i), so it is as if we are going to predict s(m - i) from s(m + k - i), and k again is a positive quantity, which means we are going ahead in time. So, based on the future samples, we are going to predict the present sample. In this case it is a backward prediction error, and because it is a backward prediction error we denote it by b; e we are reserving for forward prediction and b for backward prediction.
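Equation 7 can be evaluated directly. The following sketch (my own illustration, not the lecturer's; names and numbers are made up) computes the backward prediction error by predicting the past sample s(m - i) from the i samples after it:

```python
def backward_error(s, alpha, i, m):
    """Equation 7: b^(i)(m) = s(m-i) - sum_{k=1}^{i} alpha_k * s(m+k-i).
    Valid for i <= m <= len(s)-1, so that all indexed samples exist."""
    return s[m - i] - sum(alpha[k - 1] * s[m + k - i] for k in range(1, i + 1))
```

For example, with s = [1.0, 2.0, 3.0], a first-order alpha = [0.5], i = 1 and m = 1, the past sample s(0) = 1.0 is predicted from s(1) = 2.0, giving b = 1.0 - 0.5 * 2.0 = 0.0.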
Hence, there is a backward prediction and there is a forward prediction. Now it may seem puzzling, as one of your friends immediately pointed out, that it is getting non-causal; yes, it is apparently becoming non-causal in the sense that we are using future samples, so in that sense it is definitely non-causal. But it does not matter. It is realizable, because all you have to do is introduce delays. If you introduce some delays, then the same set of samples can be used to predict a past sample, and the same set of samples can be used to predict a future one.

Thus, let us live with the mathematical juggleries without going too much into the nitty-gritty, but realize that the physical significance that emerges is that we have to work simultaneously with both the forward prediction error and the backward prediction error.
Now let us return to equation number 5; or rather, since equation number 5 is in the z-transform domain, better to look at equation number 2, which is already in the time domain. There we have the sum over k from 1 to i; if instead you take the terms up to (i - 1), you will observe that we can identify e^(i-1)(m), and in fact it can be shown that the prediction error sequence satisfies

e^(i)(m) = e^(i-1)(m) - k_i b^(i-1)(m - 1).

This is going to be equation number 8.

We get this by simple mathematical jugglery, nothing great about it, but a very interesting fact emerges: the ith forward prediction error is expressed in terms of the (i-1)th forward prediction error and the (i-1)th backward prediction error.
Now, this was the time domain expression. Next, let us substitute equation 4 into equation 6; that is, in 6 we have this A^(i)(z^(-1)) form, and if we derive it from equation number 4, then what results is:

B^(i)(z) = z^(-i) A^(i-1)(z^(-1)) S(z) - k_i A^(i-1)(z) S(z),

and we call this equation number 9. It can be written in a better form:

B^(i)(z) = z^(-1) B^(i-1)(z) - k_i E^(i-1)(z);

by simplification it is possible to show this, and we call it equation number 10.

Even in the z domain you can identify what we are trying to do: it is the ith backward prediction error that we are expressing in terms of the (i-1)th backward prediction error and the (i-1)th forward prediction error. This is in the z domain; it is very clear, and just taking the inverse z transform, what we get is

b^(i)(m) = b^(i-1)(m - 1) - k_i e^(i-1)(m);

yes, b^(i-1) appears with argument (m - 1) because of the presence of this z^(-1), this one unit of delay. So, in the realization of the filter we have to give that one unit of delay, and the forward term is simply e^(i-1)(m). This is equation number 11.
Now we have two nice equations at hand: equation number 8 and equation number 11. These two are intertwined: the ith forward prediction error is expressed in terms of the (i-1)th forward prediction error and the (i-1)th backward prediction error, and the ith backward prediction error is expressed in terms of the (i-1)th backward prediction error and the (i-1)th forward prediction error. That means, again, this is a very nice iterative expression. If we know a starting point, we can iterate up to the pth order, and if we can go iteratively up to the pth order then we have solved the problem, because ultimately, in terms of e^(i)(m) and b^(i)(m), we will be able to express everything.
So what is going to be our starting point; what is i? i is essentially the order of the filter; it is the intermediate order in an iterative filtering process. This basically leads to an iterative filtering process in which i is the intermediate order, so our starting point can be the zeroth-order prediction. And what is zeroth-order prediction? Zeroth-order prediction means we are not using any past value. In the absence of any past value, what will be the prediction error for s(m)? A zeroth-order predictor will give you a prediction error equal to s(m) itself; that is the zeroth-order forward prediction error. And the zeroth-order backward prediction error is also s(m). So let that be the starting point.
Therefore, with the initial condition that we know e^(0)(m) and b^(0)(m), we should be in a position to determine e^(1)(m). Can we determine e^(1)(m)? Yes. We go to equation number 8. We want e^(1)(m); we know e^(0)(m), which is nothing but the sample value s(m) itself, minus k_1 times b^(0)(m - 1); and we know b^(0)(m - 1), because b^(0) is nothing but the sample itself. So essentially what we have is

e^(1)(m) = s(m) - k_1 s(m - 1).

How do we get s(m - 1)? Give a unit delay and you get s(m - 1), so e^(1)(m) is obtained. Again, you can obtain b^(1)(m), no problem, because you know b^(0) and you know e^(0); so you get b^(1), and if you know b^(1) then you can determine e^(2), and you can determine b^(2), and the computations go on hand in hand.
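The iteration just described can be sketched in code. This is my own illustration (assuming a zero initial state in each delay element, with made-up reflection values), running equations 8 and 11 from the zeroth-order initialization e^(0)(m) = b^(0)(m) = s(m):

```python
def lattice_errors(s, ks):
    """Analysis lattice: for each stage i apply
       e^(i)(m) = e^(i-1)(m) - k_i * b^(i-1)(m-1)
       b^(i)(m) = b^(i-1)(m-1) - k_i * e^(i-1)(m),
    starting from e^(0)(m) = b^(0)(m) = s(m).  ks = [k_1, ..., k_p]."""
    e = list(s)                        # forward errors, order 0
    b = list(s)                        # backward errors, order 0
    for k in ks:
        b_del = [0.0] + b[:-1]         # z^(-1): one-sample delay, zero initial state
        e_new = [ei - k * bd for ei, bd in zip(e, b_del)]
        b_new = [bd - k * ei for ei, bd in zip(e, b_del)]
        e, b = e_new, b_new
    return e, b
```

For a single stage this reduces to e^(1)(m) = s(m) - k_1 s(m - 1), in agreement with the worked first iteration.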
So, what is the filter structure? Is anybody getting the filter structure in their mind? It becomes a lattice. In the filter realization, what we can simply do is this: given the initial condition for the zeroth-order prediction, that e^(0)(m) = b^(0)(m) = s(m), we can depict these two equations, the ones we marked in red, 8 and 11, together in a flow graph like this. You get s(n) as the input sample, and, telling it the other way around, e^(0)(n) is the same as s(n), and similarly b^(0)(n) is the same as s(n).
Now what we do is put a delay after b^(0)(n). So we put a delay z^(-1), and what we have over here is b^(0)(n - 1).

Now what do we want to do? Let us say that we want to get e^(1)(n). To get e^(1)(n), what do we need? We need e^(0)(n), and we want to take b^(0)(n - 1). Now b^(0)(n - 1) is already obtained from the delay. (Here, instead of writing m we are writing n; that is not a problem.) But not simply b^(0)(n - 1): it is multiplied by a coefficient k_i, which in this case becomes k_1. So we have to multiply this b^(0)(n - 1) by a factor k_1; not exactly k_1, rather minus k_1. So we simply multiply this by a coefficient minus k_1 and then add up.

If we add up, what results over here? e^(1)(n); so here we get e^(1)(n). Now let us try to get the other part, that is, b^(1)(n). To get b^(1)(n), what do we need? We need b^(0)(n - 1), which we already have at the lower arm, and we also need minus k_1 times e^(0)(n). Now e^(0)(n) we already have, so all we have to do is multiply it by minus k_1. Hence, this also needs to be multiplied by minus k_1, and at this end we get what we are going to call b^(1)(n). Thus, we have got e^(1)(n) and we have got b^(1)(n).
Now the structure is going to repeat. Just for completeness' sake, if we are drawing the flow graph in a very formal way, we should write the unity branch gains as 1, 1 and so on; you know that we do that in a flow graph. Now e^(1)(n) is available and b^(1)(n) is available, and this structure continues, which means there will be another delay z^(-1), and in the next stage there will again be a coefficient multiplication; but this time it will not be minus k_1, instead we will have minus k_2, so there will be a multiplication by minus k_2, and very similarly in the other branch there will also be a multiplication by minus k_2. In this process you get e^(2)(n) here, and there you get b^(2)(n).

Now we show dotted lines just to indicate that a very similar structure continues, and the last stage of the lattice will be: here e^(p-1)(n) and here b^(p-1)(n), then there will be a z^(-1), and after the z^(-1), so that the diagram does not appear clumsy, let me draw it clearly: this will be minus k_p, and this will be minus k_p over here, so that at the end you obtain e^(p)(n) and b^(p)(n). Here the filtering ends, ultimately giving e^(p)(n) and b^(p)(n).
Now, in other words, this is also giving you the coefficients. So essentially, what did we do? It is as if to say it is Durbin's algorithm only. It is an implementation of Durbin's algorithm; then what is new in it? We have utilized a lattice filter structure, but essentially it is Durbin's algorithm.

But what advantage does the lattice structure bring? You see, in the case of Durbin's algorithm, the recursive algorithm for the autocorrelation method, there we had the requirement that the autocorrelation values be precomputed, and only then would you go in for Durbin's approach of matrix solution. In this case the two have been integrated into one: you just do the filtering in this manner and you get the coefficients. So now the question is, which one is more efficient? Yes please; yes, what are those k_1 to k_p values; how do we get them?
What are the k_1 to k_p values? You see, to go back to Durbin's algorithm, what are those k_i's? The k_i's are the intermediate values, and ultimately it is these k_i's that give you the coefficients. Just refer to your earlier notes. Ultimately, you see, you are not interested in e^(p) and b^(p); you are interested in these coefficients only. The first iteration gives us k_1, the first estimate, and the last one gives us k_p. So this k_1 to k_p; let me go back to the old notes, this is from there only, I am just trying to take out the old slide. Anyway, this should be clear to you from when we were discussing the autocorrelation approach.

Remember, this is what we were doing: k_1, k_2 and so on, and ultimately alpha_1^(2), alpha_2^(2); these are those values only, so it is based on those.
So now we come to some amount of comparison between these methods. Among the different approaches that we have dealt with so far, there are three, so let us make a tabular comparison. Let us write down the covariance method here, the autocorrelation method here, and the lattice method here. The covariance method means Cholesky's method, the autocorrelation method is Durbin's, and the lattice method is also called the Burg method.

Now, essentially, this question must have come to your mind: what is the use of the lattice structure, since ultimately we are getting these e^(p)'s and b^(p)'s? The thing is that it is these errors; if we encode the errors, then in effect we are getting a realization of our LPC parameters.

In one of the coming lectures I will also tell you about the synthesis aspect, how the synthesis can be done using the LPC coefficients, and then this point will be clearer to you. We do the comparison of these three methods based on two considerations: one is storage and the other is computation. We will evaluate the three methodologies from these two aspects. As far as storage is concerned, let us first take the data storage requirements.
Now, we normally go in for an N-point analysis. Remember, in the covariance method we were truncating the series to N samples; in the autocorrelation method we also had N samples; and in the lattice method, too, there is a certain finite number of samples that we bring into the system. But because the methodologies differ, the number of points required may not be exactly the same in the three approaches, so data-wise let us say the covariance method has N_1 points, the autocorrelation method has N_2 points, and the lattice method has N_3 points; but in fact its data storage will be three times N_3.

Now, do you know why it is three times N_3? It is because we also need to have the errors stored: you have to store the data, you have to store the forward errors, and you have to store the backward errors. Next, the matrix storage: for the covariance method it is proportional to p^2/2, in the case of the autocorrelation method it is proportional to p, and no matrix storage is required in the lattice method, so there the question does not arise.
Now regarding the windowing storage: the covariance method uses 0, the autocorrelation method uses N_2, and the lattice method does not require any windowing. And for the windowing computation: there is no windowing involved in the covariance method, whereas in the autocorrelation method the windowing is indeed there with N_2 points, and in the lattice method there is no windowing concept.

Then regarding the correlation: the covariance method has a correlation computation proportional to N_1 times p, for the autocorrelation method it is N_2 times p, and there is no correlation computation in the lattice method. But for the matrix solution, that is, the matrix inversion: the covariance method is proportional to p^3, a very heavy computational burden, whereas the autocorrelation method is proportional to p^2, and the lattice method takes 5 N_3 times p.
In fact this follows from the algorithms; we are not going into the exact derivation, but this is just to give you an idea of the sort of storage and computational comparison one gets among the three methods: covariance, autocorrelation and lattice.

Yes please? Sir, in the lattice method, should there not be a correlation term? There are these k_i values, and these values of k_i depend on the correlations, so there must be computational complexity involved with respect to the correlation calculation, proportional to N times p? No, there is no correlation computation that is done. Then how will we get the value of k_i, because to calculate the k_i's we require correlation coefficients?

I will throw some more light on this aspect; I will explain it. In fact the correlation is not required, but how one can get an idea of these k_i's is an aspect I will talk about in the coming class, because we are already coming to the end of the time for the present class; so thank you for now.