Tip:
Highlight text to annotate it
X
Welcome back, In this video we're going to wrap up our presentations on lexical
analysis with the discussion of how we implement Finite Automata, using a variety
of different techniques. Just to review, here's our little flow chart of how
lexical analyzers are constructed. And today we're going to be focusing on this
last step. The implementation of DFA's and actually I should say that this chart is
not quite completely accurate because sometimes we don't go all the way to
DFA's. We stop with NFA's and we implement them directly and so we'll talk a little
bit about that. What if we didn't want to build a DFA and instead wanted to base our
lexical analyzers directly on an NFA. So, beginning now with DFA's, It's very simple
to implement a deterministic finite automaton as an array. There's dimensional
array and one of the dimensions will be the state. So we might have states here
and the other dimension will be the input symbols. And so I might have a state i and
then input A and I simply look up in that position and there will be the next state
k to which we're going to move. So the table stores at every particular input
symbol and state, the next state that the machine will move to. So, let's do an
example of converting a deterministic automaton into a table driven
implementation so here is the automaton that we built last time and recall that
several videos ago. We started with a regular expression which we convert into a
non-deterministic finite automaton and then we converted it to a deterministic
automaton just in the last video. And here it is again and now let's talk about how
to realize it as a table. So draw 2-dimensional table and there are three
states so we will need three rows. And just label these rows S, T, and U and then
there are two inputs, zero and one and now let's fill in the entries in the table.
So, in state S on input zero, where do we go? We go to state T. So, the entry in the
S0 entry will be T. And some really from state S input one will wind up in state U.
So on so the S1 entry will be U, okay? And then certainly for the other the other
rows of the table let's do the T row next on one we go to U and on zero we stay in
T. So, this, this row is also TU. And finally for U, what happens well, on zero
we go to T and on one we stay in U so this row is also TU and there's our table. That
describes the transition relation of the automaton. And now if we would think about
how we would use this transition relation in a program, you can easily imagine. We
would start out say with our input index. Porting to the beginning of the input and
let's call that zero and we can have to have a current state and we start at the
start state, let's just say that that's row zero so in this case that would be row
S. And then while what we wanted to do, we wanted to walk over the input. And check
whether, and checking on it, you know and make the transitions for each element of
the input one at a time and we want to stop when the input is empty. So while
there is still as input, let's say we have an array of characters that is our, that
is our input and while the entry in that array is not null, let's do the following.
We're gonna update the state. At each step and what are we gonna update it to let's
give this array a name. Let's call this array A. So, we're gonna look up in our
transition relation A and what are we going to use? Well, we're gonna look up
the current state, And we're going to look up the input. And in that entry I think,
you know? Using the, the current state and the current input, we're gonna transition
to a new state and we also wanted to increment the input pointer. So we'll do
that at the same time. And there is our loop that processes an input according to
the transition table A. And as you can see this is a very compact, very efficient
process. Just really, just a little bit of index arithmetic and one two table look
ups, one for the input and one for the state transition table per character of
input. So that's really hard to imagine having a process that's much faster or
more compact than this. Now that was one strategy for implementing a deterministic
finite automaton and you may have noticed that one disadvantage of that particular
approach was that there were a lot of duplicate rows in the table. In fact all
the rows of the table were exactly the same and we could save some space by using
a slightly different representation. So instead of having a 2-dimensional table,
we can just have a 1-dimensional table and this table again. Would be one entry for
each state so S, T, and U. And what this table would contain is a pointer to a
vector of moves for that particular state. So, there will be a pointer here and it
would point to another. Table, another one dimensional table that would say what we
should do zero and what we should do on one. So in the case of S, we wanted to
move to state T if it was a zero and to state U if it was a one. And now when we
go to fill in the entry for T, we see we don't need to duplicate this row. We can
actually just share this row and similarly for U. And so this table, this
representation is actually much more compact which just share the rows that are
duplicated in the automaton. And it turns out that in the kinds of automata that we
look at for lexical analysis it's very, very common to have duplicated rows. And
so this can actually resolve the significant compaction of the table
particularly when you consider a number of possible states. Remember there could be.
To the N - one states in a DFA. For an NFA with end states. And while the blow up is
often not the worst case it can be very substantial. So the two dimensional table
we had on the, in the previous slide could actually be quite, quite large and we
keep, we can sometimes have a much more compact representation by little tricks
like this. Now in this advantage of this particular representation is this extra
interaction, right here. I mean these pointers actually take time to the
reference and so now in our loop will be a little bit slower. We, we, we have to do
table look up the reference. Pointer do another tab le look up and then we can
make the move. Finally, it's also possible that we might not want to convert to a DFA
at all. It might be that the particular specification we gave is very expensive to
turn into a DFA. The table has just become truly huge and we might be better off just
using the NFA directly. And so, one can imagine an implementation of an NFA as
well. We can also implement that via a table. In this case, we would have to have
a row for each state in the NFA. And we won't do them all here. But we could have
Rows for every state of the NFA and then, you know where we're going to go if the
input is zero or if the input is one. And so in this case, And I almost forgot we
would also need a transition in the most naive or the most straight forward
implementation of, where we would go if, if an epsilon. And, and now, remember
because these are NFAs in general, this will be a set of states because we might
have more than one epsilon transition or more than one transition on zero and one.
And so, in this case an epsilon A can go to B. So this would be the set of states B
and B can go To C or D. And C can only go to E and on one, alright. And D can go to
F on and input of zero and so on. We fill in the rest of the table and this table is
guaranteed to be relatively small because the number of states is limited by the
number of states in the NFA and the size of the input alphabet. Once again, we
could do a sharing of common rows and, and other tricks like that to compress the
table if we like. But now, the inner loop for simulating this automaton is gonna be
much more expensive because we have to deal with sets of states rather than
single states. So in every point in time where we can be tracking a set of states
and when we're going to do a move, we have to look up for every state in the set
where it can potentially go including things like the epsilon moves and carry
out all possible epsilon moves so we always have an accurate assessment of the
complete set of states if the NFA could reach. So while this sav es a lot of space
in terms of the tables, in terms of the size of the tables it can be much slower
to execute than deterministic automaton. Summarize, a key idea in the
implementation of lexical specifications is the conversion of nondeterministic
finite automaton to deterministic finite automaton. This is what takes a general
high level specification use of regular expressions and confer to them to
something as completely deterministic and only uses a few operations per input
character. Now, in practice, tools provide tradeoffs between speed and space. So, so
DFA's are faster And less compact so the tables can be very large and, and at times
that's a practical problem and NFAs are slower to, to implement but more concise.
And the tools give you generally a series of options often in the form of
configuration files or command lines which is, that allow you to, to choose whether
you want to be closer to a full DFA, something that's faster and perhaps bigger
or to a pure NFAs, something that's slower but consumes less space.