4 5 04 05 Implementing Finite Automata 11m56s

Welcome back, In this video we're going to wrap up our presentations on lexical analysis with the discussion of how we implement Finite Automata, using a variety of different techniques. Just to review, here's our little flow chart of how lexical analyzers are constructed. And today we're going to be focusing on this last step. The implementation of DFA's and actually I should say that this chart is not quite completely accurate because sometimes we don't go all the way to DFA's. We stop with NFA's and we implement them directly and so we'll talk a little bit about that. What if we didn't want to build a DFA and instead wanted to base our lexical analyzers directly on an NFA. So, beginning now with DFA's, It's very simple to implement a deterministic finite automaton as an array. There's dimensional array and one of the dimensions will be the state. So we might have states here and the other dimension will be the input symbols. And so I might have a state i and then input A and I simply look up in that position and there will be the next state k to which we're going to move. So the table stores at every particular input symbol and state, the next state that the machine will move to. So, let's do an example of converting a deterministic automaton into a table driven implementation so here is the automaton that we built last time and recall that several videos ago. We started with a regular expression which we convert into a non-deterministic finite automaton and then we converted it to a deterministic automaton just in the last video. And here it is again and now let's talk about how to realize it as a table. So draw 2-dimensional table and there are three states so we will need three rows. And just label these rows S, T, and U and then there are two inputs, zero and one and now let's fill in the entries in the table. So, in state S on input zero, where do we go? We go to state T. So, the entry in the S0 entry will be T. And some really from state S input one will wind up in state U. So on so the S1 entry will be U, okay? And then certainly for the other the other rows of the table let's do the T row next on one we go to U and on zero we stay in T. So, this, this row is also TU. And finally for U, what happens well, on zero we go to T and on one we stay in U so this row is also TU and there's our table. That describes the transition relation of the automaton. And now if we would think about how we would use this transition relation in a program, you can easily imagine. We would start out say with our input index. Porting to the beginning of the input and let's call that zero and we can have to have a current state and we start at the start state, let's just say that that's row zero so in this case that would be row S. And then while what we wanted to do, we wanted to walk over the input. And check whether, and checking on it, you know and make the transitions for each element of the input one at a time and we want to stop when the input is empty. So while there is still as input, let's say we have an array of characters that is our, that is our input and while the entry in that array is not null, let's do the following. We're gonna update the state. At each step and what are we gonna update it to let's give this array a name. Let's call this array A. So, we're gonna look up in our transition relation A and what are we going to use? Well, we're gonna look up the current state, And we're going to look up the input. And in that entry I think, you know? Using the, the current state and the current input, we're gonna transition to a new state and we also wanted to increment the input pointer. So we'll do that at the same time. And there is our loop that processes an input according to the transition table A. And as you can see this is a very compact, very efficient process. Just really, just a little bit of index arithmetic and one two table look ups, one for the input and one for the state transition table per character of input. So that's really hard to imagine having a process that's much faster or more compact than this. Now that was one strategy for implementing a deterministic finite automaton and you may have noticed that one disadvantage of that particular approach was that there were a lot of duplicate rows in the table. In fact all the rows of the table were exactly the same and we could save some space by using a slightly different representation. So instead of having a 2-dimensional table, we can just have a 1-dimensional table and this table again. Would be one entry for each state so S, T, and U. And what this table would contain is a pointer to a vector of moves for that particular state. So, there will be a pointer here and it would point to another. Table, another one dimensional table that would say what we should do zero and what we should do on one. So in the case of S, we wanted to move to state T if it was a zero and to state U if it was a one. And now when we go to fill in the entry for T, we see we don't need to duplicate this row. We can actually just share this row and similarly for U. And so this table, this representation is actually much more compact which just share the rows that are duplicated in the automaton. And it turns out that in the kinds of automata that we look at for lexical analysis it's very, very common to have duplicated rows. And so this can actually resolve the significant compaction of the table particularly when you consider a number of possible states. Remember there could be. To the N - one states in a DFA. For an NFA with end states. And while the blow up is often not the worst case it can be very substantial. So the two dimensional table we had on the, in the previous slide could actually be quite, quite large and we keep, we can sometimes have a much more compact representation by little tricks like this. Now in this advantage of this particular representation is this extra interaction, right here. I mean these pointers actually take time to the reference and so now in our loop will be a little bit slower. We, we, we have to do table look up the reference. Pointer do another tab le look up and then we can make the move. Finally, it's also possible that we might not want to convert to a DFA at all. It might be that the particular specification we gave is very expensive to turn into a DFA. The table has just become truly huge and we might be better off just using the NFA directly. And so, one can imagine an implementation of an NFA as well. We can also implement that via a table. In this case, we would have to have a row for each state in the NFA. And we won't do them all here. But we could have Rows for every state of the NFA and then, you know where we're going to go if the input is zero or if the input is one. And so in this case, And I almost forgot we would also need a transition in the most naive or the most straight forward implementation of, where we would go if, if an epsilon. And, and now, remember because these are NFAs in general, this will be a set of states because we might have more than one epsilon transition or more than one transition on zero and one. And so, in this case an epsilon A can go to B. So this would be the set of states B and B can go To C or D. And C can only go to E and on one, alright. And D can go to F on and input of zero and so on. We fill in the rest of the table and this table is guaranteed to be relatively small because the number of states is limited by the number of states in the NFA and the size of the input alphabet. Once again, we could do a sharing of common rows and, and other tricks like that to compress the table if we like. But now, the inner loop for simulating this automaton is gonna be much more expensive because we have to deal with sets of states rather than single states. So in every point in time where we can be tracking a set of states and when we're going to do a move, we have to look up for every state in the set where it can potentially go including things like the epsilon moves and carry out all possible epsilon moves so we always have an accurate assessment of the complete set of states if the NFA could reach. So while this sav es a lot of space in terms of the tables, in terms of the size of the tables it can be much slower to execute than deterministic automaton. Summarize, a key idea in the implementation of lexical specifications is the conversion of nondeterministic finite automaton to deterministic finite automaton. This is what takes a general high level specification use of regular expressions and confer to them to something as completely deterministic and only uses a few operations per input character. Now, in practice, tools provide tradeoffs between speed and space. So, so DFA's are faster And less compact so the tables can be very large and, and at times that's a practical problem and NFAs are slower to, to implement but more concise. And the tools give you generally a series of options often in the form of configuration files or command lines which is, that allow you to, to choose whether you want to be closer to a full DFA, something that's faster and perhaps bigger or to a pure NFAs, something that's slower but consumes less space.