Another question on the forums asked about how long it takes to write down a bunch of
regular expression rules, or token definitions, for a language.
The basic question was something like: this seems to go on forever!
I can imagine having to write down so many token definition rules
that I would expire of boredom before I ever finished.
Is that what happens in the real world? What's it like?
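To make "token definition" concrete, here is a minimal sketch in Python of what a handful of such rules can look like: each rule pairs a token name with a regular expression, and a small loop uses them to split text into tokens. The token names, the tiny rule set, and the `tokenize` helper are all hypothetical illustrations, not the definitions from any of the projects mentioned below; a real lexer for C or Java needs many more rules than this.

```python
import re

# Hypothetical token rules: (token name, regular expression).
# Order matters here: "==" is listed before "=" so the longer operator wins.
TOKEN_RULES = [
    ("NUMBER",     r"[0-9]+(?:\.[0-9]*)?"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("EQUALS",     r"=="),
    ("ASSIGN",     r"="),
    ("PLUS",       r"\+"),
    ("LPAREN",     r"\("),
    ("RPAREN",     r"\)"),
    ("SEMICOLON",  r";"),
    ("WHITESPACE", r"[ \t\n]+"),
]

# Combine all rules into one pattern with a named group per token.
# Python's re tries alternatives left to right, so list order resolves overlaps.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(text):
    """Return a list of (token_name, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        name = match.lastgroup  # name of the rule that matched
        if name != "WHITESPACE":
            tokens.append((name, match.group()))
        pos = match.end()
    return tokens
```

For example, `tokenize("x = 3 + 42;")` returns `[('IDENTIFIER', 'x'), ('ASSIGN', '='), ('NUMBER', '3'), ('PLUS', '+'), ('NUMBER', '42'), ('SEMICOLON', ';')]`. Real lexer generators have their own conventions for resolving overlapping rules; in this sketch it is simply list order.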
My answer is that, to some degree, there's no silver bullet.
There's no single easy way to encode structured information
to bring order from chaos.
But what I can tell you is that it is a totally surmountable task for real-world languages.
I've worked on a lexer and a parser, a front end for the C programming language.
To handle C, which has a lot of gory details,
our list of token definitions was 600 lines long.
That might seem like a lot if you're writing it out all at once,
but in the grand scheme of things, a large program like the Firefox web browser
is millions of lines long.
So this list of regular expressions, this list of token definitions,
is actually a minuscule part of the entire software engineering effort.
I was also involved in the creation of a lexer and parser, an interpreter for Java
(Java 1.1 at the time), and our token definition file was 200 lines long.
Java was more regular than C in that regard and didn't require the lexer hack,
which you can ask me about at some later point.
200 lines is even more reasonable.
Finally, I think in one of the videos I mentioned a particular Tetris game
that I had the pleasure of working on. It had a piece definition language
that let me use the planar pentominoes instead of the normal four-square Tetris pieces.
There the piece definition reader was more like 90 lines long,
which seems even smaller and more tractable.
But the way you really want to think about this is by analogy.
Is it a lot of work to build a road? Is it a lot of work to build a sewer system?
Is it a lot of work to paint a beautiful picture?
Yes, but you do the work once, and then you amortize the cost over everyone
who gets the chance to enjoy that construction.
Yes, it takes a long time to pave a road, but after that, many people can drive over it.
Language definitions like C, Java, C#, or JavaScript
often don't change very quickly, if they change at all.
Once you have taken the time to write down all of your token definitions for JavaScript,
even if it is a few hundred lines long,
you look at it. You write some test cases. You feel good about it.
You do a code walkthrough with someone else, and then you're done,
and you can just build upon that, take advantage of it for the rest of your
software development career.
So it can be a bit arduous to write down a bunch of token definitions,
especially since a lot of them seem the same.
But you do it once, and then it's over.