Tip:
Highlight text to annotate it
X
This session recapitulates materials on collections seen in the previous sessions
in the context of a larger example. The task will be to design a program that
converts telephone numbers to sentences. So here is the task.
You know that phone keys have mnemonics assigned to them.
If you look at your smartphone or other phone, you find that two gets associated
with ABC, three with DEF, four with GHI, and so on.
Assume you're giving a dictionary, which is a list of words that we call words for
simplicity. What I want you to do is design a method
translate, such that translate of a phone number would produce all phrases of words
that can save as mnemonics for the phone number.
So it's something like, one 1-800-CALLME, only for the words in the dictionary.
So here's an example. The phone number that is given by the
digit string, 7225247386,, should have the mnemonic, Scala is fun as one element of
the set of solution phrases. Why?
Because the digit seven has one of the letters associated with it, the S, 2wo has
both C and A, so that gives SCA. The five has the L, so that, this gives
SCAL, and so on. So you see the principle.
That's actually an example that has been studied before, I've taken it from a paper
by Lutz Prechelt, paper's called An Empirical Comparison of Seven Programming
Languages. It appeared, it appeared at EEE Computer,
number 33 in 2000. At the time the students were asked to solve this puzzle
in a set of scripting languages and in a set of general purpose languages.
And then., Various measures, were produced, like, how
long it took them, how many bugs, how long the program was, how fast the program was,
and so on. One fairly obvious outcome was that the
scripting languages were shorter, about 100 lines of code for the solution,
compared to two, 200 or 300 lines of code for the general purpose languages.
There were a lot of other interesting findings in that paper, so if you come
across it, I encourage you to read it. So let's see how we would solve this
problem in Scala using a worksheet. I've already laid down some intermediate
steps for you towards the final solution. The first thing we want to do is get a
dictionary of useful words. We have, put up one for you and under the
lamp.epfl.ch site, so here you find the full URL.
So that, using that URL, we can use the Source object that we have imported from
Scala IO to get an iterator that can, read in, one by one, all the characters in
that, under that URL. And the first thing we are going to do
with that is we are going to call get lines and that would now give you an
iterator that gives you back, the strings that make up the individual words, one by
one, Because in a dictionary, in fact, every
word is in its own line. Good.
The next thing we have given is the mnemonics map,
Which is the same on every phone here. So it's just a map from character to
string that contains the associations that you, that you know.
So the first task to do, then, would be, we would need to invert them on mnemonics
map. So now we wouldn't have a mapping from
digit to character strings. We want to have a mapping from characters
to digit, Call that map charCode. So the question
is, how do we implement that map? Well, one thing we could do, of course, is
write it down. We could say, A maps to two B maps to two
and so on, For every letter in the alphabet.
But that's a bit repetitive, and it violates the so-called DRY principle.
Dry means don't repeat yourself, Because, in a sense, we have encoded the
same information from this map twice. So in the unlikely case that the mnemonics
bindings for phone would change some points in the future, we'd have to change
two places in our program to reflect that instead of ideally, just one.
So let's do something that's shorter. So what we want to do is we want to, in a
sense, invert this map, now going from a map from the characters in the string here
to the digits. So the way we can do that is with a for
expression, so we let the four range of all the key value bindings in the map.
So the key would be a digit, The value would be a string,
And it comes from the map here. And then we still need to let the second
generator range of all the possible letters in the string.
So we say, letter taken from string and we yield.
What do we yield in that case? Well, we yield the binding that goes from
the letter to the digit. And what you see here is now a map that,
indeed, inverts the mnemonic maps that we've seen here.
So, equals to three is equals to nine, and so on.
The maps are, as you know, unordered so theses letters, these bindings will appear
in any order that's convenient for the implementation to be stored.
Good. So that was step one.
Step two is, I want to map not just a single character, but a whole word to the
digit-string it can represent. So for instance, if Java should map to
5282, Because J maps to five, A
Maps to two, and so on. So how can I achieve that?
Well, that's a simple map. Right?
I want to apply the same operation on every character in the world.
So, what I would write here, I would say word, map, charCode.
And that makes use of the fact that in fact maps like charCodes are functions
themselves. We've seen that before.
So we can use a map as the function of a map method.
Okay. So we have wordCode.
Let's, test it, Wordcode of Java.
That give us indeed 5282, But that was all upper case.
If you apply that to a lower case character, oops, we get a no such element
exception keynote found. The key that wasn't found was the lower
case n. Well, that's pretty understandable,
because, after all, our map went only from uppercase characters to digits, not from
lowercase characters. So the best way to correct it would be to
simply convert the word to uppercase using Java method on string to uppercase.
And that will do the trick, So now Java would map 25282 as we expect.
Good. The next step, then, would be to go again
the other way. So we know that Java maps to 5282.
What I want now is a list of all the words that map to 5282.
So I want to go the other way. I want to go from 5282.
And I want to find all the words that map to it and it's the same, of course, for
every digit string. So I will call this another map called
wordsForNum And that then would be a map from,
strings, Mainly digit strings,
Two, a sequence of strings. So that is the sequence of all the
possible words that I have on this list. So what do, how do I do that?
Well, if you think about it, then really, what we want to do is a By.group We want
to group lists of words that have the same code and we want to then, given the code,
retrieve that list. So it's a,.
It's simply the words, groupBy,. And the function B use as a discriminant
function is word, wordCode. So here's a small glitch.
The error message here says that the group by is not a member of the tight iterator
string which is a type of words and thats true.
The problem here is that words is an iterator which is the same as a Java
Iterator and, it doesn't have a groupBy method.
We haven't really covered iterators in this course, because it's an imperative
concept. So the best way to fix the problem here is
to just convert the iterator to a list. And we can do that with a two list method.
So now the word is a list and if we print it, then we see first the output that we
see the word list. So that looks like the prefix of a
dictionary, until the output exceeds the cutoff limit.
By the way, if you use the first version of the worksheet, then you will have
gotten a, the same thing that the output exceeds the cutoff limit.
But then you wouldn't get any further output,
Because there was a problem that the cutoff limit was determined for the whole
worksheet rather than for each command separately.
If you see that problem, then you should update your worksheet and, that will take
care of it. So let's see what the, worksheet gives us,
Otherwise, we see here a no such element exception that a key was not found and the
key not found was a minus. Let's try to track that down.
What we see here, We see, we see a stack of methods that all
came from the collection hierarchies. And then the first method that we wrote
was wordCode itself. So it seems that invert wordCode, we,
passed a key to charCode that wasn't found and the key, so the key here is, was a
minus. Charcode,, of course, takes care only of
uppercase letters whereas minus is not a letter at all.
So it seems, The diagnosis is that our dictionary
contains words that contain an embedded hyphen and our code here can't deal with
those. So I propose the best way to deal with
that is simply to drop words containing hyphens or other non-letter characters
from our word list, So let's try to do that.
So, we say we have a word list that we want to retain all the words that only
consist of letters. So retain would be a filter and we would
then say word forall. So all characters for the word, , are a
letter. And if we do that, that then we still get
the same dictionary, but scrolling down actually wordsForNum now gives us a true
map that has as a prefix the things that you see here.
Okay, great. So the almost the final thing to do now is
to write function in code. And function in code would return always
to encode a number, given it's a digit string as a list of words,
So it would give us a set of list of strings.
That's the words in each phrase and the set contains all the possible solutions.
So, so far, all our, implementations of methods were some simple application of a
method that's defined in the, in the Scala collection libraries.
With encode, we're not so lucky, So, there's nothing directly that,
represents encode, so we have to work on it ourselves.
A good strategy here is if you're given, data, like a string, or a list or
something to always get, take care of the boundary cases first.
So the boundary case here would be, what will you do if the number is empty?
What should you return then? The empty set.
Well, the answer is, no not really because the,
The empty number does have a solution and it's the empty phrase.
The empty phrase represents the empty number and the empty phrase would be just
represented as an empty list. So I argue that should be the right result
for empty numbers. Okay.
So if the number is not empty, what do we do?
Well, the new thing here is, we're dealing with phrases instead of words.
So we need to determine, essentially, how many characters we take from this number
to form the first word. And here we have a choice, that number
could range from one to the length of the number itself. So we could say for split
taken from one, two number of length. So that's where we want to split our
number. And up to the characters, up to the split
would form the first word and the rest would form the other words.
So lets see how we go, on then. So once we have a split, we can find out
what the first word should be. So that, that word would come in
wordsForNum off the number with the first split characters taken.
So, we take first split characters from the number, we apply wordsForNum, so that
gives us a sequence of all the strengths that have this number and to let word
range over that. Good.
Almost done. The other thing we need to do is to say,
Well, once we have taken split characters of the number, we need to treat the rest
of the number. So, what we need to do is we need to do
something with the number twelve split, That's the rest of the number.
And what do we need to do? Well, we apply encode to it.
And so, once, once we have that, we can compose our solution,
The solution would consist of the first word followed by the rest.
So we get an error here. The error message says that there is type
error. It found an index sequence, and it
required a set. Index sequence, you remember,
We got that because we started with a range.
And of course, the result can't be represented as a range.
So the Scala type inference will widen the type and the next available one is index
sequence, which has vector as its primary implementation.
So that's, of course, incompatible with a set,
So we have to convert the whole thing to a set and, one way to do that.
We could do that, at the end, Simply write two set for the result here.
And that would give us a valid in code, method.
So, let's test in code. I have prepared a test case here with a
digit string. Let's see what that gives.
We get a no such element exception key not found, seven.
So it seems we're not done yet. What happened here.
Well, if you look at where the t wasn't found, it was in the encode function
itself. So the map that we.
Apply here is wordsForNum.. So, obviously, we have a key where that is
not contained in the words for num map. That was, the key was the string seven.
And that becomes clear if you look at the mnemonics map.
So seven maps to P, Q, R, S and indeed, there's no single letter word that is
consists of any of these four letters. So, how can we fix that?
Well, You have seen in a previous session that
one way to deal with these keynote found errors is to make your maps total, and we
can do that, for wordsForNum,, we can make the map total by adding a default, so
we'll say, with default value. And what should be the default value B?
Well, we have a map to a sequence of all possible solutions.
So, I guess the empty sequence is a good default value.
So, essentially, if a key is not in the map, then we say, well, it doesn't have a
word that is associated with that digit string.
So, if you do that, then, indeed, we do get a result and we get a number of
phrases. The one we are after is, this one here.
Scala is fun is one of the possible solutions.
So, that's what we wanted here. You can play with other possible numbers
and see what they give. Good.
So, one more thing to do., We got the solutions as a set of lists.
But what I really would like to do is present them as complete phrases.
So, what I want to do is I want to, write a method,
Translate that gets a number and returns a set of, not list of strings, but strings
where the strings should be the whole phrase.
So how do we do that? Well, we start with encode number and then
for each solution, which is a list of strings, we have to present the solution
as a phrase. So for each means we have a map and the
operation we want to apply is a mkString operation, where we separate the words in
the list with spaces. So let's see what that gives.
If you now do translate of the digit string here.
Then, indeed, we get all the solutions as phrases, and one of them was the one we're
after, So that's it.
I hope you have gotten the impression how much fun it is to write programs in
interactive way, using immutable data structures and functions only.
In fact, I think that the immutable collections that you'll see in Scala and a
couple of other languages are, something that is, will play an increasingly
important role for the future of programming in general.
They're just too good not to happen.. They're very easy to use,
Because, generally, a few steps are sufficient to do the job.
They're very concise, typically, one word replaces a whole loop.
That's safe. So the type checker you've seen now, is
really good at catching the errors. Typically, errors we had when we when we
wrote the program where keynote found errors,
That's true that's one thing type checker doesn't check,
But a lot of the other things annoying errors already caught during compile time.
You don't need to test your program to find them.
The immutable collections are also pretty fast; because the collection operations
are tuned to the library and it turns out they can also be parallelized.
And then finally they're universal. That's one vocabulary that works on all
kinds of collections from lists, to vectors, to ranges, to sequences, to sets,
to maps. But also to further kind of collections
like parallel collections or even vestibule collections that you would
typically attack with map [inaudible]. I believe that makes it a really
attractive tool for software development.