7. Putting The Pieces Together

This session recapitulates materials on collections seen in the previous sessions in the context of a larger example. The task will be to design a program that converts telephone numbers to sentences. So here is the task. You know that phone keys have mnemonics assigned to them. If you look at your smartphone or other phone, you find that two gets associated with ABC, three with DEF, four with GHI, and so on. Assume you're giving a dictionary, which is a list of words that we call words for simplicity. What I want you to do is design a method translate, such that translate of a phone number would produce all phrases of words that can save as mnemonics for the phone number. So it's something like, one 1-800-CALLME, only for the words in the dictionary. So here's an example. The phone number that is given by the digit string, 7225247386,, should have the mnemonic, Scala is fun as one element of the set of solution phrases. Why? Because the digit seven has one of the letters associated with it, the S, 2wo has both C and A, so that gives SCA. The five has the L, so that, this gives SCAL, and so on. So you see the principle. That's actually an example that has been studied before, I've taken it from a paper by Lutz Prechelt, paper's called An Empirical Comparison of Seven Programming Languages. It appeared, it appeared at EEE Computer, number 33 in 2000. At the time the students were asked to solve this puzzle in a set of scripting languages and in a set of general purpose languages. And then., Various measures, were produced, like, how long it took them, how many bugs, how long the program was, how fast the program was, and so on. One fairly obvious outcome was that the scripting languages were shorter, about 100 lines of code for the solution, compared to two, 200 or 300 lines of code for the general purpose languages. There were a lot of other interesting findings in that paper, so if you come across it, I encourage you to read it. So let's see how we would solve this problem in Scala using a worksheet. I've already laid down some intermediate steps for you towards the final solution. The first thing we want to do is get a dictionary of useful words. We have, put up one for you and under the lamp.epfl.ch site, so here you find the full URL. So that, using that URL, we can use the Source object that we have imported from Scala IO to get an iterator that can, read in, one by one, all the characters in that, under that URL. And the first thing we are going to do with that is we are going to call get lines and that would now give you an iterator that gives you back, the strings that make up the individual words, one by one, Because in a dictionary, in fact, every word is in its own line. Good. The next thing we have given is the mnemonics map, Which is the same on every phone here. So it's just a map from character to string that contains the associations that you, that you know. So the first task to do, then, would be, we would need to invert them on mnemonics map. So now we wouldn't have a mapping from digit to character strings. We want to have a mapping from characters to digit, Call that map charCode. So the question is, how do we implement that map? Well, one thing we could do, of course, is write it down. We could say, A maps to two B maps to two and so on, For every letter in the alphabet. But that's a bit repetitive, and it violates the so-called DRY principle. Dry means don't repeat yourself, Because, in a sense, we have encoded the same information from this map twice. So in the unlikely case that the mnemonics bindings for phone would change some points in the future, we'd have to change two places in our program to reflect that instead of ideally, just one. So let's do something that's shorter. So what we want to do is we want to, in a sense, invert this map, now going from a map from the characters in the string here to the digits. So the way we can do that is with a for expression, so we let the four range of all the key value bindings in the map. So the key would be a digit, The value would be a string, And it comes from the map here. And then we still need to let the second generator range of all the possible letters in the string. So we say, letter taken from string and we yield. What do we yield in that case? Well, we yield the binding that goes from the letter to the digit. And what you see here is now a map that, indeed, inverts the mnemonic maps that we've seen here. So, equals to three is equals to nine, and so on. The maps are, as you know, unordered so theses letters, these bindings will appear in any order that's convenient for the implementation to be stored. Good. So that was step one. Step two is, I want to map not just a single character, but a whole word to the digit-string it can represent. So for instance, if Java should map to 5282, Because J maps to five, A Maps to two, and so on. So how can I achieve that? Well, that's a simple map. Right? I want to apply the same operation on every character in the world. So, what I would write here, I would say word, map, charCode. And that makes use of the fact that in fact maps like charCodes are functions themselves. We've seen that before. So we can use a map as the function of a map method. Okay. So we have wordCode. Let's, test it, Wordcode of Java. That give us indeed 5282, But that was all upper case. If you apply that to a lower case character, oops, we get a no such element exception keynote found. The key that wasn't found was the lower case n. Well, that's pretty understandable, because, after all, our map went only from uppercase characters to digits, not from lowercase characters. So the best way to correct it would be to simply convert the word to uppercase using Java method on string to uppercase. And that will do the trick, So now Java would map 25282 as we expect. Good. The next step, then, would be to go again the other way. So we know that Java maps to 5282. What I want now is a list of all the words that map to 5282. So I want to go the other way. I want to go from 5282. And I want to find all the words that map to it and it's the same, of course, for every digit string. So I will call this another map called wordsForNum And that then would be a map from, strings, Mainly digit strings, Two, a sequence of strings. So that is the sequence of all the possible words that I have on this list. So what do, how do I do that? Well, if you think about it, then really, what we want to do is a By.group We want to group lists of words that have the same code and we want to then, given the code, retrieve that list. So it's a,. It's simply the words, groupBy,. And the function B use as a discriminant function is word, wordCode. So here's a small glitch. The error message here says that the group by is not a member of the tight iterator string which is a type of words and thats true. The problem here is that words is an iterator which is the same as a Java Iterator and, it doesn't have a groupBy method. We haven't really covered iterators in this course, because it's an imperative concept. So the best way to fix the problem here is to just convert the iterator to a list. And we can do that with a two list method. So now the word is a list and if we print it, then we see first the output that we see the word list. So that looks like the prefix of a dictionary, until the output exceeds the cutoff limit. By the way, if you use the first version of the worksheet, then you will have gotten a, the same thing that the output exceeds the cutoff limit. But then you wouldn't get any further output, Because there was a problem that the cutoff limit was determined for the whole worksheet rather than for each command separately. If you see that problem, then you should update your worksheet and, that will take care of it. So let's see what the, worksheet gives us, Otherwise, we see here a no such element exception that a key was not found and the key not found was a minus. Let's try to track that down. What we see here, We see, we see a stack of methods that all came from the collection hierarchies. And then the first method that we wrote was wordCode itself. So it seems that invert wordCode, we, passed a key to charCode that wasn't found and the key, so the key here is, was a minus. Charcode,, of course, takes care only of uppercase letters whereas minus is not a letter at all. So it seems, The diagnosis is that our dictionary contains words that contain an embedded hyphen and our code here can't deal with those. So I propose the best way to deal with that is simply to drop words containing hyphens or other non-letter characters from our word list, So let's try to do that. So, we say we have a word list that we want to retain all the words that only consist of letters. So retain would be a filter and we would then say word forall. So all characters for the word, , are a letter. And if we do that, that then we still get the same dictionary, but scrolling down actually wordsForNum now gives us a true map that has as a prefix the things that you see here. Okay, great. So the almost the final thing to do now is to write function in code. And function in code would return always to encode a number, given it's a digit string as a list of words, So it would give us a set of list of strings. That's the words in each phrase and the set contains all the possible solutions. So, so far, all our, implementations of methods were some simple application of a method that's defined in the, in the Scala collection libraries. With encode, we're not so lucky, So, there's nothing directly that, represents encode, so we have to work on it ourselves. A good strategy here is if you're given, data, like a string, or a list or something to always get, take care of the boundary cases first. So the boundary case here would be, what will you do if the number is empty? What should you return then? The empty set. Well, the answer is, no not really because the, The empty number does have a solution and it's the empty phrase. The empty phrase represents the empty number and the empty phrase would be just represented as an empty list. So I argue that should be the right result for empty numbers. Okay. So if the number is not empty, what do we do? Well, the new thing here is, we're dealing with phrases instead of words. So we need to determine, essentially, how many characters we take from this number to form the first word. And here we have a choice, that number could range from one to the length of the number itself. So we could say for split taken from one, two number of length. So that's where we want to split our number. And up to the characters, up to the split would form the first word and the rest would form the other words. So lets see how we go, on then. So once we have a split, we can find out what the first word should be. So that, that word would come in wordsForNum off the number with the first split characters taken. So, we take first split characters from the number, we apply wordsForNum, so that gives us a sequence of all the strengths that have this number and to let word range over that. Good. Almost done. The other thing we need to do is to say, Well, once we have taken split characters of the number, we need to treat the rest of the number. So, what we need to do is we need to do something with the number twelve split, That's the rest of the number. And what do we need to do? Well, we apply encode to it. And so, once, once we have that, we can compose our solution, The solution would consist of the first word followed by the rest. So we get an error here. The error message says that there is type error. It found an index sequence, and it required a set. Index sequence, you remember, We got that because we started with a range. And of course, the result can't be represented as a range. So the Scala type inference will widen the type and the next available one is index sequence, which has vector as its primary implementation. So that's, of course, incompatible with a set, So we have to convert the whole thing to a set and, one way to do that. We could do that, at the end, Simply write two set for the result here. And that would give us a valid in code, method. So, let's test in code. I have prepared a test case here with a digit string. Let's see what that gives. We get a no such element exception key not found, seven. So it seems we're not done yet. What happened here. Well, if you look at where the t wasn't found, it was in the encode function itself. So the map that we. Apply here is wordsForNum.. So, obviously, we have a key where that is not contained in the words for num map. That was, the key was the string seven. And that becomes clear if you look at the mnemonics map. So seven maps to P, Q, R, S and indeed, there's no single letter word that is consists of any of these four letters. So, how can we fix that? Well, You have seen in a previous session that one way to deal with these keynote found errors is to make your maps total, and we can do that, for wordsForNum,, we can make the map total by adding a default, so we'll say, with default value. And what should be the default value B? Well, we have a map to a sequence of all possible solutions. So, I guess the empty sequence is a good default value. So, essentially, if a key is not in the map, then we say, well, it doesn't have a word that is associated with that digit string. So, if you do that, then, indeed, we do get a result and we get a number of phrases. The one we are after is, this one here. Scala is fun is one of the possible solutions. So, that's what we wanted here. You can play with other possible numbers and see what they give. Good. So, one more thing to do., We got the solutions as a set of lists. But what I really would like to do is present them as complete phrases. So, what I want to do is I want to, write a method, Translate that gets a number and returns a set of, not list of strings, but strings where the strings should be the whole phrase. So how do we do that? Well, we start with encode number and then for each solution, which is a list of strings, we have to present the solution as a phrase. So for each means we have a map and the operation we want to apply is a mkString operation, where we separate the words in the list with spaces. So let's see what that gives. If you now do translate of the digit string here. Then, indeed, we get all the solutions as phrases, and one of them was the one we're after, So that's it. I hope you have gotten the impression how much fun it is to write programs in interactive way, using immutable data structures and functions only. In fact, I think that the immutable collections that you'll see in Scala and a couple of other languages are, something that is, will play an increasingly important role for the future of programming in general. They're just too good not to happen.. They're very easy to use, Because, generally, a few steps are sufficient to do the job. They're very concise, typically, one word replaces a whole loop. That's safe. So the type checker you've seen now, is really good at catching the errors. Typically, errors we had when we when we wrote the program where keynote found errors, That's true that's one thing type checker doesn't check, But a lot of the other things annoying errors already caught during compile time. You don't need to test your program to find them. The immutable collections are also pretty fast; because the collection operations are tuned to the library and it turns out they can also be parallelized. And then finally they're universal. That's one vocabulary that works on all kinds of collections from lists, to vectors, to ranges, to sequences, to sets, to maps. But also to further kind of collections like parallel collections or even vestibule collections that you would typically attack with map [inaudible]. I believe that makes it a really attractive tool for software development.