12.2 - Optimal substructure - Optimal binary search trees - [dsa 2] - By Tim Roughgarden

So now that we have motivated and formally defined the optimal binary search tree problem, let's think about how to solve it. After settling on dynamic programming as the paradigm we're going to try to use, we're going to proceed in the usual way: turning to the optimal solution for clues, asking in what way it is composed of optimal solutions to smaller subproblems. So let me just remind you of the formal problem statement. There are n objects we've got to store in a search tree. Let's name them in the order of their keys, so one, two, three, all the way up to n, for simplicity. We're also given frequencies or weights reflecting how often the different objects are searched for. So that's p1 up to pn, positive numbers. Canonically we think of these as summing to one, being probabilities, but we actually won't use that fact, so in general they're just arbitrary positive numbers. The goal is to output a search tree. It should satisfy the search tree property, it should contain all of these objects one through n, and amongst all such search trees it should minimize the weighted search time. That's the sum over all of the items i of the probability of i times the search time for i, namely its depth in the tree plus one. Now, in case you're feeling cocky about the fact that a greedy algorithm, in the form of Huffman's algorithm, works to solve the seemingly similar optimal prefix-free binary code problem, I want to spend a little time pointing out that greedy algorithms are not sufficient, are not correct, for solving the optimal search tree problem. So if we were to design a greedy algorithm, what kind of intuition would motivate a particular greedy rule? Well, staring at the objective function, it's very clear that we want the objects that have a high frequency of access to be at or near the root, and we want items with a low frequency of access to be in the bottom levels of the tree, like the leaves.
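To make the objective concrete, here is a minimal Python sketch of the weighted search time. The nested-tuple tree encoding `(key, left, right)`, the function name `weighted_search_time`, and the three-key instance with its weights are all assumptions made for illustration; none of them come from the lecture.

```python
def weighted_search_time(tree, weight, depth=0):
    """Sum over all keys of weight[key] * (depth of key + 1).

    `tree` is None (empty) or a tuple (key, left_subtree, right_subtree).
    This encoding is an illustrative assumption, not the lecture's notation.
    """
    if tree is None:
        return 0
    key, left, right = tree
    return (weight[key] * (depth + 1)
            + weighted_search_time(left, weight, depth + 1)
            + weighted_search_time(right, weight, depth + 1))


# Hypothetical instance: the weights are arbitrary positive numbers
# (here integer percentages, so the arithmetic is exact).
weight = {1: 20, 2: 50, 3: 30}
balanced = (2, (1, None, None), (3, None, None))
chain = (1, None, (2, None, (3, None, None)))

print(weighted_search_time(balanced, weight))  # 50*1 + 20*2 + 30*2 = 150
print(weighted_search_time(chain, weight))     # 20*1 + 50*2 + 30*3 = 210
```

Both trees satisfy the search tree property on the same keys, yet their weighted search times differ, which is exactly what the optimization is about.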
So what are some ways we can compile this intuition into a greedy algorithm? Well one, perhaps motivated by the success of Huffman's algorithm, is a bottom-up approach. Now I'm not going to define what I mean here precisely, but informally we want to start with the bottommost levels, with the leaves, and the nodes you want to put there are the objects that are accessed the least frequently. Any reasonable way of implementing this kind of bottom-up greedy rule is not going to work. Let me show a simple counterexample. So let's just assume we have four objects, one, two, three, four. What I'm showing here on the right in pink are two possible search trees, valid for those four keys. And let's assume that the frequencies are as follows. Object one is searched for 2% of the time. Object two, 23% of the time. Object three, the bulk of the time, 73%. And object four, only 1% of the time. Any greedy algorithm which insists on putting the lowest frequency objects in the very bottommost level of the tree is not going to produce this tree on the right, which has the 2% object at a lower level than the 1% object. Instead, such a greedy algorithm would presumably output a search tree like the one on the left, which has object two as the root and object four at the lowest level, at depth two. But you should be able to easily convince yourself that, for these probabilities, it's the tree on the right which is the one you want, that's optimal, because object three is the one that's searched for the bulk of the time, so that's the one you want at the root, as opposed to object two. So I realize I'm being a little informal here, but I hope you get the idea that a naive bottom-up implementation of a greedy algorithm, which if you think about it is really what we did in Huffman's algorithm, is not going to work here. The same can be said about a top-down approach.
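The bottom-up counterexample can be checked with a few lines of arithmetic. This is a sketch under one assumption: the slide is not part of the transcript, so the two tree shapes below are reconstructed from the spoken description (root two with object four at depth two on the left; root three with the 2% object below the 1% object on the right). The dictionary and function names are illustrative.

```python
# Search frequencies from the example, as integer percentages.
freq = {1: 2, 2: 23, 3: 73, 4: 1}

# Depth of each key (the root has depth 0) in the two trees.
# Left tree: 2 at the root, 1 and 3 as its children, 4 below 3.
left_depth = {2: 0, 1: 1, 3: 1, 4: 2}
# Right tree: 3 at the root, 2 and 4 as its children, 1 below 2.
right_depth = {3: 0, 2: 1, 4: 1, 1: 2}


def weighted_search_time(depth, freq):
    """Sum of freq[key] * (depth[key] + 1) over all keys."""
    return sum(freq[key] * (depth[key] + 1) for key in freq)


left_cost = weighted_search_time(left_depth, freq)
right_cost = weighted_search_time(right_depth, freq)
print(left_cost / 100, right_cost / 100)  # 1.76 1.27
```

The tree on the right wins even though it places the 2% object below the 1% object, because it pays only one comparison for the 73% object.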
Perhaps the simplest top-down approach would be just to take the most frequently searched-for object, put that at the root, and then recursively develop appropriate left and right subtrees under that most frequently accessed element. So let me show you, again informally, the same kind of counterexample. We're going to use the exact same four objects and the exact same two trees. I will, however, change the numbers. Now let's imagine that object one is searched for almost never, just 1% of the time, and each of the other three objects is searched for roughly a third of the time. But let me break ties so that object number two is the most frequent one, searched for 34% of the time. In that case the greedy algorithm will put the 34% node up at the root, when really what you want is a perfectly balanced subtree for the objects two, three, and four, because each accounts for roughly a third of the searches. So let's give object three 33% of the searches and object four 32% of the searches. And again I'll leave it for you to convince yourself that this is indeed a counterexample: in the tree spit out by the greedy algorithm, on the left, we have an average search time of roughly two, whereas in the search tree on the right we have an average search time of roughly five-thirds. So we'd like to produce the tree on the right, but the greedy algorithm proposed here will spit out the tree on the left. This of course doesn't exhaust the list of potential greedy algorithms. You could try others, but it's not going to work. There's no known greedy algorithm that successfully solves the optimal binary search tree problem. So in particular, if we focus on the top-down approach, the choice of the root, the choice of what to do at the uppermost level, has very hard-to-predict repercussions for what the two different subproblems look like.
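The top-down rule is simple enough to write out and run on these numbers. The following is a rough sketch, not code from the lecture: the names `greedy_top_down` and `cost`, and the nested-tuple tree encoding `(key, left, right)`, are assumptions made for illustration.

```python
def greedy_top_down(keys, freq):
    """Put the most frequent key at the root, then recurse on each side.

    `keys` is a sorted list; the result is None or (key, left, right).
    """
    if not keys:
        return None
    root = max(keys, key=lambda k: freq[k])
    i = keys.index(root)
    return (root,
            greedy_top_down(keys[:i], freq),
            greedy_top_down(keys[i + 1:], freq))


def cost(tree, freq, depth=0):
    """Weighted search time: sum of freq[key] * (depth of key + 1)."""
    if tree is None:
        return 0
    key, left, right = tree
    return (freq[key] * (depth + 1)
            + cost(left, freq, depth + 1)
            + cost(right, freq, depth + 1))


freq = {1: 1, 2: 34, 3: 33, 4: 32}  # integer percentages from the example
greedy_tree = greedy_top_down([1, 2, 3, 4], freq)
balanced_tree = (3, (2, (1, None, None), None), (4, None, None))

print(greedy_tree)  # (2, (1, None, None), (3, None, (4, None, None)))
print(cost(greedy_tree, freq) / 100)    # 1.98
print(cost(balanced_tree, freq) / 100)  # 1.68
```

These match the "roughly two" and "roughly five-thirds" quoted above: rooting at the 34% object pushes the 33% and 32% objects down a chain.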
So this is what stymies not only the top-down greedy approach, but also a naive divide-and-conquer approach. For example, suppose we just wanted to split the keys into the first half and the second half, recursively compute an optimal BST on each of those two halves, and then put them back together. The search tree property would say that we have to unite those two sub-solutions under a root which is the median, in between the two subproblems. And who is to say that the median is a good choice for the root? Again, because of the ramifications further down the tree, maybe that's a stupid root. But, boy, is it tempting to try to solve this problem recursively, right? We're trying to output this binary tree. It has recursive structure. If only we knew which root we should pick, we would love to recurse twice: once to construct an optimal left subtree, once to construct an optimal right subtree. Okay, so if only we knew the right root. This is starting to sound familiar, actually. How did it work in all our dynamic programming solutions? We always said, oh, if only a little birdie told us this one little piece of the solution, then we could look up or recursively compute the rest of the solution and extend it back to one for the original problem, easily. So maybe the same thing's true here. Maybe, if only a little birdie told us what the root was, then we could look up or recursively compute optimal solutions to smaller subproblems, paste everything back together, and be done. That would be great. So as usual we want to make this precise with an optimal substructure lemma. We want to understand the way in which an optimal solution to an optimal BST problem must necessarily be constructed from optimal solutions to smaller subproblems.
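Here is a small sketch of why forcing the median to the root can go wrong. The three-key instance and its weights are made up for illustration and do not appear in the lecture, and `median_split` is a hypothetical name. Note that it applies the median rule at every level; with three keys each half is a single key, so the halves are trivially optimal.

```python
def median_split(keys):
    """Naive divide and conquer: always root the tree at the median key.

    The rule never looks at the frequencies, which is the weakness.
    """
    if not keys:
        return None
    mid = len(keys) // 2
    return (keys[mid],
            median_split(keys[:mid]),
            median_split(keys[mid + 1:]))


def cost(tree, freq, depth=0):
    """Weighted search time: sum of freq[key] * (depth of key + 1)."""
    if tree is None:
        return 0
    key, left, right = tree
    return (freq[key] * (depth + 1)
            + cost(left, freq, depth + 1)
            + cost(right, freq, depth + 1))


# Hypothetical skewed instance (integer percentages): key 1 dominates.
freq = {1: 80, 2: 10, 3: 10}
median_tree = median_split([1, 2, 3])
chain_tree = (1, None, (2, None, (3, None, None)))

print(median_tree)                   # (2, (1, None, None), (3, None, None))
print(cost(median_tree, freq) / 100)  # 1.9
print(cost(chain_tree, freq) / 100)   # 1.3
```

On this instance the unbalanced chain rooted at key 1 beats the median-rooted tree, so the right root depends on the frequencies and not just on the position of the key.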
So in the following quiz I'm going to ask you to guess what the appropriate optimal substructure lemma is, and then after that quiz, once we've identified the right statement, I will show you the proof. Okay, so the answer I'm looking for is the fourth one, D, which is the strongest statement of all. So the first point is that each of the trees T1 and T2, the left and right subtrees of the root, as subtrees of a binary search tree, are of course themselves binary search trees, valid for the keys that they contain. And not only can they be viewed as search trees on the keys they contain, but the claim, which we'll prove on the next couple of slides, is that they are indeed optimal. They minimize the weighted search time over all possible search trees for the objects contained in those two trees. So that rules out A and it rules out B; we can say something stronger than those. But we can even say something stronger than C, which says that each of the two trees is optimal for the items that it contains: we actually know exactly which items are in T1 and which items are in T2. And this is by the search tree property. The search tree property says that at every node, and in particular here at the root, everything to the left of the node is less than it and everything to the right of the node is bigger than it. So, the root being r by assumption, we know that the objects one through r minus one have got to be somewhere, and the only place they can be is in the left subtree, T1. So that's exactly the contents of T1. Similarly, the contents of T2 are precisely the objects r+1 through n. So the two subtrees are optimal, and we know exactly which keys they contain: everything less than r on the left and everything bigger than r on the right. Okay, so here's where things stand at the end of this quiz. We've identified a statement that we're really hoping is true.
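The hoped-for statement can at least be sanity-checked by brute force on a tiny instance. The sketch below is an illustration and not a proof: it enumerates every search tree on four keys, picks the best one, and checks that its two subtrees are themselves optimal for keys 1 through r-1 and r+1 through n. The frequencies are the 2%, 23%, 73%, 1% example from earlier in the lecture; all function names and the tuple encoding are assumptions.

```python
def all_bsts(keys):
    """Yield every binary search tree on the sorted tuple `keys`."""
    if not keys:
        yield None
        return
    for i, root in enumerate(keys):
        for left in all_bsts(keys[:i]):
            for right in all_bsts(keys[i + 1:]):
                yield (root, left, right)


def cost(tree, freq, depth=0):
    """Weighted search time: sum of freq[key] * (depth of key + 1)."""
    if tree is None:
        return 0
    key, left, right = tree
    return (freq[key] * (depth + 1)
            + cost(left, freq, depth + 1)
            + cost(right, freq, depth + 1))


def keys_of(tree):
    """In-order list of the keys stored in `tree`."""
    if tree is None:
        return []
    key, left, right = tree
    return keys_of(left) + [key] + keys_of(right)


def optimal_cost(keys, freq):
    """Exhaustive minimum over all search trees on `keys`."""
    return min(cost(t, freq) for t in all_bsts(keys))


freq = {1: 2, 2: 23, 3: 73, 4: 1}  # integer percentages
keys = (1, 2, 3, 4)                # keys are 1..n, so keys[:r-1] is 1..r-1
best = min(all_bsts(keys), key=lambda t: cost(t, freq))
r, t1, t2 = best

print(r, cost(best, freq))       # 3 127
print(keys_of(t1), keys_of(t2))  # [1, 2] [4]
print(cost(t1, freq) == optimal_cost(keys[:r - 1], freq))  # True
print(cost(t2, freq) == optimal_cost(keys[r:], freq))      # True
```

Exhaustive enumeration is only feasible for very small n, since the number of search trees grows exponentially; it is used here purely as a check on one example.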
We're really hoping that an optimal BST, binary search tree, must necessarily be composed in this way of optimal binary search trees for the key sets to the left of the root and to the right of the root. If that's true, then hopefully, with the experience we now have, we can sort of envision what a dynamic programming algorithm might look like. I'll fill in the details in the next video. If this weren't true, if we didn't have this optimal substructure, honestly, I have no idea how we'd get started. It's really not clear what an algorithm would look like if this weren't true. So in the next couple of slides, I'm going to prove this to you. The format of the proof will not be radically different from the ones we've already seen. I don't think there'll be any big surprises, but it's so important, this really is the whole reason why the algorithm is going to work, that I'm still going to give you a full proof of this optimal substructure lemma.