12.4 - A dynamic programming algorithm 1 - Optimal binary search trees - By tim roughgarden

So now that we understand the structure of optimal solutions for this optimal binary search tree problem. That is, we understand how an optimal solution must be one of a relatively small number of candidates. Let's compile that understanding into a polynomial time dynamic programming algorithm. Let me quickly remind you of the Optimal Substructure Lemma that we proved in the previous video. Suppose we have an optimal binary search tree for a given set of keys, one through N, with given probabilities. And suppose this binary search tree has the root R. Well then it has two sub-trees, t1 and t2. By the search tree property, we know exactly the population of each of those two sub-trees. T1 has to contain the keys one through r - 1. As usual we're sorted, we're assuming these are in sorted order. Whereas the right sub-tree t2, has to contain exactly the keys r + 1 through n. Moreover, t1 and t2 are, in their own right, valid search trees for these two sets of keys. And finally, and this is what we proved in the last video they're optimal for their respective sub-problems. T1 is optimal for keys one through r minus one and the corresponding weights or. Abilities and T2 is optimal for R plus one through N and their corresponding frequencies. So let's now execute our dynamic programming recipe. So now that we understand the way in which an optimal solution must necessarily be composed in a simple way from solutions to smaller subproblems. Let's take a step back, and ask, well. Given that, at the end of the day, we care about the optimal solution to the original problem. Which subproblems are relevant? Which subproblems are we going to be forced to solve? For example, with independent sets in line graphs we observed that to solve a subproblem we needed to know the answers to the subproblems where we pluck either one or two vertices off of the right hand side. So overall what we cared about was subproblems corresponding to prefixes of the graph. In the knapsack problem we needed to understand subproblems that involved one less item and possibly a resus, reduced residual knapsack capacity, so that led to us caring about solutions to subproblems corresponding to all prefixes of the items and all integer possibilities for the residual capacity of a knapsack. In sequence alignment, when we looked at subproblems. As we were plucking a character off of one or possibly both of the strings. So we cared about subproblems corresponding to prefixes of each of the two strings. Now, here's one of the things that's interesting about the binary search tree problem which we haven't seen before. Is that, when we look at a subproblem. In the optimal structure lemma, there's two that we might consider. We don't just pluck off from the right. We care about both the subproblem induced by the left subtree. And that induced by the right subtree. In the first case, we're looking at a prefix of the items we started with. And that's like we've seen in our many examples. But in the second case, the sub problem corresponding to t sub two. That's actually a suffix of the items that we started with. So put differently, the sub-problems we care about are those that can be obtained by either throwing away a prefix from the items that we started with or throwing away a suffix from the items that we started with. So in light of this observation, that the value of an optimal solution depends only immediately on sub problems that you obtain by throwing out a prefix with a, or a suffix of the items, what I want you to think about on this quiz is, what is the entire set of relevant sub problems? That is, for which subsets s of the original items one through n is it important that we compute the value of an optimal binary search tree on the items only in s? So before I explain the correct answer which is the third one, let me talk a little bit about a very natural but incorrect answer, namely the second one. Indeed, the second answer seems to have the best correspondence to the optimal substructure lemma. The optimal substructure lemma states that the optimal solution must be composed of an optimal solution on some prefix and an optimal solution on some suffix, united under a common root r. So we definitely care about the solutions to all prefixes and suffixes of the items but we care about more than just that. So maybe the easiest way to see that is to think about the recursive application of the optimal substructure lemma. And again relevant subproblems at the end of the day are going to correspond to all of the different distinct subproblems that ever get solved over the entire trajectory of this recursive implementation. So, I mean just think about one sort of example path in the recursion tree, right? So in the outermost level recursion you've got the whole item set, let's say there's 100 items one through 100, you're going through and trying all possibilities of the root. So at some point you're trying out root number 23 to see how it does. You have to recurse once on items one through 22 to optimally build a search tree for them, and similarly for items 24 through 100. Now let's sort of drill down into this first recursive call where you recurse on item just one through 22. Now here again, you're going to be trying all possibilities of the route, those 22 choices. At some point you'll be trying route number seventeen. There's again going to be two recursive calls. And the second recursive call is going to be on items eighteen through 22. It's going to be the items that were passed through this recursive call. A prefix of the original items. And then the second recursive call here, locally is going to be on some suffix of that prefix. So in this case, the items eighteen through 22. A suffix of the original prefix, one through 22. So, in general, as you think through this recursion multiple levels. At every step, what you've got going for you is, you're either deleting a chunk of items from the beginning, a prefix. Or you're deleting a chunk of items from the end. But you might be interleaving these two operations. So it is not true that you're always going to have a prefix of a suffix of the original set of items. But. What is true is that you will have some contiguous set of items. It's going to be. If you, if you have I as your smallest item in the subproblem and J is the biggest, you're going to have all of the subproblems in between. And that's because you were only plucking off items from the left or from the right. So that's why C is the correct answer. You need more subproblems than just prefixes and suffixes. Alright, so that was a little tricky, identifying the relevant sub problems but now that we've got them in our grubby little hands the dynamic programming algorithm as usual is just going to fall in to place, the relevant collection of sub problems unlocks the power in a very mechanical way of its entire paradigm. So let's now just fill in all the details. The first step is to formalize the recurrence. That is, the way in which the optimal solution of a given subproblem depends on the value of smaller subproblems. This is just going to be a mathematical formula which encodes what we already proved in the optional substructure lemma. And then we're going to use this formula to populate a table in a dynamic programming algorithm to solve, systematically, the values for all of the subproblems. So let's have some notation to put in our recurrence, in our formula. We're going to be indexing sub-problems with two indices I and J and this is because we have two degrees of freedom where the continuous interval of item starts I, and where the continuous interval of items ends, J. So for a given choice I and J, where of course I should be the most J. I'm going to denote by capital C sub IJ, the weighted search cost of an optimal binary search tree just in the contiguous set of in, items from I to J. And of course, the weights or the probabilities are exactly the same as in the original problem they're just inherited here, PI through PJ. So now let's state the recurrence. So, for a given sub problem cij, we're going to express the value of an optimal binary search tree in terms of those of smaller sub problems. The optimal sub structure lemma tells us how to do this. The optimal substructure lemma says, that if we knew the roots, if we know the choice of the root r which here is going to be somewhere between the items I and j, then in that case, the optimal solution has to be composed of optimal solutions to the two smaller sub-problems united under the root. Now we don't know what the root is. There's a j - I + one possibilities. It could be anything between I and j inclusive. So as usual, we're just going to do brute force search over the relatively small set of candidates that we've identified. So brute force search we encode by just explicitly having a minimum. So I choose some route R somewhere between I and J inclusive. And given a choice of R we're going to inherit the weighted search cost of the optimal solution on just the prefix of items I through R minus one. So on our notation that would be C of I. R minus one. Similarly we pick up the weighted search cost of an optimal solution to the suffix of items R plus one through J. And if you go back to our proof of the optimal substructure lemma you'll see we did a calculation which gives us a formula for what, how the weighted search cost of a tree depends on that of its subtrees. And in addition to the weighted search cost contributed by each of the two search trees we pick up a constant, namely the sum of all of the probabilities in the items we're working with. So here that's sum of. P sub K, where K ranges from the first item in the sub problem I to the last item in the sub problem J. So one extra edge case we should deal with is if we choose the root to be the first item, then the first recursive term doesn't make sense, then we'll have C, I, I minus one, which is not defined. Similarly, if we choose the root to be J, then this last term would be C of J plus 1J which is not defined. Remember indices are supposed to be in order. So in that case, we'll just interpret these capital C's as zero. And so why is the recurrence correct? Well all of the heavy lifting was done and our proof of the optimal substructure lemma. What did we prove there? We proved the optimal solution has to be one of just J minus I plus one possible things. It depends only on the choice of the root. Given the root, the rest is determined for us. The recurrency is by definition doing brute force search through the only set of candidates. So therefore, it is indeed a correct formula for the optimal solution value, in terms of optimal solutions to smaller sub problems.