Tip:
Highlight text to annotate it
X
So now that we understand the structure of
optimal solutions for this optimal binary search tree problem.
That is, we understand how an optimal solution must be one of a relatively
small number of candidates. Let's compile that understanding into a
polynomial time dynamic programming algorithm.
Let me quickly remind you of the Optimal Substructure Lemma that we proved in the
previous video. Suppose we have an optimal binary search
tree for a given set of keys, one through N, with given probabilities.
And suppose this binary search tree has the root R.
Well then it has two sub-trees, t1 and t2.
By the search tree property, we know exactly the population of each of those
two sub-trees. T1 has to contain the keys one through r
- 1. As usual we're sorted, we're assuming
these are in sorted order. Whereas the right sub-tree t2, has to
contain exactly the keys r + 1 through n. Moreover, t1 and t2 are, in their own
right, valid search trees for these two sets of keys.
And finally, and this is what we proved in the last video they're optimal for
their respective sub-problems. T1 is optimal for keys one through r
minus one and the corresponding weights or.
Abilities and T2 is optimal for R plus one through N and their corresponding
frequencies. So let's now execute our dynamic
programming recipe. So now that we understand the way in
which an optimal solution must necessarily be composed in a simple way
from solutions to smaller subproblems. Let's take a step back, and ask, well.
Given that, at the end of the day, we care about the optimal solution to the
original problem. Which subproblems are relevant?
Which subproblems are we going to be forced to solve?
For example, with independent sets in line graphs we observed that to solve a
subproblem we needed to know the answers to the subproblems where we pluck either
one or two vertices off of the right hand side.
So overall what we cared about was subproblems corresponding to prefixes of
the graph. In the knapsack problem we needed to
understand subproblems that involved one less item and possibly a resus, reduced
residual knapsack capacity, so that led to us caring about solutions to
subproblems corresponding to all prefixes of the items and all integer
possibilities for the residual capacity of a knapsack.
In sequence alignment, when we looked at subproblems.
As we were plucking a character off of one or possibly both of the strings.
So we cared about subproblems corresponding to prefixes of each of the
two strings. Now, here's one of the things that's
interesting about the binary search tree problem which we haven't seen before.
Is that, when we look at a subproblem. In the optimal structure lemma, there's
two that we might consider. We don't just pluck off from the right.
We care about both the subproblem induced by the left subtree.
And that induced by the right subtree. In the first case, we're looking at a
prefix of the items we started with. And that's like we've seen in our many
examples. But in the second case, the sub problem
corresponding to t sub two. That's actually a suffix of the items
that we started with. So put differently, the sub-problems we
care about are those that can be obtained by either throwing away a prefix from the
items that we started with or throwing away a suffix from the items that we
started with. So in light of this observation, that the
value of an optimal solution depends only immediately on sub problems that you
obtain by throwing out a prefix with a, or a suffix of the items, what I want you
to think about on this quiz is, what is the entire set of relevant sub problems?
That is, for which subsets s of the original items one through n is it
important that we compute the value of an optimal binary search tree on the items
only in s? So before I explain the correct answer
which is the third one, let me talk a little bit about a very natural but
incorrect answer, namely the second one. Indeed, the second answer seems to have
the best correspondence to the optimal substructure lemma.
The optimal substructure lemma states that the optimal solution must be
composed of an optimal solution on some prefix and an optimal solution on some
suffix, united under a common root r. So we definitely care about the solutions
to all prefixes and suffixes of the items but we care about more than just that.
So maybe the easiest way to see that is to think about the recursive application
of the optimal substructure lemma. And again relevant subproblems at the end
of the day are going to correspond to all of the different distinct subproblems
that ever get solved over the entire trajectory of this recursive
implementation. So, I mean just think about one sort of
example path in the recursion tree, right?
So in the outermost level recursion you've got the whole item set, let's say
there's 100 items one through 100, you're going through and trying all
possibilities of the root. So at some point you're trying out root
number 23 to see how it does. You have to recurse once on items one
through 22 to optimally build a search tree for them, and similarly for items 24
through 100. Now let's sort of drill down into this
first recursive call where you recurse on item just one through 22.
Now here again, you're going to be trying all possibilities of the route, those 22
choices. At some point you'll be trying route
number seventeen. There's again going to be two recursive
calls. And the second recursive call is going to
be on items eighteen through 22. It's going to be the items that were
passed through this recursive call. A prefix of the original items.
And then the second recursive call here, locally is going to be on some suffix of
that prefix. So in this case, the items eighteen
through 22. A suffix of the original prefix, one
through 22. So, in general, as you think through this
recursion multiple levels. At every step, what you've got going for
you is, you're either deleting a chunk of items from the beginning, a prefix.
Or you're deleting a chunk of items from the end.
But you might be interleaving these two operations.
So it is not true that you're always going to have a prefix of a suffix of the
original set of items. But.
What is true is that you will have some contiguous set of items.
It's going to be. If you, if you have I as your smallest
item in the subproblem and J is the biggest, you're going to have all of the
subproblems in between. And that's because you were only plucking
off items from the left or from the right.
So that's why C is the correct answer. You need more subproblems than just
prefixes and suffixes. Alright, so that was a little tricky,
identifying the relevant sub problems but now that we've got them in our grubby
little hands the dynamic programming algorithm as usual is just going to fall
in to place, the relevant collection of sub problems unlocks the power in a very
mechanical way of its entire paradigm. So let's now just fill in all the details.
The first step is to formalize the recurrence.
That is, the way in which the optimal solution of a given subproblem depends on
the value of smaller subproblems. This is just going to be a mathematical
formula which encodes what we already proved in the optional substructure
lemma. And then we're going to use this formula
to populate a table in a dynamic programming algorithm to solve,
systematically, the values for all of the subproblems.
So let's have some notation to put in our recurrence, in our formula.
We're going to be indexing sub-problems with two indices I and J and this is
because we have two degrees of freedom where the continuous interval of item
starts I, and where the continuous interval of items ends, J.
So for a given choice I and J, where of course I should be the most J.
I'm going to denote by capital C sub IJ, the weighted search cost of an optimal
binary search tree just in the contiguous set of in, items from I to J.
And of course, the weights or the probabilities are exactly the same as in
the original problem they're just inherited here, PI through PJ.
So now let's state the recurrence. So, for a given sub problem cij, we're
going to express the value of an optimal binary search tree in terms of those of
smaller sub problems. The optimal sub structure lemma tells us
how to do this. The optimal substructure lemma says, that
if we knew the roots, if we know the choice of the root r which here is going
to be somewhere between the items I and j, then in that case, the optimal
solution has to be composed of optimal solutions to the two smaller sub-problems
united under the root. Now we don't know what the root is.
There's a j - I + one possibilities. It could be anything between I and j
inclusive. So as usual, we're just going to do brute
force search over the relatively small set of candidates that we've identified.
So brute force search we encode by just explicitly having a minimum.
So I choose some route R somewhere between I and J inclusive.
And given a choice of R we're going to inherit the weighted search cost of the
optimal solution on just the prefix of items I through R minus one.
So on our notation that would be C of I. R minus one.
Similarly we pick up the weighted search cost of an optimal solution to the suffix
of items R plus one through J. And if you go back to our proof of the
optimal substructure lemma you'll see we did a calculation which gives us a
formula for what, how the weighted search cost of a tree depends on that of its
subtrees. And in addition to the weighted search
cost contributed by each of the two search trees we pick up a constant,
namely the sum of all of the probabilities in the items we're working
with. So here that's sum of.
P sub K, where K ranges from the first item in the sub problem I to the last
item in the sub problem J. So one extra edge case we should deal
with is if we choose the root to be the first item, then the first recursive term
doesn't make sense, then we'll have C, I, I minus one, which is not defined.
Similarly, if we choose the root to be J, then this last term would be C of J plus
1J which is not defined. Remember indices are supposed to be in
order. So in that case, we'll just interpret
these capital C's as zero. And so why is the recurrence correct?
Well all of the heavy lifting was done and our proof of the optimal substructure
lemma. What did we prove there?
We proved the optimal solution has to be one of just J minus I plus one possible
things. It depends only on the choice of the
root. Given the root, the rest is determined
for us. The recurrency is by definition doing
brute force search through the only set of candidates.
So therefore, it is indeed a correct formula for the optimal solution value,
in terms of optimal solutions to smaller sub problems.