Tip:
Highlight text to annotate it
X
So now that we have motivated and formally defined the optimal binary
search tree problem, lets think about how to solve it.
After settling on dynamic programming as the paradigm we are going to try to use
we're going to proceed in the usual way, turning to the optimal solution for
clues, asking in what way is it composed of optimal solutions to smaller
sub-problems. So let me just remind you of the formal
problem statement. There's n objects we got to store in a
search tree, and let's just name them in the order of their keys, and let's name
them one, two, three, all the way up to n, for simplicity, and we're also given.
frequencies or weights reflecting how often the different objects are searched
for. So that's p1 up to pn positive numbers.
Canonically we think of these summing to one, being probabilities, but actually
won't use that fact so in general they're just arbitrary positive numbers.
The goal is to output a search tree. It should satisfy the search tree
property, it should contain all of these objects one through n, and amongst all
such search trees it should minimize the weighted search time.
So the sum over all of the items I of the probability of I times the search time
for I, namely its depth in the tree plus one.
Now in case you're feeling cocky about the fact that the greedy algorithm works
to solve the seemingly similar optimal prefix-free binary code problem in the
form of Huffman's algorithm, I want to spend a little time pointing out that
greedy algorithms are not sufficient, are not correct, to solve the optimal search
tree problem. So if we were to design a greedy
algorithm what kind of intuition would motivate a particular greedy rule.
Well, staying at the objective function it's very clear that we want the objects
that high frequency of access to be at or near the root and we want items of low
frequency access to be in the bottom levels of the tree, like the leaves.
So what are some ways we can compile this intuition into a Greedy algorithm?
Well one, perhaps motivated by the success of Huffman's algorithm, is we
could think about a bottom up approach. Now I'm not going to define what I mean
here precisely, but informally we want to start with the bottom most levels, with
the leaves and the nodes you want to put there are the objects that are accessed
the least frequently. Any reasonable way of implementing this
kind of body of greedy rule is not going to work.
Let me show a simple counter example. So, let's just assume we have four
objects, one, two, three, four. What I'm showing here on the right in
pink is two possible search trees valid for those four keys.
And let's assume that, the frequencies are as followed.
Object one is searched for two% of the time.
Object two for 23% of the time. Object three, the bulk of the time 73%.
An object for only one% of the time. Any greedy algorithm which insists on
putting the lowest frequency objects in the very bottommost level of the tree is
not going to produce this tree on the right, which has the two% object below,
at a lower level than the !% object. Instead, such a greedy algorithm would
presumably output a searchtree like the one on the left, which has the two as the
root, and then the object four, at the lowest level, at depth two.
But you should be able to easily convince yourself that, for these probabilities,
it's the tree on the right which is the one you want, that's optimal, because the
object three is the one that's searched for the bulk of the time, that's the one
you want at the root as opposed to the object two.
So I realize I'm being a little informal here but I hope you get the idea that a
naive bottom-up implementation of a greedy algorithm, which if you think
about it is really what we did in Huffman's algorithm, is not going to work
here. The same can be said about a top-down
approach. Perhaps the simplest top-down approach
would be just to take the most frequently searched for object and put that at the
root and then recursively develop an appropriate left and right sub-trees
under that most frequently accessed element.
So let me show you again and, formally, just the same kind of counter-example.
We're going to use the exact same four objects, the exact same two trees.
I, I will however, change the numbers. Now, let's imagine that object one is
searched for almost never, just one percent of the time and each of the other
three objects are searched for roughly a third of the time each.
But, let me sort of break ties, so that the object number two is the most
frequently one. Searched for 34%.
So that, in that case the Greedy algorithm will put the 34% node up in the
roots when really what should happen is you want a perfectly balanced sub-tree
for the objects two, three and four because each accounts for roughly a third
of the searches. So let's give object three 33% of the
searches and object four 32% of the searches.
And again I'll leave it for you to convince yourself that this is indeed a
counter example the tree's spit out by the greedy algorithm on the left, we have
an average search time of roughly two, where as the search tree on the right we
have an average search time of roughly five thirds.
So we'd like to produce the tree on the right but the greedy algorithm proposed
here will spit out the tree on the left. This of course doesn't exhaust the list
of potential greedy algorithms. You could try others but it's not going
to work. There's no known greedy algorithm that
successfully solves the optimal binary search tree problem.
So in particular if we focus on the top down approach.
The choice of the route. The choice of what to do at the uppermost
level. Has very hard to predict repercussions,
for what the two different sub-problems look like.
So this is what stymie is, not only the top down gritty approach, but also a
naive divide and conquer approach. So for example if we just wanted to split
the keys into the first half, and the second half, recursively compute and
optimal B is I need on each of those two half's, and then put them back together.
The search tree property would say that we have to, unite those two sub-solutions
under a root which is the median, in between the two sub-problems.
And who is to say that the median is a good choice for the root.
Again, because the ramifications further down the tree, maybe that's a stupid
root. But, boy, is it tempting to try to solve
this problem recursively, right?
We're trying to output this binary tree. It has recursive structure.
If only we knew which root we should pick.
We would love to recurse twice. Once to construct an optimal left
subtree. Once to construct an optimal right
subtree. Okay,
so if only we knew the right route. this is starting to sound familiar,
actually. How did it work in all our dynamic
programming solutions? We always said, oh.
If only a little birdie told us this one little piece of the solution.
Well, then, you know? Then we could, sort of, look up or
recursively compute the rest of the solution.
And extend it back to one for the original problem, easily.
So maybe the same thing's true here. Maybe, if only a little birdie told us
what the root was. Then we could look up or recursively
compute optimal solutions to smaller subproblems.
And paste everything back together, and be done.
That would be great. So as usual we want to make this precise
with an optimal substructure lemma. We want to understand the way in which an
optimal solution to an optimal BST problem must necessarily be constructed
from optimal solutions to smaller sub-problems.
So in the following quiz I'm going to ask you to guess what the appropriate optimal
substructure lemma is and then after that quiz once we've identified the right
statement at that point I will show you the proof.
Okay, so the answer I'm looking for is the fourth one, is D.
Which is the strongest statement of all. So the first point is that each of the
trees T1 and T2, now as subtrees of a binary search tree these of course are
themselves binary search trees, valid for the trees that they contain.
And not only can they be viewed as search trees on the keys they contain, but the
claim which we'll prove on the next couple slides is that they are indeed
optimal. They minimize the weighted search time
over all possible search trees for the objects contained in those two trees.
So that gets us down, it rules out A, it rules out B.
We can say something stronger than that, but we can even say something stronger
than C, that each of the two trees is optimal for the items that they contain.
We actually know exactly which items are in T1 and which items are in T2.
And this is by the search tree property. Search tree property said that every node
and in particular here at the roots everything to the left of the root is
less than that of everything to the right of the node is bigger than it.
So the root being R by assumption we know the objects one through R minus one, but
they got to be somewhere. The only way they can be is in the left
sub-tree, t1. So that's exactly the contents of T1.
Similarly, the contents of t2 are precisely the objects r+1 through n.
So the two sub-trees are optimal. And we know exactly which keys they are,
it's everything less than r on the left and everything bigger than r on the
right. Okay,
so here's where things stand at the end of this quiz.
We've identified a statement that we're really hoping is true.
We're really hoping that an optimal BST, binary search tree, must necessarily be
composed in this way of optimal binary search trees for the key sets to the left
of the root and the right of the root. If that's true, hopefully with the
experience we now have, we can sort of envision what a dynamic programming
algorithm might look like. I'll just fill in the details in the next
video. If this weren't true, if we didn't have
this optimal substructure, honestly, I have no idea how we'd get started.
It's really not clear what an algorithm would look like if this weren't true.
So the next couple slides, I'm going to prove this to you.
The format, you know, will not be radically different than the ones we've
already seen. I don't think there'll be any big
surprises, but it's so important, this really is the whole reason why the
algorithm is going to work, I'm still going to give you a fool proof of this
optimal substructure lemma.