Tip:
Highlight text to annotate it
X
In this sequence of videos, we'll discuss our last but not least data structure
namely the Balanced Binary Search Tree. Like our discussion of other data
structures we'll begin with the what. That is we'll take the client's perspective and
we'll ask what operations are supported by this data structure, what can you actually
use it for? Then we'll move on to the how and the why. We'll peer under the hood of
the data structure and look at how it's actually implemented and then
understanding the implementation to understand why the operations have the
running times that they do. So what is a Balanced Binary Search Tree good for?
Well, I recommend thinking about it as a dynamic version of a sorted array. That
is, if you have data store in a Balanced Binary Search Tree, you can do pretty much
anything on the data that you could if it was just the static sorted array. But in
addition, the data structure can accommodate insertions and deletions. You
can accommodate a dynamic set of data that you're storing overtime. So to motivate
the operations that a Balanced Binary Search Tree supports, let's just start
with the sorted array and look at some of the things you can easily do with data
that happens to be stored in such a way. So let's think about an array that has
numerical data although, generally as we've said, in data structures is usually
associated other data that's what you actually care about and the numbers are
just some unique identifier for each of the records. So these might be an employee
ID number, social security numbers, packet ID numbers and network contacts, etcetera.
So what are some things that are easy to do given that your data is stored as a
sorted array, most a bunch of things? First of all, you can search and recall
that searching in a sorted array is generally done using binary search so this
is how we used to look up phone numbers when we have physical phone books. You'd
start in the middle of the phone book, if the name you were looking for was less
than the midpoint, you recurse on the left hand side, otherwise you'd recurse
on the right hand side. As we discussed back in the Master Method Lectures long ago,
this is going to run in logarithmic time. Roughly speaking, every time you recurse,
you've thrown out half of the array so you're guaranteed to terminate within a
logarithmic number of iterations so binary search is logarithmic search time.
Something else we discussed in previous lectures is the selection problem. So
previously, we discussed this in much harder context of unsorted arrays.
Remember, the selection problem in addition to array you're given in order
statistic. So, if your order statistic that your target is seventeen, that means
you're looking for the seventeenth smallest number that's stored in the
array. So in previous lectures, we worked very hard to get a linear time algorithm
for this problem in unsorted arrays. Now, in a sorted array, you want to know the
seventeenth smallest element in the array. Pretty easy problem, just return whatever
element happens to be in the seventeenth position of the array since the array is sorted,
that's where it is so no problem. It's already sorted constant time, you can
solve the selection problem. Of course, two special cases of the selection problem
are finding the minimum element of the array. That's just if the order statistic
problem with i = 1and the maximum element, that's just i = n. So this just
corresponds to returning the element that's in the first position and the last
position of the array respectively. Well let's do some more brainstorming. What
other operations could we implement on a sorted array? Well here's a couple more.
So there are operations called the Predecessor and Successor operations. And
so the way these work is, you start with one element. So, say you start with a
pointer to the 23, and you want to know where in this array is the next smallest
element. That's the predecessor query and the successor operation returns the next
largest element in the array. So the predecessor of the 23 is the seventeen,
the successor of the 23 would be the 30. And again in a sorted array, these are
trivial, right? You just know that predecessors just one position back in the
array, the successor is one position forward. So given a pointer to the 23, you
can return to 17 or the 30 in constant time. What else? Well, how about
the rank operation? So we haven't discussed this operation in the past. So
what rank is, this has for how many key stored in the data structure are less than
or equal to a given key. So for example, the rank of 23 would be equal to 6.
Because 6 of the 8 elements in the array are less than or equal to 23. And if you think about it,
implementing the rank operation is really no harder than implementing search. All
you do is search for the given key and wherever it is search terminates in the
array. You just look at the position in the array and boom, that's the rank of
that element. So for example, if you do a binary search for 23 and then when you
terminates, you discover it is, they're in position number six then you know the rank
is six. If you do an unsuccessful search, say you search for 21, well then you get
stuck in between the 17 and the 23, and at that point you can conclude that
the rank of 21 in this array is five. Let me just wrap up the list with the final
operation which is trivial to implement in the sorted array. Namely, you can output
or print say the stored keys in sorted order let's say from smallest to largest.
And naturally, all you do here is a single scan from left to right through the array,
outputting whatever element you see next. The time required is constant per element
or linear overall. So that's a quite impressive list of supported operations.
Could you really be so greedy as to want still more from our data structure? Well
yeah, certainly. We definitely want more than just what we have on the slide. The
reason being, these are operations that operate on a static data set which is not
changing overtime. But the world in general is dynamic. For example, if you
are running a company and keeping track of the employees, sometimes you get new
employees, sometimes employees leave. That is one of the data structure that
not only supports these kinds of operations but also, insertions and
deletions. Now of course it's not that it's impossible to implement insert or
delete in a sorted array, it's just that they're going to run way too slow. In
general, you have to copy over a linear amount of stuff on an insertion or
deletion if you want to maintain the sorted array property. So this linear time
performance when insertion and deletion is unacceptable unless you barely ever do
those operations. So, the raison d'etre of the Balanced Binary Search Tree
is to implement this exact same set of operations just as rich as that's
supported by a sorted array but in addition, insertions and deletions. Now, a
few of these operations won't be quite as fast or we have to give up a little bit
instead of constant time, the one in logarithmic time and we still got
logarithmic time for all of these operations, linear time for outputting the
elements in sort of order plus, we'll be able to insert and delete in logarithmic
time so let me just spell that out in a little more detail. So, a Balanced Binary
Search Tree will act like a sorted array plus, it will have fast, meaning
logarithmic time inserts and deletes. So let's go ahead and spell out all of those
operations. So search is going to run in O(log n) time, just like before. Select runs
in constant time in a sorted array and here it's going to take logarithmic, so
we'll give up a little bit on the selection problem but we'll still be able
to do it quite quickly. Even on the special cases of finding the minimum or
finding the maximum in our, in our data structure, we're going to need logarithmic
time in general. Same thing for finding predecessors and successors they're not,
they're no longer constant time, they go with logarithmic. Rank took as logarithmic
time and the, even the sorted array version and that will remain logarithmic
here. As we'll see, we lose essentially nothing over the sorted array, if we want
to output the key values in sorted order say from smallest to largest. And
crucially, we have two more fast operations compared to the sorted array of
data structure. We can insert stuff so if you hire a new employee, you can insert
them into your data structure. If an employee decides to leave, you can remove
them from the data structure. You do not have to spend linear time like you did for
sort of array, you only have to spend the logarithmic time whereas always n is the
number of keys being stored in the data structure. So the key takeaway here is
that, if you have data and it has keys which come from a totally ordered set
like, say numeric keys, then a Balanced Binary Search Tree supports a very rich
collection of operations. So if you anticipate doing a lot of different
processing using the ordering information of all of these keys, then you really
might want to consider a Balanced Binary Search Tree to maintain them. Well then,
keep in mind though is that we have seen a couple of other data structures which
don't do quite as much as balanced binary search trees but what they do, they do
better. We already, we just discussed in the last slide of the sorted array. So, if
you have a static data set, you don't need inserts and deletes. Well then by all
means, don't bother with Balanced Binary Search Tree that use a sorted array
because it will do everything super fast. But, we also sought through dynamic data
structures which don't do as much but do it, but what they do, they do very well.
So, we saw a heap, so what the heap is good for is it's just as dynamic as a
search tree. It allows insertions and deletions both in logarithmic time. And in
addition, it keeps track of the minimum element or the maximum element. Remember
in a heap, you can choose whether you want to keep track of the minimum or keep track
of the maximum but unlike in a search tree, a heap does not simultaneously keep
track of the minimum and the maximum. So if you just need those three operations,
insertions, deletions and remembering the smallest, and this would be the case for
example in a priority queue or scheduling application as discussed in the heap
videos. Then, a Binary Search Tree is over kill. You might want to consider a heap
instead. In fact, the benefits of a heap don't show up in the big O notation here
both have logarithmic operation time but the constant factors both in space and
time are going to be faster with a heap then with a Balanced Binary Search Tree.
The other dynamic data structure that we discussed is a hash table. And what hash
tables are really, really good at is handling insertions and searches, that is
look ups. Some, sometimes, depending on the implementation also handle deletions
really well also. So, if you don't actually need to remember things like
minima, maxima or remember ordering information on the keys, you just have to
remember what's there and what's not. Then the data structure of choice is definitely
the hash table, not the balance binary search tree. Again, the Balance Binary Search
Tree would be fine and we'd give you logarithmic look up time but it's kind of
over kill for the problem. All you need is fast look ups. A hash table recall will
give you constant time look ups. So that will be a noticeable win over the Balanced
Binary Search Tree. But if you want a very rich set of operations for processing your
data. Then, the Balanced Binary Search Tree could be the optimal data structure
for your needs.