13.1 - Balanced search trees applications - Binary search trees - [dsa 1] by Tim Roughgarden

In this sequence of videos, we'll discuss our last, but not least, data structure, namely the balanced binary search tree. As with our discussion of other data structures, we'll begin with the what. That is, we'll take the client's perspective and ask what operations are supported by this data structure: what can you actually use it for? Then we'll move on to the how and the why. We'll peer under the hood of the data structure, look at how it's actually implemented, and then use our understanding of the implementation to see why the operations have the running times that they do.

So what is a balanced binary search tree good for? Well, I recommend thinking about it as a dynamic version of a sorted array. That is, if you have data stored in a balanced binary search tree, you can do pretty much anything with the data that you could if it were just a static sorted array. But in addition, the data structure can accommodate insertions and deletions, so it can handle a dynamic set of data that you're storing over time.

So to motivate the operations that a balanced binary search tree supports, let's start with the sorted array and look at some of the things you can easily do with data that happens to be stored that way. Let's think about an array of numerical data, although, as we've said, in data structures the numbers are usually associated with other data, which is what you actually care about; the numbers are just unique identifiers for the records. These might be employee ID numbers, social security numbers, packet ID numbers in a networking context, etcetera. So what are some things that are easy to do given that your data is stored as a sorted array? Quite a bunch of things. First of all, you can search. Recall that searching in a sorted array is generally done using binary search; this is how we used to look up phone numbers when we had physical phone books.
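As a minimal sketch (not code from the lecture), here is binary search over a sorted Python list; the phone-book procedure described next is this same halving loop. The function name `binary_search` is hypothetical, and since the slide's array isn't reproduced in this transcript, the keys below are made-up values chosen so that 17, 23, and 30 occupy the fifth, sixth, and seventh positions, matching the examples that follow. Python indices are 0-based, so the sixth position is index 5.

```python
from typing import List, Optional


def binary_search(arr: List[int], key: int) -> Optional[int]:
    """Return the 0-based index of key in the sorted list arr, or None if absent."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # probe the middle of the remaining range
        if arr[mid] == key:
            return mid
        if key < arr[mid]:
            hi = mid - 1          # key can only be in the left half
        else:
            lo = mid + 1          # key can only be in the right half
    return None                   # range is empty: key is not stored


# Hypothetical keys standing in for the slide's array.
keys = [3, 6, 10, 11, 17, 23, 30, 36]
print(binary_search(keys, 23))  # 5
print(binary_search(keys, 21))  # None
```

Each iteration halves the range `[lo, hi]`, which is where the logarithmic bound comes from.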
You'd start in the middle of the phone book; if the name you were looking for came before the midpoint, you'd recurse on the left half, otherwise you'd recurse on the right half. As we discussed back in the Master Method lectures long ago, this runs in logarithmic time. Roughly speaking, every time you recurse, you've thrown out half of the array, so you're guaranteed to terminate within a logarithmic number of iterations. So binary search gives logarithmic search time.

Something else we discussed in previous lectures is the selection problem. Previously, we discussed it in the much harder context of unsorted arrays. Remember, in the selection problem, in addition to an array, you're given an order statistic. So if your target order statistic is seventeen, that means you're looking for the seventeenth smallest number stored in the array. In previous lectures, we worked very hard to get a linear-time algorithm for this problem in unsorted arrays. Now, in a sorted array, if you want to know the seventeenth smallest element, it's a pretty easy problem: just return whatever element happens to be in the seventeenth position of the array. Since the array is sorted, that's where it is, so no problem. With the array already sorted, you can solve the selection problem in constant time. Of course, two special cases of the selection problem are finding the minimum element of the array, which is just the selection problem with i = 1, and finding the maximum element, which is just i = n. These correspond to returning the element in the first position and the last position of the array, respectively.

Well, let's do some more brainstorming. What other operations could we implement on a sorted array? Here are a couple more: the predecessor and successor operations. The way these work is, you start with one element.
Say you start with a pointer to the 23, and you want to know which element in the array is the next smallest. That's the predecessor query; the successor operation returns the next largest element in the array. So the predecessor of the 23 is the 17, and the successor of the 23 is the 30. And again, in a sorted array, these are trivial, right? You know that the predecessor is just one position back in the array and the successor is one position forward. So given a pointer to the 23, you can return the 17 or the 30 in constant time.

What else? Well, how about the rank operation? We haven't discussed this operation in the past. Rank asks how many keys stored in the data structure are less than or equal to a given key. For example, the rank of 23 would be 6, because 6 of the 8 elements in the array are less than or equal to 23. If you think about it, implementing the rank operation is really no harder than implementing search. All you do is search for the given key, and wherever the search terminates in the array, you look at the position, and boom, that's the rank of that element. For example, if you do a binary search for 23 and, when it terminates, you discover it's in position number six, then you know the rank is six. If you do an unsuccessful search, say you search for 21, then you get stuck between the 17 and the 23, and at that point you can conclude that the rank of 21 in this array is five.

Let me wrap up the list with a final operation that is trivial to implement in a sorted array: you can output, or print, the stored keys in sorted order, say from smallest to largest. Naturally, all you do here is a single scan from left to right through the array, outputting whatever element you see next. The time required is constant per element, or linear overall. So that's quite an impressive list of supported operations.
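The sorted-array operations above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's code: the helper names are hypothetical, the keys are a made-up stand-in for the slide's array, `select` keeps the lecture's 1-based order statistic, and `predecessor`/`successor` take a 0-based index in place of the "pointer" to an element. `rank` uses the standard library's `bisect.bisect_right`, which binary-searches for the position just past any elements equal to the key, which is exactly the count of elements less than or equal to it.

```python
import bisect
from typing import List, Optional

# Hypothetical stand-in for the slide's array (its values aren't in the transcript).
keys = [3, 6, 10, 11, 17, 23, 30, 36]


def select(arr: List[int], i: int) -> int:
    """i-th smallest element, 1-based as in the lecture; O(1)."""
    if not 1 <= i <= len(arr):
        raise IndexError("order statistic out of range")
    return arr[i - 1]


def predecessor(arr: List[int], pos: int) -> Optional[int]:
    """Next smallest element, given the 0-based position of an element; O(1)."""
    return arr[pos - 1] if pos > 0 else None


def successor(arr: List[int], pos: int) -> Optional[int]:
    """Next largest element, given the 0-based position of an element; O(1)."""
    return arr[pos + 1] if pos + 1 < len(arr) else None


def rank(arr: List[int], key: int) -> int:
    """Number of elements <= key; bisect_right does the binary search, O(log n)."""
    return bisect.bisect_right(arr, key)


print(select(keys, 6), select(keys, 1), select(keys, len(keys)))  # 23 3 36
print(predecessor(keys, 5), successor(keys, 5))                    # 17 30
print(rank(keys, 23), rank(keys, 21))                              # 6 5
print(*keys)  # sorted output is one left-to-right scan: 3 6 10 11 17 23 30 36
```

Note that `rank(keys, 21)` works even though 21 isn't stored: the binary search stops between the 17 and the 23, as described above.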
Could you really be so greedy as to want still more from our data structure? Well, yeah, certainly. We definitely want more than just what we have on the slide, because these are operations on a static data set, which is not changing over time. But the world in general is dynamic. For example, if you are running a company and keeping track of the employees, sometimes you get new employees and sometimes employees leave. That is, we want a data structure that supports not only these kinds of operations but also insertions and deletions. Now of course, it's not that it's impossible to implement insert or delete in a sorted array; it's just that they're going to run way too slowly. In general, you have to copy over a linear amount of stuff on an insertion or deletion if you want to maintain the sorted array property. This linear-time performance for insertion and deletion is unacceptable unless you barely ever do those operations.

So the raison d'être of the balanced binary search tree is to support exactly the same rich set of operations as a sorted array, plus insertions and deletions. Now, a few of these operations won't be quite as fast; we have to give up a little bit, going from constant time to logarithmic time. But we still get logarithmic time for all of these operations, linear time for outputting the elements in sorted order, and, in addition, we'll be able to insert and delete in logarithmic time. Let me spell that out in a little more detail. A balanced binary search tree will act like a sorted array plus fast, meaning logarithmic-time, inserts and deletes. So let's go through all of those operations. Search is going to run in O(log n) time, just like before. Select runs in constant time in a sorted array, and here it's going to take logarithmic time, so we give up a little bit on the selection problem, but we'll still be able to do it quite quickly.
Even for the special cases of finding the minimum or finding the maximum in our data structure, we're going to need logarithmic time in general. The same goes for finding predecessors and successors: they're no longer constant time; they go up to logarithmic. Rank took logarithmic time even in the sorted array version, and that will remain logarithmic here. As we'll see, we lose essentially nothing relative to the sorted array if we want to output the key values in sorted order, say from smallest to largest. And crucially, we have two more fast operations compared to the sorted array data structure. We can insert stuff, so if you hire a new employee, you can insert them into your data structure. If an employee decides to leave, you can remove them from the data structure. You do not have to spend linear time like you did for the sorted array; you only have to spend logarithmic time, where, as always, n is the number of keys being stored in the data structure.

So the key takeaway here is that if you have data whose keys come from a totally ordered set, like, say, numeric keys, then a balanced binary search tree supports a very rich collection of operations. So if you anticipate doing a lot of different processing using the ordering information of these keys, then you really might want to consider a balanced binary search tree to maintain them. One thing to keep in mind, though, is that we have seen a couple of other data structures that don't do quite as much as balanced binary search trees, but what they do, they do better. We just discussed one on the last slide: the sorted array. If you have a static data set and you don't need inserts and deletes, then by all means, don't bother with a balanced binary search tree; use a sorted array, because it will do everything super fast. But we also saw two dynamic data structures that don't do as much, but what they do, they do very well.
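Before turning to those structures, here is a compact sketch of a balanced search tree offering the operations listed above. Several choices are assumptions of this sketch rather than anything stated in the transcript: it uses AVL rebalancing (rotations that keep the heights of every node's two child subtrees within one of each other) only because it is short to write, which is not necessarily the balancing scheme this series goes on to build; it caches each subtree's size in its node, which is what lets `select` and `rank` follow a single root-to-leaf path instead of scanning; it assumes distinct keys; and all function names are hypothetical. `insert` and `delete` return the (possibly new) root.

```python
from typing import Optional

class Node:
    def __init__(self, key: int) -> None:
        self.key, self.left, self.right = key, None, None
        self.height = self.size = 1  # cached subtree height (AVL) and key count (select/rank)

def _h(n: Optional[Node]) -> int:
    return n.height if n else 0

def size(n: Optional[Node]) -> int:
    return n.size if n else 0

def _update(n: Node) -> Node:
    n.height = 1 + max(_h(n.left), _h(n.right))  # recompute cached fields from children
    n.size = 1 + size(n.left) + size(n.right)
    return n

def _rotate_right(y: Node) -> Node:
    x = y.left
    y.left, x.right = x.right, y
    _update(y)
    return _update(x)

def _rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    _update(x)
    return _update(y)

def _rebalance(n: Node) -> Node:
    # AVL invariant: the two child subtrees' heights differ by at most one.
    _update(n)
    if _h(n.left) > _h(n.right) + 1:
        if _h(n.left.left) < _h(n.left.right):
            n.left = _rotate_left(n.left)  # left-right case
        return _rotate_right(n)
    if _h(n.right) > _h(n.left) + 1:
        if _h(n.right.right) < _h(n.right.left):
            n.right = _rotate_right(n.right)  # right-left case
        return _rotate_left(n)
    return n

def insert(n: Optional[Node], key: int) -> Node:
    """Return the root after inserting key; O(log n)."""
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    elif key > n.key:
        n.right = insert(n.right, key)
    return _rebalance(n)  # an equal key is already stored, so nothing changes

def delete(n: Optional[Node], key: int) -> Optional[Node]:
    """Return the root after deleting key (a no-op if absent); O(log n)."""
    if n is None:
        return None
    if key < n.key:
        n.left = delete(n.left, key)
    elif key > n.key:
        n.right = delete(n.right, key)
    elif n.left is None or n.right is None:
        return n.left or n.right  # at most one child: splice the node out
    else:
        m = n.right  # two children: copy in the successor's key, then delete it
        while m.left is not None:
            m = m.left
        n.key = m.key
        n.right = delete(n.right, m.key)
    return _rebalance(n)

def search(n: Optional[Node], key: int) -> bool:
    while n is not None and n.key != key:
        n = n.left if key < n.key else n.right
    return n is not None

def select(n: Optional[Node], i: int) -> int:
    """The i-th smallest key, 1-based: select(root, 1) is the minimum."""
    while n is not None:
        left = size(n.left)
        if i <= left:
            n = n.left
        elif i == left + 1:
            return n.key
        else:
            i, n = i - left - 1, n.right
    raise IndexError("order statistic out of range")

def rank(n: Optional[Node], key: int) -> int:
    """Number of stored keys <= key."""
    r = 0
    while n is not None:
        if key < n.key:
            n = n.left
        else:
            r += size(n.left) + 1  # n and its whole left subtree are <= key
            n = n.right
    return r
```

For example, after inserting the hypothetical keys 3, 6, 10, 11, 17, 23, 30, 36 in any order, `select(root, 6)` is 23 and `rank(root, 21)` is 5, just as in the sorted array. Predecessor and successor of a stored key k follow from the other two operations: `select(root, rank(root, k) - 1)` and `select(root, rank(root, k) + 1)`, when those positions exist. An in-order traversal (left subtree, node, right subtree) outputs the keys in sorted order in linear time.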
First, we saw the heap. What the heap is good for is this: it's just as dynamic as a search tree, allowing insertions and deletions, both in logarithmic time. In addition, it keeps track of the minimum element or the maximum element. Remember, in a heap you can choose whether to keep track of the minimum or the maximum, but unlike a search tree, a heap does not simultaneously keep track of both. So if you just need those three operations, insertions, deletions, and remembering the smallest, as would be the case, for example, in a priority queue or scheduling application as discussed in the heap videos, then a binary search tree is overkill, and you might want to consider a heap instead. The benefits of a heap don't show up in big-O notation, since both have logarithmic operation times, but the constant factors, in both space and time, are going to be better with a heap than with a balanced binary search tree.

The other dynamic data structure that we discussed is the hash table. What hash tables are really, really good at is handling insertions and searches, that is, lookups. Sometimes, depending on the implementation, they also handle deletions really well. So if you don't actually need to remember things like minima, maxima, or ordering information on the keys, and you just have to remember what's there and what's not, then the data structure of choice is definitely the hash table, not the balanced binary search tree. Again, the balanced binary search tree would be fine, giving you logarithmic lookup time, but it's kind of overkill for the problem. All you need is fast lookups, and, recall, a hash table gives you constant-time lookups, so that will be a noticeable win over the balanced binary search tree. But if you want a very rich set of operations for processing your data, then the balanced binary search tree could be the optimal data structure for your needs.
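The heap-versus-hash-table trade-off can be seen with Python's standard library, used here as a stand-in for the structures from the earlier videos rather than as their exact implementations: `heapq` maintains a binary min-heap inside a plain list, and the built-in `set` is a hash table. The priorities and employee IDs are made-up values.

```python
import heapq

# Priority-queue workload (hypothetical priorities): insert, and extract the minimum.
jobs = []
for priority in [7, 2, 9, 4]:
    heapq.heappush(jobs, priority)   # O(log n) insertion
print(jobs[0])                       # 2: the minimum sits at index 0, O(1) to peek
print(heapq.heappop(jobs))           # 2: O(log n) extract-min
print(jobs[0])                       # 4: the next smallest moves to the front
# The maximum is not tracked: max(jobs) would be a linear scan.

# Lookup-only workload (hypothetical employee IDs): a hash-based set.
employee_ids = {1042, 2077, 3310}
employee_ids.add(4096)               # insertion, average-case O(1)
employee_ids.discard(2077)           # deletion; no error if the ID is absent
print(3310 in employee_ids, 2077 in employee_ids)  # True False
```

Neither structure answers ordered queries such as rank or select cheaply, which is the gap the balanced search tree fills.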