Tip:
Highlight text to annotate it
X
The topic of this lecture is balanced search trees. And as we have been discussing that
in order to make search trees effective, we must try and get the worst case height to
be log n where n is the number of key values and nodes which are stored in the search tree.
And we mentioned among binary search trees, the AVL tree or the red black tree or the
types of data structures where the basic operation after every insertion or deletion is reorganization
or process of balancing.
Now search trees can be other types also, you just need not have 2, you can have may
be, you could have 3 children or you could have 4 children or may be you could have 10
children. Why you necessarily should have two children to do a search tree? Because
if you had got say 3 children then the height would be log 3 to the base n. See if you have
got 2 children, it is height worst case log 2 to the base n. if you have 3 then it is
log 3 to the base n but then to take sorry, log n to the base 3 but to take any decision
in a binary search tree, you have to make one comparison and you can move left or right.
But in a 3 array tree you have to make 2 comparisons because in one comparison you will know decide
exactly which one to go into, you have to make two comparisons. One is what is the value
whether you will go this side or whether you will go this side or whether you will go this
side. And here in a binary search tree, what did you store? You stored a node which was
say 5 then all less than 5 were stored here, all greater than 5 were stored here.
But if you have got 3 then you will have to store 2 values say 7 and 9. This means that
all less than 7 will be this side, greater than 7 and less than 9 will be this side and
greater than 9 will be this side. Otherwise you will not be able to make a 3 array effective
because you have to decide which one to go into. so you have to, your height will be
less but comparison at every node will be more and both ways you will reach that you
can show that more or less you are going to reach in a similar situation. The exact analysis
is left for you.
Now in 2 3 trees or similar to the tournament style data structures which we did, in the
binary search tree here say 2 and 7 the keys are stored at the nodes but if you recall
the tournament data structures the actual node was stored at the leaf and only comparison
values say for example 5 7 1 3 9 11 then you compare these two and store the largest here,
compare these two and store the largest here, compare these two and store the largest here.
It was like this which we stored the tournament. We could store it in an array but we stored
it something like this, all right. That is the nodes were at the leaf and these contained
internal comparison operations.
The 2 3 tree is a similar structure where the nodes are at the leaves and internally
you make a comparison and store it but it is not like a tournament because at the leaves
they may be unordered in a tournament. In a 2-3 tree, the leaf will also be at the ordered.
We will slowly develop what is a 2-3 tree. additionally a 2 3 tree can have either 2
or 3 children, a internal comparison node, these are the leaf nodes, a comparison, these
are the internal nodes, non leaf nodes.
A non-leaf node can have 2 or 3 children, all right. It cannot have 1 child or it cannot
have more than 4 children. And what do these comparison nodes store? So let us have a look
at two examples. These are two 2-3 search trees. The value here is 1 2 3 4 5 6 and every
internal node, these are the internal non leaf nodes and these are the leaf nodes. You
need not take notes whatever material I have given you, I am following exactly that you
just listen. It is given exactly in that book.
These are the non-leaf nodes and these are the leaf nodes. Now a non-leaf node can have
2 or 3 children, all of them can be 2, all of them can be 3, some can be 2, some can
be 3 and they store the comparison of these elements. Now what is stored here? At every
internal node, let us say for the time being we have got two values. In that classical
2 3 trees there are two values, one is called the l value, one is called the m value. The
l value is the largest leaf node.
So there is a left, there is a middle, there is a right. So the structure of a node is
like this. Node, node can be leaf or non leaf, all right. It can be leaf or non leaf. If
it is a non leaf, it will have three pointers left, middle and right and it will have two
values l value and m value, all right. The l value is the largest element along the left
pointer, the m value is the largest element along the middle pointer. so at this node
the largest the l value, there is no right pointer, the right pointer is null, so the
largest value here is 1, the largest value in the middle pointer is 2, clear.
At this node the largest value in the left pointer is 3, the largest value of the middle
pointer is 4, here it is 5, here it is 6. Here the largest value at the left pointer
is 2, so 2 is l here, here it is 4 so 4 is here all right. So let us take another example.
Here there are 3 children, 2 children, 2 children and this is also a 2 3 tree, here l is 1,
l value is 1, m value is 2. Here l value is 4, m value is 5, here l value is 3, it is
the largest on the left and m value is… Where did I get 6? 5, its 5 because it is
the largest on the right, okay. And if you just reverse it like this and traverse the
leaf node from left to right that is you do a traversal similar to left to right traversal,
you will get all the nodes in printed order. Move left you will get the minimum, move totally
right you will get the maximum and so on. The height of this tree will be between log
n to the base 3 in the best case and log n to the base 2 in the worst case.
Because in a binary tree if all of them are 2, if there are n nodes the height will be
log n. should I prove that? If there are 2 nodes, the height will be 1. In a binary tree
if there are 4 nodes, the height will be 2, if there are 8 nodes, the height will be 3
and so on. and if you take a 3 array tree for 3 nodes height will be 1, for 9 nodes
height will be 2 up to 9, 4 to 9 then 5 to 27 height will be 3 so on.
So the height of a 2-3 tree will lie between. so my operation to find maximum, find minimum
and even the operations for moving right and moving left or insert and delete will all
of them will be log n. In fact you can make them order n by maintaining the maximum and
the minimum pointers which can be updated during insertion and deletion. Even in a simple
binary search tree, you can make them order 1 but you will every insertion or every deletion
which inserts or delete maximum element, you will have to update it.
Now let us see what we do. First let us see the definition, okay. A 2-3 tree is a tree
in which each non leaf vertex has 2 or 3 sons or children what ever and every path from
root to leaf is of the same length.
Here note that every path from root to leaf is of the same height in a 2-3 tree unlike
in a binary search tree where the height may be different, even in AVL tree there may be
differences in height. Here the height like in a tournament from root to leaf will be
same, will be exactly same for all the nodes okay. Is that okay?
Now we will see what are the operations for find, what is the algorithm for find? Find
max, min, insert, union, intersection can be easily done also. First let us see find.
Suppose you are asked to find 5. What will you do? Come here find 5 at this root node,
check with the l value. If this value is less than equal to the l value, you move on the
left pointer else if it is less than equal to the middle value you move along the middle
pointer else you move along the right pointer. See here you will move along the middle pointer,
come here move along the right pointer and reach sorry, left pointer. If it is 12 say
if it is 4, it will come here, you will move along the middle pointer, come here move here,
once you reach a leaf node and it’s not the value that you expect, you will say does
not exist.
So the find operation will be the height of the tree which is log of the number of nodes
because this tree, the height of all nodes will be log of n either to the base between
base 2 and base 3, so it is or order log n. the maximum and minimum, obviously you move
left and right you can get it that is no problem. The interesting part comes in insertion like
in the earlier case, suppose we have to insert an element 2. So first thing what will you
do in insert? Find, find 2 because if it exists it will not do anything or if you are maintaining
duplicates, you will add the counter value. So find operation like we mentioned in the
previous case will return the parent, in order to insert the find operation was returning
the parent of the node where we wanted to insert.
So here it will return this and then there will be 2 or 3 children. I have taken a case
where there are only 2 children, 3 is a little more complex because 3 will, if you want to
insert, it will make it 4. Isn’t it? So if there are 2 children there cannot be 1
child, by definition there cannot be 1 child, there can be 2 or 3 children. In this case
there are 2 children, you want to insert 2 so you will reorder the pointer and then place
it in the proper position. So what would you do? You would place 2 here, make this the
middle, make this the right. Anything else? Update this value, go up and see if this value
needs to be updated. Here this value does not need to be updated, okay.
So this is what will happen after insertion of 2. Suppose you want to insert 10. Then
what would have happened? Move here, move here, 10. Nothing happens here, nothing happens
here, you are storing maximum in the left, maximum in the middle. Suppose you store the
maximum on the right also then you would have stored, you would have to update 10 but this
will be sufficient.
Now suppose you want to insert 4, yes. That is the tough one. So let’s go back to our
original situation and try and insert 4 because we have already inserted many. So let’s
insert 4. To insert 4 we will find 4, it will come here. Isn’t it? And we will try and
insert 4 and immediately you cannot put the fourth child. So to put a fourth child, you
will end up in a problem. So what will you do? Okay. Suppose, this part is not there,
suppose this part is not there that should simplify our problem. Suppose this is not there and you want to insert 4.
So here 4 will get, what will you do, can you tell me, guess.
You will have to split this node into 2 nodes, you have to make one more node, all right.
So you will make one more node and here you will keep 4 and 5 and here you will keep 6
and 7. Okay, so this will be 6 7, this will be sorry, 4 will be the left and 4 will be
here and 5 will be here. Now left and right let us ensure and then this will be inserted
here and made the…and this value will become now 5, then what will happen.
So when you go to insert, the first rule is when you go to insert a node and you see that
there are only 2 children its fine. But as soon as you cross 2 children, you have to
create another node and redistribute the children 2 2, you cannot distribute 1 3, you have to
distribute 2 2. And then you have to insert now this node above, you have to insert this
node above. Is that clear? Here it was simple, this inserting this node above was simple
but it may not be always simple because when you try to insert this node then again 4 pointers
may be there, then again you split that, create another new node and move up. That is what
will happen in this original case.
Let us redraw this original case; I think I have it here. That is here if you want to
insert 4, here you want to insert 4, okay is it visible. So again originally we will
find 4 here. So to insert 4, we will split this node into 2 nodes, push this here, push
this here, push this here, this will become 4: 5 6:7, this is first step. So insert 4,
insert 4 in this pointer. Insert this node in this pointer, so after doing this you have
created a new node. So recursively you will go from this parent to this parent and you
will insert this node here. Now in the previous case there were only 2 children like this,
so inserting was very simple but now to insert, you have to again split, create a new node,
put this here, this will be 7 9.
Now you have ended up at the root and now you have got 2 roots, so you have to create
a root. If you end up with 2 nodes and you end at the root then obviously you have to
create a root and this will be now, what will be this value? This is no longer 7 now sorry,
this is 5, this will be 5 9. This side actually 6 and 7 are moved this. Maintaining two values,
the programming effort may be a bit tedious, if you maintain all the 3 values then the
maximum in the left, the maximum in the middle, the maximum in the right then your programming
task will be easier because l value at a node is the largest value of its left side, m value
at a node is the largest value of the middle child and largest value of a node is the largest
value of its right child.
So if you maintain all the 3 values, then your programming task will become easier.
So what is the basic insertion routine? Insertion first will be p equal to find. Given a value,
we will find the parent. The pointer to the parent right, then you will write add this
k which is a pointer to this node to parent p. This is what you will do. Now come to the
add routine which adds into p, which is a pointer to a node, another node. Let us assume
these are two pointers okay and you are trying to add. Then what is case 1? p has 2 children,
this case is easily solved, just put it there update the values. So add p and update values,
obviously update values is there.
These are the l value, m value, r value updating has to be done. I am not going into that let
us first understand this. And what is case 2? p has 3 children, this case is slightly
more involved. What are, what is to be done in this case? Create a node, create a new
node. Second is redistribute, create a node let us call it x, redistribute the children.
And third can anybody tell me what is the third step? Add, this is the overall approach.
Add what? x into p, parent. Isn’t it? Add this to the parent f, add this new node to
the parent and recursively this is the overall structure.
Yeah, you have to now, maintain the parent pointer that becomes obvious that you have
to be able to somehow reach that parent. So let us assume that parent pointers are there.
There is no problem in creating a parent pointer, what is there. Initially make all your data
structures required efficiently to move up and down then you can become more stingy on
space but initially there is no need to be stingy on space. That is why even in a 2-3
my suggestion is though 2 values are kept as intermediate values, it is always advisable
to keep all the 3 values.
Well, maximum at the left, maximum at the middle and the maximum at the right, it helps
you to compute all these 3 values very easily and maintain all parent pointers and children
pointers. So assume they are all there, you can do it in log n times without that also
but in a practical scenario may be its fine both of them are okay. So this is adding and
let us note that for adding, you have moved once down and you will move once up only.
So you will take log n time to insert a node. So this is the basic structure of insertion
okay. This is not the exact code, other than this when you complete adding, you have to
check up another thing.
If now there are root has to be created or not. New root, if new root has to be created
then you have to create a new root, that’s all. Clear? So this is the overall algorithm
for insertion. There is no rotation nothing here involved, its just a purposely showing
this to you to give you an idea that all search trees need not be binary search trees with
rotation and reflection, another that this 2-3 structure of trees specializes to other
trees which are called b trees and b plus trees and they are used when you have to store
in data bases where these are stored in disk, rather we are discussing all these we just
stored in the memory.
In the data structure files have to be stored, huge huge records have to be stored in the
disk and that point of time modified versions of these are used. They are called b trees,
b plus trees but the algorithms are very similar. So this will help you if you ever go to read
a database problem in file handling in data bases, there you will see the structure of
the data structures which they use are like this. The difference we are now working on
the memory, all these are stored in the memory where everything is stored in the disk. So
there are certain other issues but the overall philosophy is the same.
So having seen insert, max, min, is easy, let us have a quick look at delete. I have
made a different tree now, forget it lets meet these 2 trees. Suppose it is this one,
so this is also 2 3 tree 1 3 5 6 7 8 1. Let us store for the time being all the three
values, all right. It will be easier for us to work. This is null, this value will be
the maximum of this, it’s very easy to compute if you have all the 3 and this doesn’t make
any sense because this is null so anything can be stored there. This is null, so whatever
is stored there I don’t care.
Suppose I have to delete any of these elements, it is very simple because if there are 3 children,
deleting is no problem. If you have to delete 1, you just remove it, this becomes left,
this becomes middle. So if you have to delete this one, this becomes if this is null then
the maximum still remains 5. If you have to delete 8 but the problem will come now if
you want to delete 3 or you want to delete anything, it will violate the principle. It
will violate the principle of… so what will you do? Suppose you have to delete 3, so the
first thing is find. In any delete operation first find it and then remove it.
So if you find it and you see there are 3 children that is case 1. In the previous case,
case 1 the simple case was 2 children. Now the simple case is 3 children. If they are
3 children problem is solved but if they are 2 children, what will you do? Give me a guess.
Suppose 3 is removed then what will you do? Then merge these two, the previous one was
split, this one will be merge. But when you go to merge, you have to see, before you merge
you have to check if this has 3 then will you merge? Just move one fellow from here.
So when this is 1 you check what is called the right sibling, right brother okay. And
once you check the right brother, if the right brother has got 3 then you move the left most
and make it the middle and make this the left and you just update the values. But if the
right brother has 2 then you move and make this the left and remove this okay and then again remove this node from
here.
So once this node gets removed, this again becomes the pathological case of one just
throw it away. Got it? We will take 2 examples to be more convenient. So let us delete 3.
let us delete 3, if I delete 3, so delete n on the p. so this is p, this is n say k,
so delete k in p. there are 2 cases, in case 1 this does not hold I am taking the bad case.
Case 2, so check with this side, check with its right sibling. If it is the right most
node then check with the left sibling. You may have deleted 9, so you cannot check with
the right sibling this side, you have to check with the left sibling. Here there are 3, so
in this case just move this fellow here, update this to 5, update this to 5, problem is solved.
Now if you want to delete 6, fine that’s interesting. Now suppose you want to delete
6, so delete this, so you have to find 6. Delete this in this node right, look at it
right brother it has got two. Since it has got two, you will push this one here. This
will be 7 8 9 and then you will delete so this node has got no children. Now once it
has got no children, you will delete this node in this node recursively like you have
added, you will delete p in p pointer dot parent.
Yeah, yeah, you could have shifted this side also, you can choose. Those cases you can
choose all will work, there is no problem all of these cases will work. Similarly you
can try this one and in this one, we delete anyone, you delete say 3, this will shift
here. Okay. So this will shift, 1 will shift here, 3 will get deleted, this will have to
get deleted. To delete this, this will shift here, this will get deleted. Now once this
gets deleted, the root gets deleted, so you reduce the height by 1, like you increase
the height at certain times you will reduce the height at certain times. But all these
operations move once down to find and once up. So, log n plus log n which will give you
log n time. So I will just try to explain this data structure which is interesting in
its own way.
The delete function I think you should sit down and write it out. The delete function
will be similar to the add function. Unfortunately as it always happens, the delete function
is not given in the nodes. Anyway delete a node k in p, all right. So here what is case
1? p has 3 children. This matter is solved easily. case 2: p has 2 children and their
first point is redistribute to right or left according to your choice and then once you
redistribute, if you see p becomes empty that is no children. If p has no children then
what do you do? Delete p in… it has 2 children, now if it is right… it says that p will
have to become empty, no no if your right brother has got 3 children then you can just
take one and push it here.
If it’s right brother has got 3, you just take one and push it here then p will not
become empty. But if your right brother has got only 2 then p will become empty. So that
way and then you delete. So this is how, this is the overall structure for delete, you can
work it out. And here in a 2 3 tree all operations will take log n time. So this is to give you
another flavor of a search tree and we will conclude our discussion on search trees and
balanced search trees with this, just to give you an important idea because this idea is
still one of the most important ideas in data structuring, specially when you have got ordered
elements. That is the use of balanced search trees. So you must be, you must use it in
appropriate place, you should not use a cannon to kill a mouse but you must appropriately
use balanced search trees.