Data Structures - Binary tree

In our previous lesson, we introduced you to tree data structure. We discussed tree as a logical model and talked briefly about some of the applications of tree. Now, in this lesson we will talk a little bit more about binary trees. As we had seen in our previous lesson, binary tree is a tree with this property that each node in the tree can have at most 2 children. We will fist talk about some general properties of binary tree and then we can discuss some special kind of binary trees like binary search tree which is a really efficient structure for storing ordered data. In a binary tree as we were saying, each node can have at most 2 children. In this tree that I have drawn here, nodes have either 0 or 2 children. We could have a node with just one child. I have added one more node here and now we have a node with just one child. Because each node in a binary tree can have at most 2 children, we call one of the children left child and another right child. For the root node, this particular node is left child and this one is right child. A node may have both left and right child. These 4 nodes have both left and right child or a node can have either of left and right child. This one has got a left child, but has not got right child. I'll add one more node here. Now this node has a right child, but does not have a left child. In a program, we would set the reference or pointer to left child as NULL. So, we can say that for this node left child is NULL and similarly for this node, we can say that the right child is NULL. For all the other nodes that do not have children, that are leaf nodes, a node with 0 child is called leaf node. For all these nodes, we can say that both left and right child are NULL. Based on properties, we classify binary trees into different types. I'll draw some more binary trees here. If a tree has just one node, then also its a binary tree. This structure is also a binary tree. This is also a binary tree. Remember, the only condition is that a node cannot have more than 2 children. A binary tree is called strict binary tree or proper binary tree if each node can have either 2 or 0 children. This tree that I am showing here is not a strict binary tree because we have 2 nodes that have one child. I'll get rid of 2 nodes and now this is a strict binary tree. We call a binary tree complete binary tree if all levels except possibly the last level are completely filled and all nodes are as far left as possible. All levels except possibly the last level will anyway be filled. So, the nodes at the last level, if its not filled completely must be as far left as possible. Right now this tree is not a complete binary tree. Nodes at same depth can be called nodes at same level. Root node in a tree has depth 0. Depth of a node is defined as length of path from root to that nodes. In this figure lets say nodes at depth 0 are nodes at level 0. I can simply say L-0 for level 0. Now, these two nodes are at level 1, these 4 nodes are at level 2 and finally these 2 nodes are at level 3. The maximum depth of any node in the tree is 3. Maximum depth of a tree is also equal to height of the tree. if we will go numbering all the levels in the tree like L-0, L-1, L-2 and so on, then the maximum number of nodes that we can have at some level i, will be equal to 2 to the power i. At level 0, we can have 1 node, 2 to the power 0 is 1. Then at level 1, we can have at max 2 nodes. At level 2 , we can have 2 to the power 2 nodes at max which is 4. So, in general at any level i, we can have at max 2 to the power i nodes. You should be able to see this very clearly. Because each node can have 2 children, so if we have x nodes at a level, then each of these x nodes can have 2 children. So, at next level, we can have at most 2x children. Here in this binary tree, we have 4 nodes at level 2 which is the maximum for level 2. Now, each of these nodes can possibly have 2 children. I am just drawing the arrows here. So, at level 3, we can have max 2 times 4 i.e 8 nodes. Now, for a complete binary tree, all the levels have to be completely filled. We can give exception to the last level or deepest level. It doesn't have to be full. But the nodes have to be as left as possible. This particular tree that I am showing here is not a complete binary tree because we have 2 vacant node positions in left here. I'll do slight change in this structure. Now this is a complete binary tree. We can have more nodes at level 3, but there should not be a vacant position left. I have added one more node here and this still is a complete binary tree. if all the levels are completely filled, such a binary tree can also be called perfect binary tree. In a perfect binary tree, all levels will be completely filled. If h is the height of a perfect binary tree, remember height of a binary tree is length of longest path between root to any of the leaf nodes or i should say number of edges in longest path from root to any of the leaf nodes. Height of a binary tree will also be equal to max depth, Here, for this binary tree, max depth is 3. Maximum number of nodes in a tree with height h will be equal to, we will have 2 to the power 0 nodes at level 0, 2 to the power 1 nodes at level 1 and we'll go on summing for height h, we will go till 2 to the power h. At deepest level, we will have 2 to the power h nodes. Now this will be equal to 2 to the power h plus 1 minus 1. h+1 is number of levels here. We can say that 2 to the power number of levels minus 1. In this tree, number of levels is 4. We have L0 till L3. So, number of nodes, maximum number of nodes will be 2 to the power 4 minus 1 which is 15. So, a perfect binary tree will have maximum number of nodes possible for a height because all levels will be completely filled. Well, I should say maximum number of nodes in a binary tree with height h. Ok, I can ask you this also. What will be height of a perfect binary tree with N nodes. Lets say N is number of nodes in a perfect binary tree. To find out height, we'll have to solve this equation n = 2^h+1 - 1 because if height is h, number of nodes will be 2 to the power (h+1) minus 1. We can solve this equation and the result will be this. Remember n is number of nodes here. I'll leave the maths for you to understand. Height will be equal to log (n+1) to the base 2 minus 1. In this perfect binary tree that I am showing here, number of nodes is 15. So, n is 15. (n+1) will be 16. So, h will be log 16 to the base 2 minus 1. log 16 to the base 2 will be 4. So, the final value will be 4-1 equal 3. In general, for a complete binary tree, we can also calculate height as floor of log n to the base 2. So, we need to take integral part of log n to the base 2. Perfect binary tree is also a complete binary tree. Here n is 15. log of 15 to base 2 is 3.906891. if we'll take the integral part, then this will be 3. I'll not go into proof of how height of complete binary tree will be log n to the base 2. We'll try to see that later. All this maths will be really helpful when we will analyze cost of various operations on binary tree. Cost of a lot of operations on tree in terms of time depends upon the height of tree. For example, in binary search tree which is a special kind of binary tree, the cost of searching, inserting or removing an element in terms of time is proportional to the height of tree. So, in such case we would want the height of the tree to be less. Height of a tree will be less if the tree will be dense, If the tree will be close to a perfect binary tree or a complete binary tree. Minimum height of a tree with n nodes can be log n to the base 2 when the tree will be a complete binary tree. if we will have an arrangement like this, then the tree will have maximum height. With n nodes, minimum height possible is floor of or integral part of log n to the base 2 and maximum height possible with n nodes in n-1 when we will have a sparse tree like this which is as good as a linked list. Now, think about this. If I'm saying that time taken for an operation is proportional to height of the tree or in other words I can say that if time complexity of an operation is big-oh of h where h is height of the binary tree, then for a complete or perfect binary tree, my time complexity will be big-oh of log n to the base 2 and in worst case for this sparse tree, my time complexity will be big-oh of n. Order of log n is almost best running time possible. For n as high as 2 to the power 100, log n to the base 2 is just 100. With order of n running time, if n will be 2 to the power 100, we won't be able to finish our computation in years even with most powerful machines ever made. So, here is the thing. Quite often, we want to keep the height of a binary tree minimum possible or most commonly, we say that we try to keep a binary tree balanced. We call a binary tree balanced binary tree, if for each node, the difference between height of left and right sub-tree is not more than some number k. Mostly k would be 1. So, we can say that for each node, difference between height of left and right sub-tree should not be more than 1. There is something that I want to talk about height of a tree. We had defined height earlier as number of edges in longest path from root to a leaf. Height of a tree with just one node where the node itself will be a leaf node will be 0. We can define an empty tree as a tree with no node and we can say that height of an empty tree is -1. So, height of tree with just one node is 0 and height of an empty tree is -1. Quite often, people calculate height as number of nodes in longest path from root to a leaf. In this figure, I have drawn one of the longest paths from root to a leaf. We have 3 edges in this path. So, the height is 3. If we will count number of nodes in the path, height will be 4. This looks very intuitive and I have seen this definition of height at lot of places. if we will count the nodes, height of tree with just one node will be equal to 1 and then we can say that height of an empty tree will be 0, but this is not the correct definition and we are not going to use this assumption. We are going to say that height of an empty tree is -1 and height of tree with one node is 0. The difference between heights of left and right sub-trees of a node can be calculated as absolute value of height of left subtree minus height of right subtree and in this calculation, height of a sub-tree can be -1 also. For this leaf node here in this figure, both left and right sub-trees are empty, so both hleft or height of left sub-tree and hright or height of right sub-tree will be -1, but the difference overall will be 0. For all nodes in a perfect tree, difference will be 0. I have got rid of some nodes in this tree and now by the side of each node, I have written the value of diff. This is still a balanced binary tree because the maximum diff for any node is 1. Lets get rid of some more nodes in this tree and now this is not balanced because one of the nodes has diff 2. For this particular node, height of left sub-tree is 1 and height of right sub-tree is -1 because right sub-tree is empty. So, the absolute value of difference is 2. We try to keep a tree balanced to make sure its dense and its height is minimized. If height is minimized, cost of various operations that depend upon height are minimized. Ok, the next thing that I want to talk about very briefly is how we can store binary trees in memory. One of the ways that we had seen in our previous lesson which is most commonly used is dynamically created nodes linked to each other using pointers or references. For a binary tree of integers, in C or C++, we can define node like this - data type here is integer, so we have a field to store data and we have two pointer variables, one to store address of left child and another to store address of right child. This of course is the most common way. Nodes dynamically created at random locations in memory linked together through pointers, but in some special cases, we use arrays also. Arrays are typically used for complete binary trees. I have drawn a perfect binary tree here. Lets say this is a tree of integers. What we can do is, we can number these nodes from 0 starting at root and going level by level from left to right. So, we will go like 0, 1 , 2 , 3 , 4, 5 and 6. Now, I can create an array of 7 integers and these numbers can used as indices for these nodes. So, at 0th position, I'll fill 2, at 1th position I'll fill 4, at 2th position, we will have 1 and I'll go on like this. We have filled in all the data in the array, but how will we store that information about the links. How will we know that the left child of root has value 4 and the right child of root has value 1? Well, in case of complete binary tree, if we will number the nodes like this that for a node at index i, the index of left child will be 2*i+1 and the index of right child will be 2i+2 and remember this is true only for a complete binary tree. For 0, left child is - 2i+1 for i = 0 will be 1 and 2i+2 will be 2. Now, for 1, left child is at index 3, right child is at index 4. For i =2, 2i+1 will be 5 and 2i+2 will be 6. We will discuss array implementation in detail when we will talk about a special kind of binary tree called heap. Arrays are used to implement heaps. I'll stop here now. In our next lesson, we will talk about binary search tree which is also a special kind of binary tree that gives us a really efficient storing structure in which we can search something quickly as well as update it quickly. This is it for this lesson. Thanks for watching !!