Lec - 9 pipeline concept - Iii

Now let us see that how to take care of collision and what should be your scheduling strategy in a generalized pipeline, so that collision does not occur. Now we will concentrate on collision avoidance. So before I go for this collision avoidance, let me introduce few terms, a term which is called latency. Latency is the time difference or the time step difference between successive initiation of two tasks. So if I initiate a task at the time t0 and if I find that I can initiate another task at time t5 and this initiation does not lead to any collision. Collision can be at any of the computation stages. It can be at computation stage s1, at computation stage s2 or s3 but I have to find out that I have to initiate the tasks in such a way that it does not lead to collision at any of the computation stages. By analyzing, if I find out that I can initiate a task and if I initiate a task at t0 that is the first task, the second task I can initiate only at t5 nowhere in-between. Then the latency is the time step difference between initiation of these two tasks. So in this case, the latency will be 5 or 5 times steps. So once I defined this latency, I can also define what is called a latency sequence. A latency sequence is the sequence of latencies of successive initiation of tasks. It may so happen that initially I find that I can initiate the second task with latency 5 but it is also possible that the next task that I can initiate after say 2 time steps. That is at t0 I initiate one task, at t5 I initiate the second task then at t7 I can initiate the third task. So in that case the latency sequence becomes 5, 2. So it is the sequence of latencies of initiation of successive tasks. I can also define what is called a latency cycle. A latency cycle is a latency sequence which repeats that means if I have a sequence like this say 5, 2, 5, 2, 5, 2 like this then this 5, 2 these becomes a latency cycle or I can say 2, 5 is a latency cycle because that is a sequence which repeats. Of course I have to ensure that if I follow this latency cycle for initiation of tasks, this does not lead to any collision. For considering this collision avoidance, I consider another reservation table. Let me assume that I have a pipeline with 5 computation stages. So let us say competition stages 1, 2, 3, 4 and 5. For computation of a particular task, a particular function it makes use of say 9 time steps which are from t0 t1 t2 t3 t4 t5 t6 t7 and t8. So my reservation table appears like this. So this is the reservation table for computation of a particular task on a particular pipeline. For simplicity, I will assume that the pipeline is uni-function pipeline that means only one function can be computed on that pipeline. However the concept that I am discussing can also be generalized for a multi function pipeline like the one that we have just discussed where the pipeline can compute two functions a and b with two defined reservation tables. Now even if the pipeline is a uni function pipeline then also the collision can occur. So here at time t0, I initiate a task. Then at time t8, I cannot initiate another task because during time step t8, the first task that was initiated at t0 makes use of the computation stage one. So during time t8, if I initiate the second task then this t8 is nothing but t0 of the second task. So the second task will also make use of the same computation stage one during the same time instant which leads to a collision. If I initiate a task at t0, at t8 I cannot initiate the second task. Similarly here for two consecutive time steps t1 and t2, the first task make use of computation stage two. So I should ensure that the second task that I initiate should never be initiated in such a way that use of this computation stage two is simultaneous for both task one and task two. I have to identify that what are the time steps, when I can initiate the second and subsequent task by analyzing this reservation table. So for that what I have to do is I have to find out what is called a collision vector from this reservation table. For finding out the collision vector as I said that the time gap, time step difference between two successive initiation of tasks is called a latency. I have to find out that what are the latencies which are not allowed. That means I have to find out what is called a set of forbidden latencies. I can find out the set of forbidden latencies from this reservation table itself, so that is called a set of forbidden latencies. So that set, I term as set F. Now set of forbidden latencies can be found out very easily. What I have to do is I have to find out that what is the column difference between every pair of crosses in each group. See as I said that in this particular case if I initiate a task at t0, I cannot initiate another task at t8 that means I cannot have a latency which is t8 minus t0 that is 8. So latency of 8 is not allowed that is forbidden. So to find out the set of forbidden latencies, what I have to do is I have to find out the column difference. This 8 is nothing but the column difference between these two crosses in row one. What I have to do is I have to find out the column difference between every pair of crosses in each row of this reservation table. So from the first row, obviously I get a forbidden latency which is 8. From the second row 1, 5 and 6, third row there is only one cross so I don’t bother. In the fourth row 1, fifth row is also 1. So the set of forbidden latencies becomes 1, 5, 6 and 8. From this set of forbidden latencies, what I have to compute is what is called a collision vector. Let us call this collision vector as C, this collision vector is a binary vector. The number of states will be same as the maximum forbidden latency in the set of forbidden latencies. So in this case the maximum forbidden latency is 8. So number of bits in the collision vector C will also be 8. So this collision vector C is of this form C equal to… In general let me put it as Cn, Cn-1 like this, C2 and C1 where obviously n is the maximum forbidden latency. In this collision vector my strategy for deciding, as I said that this collision vector is a bit vector or a binary vector. That means each of these components in this vector can be either be 0 or 1. So my decision will be I will set Ci is equal to 1. A particular bit in this vector Ci equal to 1, if i is a member of F. So in this case 8 is a member of F, so C8 will be set equal to one. 5 is a member of this set, C5 will also be set to 1, C6 will also be 1, C1 will also be 1. Rest of the bits in this collision vector will be equal to 0. So Ci equals to 1, if i belongs to F else Ci will be equal to 0. So in this particular case what will be C, collision vector C? 1 0 1 1 0 0 0 1. So this is the collision vector for this reservation table. Now see that once I have this collision vector, I can very easily find out that what are the time steps when I can initiate a new task. What I do is whenever I start the first task, this collision vector is loaded in to a shift register. At the end of every time step, you just give a right shift to the shift register. You find that what is coming out of the shift register. If a bit 1 comes out of the shift register then I cannot initiate a new task. If a 0 comes out of the shift register then I can initiate a new task. Now that is not all. This takes care of only initiation of two tasks because whenever I initiate a new task, the collision vector is going to be different because the reservation table of the new task is going to be superimposed with the remaining part of the reservation table of the previous tasks. So I have to re-compute this collision vector to decide when next a new task can be initialized. So the strategy is simple like this. Whenever you give a right shift to the collision vector, what comes out of the shift register? I will just forget about that. Using that I decide whether I can initiate a task or I cannot initiate a task. The left most bits which are becoming vacant, you fill up those bits with zero. Simply because if this is the only task that is to be executed then when the task is complete say in this case, the task will be completed after 8 shifts. So when the task is complete, at any time I can initiate a new task, that does not lead to a collision. So all the left most bit which becomes vacant because of right shift, you fill up those bits with zero. So this remaining collision vector that you get, whenever I initiate a new task, I simply bit wise or the collision vector of the new task with whatever is remaining in the shift register and that becomes my new collision vector. So coming to this particular example where I get a collision vector like this 1 0 1 1 0 0 0 1. Once the time step t0 is complete, I have to give one bit shift in the right direction to this collision vector. So the new collision vector that I get is 1 0 1 1 0 0 0 and what comes out of the shift register is bit 1 that was this bit. The left most bit which has become vacant, you fill up with 0. Now because a bit of 1 has come out of the shift register, so I cannot initiate a new task. So the new collision vector remains as it is. Then you give the second shift at the end of time t1 and the new collision vector that I get is 0 1 0 1 1 0 0 and a 0 comes out of the shift register from the right most position, the left side I fill up with 0. Now since this is 0, so since a 0 has come out of the shift register from the right most bit so that I can initiate a new task. The moment I initiate a new task, the new reservation table becomes active and that is superimposed with the remaining part of the reservation table of the first task. That means the collision vector is also going to be changed. So for this what I do is the collision vector for the new task that is initiated will be same as this. So I simply bit wise or these two collision vectors. So the new collision vector that I get is 1 0 1 1 1 1 0 1. So you find that my first latency is 2. The second latency is going to be decided by this. So to find out when a new task, next task can be initiated, I have to perform the same right shift operation on this resultant collision vector. So after one time instant I cannot allocate a new task. I cannot initiate a new task because the right most bit is 1. Only after the two time instants I can initiate a next task because the second bit from the right is 0 and once I do that I initiate a new task. Again this collision vector has to be recomputed and you follow the same logic to find out when again the next task, next to the second one can be initiated. So if I follow this, you find that I can generate a state diagram. If I say that each of this collision vectors represent a state of the pipeline. I can generate a state diagram of the pipeline and for this particular example, you will find that the state diagram will appear something like this. Our initial collision vector is this one, 1 0 1 1 0 0 0 1. So this is the initial collision vector or the initial state. Here we have said that after the latency of two when I initiate the second task, the collision vector becomes 1 0 1 1 1 1 0 1 so that is another state. I can reach from this initial state to this state with a latency two. From here again you find that after a latency two I can initiate another task because the second bit from the right is 0. So if I perform similar thing, you will find that the new state or the new collision vector of this pipeline will be something like this. So this again you get after a latency two. From here after a latency 7 or more, so I put it as 7 plus. 7 or more so I put it as 7 plus. I will come back to the initial latency. Here again after a latency of 7 or more, I will also come back to this initial state. In this itself after a latency of 7 or more, I will remain in the same state. So I will put it as 7 plus. Now from the initial state, one option was that with latency two, I can initiate a new task and I come to this state and here you find that 2 3 4 they are all zeros. That means I can have a latency 2. If I don’t want to initiate a task with latency 2, I can initiate a task with latency 3. I can also initiate a task with latency 4. I can also initiate a task with latency 7. So if I initiate a task with latency 3, if you compute you will find that the new collision vector will be 1 0 1 1 0 1 1 1 and this is the state that you can reach from the initial state with latency 3. With latency 4 again from the initial state, I can reach a state of 1 0 1 1 1 0 1 1 and this is a state that I can reach from the initial state with latency 4. Now from this state I can come to this state, if you analyze with a latency 4 because the fourth bit from the right is 0. From this state I can go back to this state with a latency 3 because you see that the third bit from the right in this collision vector is equal to 0. So once am in this state, if I initiate a task with latency 3, the next state will be this one. From this state with latency 7 plus, I can go back to this initial state. From this state also with latency 7 plus that is 7 or more to this initial state and of course this is the initial state. So as per convention I just put an incoming arrow to this state. So we find that we get what is called a state diagram and this state diagram tells you that all possible latencies or all possible collision vectors that we can have for a pipeline executing a function given by this reservation table. Now this particular state diagram also tells me some more important things because as I said that the latency; as I defined the latency, the latency is the time state difference between two successive initiation of tasks. So naturally our pipeline will be more efficient or the throughput of the pipeline will be more if the latency is less or our latency is less. If the latency is more that indicates that the initiation difference, time difference between initiation of successive tasks is very high that means the throughput of the pipeline will be low. Now you find that from this state diagram I can also find out that what are the states in which I should confine so that throughput of the pipeline will be maximum. You find one more thing that in this state diagram, I have a number of cycles. So those cycles indicate that there is a stable sustainable way in which the tasks can be initiated and that is stable. So here I can initiate a task starting from the initial state with latency 2. From this I can also initiate a task with latency 2. After that with latency 7 or more, I can initiate a task and I come back to this initial state and this can be repeated again and again. This also can be repeated again and again. If first I initiate a task with latency 2, after that I initiate tasks with latency 7 or more. This also can be sustained and this is a stable cycle. This is also a stable cycle 3, 7, 3, 7; 4, 3, 7 that is also a stable cycle. 4, 7 this is also a stable cycle, 4, 3 that is also a stable cycle. So from this state diagram, I can find out that if there is a cycle then what is the average latency in that cycle. Now what is the average latency? I have to find out a cycle with few number of states. I have to find out what is the total latency in that cycle divided by the number of states that is involved in the cycle that gives me average latency of that cycle. So if you do that, you find that if I take just this cycle, here the latency is 2 plus 7 or more that means minimum latency involved with cycle is 9 divided by 2. So average latency is 4.5. If I come here 4 plus 3, 7 divided by 2 that is 3.5 and you will find that the average latency, minimum average latency for any cycle that you get in this is 3.5. So if I decide a scheduling strategy such that I will confine only within these two states. I will not make use of any other state in this state diagram. Of course the initial state I cannot avoid because the first task that has to be initiated puts the pipeline in this state that I cannot avoid. After that with latency 3, I come to this particular state. Once I am in this particular state, I will initiate the tasks following this latency sequence of this latency cycle 4, 3, again 4, again 3, again 4, again 3. So this way I will continue which gives me an average latency of 3.5 and that is the minimum. So if I follow this scheduling policy on this pipeline, the throughput of the pipeline is going to be maximum. So that is an important property of this state diagram which can be used for optimum scheduling. Now in this particular case I have used only one function that is assumed that the pipeline is designed for execution of one function only and such pipelines are called uni functional pipeline. Whereas the other example that I have taken or I have said that in this generalized pipeline, I can compute two functions either A, I can compute A, I can also compute B. These kind of pipeline are called multi-functional pipelines. Now as I have seen that in case of uni functional pipeline, I can avoid the collisions following this state diagrams of collision vectors. Similarly for a multi-functional pipeline also, I can generate a similar such thing but that is more complicated. I will not go in to that but only one hint that I will give you that if I use a multi functional pipeline and because it is a multi functional pipeline, any function can be initiated at any time. I can initiate function A, next time I can initiate a function A, I can initiate a function B also. So there what you have to take care is instead of considering the collision vector of one function, I have to merge both the reservation tables and what I have to find out is what is called a cross collision vector. So there I can have collision for two reasons, for initiations of A followed by initiation of A. I can have collision for initiation of B followed by initiation of B. I can have collision for initiation of A followed by initiation of B or I can have collision for initiation of B followed by initiation of A. So all these four different combinations I have to consider and I have to generate what is called cross collision vectors. So by considering all those cross collision vectors and because there are so many options, in that case I will not have a vector. What I will have is a matrix that becomes a cross collision matrix. So by analyzing the cross collision matrix, I have to determine that what are the latencies which are permitted for which sequence of jobs. So that is a much more complicated task. For this course I am not going to go in to that. So this is a simple situation with an uni functional pipeline, of this tells you that what are the advantages of pipeline that you can extract and what are the problems that you can face and how to solve those problems. So with this I will stop today.