Tip:
Highlight text to annotate it
X
Welcome to part 3 of the lecture on the Static Single Assignment form. To recap a bit, here
is an example of the SSA form and the non-SSA form as well.
The non-SSA form is just a flow graph; whereas, in the SSA form some of the join nodes will
have phi functions for the incoming parameters. For example, i in the original flow graph
flows from B1 into B2 and also from B7 into B2. So in B2, the SSA form we have a phi function
which really takes two parameters I3 and I1; I3 corresponds to the first parameter coming
from B7 and I1 corresponds to the second parameter coming from B1. Similarly, the variable n
is read here, so that is like a definition and later, we have definitions of n in B5
and B6 as well. So in B7, we have a phi function for n which
has two parameters n3 and n4, corresponding to these two predecessors. Then, in B2 we
have another phi function for n, which takes care of n coming from B1 and this n6 which
is coming from B7? So that is how phi functions are. So, we also saw how to insert phi functions
and how to rename the parameters of the phi function etcetera.
So, we were looking at the optimizations with SSA forms. There are many optimizations which
are very fruitful on such forms. For example dead-code elimination, simple constant propagation,
copy propagation, conditional constant propagation, constant folding, global value numbering these
are all optimizations which can be done very effectively on the SSA form.
Simple Constant Propagation is really simple. Take all the statements put them in a statement
pile and then take one at a time from the statement pile. If they are trivial phi functions
with all parameters being equal and constant, then such statements can be replaced by x
equal to c. Otherwise, if there is a statement x equal to c then we take the du-chain of
that particular x. Then for all uses of x, we can really substitute c and then the new
statement is also added to the statement pile, so that we can propagate constants further.
Copy propagation is very simple, if there is a single argument function x equal to phi
y or a copy statement x equal to y, these can be deleted. We can substitute y for every
use of x. This is possible because every use is read exactly by one definition in the SSA
form.
The Conditional Constant Propagation is special. Let us recapitulate the transfer function
of the Conditional Constant Propagation Frame work. So this is monotonic, but it is not
distributive and here is the lattice of the constants. So, all the constants are in comparable
there is an undefined value at the top and not a constant value at the bottom.
So, how does conditional constant propagation on SSA forms work? So, SSA forms along with
extra edges corresponding to the definition use information are used here. So edge from
every definition to each of it uses in the SSA form hence, we call these as SSA edges.
So, they are also available in the graph. It uses both flow graph edges and the SSA
edges and maintains two different work lists. This is a work list based approach exactly
like simple constant propagation. So, we have a Flowpile and an SSApile.
So, it is important that flow graph edges are used to keep track of reachable code.
So, whatever code cannot be reached will have all the edges incoming into it marked as non-executable
and therefore, we can never reach that node. SSA edges are helpful in the propagation of
values, whenever there is a change of value in a node; the SSA edge is used to activate
that particular node and put it into a SSApile. So, the flow graph edges are added to the
flowpile, whenever a branch node is symbolically executed or whenever an assignment node has
a single successor.
Whereas SSA edges coming out of a node are added to the SSA work list, whenever there
is a change in the value of the assigned variable at the node. So the algorithm really needs
only one lattice cell per variable not on a per node basis and two lattice cell per
node to store expression values, so not too much of extra space.
The Conditional expressions at branch nodes are evaluated and depending on the value,
either one of the outgoing edges corresponding to true or false or both edges corresponding
to not a constant are added to the work list. So, if you are able to evaluate it to either
true or false, only one edge is added otherwise, both edges have to be added. At any join node,
the meet operation considers only those predecessors which are marked executable. So that is an
extra point to be noted here.
So let us look at the algorithm, we saw an example. We will learn through that example
again a little later. So, G is the SSA graph N Ef and Es; so Ef is the flow graph edges,
Es are the SSA edges. Now, V is the set of variables used in the SSA graph that is the
program itself. So, we initialize the flowpile with the first edge start to n, so all the
edges which go out of the start node are added to the flowpile, whereas the SSA pile kept
empty. For all the edges in the Ef set that is, all
the flow graph edges; they are made as e.executable equal to false, so that nothing is marked
as true to begin with. Then v.cell is the cell associated with the variable v, so we
must initialize that also it is initialized to top, that is the undefined value. Then
y.oldval and y.newval store the lattice values of expressions at a particular node y. So
these need to be initialized as well. So both y.oldval and y.newval are initialized to top
the undefined value.
Then there is a big loop which goes on until both flowpile and SSA pile become empty. So
the first one is, if flowpile is not equal to phi that is flowpile is not empty, what
we do? We remove an edge from the flowpile. If the edge is not marked as executable that
means, we are now entering through an edge which is not yet seen before. Now mark that
particular edges executable that is executable equal to true for that edge.
Now, we check couple of things; is it a phi node so if phi present y, so this is true
if it is a phi node then we call the function visit phi y, we are going to see some details
little later. So if it is not a phi node, then it is an ordinary expression node. So,
first time visit y is checked and is saying are we visiting it for the first or are we
visiting it second third time etcetera. If it is first time then visit expression;
the point is visit expression is called on y only on the first visit y through a flow
edge. Subsequently it is called on y on visits through SSA edges only. So, if first time
visit y is false then we do not visit that expression right now, we are going to visit
it later, if any values of the parameters in the expression change.So, if flow-outdegree
is 1; that means, we have exactly 1 successor for the node after doing some processing,
then we just add that to the flowpile.
Then, if the edge has already been marked that is this part so is it first time y etcetera
is where we are.
Now we look at this SSApile. Alternately, we are going to look at the SSApile and the
flowpile. So, remove an edge from the SSApile again check whether it is a phi node then
call visit phi y; if it is already visited then visit expression again because, we are
now coming through the SSA, so nothing wrong with that. The point is, if it is not visited
already - the node y - is not visited already that means, it is not yet reachable through
any flow edges. Therefore, we are not going to visit it at all. Unless a node is already
visited, we are not going to visit it when we visit through a necessary edge because
it may not be reachable at all. So this loop goes on until both piles are empty. Now let
us look at the details.
So phi present y is simple; if y is a phi node then returns true otherwise return false.
So what does visit phi y do? So y is a node, take y.newval as undefined value. Let y.instruction.inputs
be the number of parameters of the phi instruction at the node y. Now, we are going to take each
one of the inputs and then check whether the edge corresponding to that input is executable.
Why we want to actually take the meet of only those parameters, whose corresponding edges
are marked as executable otherwise we do not want to touch them. So, let pi be the ith
predecessor of y, i running from 1 to y.instruction.inputs. If pi.y is executable then take the corresponding
input y.instruction.inputs I, so the ith input is taken in the i.
Then take y.newval and meet it with ai.cell, so you get the new y.newval - updated value.
This is done for all the inputs and we leave out those inputs which come through edges
not marked as executable, we do not want to touch them. So it is easy to see that if a
node has not been visited at all not even once, then nothing gets done at the phi node,
this loop will run many times but in an empty fashion doing nothing.
Now, if the new value is less than y.instruction.output.cell the old value that means, the value has changed.
Remember, we go from undefined to constant to not a constant that is downwards in the
lattice cell that is y is less than. If y.instruction.cell equal to y.newval, take the new value as the
output value and add the outgoing SSA edges to the SSApile. So, y comma z, where y comma
z is in es, so all the edges going out of phi are added to the SSA pile because the
value has changed. If the value has not changed, there is nothing to do, you do not add anymore
edges to the SSA pile. So already visited it is simple, it simply
checks the incoming edges of y. If one of them is marked as executable then it is already
visited otherwise it is not visited at all. Check incoming edges of y for all e in x y
such that x y in ef, so y is in our node so x y is the incoming edge. If e.executable
is true for at least one edge e, then return true otherwise return false. So, it is a fairly
straight forward function.
What does first time visit y do? Exactly one of the edges must be marked as executable
and other should not be. So, if e.executable is true for more than one edge then return
false, otherwise return true. At least one incoming edge will have executable true, because
the edge through which the node is entered is marked as executable before calling this
function. The first time you come to y, it will be one of the edges - incoming edges
- will be true, so you will get something as true. But if more than one edges is true
then we are entering for the second time, so this will be returned as false.
What does visit expression do? It really processes the expression whether it is an assignment
statement or a branch condition. Take the 2 inputs; input one equal to y.instruction.inputs
1 and input two, as y.instruction.inputs 2. If I recall the transfer function, if one
of the inputs in x plus y either x or y is not a constant, then the output is also not
a constant. So, if input 1.cell equal to n a c or input 2.cell equal to n a c then y.newval
is n a c not a constant. We have covered the n a c part. If one of
them is undefined input 1 or input 2 is undefined, then y.newval is undefined. If this is also
not true then both are really neither top or bottom values in the lattice, so we can
do some evaluation. Evaluate the expression as per the lattice evaluation rules, evaluate
y, so the expression is evaluated. Of course, it is easy to modify this to handle instructions
with one operand. So copy instructions are easy to handle.
If y is an assignment node then if y.newval is less than y.instruction.output.cell. So
the newval that we got here by evaluating the expression is less than the old value.
That means, the value has changed. So remember, we always go down in the lattice. Take the
new value as y.newval, store it in the instruction.output.cell of y and add all the edges going out of y
to the SSApile so for it is as we did before.
What if it is a branch node if the value has changed, y.newval is less than oldval then
y.oldval is equal to y.newval. Now check whether, what is the value of y.newval, if it is n
a c not a constant that means, both a true and false branches are equally likely. We
add both branches to the flowpile. If it is evaluated to true then we add only the true
branch edge to the flowpile; in the case of false, we add the false branch edge to the
flowpile. This is where, if the condition has become
a constant and as evaluated to either true or a false. Then, we can avoid some of the
nodes which can be entered through either the true or false edges. So, we actually remove
those as read code finally.
So here is a simple example, let us run through the more difficult example, because this is
a2 simple example, which we ran through last time. So this example, we start with the first
node B1 after start, a1, b1, c1 are all initialized. This particular edge actually is the only
one coming of B1, we add this edge to the flowpile and that makes this particular node
to be interpreted next.
So what happens here, see that b4, b1; b1 is the only parameters which is defined. This
part is not yet to marked as executable, so nothing is coming out of this.
So only b1; b1 is 1, so b2 becomes 1 phi b1 of is just b1, which is 1. Similarly, phi
c1 is c1 is 0, so c2 becomes 0. Therefore, c2 less than 100 is obviously true, 0 less
than 100 and that makes this particular edge to be added to the flowpile and this is not
yet added; it does not mean it will never be added it may be added little later.
Now, we evaluated b2 less than 20 that is also true, so again the true part is true
edge is taken and become to B5. Now in node B5 b3 evaluates to 1 and c3 evaluates to 1
both are constants as so far. This particular edge which is the only successor is taken
we come to B7.
Once we evaluate B7, again this parameter is not yet available, only B3 is available
and similarly, only c3 is available when evaluate b4 it becomes 1 and c4 becomes 1. This edge
is added to the flowpile that means, we come to B2 once more a second visit.
So second visit change in the value of c2 to not a constant, but there is no change
in the value of b2. Even with this available b4 is 1, b1 is 1, so this b2 becomes 1 but,
in the case of c2; c4 is 1 but c1 is 0, so phi function is not a constant.
Now, this s h j edge actually activates the node B5 and B6 both of them. Let us take this
particular node, b3 now evaluates to 1 and c3 becomes not a constant because c2 plus
1 is the value; c2 is less than 100 which is not known. So at this point, we do not
know whether c2 is 100 or not it has been evaluated to bottom. Actually, we add both
edges to the flowpile, so this edge and this edge both are added.
We actually, in this particular basic there is nothing happening which has been entered
through this particular SSA edge. So B6 there is nothing happening because this particular
edge is not marked as executable. After this we evaluated this B5, c3 becomes not a constant
again, b3 is a constant. We would have already come here, through the SSA edge we come here
again.
Now evaluate B7 again, so when we evaluate B7 as a second time, b4 remains as 1 but c4
becomes not a constant. There is change value that means we need to propagate the value
to this through the SSA, this goes on the SSA pile.
Now, we come to the third visit of B2. There is no change in b2 and there is no change
in c2. The b2 remains at 1 c2 remains at not a constant value. So there are no more SSA
edges added to the SSApile, there were no flowpile edges added either this was evaluated
long back or nothing was done for this particular edge.
Finally, we can remove some of these dead edges; this is dead, this is dead so this
node is removed Then we get a simplified SSA graph. Here b2 is 1, c2 has not been evaluated,
so it became not a constant and it remains as phi of c4, c1. If c2 is less than 100 remains
as it is, in this case b3 was evaluated to 1; c3 was not a constant so it remained as
c2 plus 1. Here b4 was evaluated to 1 and c4 was evaluated
as phi of c3, which is c3 itself but no for the value as such. Now we can do some copy
propagation and removal of course such as b2 equal to 1, b3 equal to 1, b4 equal to
1 and simplify the entire flowgraph to every small one like this. You see that the b variable
is completely drawn; this was possible because of conditional constant propagation.
So that is the conditional constant propagation algorithm, that we saw until now. Let us move
on to the next optimization known as value numbering. We have seen value numbering before;
we did value numbering on basic blocks. We used a hash table, we entered expression into
that hash table and whenever we found another expression with the same value number we said
these expressions are identical. So, the variables are also entered into the
table but not exactly the same table, it was entered into the name table. Then we saw value
numbering with extended basic blocks that actually found a few more cost the commerce
of expressions and did few more copy propagation etcetera. The reason was the scope of the
expressions and so on and so forth were extended. Now with SSA forms we can do even better.
This is a global value numbering scheme which is very similar to the scheme with extended
basic blocks. But the scope of the tables is over the dominator tree. So it is not the
extended basic block that rules scope of the tables but it is the dominator tree. Therefore,
more redundancies can be caught for example, I am going to show you picture now, in block
B8 suppose you had d1 equal to u1 plus v1 as extra which are equivalent to a1 in the
block B1.
Suppose, let us stay look at this picture, in block B8 we had d1 equal to u1 plus v1
we had something here and a1 equal to u1 plus v1 is here, so B1 dominates B8. If we had
d1 equal to u1 plus v1, we could have use to a1 as the value of d1 directly that was
possible.
So with dominator as the leading theme; the scope of the table is over the dominator tree.
Therefore, we can catch a few more redundancies. We do not need any d-u or u-d change; definition
use or use definition, changes are edges are not needed here. They were needed for the
conditional constant propagation but they are not needed here. Another interesting feature
is it uses the reverse post order on the DFS tree, so what we will do is we take the SSA
graph I will show you that also in a minute.
This was the SSA graph, so we do a DFS-depth first search on this. This is the DFS tree,
now do a post order on this DFS tree, we get stop B9, B8, B4, B2, B6, B7, B3, B1, and start.
Now take the reverse of this so that gives your start B1, B3, B7, B6, B2, B5, B4, B8,
B9 and stop. So, start from the start node and use this order on the dominator tree.
We can see that we do a start, then we do B1, then we do B3, then we do B7, then we
do B6, then we do B5, then B4then B8, then finally B9 and then stop.
The idea is by the time you actually look at these children; you would have finished
the processing of their dominators. Therefore, the expressions which were defined in these
dominators are all available for use in these children, so that is what really is the basis
of this ordering.
This ensures that the definitions are processed before use that is a succession clip. The
back edges they may be present in the SSA graph, our example does not have a back edge
right now, but later I will show you an example with a back edge. So back edges make the algorithm
find fewer equivalences. So some expressions which we know are equivalent will not be marked
as an equivalent when there is a back edge. This is bad but there is not much we can do
about it.
The hash table that we used to store expressions is scope over the dominator tree. For example,
an assignment a10 equal to u1 plus v1 in block B9, if it is present, this is not present
in the example but suppose it present, it can use the value of the expression u1 plus
v1 of block B1 since, B1 is the dominator of B9.
Let me show you that if you had any d1 equal to u1 plus v1 or what was that? That was a10
equal to u1 plus v1 here. B1 is the dominator; it defines u1 plus v1. So, we would have use
that directly need not have defined yet and again we could have just used a1 in place
of a10 this is possible because B1 dominates B9.
So that is how the hash table works. If you had not used to scoping over the dominator
tree, we would not have caught this.
Now recalls that each name is unique in an SSA form, so variable names are not reused
in SSA forms at all. There is no need to store old entries in the scoped hash table when
the processing of a block is completed. Remember in the recall that in the case of extended
basic blocks, we actually had to remove all the new entries and restore the old entries
when we went out of scope. So when we return to the parent we had to remove the new entries
which were inserted by the children and then we had restored the old entries also.
That is not necessary here because the old entries are corresponded to old definitions
of the same variable, which were redefined in the new scope, here that cannot happen.
The each name is going to be defined exactly once so no more redefinitions, just deleting
new entries will be enough there is no question of restoring old entries here.
So any copies generated because of common sub expressions can be deleted immediately.
For example, we will see that a2 equal to u2 plus v2, b1 is u2 plus v2 so this becomes
a2 equal to b1; b1 is a-eter of b2 see. This is a copy we do not have to retain this copy
at all wherever a2 occurs we will be able to use b1 directly, we do not have to worry
whether there is a conflict of interest are something like that.
So how we do that? We are actually going to replace the value number of a2 with the value
number of b1, so whenever we want to search for a2 automatically get b1.
So that is how these copies can be deleted immediately. Copy propagation is carried out
during value numbering itself the way I just now mention. Copy statements generated due
to value numbering in the blocks B2 B4 B5 B6 B7 B8 can be deleted. So we are going to
see how the deletion happens? The valnum table store the SSA name and its value number and
is also is a global table. It is not scoped over the dominated tree I will show you the
reasons for it very soon. Value numbering transformation retains the
dominance property of the SSA form. What is the dominance property? Recall this every
definition dominates all its uses or predecessors uses in the case of phi functions. So the
condition constant propagation did not violate any of this, it actually preserved the dominance
property the value numbering transformation, also preserves the dominance property.
So I wanted to show you the picture of the valnum table that is here, so hash table entry
has expression and it is indexed by expression hash value; when we say an expression, a phi
function is also an expression; along with the value number of the expression we must
also have the parameters of the phi function, there are many parameters those are also stored
these will be useful later. There is something called defining a variable
that we need to store here. So the first time that the expression has occurred and of course
we take that and the variable on them left hand side of the expression is stored here.
Whenever we find expressions equivalent to the expression here within the scope of course,
we can use the defining variable in place of the new variable that we have encounter.
The valnum table is simple; it stores the variable name and it stores the value number
also, it is actually indexed by name hash value. This constant or not, etc is stored
here; if it is a constant, it is a constant value and then the replacing variable. As
I told you copies need not be kept, for each copy that variable name we need to have the
variable replacing the variable as well.
This is our example that we are going to run through; it has many many redundancies so
u1 plus v 1 is defined here, u2 plus v2 is defined here, so u2 plus v2 is here, again
here, where u3 plus v3 another u3 plus v3, but remember B2 and B3 do not dominate each
other nothing at all. For example, see here B2 and B3 are not dominators of each other.
So the u3 plus v3 which has defined here and B2 cannot be reused here, if it was actually
define here like u2 plus v2 it could have been reuse, but not now. Now whatever is defined
in B1 can be used in B2 and whatever is defined in B1 and B2 can be used in B4 the reason
being B1 and B2 dominate B4. Similarly, whatever is defined in B1 and B2 can be used in B5.
Finally, what about B8? B1 of course dominates B8 so whatever is here can be used here, B2
dominates B8 so whatever here can be used here, but neither B4 nor B5 dominate B8, so
whatever is used here defined here, cannot use as a commerce of expression here. But
we must remember that B4 and B5 supply parameters to the phi function here. For example, a5
is phi of a4 comma a2, a4 comes from here and a2 comes along this path. So it would
actually come with this way, a2 is defined here, but it comes this way.
Similarly, B6 gets B2 from here like that and B3 directly; the same is true for B5,
B6 and B7. When we come to B9 we get one parameter from here that is for a7, a5 is defined in
B8 so that comes here, a6 is defined in B6 that comes here, and a3 is defined in B3 is
at comes via B7, for B7 it is similar, c4 is also similar
Let us see why the valnum table should be unscoped and then run through the example,
so the unscoped valnum table is really needed for processing phi instructions. For example,
a phi instruction receives inputs from several variables along different predecessors of
a block; these inputs are defined in the immediate predecessors or the dominators of the predecessors
of the current block.
This is obviously true because of the dominance property, so phi gets many variables along
its incoming edges and they could be defined in immediate predecessors. For example here
a4 is defined in the immediate predecessor, but a2 is defined in the dominator that is
upwards that is what it is saying or dominators of the predecessors of the current block.
Of course, they may be defined in general in any block that has a control path to the
current block subject to the dominates property. For example, while processing block B9 we
need a5, a6, and a3. Let us look at that a5, a6, and a3. I already showed that a5 comes
from here, a6 also comes from here, but a3 comes from this point Whereas if you take
b7, b6 comes from here, b1 actually comes from all the way from the top and b5 comes
from here. So in general the variable could come from
anywhere subject to the dominants property of course, anyone of the dominators. However
each incoming arc corresponds to exactly one parameter of the phi instruction. Since, the
parameter can come from any of the dominators or the predecessors, we need an unscoped value
number table it is not possible to use a simple scoped table, we need an unscoped valnum table
for this.
This is the picture just now I showed you, let us run through this particular example
and see exactly how value numbering happens. There are more points to be noted as we go
along and we will define them at the end of the example.
The processing order is always given here, we are now processing the block B1 It has
two assignment statements a1 and b1 equal to u1 plus v1 and u2 plus v2. So u1 plus v1
and u2 plus v2 are entered into the hash table and then a1 and b1 actually get entered into
the valnum table no problem at all.
Then the next node to be visited is B3, so a3 is defined here as a same treatment u3
plus v3 is entered into the hash table and into the valnum table. Then B7, so this defines
u3 plus v3 which was already defined before, because the hash table is scoped over the
dominator B3 dominates B7. We find this b5 equal to u3 plus v3 is nothing but defined
already, so we have b5 equal to a3 as equivalent a3 is right here then c3 is equal to u3 plus
v3 that is also equivalent to a3 So we delete these two statements because
copy propagation can be directly done, we can store in place of b5 and c3 as a3 itself
and that value number automatically takes care of these two usages later.
Then we process B6 in that reverse post order, here we find a6 as b1 so u2 plus v2 which
is defined here in the dominator scope and c2 as b1 again, so that is also defined here
and we can delete these two Then we process B2, so here we find a2 as
b1 which was defined before, b2 as u3 plus v3 which is not defined before, because this
is a different dominator scope as such so B3 would have already gone out of scope, B2
is a new dominator scope so this is retained as it is and entered into the tables, but
c1 as u2 plus v2 is already equivalent to b1, c1 is equivalent to b1, because u2 plus
v2 is already defined.
Then we come to B5 here we find u2 plus v2, b3 is equivalent to b1 so this is deleted.
Here in B4, a4 is equivalent to b1 this is also deleted.
Then we come to B8, so in B8 a5 is a phi function with a4 and a2 as parameters. If you trace
back when you search the valnum table a4 is equivalent to b1, so that is what we have
replaced here and a2 is equivalent to b1 again, the second parameter is also b1. So such phi
functions which have all parameters as equal the same are meaningless phi functions. So
a5 is meaningless phi function we do not need it here, we can simply replace the phi function
by the parameter b1. So a5 is equal to b1, but b1 is already there before. This is a
copy statement and we can delete this as well.
But b6 is not so b6 is remains as phi of b2 comma b1. When we come to B9, a7 is a phi
function with a5, a6, and a3 as parameters; a5 is famous b1 so the first parameter is
b1; a6 is same as b1 second parameter is also b1; a3 is a3 there is no change, so it remains
as it is the phi function does not become meaningless.
The second phi function b7 has b6 b1 and b5 as 3 parameters; b6 is as it is; b1 is as
it is; and b5 is equivalent to a3, it is replaced by a3 so this phi function is also not meaningless
it remains as it is. Whereas c4 equal to phi of c1 comma c2 comma c3 c1 c2 c3 are equivalent
to b1 b1 and a3; c1 is b1 here, c2 is b1 and c3 is a3.
So now observe that a7 and c4 are exactly identical. Therefore, one of these needs to
be retained and the other can be removed c4 will be removed and it is called as a redundant
phi function, which is already covered by some other phi function in the same basic
block.
This is the simplified SSA graph after the copy statements and redound expression etc
are all removed. So this is how we are able to simplify.
Let us look at the SSA value numbering algorithm in a formal manner mark the beginning of a
new scope, so the basic block B is the parameter for each phi function f of the form x equal
phi of y1 to yn in the basic block B do search for the function f in the hash table name
is x and the parameters are y1 to yn. So this involves getting the value numbers of the
parameters, you have to dig into valnum tables get their parameters etc etc. Then use a special
hashing function for phi and there were many number of them available and then enter into
the hash table. Suppose you find that f is meaningless this
will be defined later, all yi are equivalent to some w that is all the parameters are equal.
Now replace the value number of x by that of w because this becomes x equal to w, all
these are w in the valnum table and delete f suppose, it is not meaningless but it is
redundant, so redundancy is there is some other phi function, which is equivalent to
this f of the form z equal to phi of u1 to un in the same basic.
So replace the value number of x by that of z in the valnum table and delete f so I already
showed you this otherwise, simplified f into hash table and the valnum table.
Then this is for phi function, what do you do for assignments? If it is x equal to y
plus z search for y plus z that implies take the value numbers of y z etc etc, apply a
hash function and search a hash table. If it is already present with value number n
then replace the value number of x by n in the valnum table and delete a. I showed this
if there is an expression already defined we do not have to keep the copy later. Otherwise,
add the simplified y plus z to hash table and x to the valnum table, so this is as before.
For each child, now we have finished processing B, so what about the children of B? In the
dominator tree in the reverse post order of DFS as I told you about the SSA graph call
SSA value numbering for each of the children. Finally, once we want to get out of B clean
up the hash table after leaving this particular scope, so initially we supply start as the
parameter and then call the function.
Let us look at some details of how phi functions are processed? One or more inputs of the phi
functions may not yet be defined for example, they may reach through a back edge of phi
of a loop, and such entries will not be found in the valnum table. Let me show you an example,
this B3 have been replaced by a new block which has u6 and u7 as new definitions, the
old one had only a3 equal to u3 plus v3. Now u6 is phi of b1 comma a7 and u7is phi
of b1 comma c4. Unfortunately, we found that a7 and c4 even though are equivalent we have
not processed this block, so we have not found that a7 and c4 are equivalent. We are processing
this block we have not processed this block and there is a back edge here. So because
of that we do not find u6 and u7 as equivalent, they are not redundant phi functions at all
because a7 and c4 are not yet found as equivalent.
So because of this back edge a7 and c4 will be treated as distinct, separate. Therefore,
this u6 and u7 will be actually entered into the hash table as if they are two different
phi functions, which are not equivalent to each other. So simply assign a new value number
to the phi instruction and record it in the valnum table and the hash table along with
new value number and the defining variable that is what we do.
So we do not really go through the value numbering scheme again and again. So that is why this
is not done, we just want to do it once it takes too much time. If all the inputs are
found in the valnum table, then replace the inputs by the respective entries in the valnum
table. Check whether the phi function is either meaningless or redundant if neither enters
the simplified function into the tables.
Now what about meaningless and redundant phi functions? All inputs are identical so for
example, in block B8 as I showed you if all the inputs are equal, then they can be deleted
this particular instruction can be deleted. Occurrence of defining variable can be replaced
by the input parameter; only valnum table needs to be updated. We saw this example already
for example, here this and this they become meaningless therefore, both this parameter
become equal
Redundant phi instruction means, we have already discussed this instructions in the same basic
block with exactly the same parameters. So redundant phi instructions can be deleted
in all occurrences of the defining variable in the redundant instruction can be replaced
by the earlier non redundant one tables are updated.
So this is how value numbering actually happens in SSA forms. Now let us look at something
somewhat interesting we saw value numbering on SSA graphs, which actually preserved the
dominance property. Now we will see how important it is for Liveness analysis.
What is liveness? Liveness is whether a variable is going to be used at a particular sometime
later that is, informally what liveness is of a variable? Really is for a variable which
is defined is there any use later on. So how do you find it in the SSA graph? It is very
simple for each variable v; walk backwards from each use of v, stopping when the walk
reaches the definition of v. So, collect block numbers on the way and the variable v is live
at the entry or exit, one or both as the case maybe, of each of these blocks.
So let me show you a simple example, here is our original flow graph and here is our
SSA flow graph. Let us look at some usage let say i2, i2 is used here and i2 is used
here also Now we want to find out all the blocks where i2 is live. Let us take this
which is very simple, we go backwards till the definition of i so we go up to his point
i2 is defined here from here to here. So in block B4 it is actually live at the
entry of block before it is live, but the end of block it is not because there are no
more usages here. At the output of block B2 it is live and at the entry of block B2 it
is not live, because i2 is defined immediately. So B4 in and B2 out is collected, then we
start here so B7 in is collected, then B5 out is collected, B5 in is collected, B3 out
is collected, B3 in is collected, and then B3 out is already there and finally, we have
reached the definition point of i2 so all these blocks are deemed as live blocks for
i2. We can do the same thing in along the other
path also from here B7 in B6 out B6 in then B3 onwards we have already collected. So B2
and B3 both in and out, B4 only in B5 B6 both in and out and finally B7 in these are the
points where i2 is live. What about n2? So you can look at n2 which is here, which is
using here also, for n2 you go on you just defined here so B5 in B3 out B3 in and B2
out.
Similarly, here B6 in and then onwards we have already collected. Whereas if you look
at n3 or n4 we start n3 is defined here, so only this much n4 is defined here. This is
how the liveness is computed? We just go walk backwards from each use of v stopping when
the walk reaches the definition of v, collect the basic block numbers on the way and the
variable v is live at the entry and exit, one or both of these basic blocks. So I already
showed you this particular example. Why this procedure works? It works because
the SSA forms and the transformations we have discussed satisfy or preserve the dominance
property, so again we must recall that the dominance property is definition of variable
dominates each use or the predecessor of the use, when the use is in a phi function.
Suppose the dominance property was not satisfied then there is a problem what may happen is
that the whole SSA graph may have to be searched, we do not know dominance property is not satisfied.
Here we are 100 percent certain that if we reach the definition that is sufficient because
the definition dominates all the uses or at least the predecessors of the uses.
So because of that we stop at the definition, we do not have to go beyond that whereas if
the dominance property is not true, then the definition could be anywhere it is not necessarily
up to the dominator only. So the whole SSA graph may have to be searched and to find
the corresponding definition. This is a sample of how liveness analysis
is possible. It is also possible to actually do partial redundant elimination and a few
others, but they are much more complex than what I have presented so far those are outside
the scope of this particular, of course. So this is the end of the lecture and in the
next time we will look at parallelization thank you.