31 DAG Construction, reordering and labeling
Rajeswari Sridhar
In this module, we will understand to construct a directed acyclic graph (DAG) for every basic block. The construction of DAG for arrays and pointers will be discussed as there is a difference in the construction. The concept of rearranging of instructions will be dealt in the context of DAG. The nodes of the DAG will also be labeled to have optimized use of registers.
31.1 Directed Acyclic Graph – DAG
DAG is a useful data structure for representing the basic blocks. DAG can be very useful in identifying and implementing structure preserving and algebraic transformation on basic blocks. DAG gives all the necessary details of how the value is computed for each statement of the basic block. Applications of DAG include identifying common sub-expressions in a basic block, variables which are defined inside the block but evaluated outside and which statements of a block gets their value computed outside the block.
31.1. DAG construction
A DAG is typically constructed for each basic block. A DAG is a directed acyclic graph and thus has nodes and edges. Leaves are labeled by unique identifiers such as variable or constants of the three-address instruction and thus the corresponding r-values of the expression. Interior nodes are labeled by an operator. Nodes are optimally given a sequence of identifiers. Algorithm 31.1 elucidates the DAG construction.
Algorithm 31.1DAG construction
• Input: Basic block
• Output: DAG having label for each node – operator or variables. For each node a list of attached identifiersDAG_Construct( )
{
1. x := y op z
2. x := op y
3. x := y
4. If node (y) is undefined create a label ‘y’ and let it be node (y) and so for node (z) if ‘z’ is not there
5. In case (i) check if there is a node labeled ‘op’ whose left and right children are ‘y’ and ‘z’ respectively. If not create such a node and let it be ‘n’. In case (ii) ignore ‘z’ and do the same thing. In case (iii) let the node(y) be ‘n’
6. Delete x from the list of attached identifiers for node(x). Append ‘x’ to the list of attached identifiers for the node ‘n’ and set node(x) to n
}
Lines 1-3 of the algorithm, gives the possibilities of the expression for the construction of DAG. Line 4 checks whether the operands of the RHS expression is available in the DAG. If it is available, we use the same nodes; else we create new nodes for these operands. Line 5 checks whether the DAG has a operator node with the one in the RHS of the expression whose left and
right children are ‘y’ and ‘z’ respectively. If it is available we use it, else we create a new node with ‘op’ and attach edges from this node to the operands ‘y’ and ‘z’ respectively. For the expressions defined in line 2 we repeat the same process but with one operand. For the case define in line 3, we name the node(x) as node(y). Consider the following example involving three-address code instructions:
- t1 := 4 * i
- t2 := a[t1]
- t3 := 4 * i
- t4 := b [t3]
- t5 := t2 *t4
Instruction 1, checks if the constant ‘4’ and ‘i’ are available. As it is not available, we create these two nodes. Since, these are new nodes, a new internal node with operator, ‘*’ is created with left and right children as ‘4’ and ‘i’ respectively and the interior node is labeled as t1. For line 2, the nodes are created with ‘a’ as the left child and since t1 is already there it is used and a new interior node is created and is labeled as t2. For the instruction at line 3, since nodes ‘4’, ‘i’ and ‘*’ whose left and right children are ‘4’ and ‘i’ are already there, they are used and thus t1 and t3 corresponds to the same node. This aspect of the DAG helps in identifying the common sub-expression. Proceeding in a similar fashion, we end up with the DAG as available in figure 31.1
31.2 DAG construction involving Arrays, Pointers and Procedures
Consider the following sequence of statements involving array operations
x := a[i]
a[ j] := y
z := a[i]
These sequence of statements can be interpreted as same as the following sequences.
x := a[i]
z := x
a[j] := y
If i = j, then the value chosen is different for ‘z’ and thus we have committed a mistake. So, while processing array, we kill all nodes labeled [] to prevent confusion in computation. Similar scenarios are adopted for pointers and procedures. Thus we propose a modification to the algorithm for DAG construction and the following are updated:
• Evaluation of assignment to an array must follow the previous assignment if there is one
• Any assignment to an array element follow any previous evaluation of ‘a’
• Any use of an identifier must follow the previous procedure call or indirectly through a pointer if there is one
• Any procedure call or indirect assignment through a pointer must follow all previous evaluations of an identifier
31.2 Code generation from DAG
In the previous module we understood the simple code generator which generates code from the basic blocks. In this module we will start understanding the algorithm to generate code from the DAG. The steps involved in the algorithm to generate code from DAG include
• Rearranging the order – To optimize the code generation, the instructions are rearranged and this is referred to as heuristic reordering
• Labeling the tree for register information – To know the number of registers required to generate code, the labels of the nodes are numbered which indicate the number of registers required to evaluate that node.
• Tree traversal to generate code – This reordered labeled tree is traversed to generate code based on the target language’s instruction. In this module we will see how to rearrange the nodes and label them for optimal register usage. In the next module, we will look at the code generation algorithm from DAG.
31.3. Rearranging the order – Heuristic reordering
Rearranging the nodes involves changing the order of independent statements of the DAG which will help efficient utilization of the registers. This rearranging of nodes also helps in reducing the final cost of assembly level code. The algorithm is called Node_listing and is given in Algorithm 31.2
Algorithm 31.2 Node listing algorithm
Node_listing ( )
{
while unlisted interior nodes remain do begin
- select an unlisted node n, all of whose parents have been listed ; list n;
- while the leftmost child m of n has no unlisted parents and is not a leaf do /* since n was just listed, m is not yet listed*/
begin
- list m;
- n = m
end
end
}
Initially all nodes are unlisted. The outside while loop ensures that all nodes are being listed. A flag is used to keep track whether all the nodes are listed or not. We start from some node whose parents are all visited. In the first iteration, none will be listed and we list the root. We start by looking at the leftmost child of an interior node ‘n’ and if it is not a leaf, we list that. We then iterate by considering the node that is visited as the child and try to list its parent. If there are no more nodes to list, we then consider an arbitrary node as given in the outside while loop. After listing the nodes, we will reverse the string which is the order to execute the nodes.
Consider the following DAG given in figure 31.2 where the nodes are numbe red. Let us try rearranging this DAG for optimal ordering using the node_listing algorithm
We start from the root and list the root as “1”. We then find and list its left child and list node “2” based on the inside while loop. ‘6’ is the left child of ‘2’, but ‘6’ has an unlisted parent ‘5’ and therefore we exit the inside while loop. The outside while loop is continued by listing node ‘3’. Its left child is ‘4’ and we list it. It’s left child is ‘5’ and we list followed by listing ‘6’ now as both of ‘6’’s parents are listed. Nodes 7, 9, 10 are leaf nodes and so we don’t list them. We continue with the outside while loop to list node ‘8’ and nodes 11 and 12 are leaves and hence not listed.
Thus, the listed nodes are “1234568”. This string is reversed to yield, “8654321”. This indicates we need to evaluate node 8 followed by 6, 5, 4, 3, 2 and finally 1. The following is the sequence of instruction after rearranging.
• t8 := d +e
• t6 := a +b
• t5 := t6 – c
• t4 := t5 * t8
• t3 := t4 – e
• t2 := t6 + t4
• t1:= t2 + t3
31.4 Labeling algorithm
Register allocation is a core work in the context of code generation. Using DAG many optimizations and evaluation could be performed. Common sub-expressions identification, heuristic reordering are some of the optimizations that were discussed. In addition, the nodes of the DAG could be labeled which is used to identify the number of registers necessary to compute every node. The code generation for DAG uses this label to generate the target lan guage’s instruction. Here, “left leaf” means a node that is a leaf and the left most descendent of its parent. All other leaves are referred to as “right leaves”.
Labeling can be done by visiting node in a bottom up order so that a node is not visited until all its children all labeled. The order in which the first three nodes are created is suitable if the parse tree is used as intermediate code, so in this case the labels can be computed as syntax directed translation.
A node ‘n’ is labeled using the following equation The algorithm for labeling is given in Algorithm 31.3. Algorithm 31.3 Node labelling
Node_labelling( )
{
if n is a leaf then
if n is leftmost child of its parents then
label (n) = 1
else
label (n) = 0
else begin /* n is an interior node */
let n1, n2 , …. , nk be the children of n ordered by label ,
so label (n1) >= label (n2) >= ….>= label (nk) ;
label (n) = max (label(ni) + i – 1)
end
}
The outside if-else checks if a node is a leaf or interior. The algorithm starts by assigning the leaf node a label of 0 or 1 depending on whether it is a right leaf or a left leaf. The traversal then proceeds to the interior nodes. The interior nodes are labeled in the “else” of the outside if- else block. The interior node is labeled by computing the maximum of the labels of its children if they have different values and is assigned “1” value more if they have same values. Consider the following DAG given in figure 31.3. Let us see the computation of the labels using the node_labeling algorithm
We theoretically use post order traversal for label computation. Node ‘a’ is labeled 1 since it is the left most leaf node. ‘b’ is labeled 0 as it is the right leaf node. Parent of a, b is assigned max
(1,0) which is 1. We then assign ‘e’ with a value of ‘1’ as it is a left leaf node, ‘c’ and ‘d’ with the values of ‘1’ and ‘0’ as they are the left and right leaf nodes. Node t2 is labeled ‘1’ which is the maximum of nodes ‘c’ and‘d’’s label. Node t3 is assigned ‘2’ as its children have a label ‘1’ and this node’s label is computed as ‘1’ + label (e). Root’s label is given as ‘2’ as its right child has a maximum value of ‘2’
Summary: In this module we looked at DAG construction procedure for simple expressions and in the context of array and pointers. We also started the code generation procedure for DAG which involves heuristic reordering and labeling. The algorithm for heuristic reordering and labeling were also discussed. In the next module, we will discuss the procedure to generate code from this reordered and labeled DAG.