39 Loop Transformation and Aliases

After discussing the various algorithms for function preserving transformations, we need to optimize the loops. In the previous module, we discussed to perform loop invariant computation and in this module we will continue to optimize the loops further. Another section that requires optimization is handling of aliases. Aliases could be defined with respect to pointers and in this module we will discuss algorithms to handle pointer aliases.

39.1 Induction variable identification and elimination

Induction variable elimination is carried out by first identifying the induction variable, followed by strength reduction of the induction variable and then induction variable elimination. During strength reduction, other state ments that require strength reduction are also carried out.

A loop is executed through an induction variable. Thus a variable ‘x’ is induction if every time the value of ‘x’ is changed by a constant ‘c’. So, to identify the induction variable, we first look for basic induction variable i := i ± c. Then we look for derived induction variable ‘j’ which are defined in terms of the basic ‘i’.

Algorithm 39.1 gives the procedure for induction variable identification.

Algorithm 39.1 – Induction variable identification algorithm

• Input: A loop L with reaching definitions and loop-invariant computation

• Output: a set of induction variables

Identify_Induction_Variable()

{

• Find basic induction variable based on loop- invariant computation (i,I, 0)

• Search for a variable k having the following forms

• k := j * b, k := b * j, k := j / b, k:= j ±b, k:= b ±j

• b – constant, j is an induction variable

• Triple for k is (j, b, 0)

• Compute the triple and accumulate to the list of inductions variables

}

After identifying the induction variables, if these variables are not going to be used after the iteration then it could be replaced by some useful variable. If such a useful variable is not possible to be identified then the induction variable is retained. If an induction variable that could be eliminated is identified, then Algorithm 39.2 is used to eliminate this induction variable.

Algorithm 39.2 – Eliminate induction variable

• Input: A loop L with reaching definition information, loop-invariant computation and live variable information

• Output: a revised loop Eliminate_Induction_Variable()

{

• Take some induction variable ‘j’ in ‘i‘s family with (i,c,d) and modify each test that ‘i‘ appears in to use ‘j’ instead. Constant ‘c’ is positive.

• if ‘i’ relop ‘x’ goto B is replaced as

• r := c*x, r := r+d, if j relop r goto B

• if i1 relop i2 is also replaced with new variable if j1 relop j2

• Delete all assignments to the eliminated induction variables from the loop L

• Consider every new statement j:= s

• Verify that no assignment to ‘s’ between the introduced statement and the use of ‘j’

• Replace all uses of ‘j’ by uses of ‘s’ and delete j := s

}

The output of induction variable elimination will be discussed after carrying out strength reduction as strength reduction will involve removal of the induction variable.

39.2 Strength Reduction

After identifying the induction variables, they are checked to see if we could mod ify them to use additions / subtractions against multiplication / division. If such a transformation exists we replace it in the loops for these induction variables and this procedure is termed as strength reduction. Algorithm 39.3 discusses strength reduction of induction variables where the input in a loop and output will be a revised loop which is strength reduced.

Algorithm 39.3 – Stre ngth reduction

• Input: A loop L with reaching definition information and induction variables computed

• Output: A revised loop

Reduce_Strength()

{

• For each induction variable i in turn, for every induction variable j in the family of i with triple (i, c, d): (j := i *c +d)

• Create a new variable ‘s’

• Replace the assignment to j by j:=s

• Immediately after each assignment i := i+n, append

• s := s + c* n

• Place ‘s’ in the family of ‘i‘ with triple (i,c,d)

• s is initialized to c*i+d

• s := c * i

• s := s+d

}

The algorithm works on the triple that is created as part of the induction variable identification procedure. We create a new variable‘s’ to store the initial computation of the higher strength computation. We then increment or decrement this variable‘s’ to take care of the actual computation.

Let us discuss the quick sort example flow graph which we had discussed earlier in the previous module. We had optimized two basic blocks B5 and B6 and that does not belong part of loops. So we have omitted those two blocks and the figure is represented in figure 39.1. In figure 39.1, the following are the modifications:

• B2 and B3 are inner loops

• Induction variable in B2 is ‘i’ and t2 (i, 4, 0)

• A new variable is constructed as s2

• t2 := 4 * i is replaced with t2:= s2

• Inserts the assignment s2 := s2 + 4 after i:= i+1 Similarly:

• Induction variable in B3 is ‘j’ and t4 (j, 4, 0)

• A new variable is constructed as s4

• t4 := 4 * j is replaced with t4:= s4

• Inserts the assignment s4 := s4 – 4 after j:= j-1

The revised flow graph is given in figure 39.2 with the variables s2 and s4 introduced and the higher strength computed expression moved to the pre-header.

After carrying out strength reduction, the statements, “i:=i+1” and “j:=j-1” which involves induction variables are considered to be useless as their values are not going to be used after the iterations and hence they are removed from basic blocks B2 and B3. The result is shown in figure 39.3.

As given in algorithm 39.2 the induction variables and the temporary assignment t2:=s2 and t4:=s4 are eliminated from the basic blocks B2 and B3.

39.3 Dealing with aliases

Pointer assignments and accessing variables through pointers is a common situation in any programming language. When we talk about arrays, the name of the array indicates the address of the array. Thus when a variable is declared using an array, the content of the ‘ith’ element o f an array “A” is accessed through one of the following ways:

A[i], A:= A+i, *A, *(A+i)

Thus, the value at A[i] could be modified also using multiple means. This is what we term as alias. If two or more expressions denote the same memory address we say that the expressions are aliases of one another. Thus we can say that presence of pointers which causes aliases to occur makes data-flow analysis more complex. Pointer ‘p’ can point to some variable and is to assume that an indirect assignment through a pointer can potentially change any variable. Consider a language having preliminary data types. If pointer ‘p’ points to a primitive data element, then any arithmetic operation on ‘p’ produces a value that may be an integer. If ‘p’ points to an array, addition or subtraction leads to ‘p’ somewhere in the array. If ‘p’ points to other array, then the impact of this would have to be dealt by the optimizing compiler to ensure allowing only correct assignments.

39.3.1 Effects of pointer assignments

Variables that could possibly be used as pointers are those declared to be pointers. In addition, pointers could also refer to temporary variables that receive a value is a pointer plus or minus a constant. Pointer can point to one of the following situations:

• If there is an assignment s: p := & a then immediately after s, ‘p’ points only to ‘a’. If a is an array, then p can point only to ‘a’ after assignment of the form p := &a ±c, where &a refers to &a[0]

• If there is an assignment, s: p:= q ±c , ‘p’ and ‘q’ are pointers, then immediately after ‘s’, ‘p’ can point to any array that ‘q’ could point to before ‘s’

• If there is an assignment, s: p := q, ‘p’ points to what ‘q’ points to

• Any other assignment to ‘p’, there is no object that ‘p’ could point to such an assignment is probably meaningless

• After any assignment to a variable other than ‘p’, ‘p’ points to whatever it did before the assignment

39.3.2 Alias computation

Thus as can be seen in the previous discussion, a pointer variable can be made to point to more than one location. We use data flow equations to compute aliases. Let ‘p’ be a pointer and let ‘a’ be a set of variables. The following components of the equation are defined:

• in[B] – (p, a) – set of variables {a} to which ‘p’ could point at the beginning of B

• transB – transfer function that defines the effect of block B. This function takes a set of pairs, S of the form (p,a) and produces another set T. transS is computed for every statement and transB is the union of trans S The following are the rules for computing transS for every statement S:

• if S: p:= & a or p:= &a ± c, where ‘a’ is array then – transS (S) = (S – {(p,b) | any variable b}) U (p,a) Then transfer function is computed by removing the previous definitions of ‘p’ forall the variables ‘b’ and the new definition of ‘a’ is added for variable ‘p’

• If s: p: = q ± c for pointer ‘q’ and ‘c’ is non-zero – transS (S) = (S – {(p,b) | any variable b}) U {(p,b) | (q,b) is in S and b is any variable} Here the previous definitions of ‘p’ is removed and new definitions for ‘p’ is added and it is added based on the definitions available for pointer ‘q’

If S: p:= q then– transS (S) = (S – {(p,b) | any variable b}) U {(p,b) | (q,b) is in S} This is similar to the previous case only where the definition of ‘q’ will be the definitions of ‘p’ only. To start with the previous definitions of ‘p’ are removed and the definitions of ‘p’ are added as the definitions of ‘q’.
If ‘S’ assigns to pointer ‘p’ another expression the – transS (S) = (S – {(p,b) | any variable b}
If s is not an assignment to a pointer the– transS (S) = She following data-flow equations are used to compute the alias which is the output of out[B]

out[B] = transB (in[B]) – this calls for computing the transfer function of all blocks
in[B] = U out(P) where P is a predecessor block – the output of all predecessors blocks are computed as union to determine the in[] of every block.
transB (S) = transsk (transsk-1 (transsk-2 … ))) – This is computed as a cumulative of all the transfer functions. Consider the following example shown in figure 39.4 and let us compute the aliases of all these blocks using the data flow equations.

As in the previous case we assume the in[B1] = Φ
• out[B1] = transB1 (Φ) – B1 has one statement and hence
out[B1] = transB1 (Φ) = {(q,c)}
• in[B2] = {(q,c)}
• p:= & c replace all pairs of’ ‘p’ with (p,c)
• remove all definitions of ‘q’ and replace with (q,a)
• out[B2] = transB2 ((q,c)) = {(p,c), (q,a)}
The remaining computations are shown in Table 39.1. This has to go through multiple passes as
in the case of other data flow equations and the II pass is shown in Table 39.2

Thus we have identified the aliases which indicates that the values of the variables ‘a’ and ‘c’ could be changed through pointers ‘p’ and ‘q’. Alias analysis are used for the following scenarios to perform code optimization

For live variable analysis
Dead variable analysis
Reaching definitions

Summary: In this module, we have understood the algorithms for performing loop optimizations like strength reduction and induction variable elimination. We also discussed the impact of aliases using pointers and the data flow equation to identify the same. In the next module we will discuss the optimizations that can be carried out in procedures.