40 Procedure Optimization

Rajeswari Sridhar

 

In this last module, we discuss procedure optimization techniques. We shall see how a function can be optimized, whether it is recursive or non-recursive, and we shall also discuss the concepts involved in inter-procedural analysis.

 

40.1 Types of procedure optimizations

 

When it comes to procedure optimizations, we must first look at how the functions are defined. A function can be recursive or non-recursive. Based on that, the following possibilities need to be considered while optimizing functions.

 

•      Tail-Call Optimization vs. Tail-Recursion Elimination – A call is said to be a “tail” call if calling another function is the last statement of the current function. If that last statement is a recursive call, it is said to be tail recursion.

•      Procedure Integration vs. In-line Expansion – Combining certain procedures to avoid the calling-stack mechanism needs to be studied, as against in-line expansion of certain functions.

•      Leaf-Routine Optimization vs. Shrink Wrapping – A program’s calls are typically visualized as a call graph; a routine that calls no other function appears as a leaf node of this graph and is called a leaf routine. One method of optimizing such routines is shrink wrapping.

 

This module discusses all of these types of procedures as part of optimization.

 

40.2 Drawbacks of “Call”

 

Before we try to optimize procedures, let us first establish why they need optimization. This is best understood from the mechanism of a function call. In a calling convention there are two players: a caller and a callee. The caller is the function that makes the call, and the callee is the function that gets invoked. The caller and callee need to handle the following:

 

•      Caller – needs to take care of parameter passing, saving caller-saved registers, and resuming at the return address after the callee returns

•      Callee – This has two components: a prologue and an epilogue, executed before and after the body of the current function.

–  prologue – save the frame pointer, compute the stack pointer, save callee-saved registers

–  epilogue – restore callee-saved registers, set the return value, restore the stack pointer and frame pointer, branch back to the caller

With all of this in the caller’s and callee’s domain, there is, from an optimization viewpoint, very little chance to optimize across procedures. Within a procedure, however, there is some room for optimization.
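The division of labour above can be annotated on a small C example. Which registers are actually saved, and where, is decided by the compiler and the target ABI, so the comments here are only an illustration (the function names are made up):

```c
/* Sketch of the caller/callee contract. The prologue and epilogue are
   emitted by the compiler; the comments mark where they conceptually sit. */
int callee(int a, int b)
{
    /* prologue: save the frame pointer, compute the stack pointer,
       save any callee-saved registers the body will use */
    int result = a + b;
    return result;
    /* epilogue: restore callee-saved registers, restore the stack and
       frame pointers, place the return value, branch back to the caller */
}

int caller(void)
{
    /* caller: evaluate the arguments, save caller-saved registers that
       are live, branch to the callee, resume here at the return address */
    return callee(2, 3);
}
```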

 

40.3 Tail-Call Optimization vs. Tail-Recursion Elimination

 

A tail call is one where the last statement of a function is a call to another function. If the same applies to a recursive function, the situation is termed tail recursion. Consider the following example:

 

void f(int x) {
    g(x);
    return;
}

 

In the above example, function f() calls another function g(x) as its last step, before the return statement. Similarly, the following shows tail recursion, where f(x) is called as the last statement of the function f() itself.

 

void f(int x) {
    f(x);
    return;
}

 

Tail recursion incurs procedure-call overhead on every call. This overhead can be eliminated by converting the recursion to a loop and then performing loop optimizations. Consider the following function, which has a tail recursion:

void insert_node(int n, struct node *l)
{
    if (n > l->value)
        if (l->next == NULL)
            make_node(l, n);
        else
            insert_node(n, l->next);
}

The final call to insert_node can be eliminated by converting the recursive call to a loop, as given below:

void insert_node(int n, struct node *l)
{
Loop:
    if (n > l->value)
        if (l->next == NULL)
            make_node(l, n);
        else {
            l = l->next;
            goto Loop;
        }
}
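The two versions can be checked side by side. The following runnable sketch fills in the pieces the text leaves out — the struct layout and the body of make_node are assumptions made only for illustration:

```c
#include <stdlib.h>

/* Assumed list node layout; the text does not define it. */
struct node { int value; struct node *next; };

/* Assumed body of make_node: append a new node after l. */
static void make_node(struct node *l, int n) {
    struct node *q = malloc(sizeof *q);
    q->value = n;
    q->next = NULL;
    l->next = q;
}

/* Recursive version: the self-call is the last action taken. */
void insert_node_rec(int n, struct node *l) {
    if (n > l->value) {
        if (l->next == NULL) make_node(l, n);
        else insert_node_rec(n, l->next);
    }
}

/* After tail-recursion elimination: the call becomes a parameter
   update (l = l->next) followed by a branch back to the top. */
void insert_node_iter(int n, struct node *l) {
Loop:
    if (n > l->value) {
        if (l->next == NULL) make_node(l, n);
        else { l = l->next; goto Loop; }
    }
}
```

Both functions walk the list the same way; the iterative one simply reuses the current stack frame instead of pushing a new one per element.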

 

We introduced a label, Loop, as the point to continue from, removed the recursive call, and converted it into an iteration. A tail call means that the routine performing the call does nothing after the call returns except return itself. During a tail call the caller branches into the body of the other function, so care must be taken with the scope of variables and with return values. The following is an example of a tail call.

void make_node(struct node *p, int n)
{
L0: struct node *q = malloc(…);
    p->next = q;
}

void insert_node(int n, struct node *l)
{
    if (n > l->value)
        if (l->next == NULL)
            make_node(l, n);
}

The function insert_node() makes a tail-call to the function make_node.

 

Tail-call optimization is performed by following the steps given below:

 

•      Both procedure bodies should be visible to the compiler.

–  They should be in the same compilation unit, or their intermediate code should be available.

•      The compiler needs to know about the callee:

–  where it expects to find its parameters

–  where to branch to

–  its stack-frame size

•      Replace the call by three things:

–  evaluation of the arguments, putting them where the callee expects to find them

–  if the callee’s stack frame is larger than the caller’s, an instruction that extends the stack frame by the difference

–  a branch to the beginning of the body of the callee

•      The final step is to delete the ‘return’ after the call.

Consider the following example of tail-call optimization.

insert_node:
{
    …
    r2 ← r1
    r1 ← r4
    call make_node
    return
}

 

This will be converted to the following:

 

insert_node:
{
    r2 ← r1
    r1 ← r4
    goto make_node
}

 

On the other hand, the following sequence of steps is performed for removing tail-recursion.

 

•      Replace the recursive call by

–  assigning proper values to the parameters, and

–  a branch to the beginning of the body of the procedure

•      Delete the ‘return’ after the recursive call

Consider one more example of tail-recursion elimination.

 

void replace(int n) {
    if (n >= 10) return;
    if (A[n] == 0) A[n] = 1;
    else replace(n + 1);
}

 

In the function replace() above, the function calls itself as its last step. The following is the optimized, tail-recursion-eliminated replace():

void replace(int n) {
Loop:
    if (n >= 10) return;
    if (A[n] == 0) A[n] = 1;
    else {
        n = n + 1;
        goto Loop;
    }
}

 

We have introduced the label Loop at the beginning of the function body. The body of the else now has multiple statements: it increments the variable ‘n’ and then performs a ‘goto’ back to the beginning of the function body.
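The equivalence of the two versions can be demonstrated directly. In this runnable sketch the declaration of the global array A is an assumption, since the text leaves it implicit:

```c
/* Assumed global array over which replace() operates. */
int A[10];

/* Tail-recursive version, as in the text. */
void replace_rec(int n) {
    if (n >= 10) return;
    if (A[n] == 0) A[n] = 1;
    else replace_rec(n + 1);
}

/* After tail-recursion elimination: the self-call becomes an
   increment of n followed by a branch back to Loop. */
void replace_iter(int n) {
Loop:
    if (n >= 10) return;
    if (A[n] == 0) A[n] = 1;
    else { n = n + 1; goto Loop; }
}
```

Both scan forward from index n, set the first zero entry to 1, and stop; only the control transfer differs.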

 

40.4 Procedure Integration and Inlining

 

Procedure integration is also called ‘automatic inlining’: the body of the called procedure is placed in the caller’s body itself, replacing the call with a copy of the procedure body. A call can have unknown effects on aliased variables and on the objects used inside the procedure; once the code is inlined, those effects become locally visible and can thus enable more optimization. This is also better informed than the ‘inline’ of C++, where the decision is driven by the user’s intuition. The following are some of the issues with procedure integration:

 

•      Scope of inlined procedures – one has to save intermediate-code representations of candidate procedures

•      Languages of caller and callee (across compilation units) – the different languages and their various parameter-passing conventions must be handled

•      The “external language_name procedure_name” declaration is needed to specify source languages

•      Saving the intermediate code of inlined routines – one needs to know the purpose for which it is saved

•      A copy of the whole inlined procedure must still be compiled if the address of the procedure has been taken, or if there are calls from other compilation units that are currently invisible

•      Inlining of recursive procedures – to be avoided, since unrolling the chain of calls could go on infinitely; it is still valuable to inline once or twice

The goal of inlining procedures is to reduce execution time by avoiding the calling-stack mechanism. If every procedure were inlined, the following effects would need to be addressed:

•      decreased overhead costs of calls

•      increase in object-code size

•      more cache misses

•      compilation that terminates only by exhaustion of resources

•      the need for heuristics or profiling feedback to decide what to inline

The following are good candidates for inlining:

•      procedures whose size is small

•      procedures that are called from few places

•      procedures that are called inside a loop

•      procedures whose calls include constant-valued parameters

To perform inlining we need to handle the three major issues:

 

•      Different parameter passing conventions

–  “external language_name procedure_name” declaration

–  call-by-reference vs. call-by-value

•      Name conflicts

–  conflicts between source symbol names need to be resolved

–  detect conflicts and rename symbols of called procedure

•      Static variables

–  only one copy of each static variable must be kept

–  it must be initialized only once
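A hand-worked sketch makes the payoff of inlining with a constant-valued parameter concrete (the function names here are illustrative, not from the text):

```c
/* A small procedure: an ideal inlining candidate. */
static int square(int x) { return x * x; }

int area_with_call(void) {
    return square(6);   /* pays call overhead: argument setup, branch, return */
}

/* After procedure integration: the body of square() is substituted
   for the call, and constant folding then reduces it to 36. */
int area_inlined(void) {
    return 6 * 6;
}
```

Because the parameter is a constant, inlining exposes the multiplication to constant folding, so the “call” disappears entirely.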

 

40.5 Leaf-Routine Optimization

 

A leaf routine is one that appears as a leaf node in the call graph of a program. Typically a call graph is constructed for a program to visualize how control and parameters pass between the different functions; a leaf routine is a routine that calls no other procedures, and many procedures are leaf routines. Leaf-routine optimization simplifies the way parameters are passed and removes the procedure prologue/epilogue; it is highly desirable because it gains much with little effort. Leaf-routine optimization is architecture-dependent: it depends on how many registers and how much stack space the procedure needs. A leaf routine that needs no more registers than the caller-saved ones requires no stack space at all.
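A small C illustration (function names are made up): max2 below is a leaf routine, while wrapper is not. A compiler can typically keep all of max2 in caller-saved registers and emit no prologue or epilogue for it, whereas wrapper must preserve state across its calls.

```c
/* Leaf routine: calls nothing, so it needs no frame of its own
   and can live entirely in caller-saved registers. */
int max2(int a, int b) { return a > b ? a : b; }

/* Not a leaf routine: it calls max2, so the usual calling
   convention applies at each call site. */
int wrapper(int a, int b, int c) { return max2(max2(a, b), c); }
```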

 

Shrink wrapping is one of the leaf-routine optimization methods. It moves the prologue and epilogue code inward so that they enclose only the smaller part of the procedure that actually needs them, possibly making several copies of this code along different paths. Figure 40.1 shows the prologue and epilogue in a flow graph and how they are typically brought closer together using shrink wrapping.

 

 

40.6 Inter-procedural optimization

 

Inter-procedural optimization is attempted by first performing inter-procedural analysis, which gathers information about the whole program instead of a single procedure. Inter-procedural optimization is a program transformation that involves more than one procedure of the program. Typically two inter-procedural problems are solved: MOD analysis and alias analysis. MOD analysis identifies modifications to variables, i.e., side effects on variables that a call could cause. Alias analysis verifies whether names or expressions are aliases of one another.

 

40.6.1 Modification and Reference Side-effect

 

This is one inter-procedural problem. Consider the following code:

      COMMON X,Y
      DO I = 1, N
S0:      CALL P
S1:      X(I) = X(I) + Y(I)
      ENDDO

In this loop, the call to P can be vectorized if P

 

1.      neither modifies nor uses X

2.      does not modify Y

 

MOD(S): set of variables that may be modified as a side effect of call at S

 

REF(S): set of variables that may be referenced as a side effect of call at S

      DO I = 1, N
S0:      CALL P
S1:      X(I) = X(I) + Y(I)
      ENDDO

 

Since there is a call to P at S0, it is not possible to vectorize S0, but S1 can be vectorized, as it is just an addition and has no modification or reference side effects.
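The same situation can be sketched in C. The array size and the body of P are assumptions added for illustration; the point is that because P modifies the global Y, the call carries Y in its MOD set, so each iteration reads a Y that the preceding call may have just written and the loop containing the call cannot safely be vectorized:

```c
#define N 4
double X[N], Y[N];

/* Side effect: P modifies the global Y, so MOD(P) = {Y}. */
void P(void) {
    for (int i = 0; i < N; i++) Y[i] += 1.0;
}

/* C analogue of the Fortran loop: the call at each iteration
   changes the Y(I) read immediately afterwards. */
void loop_with_call(void) {
    for (int i = 0; i < N; i++) {
        P();
        X[i] = X[i] + Y[i];
    }
}
```

After one run starting from zeroed arrays, X[i] equals i+1: each element sees one more application of P than the previous one, which is exactly the cross-iteration dependence that blocks vectorization.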

 

40.6.2 Alias Analysis

 

ALIAS(p,x): the set of variables that may refer to the same location as formal parameter ‘x’ on entry to ‘p’. This is similar to the aliases discussed in the context of pointers in the last module. Consider the following example:

 

      SUBROUTINE S(A,X,N)
      COMMON Y          ! Y is a global variable
      DO I = 1, N
S0:      X = X + Y*A(I)
      ENDDO
      END

In this subroutine, it would be efficient to keep X and Y in registers and store into X only once, outside the loop. But if there is a call CALL S(A,Y,N), then Y is aliased to X on entry to S, and the procedure can no longer delay the update of X until after the loop.
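A C analogue shows why the register-caching transformation is unsafe under aliasing (all names here are illustrative):

```c
double Y = 2.0;   /* global, playing the role of COMMON Y */

/* Faithful version: rereads Y and rewrites *x on every iteration. */
void S_naive(double *A, double *x, int n) {
    for (int i = 0; i < n; i++)
        *x = *x + Y * A[i];
}

/* "Optimized" version: caches *x and Y in locals and stores *x once
   after the loop -- only legal when x does not alias &Y. */
void S_optimized(double *A, double *x, int n) {
    double t = *x, y = Y;
    for (int i = 0; i < n; i++)
        t = t + y * A[i];
    *x = t;
}
```

With a non-aliased x the two agree; with the call S(A, &Y, N) the naive version keeps doubling Y through the alias while the optimized one uses the stale cached value, so they diverge. This is exactly what ALIAS(S, X) must detect.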

 

40.7 Graph Construction in Compiler Design

 

Typically, three major types of graphs are constructed as part of designing a compiler.

  • Control-flow graph G=(N,E) – used to indicate the flow of control among the instructions

–  N: the set of all instructions (or basic blocks)

–  E: the set of control-flow edges

–  edge (p,q) is in E if instruction ‘p’ may be followed by instruction ‘q’

  • Call graph G=(N,E) – used to indicate the calling relationships between procedures

–  N: one vertex for each procedure

–  E: one edge for each possible call

–  edge (p,q) is in E if procedure p may call procedure q

  • Binding graph GB=(NB,EB) – used to indicate the parameter bindings

–  NB: one vertex for each formal parameter of each procedure

–  EB: a directed edge from formal parameter f1 of p to formal parameter f2 of q if there exists a call site s=(p,q) in p at which f1 is bound to f2

–  |NB| ≤ |E| and |EB| ≤ |E|, where G=(N,E) is the call graph

 

Other graphs, such as the dependency graph that indicates the interaction between non-terminals, and the DAG, are also part of the compiler design process.

 

A call graph G=(N,E) is an important data structure in which the nodes represent the procedures and the edges are defined as one edge per possible call: edge (p,q) is in E if procedure p may call procedure q. It looks simple, but its construction is difficult in the presence of procedure variables.
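A minimal sketch of a call graph as an adjacency matrix, reusing the procedures from the earlier example (the main procedure and the specific edges are assumptions for illustration):

```c
/* Procedures indexed 0..2: 0 = main, 1 = insert_node, 2 = make_node.
   call_edge[p][q] == 1 means procedure p may call procedure q. */
#define NPROC 3
int call_edge[NPROC][NPROC] = {
    {0, 1, 0},   /* main calls insert_node */
    {0, 1, 1},   /* insert_node calls itself and make_node */
    {0, 0, 0},   /* make_node calls nothing: a leaf routine */
};

/* A procedure is a leaf routine iff its row has no outgoing edges. */
int is_leaf(int p) {
    for (int q = 0; q < NPROC; q++)
        if (call_edge[p][q]) return 1 == 0;
    return 1;
}
```

Queries over this structure answer the questions raised above: leaf detection for leaf-routine optimization, and self-edges (here at insert_node) flagging recursion.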

 

As these are advanced topics, they have been skipped in this version of the online material.

 

Summary: In this module we discussed procedure optimization and the inter-procedural analysis methods that could be adopted. As a rule of thumb, the cost of optimization should not exceed the cost of computing without optimization.

 

In this course, we have discussed the following:

  • Phases of the compiler
  • Tools of the compiler – LEX, YACC
  • Lexical phase – Automata, Regular expressions
  • Syntax phase – Top-down and Bottom-up parsers
  • Semantic phase – Type checking, Run-time storage management
  • Intermediate code generation – 3-address code, Backpatching
  • Code generation – Simple, DAG, Dynamic programming, Template-based
  • Code optimization – Transformations, Data-flow and control-flow analysis, Procedure optimization