36 Global Data Flow Analysis

Rajeswari Sridhar

After knowing the concepts of optimizations in basic blocks and loops, in this module we will study about how to perform optimization by means of data flow analysis. We shall discuss the need and usage of data- flow analysis from a code optimization perspective. We shall also discuss the computation of data-flow equation parameters for the various types of statements.

36.1 Global Data-Flow Analysis

Knowledge about the behavior of a variable is essential for performing function preserving transformations. In addition to the behavior of the variables, control flow information is also required to do transformations across basic blocks. In addition, to apply global optimizations on basic blocks, data- flow information is collected by solving systems of data- flow equations.

36.1.1 Data flow equations

A variable is defined at a particular statement. The value of this variable may reach subsequent sequence of instructions. The variable when it is defined, we call it as being generated and is indicated using gen[S], which means, statement S generates this variable. Thus this variable with the value it has gets forwarded to all other statements until some other statement kills this definition of the variable. Thus to determine the reaching definitions for a sequence of statements “S” the following equation is defined in terms of 4 functions.

out[S] = gen[S] È (in[S] – kill[S]) à (36.1)

In the above equation 36.1, out[S] indicates the variables which are available after statement S and this is computed in terms of the variables the current statement generates and including the statements that enter a block eliminating the definitions of variables the current statement kills. The information at the end of a statement S is either generated within the statement or enters at the beginning and is not killed as control flows through the statement.

36.1.2 Factors for setting up data -flow equations

The notion of killing and generating statements depends on the desired information and on the data-flow analysis problem to be solved. For some problems the function out[S] needs to be defined in terms of in[S] and for some in[S] needs to be defined in terms of out[S]. Data flow is affected by the control flow of the program. The function out [S] is based on the assumption that there is a unique end point. In addition, we need to consider that the variable assignments through pointer variables, procedure calls, assignments to array variables influence the data flow. Consider the following example given in figure 36.1, involving three basic blocks. We will look at each statement as a definition of the LHS variable.

We define two important terminologies before discussing the data flow equations. A “point” is defined as the position between two adjacent statements. In addition, a “point” can also be the position above the first statement and following the last statement. For example, in figure 36.1, basic block B1 has 3 points while B2 has 2 points. If we consider all the blocks then each will have many points. If we try to merge the last point of a current block with the first point of its successor block then we could end up with the sequence of statements that can be looked together to perform some optimization. On the other hand, a path is the sequence of statements between any two points.

36.1.3 Reaching Definitions

As discussed already, a definition of a variable ‘x’ is a state ment that assigns or may assign a value to ‘x’. This definition of the variable ‘x’ is unambiguous if a simple assignment holds good. On the other hand, if ‘x’ is used as a parameter of a procedure or through pointer then ‘x’ is said to have defined with ambiguity. Definition ‘d’ reaches a point ‘p’ if there is a path from the point immediately following ‘d’ to ‘p’ and ‘d’ is not killed in that path. A “kill” is defined as the position between two points, where the variable is defined and is redefined. Consider the figure 36.1 for which we have three definitions. Table 36.1 gives the definitions of the various blocks.

36.2 Data flow analysis – structure d programs

Let us consider the data flow analysis of structured programs. In order to understand the data flow, we need to know the various forms of statements. The assumption with the statements is that there is a single entry and single exit point.

Statements can be simple assignment statements, if-else, or a do-while statement or a sequence of these statements. A while statement could be interpreted in terms of the do-while statement itself. The following productions define the various types of statements where S is the start symbol and E is the expression. For simplicity, consider that this expression could be addition of variables of just the variable itself. We also assume that there is a unique header for all these types of statements which is the beginning of a control flow.

S à id := E | S ; S | if E then S else S | do S while E
E à id + id | id

Let us consider one statement after the other and to begin with consider the state ment defined by the following production.

Consider figure 36.2 which is a control flow for a simple assignment statement having a single definition of the variable ‘a’. Then, the data-flow equations for S are:

gen [S] = {d} – This statement defines the definition ‘d’ and hence it is included.

kill [S] = Da – {d} – This definition kills all other definitions of ‘a’ and is computed as