23 THREE ADDRESS CODE, SYMBOL TABLE AND ARRAYS

Rajeswari Sridhar

In this module, we learn how to write semantic rules for declarations by declaring scope information. The use of symbol tables for declaring scope information for symbol tables is also discussed in this module. In this module, we also discuss the grammar for array declarations.

23.1 Symbol Table

We have discussed the three address code generation for declarations. The attributes, offset and type helps to keep track of the amount of memory and address information necessary for every variable. In addition to the address and memory information, scope details are necessary for the variables. Consider the following production for declarations:

• P à D

• D à D ; D | id= T | proc id ; D ; S

The moment the compiler encounters proc id; D;S , a new symbol table is created. The details about the scope contents are achieved by the following functions:

· mktable (previous) – This function returns a pointer to a new table that is linked to a previous table in the outer scope. “previous” is the address of the previous table, and the first table is called with “null”.

• enter(table, name, type, offset) – This function is used to create a new entry in table. “table” refers to the address of the current table, “name” is the variable name, “type” refers to the data type of the variable and “offset” is the value used to compute the address of the variable. The attributes “type” and “offset” are the values generated using the three address code for declarations

• addwidth(table, width) – This function is used to determine the total width of all entries in table

• enterproc(table, name, newtable) – This function creates a new entry in table for procedure with local scope newtable. Here “table” refers to the current table, “name” refers to the name of the newtable, “newtable” refers to the address of the new table

• lookup (table, name) – This returns a pointer to the entry in the table for name by following linked tables.

Let us consider an example to demonstrate the linking of the various tables using the above discussed functions. Consider a structure “S” to have two integer variables and let ‘s’ be an instance of this structure.

struct S

{ int a; int b;

} s;

Let us define a function “swap” with two reference integer variables and the body of the function is defined as follows:

void swap(int& a, int& b)

{ int t; t = a; a = b;

b = t;

}

Consider the following main program which has calls the function swap() and another function foo().

main()

{struct S s; swap(a, b);

foo();

}

The symbol table integrated information is shown in figure 23.1

As can be seen from figure 23.1, the initial symbol table is created when the compiler encounters the main() function using the mktable(nil) function. Once the structure variable is declared using the table information for structure, another table is created as mktable(main) and thus there is a link from main table to the table of the structure. The variable‘s’ of the structure has its own symbol table to have information about its fields. The “enterproc” function helps enter the details about the symbol table of ‘s’ in the current symbol table. The details about‘s’ is entered in the current symbol table using the “enter” function. The width information about the structure variable is computed using addwidth() and in this example it is computed as “8” and s tored in the header field of the symbol table. Similarly, the function swap will have a total width as “12” and is stored in the table corresponding to the function “swap”. Let us see how the compiler builds the symbol table to remember the scope information. Consider the following function having calls to three functions A, B, C and assume function A in turn calls function D.

Void foo( )

{ call A()

call B()

call (C)

}

Void A()

{

call D()

}

The calling stack phenomenon is given in figure 23.2 (a) through (d)

As can be seen from figure 23 (a), when function A is called, it is pushed into the calling stack and in turn as A calls D, the information about D is pushed onto the stack. Once D returns, it is removed from the stack and so is the case with A. When B is called B is pushed onto the stack and a similar scenario happens with function call to C and is given in figures 23 (b) and (c). When C also terminates, it is also removed from the stack. The address of the symbol tables are maintained as a stack. This information is used to refer to the variables of the functions which are available in the top of the stack thus ensuring scope information. Once, a function terminates, the address of this symbol table is removed from this stack.

2 Semantic rules for generating Three-address code for scope information

The functions discussed in the previous section are used to generate three-address code for maintaining scope information. Table 23.1 gives the semantic rules for generating three address code for procedures to maintain scope information. The semantic rule makes use of the functions discussed in the previous section. A stack of addresses of the symbol table indicated as “tblptr” is maintained and is useful in accessing the topmost symbol table at any point of time.

Table 23.1 Semantic rules for maintaining scope information

The details in Table 23.1 gives the semantic rules for maintaining the scope of symbol tables using two stacks, one of the stack tblptr is used to keep track of the available symbol table and the other one, offset is used to determine the relative addresses of variables in a function. When a new procedure is called a symbol table is created and its pointer pushed to this stack with offset information in another stack. When the function call seizes this symbol table address is popped off the stack.

23.3 Semantic rules for Records in Pascal

After discussing the semantic rules for generating three-address code to maintain scope information, we will conclude with generating three-address code for records. The semantic rules for generating three-address code for records are given in Table 23.2

This uses a function “emit” which is used to generate three address codes to output file. As can be seen from Figure 23.1, the second statement involves generating a three address code, id.place := E.place whereas a sequence of statements doesn’t involve any code.

23.5 Three-address code for assignment statements

Names in the symbol table are the variables which are referred to addresses. The function Lookup(id.name) identifies variable “id” in symbol table. The function emit is used to emit three address statements to output file. Table 23.3 gives the set of semantic rules for the expression grammar using emit function. This is similar to the one already discussed in the previous module with “gen” function replaced with “emit” function.

From tables 23.3 and 22.6 of the previous module, it can be seen that if the expression is long, the number of temporary variables involved also increased. To tackle this situation, we keep track of a counter wherein the number of temporary variables could be reused. We modify newtemp() to use a “stack”. We keep a counter c initialized to 0, and is used to track the number of temporary variables. newtemp() increments c and returns temporary $c. We decrement counter on each use of a $i in a three-address statement.

Consider the production E1 + E2. The correct evaluation is to evaluate E1 into t1 and E2 into t2 and is shown below.

t3:= t1 + t2

As t1 is no longer required, we can reuse t1 instead of using new temp t3. This is the fundamental idea. This is explained in the following example.

Consider the expression x : = a * b + c*d – e * f. The use and re-use of temporary variables is given in table 23.4

– em = em-1 + im

– e1 = i1

This will be discussed in detail in the next module.

Summary: This module discussed the three-address code for declarations in the view of maintaining scope information using symbol table and keeping an access of all symbol tables. The grammar for arrays in Pascal has been introduced. The next module will discuss about defining semantic rules for generating three-address code for accessing arrays.

Grammar for arrays have been discussed