17 CALR Parsing

Rajeswari Sridhar

After discussing the SLR parsing and the problems associated with the SLR parsers, this module will discuss the powerful LR parser – Canonical LR parser. In this module, we will learn to construct LR(1) items which is necessary for constructing the CALR parsing table and using this table parse a given string using the CALR parser.

17.1 Need for CALR parsers

In the SLR parser, there is a problem of shift / reduce conflict even if the grammar is unambiguous. This is due to the fact that the SLR parsers uses the FOLLOW() information to perform a reduce action by matching the stack information with input symbol. However the FOLLOW() information alone is not sufficient to decide when to reduce. Hence, powerful parser is required.

The issues associated with considering the FOLLOW() information is discussed as follows:

• In SLR, if there is a production of the form A à α▪ , then a reduce action takes place based on FOLLOW(A).

• However, there would be situations, where, when state ‘I’ appears on the top of the stack, the viable prefix βα on the stack is such that βA cannot be followed by terminal ‘a’ in a right sentential form. Hence, the reduction A à α would be invalid on input ‘a’

This results in the shift/reduce conflict. To resolve this conflict, we will consider and check whether it is possible to perform more in the states that allow us to rule out some of the invalid reduction. This is done by introducing more set of items thus resulting in more states in the CALR parsing table. Thus we would be introducing exactly which input symbols to follow a particular non-terminal.

17.2 CALR Parser

The steps involved in the CALR parser are as follows:

• Construct LR(1) items – This is in contrast with the LR(0) items that is constructed for the SLR parser. This also uses Closure() and goto(), but the algorithm for these two functions are different.

• LR(1) items are used to construct the CALR parsing table involving action, goto.- The parsing table resembles SLR parsing table but has more states and there is little variation in the construction procedure.

• Use this table, along with input string and a stack is used to parse the string – The parsing action is same as the SLR parser’s algorithm.

The CALR parser uses the LR(1) items. The LR(1) items are constructed and this results in increased number of states. The states are increased by accommodating an extra symbol in the items to include a terminal symbol as a second component. Thus A à[α .β, a] will be the item in the LR(1) items collection, if A àαβ is a production and ‘a’ is a terminal or the right end marker. If there is no terminal available then the right end marker is $.

17.2.1. LR(1) item construction

LR(1) items are constructed that has a right end marker in addition to the format of the LR(0) items. The ‘1’ refers to the length of the second component which is the look-ahead of the item. This look-ahead has no effect in A à[α .β , a] where β is not ε, but will ensure that a conflict does not arise if A à[α . , a] calls for a reduction A à α if the next input symbol is ‘a’. This terminal ‘a’ will be subset of FOLLOW(A). A à [α .β , a] is a valid item for a viable prefix γ if there is a derivation S => δAw => δαβw where γ = δα and either ‘a’ is the first symbol of ‘w’ or ‘w’ is ε and ‘a’ is $.

LR(1) items construction requires computation of closure() and goto(). The computation of Closure(I) is given in algorithm 17.1. This algorithm is similar to the LR(0)’s closure in considering the augmented grammar to start this, but will accommodate the look-ahead component.

Algorithm 17.1

Closure(I, Augment grammar G’)

{

repeat

a. for each item [A à α▪Bβ, a] in I,

each production B à γ in G’and each terminal b in F IRST(βa)

such that [B à .γ , b] is not in I do

1. add [B à .γ , b]

until no more items can be added to I

}

Step ‘a’ of the algorithm 17.1 initially starts by adding the initial production of the augmented grammar as an item with the look-ahead as $. After that it considers the non-terminal that appears after the dot. This item is added and its look-ahead is computed by computing the FIRST() of the remaining symbols after this non-terminal including the current look-ahead. So, if β is ε, even than as ‘a’ is ‘$’ to start with, the look-ahead will be FIRST($). We keep adding till no more items can be added. Thus the difference between LR(1) and LR(0) is that in considering FIRST(βa) and adding ‘a’ as a look-ahead. It is interesting to observe that a single item-set may contain the same items with different look-aheads. The next algorithm is to compute the goto(I, X) where X is a grammar symbol. This is given in Algorithm 17.2. This algorithm is the same as LR(0)’s goto() but this incorporates the look-ahead symbol.

Algorithm 17.2

goto(I,X)

{

Let J be the set of items [AàαX.β , a] such that [Aàα.Xβ , a] is in I;

Return closure(J)

}

Step ‘a’ of algorithm 17.2, shifts the dot by one position to the right of an item, retaining the look-ahead of the original item. Then after shifting, we compute closure of the shifted item and add that to the set of items. So, if β is a non-terminal, we add more items to the current item set with different / same look-ahead and if it is a terminal we do not have any more new items to the current item set.

After computing the closure() and goto(), we use these two functions to compute LR(1) items and is given in algorithm 17.3

Algorithm 17.3

ITEMS(Grammar G’)

{C:= closure ( {S’ à .S, $}); repeat

for each set of items I in C and each grammar symbol X such that goto(I,X) is not empty

and not in C

add goto(I, X) to C

until no more set of items can be added to C

}

This algorithm is same as the LR(0) items construction algorithm, but it considers the items with look-ahead.

Example 17.1 Consider the following grammar and construct the LR(1) items

S à CC

C à cC

C à d

We form the Augmented grammar by introducing the new start symbol S’ and form the set of items and is given in table 17.1

The LR(1) items can also be represented as a DFA similar to the LR(0) items where the states correspond to the nodes and edges correspond to the grammar symbols.

17.2.2 CALR Parsing Table

If we could recollect the SLR parsing table requires the knowledge of the FOLLOW() of the non-terminals. This FOLLOW() set is used to populate the SLR parsing table for the reduce action. This is however not required here as the look-ahead which is conveyed by the FOLLOW() in the SLR parsing table, is available along with the item itself in the LR(1) items. The CALR parsing table also has two divisions: action and goto. The action() fields are constituted by the terminals and it has the shift, reduce, accep t and error actions. The goto() fields are constituted by the non-terminals and it contains the state numbers which is the result of the goto(). The procedure for the CALR parsing table construction is given in Algorithm 17.4.

Algorithm 17.4

CALR_ParsingTable (Augmented Grammar G’)

{

1. Construct C = {I0 ,I1 ,I2 … In } the collection of LR(1) items for G’

2. State ‘i’ of the parser is from Ii

i. if [A à α.aβ, b] is in Ii and goto(Ii, a) = Ij set action [i, a] = shift j, where a is a terminal

ii. if [ A à α . , a] is in Ii and A ≠ S’, then set action[i, a] = reduce by A à α a. //a conflict here implies the grammar is not CALR grammar

iii. if [S’ à .S, $] implies an accept action at action[i,$] = accept

iv. all other entries are error

3. If goto(Ii , A) = Ij then goto (i, A) = j

4. All other entries are error

}

Step 1 of the algorithm 17.4 calls for the construction of the LR(1) items. Step 2 has four actions which are used to construct the action field of the CALR parsing table. The first one is a shift action which is the same as the SLR table’s shift action. If goto(Ii, a) = Ij then at the intersection of [i, a] we set the action as “sj” to indicate “shift j”. Step 2 (ii) of the algorithm is for reduce action where the kernel items are considered. At the table entry of kernel item number and the look-ahead symbol we add the action reduce by the production indicated by the kernel item. The item number that has the initial kernel item is used to indicate the accept action as in the SLR parsing table. All other entries are considered as error in the action field of the CALR parsing table. The goto() field is the same as the SLR table’s goto field. If goto(Ii , A) = Ij then goto (i, A) = j is added to the CALR parsing table.

Example 17.2

For the grammar discussed in example 17.1, let us construct the CALR parsing table based on the set of items discussed in Table 17.1. The CALR parsing table is given in Table 17.2

17.2.3 CALR parsing

CALR parsing action is exactly same as the SLR parsing action but this is done with the help of CALR parsing table. The stack is initialized with the state 0. The stack contains alternately state number and the grammar symbol with the state number on the stack. The input is appended with $. The stack symbol and the input are compared in the table and the stack is manipulated accordingly. The CALR parsing table is given in Algorithm 17.5

Algortihm 17.5

CALR Parsing Table (Table T, Input w$)

{

Set input to point to the first symbol of w$
Repeat forever

– Let s be the state on the top of the stack

– Let a be the symbol pointed to by ip

– If action [s, a] = shift s’ then

Push a then s’ on top of the stack
Move input to the next input symbol

– Else if action [s, a] = reduce A à β then

Pop 2 * | β | symbols off the stack

– Let s’ be the state now on the top of the stack

Push A then goto [s’, A] on top of the stack
Output the production A à β

– Else if action[s, a] = accept then return;

– Else error()

}

Example 17.3

Table 17.2 could be used to parse the string “ccdd” and is explained in table 17.3

Example 17.3

We considered the pointer variable declaration grammar having the shift / reduce conflict in the SLR parsing table. Let us construct the CALR parsing table and verify whether it is a CALR grammar.

S’à S
S ® L = R
Sà R
L ® * R
L à id
R ® L

The LR(1) items is given in Table 17.4 and the parsing table is given in Table 17.5