11 Top down parser-parsing table,shift reduce parser
Rajeswari Sridhar
This module will discuss the algorithm to parse a given string using a LL(1) parser by using the parsing table constructed in the previous module. The errors that could occur in the input string need to be handled which is done by calling appropriate routines. This module will also introduce the preliminary Bottom up parser namely shift-reduce parser.
11.1 Parsing Algorithm
The primary objective of the non-recursive predictive parser, LL(1) avoids back tracking which is a typical problem in Top-down parsing. The LL (1) parser does this by constructing a parsing table. The parsing table constructed for the expression grammar is given in Table 11.1 for a quick reference. As already discussed, the rows of the parsing table are the non-terminals and the columns correspond to the terminals. The table entries correspond to the productions that match a given non-terminals with its FIRST() list.
Table 11.1 Parsing table for the Expression grammar
id | + | * | ( | ) | $ | |
E | E à TE’ | E à TE’ | ||||
E’ | E’à +TE’ | E’ à ε | E’ à ε | |||
T | T à FT’ | T à FT’ | ||||
T’ | T’ à ε | T’ à *FT’ | T’ à ε | T’ à ε | ||
F | F à id | F à (E) |
The algorithm for parsing is given in 11.1
Algorithm 11.1
LL(1)_PARSING(Input string s, Start symbol of Grammar, Parsing table)
{
1. push($)
push(S)
a := lookahead
2. repeat
X := pop()
if X is a terminal or X = $ then
match(X) // move to next token, a := lookahead
else if M[X,a] = X ® Y1Y2…Yk then
push(Yk, Yk-1, …, Y2, Y1) // such that Y1 is on top produce output and/or invoke actions
else error()
endif
until X = $
}
The input string is terminated with a “$” to serve as end-of input indicator. The parsing algorithm uses a stack memory to operate so as to parse the string. This stack memory is pushed initially with a “$” which acts as the initial stack symbol and it is used to indicate success/ failure parsing action. The start symbol of the grammar is then pushed onto the stack memory, as any derivation of a string belonging to a grammar need to start from the start symbol as already discussed in Module 9. After that the input pointer starts reading the first character of the input symbol. These initialization steps are carried out in Step 1 of the algorithm. Step 2 of the algorithm is a loop which continues till the stack is empty. The input symbol ‘a’ is compared with the symbol in the stack ‘X’ and the various actions for the possible combinations are given below.
- If ‘X’ is a terminal and if it is ‘a’, then this character is popped from the stack and the input pointer moves by one position to process the next character.
- If ‘X’ is a non-terminal, then the parsing table is referred for [X,a] and the corresponding production available in the table entry is retrieved. The input pointer stays in the same position pointing ‘a’. The stack symbol X is popped and since this will be a LHS non-terminal, the production which is retrieved from the parsing table is pushed onto the stack. The production is read from left to right and this should be available in the stack from top to bottom, thus ensuring the first symbol on the RHS of a production is on the top of the stack.
- If ‘X’ is a terminal and if it is ‘$’ and if the input character ‘a’ is also ‘$’, then the parser announces successful completion of the parsing action.
- Any other combination of ‘X’ and ’a’ would result in an error action.
Example 11.1: Consider the string id + id * id. Let us see how the parsing algorithm, parses this string. The details of it is given in Table 11.2
11.2 Error Recovery in LL(1) Parsers
A LL(1) parser has to recover from errors if an input string doesn’t belong to it. The parser should recover from the error and should continue with the next phase of compilations.
11.2.1 Error Recovery methods
The parser does not correct the errors. It simply recovers from the errors in one of the many ways available and possible to recover from errors so that the compiler can proceed with the next compilation phase. The following are the various modes a parser uses to recover from errors.
· Panic mode: In this mode of error recovery, the parser discards input until a token in a set of designated synchronizing tokens is found.
- Phrase-level recovery: In this method, the parser performs local correction on the input to repair the error
- Error productions: Error productions are added to the grammar to identify erroneous constructs
- Global correction: In this method, a minimal sequence of changes is made to the parsing algorithm to obtain a global least-cost error correction
11.2.2 Error recovery in LL(1) Parsers.
The LL(1) parsers adopts all the four methods listed above and handles it using the parsing table.
- Panic Mode: In this mode, the synchronizing actions are added to the undefined entries in the parsing table based on FOLLOW() already computed. This is given in Table 11.3
Table 11.3 Parsing table with synch action to recover from errors for Expression grammar
id | + | * | ( | ) | $ | |
E | E à TE’ | E à TE’ | synch | synch | ||
E’ | E’à +TE’ | E’ à ε | E’ à ε | |||
T | T à FT’ | synch | T à FT’ | synch | synch | |
T’ | T’ à ε | T’ à *FT’ | T’ à ε | T’ à ε | ||
F | F à id | synch | synch | F à (E) | synch | synch |
As seen from Table 11.3, the FOLLOW(E) = {$, )}. Therefore, synch actions are added to the intersection of [E, $] and [E, )]. So, when this combination occur in the parsing table and input, the parser enters into the panic mode and start deleting tokens from the input or stack till a valid combination of stack entry and input is available and logs this as error in the error table.
- Phrase Mode: In this mode, the LL(1) parser, assumes and inserts some missing characters to continue with the parsing action. For example: id id is changed into id * id or id + id
- Error Production: This is similar to the general parser recovery method. Error production are added to take care of incorrect input combinations
An ambiguous grammar is not LL(1) and may have multiple entries in the parsing table. In that case, the parsing algorithm can decide to choose to apply any one of the available productions in the table entry.
11.3 Bottom up Parsers
The LL(1) parser is not a very powerful parser and cannot handle all types of grammars. We have already seen that LL(1) parser cannot handle ambiguous grammars and some grammars are inherently ambiguous. Bottom up parsers build a derivation by working from the input back toward the start symbol. The parse tree is constructed by looking at the input string and matching the RHS with the LHS of the production and correspondingly substituting it. Thus the parse tree is built from leaves towards the root. Hence, we can say the bottom- up parser builds a derivation tree by applying rightmost derivation in reverse.
Bottom-up parsers are powerful than their top-down counterpart and does not require any pre-processing as in the case of LL(1) parsers. There are various types of bottom- up parsers.
• Shift-Reduce Parsing – Simply constructs the right most derivation tree by replacing RHS of the production with the LHS Non-terminal.
• Operator-precedence parsing – It is a special type of Bottom –up parser where a grammar has to obey the property of an infix expression.
• LR parsers – ‘L’ stands for the input string being scanned from left to right, ‘R’ stands for applying rightmost derivation. There are various types of LR parsers
– Simple LR (SLR)
– Canonical LR (CALR)
– Look Ahead LR (LALR)
As bottom- up parsers match the RHS of production with LHS, a concept called ‘handle’ is defined. A handle is a substring of grammar symbols in a right-sentential form that matches a right- hand side of a production. A handle’s reduction to the non-terminal on the LHS represents one step along the reverse of a rightmost derivation. For example, the following are some of the handles of the expression grammar:
• id
• E * E
• (E)
In this module, we will discuss the simple shift-reduce bottom up parser and other bottom-up parsers are discussed in subsequent modules.
11.4 Shift-Reduce Parsers
The shift-reduce parser is the simplest of the Bottom up Parsers. It also has a stack memory similar to a LL(1) parser. The input symbol here again is terminated with “$” to indicate end of input. The stack is initialized with the “$” symbol alone in this type of parser.
11.4.1 Parsing by Shift-Reduce Parsers
The shift-reduce parser is based on two actions: “shift” and “reduce”. The stack memory and the input are considered individually to perform the shift and reduce actions. The “shift” action simply shifts the input symbols until a handle is found into the stack by calling the push() routine. The reduce action, reduces the substring to the non-terminal on the LHS of the corresponding production in the stack, by popping the handle and pushing the LHS non-terminals. Thus the shift-reduce parser has 4 actions:
- Shift – the next input symbol is shifted onto the stack
- Reduce the handle that is at top of stack
- pop handle
- push appropriate LHS symbol
- Accept the input string, stop parsing and report success – The string is accepted if the input is completely consumed and the stack has the start symbol as its top of stack symbol.
- Error recovery routine is called if it was not able to do any of the other 3 actions.
Example 11. 2 Consider the following ambiguous expression grammar with productions numbered from 1 to 4. Let us see the parsing action for the string “id+id*id”
- S -> E
- E ->E + E
- E ->E * E
- E -> id
The parsing action is shown in Table 11.4
11.4.2 Issues in Shift-Reduce parser
The shift-reduce conflict as explained in Table 11.4 is a typical problem of the Shift-reduce parser. A reduce-reduce conflict may also occur where the parser will not know which production to choose from the list of available productions for a “long” or “short” handle. These issues are caused by
- The limitations of the parsing method (even when the grammar is unambiguous)
- Ambiguity of the grammar
These issues will be resolved by a powerful bottom up Parser and will be dealt with in subsequent modules.
Summary:
In this module we discussed the parsing action of LL(1) parser. The various error recovery strategies in LL(1) parsers are also discussed. This module introduced one of the simple Bottom up parser namely Shift-reduce parser and discussed the approach to parsing by this parser.