18 LALR parsing
Rajeswari Sridhar
After understanding the most powerful CALR parser, in this module we will learn to construct the LALR parser. The CALR parser has a large set of items and hence the LALR parser is designed that has lesser number of items but with reduction in the number of conflicts which is a problem of SLR parser. This module will discuss the construction of LR(1) items necessary for LALR parsing, LALR parsing table followed by parsing a string using the LALR parser.
18.1 Need for LALR parser
Though the CALR parser is powerful enough in avoiding the conflicts of the SLR parser, it suffers from a large set of LR(1) items. This increases the number of entries in the CALR parsing table and thus increases the time complexity of computation and parsing. Increase in the number of items is reduced in LALR parsing table by combining the items that have the same core items but different look-ahead. Thus this is less powerful than CALR parser but avoids shift/reduce conflicts as shifts do not use look-ahead. As we are combining the items with different look-ahead into one, the LALR parser may introduce reduce-reduce conflicts, but not much of a problem for grammars of programming languages.
18.2 LR(1) items
The algorithm for LR(1) items for the LALR parser is computed by first constructing the LR(1) items as in the case of the CALR parser and then combining the items that have the same item-set but differing look-ahead into one item. The algorithm for the CALR’s LR(1) items construction is discussed in module 17. Combining the items alone is discussed by means of an example.
Example 18.1
Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.
S à CC
C à cC
C à d
The augmented grammar is given below and the CALR’s LR(1) items are repeated here for a quick reference in Table 18.1
- S’ à S
- S à CC
- C à cC
- C à d
18.4 Conflicts in LL and LR Parsers
LL parsing tables are computed using FIRST/FOLLOW where the rows correspond to the non-terminals and the columns correspond to the terminals. To construct the parsing table the grammar need to be pre-processed to remove left recursion and need to be left factored and generate a modified grammar. This modified grammar is used to construct the FIRST and FOLLOW which are used to construct the parsing table.
LR parsing tables are computed using Closure and Goto, where the actions correspond to the shift, reduce, accept and error situation. The three types of LR parsers are SLR, CALR and LALR and all of them constructs a parsing table where the rows correspond to the states which are the result of LR() items and the columns corresponds to the terminals and non-terminals.
This parsing table is fundamental and is very important for the parsing action. An incorrect parsing table will result in an ambiguous parsing. A grammar is said to be LL(1) if its LL(1) parse table has no conflicts, SLR if its SLR parse table has no conflicts, LALR(1) if its LALR(1) parse table has no conflicts and CALR(1) if its CALR(1) parse table has no conflicts. The conflicts can be shift / reduce conflict or a reduce/ reduce conflict.
Conflicts in LL and LR parsers
Conflicts are resolved depending on whether operators / symbols involved are left/right associative or precedence. The following is the manner in which the parsers resolve the conflicts.
- For Left-associative operators the conflict is resolved in favor of reduce action.
- For Right-associative operators the conflict is resolved in favor of shift action.
- If the stack has a higher precedent operator the conflict is in favor of reduce action.
- If the stack has a lower precedent operator the conflict is in favor of shift action.
18.5 Error Detection and Recovery
Canonical LR parser uses full LR (1) parse table and will never make a single reduction before recognizing the error when a syntax error occurs on the input. SLR and LALR may still reduce when a syntax error occurs on the input, but will never shift the erroneous input symbol. An error is detected if the symbol on top of the stack and the input symbol do not have a LR parsing table entry. The parsers recover from errors so that the compilation can be carried forward and will not make it as a permanent change. The errors are recovered in one of the following ways:
- Panic mode: In this mode of error recovery, the stack symbols are popped until a state with a goto on a non-terminal A is found, where A represents a non-terminal of the grammar. From the input, the symbols are discarded until we find a symbol in the input that matches with the FOLLOW set of A.
- Phrase-level recovery: We implement individual error routines and call appropriate routines which will pop the stack / discard the input or both and log this information in an error log and recovers from error so that parsing could continue.
- Error productions: New error productions are added to the grammar. In the event of an incorrect state and table entry match, the symbols in the stack are popped until state has error production and this is pushed onto the stack. After that the input symbols are discarded till a parsing action could continue.
Summary:
In this module we discussed the construction of LR(1) items for the LALR parser which is a modified LR(1) items after constructing it for the CALR parser. Using the modified LR(1) items the LALR parsing table is constructed and is used to parser a given string. We also discussed the LALR parsing action along with error recovery in LR parsers.