Is there some standardized or widely accepted algorithm for picking up operators in shift/reduce conflicts in LALR parser? The question is naive, my problem is not with implementing my solution, but implementing the solution is already widely used.
For shift the operator is the next input token, for reduce, it depends — I consider all already read symbols (for given production) declared as operators:
- if there is one — it is the operator
- if there are more than one — I report grammar error
- if there is none I use the input token as operator
So for example:
E ::= E + E
E ::= E * E
E ::= num
in case of
E + E | * num
Considering first production I read +
, and since it is the only one operator read I pick this one for reduce operator. *
is the first token in input so it servers as shift operator.
But is my algorithm correct (i.e. the same as normally used for LALR parsers)? I am especially worried for the point when I have more than 1 operator in the read tokens.
Update
What I NOT am asking here
I don’t ask how to resolve shift/reduce conflict once you have the operators. I don’t ask how to set the precedence for operators.
What I am asking here
I ask how to extract operators from the “stream” of tokens. Let’s say user set some precedence rules for +
, -
and *
. The stack is: -
E
+
E
and input is E
. Or the stack is E
and input is *
.
What is the operator for reduce in first case? And in the second? Why?
The stack is really entire right hand side of production.
8
Converted from a comment, first, a rephrasing of your question:
Suppose we have the following ambiguous grammar:
E ::= E + E
E ::= E * E
E ::= num
In the LR automaton, we will have the following shift-reduce conflict (among others):
E ::= E * E •, +
E ::= E • + E
As is common knowledge, reducing here will make ‘*’ bind more tightly as ‘+’ (as normal), while shifting will make ‘+’ bind more tightly.
Parser generators such as Yacc and Bison allow the user to mark tokens as binding more tightly than other tokens. As normal, we say that ‘* > +’.
However, there seems to be a mismatch somewhere: the ‘>’ we use above is an operator between two tokens, but the shift-reduce conflict is a conflict between a token and a production.
In Yacc and Bison, not just tokens but productions also have a priority. The user can mark tokens with priorities (as is normally done), and productions (that have not been given an explicit priority) have their priority inferred. The priority of a production is equal to the priority of the rightmost token in the production.
In “E ::= E * E”, the rightmost token is ‘*’, so in the shift reduce conflict above the priority of the reduction is ‘*’, and the priority of the shift is ‘+’, and as the priority of ‘*’ is higher than ‘+’, we decide to reduce rather than shift. As another example, the rightmost token of “A ::= a b B c d C D E” is ‘d’ (with only capitalized letters being nonterminals).
Adding “A ::= a b B c d C D E %prec a” to a production in Yacc and Bison explicitly gives that production the priority of ‘a’, but this is not normally done. Note that some productions may not have a rightmost token, or the priority between tokens is not defined, in which cases Yacc and Bison fall back to always shifting to resolve conflicts (Yacc and Bison always create a fully deterministic table, unless in GLR mode).
For completeness: the action with higher priority wins, so if the production has a higher priority than the token, we reduce instead of shifting. If priorities are equal, we look at associativity: left associativity means reducing, right associativity means shifting. ‘nonassoc’ is the third type of ‘associativity’, and it means an error: the shift/reduce conflict is resolved by removing both entries in the table, so the parser rejects inputs entering that state with that lookahead.
0
Well, let me grab the first parser I can find that tries to resolve this type of issue automatically and see what it does. According to http://www.haskell.org/happy/doc/html/sec-conflict-tips.html their default is to choose the shift over a reduce if there is a shift-reduce conflict, and to choose the first rule in the grammar file if there is a reduce-reduce conflict. (They consider reduce-reduce conflicts to be more serious problems.)
3