So i understand the semantics of derivations as far as Backus Naur Form goes. One thing I cannot find in any text book or the various lecturers’ notes that are on-line is this.
When would a right most derivation be beneficial over a left most derivation? Would a compiler or generator always use either left, or right, or would it use both?
So here’s an example:
Suppose we had the following language sentence: A = B + C * A
Example Language Grammar Permitting the Above:
<assign> <id> = <expr>
<id> A | B | C
<expr> <expr> + <term>
|<term>
<term> <term> * <factor>
|<factor>
<factor> ( expr )
| <id>
Left Most Derivation:
<assign> => <id> = <expr>
=> A = <expr> + <term>
=> A = <term> + <term>
=> A = <factor> + <term>
=> A = <id> + <term>
=> A = B + <term>
=> A = B + <term> * <factor>
=> A = B + <factor> * <factor>
=> A = B + <id> * <factor>
=> A = B + C * < term>
=> A = B + C * <factor>
=> A = B + C * <id>
=> A = B + C * A
Right Most Derivation:
<assign> => <id> = <expr>
=> <id> = <expr> + <term>
=> <id> = <expr> + <term> * <factor>
=> <id> = <expr> + <term> * <id>
=> <id> = <expr> + <term> * A
=> <id> = <expr> + <factor> * A
=> <id> = <expr> + <id> * A
=> <id> = <expr> + C * A
=> <id> = <term> + C * A
=> <id> = <id> + C * A
=> <id> = B + C * A
=> A = B + C * C
4
The way it was explained in my compiler class, well over thirty years ago, is that a leftmost derivation scheme works best with a top-down (e.g. recursive descent) parser, while a rightmost derivation scheme works best with a bottom-up (e.g., SLR(1), LALR(1), YACC et al) parser. Using a leftmost derivation scheme with a bottom-up parser requires the parser to “stack up” the entire sentence, then process it from the right end, rather than recognizing short phrases at a time. Ditto the opposite pairing.
My Green Dragon Book and Red Dragon Book are currently in storage, or I’d go looking to see if they have examples or better explanations.
After thinking some more about it, the above discussion is about left recursive and right recursive grammars, but I think the effect is the same. At the time, I found the result very satisfying, in that it said there wasn’t one “best” way to do a grammar, but rather it depended on how your parser worked as well.
On their own, both left-most and right-most derivations are nothing but arbitrary rules that disambiguate which steps to take when parsing or generating with a CFG: many different orders of nonterminal expansion are imaginable which ultimately lead to the same tree, and LMD and RMD are simply two simple strategies to decide which route to take.
For computers, it is vital to have such arbitrary rules because they cannot make decisions on their own. All they can do is either follow some strict rule, or try all possibilities and back-track as necessary. Back-tracking amounts to a search of the possibility space, which is wasteful if a simple rule would also do the trick. Virtually all programming languages are defined so that you only need a deterministic parser, and often only a deterministic parser with very limited look-ahead scope.
1
A top down or recursive descent parser (LL(k)) constructs a left derivation. Your first diagram roughly outlines the steps of such a parse.
A bottom up (SLR(k) or LALR(k)) parser constructs a right derivation. Your second diagram roughly outlines the steps of such a parse.
What is missing from this discussion is the differences between suitable grammars. If the grammar looked like this:
<expr> := <term>
| <term> + <expr>
The left derivation is now shown to backtrack, and is less suitable for an LL(k) parser. The right derivation does not, and still suits bottom up parsing. There are other grammars in which the right derivation turns up problems.
The main reason for these methods, as I have used them in the past, is to show problems in the grammar which can be addressed by refactoring, to make them more suitable for a particular kind of parser.
Left most derivation are easy to construct and which can be used in the Top-Down parsing. Top-Down parsers are easy to implement but it can handle only some set of grammars i.e it can not handle
- Ambiguous grammars
- Left Recursive Grammars
- Non Deterministic Grammars (Left factoring)
so left most derivations are not going to use for all class of grammars to be parsing.
Bottom-Up parser can handle all the above grammars and BUP uses RMD in reverse. RMDs are used to parse for all class of grammars.
Finally RMD are more powerful and efficient than LMD.