The method is described in Dragon Book, however I read about it in “”Parsing Techniques” by D.Grune and C.J.H.Jacobs”.
I start from my understanding of building channels for NFA:
- channels are built once, they are like water channels with current
- you “drop” lookahead symbols in right places (sources) of the channel, and they propagate with “current”
- when symbol propagates, there are no barriers (the only sufficient things for propagation are presence of channel and direction/current); i.e. lookahead cannot just die out of the blue
Is that right?
If I am correct, then eof
lookahead should be present in all states, because the source of it is the start production, and all other production states are reachable from start state.
How the DFA is made out of this NFA is not perfectly clear for me — the authors of the mentioned book write about preserving channels, but I see no purpose, if you propagated lookaheads. If the channels have to be preserved, are they cut off from the source if the DFA state does not include source NFA state? I assume no — the channels still runs between DFA states, not only within given DFA state.
In the effect eof
should still be present in all items in all states. But when you take a look at DFA presented in book (pdf is from errata):
DFA for LALR (fig. 9.34 in the book, p.301)
you will see there are items without eof
in lookahead.
The grammar for this DFA is:
S -> E
E -> E - T
E -> T
T -> ( E )
T -> n
So how it was computed, when eof
was dropped, and on what condition?
Update
It is textual pdf, so two interesting states (in DFA; #
is eof
):
State 1:
S--- >•E[#]
E--- >•E-T[#-]
E--- >•T[#-]
T--- >•n[#-]
T--- >•(E)[#-]
State 6:
T--- >(•E)[#-)]
E--- >•E-T[-)]
E--- >•T[-)]
T--- >•n[-)]
T--- >•(E)[-)]
Arc from 1 to 6 is labeled (
.
3
I think I got it.
There are barriers. When you think about NFA (and DFA, but then think about items, not states), there closure transitions and shift transitions, right?
For example — when you have T -> . ( E )
(state 1, last item in the pdf file) to T -> ( . E )
(state 6, first item) this is shift transition, and all lookahead symbols (#
and -
) are flowing happily.
However, initial closure transition kills all lookaheads (x) — you have to compute them from scratch. Continuing closure transition does not kill lookaheads.
Take a look at state 6 — all except the first item, are effects of closure transitions. The second gets )
from the top one, and -
from itself. The third no matter which transition you consider gets )
and -
. And for the last 2 ones, continuing closure transition gives also )
and -
.
(x) There is an exception — if the pointer sits next to last symbol in production, you copy the lookaheads. Example A -> B . C
, when you create closure for C
you copy the lookaheads from A
, because C
in such case has to end with the same stuff as A
.
To sum up — you have shift transition, copy all lookaheads.
You have closure:
-
as from
X -> A B C D . E
, in such case copy lookaheads as for shift transition -
as from
X -> A . B C D E
, compute FIRST forC D E
(mind you, not justC
) and use it (copy) as lookaheads forB
closure, if entire chain can be empty (C D E
) copy lookaheads as for shift too