johnl@ima.UUCP (Compilers mailing list) (07/03/86)
Guido van Rossum relates:

> I just read an article in Acta Informatica about parser generation for
> "ELL(1) grammars."...
> An ELL(1) grammar (Extended LL(1) grammar) is a context free grammar
> where the right hand sides of rules are not strings of grammar symbols
> but regular expressions over the set of grammar symbols...

The idea of using REs for the right-hand sides is useful but hardly novel; it's been used in many EBNFs. In fact, I hope there's nothing novel here, since I've been using it in an LL(1) parser generator that's almost ten years old!

> ...ELL(1) languages are languages for which an ELL(1) grammar exists; these
> are the same as the LL(1) languages, so the ELL(1) paradigm does not add
> essential power to the mechanism [I think],...

Correct. I object to the name "ELL(1)" for just this reason--ELL(1) is just LL(1) in a convenient notation. The analysis required to produce a parser from a grammar in ELL(1) notation follows trivially from standard LL(1) techniques. The Acta Informatica article is surprising in how much it presents as ELL(1) technique what is really common LL(1) technique--such as the use of strongly-connected regions in computing the director sets. Regular expressions are a notational tool here, but the full expressive power of REs is not available with the technique.

Next, skipping forward to some questions from the parent article (returning to the proposal later):

> - has anyone actually used ELL(1) grammars? experiences?

Yes; as I mentioned above, I've written a parser generator (which is in the public domain) that uses REs on the RHS of rules. It has (), |, and * as you would expect, and also + (one or more repetitions), [] (option), and a couple of other "operators" for notational convenience. Experience? Parsers are quite fast, parse tables are small, and grammars can be readily understood by ordinary mortals.
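To make the notational point concrete, here is a minimal sketch (in Python, and not taken from any actual generator) of how RE operators on a right-hand side map directly onto parsing code. The rule and its token forms are invented for illustration:

```python
# Hypothetical rule, using [] (option) and * (zero or more repetitions):
#
#     list : '(' [ ITEM ( ',' ITEM )* ] ')'
#
# Each RE operator compiles to an if/while guarded by a one-token
# lookahead test -- no intermediate nonterminals are needed.

def parse_list(tokens):
    """Parse the rule above; tokens is a list of strings."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError("expected %r, got %r" % (tok, peek()))
        pos += 1

    items = []
    expect('(')
    if peek() != ')':                       # [] becomes an if on the lookahead
        items.append(peek()); pos += 1      # ITEM
        while peek() == ',':                # ( ',' ITEM )* becomes a while
            expect(',')
            items.append(peek()); pos += 1  # ITEM
    expect(')')
    return items
```

The control structure mirrors the rule's syntax diagram directly, which is what lets the same grammar drive both a parser and a diagram generator without transformation.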
The notation is natural enough that I can generate either a parser or a good set of syntax diagrams from the same grammar without doing any transformations. Incidentally, this is a nice tool for checking what you've written in the grammar, since subtle errors in the grammar tend to be quite obvious when you see the syntax diagrams.

> - any ideas on how to implement proper error recovery?

It's no worse (nor better!) than error recovery for a standard LL parser--except that the parse stack (or whatever state information you keep) will be cleaner, because you don't have lots of intermediate nonterminals which are artifacts of the grammar notation. Looking at it differently, you could say that you don't have to do any work (either at parser-generation time or at compiler execution time) to get or keep parser state useful for error recovery. But error recovery is still sticky; simplistic approaches will work as badly with ELL(1) as with LL(1).

And finally, the parent article (Guido van Rossum) proposes:

> All this triggered a different idea in my mind. The ELL(1) property
> requires (amongst others) that the regular expressions correspond to
> Deterministic Finite Automata. In general r.e.'s correspond to
> Nondeterministic Finite Automata.

You can construct an NDFA for an RE, and it has some useful properties. However, a grammar is regular iff you can construct an FA for it, and...

> ...*every* NFA can be transformed to a DFA (with possibly
> exponentially many more states) that recognizes the same language...

True; however, the exponential explosion of states isn't relevant here. The REs that you allow don't have the problem, because of the LL(1) constraint. Aho/Hopcroft/Ullman give a constructive algorithm for NDFA->DFA, and at one point in the construction it does indeed generate a DFA with potentially as many as 2^n states for an n-state NDFA. (The NDFA has at most n = 2k states, where k is the number of symbols in the RE.
Obviously, if you're writing down k things, 2k is a nice number of objects to work with, but 2^(2k) is not so nice.) However, A/H/U indicate that there's a cleanup afterward to eliminate unreachable states. If you think about the LL(1) property in relation to the RE, you can see that most of the new states go away, since their purpose is (essentially) to encode state information. There isn't much state information to encode--you always know where you are, and you know that it's only going to take one symbol to decide where to go next. This means that the 2^n count, while a valid upper bound, isn't anywhere near tight. Another way to look at it: since you're doing an LL(1) parse with a PDA instead of a DFA, you can carry the interesting state dynamically in the PDA's stack instead of trying to enumerate all the possible states and encode them statically in the DFA transition function.

> ...This
> makes me feel that a larger group of grammars could be called ELL(1) by
> proper redefinition of that term,...

But (as you pointed out later) you haven't changed the class of grammars, so you haven't gained anything in a theoretical sense. The LL(1) property, coupled with the fact that you're going to want to decorate the grammar with semantic actions that must be performed in a predictable fashion at predictable times in the parse, will keep you from making much use of the nondeterminism of an NDFA.

Finally, in terms of parser quality, I don't see any reason to choose an NDFA. While it's true that an NDFA has an advantage (compactness) over a DFA, I doubt that it would be more compact or faster than the PDA that drives an LL parse.
---
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...Simpler is better.
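[For readers who haven't seen it: the subset construction discussed above can be sketched in a few lines of Python. Building DFA states only as they are reached from the start state is one way to see why the 2^n bound is rarely approached; the example NFA below is invented for illustration.]

```python
from collections import deque

def nfa_to_dfa(nfa, start, alphabet):
    """Subset construction: nfa maps (state, symbol) -> set of next states.

    Each DFA state is a frozenset of NFA states; only subsets actually
    reachable from the start are ever created, so most of the 2^n
    potential DFA states never appear.
    """
    start_set = frozenset([start])
    trans = {}                  # the DFA transition function
    seen = {start_set}          # DFA states discovered so far
    work = deque([start_set])
    while work:
        S = work.popleft()
        for a in alphabet:
            T = frozenset(t for s in S for t in nfa.get((s, a), ()))
            trans[(S, a)] = T
            if T not in seen:   # unreachable subsets are never built
                seen.add(T)
                work.append(T)
    return trans, seen

# A 3-state NFA for (a|b)*ab yields only 3 reachable DFA states,
# not the 2^3 = 8 that the worst-case bound allows.
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
trans, states = nfa_to_dfa(nfa, 0, ['a', 'b'])
```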
--
Send compilers mail to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request