johnl@ima.UUCP (Compilers mailing list) (06/29/86)
I just read an article in Acta Informatica about parser generation for "ELL(1) grammars." --- Start explanation of ELL(1) grammars --- An ELL(1) grammar (Extended LL(1) grammar) is a context free grammar where the right hand sides of rules are not strings of grammar symbols but regular expressions over the set of grammar symbols. Regular expressions in this case use only ( ) | and *; this has not much to do with r.e.'s as used in Unix programs like ed or (e)grep. Many common extensions to grammar notation use this form of regular expressions, or further operators like + and ?; the operators mentioned here are the essential ones. Such a grammar is only called ELL(1) if it has the ELL(1) property, which is similar to the LL(1) property, and basically means that you can always choose which path next to take based on the next input token (unlike LR(1), where this decision can be delayed until a handle has been recognized). ELL(1) languages are languages for which an ELL(1) grammar exists; these are the same as the LL(1) languages, so the ELL(1) paradigm does not add essential power to the mechanism [I think], unlike LR(1) which really contains more languages. However, an ELL(1) grammar for a particular language is easier to write than an LL(1) grammar, as generally fewer non-terminals are needed (notably by the replacement of many forms of recursion by iteration). Also, the parse tree generated by parsing a program is often more usable for code generation and other activities such as static semantic checks or structured editing (my hobby). --- End explanation --- All this triggered a different idea in my mind. The ELL(1) property requires (amongst others) that the regular expressions correspond to Deterministic Finite Automata. In general r.e.'s correspond to Nondeterministic Finite Automata. However, my trusty Aho&Ullman tells me that *every* NFA can be transformed to a DFA (with possibly exponentially many more states) that recognizes the same language. This makes me feel that a larger groups of grammars could be called ELL(1) by proper redefinition of that term, and that generation of parsing tables should still be possible. (The method given in my reference is not easily adapted to do this, however, at least I don't see trivially how to do it.) (The class of languages is still the same.) And now ready for the questions: - has anyone actually used ELL(1) grammars? experiences? - any ideas on how to implement proper error recovery? - has anybody done table generation for the change I propose? - does the idea of ELL(1) appeal to you? if not, why not? Answers sollicited, discussion more than welcome! Guido van Rossum, CWI, Amsterdam <guido@mcvax.uucp> -- Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request