jar@ileaf.com (Jim Roskind x3266) (12/29/89)
I have a reasonably complete C++ grammar, and I was planning to make it fairly public (copyrighted, but available at no charge). I was hoping to find a few reviewers to look at the grammars ahead of time and try to catch any errors I might have made. My privately gathered reviewers have been slow to respond, and so I'm looking for "motivated reviewers". I would define such a reviewer as someone who is actively trying to work with parsing C++, and is VERY interested in looking at clean methods of resolving some of the more bothersome ambiguities.

WHAT IS SPECIAL ABOUT THE GRAMMARS

1) The grammars are CLEAN. Both (one for C, one for C++) are YACC grammars, but neither uses any %prec or %assoc etc. directives. The result is a clear exposition of the ambiguities (and complexities) of each language.

In contrast, the grammar provided in the draft ANSI C standard is not YACC-able (it has many conflicts), and the semantics of the language do not match the syntax provided (the ANSI syntax's binding together of an init-decl-list BEFORE combining with declaration-specifiers is the simplest example). (Note that IMHO the ANSI C committee did a great job, but producing a machine readable grammar was not part of the job.) Similarly, the publicly available C grammars (that are nearly YACC-able) that I have seen typically avoid the real problems by getting the grammar wrong (a simple test here is to check that typedefnames can be redeclared in an inner scope). Similar problems exist with the C++ grammar supplied in Stroustrup's C++ text, and in the C++ 2.0 ref manual.

The attempted dpANSI C grammar that I have written has only 1 s-r conflict (I chose to leave in the if-if-else conflict). This grammar served as a base to distinguish the complexities of C++ from those of C. The C grammar only requires a lexer that uses symbol table context to distinguish a typedef-name from an identifier (but of course this grammar ALLOWS redefinition of typedefnames).
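
To make the typedef-name test concrete, here is a minimal sketch in C of the kind of symbol table lookup a lexer could use to decide between a typedef-name and an identifier. It is NOT taken from the grammars described in this posting; every name in it (declare, classify, enter_scope, leave_scope, the token kinds) is hypothetical, and it ignores the harder question of how the parser knows that a name in declarator position is being redeclared.

```c
#include <assert.h>
#include <string.h>

/*
 * Illustration only: the fragment being modelled is
 *
 *     typedef int T;            T is a typedef-name at file scope
 *     void f(void) {
 *         int T;                T is redeclared as an ordinary variable
 *         T = 1;                here T must lex as an identifier
 *     }
 *     T x;                      here T is a typedef-name again
 *
 * All names below are hypothetical, chosen for this sketch.
 */

enum token_kind { IDENTIFIER, TYPEDEF_NAME };

#define MAX_SYMBOLS 64

struct symbol {
    const char *name;   /* spelling of the declared name */
    int scope_level;    /* nesting depth at which it was declared */
    int is_typedef;     /* nonzero if declared via typedef */
};

static struct symbol symbols[MAX_SYMBOLS];
static int symbol_count = 0;
static int current_scope = 0;

/* Called on '{': names declared from now on belong to a deeper scope. */
static void enter_scope(void) { current_scope++; }

/* Called on '}': forget every name declared in the scope being closed. */
static void leave_scope(void) {
    while (symbol_count > 0 &&
           symbols[symbol_count - 1].scope_level == current_scope)
        symbol_count--;
    current_scope--;
}

/* Record a declaration of name in the current scope. */
static void declare(const char *name, int is_typedef) {
    assert(symbol_count < MAX_SYMBOLS);
    symbols[symbol_count].name = name;
    symbols[symbol_count].scope_level = current_scope;
    symbols[symbol_count].is_typedef = is_typedef;
    symbol_count++;
}

/* What the lexer would ask: search innermost declaration first, so an
 * inner redeclaration hides an outer typedef. Unknown names are plain
 * identifiers. */
static enum token_kind classify(const char *name) {
    for (int i = symbol_count - 1; i >= 0; i--)
        if (strcmp(symbols[i].name, name) == 0)
            return symbols[i].is_typedef ? TYPEDEF_NAME : IDENTIFIER;
    return IDENTIFIER;
}
```

With this sketch, declaring T as a typedef at the outer level, entering a scope, and declaring T again as a variable makes classify("T") answer IDENTIFIER until leave_scope() is called, after which it answers TYPEDEF_NAME again. A grammar that cannot accept the inner "int T;" at all never gets far enough to exercise this lookup, which is the point of the test.
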
This grammar also demonstrates many cute techniques for satisfying a LALR(1) parser generator when the weak of heart would often shout for an LR(1), LALR(2), or a lex-hack.

Since the C++ grammar is based on the C grammar, my C++ grammar still supports old-style function definitions (a feature that I believe gcc and cfront 2.0 have at least temporarily given up on). The bad news is that there are some very subtle ambiguities remaining in the definition of C++, and YACC is VERY good at bringing these items to the surface. In addition, some of the disambiguating techniques that I have developed require "inline expansion" of rules in order to defer a reduction until the choice IS unambiguous. This last fact provides a confusing multiplier, which leaves a grand total of 29 s-r conflicts and 7 r-r conflicts in my current C++ grammar. Combining these ambiguities into equivalence classes (to remove the confusing multiplier) gives a total of 9 classes of ambiguities (one of which is the if-if-else conflict). I believe I resolve 6 of these classes CLEARLY correctly, and 3 of them "reasonably". I define "reasonably" to mean that by the time I disambiguate, most human parsers are thoroughly lost, and the language definition is beginning to stretch. (I believe there is a recursive descent parser hiding between lex and YACC in cfront, and so it is hard to compete :-). As another point of comparison, several postings to comp.lang.c++ that reported parsing difficulties in Zortech C++ v1.07 are disambiguated properly by my grammar.

Aside from the sparse commentary in the grammar (which uses long descriptive names for nonterminals), I will also include some prose discussing the remaining ambiguities.

If you want to have an early look at these grammars and make some comments, please drop me an EMAIL line. Please include some hint of what you do so I can have a feel for who could do the best job of reviewing the docs.

Thanks.

Respond to: jar@ileaf.com, ...!eddie.mit.edu!ileaf!jar,
(before 1/6/90) 617-577-9813 x5570
or
Jim Roskind
516 Latania Palm Drive
Indialantic FL 32903
(407)729-4348
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.