[comp.lang.c++] yaccable grammars for C and C++: 2 of 4

jar@io.UUCP (Jim Roskind x5570) (03/08/90)
Copyright (C) 1989.1990, James A. Roskind, All rights reserved. 
Abstracting with credit is permitted.  The contents of this file may 
be reproduced electronically, or in printed form, IN ITS ENTIRETY 
with no changes, providing that this copyright notice is intact and 
applicable in the copy.  No other copying is permitted without the 
expressed written consent of the author.

FILENAME: CPLUSPLS.TXT
AUTHOR: Jim Roskind
	Independent Consultant
	516 Latania Palm Drive
	Indialantic FL 32903
	(407)729-4348
	jar@ileaf



    A YACC-able C++ 2.0 GRAMMAR, AND THE RESULTING AMBIGUITIES

ABSTRACT

This paper describes ambiguous aspects of the C++ language that have 
been exposed by the construction of a YACC-able grammar for C++. The 
grammar is provided in a separate file, but there are extensive 
references to the actual grammar.  In light of the fact that not all 
reader will have access to a YACC capable of processing my grammar, I 
have also included a significant excerpt from the verbose output of 
running YACC on my grammar.

Theoretical Note (skip if you hate theory): My C++ grammar does not 
make use of the %prec or %assoc features of YACC, and hence conflicts 
are never hidden within my grammar.  The wondrous result is that, to 
the extent to which my grammar can be seen to span the syntax of the 
language, in all places where YACC reports no conflicts, the grammar 
is a totally unambiguous statement of the syntax of the language.  
This result is remarkable (I think), because the "general case" of 
deciding whether a grammar is ambiguous or not, is equivalent to (the 
unsolvable) halting problem.

Although this paper is terse, and perhaps poorly formed, I believe 
that its content is so significant to my results that it had to be 
included.  I am sorry that I do not have the time at this point to 
better arrange its contents.


CONTENTS
        
	INTRODUCTION
	REVIEW: STANDARD LEXICAL ANALYSIS HACK: TYPEDEFname VS IDENTIFIER
	STATUS OF MY "DISAMBIGUATED" GRAMMAR
        22 EASY CONFLICTS, WITH HAPPY ENDINGS
        2 REDUCTION CONFLICTS WITH STRAIGHT FORWARD DISAMBIGUATION
        3 NOVEL CONFLICT THAT YIELD TO SEMANTIC DISAMBIGUATION
        THE TOUGH AMBIGUITIES: FUNCTION LIKE CASTS AND COMPANY
	SAMPLE RESOLUTIONS OF AMBIGUITIES BY MY C++ GRAMMAR
	DIFFICULT AMBIGUITIES FOR C++ 2.0 PARSER TO TRY
	COMMENTARY ON CURRENT C++ DISAMBIGUATING RULES
        SOURCE OF CONFLICTS IN C++ (MIXING TYPES AND EXPRESSIONS)
	FUNCTION LIKE CAST vs DECLARATION : THE TOUGH EXAMPLE
	CONCLUSION

	APPENDIX A:
	    PROPOSED GRAMMAR MODIFICATIONS (fixing '*', '&' and ':'
					conflicts)
	APPENDIX B:
	    AMBIGUITIES IN MY C++ GRAMMAR AS LISTED BY YACC VERBOSE OPTION



INTRODUCTION

This paper is intended to go into significant detail about the 
ambiguities that I have seen in the C++ 2.0 language, and exposed by 
attempting to develop a YACC compatible (i.e., LALR(1)) grammar.  I 
must point out that this is NOT meant as an attack on the language or 
the originators thereof, but rather an R&D effort to disambiguate 
some details. (I personally believe that Stroustrup et. al. are doing 
a GREAT job). I have the vague hope that the extensive hacking that I 
have done with YACC on the C++ grammar has given me a rather novel 
vantage point. (I have expressed my observations to Bjarne 
Stroustrup, and in doing so verified that other folks had not 
previously identified several of my points).  In light of my 
activities, this paper includes a fair amount of philosophizing. I 
hope that none of this paper assumes too greatly that I have insight 
that is beyond that of the readers, and certainly no insults are 
intended.

If you like investigating grammars directly (rather than reading what 
I have to say), I would heavily suggest you read the ANSI C grammar 
before looking at the C++ grammar.  The C++ grammar is pretty large, 
and you can expect the following stats if you have a similar YACC to 
what I am using:

123/127 terminals, 154/200 nonterminals
586/650 grammar rules, 1076/1200 states
29 shift/reduce, 7 reduce/reduce conflicts reported
245/350 working sets used
memory: states,etc. 12263/60000, parser 9287/12000
477/600 distinct lookahead sets
723 extra closures
6709 shift entries, 15 exceptions
1631 goto entries
4208 entries saved by goto default
Optimizer space used: input 16528/60000, output 9142/12000
9142 table entries, 4109 zero
maximum spread: 351, maximum offset: 1067



REVIEW: STANDARD LEXICAL ANALYSIS HACK: TYPEDEFname VS IDENTIFIER

This section is targeted at readers that are not familiar with 
parsing C with a context free grammar.  The reader should note that 
there are two distinct forms of `identifiers' gathered during lexical 
analysis, and identified as terminal tokens in both of my grammars.  
The terminal tokens are called IDENTIFIER and TYPEDEFname 
respectively.  This distinction is a caused by a fundamental element 
of the C language. The definition of a "TYPEDEFname" is a lexeme that 
looks like a standard identifier, but is currently defined is the 
symbol table as being declared a typedef (or class) name.  All other 
lexemes that appear as identifiers (and are not keywords) are 
tokenized as IDENTIFIERs. The remainder of this section will review 
the motivation for this approach.

Consider the following sample code, which uses the C language:

	...
	int main() { f (*a) [4] ; ...

Most reader will intuitively parse the above sequence as a function 
call to "f", with an argument "*a", the return value of which is 
(probably) a pointer, and hence can be indexed by "4".  Such an 
interpretation presumes that the prior context was something like:

	...
	extern float *a;
	char * f(float);
	int main() { f (*a) [4] ; ...

However, if the prior context was **INSTEAD**:

	...
	typedef int f;
	extern float a;
	int main() { f (*a) [4] ; ...

then the interpretation of the code is entirely different.  The 
fragment in question actually redeclares a local variable "a", such 
that the type of "a" is "pointer to array of 4 int's"!  So we see in 
this example that the statement "f(*a)[4];" cannot be parsed 
independent of context.

The standard solution to the above ambiguity is to allow the lexical 
analyser to form different tokens based on contextual information. 
The contextual information that is used is the answer to the 
question: "Is a given identifier defined as a typedef name (or class 
name) at the current point in the parse"?  I will refer to this 
feedback loop (from the parser that stores information in a symbol 
table, wherein the lexer extracts the information) as the "lex hack".  
With this lex hack in place the code fragment "f(*a)[4];" would be 
provided by the lexer as either:

	IDENTIFIER '(' '*' IDENTIFIER ')' '[' INTEGERconstant ']' ';'

or

	TYPEDEFname '(' '*' IDENTIFIER ')' '[' INTEGERconstant ']' ';'

The two case are very easy for a context free grammar to distinguish, 
and the ambiguity vanishes.  Note that the fact that such a hack is 
used (out of necessity) demonstrates that C is not a context free 
language, but the hack allows us to continue to use an LR(1) parser, 
and a context free grammar.

Note that this hack is, of necessity, made use of in the C++ grammar, 
but no additional feedback (a.k.a.: hack) is required. The interested 
reader should also note that this feedback loop (re: updating the 
symbol table) must be swift, as the lexical analyser cannot proceed 
very far without this contextual information.  This constraint on the 
feedback time often prevents the parser from "deferring" actions, and 
hence increases the pressure on the parser to rapidly disambiguate 
token sequences.


STATUS OF MY "DISAMBIGUATED" GRAMMAR

My grammar is now fairly complete.  Several independent reviews, 
which provided a front end lexical analyser and parsed existing C++ 
code, have verified that the grammars span of the C++ language 
appears complete.  I have not incorporated parametric types nor 
exception handling into the grammars at this point, as they continue 
to be in a state of flux.

The grammar does (I believe) support all the features provided in C++ 
2.0, including multiple inheritance and the enhanced operator "new" 
syntax (includes placement expression). I believe that, except for 
the minor change involving not permitting parentheses around bit 
field names during a declaration, my C++ grammar supports a superset 
of my ANSI C grammar. Note that I haven't inline expanded all the 
rules in the C grammar that were required for C++ disambiguation (re: 
deferring reductions), and hence a `diff' of the two grammars will 
not provide a trivial comparison.  The resulting major advantage of 
this grammar over every other current C++ parser (that I know of) is 
that it supports old style function definitions AS WELL AS all the 
standard C++. (It is my personal belief that such support was dropped 
by many compilers and translators in order to resolve the many syntax 
problems that appear within C++.  I believe this grammar shows that 
such a decision was premature).

My list of shift-reduce and reduce-reduce conflicts is currently: 29 
shift/reduce, 7 reduce/reduce conflicts reported. I have chosen to 
leave so many conflicts in the grammar because I hope to see changes 
to the syntax that will remove them, rather than making changes to my 
grammar that will firmly accept and disambiguate them.

12 SR caused by operator function name with trailing * or & 
8 SR caused by freestore              with trailing * or &
1 SR caused by operator function name OR freestore, with trailing :
1 SR caused by dangling else and my laziness

1 SR and 1 RR caused by operator function name and trailing {

3 SR caused by constructor declaration vs member declaration

3 RR caused by function-like cast vs identifier declaration ambiguity
3 RR caused by function-like cast vs typedef redeclaration ambiguity
3 SR caused by redundant parened TYPEDEFname redeclaration vs old style cast


Of these conflicts, the ones that most C++ parser authors are mainly 
concerned with are the last 9 conflicts, relating to 
function-like-cast vs declaration, and redundant parened TYPEDEFname 
redeclaration vs old style cast.  The following sections breeze 
through the "easy" conflicts, and then talk at length about the 9 
tough ones.


22 EASY CONFLICTS, WITH HAPPY ENDINGS

The first group of 22 SR conflicts:

12 SR caused by operator function name with trailing * or &
8 SR caused by freestore              with trailing * or &
1 SR caused by operator function name OR freestore, with trailing :
1 SR caused by dangling else and my laziness

have very simple resolutions.  If you are reading this, I assume that 
you are already familiar with the if-if-else conflict.

The trailing ':' has to do with the use of the colon in a ternary 
"... ? ... : ..." expression.  The easiest example is:

	void * p = boolean ? new struct S : ...

The problem is that this MIGHT be the first mention of "struct S", 
and this MIGHT be where the elaboration of the structure is provided!  
Under such circumstances what follows the ':' MIGHT be a base class 
name, and the entire curly braced elaboration for S (ugh!).

My resolution of this SR conflict is that "the longest possible type 
is constructed by an LR1 parser".  Hence the ':' is construed to be 
part of the type "struct S".  This is in keeping with the subtle 
statement on in section 7.1 of the C++ 2.0 Ref Man: "The longest 
sequence of decl-specifier that could possibly be a type name is 
taken as the decl-specifiers of a declaration".


The 8 conflicts based "freestore with trailing * or &" can be hinted 
at by the example:

	a = new int * * object;

Is the above the same as:

	a = (new int) * (* T);
or:
	a = (new (int *)) * T;

Again the "longest possible type" is isolated by my grammar.  The 
result is:

	a = (new (int * * )) ...

which leads to a syntax error in my example!  This resolution is 
indeed what is quietly specified for C++ 2.0 (and implemented in 
cfront).  The critical statement and example in the C++ 2.0 Ref Man 
at the end of section 5.3.3 makes this resolution clear.

The 12 conflicts involving "operator function names with trailing * 
or &" are quite similar to what was just presented.  The critical 
fact is that "operator typename" is allowed in the grammar to define 
a function.  Whenever a function is provided, but NOT followed by a 
'(', the address of the function is implicitly taken and used in the 
expression (see draft ANSI C standard for finer details).  For some 
class T, the following MIGHT all be valid statements:

	operator T;
	operator T*;
	operator T**;

If the above are valid, then the interpretation of the following is 
ambiguous:

	operator T * * a;

The above might be interpreted as:

	(operator T) * (* a);
or
	(operator (T *)) * a;

The default LR rules parse the largest possible type, and lead to:

	(operator (T * * )) ...

which in our example leads to a syntax error.  Here again the 
"longest possible type..." rule supports my grammar.  Note that this 
rule is actually a consequence (in my opinion) of the cfront 
implementation via a YACC grammar, and the default resolution of 
conflicts in that grammar.


2 REDUCTION CONFLICTS WITH STRAIGHT FORWARD DISAMBIGUATION

The two conflicts that were described as:

   1 SR and 1 RR caused by operator function name and trailing {

occur when there is an ambiguity as to whether the '{' is the start 
of a function body in a function definition, or the start of a 
structure/class/enum elaboration.  In part, this ambiguity is caused 
by the fact that an arbitrary declarator is used in a function 
definition, but semantics require the declarator in a definition to 
have type "function returning ...".  This subtlety causes the 
following to be a syntactically valid function definition, even 
though it is semantically invalid:

	void func { int a; } ...

Semantic requirements on the declarator demand something more like:

	void func() { int a; }  ...

By using "operator struct S" in place of "func" in the first example 
we get:

	void operator struct S { int a; } ...

My C++ grammar attempts to assemble the longest possible type, and 
hence parses this example as equivalent to:

	void operator (struct S { int a;} ) ...

Interestingly enough, this resolution not only supports the "longest 
possible type" rule, but also avoids the semantically invalid parse.


3 NOVEL CONFLICTS THAT YIELD TO SEMANTIC DISAMBIGUATION

The conflicts that are discussed in this section have been deferred 
(by A LOT of work, and A LOT of inline expansion) to occur when a ';' 
is reached.  At that point, semantic information in the tokens can 
safely be used to decide which of two cases are at hand.

The conflicts referred to as:

    3 SR caused by constructor declaration vs member declaration

occur during a class/struct elaboration.  Consider the following 
class elaborations:

	typedef int T1, T2, T3  ;
	class GOO { int a;} ;
	class FOO {
		GOO    T1  ; // clearly a redefinition of T1
		FOO  ( T2 ); // clearly a constructor
		GOO  ( T3 ); // redefinition of T2
		};

Note that the last two entries in FOO's elaboration "FOO(T2);" and 
"GOO(T3);" are tokenized IDENTICALLY, but must have dramatically 
different meanings. When I first found this ambiguity I was hopeful 
that I could extend the lex hack that distinguishes TYPEDEFnames from 
random IDENTIFIERs, and distinguish something like CURRENTCLASSname. 
Unfortunately, the potential for elaborations within elaborations 
appears to make such a hack unworkable.  In addition, once I got my 
grammar to defer all such ambiguous cases until a ';' was seen, I 
felt confident that the ambiguity was resolved (and the introduction 
of an additional "hack" was unnecessary).


THE TOUGH AMBIGUITIES: FUNCTION LIKE CASTS AND COMPANY

The ambiguities listed in this section pertain to attempts to 
distinguish declaration/types-names from expressions names.  For 
example:

    char *b ="external" ; // declare a variable to confuse us :-)
    main () {
	class S;
	S (*b)[5]; // redeclare "b" pointer to array of 5 S's ?
	       // OR ELSE indirect through b; cast to S; index using 5 ?
	}

The above is what I call the "declaration vs function like cast 
ambiguity".  Awareness about this ambiguity in this context appears 
fairly widespread among C++ parser authors.  The C++ 2.0 Ref Man 
makes explicit reference to this problem in section 6.8 "Ambiguity 
Resolution".  I believe the underlying philosophy provided by the 
Reference Manual is that if a token stream can be interpreted by an 
ANSI C compiler to be a declaration, then a C++ compiler should 
disambiguate in favor of a declaration.  Unfortunately, section 6.8 
goes on to say:

    "To disambiguate, the whole statement may have to be examined to 
    determine if it is an expression-statement, or a declaration. ... 
    The disambiguation is purely syntactic; that is, the meaning of 
    the names, beyond whether they are type-names or not, is not used 
    in the disambiguation".

The above advice only forestalls the inevitable ambiguity, and 
complicates the language in the process.  The examples that follow 
will demonstrate the difficulties.

There are several other contexts where such ambiguities (typedef vs 
expression) arise:

	1) Where a statement is valid (as shown above).
	2) As the argument to sizeof()
	3) Following "new", with the C++ syntax allowing a placement
		expression
	4) Immediately following a left paren in an expression (it might
		be an old style cast, and hence a type name)
	5) Following a left paren, arguments to constructors can be
		confused with prototype type-names.
	6) Recursively in any of the above, following a left paren (what
		follows might be argument expressions, or might be function
		prototype parameter typing)

Examples of simple versions of the sizeof context are:

	class T;
	sizeof ( T    ); // sizeof (type-name)
	sizeof ( T[5] ); // again a type name
	sizeof ( T(5) ); // sizeof (expression)
	sizeof ( T()  ); // semantic error: sizeof "function returning T"?
			// OR ELSE sizeof result of function like cast

Examples of the old style cast ambiguity (still ambiguous when the 
'(' after the 'T' has been seen):

	class T { /* put required declarations here */ 
                };
	a = (T(      5));  // function like cast of 5
	b = (T(      )) 0; // semantic error: cast of 0 to type "function
			// returning T"

In constructors the following demonstrates the problems:

	class T;
	T (b)(int (d)); // same as "T b(int);", a function declaration
	T (d)(int (5)); // same as "T d(5);", an identifier declaration

The problem can appear recursively in the following examples. By 
"recursively" I mean that an ambiguity in the left-context has left 
the parser unsure of whether an "expression" or a "type" is being 
parsed, and the ambiguity is continued by the token sequence.  After 
the parser can determine what this subsequence is, it will in turn be 
able to disambiguating what the prior tokens were.

Recursion on the statement/declaration context:

	class S;
	class T;
	S (*b)(T); // declare b "pointer to function taking T returning S"
	S (*c)(T dummy); // same declaration as for "b"
	int dummy;
	S (*d)(T (dummy)); // This T might be casting dummy

Recursion on the sizeof context is shown in the following examples. 
As before, the examples include semantic errors.

	class T;
	class S;
	sizeof ( T(S dummy) ); // sizeof "function taking S returning T"
	int dummy;
	sizeof ( T(S (dummy)) ); // sizeof "function taking S returning T"
		// OR ELSE cast dummy to S, and then cast that to T, which
			// is the same as "sizeof T;"


The following are the precise conflicts, along with typical contexts.  
I have derived the contexts by manually walking backwards through my 
verbose YACC output. Note that I have deleted some of the 
insignificant rules from the verbose state descriptions in this 
section (insignificant in that they are not involved in the 
conflict).  To see the complete details of each conflict state see 
the final section of this paper.


WARNING: THE REMAINDER OF THIS SECTION IS VERY DETAILED AND TERSE 
(PERHAPS EVEN CRYPTIC); DO NOT TRY TO READ IT CAREFULLY UNLESS YOU 
HAVE A LOT OF TIME TO KILL, AND A LOT OF INTEREST IN THE GRAMMAR.  
(You can skip to the next section, which is a bit less technical and 
terse).

---------------------------------------------------------------------
        Minimal left context:   "main() { int ( identifier"
                Is the identifier being declared?
                Is the identifier a function name?
                Is the identifier an array name?
                   (The last two cases use function like casting to 
                   type "int")

        Left context can include an arbitrary number of '*', '&',  or 
                '('  immediately  to the left of the identifier.  The 
                basic.type.name    "int"    can    also    be     any 
                simple.type.name (e.g., a TYPEDEFname)

        My Default is declarator==> declaration

614: reduce/reduce conflict (red'ns 17 and 22 ) on (
614: reduce/reduce conflict (red'ns 17 and 22 ) on )
614: reduce/reduce conflict (red'ns 17 and 22 ) on [
state 614
	paren.identifier.declarator :  rescoped.identifier_    (17)
	primary.expression :  rescoped.identifier_    (22)

---------------------------------------------------------------------
	Required left context:   "... ( TYPEDEFname ()",
	    where "..." includes a type.specifier.
	    It is assumed that "function returning TYPEDEFname"
		is a valid "type.name".  Semantically this is
		rarely legal, but the focus is on syntax here.

	    The above sequence could eventually parse into:
	    ... ( declarator             )
	    ... ( type.name              ) cast.expression
	    ... ( expression             )
	    ... ( parameter.decl.list    )
            Note that parameter.decl.list is:
                    type.name
                    type.name = default.value , ...
		    type.name , parameter.decl.list
            Expanded examples are:
                sizeof ( expression )
		sizeof ( type.name  )
		   where the argument to "sizeof" is one of the following:
		int ( typename2 ( TYPEDEFname() )
		int ( typename2 ( TYPEDEFname()  ,
		int ( typename2 ( TYPEDEFname() = expression
		int ( TYPEDEFname()
	    Is the TYPEDEFname a declaration specifier?
	    Is TYPEDEFname used as function like cast of 0 args?

        Left  context can include an arbitrary number of '*', '&', or 
		'(' immediately to the left  of  the "(TYPEDEFname".

        Default to type.qualifier.list ==> parameter.type.list
                ==> function.returning.modifier 
		==> redeclaration of TYPEDEFname
		(semantics may outlaw this redeclaration of TYPEDEFname
                as a function WITHOUT use of "extern"!!!)

709: reduce/reduce conflict (red'ns 77 and 67 ) on )
709: reduce/reduce conflict (red'ns 77 and 67 ) on ,
709: reduce/reduce conflict (red'ns 77 and 67 ) on =
state 709
	postfix.expression :  TYPEDEFname ( )_    (77)
	parameter.type.list :  ( )_type.qualifier.list.opt
	type.qualifier.list.opt : _    (67)


---------------------------------------------------------------------
	Minimal left context:   "main() { int ( ( TYPEDEFname"
		Is the TYPEDEFname being redeclared?
		Is TYPEDEFname in parenthesis the start as old style cast?

        Left context can include an arbitrary number of '*', '&',  or 
                '('  between the two '('s.  The basic.type.name "int" 
		can also be any simple.type.name (e.g., a TYPEDEFname)

	Default to typedef.declarator ==> redeclare TYPEDEFname

736: shift/reduce conflict (shift 577, red'n 395) on )
state 736
	type.name :  TYPEDEFname_    (395)
	simple.paren.typedef.declarator :  ( TYPEDEFname_)

---------------------------------------------------------------------
	Minimal left context:   "main() { int ( ( TYPEDEFname[2]"
         OR
	Minimal left context:   "main() { int ( ( TYPEDEFname(float)"
		Is the TYPEDEFname being redeclared?
		Is "array of TYPEDEFname" the type for
                        an old style cast?  The result of the cast
                        expression will undergo a function like cast.

        Left context can include an arbitrary number of '*', '&',  or 
                '('  between the two '('s.  The basic.type.name "int" 
                can also be any simple.type.name (e.g., a TYPEDEFname)

	Default to typedef.declarator ==> redeclare TYPEDEFname.
                (Semantics preclude cast to this type anyway.)

808: shift/reduce conflict (shift 722, red'n 568) on )
state 808
	paren.postfix.typedef.declarator :  ( TYPEDEFname postfixing.abstract.declarator_)
	abstract.declarator :  postfixing.abstract.declarator_    (568)

---------------------------------------------------------------------
	Minimal left context:   "main() { int ( * ( TYPEDEFname"
		Is the TYPEDEFname being redeclared?
		Is TYPEDEFname in parenthesis the start as old style
                    cast? The result of the cast will undergo and 
                    indirection, and the result of that will be 
                    cast to int by a function like cast!

        Left context can include an arbitrary number of '*', '&',  or 
                '('  to the left of the '*'.  The basic.type.name "int" 
                can also be any simple.type.name (e.g., a TYPEDEFname)

	Default to typedef.declarator ==> redeclare TYPEDEFname

809: shift/reduce conflict (shift 715, red'n 395) on )
state 809
	type.name :  TYPEDEFname_    (395)
	paren.typedef.declarator :  indirect.or.reference ( TYPEDEFname_)

---------------------------------------------------------------------


SAMPLE RESOLUTIONS OF AMBIGUITIES BY MY C++ GRAMMAR


Of the "hard examples" given in the C++ reference manual (r6.8), my 
grammar can only "properly" detect a "statement-expression" for the 
stream:

        T(a,5)>>c;

All the other examples default to a declarator after the closing 
parenthesis following the identifier. (See my comments in the 
conclusion section of this paper).

I actually am not sure I agree with all the examples in the C++ 2.0 
Reference Manual. Specifically, the example in section 6.8:

        T (*d) (double(3));  // expression statement

In the example "T" is specified to be a simple-type-name, which 
includes all the basic types as well as class-names, and more. 
Considering the following are valid declarations:

        void *a (0);
        void *b (int(0));
        void (*c)(int(0));

I am unable to see the "syntactic" difference between this last token 
stream and the example just cited in the reference manual. My 
simplistic parser gives me the result that I at least expect.  It 
concludes (prematurely, but seemingly correctly) that the stream is a 
declaration (with a new style initializer).

As a positive note, my grammar is able to parse the example given a 
while back in comp.lang.c++, that Zortech 1.07 cannot parse:

        a = (double(a)/double(b))...;

Apparently, upon seeing "(double" some parsers commit to a 
parenthesized type-name for a cast expression, and cannot proceed to 
parse a parenthesized expression.  No mention of this problem is 
listed in my conflict list, as resolution of this problem is simply a 
matter of letting the LR parser wait long enough before committing.  
Specifically, my grammar has not yet committed when all of:

	a = (double(a)

has been seen!  The next character ('/') allows the grammar to 
unambiguously conclude that the sequence "double(a)..." is an 
expression.


DIFFICULT AMBIGUITIES FOR A "C++ 2.0" COMPATIBLE PARSER TO TRY

Having seen the above contexts, I would be curious to see if other 
C++ front ends with "smart lexers" (such as cfront) can handle the 
following.  I am exploiting in the first example the fact that 
regular expressions are not capable of defining strings with 
arbitrarily nested matching parenthesis.

    main()
      {
      class T 
        { 
        /* ... */
        } a;
      typedef T T1,T2,T3,T4,T5,T7,T8,T9,Ta,Tb,Tc,Td;
	{ /* start inner scope */
        T((T1)         ); // declaration
        T((T2)    a    ); // Statement expression
        T((T3)(       )); // declaration of T3
        T((T4)(T      )); // declaration of T4
        T((T5)(T  a   )); // declaration of T5
        T((T6)(T((a) ))); // declaration of T6
        T((T7)(T((T) ))); // declaration of T7
        T((T8)(T((T)b))); // statement expression

        T(b[5]); // declaration
        T(c());  // declaration
        T(d()[5]);  // statement expression ? (function returning array 
                      // is semantically illegal, but syntactically proper)
        T(e[5]());  // statement expression ? (No array of functions)
        T(f)[5]();  // statement expression ?  "                   "
        
        T(*Ta() )[5] [4];  //declaration
        T(*Tb() [5]) [4];  //statement expression ? (function returning array)

        T(*Tc()) [3 +2];  //declaration
        T(*Td()) [3 ]+2;  //statement expression

        }
      }        


COMMENTARY ON C++ 2.0 DISAMBIGUATING RULES

There are two distinct thrusts in conflict disambiguation.  The first 
thrust is "parse tokens into the longest possible declarator, and 
identify the syntax errors that result".  The second thrust is "use 
massive external technology ("smart lexer", a.k.a.: "recursive decent 
parser that helps the lexer") to look ahead, so that you don't 
mis-parse a function-like-cast as a declaration and induce a syntax 
error".  The first is a commitment to LR parser technology, and an 
existing grammar (which could be cleaned up).  The second is a 
commitment to NOT use an LR parser, and to the use of an existing 
implementation.




SOURCE OF CONFLICTS IN C++ (MIXING TYPES AND EXPRESSIONS)

One fundamental strength in C is the similarity between declarations 
and expressions.  The syntax of the two is intended to be very 
similar, and the result is a clean declaration and expression syntax. 
(It takes some getting used to, but it is in my opinion good).  
Unfortunately, there are some slight distinctions between types and 
expressions, which Ritchie et. al.  apparently noticed.  It is for 
this reason (I am guessing) that the C cast operator REQUIRES that 
the type be enclosed in parenthesis.  Moreover, there is also a clear 
separator in a declaration between the declarator and the 
initializing expression (the '=') (as some of you know, there is some 
interesting history in this area.).  The bottom line (as seen with 
20-20 hindsight) is: "keep declarations and expressions separate".  
Each violation of this basic rule has induced conflicts.


To be concrete about the differences between types and expressions, 
the following two distinctions are apparent:

    1) Abstract declarators are permitted.  No analogy is provided in 
    expressions.  The notable distinction is that abstract 
    declarators include the possibility of trailing '*' operators.
        
    2) The binding of elements in a declaration is very different 
    from any expression.  Notably, the declaration-specifiers are 
    bound separately to each declarator in the comma separated list 
    of declarators.  With (most forms of) expressions, a comma 
    provides a major isolation between expressions.

C also used reserved names to GREATLY reduce the complexity of 
parsing.  The introduction of typedef names increased the complexity, 
but a simple hack between lex and YACC overcame the problem.  An 
example is the statement:

        name (*b)[4];

Note that this is ambiguous, EVEN in ANSI C, IF there is no 
distinction between type-names and function names!  (i.e., "b" could 
be getting redeclared to be of type "pointer to array of name", OR 
the function "name" could be called with argument "*b", the result of 
which is indexed to the 4th element). In C, the two kinds of names 
(TYPEDEFnames and function names (a.k.a.: declared identifiers)) 
share a name space, and at every point in a source program the (hack) 
contextual distinction can be made by the tokenizer.  Hacks are neat 
things in that the less you use them, the more likely they are to 
work when you REALLY need them (i.e., you don't have to fight with 
existing hacks).  Having worked on designing and implementing a C 
compiler, I was pleasantly amazed at how the constructs all fell 
together.

The major violations of this approach (i.e., keep declaration 
separate from expressions) that come to mind with C++ are: 

	function-like-casts,
	freestore expressions without parens around the type,
	conversion function names using arbitrary type specifiers,
	parenthesized initializers that drive constructors.

The last problem, parenthesized initializers, provide two areas of 
conflicts.  The first is really an interference issue with old style 
C function definitions, which only bothers folks at file scope (GNU's 
G++ compiler considered this to be too great an obstacle, and they 
don't currently support old style C definitions!).  The second part 
of this conflict involves a more subtle separation between the 
declarator, and the initializer. (K&R eventually provided an equal 
sign as an unequivocal separator, but parens used in C++ are already 
TOO overloaded to separate anything).  The significance of this lack 
of a clear separator is that it is difficult to decide that the 
"declarator" is complete, and that the declared name should be added 
to the scope.  The last problem does interact in a nasty way with the 
function-like cast vs declaration conflicts.  The parened initializer 
provides another context where it is difficult to distinguish between 
expressions (a true argument list for the constructor) and a 
declaration continuation (a parameter type list).  

The second problem listed falls out of the "new-expression" with an 
unparenthesized type.  This form of freestore (such as "new int * *") 
allows types to be placed adjacent to expressions, and the trailing 
'*' ambiguity rears its head.  I can easily prove that this is the 
culprit in terms of specific ambiguities, in that removing these 
(unnecessary?) forms significantly disambiguates the grammar. (It is 
rather nice to use YACC as a tool to prove that a grammar is 
unambiguous!).  It is interesting to note that if only the derivation 
of a freestore expression were limited to (using the non-terminal 
names of the form that the C++ Reference manual uses):

        new placement-opt ( type-name )  parened-arg-list-opt

then all the LR(1) reduce conflicts based on this problem would 
vanish.  Indeed, the culprit can clearly be shown to be:

        new placement-opt  restricted-type-name parened-arg-list-opt

The characters which excite these reduction conflicts are '*', '&', 
and ':'.  The context in which the ':' is significant occurs when the 
freestore expression is the middle expression of the ternary operator 
set "?:".  In this ternary operator context, the use of a type name 
such as "class a" leaves the LR(1) parser confused about the meaning 
of a ':' that follows.

The third problem that I indicated involves the 
conversion-function-name.  Here again, if the syntax were restricted 
to ONLY:

        operator simple-type-name

then the LR(1) conflicts would vanish.  It is interesting to note 
that the keyword "operator" serves as the left separator, and the 
restriction to "simple-type-name" results in an implicit right 
separator (simple-type-names are exactly one token long).  The 
conflicts appear when multiple tokens are allowed for a declaration 
specifier, and an optional pointer-modifier list may be added as a 
postfix.  The conflicts that result from this lack of separation 
include all those provided by the freestore example, and an 
additional set as well.  The additional conflicts are not 
semantically significant, but they are noticeable to a compiler 
writer. The interesting new trailing character conflict is '{'.  The 
context for this conflict involves the definition of a conversion 
function, which always includes a function body (with a leading 
character of '{' ).  A simple grammar does not SYNTACTICALLY require 
that "function returning modifier" follow the 
conversion-function-name in all declarations/definitions, although 
semantics do require such.  Hence an LR(1) conflict occurs when the 
type-name is of the form "struct A", and a possible structure 
elaboration may follow (with leading character '{' ).

Here again (as with the unambiguous version of freestore) the syntax 
could be extended to:

(Note to Roskind: the following examples use DRM style names, and not 
our own id names)

    conversion-function-name:
        operator simple-type-name
        operator ( type-name )

and the ambiguities would vanish (and the expressivity would not be 
diminished).


FUNCTION LIKE CAST VS DECLARATION AMBIGUITIES

The real big culprit (i.e., my anti-favorite) in this whole ambiguity 
set (re: keeping types and expressions separate) is the 
function-like-cast.  The reason why it is so significant (to an LR 
parser) is that the binding of a type-name, when used in a 
function-like-cast, is directly to the following parenthesized 
argument list.  In contrast, the binding of a type-name when used in 
a declaration is to all the "stuff" that follows, up until a 
declarator termination mark like a ',', ';' or '='. Life really gets 
tough for LR folks when the parse tree MUST be reduced, but you just 
can't tell how yet.  With this problem, the hacks began to appear 
(re: the "smart lexer").  Note that these new style casts are much 
more than a notational convenience in C++. The necessity of the 
function like cast lies in the fact that such a cast can take several 
arguments, whereas the old style cast is ALWAYS a unary operator.

I was (past tense) actually working towards resolving this problem 
via some standard methods that I have developed (re: inline expansion 
of rules to provide deferred reduction).  I was (past tense) also 
using one more sneaky piece of work to defer the reductions, as I was 
carefully making use of right recursion (instead of the standard left 
recursion) in order to give the parser a chance to build up more 
context.  I can demonstrate the usefulness of right recursive 
grammars in disambiguating difficult toy grammars.  Unfortunately, I 
realized at some point that I NEEDED to perform certain actions (such 
as add identifiers to the symbol table) in order to complete the 
parse!?!  This was my catch 22.  I could POSSIBLY parse using an LALR 
grammar, if I could only defer actions until I had enough context to 
disambiguate.  Unfortunately, I HAD to perform certain actions (re: 
modify the symbol table, which changed the action of the tokenizer) 
BEFORE I could continue to examine tokens!  In some terrible sense, 
the typedef hack had come back to haunt me.

I backed off a bit in my grammar after reaching this wall, and now my 
grammar only waits until it reaches the identifier in the would be 
declarator.  I really didn't want to parse the stuff after the 
identifier name (sour grapes?), because I knew I would not (for 
example) be able to identify a "constant expression" in an array 
subscript (most of the time, if it isn't constant, then it can't be a 
declaration).  I don't believe that a compiler should compete in a 
battle of wits with the programmer, and the parser was already 
beginning to outwit me (i.e., I was having a hard time parsing stuff 
mentally that is provided as examples in the 2.0 Reference Manual).   


FUNCTION LIKE CAST vs DECLARATION : THE TOUGH EXAMPLE:        
        
The following is about the nastiest example that I have been able to 
construct for this ambiguity group.  I am presenting it here just in 
case someone is left with a thought that there is "an easy way out"

The fact that identifiers can remain ambiguous SO LONG after 
encountering them can cause no end of trouble to the parser.  The 
following example does not succumb to static (re: no change to the 
symbol table) anticipatory lexing of a statement.  As such, it 
demonstrates the futility of attempting to use a "smart lexer" to 
support the philosophy: "If it can be interpreted as a declaration, 
then so be it; otherwise it is an expression".  This ambiguous 
example exploits the fact that declarators MUST be added to the 
symbol table as soon as they are complete (and hence they mask 
external declarations).

First I will present the example without comments:

class Type1 {
            Type1 operator()(int);
            } ;
class wasType2 { ...}; 
int (*c2)(Type1 dummy);

main ()
    {
    int    a1  = 3, new_var  (4), (*c1)(int     (a1));
    Type1 (a2) = 3, wasType2 (4), (*c2)(wasType2(a1));
    }




Now to repeat the example with comments:

class Type1 {....
            Type1 operator()(int);
            } ;
class wasType2 { ....}; /* we will almost redeclare this typename!?! */
int (*c2)(Type1 dummy); /* we will NOT redeclare this */

main ()
    {
    /* The first line is indeed simple.  It is simply placed there
    to hint at how the second line MIGHT analogously be parsed. 
    There is a minor semantic error, but we will not pay attention to
    it in this example.*/    
    
    int    a1  = 3, new_var  (4), (*c1)(int     (a1));

    /* Static lexing of what follows will suggest that both of the 
    following are declarations.  The second is actually 3 comma 
    separated expressions!! The explanation that follows shows that 
    assuming the 2nd statement is a declaration leads to a 
    contradiction. */
    
    Type1 (a2) = 3, wasType2 (4), (*c2)(wasType2(a1));
    
    /*  Assume  the  second statement is a declaration.  Note that by 
    the time "c2" is parsed, "wasType2" has been redeclared to  be  a 
    variable  of  type  "Type1".   Hence "wasType2(a1)" is actually a 
    function  call  to  "wasType2.operator()(a1)".  It  follows  that 
    "(*c2)(wasType2(a1))"  is  an  expression,  and NOT a declarator!  
    Since this last entry is not a declarator, the  entire  statement 
    must  be an expression (ugh! it is time to backtrack). After much 
    work on my part, I think it might even be  a  semantically  valid 
    expression.   Once this backtracking is complete, we see that the 
    first expression "Type1 (a2) = 3" is  an  assignment  to  a  cast 
    expression.  The second expression "wasType2 (4)", is a cast of a 
    constant.  The  third  expression  "(*c2)(wasType2(a1))",  is  an 
    indirect  function  call.  The argument of the call is the result 
    of a cast.  Note that "wasType2" is  actually  never  redeclared, 
    but it was close! */
    
    /* For those of you who can parse this stuff in your sleep, and 
    noticed the slight error in the above argument, I have the 
    following "fix".  The error is that the 
        "(*c2)(wasType2(a1))" 
    could actually be a declaration with a parenthesized initializer.  
    I could have change this token subsequence to:
        "(*(*c2)(wasType2(a1)))(int(a1))", and avoid the constructor 
    ambiguity, but it would only complicate the discussion. */

    /*  Two  parens  are  all a user would need to add to the cryptic 
    example to  unambiguously  specify  that  this  statement  is  an 
    expression.  Specifically: */
    
       (Type1) (a2) = 3, wasType2 (4), (*c2)(wasType2(a1));
    /* or ...*/
       (Type1 (a2) = 3), wasType2 (4), (*c2)(wasType2(a1));

    /*  I would vote for a syntax error in the ambiguous stream, with 
    an early decision that it was a declaration.  After  seeing  this 
    example, I doubt that I could quickly assert that I could produce 
    a non-backtracking parser that disambiguates statements according 
    to  your  rule.  I  am  sure I can forget about a simple lex-YACC 
    combination doing it. */

    }


Most simply put, if a "smart lexer" understands these: a) I am 
impressed, b) Why use a parser when your lexer can parse so well?

The bottom line is that disambiguation of declarations via "If it can 
be a declaration, then it is one", seems to require a backtracking 
parser. (Or some very fancy parsing approach).  I am not even sure if 
the above examples are as bad as it can get!


CONCLUSION

I believe that the C++ grammar that I have made available represents 
a viable machine readable standard for the syntax description of the 
C++ language.  In cases where the ambiguities are still exposed by 
conflict (as noted by YACC), to further defer resolution would be 
detrimental to a user.  I see no benefit in describing a computer 
language that must support human writers, but cannot be understood by 
humans.  Any code that exploits such deferral is inherently 
non-portable, and deserves to be diagnosed as an error (my grammar 
asserts a "syntax error").  Rather than dragging the language into 
support for a ad-hoc (at least in this area) implementation such as 
cfront (and the "smart lexer"), I would heavily suggest the use of my 
grammar.  I do not believe that my grammar would "break" much 
existing code, but in cases where it would, the code would not be 
portable anyway (other than to a port of an IDENTICAL parser).

I hope to see a great deal of use of my grammars, and I believe that 
the standardizing on the represented syntax will unify the C++ 
language greatly.

	Jim Roskind
	Independent Consultant
	516 Latania Palm Drive
	Indialantic FL 32903
	(407)729-4348
	jar@ileaf


APPENDIX A:
  PROPOSED GRAMMAR MODIFICATIONS (fixing '*', '&' and ':' conflicts)

Based on the other items described above, I have the following 
suggestions for cleaning up the grammar definition.  Unfortunately, 
it provides subtle variations from the "C++ 2.0" standard.


Current Grammar:

operator.function.name :
        OPERATOR any.operator
        | OPERATOR TYPEDEFname     operator.function.ptr.opt
        | OPERATOR basic.type.name operator.function.ptr.opt
        | OPERATOR type.specifier      operator.function.ptr.opt
        | OPERATOR type.qualifier.list
        | OPERATOR type.qualifier.list operator.function.ptr
        ;

operator.new.type:
	type.specifier    operator.new.declarator.opt
		operator.new.initializer
	| basic.type.name operator.new.declarator.opt
		operator.new.initializer
	| TYPEDEFname     operator.new.declarator.opt
		operator.new.initializer
	;

Proposed new grammar (which requires parens around complex types):

operator.function.name :
        OPERATOR any.operator
	| OPERATOR TYPEDEFname
	| OPERATOR basic.type.name
	| OPERATOR '(' TYPEDEFname     operator.function.ptr.opt ')'
	| OPERATOR '(' basic.type.name operator.function.ptr.opt ')'
	| OPERATOR '(' type.specifier      operator.function.ptr.opt ')'
	| OPERATOR '(' type.qualifier.list ')'
	| OPERATOR '(' type.qualifier.list operator.function.ptr ')'
        ;

operator.new.type:
	basic.type.name
	| TYPEDEFname
	| '(' type.specifier  operator.new.declarator.opt
		operator.new.initializer ')'
	| '(' basic.type.name operator.new.declarator.opt
		operator.new.initializer ')'
	| '(' TYPEDEFname     operator.new.declarator.opt
		operator.new.initializer ')'
	;

The impact of the above changes is that all complex type names (i.e.: 
names that are not simply a typedef/class name, or a basic type name 
like char) must be enclosed in parenthesis in both `new ...' and 
`operator ...' expressions. Both of the above changes would clear up 
a number of ambiguities.  In some sense, the current "disambiguation" 
(of trailing '*', '&', and ':') is really a statement that whatever 
an LR(1) parser cannot disambiguate is a syntax error.  In contrast, 
the above rules define an unambiguous grammar.


APPENDIX B:
  AMBIGUITIES IN MY C++ GRAMMAR AS LISTED BY YACC VERBOSE OPTION

178: shift/reduce conflict (shift 179, red'n 267) on :
178: reduce/reduce conflict (red'ns 267 and 261 ) on {
state 178
	aggregate.name :  aggregate.key identifier.or.typedef.name_derivation.opt $$265 { member.declaration.list.opt }
	aggregate.name :  aggregate.key identifier.or.typedef.name_    (267)
	derivation.opt : _    (261)

	:  shift 179
	{  reduce 261
	.  reduce 267

	derivation.opt  goto 399

183: shift/reduce conflict (shift 410, red'n 348) on {
state 183
	enum.name :  ENUM identifier.or.typedef.name_{ enumerator.list }
	enum.name :  ENUM identifier.or.typedef.name_    (348)

	{  shift 410
	.  reduce 348


185: shift/reduce conflict (shift 50, red'n 32) on *
185: shift/reduce conflict (shift 51, red'n 32) on &
state 185
	operator.function.name :  OPERATOR TYPEDEFname_operator.function.ptr.opt
	typedef.type.specifier :  TYPEDEFname_type.qualifier
	operator.function.ptr.opt : _    (32)

	CONST  shift 63
	VOLATILE  shift 64
	TYPEDEFname  shift 415
	*  shift 50
	&  shift 51
	.  reduce 32

	operator.function.ptr.opt  goto 411
	operator.function.ptr  goto 412
	pointer.operator  goto 413
	indirect.or.reference  goto 414
	type.qualifier  goto 131

186: shift/reduce conflict (shift 50, red'n 32) on *
186: shift/reduce conflict (shift 51, red'n 32) on &
state 186
	operator.function.name :  OPERATOR basic.type.name_operator.function.ptr.opt
	basic.type.specifier :  basic.type.name_basic.type.name
	basic.type.specifier :  basic.type.name_type.qualifier
	operator.function.ptr.opt : _    (32)

	DOUBLE  shift 42
	INT  shift 39
	LONG  shift 40
	CHAR  shift 37
	CONST  shift 63
	FLOAT  shift 41
	SHORT  shift 38
	UNSIGNED  shift 44
	SIGNED  shift 43
	VOID  shift 36
	VOLATILE  shift 64
	TYPEDEFname  shift 415
	*  shift 50
	&  shift 51
	.  reduce 32

	operator.function.ptr.opt  goto 416
	basic.type.name  goto 123
	operator.function.ptr  goto 412
	pointer.operator  goto 413
	indirect.or.reference  goto 414
	type.qualifier  goto 124

187: shift/reduce conflict (shift 50, red'n 32) on *
187: shift/reduce conflict (shift 51, red'n 32) on &
state 187
	operator.function.name :  OPERATOR type.specifier_operator.function.ptr.opt
	operator.function.ptr.opt : _    (32)

	TYPEDEFname  shift 415
	*  shift 50
	&  shift 51
	.  reduce 32

	operator.function.ptr.opt  goto 417
	operator.function.ptr  goto 412
	pointer.operator  goto 413
	indirect.or.reference  goto 414

188: shift/reduce conflict (shift 50, red'n 30) on *
188: shift/reduce conflict (shift 51, red'n 30) on &
state 188
	operator.function.name :  OPERATOR type.qualifier.list_    (30)
	operator.function.name :  OPERATOR type.qualifier.list_operator.function.ptr
	type.qualifier.list :  type.qualifier.list_type.qualifier
	basic.type.specifier :  type.qualifier.list_basic.type.name
	sue.type.specifier :  type.qualifier.list_elaborated.type.name
	typedef.type.specifier :  type.qualifier.list_TYPEDEFname

	DOUBLE  shift 42
	INT  shift 39
	STRUCT  shift 68
	LONG  shift 40
	ENUM  shift 66
	CHAR  shift 37
	UNION  shift 69
	CONST  shift 63
	FLOAT  shift 41
	SHORT  shift 38
	UNSIGNED  shift 44
	SIGNED  shift 43
	VOID  shift 36
	VOLATILE  shift 64
	CLASS  shift 70
	TYPEDEFname  shift 419
	*  shift 50
	&  shift 51
	.  reduce 30

	basic.type.name  goto 147
	operator.function.ptr  goto 418
	pointer.operator  goto 413
	indirect.or.reference  goto 414
	type.qualifier  goto 146
	elaborated.type.name  goto 148
	aggregate.name  goto 48
	enum.name  goto 49
	aggregate.key  goto 65

412: shift/reduce conflict (shift 50, red'n 33) on *
412: shift/reduce conflict (shift 51, red'n 33) on &
state 412
	operator.function.ptr.opt :  operator.function.ptr_    (33)
	operator.function.ptr :  operator.function.ptr_pointer.operator
	operator.function.ptr :  operator.function.ptr_indirect.or.reference

	TYPEDEFname  shift 415
	*  shift 50
	&  shift 51
	.  reduce 33

	pointer.operator  goto 608
	indirect.or.reference  goto 609

418: shift/reduce conflict (shift 50, red'n 31) on *
418: shift/reduce conflict (shift 51, red'n 31) on &
state 418
	operator.function.name :  OPERATOR type.qualifier.list operator.function.ptr_    (31)
	operator.function.ptr :  operator.function.ptr_pointer.operator
	operator.function.ptr :  operator.function.ptr_indirect.or.reference

	TYPEDEFname  shift 415
	*  shift 50
	&  shift 51
	.  reduce 31

	pointer.operator  goto 608
	indirect.or.reference  goto 609

470: shift/reduce conflict (shift 50, red'n 104) on *
470: shift/reduce conflict (shift 51, red'n 104) on &
state 470
	operator.new.type :  type.specifier_operator.new.declarator.opt operator.new.initializer
	operator.new.declarator.opt : _    (104)

	*  shift 50
	&  shift 51
	[  shift 649
	.  reduce 104

	indirect.or.reference  goto 648
	operator.new.declarator.opt  goto 646
	operator.new.array.declarator  goto 647

471: shift/reduce conflict (shift 50, red'n 104) on *
471: shift/reduce conflict (shift 51, red'n 104) on &
state 471
	operator.new.type :  basic.type.name_operator.new.declarator.opt operator.new.initializer
	basic.type.specifier :  basic.type.name_basic.type.name
	basic.type.specifier :  basic.type.name_type.qualifier
	operator.new.declarator.opt : _    (104)

	DOUBLE  shift 42
	INT  shift 39
	LONG  shift 40
	CHAR  shift 37
	CONST  shift 63
	FLOAT  shift 41
	SHORT  shift 38
	UNSIGNED  shift 44
	SIGNED  shift 43
	VOID  shift 36
	VOLATILE  shift 64
	*  shift 50
	&  shift 51
	[  shift 649
	.  reduce 104

	basic.type.name  goto 123
	indirect.or.reference  goto 648
	operator.new.declarator.opt  goto 650
	operator.new.array.declarator  goto 647
	type.qualifier  goto 124

472: shift/reduce conflict (shift 50, red'n 104) on *
472: shift/reduce conflict (shift 51, red'n 104) on &
state 472
	operator.new.type :  TYPEDEFname_operator.new.declarator.opt operator.new.initializer
	typedef.type.specifier :  TYPEDEFname_type.qualifier
	operator.new.declarator.opt : _    (104)

	CONST  shift 63
	VOLATILE  shift 64
	*  shift 50
	&  shift 51
	[  shift 649
	.  reduce 104

	indirect.or.reference  goto 648
	operator.new.declarator.opt  goto 651
	operator.new.array.declarator  goto 647
	type.qualifier  goto 131

614: reduce/reduce conflict (red'ns 17 and 22 ) on (
614: reduce/reduce conflict (red'ns 17 and 22 ) on )
614: reduce/reduce conflict (red'ns 17 and 22 ) on [
state 614
	paren.identifier.declarator :  rescoped.identifier_    (17)
	primary.expression :  rescoped.identifier_    (22)

	(  reduce 17
	)  reduce 17
	[  reduce 17
	.  reduce 22


648: shift/reduce conflict (shift 50, red'n 104) on *
648: shift/reduce conflict (shift 51, red'n 104) on &
state 648
	operator.new.declarator.opt :  indirect.or.reference_operator.new.declarator.opt
	operator.new.declarator.opt : _    (104)

	*  shift 50
	&  shift 51
	[  shift 649
	.  reduce 104

	indirect.or.reference  goto 648
	operator.new.declarator.opt  goto 756
	operator.new.array.declarator  goto 647

709: reduce/reduce conflict (red'ns 77 and 67 ) on )
709: reduce/reduce conflict (red'ns 77 and 67 ) on ,
709: reduce/reduce conflict (red'ns 77 and 67 ) on =
state 709
	postfix.expression :  TYPEDEFname ( )_    (77)
	parameter.type.list :  ( )_type.qualifier.list.opt
	type.qualifier.list.opt : _    (67)

	)  reduce 67
	,  reduce 67
	CONST  shift 63
	VOLATILE  shift 64
	ELLIPSIS  reduce 67
	=  reduce 67
	.  reduce 77

	type.qualifier.list  goto 517
	type.qualifier.list.opt  goto 516
	type.qualifier  goto 46

736: shift/reduce conflict (shift 577, red'n 395) on )
state 736
	class.rescoped.identifier :  TYPEDEFname_CLCL identifier.or.typedef.name
	class.rescoped.identifier :  TYPEDEFname_CLCL operator.function.name
	class.rescoped.identifier :  TYPEDEFname_CLCL ~ TYPEDEFname
	class.rescoped.identifier :  TYPEDEFname_CLCL class.rescoped.identifier
	postfix.expression :  TYPEDEFname_( )
	postfix.expression :  TYPEDEFname_( argument.expression.list )
	typedef.type.specifier :  TYPEDEFname_type.qualifier
	type.name :  TYPEDEFname_    (395)
	type.name :  TYPEDEFname_abstract.declarator
	paren.postfix.typedef.declarator :  ( TYPEDEFname_postfixing.abstract.declarator )
	simple.paren.typedef.declarator :  ( TYPEDEFname_)
	pointer.operator :  TYPEDEFname_CLCL * type.qualifier.list.opt

	(  shift 660
	)  shift 577
	CONST  shift 63
	VOLATILE  shift 64
	TYPEDEFname  shift 415
	CLCL  shift 127
	*  shift 50
	&  shift 51
	[  shift 98
	.  error

	pointer.operator  goto 657
	indirect.or.reference  goto 656
	type.qualifier  goto 131
	postfixing.abstract.declarator  goto 808
	parameter.type.list  goto 97
	abstract.declarator  goto 543
	unary.abstract.declarator  goto 530
	postfix.abstract.declarator  goto 531
	array.abstract.declarator  goto 96

808: shift/reduce conflict (shift 722, red'n 568) on )
state 808
	paren.postfix.typedef.declarator :  ( TYPEDEFname postfixing.abstract.declarator_)
	abstract.declarator :  postfixing.abstract.declarator_    (568)

	)  shift 722
	.  error


809: shift/reduce conflict (shift 715, red'n 395) on )
state 809
	class.rescoped.identifier :  TYPEDEFname_CLCL identifier.or.typedef.name
	class.rescoped.identifier :  TYPEDEFname_CLCL operator.function.name
	class.rescoped.identifier :  TYPEDEFname_CLCL ~ TYPEDEFname
	class.rescoped.identifier :  TYPEDEFname_CLCL class.rescoped.identifier
	postfix.expression :  TYPEDEFname_( )
	postfix.expression :  TYPEDEFname_( argument.expression.list )
	typedef.type.specifier :  TYPEDEFname_type.qualifier
	type.name :  TYPEDEFname_    (395)
	type.name :  TYPEDEFname_abstract.declarator
	paren.typedef.declarator :  indirect.or.reference ( TYPEDEFname_)
	paren.postfix.typedef.declarator :  ( TYPEDEFname_postfixing.abstract.declarator )
	pointer.operator :  TYPEDEFname_CLCL * type.qualifier.list.opt

	(  shift 660
	)  shift 715
	CONST  shift 63
	VOLATILE  shift 64
	TYPEDEFname  shift 415
	CLCL  shift 127
	*  shift 50
	&  shift 51
	[  shift 98
	.  error

	pointer.operator  goto 657
	indirect.or.reference  goto 656
	type.qualifier  goto 131
	postfixing.abstract.declarator  goto 808
	parameter.type.list  goto 97
	abstract.declarator  goto 543
	unary.abstract.declarator  goto 530
	postfix.abstract.declarator  goto 531
	array.abstract.declarator  goto 96

810: shift/reduce conflict (shift 876, red'n 428) on ELSE
state 810
	selection.statement :  IF ( expression ) statement_    (428)
	selection.statement :  IF ( expression ) statement_ELSE statement

	ELSE  shift 876
	.  reduce 428


947: shift/reduce conflict (shift 985, red'n 548) on ;
state 947
	constructor.conflicting.parameter.list.and.body :  ( TYPEDEFname )_;
	constructor.conflicting.parameter.list.and.body :  ( TYPEDEFname )_type.qualifier.list ;
	constructor.conflicting.parameter.list.and.body :  ( TYPEDEFname )_constructor.init.list.opt compound.statement
	constructor.conflicting.parameter.list.and.body :  ( TYPEDEFname )_type.qualifier.list constructor.init.list.opt compound.statement
	simple.paren.typedef.declarator :  ( TYPEDEFname )_    (548)
	constructor.init.list.opt : _    (516)

	CONST  shift 63
	VOLATILE  shift 64
	:  shift 81
	;  shift 985
	{  reduce 516
	.  reduce 548

	type.qualifier.list  goto 986
	type.qualifier  goto 46
	constructor.init.list  goto 988
	constructor.init.list.opt  goto 987

984: shift/reduce conflict (shift 1021, red'n 340) on ;
state 984
	member.conflict.paren.postfix.declaring.item :  TYPEDEFname ( TYPEDEFname postfixing.abstract.declarator )_member.pure.opt
	constructor.conflicting.typedef.declarator :  ( TYPEDEFname postfixing.abstract.declarator )_type.qualifier.list ;
	constructor.conflicting.typedef.declarator :  ( TYPEDEFname postfixing.abstract.declarator )_type.qualifier.list constructor.init.list.opt compound.statement
	constructor.conflicting.typedef.declarator :  ( TYPEDEFname postfixing.abstract.declarator )_;
	constructor.conflicting.typedef.declarator :  ( TYPEDEFname postfixing.abstract.declarator )_constructor.init.list.opt compound.statement
	member.pure.opt : _    (340)
	constructor.init.list.opt : _    (516)

	CONST  shift 63
	VOLATILE  shift 64
	:  shift 81
	=  shift 892
	;  shift 1021
	{  reduce 516
	.  reduce 340

	type.qualifier.list  goto 1020
	type.qualifier  goto 46
	member.pure.opt  goto 1019
	constructor.init.list  goto 988
	constructor.init.list.opt  goto 1022

1008: shift/reduce conflict (shift 1021, red'n 340) on ;
state 1008
	member.conflict.paren.postfix.declaring.item :  declaration.specifier ( TYPEDEFname postfixing.abstract.declarator )_member.pure.opt
	constructor.conflicting.typedef.declarator :  ( TYPEDEFname postfixing.abstract.declarator )_type.qualifier.list ;
	constructor.conflicting.typedef.declarator :  ( TYPEDEFname postfixing.abstract.declarator )_type.qualifier.list constructor.init.list.opt compound.statement
	constructor.conflicting.typedef.declarator :  ( TYPEDEFname postfixing.abstract.declarator )_;
	constructor.conflicting.typedef.declarator :  ( TYPEDEFname postfixing.abstract.declarator )_constructor.init.list.opt compound.statement
	member.pure.opt : _    (340)
	constructor.init.list.opt : _    (516)

	CONST  shift 63
	VOLATILE  shift 64
	:  shift 81
	=  shift 892
	;  shift 1021
	{  reduce 516
	.  reduce 340

	type.qualifier.list  goto 1020
	type.qualifier  goto 46
	member.pure.opt  goto 1047
	constructor.init.list  goto 988
	constructor.init.list.opt  goto 1022


123/127 terminals, 154/200 nonterminals
586/650 grammar rules, 1076/1200 states
29 shift/reduce, 7 reduce/reduce conflicts reported
245/350 working sets used
memory: states,etc. 12263/60000, parser 9287/12000
477/600 distinct lookahead sets
723 extra closures
6709 shift entries, 15 exceptions
1631 goto entries
4208 entries saved by goto default
Optimizer space used: input 16528/60000, output 9142/12000
9142 table entries, 4109 zero
maximum spread: 351, maximum offset: 1067