[comp.lang.c++] yaccable grammars for C and C++

jar@io.UUCP (Jim Roskind x5570) (03/08/90)

%{

    /* Copyright (C) 1989,1990 James A. Roskind, All rights reserved. 
    This grammar was developed  and  written  by  James  A.  Roskind. 
    Copying  of  this  grammar  description, as a whole, is permitted 
    providing this notice is intact and applicable  in  all  complete 
    copies.   Translations as a whole to other parser generator input 
    languages  (or  grammar  description  languages)   is   permitted 
    provided  that  this  notice is intact and applicable in all such 
    copies,  along  with  a  disclaimer  that  the  contents  are   a 
    translation.   The reproduction of derived text, such as modified 
    versions of this grammar, or the output of parser generators,  is 
    permitted,  provided  the  resulting  work includes the copyright 
    notice "Portions Copyright (c)  1989,  1990  James  A.  Roskind". 
    Derived products, such as compilers, translators, browsers, etc., 
    that  use  this  grammar,  must also provide the notice "Portions 
    Copyright  (c)  1989,  1990  James  A.  Roskind"  in   a   manner 
    appropriate  to  the  utility,  and in keeping with copyright law 
    (e.g.: EITHER displayed when first invoked/executed; OR displayed 
    continuously on display terminal; OR via placement in the  object 
    code  in  form  readable in a printout, with or near the title of 
    the work, or at the end of the file).  No royalties, licenses  or 
    commissions  of  any  kind are required to copy this grammar, its 
    translations, or derivative products, when the copies are made in 
    compliance with this notice. Persons or corporations that do make 
    copies in compliance with this notice may charge  whatever  price 
    is  agreeable  to  a  buyer, for such copies or derivative works. 
    THIS GRAMMAR IS PROVIDED ``AS IS'' AND  WITHOUT  ANY  EXPRESS  OR 
    IMPLIED  WARRANTIES,  INCLUDING,  WITHOUT LIMITATION, THE IMPLIED 
    WARRANTIES  OF  MERCHANTABILITY  AND  FITNESS  FOR  A  PARTICULAR 
    PURPOSE.

    James A. Roskind
    Independent Consultant
    516 Latania Palm Drive
    Indialantic FL, 32903
    (407)729-4348
    jar@ileaf.com


    ---end of copyright notice---

MOTIVATION-

My  goal  is  to  see  software  developers  adopt  this grammar as a 
standard until such time as a better  standard  is  accessible.   The 
only  way  to  get it to become a standard, is to be sure that people 
know that derivations are based on a specific work.   The  intent  of 
releasing  this  grammar is to provide a publicly accessible standard 
grammar for C++.  The intent of the  copyright  notice  is  to  allow 
arbitrary  commercial  and non-commercial use of the grammar, as long 
as reference is given to the original standard.  Without reference to 
a specific standard, many alternative  grammars  would  develop.   By 
referring  to  the  standard,  this grammar is given publicity, which 
should lead to further use in compatible products and  systems.   The 
benefits  of  such  a  standard  to  commercial  products  (browsers, 
beautifiers, translators, compilers, ...) should be  obvious  to  the 
developers,  in  that  other compatible products will emerge, and the 
value of all conforming products  will  rise.   Most  developers  are 
aware  of  the  value  of  acquiring  a fairly complete grammar for a 
language, and the copyright notice  (and  the  resulting  affiliation 
with my work) should not be too high a price to pay.  By copyrighting 
this  grammar,  I have some minor control over what this standard is, 
and I can (hopefully) keep it from degrading without my approval.   I 
will  consistently  attempt  to  provide  upgraded  grammars that are 
compliant  with  the  current  art,  and  the  ANSI   C++   Committee 
recommendation  in  particular.   A developer is never prevented from 
modifying the grammar to improve it in  whatever  way  is  seen  fit.  
There  is  also  no  restriction on the sale of copies, or derivative 
works, providing the request in the copyright notice are satisfied.

If you are not "copying" my work, but  are  rather  only  abstracting 
some  of  the  standard,  an acknowledgment with references to such a 
standard would be appreciated.  Specifically,  agreements  with  this 
standard  as  to  the  resolution  of otherwise ambiguous constructs, 
should be noted.

Simply put: "make whatever use you would like  of  the  grammar,  but 
include  the  ``portions  Copyright  ...''  as  a  reference  to this 
standard."


*/


/* Last modified 3/7/90, Version 1.00 */

/* File CPLUSPLS.Y is translated by YACC to CPLUSPLS.C */

/* ACKNOWLEDGMENT: Without Bjarne Stroustrup and his many  co-workers 
at  Bell  Labs, there would be no C++ Language for which to provide a 
syntax description. Bjarne has also been especially helpful and  open 
in  discussions,  and  by  permitting me to review his texts prior to 
their publication, allowed me a wonderful vantage point of clarity.

Without the effort expended by the ANSI C standardizing committee,  I 
would  have been lost.  Although the ANSI C standard does not include 
a fully disambiguated syntax description, the committee has at  least 
provided  most  of  the disambiguating rules in narratives.  This C++ 
grammar is intended to be a superset of an ANSI C compatible  grammar 
that is provided in an related file.

Several  reviewers  have  also  recently  critiqued this grammar, the 
related C  grammar,  and  or  assisted  in  discussions  during  it's 
preparation.   These  reviewers are certainly not responsible for the 
errors I have committed here, but they are responsible  for  allowing 
me   to  provide  fewer  errors.   These  colleagues  include:  Bruce 
Blodgett, Mark Langley, Joe Fialli, and Greg Perkins. */


%}

/* Interesting ambiguity:
Usually
        typename ( typename2 ) ...
or
        typename ( typename2 [4] ) ...
etc.
is a redeclaration of typename2.

Inside a structure elaboration, it is sometimes the declaration of  a 
constructor!   Note,  this  only  counts  if  typename IS the current 
containing class name. (Note this can't conflict with ANSI C  because 
ANSI  C  would  call  it a redefinition, but claim it is semantically 
illegal because you can't have a member declared the same type as the 
containing struct!  ) Since the ambiguity is only reached when a  ';' 
is  found,  there  is  no  problem  with  the  fact that the semantic 
interpretation  is  providing  the  true  resolution.   As  currently 
implemented, the constructor semantic actions must be able to process 
an  ordinary  declaration.  I may reverse this in the future, to ease 
semantic implementation. 

*/


/*

INTRO TO ANSI C GRAMMAR (provided in a separate file):

The refined grammar resolves several typedef ambiguities in the draft 
proposed ANSI C standard syntax down to 1 shift/reduce  conflict,  as 
reported by a YACC process.  Note that the one shift reduce conflicts 
is  the  traditional  if-if-else conflict that is not resolved by the 
grammar.  This ambiguity can be removed using the method described in 
the Dragon Book (2nd edition), but this does  not  appear  worth  the 
effort.

There  was quite a bit of effort made to reduce the conflicts to this 
level, and an additional effort was made to make  the  grammar  quite 
similar  to  the  C++ grammar being developed in parallel.  Note that 
this grammar resolves the following ANSI C ambiguities:

ANSI C section 3.5.6, "If the [typedef  name]  is  redeclared  at  an 
inner  scope,  the  type specifiers shall not be omitted in the inner 
declaration".  Supplying type specifiers prevents consideration of  T 
as a typedef name in this grammar.  Failure to supply type specifiers 
forced the use of the TYPEDEFname as a type specifier.  This is taken 
to an (unnecessary) extreme by this implementation.  The ambiguity is 
only  a  problem  with  the first declarator in a declaration, but we 
restrict  ALL  declarators  whenever  the  users  fails  to   use   a 
type.specifier.
              
ANSI C section 3.5.4.3, "In a parameter declaration, a single typedef 
name  in  parentheses  is  taken  to  be  an abstract declarator that 
specifies a function  with  a  single  parameter,  not  as  redundant 
parentheses  around  the  identifier".  This is extended to cover the 
following cases:

typedef float T;
int noo(const (T[5]));
int moo(const (T(int)));
...

Where again the '(' immediately to the left of 'T' is interpreted  as 
being  the  start  of  a  parameter type list, and not as a redundant 
paren around a redeclaration of T.  Hence an equivalent code fragment 
is:

typedef float T;
int noo(const int identifier1 (T identifier2 [5]));
int moo(const int identifier1 (T identifier2 (int identifier3)));
...

*/



/* This group is used by the C/C++ language parser */
%token AUTO            DOUBLE          INT             STRUCT
%token BREAK           ELSE            LONG            SWITCH
%token CASE            ENUM            REGISTER        TYPEDEF
%token CHAR            EXTERN          RETURN          UNION
%token CONST           FLOAT           SHORT           UNSIGNED
%token CONTINUE        FOR             SIGNED          VOID
%token DEFAULT         GOTO            SIZEOF          VOLATILE
%token DO              IF              STATIC          WHILE

/* The following are used in C++ only.  ANSI C would call these IDENTIFIERs */
%token NEW             DELETE
%token THIS            
%token OPERATOR
%token CLASS           
%token PUBLIC          PROTECTED       PRIVATE
%token VIRTUAL         FRIEND
%token INLINE          OVERLOAD        

/* ANSI C Grammar suggestions */
%token IDENTIFIER              STRINGliteral
%token FLOATINGconstant        INTEGERconstant        CHARACTERconstant
%token OCTALconstant           HEXconstant

/* New Lexical element, whereas ANSI C suggested non-terminal */
  /* A TYPEDEFname is an identifier that is, at the current scope, 
  defined to be a class or type name */
%token TYPEDEFname 

/* Multi-Character operators */
%token  ARROW            /*    ->                              */
%token  ICR DECR         /*    ++      --                      */
%token  LS RS            /*    <<      >>                      */
%token  LE GE EQ NE      /*    <=      >=      ==      !=      */
%token  ANDAND OROR      /*    &&      ||                      */
%token  ELLIPSIS         /*    ...                             */
                 /* Following are used in C++, not ANSI C      */
%token  CLCL             /*    ::                              */
%token  DOTstar ARROWstar/*    .*       ->*                    */
 
/* modifying assignment operators */
%token MULTassign  DIVassign    MODassign   /*   *=      /=      %=      */
%token PLUSassign  MINUSassign              /*   +=      -=              */
%token LSassign    RSassign                 /*   <<=     >>=             */
%token ANDassign   ERassign     ORassign    /*   &=      ^=      |=      */


/*************************************************************************/

%start prog.start

/*************************************************************************/

%%
prog.start:
        /*nothing*/
        | translation.unit
        ;

/* CONSTANTS */
constant:
        FLOATINGconstant
        | INTEGERconstant
        /* We are not including ENUMERATIONconstant here  because  we 
          are treating it like a variable with a type of "enumeration 
          constant".  */
        | OCTALconstant
        | HEXconstant
        | CHARACTERconstant
        ;

/* STRING LITERALS */
string.literal.list:
                STRINGliteral
                | string.literal.list STRINGliteral
                ;


/* EXPRESSIONS */

    /* The following can be used in an identifier  based  declarator.  
    (Declarators   that  redefine  an  existing  TYPEDEFname  require 
    special handling, and  not  included  here).   In  addition,  the 
    following are all valid "identifiers" in an expression, whereas a 
    TYPEDEFname is NOT.*/
    
rescoped.identifier:
        IDENTIFIER  /* ANSI C: We cannot use a TYPEDEFname as a variable */
        | operator.function.name /* C++, not ANSI C*/
        | class.rescoped.identifier
        ;


class.rescoped.identifier:
        TYPEDEFname CLCL identifier.or.typedef.name /* C++, not ANSI C */
        | TYPEDEFname CLCL operator.function.name /* C++, not ANSI C */
        | TYPEDEFname CLCL '~' TYPEDEFname /* C++, not ANSI C */
        | TYPEDEFname CLCL class.rescoped.identifier /* C++, not ANSI C */
        ;

    /* Note that the following MUST come before primary.expression so 
    that  the  reduce/reduce conflict is properly resolved (according 
    to Stroustrup: "If it can be a declaration, then it  should  be".  
    I  just  give  up  and  go  in  that  direction  after  I see the 
    declarator, whereas cfront tries to lex  ahead  to  disambiguate.  
    My reduce-reduce conflict disambiguates this situation at a point 
    where human readers cannot generally disambiguate (mentally). */
    
paren.identifier.declarator:
        rescoped.identifier
        | '(' paren.identifier.declarator ')'
        ;

    /*  Note that CLCL IDENTIFIER is NOT part of rescoped.identifier. 
    It is ONLY valid for referring to an identifier,  and  NOT  valid 
    for  declaring  (or  importing  an  external  declaration  of) an 
    identifier.  This disambiguates the following code,  which  would 
    otherwise be syntactically and semantically ambiguous:

            class base {
                static int i; // element i;
                float member_funct(void);
                };
            base i; // global i
            float base::member_function(void) {
                i; // refers to static int element "i" of base
                ::i; // refers to global "i", with type "base"
                    {
                    base :: i; // import of global "i", like "base (::i);"?
                                // OR reference to global??
                    }
                }
        */

primary.expression:
        CLCL identifier.or.typedef.name  /* C++, not ANSI C */
        | CLCL operator.function.name /* C++, not ANSI C */
        | THIS   /* C++, not ANSI C */
        | rescoped.identifier
        | constant
        | string.literal.list
        | '(' expression ')'
        ;

operator.function.name :
        OPERATOR any.operator
        | OPERATOR TYPEDEFname     operator.function.ptr.opt
        | OPERATOR basic.type.name operator.function.ptr.opt
                /* The following introduces MANY conflicts.  Requiring
                '(' ')' around the `type' would help GREATLY! */
        | OPERATOR type.specifier      operator.function.ptr.opt
        | OPERATOR type.qualifier.list
        | OPERATOR type.qualifier.list operator.function.ptr
        ;

operator.function.ptr.opt:
        /* nothing */
        | operator.function.ptr
        ;
        
    /*  The  following  causes  several ambiguities on * and &.  This 
    would also be removed if parens around the `type'  were  required 
    in the derivations for operator.function.name */

operator.function.ptr:
        pointer.operator 
        | indirect.or.reference 
        | operator.function.ptr pointer.operator 
        | operator.function.ptr indirect.or.reference 
        ;
        
any.operator:
        '+'
        | '-'
        | '*'
        | '/'
        | '%'
        | '^'
        | '&'
        | '|'
        | '~'
        | '!'
        | '<'
        | '>'
        | LS
        | RS
        | ANDAND
        | OROR
        | ARROW
        | ICR
        | DECR
        | LE
        | GE
        | EQ
        | NE
        | assignment.operator
        | '(' ')'
        | '[' ']'
        | NEW
        | DELETE
        | ','
        ;

    /*  The  following  was specially placed BEFORE the definition of 
    postfix.expression  to  resolve  a  reduce-reduce  conflict  more 
    correctly */        

type.qualifier.list.opt:
        /* Nothing */
        | type.qualifier.list
        ;

    /*  Note  that  the next set of productions in this grammar gives 
    post-increment a clearly higher  precedence  that  pre-increment.  
    This  is  not  clearly stated in the C++ Reference manual, and is 
    only implied by the grammar in the ANSI C Standard. */

postfix.expression:
        primary.expression
        | postfix.expression '[' expression ']'
        | postfix.expression '(' ')'
        | postfix.expression '(' argument.expression.list ')'
        | postfix.expression '.' rescoped.identifier.or.typedef.name
        | postfix.expression ARROW rescoped.identifier.or.typedef.name
        | postfix.expression ICR
        | postfix.expression DECR

                /* The next 4 rules are the source of cast ambiguity */
        | TYPEDEFname '(' ')'
        | TYPEDEFname '(' argument.expression.list ')'
        | basic.type.name '(' assignment.expression ')'
            /*  If  the following rule is added to the grammar, there 
            will be 3 additional reduce-reduce conflicts.  They  will 
            all be resolved in favor of NOT using the following rule, 
            so  no  harm  will  be  done.  However, since the rule is 
            semantically  illegal  we  will  omit  it  until  we  are 
            enhancing the grammar for error recovery */
/*      | basic.type.name '(' ')'  Illegal: no such constructor*/
        ;

rescoped.identifier.or.typedef.name:
        TYPEDEFname 
        | rescoped.identifier
        ;

argument.expression.list:
        assignment.expression
        | argument.expression.list ',' assignment.expression
        ;

unary.expression:
        postfix.expression
        | ICR unary.expression
        | DECR unary.expression
        | indirect.or.reference cast.expression
        | '-' cast.expression
        | '+' cast.expression
        | '~' cast.expression
        | '!' cast.expression
        | SIZEOF unary.expression
        | SIZEOF '(' type.name ')'
        | allocation.expression
        ;


    /* Note that I could have moved the  newstore  productions  to  a 
    lower  precedence  level  than  multiplication  (binary '*'), and 
    lower than bitwise AND (binary '&').  These moves  are  the  nice 
    way  to  disambiguate a trailing unary '*' or '&' at the end of a 
    freestore expression.  Since the freestore expression (with  such 
    a  grammar  and  hence  precedence  given)  can never be the left 
    operand of a binary '*' or '&', the ambiguity would  be  removed.  
    These  problems  really  surface when the binary operators '*' or 
    '&' are overloaded, but this must be syntactically  disambiguated 
    before  the semantic checking is performed... Unfortunately, I am 
    not creating the language, only writing a grammar  that  reflects 
    its  specification,  and  hence  I  cannot  change its precedence 
    assignments.  If I had  my  druthers,  I  would  probably  prefer 
    surrounding  the  type with parens all the time, and avoiding the 
    dangling * and & problem all together.*/

       /* Following are C++, not ANSI C */
allocation.expression:
        operator.new                                  '(' type.name ')' 
                operator.new.initializer
        | operator.new '(' argument.expression.list ')' '(' type.name ')' 
                operator.new.initializer
                /* next two rules are the source of * and & ambiguities */
        | operator.new                                  operator.new.type
        | operator.new '(' argument.expression.list ')' operator.new.type
        ;

operator.new:
        NEW
        | CLCL NEW
        ;

operator.new.type:
        type.specifier    operator.new.declarator.opt operator.new.initializer
        | basic.type.name operator.new.declarator.opt operator.new.initializer
        | TYPEDEFname     operator.new.declarator.opt operator.new.initializer
        ;

operator.new.declarator.opt:
        /* Nothing */
        | operator.new.array.declarator
        | indirect.or.reference operator.new.declarator.opt
        ;

operator.new.array.declarator:
        '[' ']'
        | '[' expression ']'
        | operator.new.array.declarator '[' expression ']'
        ;

operator.new.initializer:
        /* Nothing */
        | '(' ')'
        | '(' argument.expression.list ')'
        ;
        
cast.expression:
        unary.expression
        | '(' type.name ')' cast.expression
        ;

    /* Following are C++, not ANSI C */
deallocation.expression:
        cast.expression
        | clcl.opt.delete deallocation.expression
        | clcl.opt.delete '[' expression ']' deallocation.expression
        ;

    /* Following are C++, not ANSI C */
clcl.opt.delete:
        DELETE
        | CLCL DELETE
        ;

    /* Following are C++, not ANSI C */
point.member.expression:
        deallocation.expression
        | point.member.expression DOTstar  deallocation.expression
        | point.member.expression ARROWstar  deallocation.expression
        ;
        
multiplicative.expression:
        point.member.expression
        | multiplicative.expression '*' point.member.expression
        | multiplicative.expression '/' point.member.expression
        | multiplicative.expression '%' point.member.expression
        ;

additive.expression:
        multiplicative.expression
        | additive.expression '+' multiplicative.expression
        | additive.expression '-' multiplicative.expression
        ;

shift.expression:
        additive.expression
        | shift.expression LS additive.expression
        | shift.expression RS additive.expression
        ;

relational.expression:
        shift.expression
        | relational.expression '<' shift.expression
        | relational.expression '>' shift.expression
        | relational.expression LE shift.expression
        | relational.expression GE shift.expression
        ;

equality.expression:
        relational.expression
        | equality.expression EQ relational.expression
        | equality.expression NE relational.expression
        ;

AND.expression:
        equality.expression
        | AND.expression '&' equality.expression
        ;

exclusive.OR.expression:
        AND.expression
        | exclusive.OR.expression '^' AND.expression
        ;

inclusive.OR.expression:
        exclusive.OR.expression
        | inclusive.OR.expression '|' exclusive.OR.expression
        ;

logical.AND.expression:
        inclusive.OR.expression
        | logical.AND.expression ANDAND inclusive.OR.expression
        ;

logical.OR.expression:
        logical.AND.expression
        | logical.OR.expression OROR logical.AND.expression
        ;

conditional.expression:
        logical.OR.expression
        | logical.OR.expression '?' expression ':'
                conditional.expression
        ;

assignment.expression:
        conditional.expression
        | unary.expression assignment.operator assignment.expression
        ;

assignment.operator:
        '='
        | MULTassign
        | DIVassign
        | MODassign
        | PLUSassign
        | MINUSassign
        | LSassign
        | RSassign
        | ANDassign
        | ERassign
        | ORassign
        ;

expression:
        assignment.expression
        | expression ',' assignment.expression
        ;

constant.expression:
        conditional.expression
        ;

    /* The following was used for clarity */
expression.opt:
        /* Nothing */
        | expression
        ;



/* DECLARATIONS */

    /* The following are notably different from the ANSI  C  Standard 
    specified  grammar,  but  are  present  in  my  ANSI C compatible 
    grammar.  The changes were made to disambiguate typedefs presence 
    in   declaration.specifiers   (vs.    in   the   declarator   for 
    redefinition);  to allow struct/union/enum/class tag declarations 
    without  declarators,  and  to  better  reflect  the  parsing  of 
    declarations     (declarators     must     be    combined    with 
    declaration.specifiers ASAP, so that they can immediately  become 
    visible in the current scope). */

declaration:
        sue.declaration.specifier ';' { /* this is semantic error, as it
                                        includes a storage class!?!*/ }
        | sue.type.specifier ';'
        | declaring.list ';'
        | default.declaring.list ';'
        ;

    /*  Note  that  if  a typedef were redeclared, then a declaration 
    specifier must be supplied (re: ANSI C spec).  The following  are 
    declarations  wherein  no  declaration.specifier is supplied, and 
    hence the 'default' must be used.  An example of this is

        const a;
    which by default, is the same as:
        const int a;
    `a' must NOT be a typedef in the above example.
    */


    /* The presence of `{}' in the following rules  indicates  points 
    at  which  the symbol table MUST be updated so that the tokenizer 
    can IMMEDIATELY  continue  to  maintain  the  proper  distinction 
    between a TYPEDEFname and an IDENTIFIER. */
    
default.declaring.list:  /* Can't  redeclare typedef names */
        declaration.qualifier.list identifier.declarator {} initializer.opt
        | type.qualifier.list      identifier.declarator {} initializer.opt
        | default.declaring.list ',' identifier.declarator {} initializer.opt

        | declaration.qualifier.list constructed.identifier.declarator
        | type.qualifier.list      constructed.identifier.declarator
        | default.declaring.list ',' constructed.identifier.declarator
        ;

declaring.list:
        declaration.specifier  declarator {} initializer.opt
        | type.specifier       declarator {} initializer.opt
        | basic.type.name      declarator {} initializer.opt
        | TYPEDEFname          declarator {} initializer.opt
        | declaring.list ','   declarator {} initializer.opt

        | declaration.specifier constructed.declarator
        | type.specifier        constructed.declarator
        | basic.type.name       constructed.declarator
        | TYPEDEFname           constructed.declarator
        | declaring.list ','    constructed.declarator
        ;

constructed.declarator:
        constructed.identifier.declarator
        | TYPEDEFname '(' argument.expression.list ')' 
        ;

constructed.identifier.declarator:
        paren.identifier.declarator '(' argument.expression.list ')' 
        | simple.paren.typedef.declarator '(' argument.expression.list ')' 
        ;

declaration.specifier:
        basic.declaration.specifier        /* Arithmetic or void */
        | sue.declaration.specifier          /* struct/union/enum/class */
        | typedef.declaration.specifier      /* typedef*/
        ;

type.specifier:
        basic.type.specifier                 /* Arithmetic or void */
        | sue.type.specifier                 /* Struct/Union/Enum/Class */
        | typedef.type.specifier             /* Typedef */
        ;


declaration.qualifier.list:  /* const/volatile, AND storage class */
        storage.class
        | type.qualifier.list storage.class
        | declaration.qualifier.list declaration.qualifier
        ;

type.qualifier.list:
        type.qualifier
        | type.qualifier.list type.qualifier
        ;

declaration.qualifier:
        type.qualifier                  /* const or volatile */
        | storage.class
        ;

type.qualifier:
        CONST
        | VOLATILE
        ;

basic.declaration.specifier:      /*Storage Class+Arithmetic or void*/
        basic.type.specifier  storage.class
        | basic.type.name storage.class
        | declaration.qualifier.list basic.type.name
        | basic.declaration.specifier declaration.qualifier
        | basic.declaration.specifier basic.type.name
        ;

basic.type.specifier:
        basic.type.name basic.type.name /* Arithmetic or void */
        | basic.type.name type.qualifier
        | type.qualifier.list basic.type.name
        | basic.type.specifier type.qualifier
        | basic.type.specifier basic.type.name
        ;

sue.declaration.specifier:          /* StorageClass + struct/union/enum/class */
        sue.type.specifier storage.class
        | declaration.qualifier.list elaborated.type.name
        | sue.declaration.specifier declaration.qualifier
        ;

sue.type.specifier:
        elaborated.type.name              /* struct/union/enum/class */
        | type.qualifier.list elaborated.type.name
        | sue.type.specifier type.qualifier
        ;


typedef.declaration.specifier:       /*Storage Class + typedef types */
        typedef.type.specifier storage.class
        | TYPEDEFname storage.class
        | declaration.qualifier.list TYPEDEFname
        | typedef.declaration.specifier declaration.qualifier
        ;

typedef.type.specifier:              /* typedef types */
        TYPEDEFname type.qualifier
        | type.qualifier.list TYPEDEFname
        | typedef.type.specifier type.qualifier
        ;

    /*  There  are  really  several distinct sets of storage.classes.  
    The sets vary depending on whether the  declaration  is  at  file 
    scope,  is  a  declaration  within  a  struct/class,  is within a 
    function body, or in a function declaration/definition (prototype 
    parameter declarations).  They are grouped here to  simplify  the 
    grammar,  and  can  be  semantically  checked.   Note  that  this 
    approach tends to ease the syntactic restrictions in the  grammar 
    slightly,  but  allows for future language development, and tends 
    to provide superior  diagnostics  and  error  recovery  (i.e.:  a 
    syntax error does not disrupt the parse).


                File    File    Member  Member  Local   Local  Formal
                Var     Funct   Var     Funct   Var     Funct  Params
TYPEDEF         x       x       x       x       x       x
EXTERN          x       x                       x       x
STATIC          x       x       x       x       x
AUTO                                            x              x
REGISTER                                        x              x
FRIEND                                  x       
OVERLOAD                x               x               x
INLINE                  x               x               x
VIRTUAL                                 x               x
*/

storage.class:
        TYPEDEF 
        | EXTERN
        | STATIC 
        | AUTO
        | REGISTER
        | FRIEND   /* C++, not ANSI C */
        | OVERLOAD /* C++, not ANSI C */
        | INLINE   /* C++, not ANSI C */
        | VIRTUAL  /* C++, not ANSI C */
        ;

basic.type.name:
        VOID
        | CHAR
        | SHORT
        | INT
        | LONG
        | FLOAT
        | DOUBLE
        | SIGNED
        | UNSIGNED
        ;

elaborated.type.name:
        aggregate.name
        | enum.name
        ;


    /*  Since  the expression "new type.name" MIGHT use an elaborated 
    type and a derivation, it MIGHT have a ':'.  This fact  conflicts 
    with  the requirement that a new expression can be placed between 
    a '?' and a ':' in a conditional expression (at least it confuses 
    most LR(1) parsers). */

    /*  This  reduction  was  placed   DELIBERATELY   in   front   of 
    aggregate.name   to  resolve  a  reduce-reduce  on  '{'  conflict 
    properly */

derivation.opt:
        /* nothing */
        | ':' derivation.list
        ;

    /* The intermediate actions {}  represent  points  at  which  the 
    database  of  typedef  names  must  be  updated  in C++.  This is 
    critical to the lexer, which must begin to tokenize based on this 
    new information. */

aggregate.name:
        aggregate.key                             derivation.opt {}
                '{' member.declaration.list.opt '}'
        | aggregate.key identifier.or.typedef.name derivation.opt  {}
                '{' member.declaration.list.opt '}'
        | aggregate.key identifier.or.typedef.name                 {}
        ;

derivation.list:
        parent.class
        | derivation.list ',' parent.class
        ;

parent.class:
        TYPEDEFname
        | VIRTUAL access.specifier.opt TYPEDEFname
        | access.specifier virtual.opt TYPEDEFname
        ;

virtual.opt:
        /* nothing */
        | VIRTUAL
        ;

access.specifier.opt:
        /* nothing */
        | access.specifier
        ;

access.specifier:
        PUBLIC
        | PRIVATE
        | PROTECTED /* not syntactically valid, but nice for error reporting*/
        ;

aggregate.key:
        STRUCT
        | UNION
        | CLASS /* C++, not ANSI C */
        ;

    /* Note that an empty list is ONLY allowed under C++. The grammar 
    can be modified so that this stands out.  The trick is to  define 
    member.declaration.list, and have that referenced for non-trivial 
    lists. */

member.declaration.list.opt:
        /* nothing */ 
        | member.declaration.list.opt member.declaration
        ;

member.declaration:
        member.declaring.list ';'
        | member.default.declaring.list ';'
        | PUBLIC ':'                      /* C++, not ANSI C */
        | PRIVATE ':'                     /* C++, not ANSI C */
        | PROTECTED ':'                   /* C++, not ANSI C */
        | new.function.definition         /* C++, not ANSI C */
        | constructor.function.in.class   /* C++, not ANSI C */
        | destructor.function             /* C++, not ANSI C */
        | sue.type.specifier ';'       /* nested tag elaborations*/ /* C++, not ANSI C */
        | class.rescoped.identifier ';' /*access modification*/     /* C++, not ANSI C */
        | typedef.declaration.specifier ';' /* friend T */   /* C++, not ANSI C */
        | sue.declaration.specifier ';'  /* friend class C*/  /* C++, not ANSI C */
        ;

member.default.declaring.list:        /* doesn't redeclare typedef*/
        type.qualifier.list          identifier.declarator member.pure.opt
        | declaration.qualifier.list identifier.declarator member.pure.opt /* C++, not ANSI C */
        | member.default.declaring.list ',' 
                identifier.declarator member.pure.opt

        | type.qualifier.list           bit.field.identifier.declarator 
        | declaration.qualifier.list    bit.field.identifier.declarator /* C++, not ANSI C */
        | member.default.declaring.list ','  bit.field.identifier.declarator 
        ;

member.declaring.list:        /* Can possibly redeclare typedefs */
        type.specifier     declarator member.pure.opt
        | basic.type.name  declarator member.pure.opt
        | member.conflict.declaring.item

        | member.declaring.list ',' declarator member.pure.opt

        | type.specifier        bit.field.declarator 
        | basic.type.name       bit.field.declarator 
        | TYPEDEFname           bit.field.declarator 
        | declaration.specifier bit.field.declarator /* C++? not ANSI C */
        | member.declaring.list ',' bit.field.declarator 
        ;

    /* The following conflict with constructors-
      member.conflict.declaring.item:
        TYPEDEFname           declarator member.pure.opt
        | declaration.specifier declarator member.pure.opt /* C++, not ANSI C * /
        ;
    so we inline expand declarator to get the following productions... */

member.conflict.declaring.item:
        TYPEDEFname             identifier.declarator            member.pure.opt
        | TYPEDEFname           parameter.typedef.declarator     member.pure.opt
        | declaration.specifier identifier.declarator            member.pure.opt
        | declaration.specifier parameter.typedef.declarator     member.pure.opt
        | TYPEDEFname           simple.paren.typedef.declarator  member.pure.opt
        | declaration.specifier simple.paren.typedef.declarator  member.pure.opt

        | member.conflict.paren.declaring.item
        ;

    /* The following still conflicts with constructors-
      member.conflict.paren.declaring.item:
        TYPEDEFname           paren.typedef.declarator     member.pure.opt
        | declaration.specifier paren.typedef.declarator     member.pure.opt
        ;
    so paren.typedef.declarator is expanded inline to get...*/

member.conflict.paren.declaring.item:
        TYPEDEFname indirect.or.reference '(' simple.paren.typedef.declarator ')' member.pure.opt
        | TYPEDEFname pointer.operator      '(' simple.paren.typedef.declarator ')' member.pure.opt
        | TYPEDEFname indirect.or.reference '(' TYPEDEFname ')'                     member.pure.opt
        | TYPEDEFname pointer.operator      '(' TYPEDEFname ')'                     member.pure.opt
        | TYPEDEFname indirect.or.reference  paren.typedef.declarator                member.pure.opt
        | TYPEDEFname pointer.operator       paren.typedef.declarator                member.pure.opt
        | declaration.specifier indirect.or.reference
                '(' simple.paren.typedef.declarator ')' member.pure.opt
        | declaration.specifier pointer.operator
                '(' simple.paren.typedef.declarator ')' member.pure.opt
        | declaration.specifier indirect.or.reference
                '(' TYPEDEFname ')'                     member.pure.opt
        | declaration.specifier pointer.operator
                '(' TYPEDEFname ')'                     member.pure.opt
        | declaration.specifier indirect.or.reference
                paren.typedef.declarator                member.pure.opt
        | declaration.specifier pointer.operator
                paren.typedef.declarator                member.pure.opt

        | member.conflict.paren.postfix.declaring.item
        ;

    /* but still the following conflicts with constructors-
    member.conflict.paren.postfix.declaring.item:
      TYPEDEFname           paren.postfix.typedef.declarator member.pure.opt
      | declaration.specifier paren.postfix.typedef.declarator member.pure.opt
      ;
    so we expand paren.postfix.typedef inline and get...*/

member.conflict.paren.postfix.declaring.item:
        TYPEDEFname     '(' paren.typedef.declarator ')'
                                                           member.pure.opt
        | TYPEDEFname   '(' simple.paren.typedef.declarator
                        postfixing.abstract.declarator ')' member.pure.opt
        | TYPEDEFname   '(' TYPEDEFname
                        postfixing.abstract.declarator ')' member.pure.opt
        | TYPEDEFname   '(' paren.typedef.declarator ')'
                        postfixing.abstract.declarator     member.pure.opt

        | declaration.specifier '(' paren.typedef.declarator ')' 
                                                           member.pure.opt
        | declaration.specifier '(' simple.paren.typedef.declarator
                        postfixing.abstract.declarator ')' member.pure.opt
        | declaration.specifier '(' TYPEDEFname
                        postfixing.abstract.declarator ')' member.pure.opt
        | declaration.specifier '(' paren.typedef.declarator ')' 
                        postfixing.abstract.declarator member.pure.opt
        ;


member.pure.opt:
        /* nothing */
        | '=' OCTALconstant /* C++, not ANSI C */ /* Pure function*/
        ;

    /* Note that bit field names cannot be parenthesized in C++  (due 
    to  ambiguities),  and  hence this part of the grammar is simpler 
    than ANSI C. :-) */

bit.field.declarator:
        bit.field.identifier.declarator
        | TYPEDEFname ':' constant.expression
        ;

bit.field.identifier.declarator:
        ':' constant.expression
        | IDENTIFIER ':' constant.expression
        ;

enum.name:
        ENUM '{' enumerator.list '}'
        | ENUM identifier.or.typedef.name '{' enumerator.list '}'
        | ENUM identifier.or.typedef.name
        ;

enumerator.list:
        identifier.or.typedef.name enumerator.value.opt
        | enumerator.list ',' identifier.or.typedef.name enumerator.value.opt
        ;

enumerator.value.opt:
        /* Nothing */
        | '=' constant.expression
        ;

    /*  We special case the lone type.name which has no storage class 
    (even though it should be an example of  a  parameter.type.list). 
    This helped to disambiguate type-names in parenthetical casts.*/

parameter.type.list: 
        '(' ')'                             type.qualifier.list.opt
        | '(' type.name ')'                 type.qualifier.list.opt
        | '(' type.name initializer ')'     type.qualifier.list.opt /* C++, not ANSI C */
        | '(' named.parameter.type.list ')' type.qualifier.list.opt
        ;

    /* The following are used in old style function definitions, when 
    a  complex return type include the "function returning" modifier. 
    Note the  subtle  distinction  from  parameter.type.list.   These 
    parameters  are  NOT  the parameters for the function be defined, 
    but simply part of the type definition.  An example would be:

        int (*f(   a  ))(float) long a; {...} 

    which is equivalent to the full new style definition:

        int (*f(long a))(float) { ...}

    The   type   list    `(float)'    is    an    example    of    an 
    old.parameter.type.list.   The  bizarre point here is that an old 
    function definition declarator can be followed by  a  type  list, 
    which  can  start  with a qualifier `const'.  This conflicts with 
    the new syntactic construct for const member functions!?! */  

old.parameter.type.list:  
        '('  ')'  
        | '('  type.name  ')'  
        | '(' type.name initializer ')' /* C++, not ANSI C */   
        | '(' named.parameter.type.list ')' 
        ;

named.parameter.type.list:  /* WARNING: excludes lone type.name*/
        parameter.list
        | parameter.list comma.opt.ellipsis
        | type.name comma.opt.ellipsis
        | type.name initializer comma.opt.ellipsis  /* C++, not ANSI C */
        | ELLIPSIS /* C++, not ANSI C */
        ;


comma.opt.ellipsis:
        ELLIPSIS       /* C++, not ANSI C */
        | ',' ELLIPSIS
        ;

parameter.list: 
        named.parameter.declaration
        | named.parameter.declaration initializer /* C++, not ANSI C */
        | type.name             ',' parameter.declaration
        | type.name initializer ',' parameter.declaration  /* C++, not ANSI C */
        | parameter.list ',' parameter.declaration
        ;

    /* There is some very subtle disambiguation going  on  here.   Do 
    not be tempted to make further use of the following production in 
    parameter.list,  or else the conflict count will grow noticeably.  
    Specifically, the next set  of  rules  has  already  been  inline 
    expanded for the first parameter in a parameter.list to support a 
    deferred disambiguation. */
     
parameter.declaration:
        type.name
        | type.name initializer  /* C++, not ANSI C */
        | named.parameter.declaration
        | named.parameter.declaration initializer /* C++, not ANSI C */
        ;

    /* One big thing implemented here is that a TYPEDEFname CANNOT be 
    redeclared  when  we  don't  have declaration.specifiers!  Notice 
    that when we do use a  TYPEDEFname  based  declarator,  only  the 
    "special"  (non-ambiguous  in this context) typedef.declarator is 
    used.  Everything else that is "missing" shows up as a type.name. 
    */

named.parameter.declaration: /*have names or storage classes */
        declaration.specifier
        | declaration.specifier abstract.declarator
        | declaration.specifier identifier.declarator
        | declaration.specifier parameter.typedef.declarator

        | declaration.qualifier.list 
        | declaration.qualifier.list abstract.declarator
        | declaration.qualifier.list identifier.declarator

        | type.specifier identifier.declarator
        | type.specifier parameter.typedef.declarator

        | basic.type.name identifier.declarator
        | basic.type.name parameter.typedef.declarator

        | TYPEDEFname identifier.declarator
        | TYPEDEFname parameter.typedef.declarator

        | type.qualifier.list identifier.declarator
        ;
    
identifier.or.typedef.name:
        IDENTIFIER
        | TYPEDEFname 
        ;

type.name:
        type.specifier
        | basic.type.name 
        | TYPEDEFname 
        | type.qualifier.list 

        | type.specifier abstract.declarator
        | basic.type.name abstract.declarator
        | TYPEDEFname abstract.declarator
        | type.qualifier.list abstract.declarator
        ;

initializer.opt:
        /* nothing */
        | initializer
        ;

initializer:
        '=' initializer.group
        ;

initializer.group:        
        '{' initializer.list '}'
        | '{' initializer.list ',' '}'
        | assignment.expression
        ;

initializer.list:
        initializer.group
        | initializer.list ',' initializer.group
        ;


/* STATEMENTS */
statement:
        labeled.statement
        | compound.statement
        | expression.statement
        | selection.statement
        | iteration.statement
        | jump.statement
        ;

labeled.statement:
        identifier.or.typedef.name ':' statement
        | CASE constant.expression ':' statement
        | DEFAULT ':' statement
        ;

    /* I sneak declarations into statement.list to support C++.   The 
    grammar  is  a  little  clumsy  this  way, but the violation of C 
    syntax is heavily localized */
  
compound.statement:
        '{' '}'
        | '{' declaration.list '}'
        | '{' statement.list '}'
        | '{' declaration.list statement.list '}'
        ;

declaration.list:
        declaration
        | declaration.list declaration
        ;

statement.list:
        statement
        | statement.list statement
        | statement.list declaration /* C++, not ANSI C */
        ;

expression.statement:
        expression.opt ';'
        ;

selection.statement:
        IF '(' expression ')' statement
        | IF '(' expression ')' statement ELSE statement
        | SWITCH '(' expression ')' statement
        ;

iteration.statement:
        WHILE '(' expression ')' statement
        | DO statement WHILE '(' expression ')' ';'
        | FOR '(' expression.opt ';' expression.opt ';'
                expression.opt ')' statement
        | FOR '(' declaration        expression.opt ';'
                expression.opt ')' statement  /* C++, not ANSI C */
        ;

jump.statement:
        GOTO identifier.or.typedef.name ';'
        | CONTINUE ';'
        | BREAK ';'
        | RETURN expression.opt ';'
        ;


/* EXTERNAL DEFINITIONS */

translation.unit:
        external.definition
        | translation.unit external.definition
        ;

external.definition:
        function.declaration
        | function.definition
        | declaration
        | linkage.specifier function.declaration     /* C++, not ANSI C*/
        | linkage.specifier function.definition      /* C++, not ANSI C*/
        | linkage.specifier declaration              /* C++, not ANSI C*/
        | linkage.specifier '{' translation.unit '}' /* C++, not ANSI C*/
        ;

linkage.specifier:
        EXTERN STRINGliteral
        ;

    /* Note that declaration.specifiers are left out of the following 
    function declarations.  This is something of an anomaly in C, but 
    an unfortunately common coding practice. */
        
function.declaration:
        identifier.declarator ';' /* semantically verify it is a function */
        ;
        
function.definition:
        new.function.definition
        | old.function.definition
        | constructor.function.definition
        ;

new.function.definition:
                                     identifier.declarator compound.statement
        | declaration.specifier      identifier.declarator compound.statement
        | type.specifier             identifier.declarator compound.statement
        | basic.type.name            identifier.declarator compound.statement
        | TYPEDEFname                identifier.declarator compound.statement
        | declaration.qualifier.list identifier.declarator compound.statement
        | type.qualifier.list        identifier.declarator compound.statement
        ;

old.function.definition:
                                     old.function.declarator compound.statement 
        | declaration.specifier      old.function.declarator compound.statement
        | type.specifier             old.function.declarator compound.statement
        | basic.type.name            old.function.declarator compound.statement
        | TYPEDEFname                old.function.declarator compound.statement
        | declaration.qualifier.list old.function.declarator compound.statement
        | type.qualifier.list        old.function.declarator compound.statement

        |                            old.function.declarator declaration.list 
                compound.statement
        | declaration.specifier      old.function.declarator declaration.list
                compound.statement
        | type.specifier             old.function.declarator declaration.list
                compound.statement
        | basic.type.name            old.function.declarator declaration.list
                compound.statement
        | TYPEDEFname                old.function.declarator declaration.list
                compound.statement
        | declaration.qualifier.list old.function.declarator declaration.list
                compound.statement
        | type.qualifier.list        old.function.declarator declaration.list
                compound.statement
        ;
    
    /* I needed to add some storage class options to the constructors 
    and  destructors  (e.g.:  virtual, inline, ... ) These were added 
    using declaration.qualifier.list, but they should be semantically 
    verified to be inline or virtual ONLY. If I don't have  to  allow 
    BOTH,  then  I  COULD be explicit (and not increase ambiguities). 
    With this LR(1) grammar, I have to reduce the  list  to  standard 
    declaration.qualifier.list ASAP.*/

destructor.function:
        declaration.qualifier.list '~' TYPEDEFname void.opt.param.list
                        destructor.function.body
        | '~' TYPEDEFname void.opt.param.list destructor.function.body
        ;

void.opt.param.list:
        '('        ')' type.qualifier.list.opt
        | '(' VOID ')' type.qualifier.list.opt
        ;

destructor.function.body:
        compound.statement  /* an actual destructor definition */
        | ';'               /* only a destructor declaration */
        ;

    /*  Semantically verify that the following identifier.declarators 
    are:

            TYPEDEFname :: TYPEDEFname (parameter.type.list).  

    We use the more general form to prevent a clash  with  a  typical 
    function  definition  (which  won't have a constructor.init.list) 
    The ONLY valid declaration qualifier is INLINE. */
    
constructor.function.definition:
                                     identifier.declarator  
                                constructor.init.list compound.statement
        | declaration.qualifier.list identifier.declarator  
                                constructor.init.list compound.statement
        ;

    /* The following use of declaration.specifiers are made to  allow 
    for  a TYPEDEFname preceded by an INLINE modifier. This fact must 
    be verified semantically.  It should also be  verified  that  the 
    TYPEDEFname  is  ACTUALLY  the class name being elaborated.  Note 
    that we could break out typedef.declaration.specifier from within 
    declaration.specifier, and we  might  narrow  down  the  conflict 
    region a bit. A second alternative (to what is done) for cleaning 
    up  this  stuff  is  to  let the tokenizer specially identify the 
    current class being elaborated as a special token, and not just a 
    typedef.name.  Unfortunately, things would get very confusing for 
    the lexer, as we may pop into enclosed  tag  elaboration  scopes; 
    into function definitions; or into both recursively! */
    
constructor.function.in.class:
        declaration.specifier   constructor.parameter.list.and.body
        | TYPEDEFname           constructor.parameter.list.and.body
        ;

    /* The following conflicts with member declarations-
    constructor.parameter.list.and.body:
          parameter.type.list ';'
          | parameter.type.list constructor.init.list.opt compound.statement
          ;
    so parameter.type.list was expanded inline to get */

constructor.parameter.list.and.body:
          '('                           ')' type.qualifier.list.opt ';'
        | '(' type.name initializer     ')' type.qualifier.list.opt ';' /* C++, not ANSI C */
        | '(' named.parameter.type.list ')' type.qualifier.list.opt ';'
        | '('                           ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement
        | '(' type.name initializer     ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement  /* C++, not ANSI C */
        | '(' named.parameter.type.list ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement
        | constructor.conflicting.parameter.list.and.body
        ;

    /* The following conflicted with member declaration-
    constructor.conflicting.parameter.list.and.body:
        '(' type.name ')'                 type.qualifier.list.opt ';'
        | '(' type.name ')'                 type.qualifier.list.opt
                constructor.init.list.opt compound.statement
        ;
    so type.name was inline expanded to get the following... */

constructor.conflicting.parameter.list.and.body:
        '(' type.specifier ')' type.qualifier.list.opt
                ';'
        | '(' basic.type.name  ')' type.qualifier.list.opt
                ';'
        | '(' TYPEDEFname  ')' 
                ';'
        | '(' TYPEDEFname  ')' type.qualifier.list
                ';'

        | '(' type.qualifier.list  ')' type.qualifier.list.opt
                ';'
        | '(' type.specifier abstract.declarator ')' type.qualifier.list.opt
                ';'
        | '(' basic.type.name abstract.declarator ')' type.qualifier.list.opt
                ';'
        | '(' type.qualifier.list abstract.declarator ')' type.qualifier.list.opt
                ';'

        | '(' type.specifier ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement
        | '(' basic.type.name  ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement
        | '(' TYPEDEFname  ')' 
                constructor.init.list.opt compound.statement
        | '(' TYPEDEFname  ')' type.qualifier.list
                constructor.init.list.opt compound.statement

        | '(' type.qualifier.list  ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement
        | '(' type.specifier abstract.declarator ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement
        | '(' basic.type.name abstract.declarator ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement
        | '(' type.qualifier.list abstract.declarator ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement
        | constructor.conflicting.typedef.declarator
        ;


    /* The following have ambiguities with member declarations-
    constructor.conflicting.typedef.declarator:
      '(' TYPEDEFname abstract.declarator ')' type.qualifier.list.opt
                ';'
      |  '(' TYPEDEFname abstract.declarator ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement
      ;
    which can be deferred by expanding abstract.declarator, and in two 
    cases parameter.qualifier.list, resulting in ...*/

constructor.conflicting.typedef.declarator:
        '(' TYPEDEFname unary.abstract.declarator ')' type.qualifier.list.opt
                ';'
        | '(' TYPEDEFname unary.abstract.declarator ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement

        | '(' TYPEDEFname postfix.abstract.declarator ')' type.qualifier.list.opt
                ';'
        | '(' TYPEDEFname postfix.abstract.declarator ')' type.qualifier.list.opt
                constructor.init.list.opt compound.statement

        | '(' TYPEDEFname postfixing.abstract.declarator ')' type.qualifier.list
                ';'
        | '(' TYPEDEFname postfixing.abstract.declarator ')' type.qualifier.list
                constructor.init.list.opt compound.statement

        | '(' TYPEDEFname postfixing.abstract.declarator ')' 
                ';'
        | '(' TYPEDEFname postfixing.abstract.declarator ')' 
                constructor.init.list.opt compound.statement
        ;


constructor.init.list.opt:
        /* nothing */
        | constructor.init.list
        ;

constructor.init.list:
        ':' constructor.init
        | constructor.init.list ',' constructor.init
        ;

constructor.init:
        identifier.or.typedef.name '(' argument.expression.list ')'
        | identifier.or.typedef.name '(' ')'
        | '(' argument.expression.list ')' /* Single inheritance ONLY*/
        | '(' ')' /* Is this legal? It might be default! */
        ;        

declarator:
        typedef.declarator
        | identifier.declarator
        ;
        
typedef.declarator:
        paren.typedef.declarator  /* would be ambiguous as parameter*/
        | simple.paren.typedef.declarator /* also ambiguous */
        | parameter.typedef.declarator   /* not ambiguous as param*/
        ;

parameter.typedef.declarator:
        TYPEDEFname
        | TYPEDEFname postfixing.abstract.declarator
        | clean.typedef.declarator
        ;

    /*  The  following  have  at  least  one  '*'or  '&'. There is no 
    (redundant) '(' between the '*'/'&' and the TYPEDEFname. */

clean.typedef.declarator:
        clean.postfix.typedef.declarator
        | indirect.or.reference parameter.typedef.declarator
        | pointer.operator      parameter.typedef.declarator  
        ;

clean.postfix.typedef.declarator:
        '(' clean.typedef.declarator ')'
        | '(' clean.typedef.declarator ')' postfixing.abstract.declarator
        ;

    /* The following have a redundant '(' placed immediately  to  the 
    left of the TYPEDEFname */
    
paren.typedef.declarator:
        paren.postfix.typedef.declarator
        | indirect.or.reference '(' simple.paren.typedef.declarator ')'
        | pointer.operator      '(' simple.paren.typedef.declarator ')'
        | indirect.or.reference '(' TYPEDEFname ')' /* redundant paren */
        | pointer.operator      '(' TYPEDEFname ')' /* redundant paren */
        | indirect.or.reference paren.typedef.declarator
        | pointer.operator      paren.typedef.declarator
        ;
        
paren.postfix.typedef.declarator:
        '(' paren.typedef.declarator ')'
        | '(' simple.paren.typedef.declarator postfixing.abstract.declarator ')' 
        | '(' TYPEDEFname postfixing.abstract.declarator ')' /* redundant paren */
        | '(' paren.typedef.declarator ')' postfixing.abstract.declarator
        ;

    /*  The following excludes lone TYPEDEFname to help in a conflict 
    resolution.  We have special cased lone  TYPEDEFname  along  side 
    all uses of this non-terminal.*/

simple.paren.typedef.declarator:
        '(' TYPEDEFname ')'
        | '(' simple.paren.typedef.declarator ')'
        ;

identifier.declarator:
        unary.identifier.declarator  
        | paren.identifier.declarator
        ;

    /*  The  following  allows  "function return array of" as well as 
    "array of function returning".  It COULD be cleanup  up  the  way 
    abstract  declarators  have been.  This change might make it hard 
    to recover from user's syntax errors, whereas now they appear  as 
    simple semantic errors. */
   
unary.identifier.declarator:
        postfix.identifier.declarator
        | indirect.or.reference identifier.declarator
        | pointer.operator      identifier.declarator
        ;

postfix.identifier.declarator:
        paren.identifier.declarator postfixing.abstract.declarator
        | '(' unary.identifier.declarator ')'
        | '(' unary.identifier.declarator ')' postfixing.abstract.declarator
        ;

old.function.declarator:
        postfix.old.function.declarator
        | indirect.or.reference old.function.declarator
        | pointer.operator      old.function.declarator
        ;

    /*  ANSI  C  section  3.7.1  states  "An identifier declared as a 
    typedef name shall not be redeclared as a parameter".  Hence  the 
    following is based only on IDENTIFIERs.
   
    Instead  of identifier.lists, an argument.expression.list is used 
    in  old  style  function   definitions.    The   ambiguity   with 
    constructors  required the use of argument lists, with a semantic 
    verification  of  the  list  (e.g.:  check  to   see   that   the 
    "expressions" consisted of lone identifiers).  

    An interesting ambiguity appeared:

        const constant=5;
        int foo(constant) ...

    Is this an old function definition, or constructor?  The decision 
    is made later by THIS grammar based on trailing context :-). This 
    ambiguity  is probably what caused many parsers to give up on old 
    style function definitions. */

postfix.old.function.declarator:
        paren.identifier.declarator '(' argument.expression.list ')'
        | '(' old.function.declarator ')'
        | '(' old.function.declarator ')' old.postfixing.abstract.declarator
        ;

old.postfixing.abstract.declarator:
        array.abstract.declarator /* array modifiers */
        | old.parameter.type.list  /* function returning modifiers */
        ;

abstract.declarator:
        unary.abstract.declarator
        | postfix.abstract.declarator
        | postfixing.abstract.declarator
        ;

postfixing.abstract.declarator:
        array.abstract.declarator
        | parameter.type.list
        ;

array.abstract.declarator:
        '[' ']'
        | '[' constant.expression ']'
        | array.abstract.declarator '[' constant.expression ']'
        ;

unary.abstract.declarator:
        indirect.or.reference 
        | pointer.operator 
        | indirect.or.reference abstract.declarator
        | pointer.operator      abstract.declarator
        ;

postfix.abstract.declarator:
        '(' unary.abstract.declarator ')'
        | '(' postfix.abstract.declarator ')'
        | '(' postfixing.abstract.declarator ')'
        | '(' unary.abstract.declarator ')' postfixing.abstract.declarator
        ;

indirect.or.reference:
        '*'
        | '&'
        ;

pointer.operator:
        TYPEDEFname CLCL '*' type.qualifier.list.opt
        | indirect.or.reference type.qualifier.list
        ;
%%

/* ---end of grammar ---- */

news@awdprime.UUCP (USENET News) (03/23/90)

In article <1435@io.UUCP> jar@io.UUCP (Jim Roskind x5570) writes:
Thanks for the grammers.  Some people requested a lexer.  Here is mine
that I quick hacked up.  I'm sure that someone can use this build a
complete one for your grammers.

Things to fix:
 o  It doesn't return same token values per jim's grammers (easy fix).
 o  You'll need to fix it to return different values for identifiers,
    enums, and typedefs (see comment at {ident}).
 o  It doesn't support floating point yet either (easy fix, I have {float}
    and {exp} defined but not used or tested).
 o  You'll want to handle cases like "test \" foobar" in the
    string handling section (look for STRING and CHAR_CONST).
    I couldn't figure out how to do it in 30 seconds or less so I'll
    leave it up to you.

With this I was able to parse a fairly large C program that I had removed
all the typedefs from (no enums) using the grammer right out of the back
of _The C Programming Language_ Second Edition.

ENJOY!!!!!   --- don't forget to remove the .sig at the bottom :-)
-------------- clexer.l for ANSI C ------------------
%{
#include <stdio.h>
#include "y.tab.h"
#define STRDUP(X) ((char *)strcpy(malloc(strlen(X)+1),X))
extern int yychar;
static int column = 0;
static int linenum = 1;
#define count(x) counter(x)
#ifndef YYDEBUG
int     yydebug = 0;
#else
int     yydebug = 1;
#endif
%}
alpha   [a-zA-Z]
digit   [0-9]
special [\_]
ident   (({alpha}|{special})({alpha}|{digit}|{special})*)
int     ({digit}+)
exp     ([Ee][-+]?{digit}+)
float   ([-+]?{digit}+\.?{digit}*)
%p 3000
%%
^\#.*		{ count(0); /* skip cpp lines */ }
[\ \n\t\v\f]+	{ count(0); /* skip white space */ }
"/*"		{ count(1); skipcomments(); }
"..."		{ count(1); return DOTDOTDOT; }
">="		{ count(1); return GE; }
"<="		{ count(1); return LE; }
"!="		{ count(1); return NOTEQU; }
"=="		{ count(1); return EQU; }

"*="		{ count(1); return MULTEQU; }
"/="		{ count(1); return DIVEQU; }
"%="		{ count(1); return MODEQU; }
"+="		{ count(1); return INCEQU; }
"-="		{ count(1); return DECEQU; }
"<<="		{ count(1); return SHIFTLEFTEQU; }
">>="		{ count(1); return SHIFTRIGHTEQU; }
"&="		{ count(1); return ANDEQU; }
"|="		{ count(1); return OREQU; }
"^="		{ count(1); return XOREQU; }

"<<"		{ count(1); return SHIFTLEFT; }
">>"		{ count(1); return SHIFTRIGHT; }
"++"		{ count(1); return INC; }
"--"		{ count(1); return DEC; }
"->"		{ count(1); return POINTS; }
"&&"		{ count(1); return LOGICALAND; }
"||"		{ count(1); return LOGICALOR; }

"("		{ count(1); return '('; }
","		{ count(1); return ','; }
")"		{ count(1); return ')'; }
";"		{ count(1); return ';'; }
"{"		{ count(1); return '{'; }
"}"		{ count(1); return '}'; }
"["		{ count(1); return '['; }
"]"		{ count(1); return ']'; }
"*"		{ count(1); return '*'; }
"/"		{ count(1); return '/'; }
"+"		{ count(1); return '+'; }
"-"		{ count(1); return '-'; }
"%"		{ count(1); return '%'; }
"^"		{ count(1); return '^'; }
"&"		{ count(1); return '&'; }
"?"		{ count(1); return '?'; }
":"		{ count(1); return ':'; }
"!"		{ count(1); return '!'; }
"."		{ count(1); return '.'; }
"~"		{ count(1); return '~'; }
"<"		{ count(1); return '<'; }
">"		{ count(1); return '>'; }
"."		{ count(1); return '.'; }
"="		{ count(1); return '='; }

if		{ count(1); return IF; }
else		{ count(1); return ELSE; }
while		{ count(1); return WHILE; }
do		{ count(1); return DO; }
for		{ count(1); return FOR; }
switch		{ count(1); return SWITCH; }
case		{ count(1); return CASE; }
default		{ count(1); return DEFAULT; }
goto		{ count(1); return GOTO; }
continue	{ count(1); return CONTINUE; }
break		{ count(1); return BREAK; }
return		{ count(1); return RETURN; }
sizeof		{ count(1); return SIZEOF; }
auto		{ count(1); return AUTO; }
register	{ count(1); return REGISTER; }
static		{ count(1); return STATIC; }
extern		{ count(1); return EXTERN; }
typedef		{ count(1); return TYPEDEF; }
void		{ count(1); return VOID; }
char		{ count(1); return CHAR; }
short		{ count(1); return SHORT; }
int		{ count(1); return INT; }
long		{ count(1); return LONG; }
float		{ count(1); return FLOAT; }
double		{ count(1); return DOUBLE; }
unsigned	{ count(1); return UNSIGNED; }
enum		{ count(1); return ENUM; }
const		{ count(1); return CONST; }
volatile	{ count(1); return VOLATILE; }
struct		{ count(1); return STRUCT; }
union		{ count(1); return UNION; }
\'.*\' {
	count(1);
	yylval.strval = STRDUP(yytext+1);
	yylval.strval[strlen(yylval.strval)-1] = 0;
	/* return STRING without quotes */
	return CHAR_CONST;
}
{int} {
	count(1);
	yylval.intval = atoi(yytext);
	return INTEGER_CONST;
}
\".*\" {
	count(1);
	yylval.strval = STRDUP(yytext+1);
	yylval.strval[strlen(yylval.strval)-1] = 0;
	/* return STRING without quotes */
	return STRING;
}
{ident} {
	count(1);
	yylval.strval = STRDUP(yytext);
	/*
	 * we need to hash this and make a structure that tells us
	 * what is going on so we know what to do with it
	 * is it an enum, typedef, or identifier
	 *
	 * BTW: you'll have to cooperate with the grammer to do this.
	 */
	return IDENTIFIER;
}
.			{ count(1); return ERROR; }
%%
yywrap() {return(1);}

/* Skip over comments. */
skipcomments()
{
    char c;

    while (1) {
	while ((c = input()) != '*')
	    if (c == '\n') {
		column = 0;
		linenum++;
	    }
	    else if (c == '\t')
		column += 8 - (column % 8);
	    else
		column++;
	if ((c = input()) == '/') {
	    column++;
#ifdef LEXDEBUG
    printf ("symbol found: %s\n", "*/");
#endif
	    return;
	}
	unput(c);
    }
}

/*ARGSUSED*/
counter (notwhite)
{
    register char *s;

#ifdef LEXDEBUG
    if (notwhite)
	printf ("symbol found: %s\n", yytext);
#endif
    for (s = yytext; *s; s++)
	if (*s == '\n') {
	    column = 0;
	    linenum++;
	}
	else if (*s == '\t')
	    column += 8 - (column % 8);
	else
	    column++;
}

yyerror (s)
char   *s;
{
    fprintf (stderr, "YYERROR: %s: line %d col %d\n", s, linenum, column);
    fprintf (stderr, "YYERROR: yytext=`%s' symbol was (%d)\n", yytext, yychar);
}

-- sanders
For every message of the day, a new improved message will arise to overcome it.
Reply-To: cs.utexas.edu!ibmaus!auschs!sanders.austin.ibm.com!sanders     (ugh!)

wittig@gmdzi.UUCP (Georg Wittig) (03/27/90)

I've some problems with your solution. Quick examples:

news@awdprime.UUCP (USENET News) writes:
...
>#define STRDUP(X) ((char *)strcpy(malloc(strlen(X)+1),X))
	what if malloc returns 0  ???
...
>">="		{ count(1); return GE; }
	As far as I know, also ">     =" mut return GE.
-- 
Georg Wittig   GMD-Z1.BI   P.O. Box 1240   D-5205 St. Augustin 1 (West Germany)
email: wittig@gmdzi.uucp   phone: (+49 2241) 14-2294
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Freedom's just another word for nothing left to lose" (Kris Kristofferson)