[comp.lang.c] 'C', is it's grammar context sensitive ?

scott@bbxsda.UUCP (Scott Amspoker) (08/24/90)

In article <1990Aug23.164905.10261@NCoast.ORG> ramsey@NCoast.ORG (Cedric Ramsey) writes:
>I was studying the grammar for the 'C' language and I couldn't 
>help but notice that for declarations the grammar is context sensitive.
>Look at this here:
>
>[declarations deleted]

It is my experience in parsing a language that is similar to C that
the lexical analyser must be "symbol-table-aware".  In C you can
get into a sticky situation with typenames being used in declarations
and casts.  If you are using a yacc-like parser, a typename should be
something other than a generic IDENTIFIER to avoid ambiguities.

-- 
Scott Amspoker
Basis International, Albuquerque, NM
(505) 345-5232
unmvax.cs.unm.edu!bbx!bbxsda!scott

volpe@underdog.crd.ge.com (Christopher R Volpe) (08/24/90)

A couple of points:

Of course it's context sensitive. It's also context free. The set of
context sensitive grammars is a proper superset of the set of
context free grammars. I assume what you wanted to know is whether 
it is context sensitive but NOT context free.

The answer to this is NO. The grammar is indeed ambiguous, but it is
certainly context free, because all the productions have only one
non-terminal on the Left Hand Side. Whether or not a grammar is
context free depends only on the structure of the productions. 

To disambiguate the grammar, typedef_name must be a token returned
by the lexical analyzer. Naturally, this means that the lexical
analyzer, when it sees what looks like an identifier, has to check with
the symbol table to see if it has already been declared as a typedef name
in order to determine what token to return to the parser.


==================
Chris Volpe
G.E. Corporate R&D
volpecr@crd.ge.com

henry@zoo.toronto.edu (Henry Spencer) (08/24/90)

In article <1990Aug23.164905.10261@NCoast.ORG> ramsey@NCoast.ORG (Cedric Ramsey) writes:
>... for declarations the grammar is context sensitive.
>Since the 'typedef-name' is an identifier is impossible to determine that
>it is a type defintion without looking at the context...

Correct.  C's grammar has not been truly context-free, in the usual sense,
since typedefs were added, over a decade ago.  This is Excedrin Headache
Number 3.14159 of writing a C parser.  The standard answer is to have
some sort of ugly feedback path between symbol table and scanner, so
that the scanner can distinguish typedef names from other identifiers
and tell the parser "this one's a typedef name".  You can't easily do
this with a pre-pass because typedefs obey the block-structured scope
rules:  you need nearly a full parse just to classify identifiers.
-- 
Committees do harm merely by existing. | Henry Spencer at U of Toronto Zoology
                       -Freeman Dyson  |  henry@zoo.toronto.edu   utzoo!henry

ramsey@ncoast.UUCP (08/27/90)

	Hello again ! This question is directed towards the 'C' and compiler
gurus out there. I was studying the grammar for the 'C' language and I couldn't 
help but notice that for declarations the grammar is context sensitive.
Look at this here:

declaration: declaration-specifiers init-declarator-list_opt;
declaration-specifiers: 
	storage-class-specifier declaration-specifiers_opt
	type-specifier declaration-specifiers_opt
	type-qualifier declaration-specifiers_opt
type-specifier: one of
	void char short int long float double signed unsigned 
	struct-or-union-specifier enum-specifier typedef-name
typedef-name: IDENTIFIER

Since the 'typedef-name' is an identifier is impossible to determine that
it is a type defintion without looking at the context. I guess that one could
do a pre-scan of the source code and build typedef trees but I thought that
'C' was context free grammar. If you try to build a parser using the above 
grammar, which I did, a declaration tree for following grammar might derive:

sometype  id;

                                  declaration
                                  /         \
                 declaration-specifiers    init-declarator-list
                  /                                         \
         type-specifier                                     NULL
         /            \
typdef-name          type-specifier
     |                          \
IDENTIFIER = 'sometype'   typedef-name
                               |
                          IDENTIFIER = 'id'


Instead of:

                                  declaration
                                  /         \
                 declaration-specifiers    init-declarator-list
                  /                        /                  \
         type-specifier          init-declarator             NULL
         /          \                   |                      
typdef-name        NULL             declarator
     |                              /        \    
IDENTIFIER = 'sometype'        pointer      direct-declarator
                                 |              | 
                                NULL          IDENTIFIER = 'id'


Do you see point? I hope you do. Like I said before the solution to this
problem, I believe, is to do a pre-pass to collect typedef definitions or
just rearrange the grammar some-kind of adhoc way. I think that the prescan
idea is best best it keeps the grammar simple and easy to follow. What do
use guys think ?