[comp.lang.misc] Language Design

wsmith@m.cs.uiuc.edu (03/16/89)

Should the language designers be making work for the language-oriented
editor designers or should the language-oriented editor designers
be making work for the language designers?

I believe it should be the latter.   The language oriented-editor designers
should provide a technology base that may be taken as given by the language
designer who may then design languages without ugly lexical or syntactic 
structures that make it impossible for the compiler to give sensible error 
messages.

(For example, in the language I'm learning now, C++, if you forget to 
declare a class, it is a *syntax* error.  Not misuse of identifier or 
undefined class, *syntax* error.)

(Define language-oriented editor to by the hypertext object-oriented
WYSIWYG formating editor or equivalent that you wish *you'd* written.)

Bill Smith
wsmith@cs.uiuc.edu
uiucdcs!wsmith

gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) (03/17/89)

In article <5200040@m.cs.uiuc.edu>, wsmith@m.cs.uiuc.edu writes:
> 
> 
> (For example, in the language I'm learning now, C++, if you forget to 
> declare a class, it is a *syntax* error.  Not misuse of identifier or 
> undefined class, *syntax* error.)
> 

Don't blame C++.  Blame C.  

C is not context-free parsable [actually I believe it is, but not 
with any sensible grammar].
-- 
Gordon V. Cormack     CS Dept, University of Waterloo, Canada N2L 3G1
gvcormack@waterloo.EDU  gvcormack@uwaterloo.CA  gvcormac@water.BITNET

flaps@dgp.toronto.edu (Alan J Rosenthal) (03/18/89)

In article <9091@claris.com> hearn@claris.com (Bob Hearn) writes:
>Well, the *grammar* defining C programs is certainly context-free!  Just
>look in the back of K&R.

I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm
seeing context-sensitive stuff in the grammar.

Suppose you're looking at a compound-statement and you've just seen the `{'
opening it.  Now suppose the next three tokens are `extern', "x", and `;'.
This is produced by a declaration-list, expanding to declaration, which expands
to decl-specifiers init-declarator-list[opt] ;.  `extern' is an sc-specifier,
and decl-specifiers -> sc-specifier decl_specifiers[opt].  Now is "x" a
declarator (init-declarator-list -> init-declarator -> declarator ->
identifier) or a typedef-name (decl-specifiers -> type-specifier ->
typedef-name -> identifier)?

It's ambiguous unless you use non-context-free information to tell you whether
or not "x" was previously declared via typedef.

ajr

--
"The goto statement has been the focus of much of this controversy."
	    -- Aho & Ullman, Principles of Compiler Design, A-W 1977, page 54.

hearn@claris.com (Bob Hearn) (03/18/89)

>> (For example, in the language I'm learning now, C++, if you forget to 
>> declare a class, it is a *syntax* error.  Not misuse of identifier or 
>> undefined class, *syntax* error.)
>> 
>
>Don't blame C++.  Blame C.  
>
>C is not context-free parsable [actually I believe it is, but not 
>with any sensible grammar].
>-- 
>Gordon V. Cormack     CS Dept, University of Waterloo, Canada N2L 3G1
>gvcormack@waterloo.EDU  gvcormack@uwaterloo.CA  gvcormac@water.BITNET

Well, the *grammar* defining C programs is certainly context-free!  Just
look in the back of K&R.  This doesn't mean that any grammatically correct
program is semantically correct.  I don't know of any non-trivial 
language with a context-free grammar for semantically correct programs.  (I
may be wrong here, but certainly this is correct for most languages.)
Actually, I've always thought parts of the grammar (e.g. type declarations)
were hacked to facilitate easy parsing.
Many compilers, however, do consider some semantic errors syntax errors,
because it's easier that way.  As for the C++ syntax error described above,
that's compiler-dependent.  MPW C++ tells me that the identifier is not
a valid type name.

Bob Hearn
hearn@claris.com

steve@oakhill.UUCP (steve) (03/21/89)

In article <12443@watdragon.waterloo.edu>, gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) writes:
> In article <5200040@m.cs.uiuc.edu>, wsmith@m.cs.uiuc.edu writes:
> > 
> > 
> > (For example, in the language I'm learning now, C++, if you forget to 
> > declare a class, it is a *syntax* error.  Not misuse of identifier or 
> > undefined class, *syntax* error.)
> > 
> 
> Don't blame C++.  Blame C.  
> 
> C is not context-free parsable [actually I believe it is, but not 
> with any sensible grammar].
> -- 
> Gordon V. Cormack     CS Dept, University of Waterloo, Canada N2L 3G1
> gvcormack@waterloo.EDU  gvcormack@uwaterloo.CA  gvcormac@water.BITNET

Don't blame C either.  The real cause of this error is a conspiracy
between the compiler writer and YACC.  YACC produces this message
for every syntax error unless you specifically design your grammer
to catch a specific error and output a different message.  The writer
of your compiler probably used YACC, and then didn't design his
grammer to do this in this case.  He is probably not too much to blame
though.  To design a languge to catch every possible generic grammerical
error would make the compiler so slow you might as well code in assembly.

                   enough from this mooncalf - Steven
----------------------------------------------------------------------------
To flame someone for bad spelling or grammer is a discouragement to net
discussion in a timely fashion.
----------------------------------------------------------------------------
These opinions aren't necessarily Motorola's or Remora's - but I'd like to
think we share some common views.
----------------------------------------------------------------------------
Steven R Weintraub                        cs.utexas.edu!oakhill!devsys!steve
Motorola Inc.  Austin, Texas 
(512) 440-3023 (office) (512) 453-6953 (home)
----------------------------------------------------------------------------

hearn@claris.com (Bob Hearn) (03/21/89)

>----------------------------------------------------------------------------
>To flame someone for bad spelling or grammer is a discouragement to net
>discussion in a timely fashion.
>----------------------------------------------------------------------------

Uh, that's 'grammar'. :-)

Bob Hearn
hearn@claris.com

hearn@claris.com (Bob Hearn) (03/21/89)

In article <8903172225.AA05088@explorer.dgp.toronto.edu> flaps@dgp.toronto.edu (Alan J Rosenthal) writes:
>
>In article <9091@claris.com> hearn@claris.com (Bob Hearn) writes:
>>Well, the *grammar* defining C programs is certainly context-free!  Just
>>look in the back of K&R.
>
>I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm
>seeing context-sensitive stuff in the grammar.
>
>Suppose you're looking at a compound-statement and you've just seen the `{'
>opening it.  Now suppose the next three tokens are `extern', "x", and `;'.
>This is produced by a declaration-list, expanding to declaration, which expands
>to decl-specifiers init-declarator-list[opt] ;.  `extern' is an sc-specifier,
>and decl-specifiers -> sc-specifier decl_specifiers[opt].  Now is "x" a
>declarator (init-declarator-list -> init-declarator -> declarator ->
>identifier) or a typedef-name (decl-specifiers -> type-specifier ->
>typedef-name -> identifier)?
>
>It's ambiguous unless you use non-context-free information to tell you whether
>or not "x" was previously declared via typedef.
>
>ajr
>

Enough of this bullshit!  How many times do I have to say it???  The grammar
in K&R is context-free.  Do you know what a context-free grammar *is*?  You're
confusing the *class* of the grammar (ever heard of the Chomsky hierarchy?)
with what categories of parsers will work on it.  You're also confusing
*syntactic correctness* with *semantic correctness*.  This is a little more
acceptable, since, as I was saying, most compilers do to make parsing
easier.

Most C compilers use LALR(1) parsers (stands for "look-ahead left to right,
order 1), which are a subcategory of shift-reduce parsers.  The 1 means they
can look at the next 1 token on the input stream when deciding whether to
shift or reduce.  And you are quite correct in saying that, in this case,
that one token is not enough; you *do* need symbol table information.  But
(1) No one says all context-free grammars have to be LALR(1) parsable
without context information, (2) no one says that the grammar in the back of
K&R is necessarily the one that's used in practice (grammars are often
tweaked (without modifying the language generated!) no make them easier to
parse), and (3) just because your compiler gives you a syntax error when
you do something like that, that doesn't necessarily mean your program
is syntactically incorrect.  It just means that treating that kind of
semantic error as a syntax error makes the language easier to parse.

Now, before anyone else wants to tell me that K&R's grammar for C is not
context-free, will you *please* go and look up the definition of context-free
grammar???  Hint: anything that can be expressed in BNF is context-free!!

Bob Hearn
hearn@claris.com

djones@megatest.UUCP (Dave Jones) (03/21/89)

From article <9122@claris.com>, by hearn@claris.com (Bob Hearn):
...
> 
> Now, before anyone else wants to tell me that K&R's grammar for C is not
> context-free, will you *please* go and look up the definition of context-free
> grammar???  Hint: anything that can be expressed in BNF is context-free!!
> 
> Bob Hearn
> hearn@claris.com
>  

Bob, Bob, Bob.

Chill out, Dude. You're going to blow a brain vain if you're not
careful.

Now then.

The term "context-free", as Chomsky defined it is not very useful
to us computer types, because no one ever, ever writes grammars any
other way.  To him, "context-free" meant that a rule could only
have one symbol on the left-hand side. "Context-sensitive" meant that
production rules could have strings of symbols on the left:

       a X b <= a x y b

In the above, the a and b form a "context" for the reduction
X <= xy.

I haven't been following the discussion, but I suspect that the
correspondents were not alluding to the unuseful Chomsky definitions,
but instead were using the terms quite literally: "context-free" meaning
"context-free".  I think that is a perfectly reasonable thing to
do, especially if one is unfamiliar with the confusing Chomsky
definition.  (I never did like the practice of "defining" something
that already had a meaning in normal English.)

The grammar as published in K&R (both additions) requires
that the lexical analyzer must use left-context to determine whether
or not a token is recognized as a type-name identifier or a regular
identifier.  In that sense, it is not "context-free".

I seem to remember that I once figured out that it is possible to
modify the grammar so that the scanner and parser do not have to know
about left-context.  But I think I remember that such a parser would
require a two token-lookahead, making it impossible to use with most
compiler-compilers.

norman@a.cs.okstate.edu (Norman Graham) (03/22/89)

From article <8903172225.AA05088@explorer.dgp.toronto.edu>, by flaps@dgp.toronto.edu (Alan J Rosenthal):
> 
> In article <9091@claris.com> hearn@claris.com (Bob Hearn) writes:
>>Well, the *grammar* defining C programs is certainly context-free!  Just
>>look in the back of K&R.
> 
> I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm
> seeing context-sensitive stuff in the grammar.
> 
> [Stuff deleted about ambiguous derivation sequence.]
> 
> It's ambiguous unless you use non-context-free information to tell you whether
> or not "x" was previously declared via typedef.

I have to agree with Bob on this one, but for a different reason.
The grammar in the back of K&R certainly is context-free... but it
is also ambiguous.  To paraphrase Aho & Ullman's "The Theory of 
Parsing, Translation, and Compiling, Volume I: Parsing", page 91:

    A grammar is Context-free if every production has a single
    nonterminal on the left and a sequence of 0 or more terminals
    or nonterminals on the right.

It is obvious that the grammars in both editions of K&R fit this 
description.  Thus they are context-free.

Now on to ambiguity. According to Aho & Ullman (same as above), pagee 143: 

    We say that a CFG G is ambiguous if there is at least one sentence
    w in L(G) for which there is more than one distinct derivation tree
    with frontier w.  This is equivalent to saying that G is ambiguous
    if there is a sentence w in L(G) with two or more distinct leftmost
    (or rightmost) derivations.

The classic example of ambiguity is the "dangling else" problem, as
illustrated by the grammars in both editions of K&R.

So we see that K&R's grammars are ambigous context-free grammars. That's
no big deal--the ambiguity doesn't affect the language generated by the
grammar.  The problem comes in when compilers base the semantic meaning
of a sentence on the derivation sequence the grammar uses to generate
that sentence.  For some reason :-) compiler writers like to choose
deterministically between competing derivation sequences.  Thus they
either statically choose the proper derivation sequence (eg. the "else"
always matches the last else-less "if") or they use run-time information
to help them choose the "proper" derivation sequence.

-- 
Norman Graham                            Oklahoma State University
  Internet:  norman@a.cs.okstate.edu     Computing and Information Sciences
      UUCP:  {cbosgd, ihnp4,             219 Mathematical Sciences Building
              rutgers}!okstate!norman    Stillwater, OK  74078-0599

hearn@claris.com (Bob Hearn) (03/23/89)

In article <2970@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
>The term "context-free", as Chomsky defined it is not very useful
>to us computer types, because no one ever, ever writes grammars any
>other way.  To him, "context-free" meant that a rule could only
>have one symbol on the left-hand side. "Context-sensitive" meant that
>production rules could have strings of symbols on the left:
>
>       a X b <= a x y b
>
>In the above, the a and b form a "context" for the reduction
>X <= xy.

Precisely!  But, gee... I've considered myself a computer type for, oh, 12
years or so, and it's useful to me.  How you write grammars depends on
what you're going to use the grammars for.  I don't think anyone could
write a compiler, for example, without being at least familiar with 
terms of formal language theory like "context-free" and "context-sensitive."

>
>I haven't been following the discussion, but I suspect that the
>correspondents were not alluding to the unuseful Chomsky definitions,
>but instead were using the terms quite literally: "context-free" meaning
>"context-free".  I think that is a perfectly reasonable thing to
>do, especially if one is unfamiliar with the confusing Chomsky
>definition.  (I never did like the practice of "defining" something
>that already had a meaning in normal English.)

If you *had* been following the discussion, you'd have seen that my original
posting was a followup to someone complaining about how screwed-up C's
grammar was, that it wasn't context-free.  When someone uses the words
"context-free" and "grammar" in the same sentence, I assume he knows what
he's talking about, and what he said was incorrect... K&R's grammar for C
*is* CF, and not particularly screwed-up.  I've received no end of mail
and flames for that simple correction!  As for "context-free" already
having a meaning in normal English, I disagree.  One could say that
you need context to parse C, but if you use the phrase "context-free" in
the realm of computer science, it's because you've heard it before, and
you should find out what it means before using it.

Bob Hearn
hearn@claris.com

P.S. I'll chill when I stop getting flamed.

hearn@claris.com (Bob Hearn) (03/23/89)

In article <12606@watdragon.waterloo.edu> gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) writes:
>
>A context-free grammar G is a 4-tuple  G = <N,T,P,S> where N is a finite
>set of nonterminal symbols, T (the alphabet) is a finite set of terminal 
>symbols, P is a finite set of production rules of the form  A -> X1 X2 X3 ...
>where A is in N, and X1, X2, X3 are in Union(N,T).  S in N is the Start symbol.
>
>The K&R grammar does not have a finite alphabet. 
>-- 
>Gordon V. Cormack     CS Dept, University of Waterloo, Canada N2L 3G1
>gvcormack@waterloo.EDU  gvcormack@uwaterloo.CA  gvcormac@water.BITNET

Oh.  That's nice to know.  From Scott, who is looking over my shoulder:
"Why are even bothering to argue with these idiots?"
Good question.

Bob Hearn
hearn@claris.com

djones@megatest.UUCP (Dave Jones) (03/23/89)

From article <9140@claris.com>, by hearn@claris.com (Bob Hearn):
> In article <2970@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:

> 
> If you *had* been following the discussion, you'd have seen that my
> original posting was a followup to someone complaining about how
> screwed-up C's grammar was, that it wasn't context-free.  When someone
> uses the words "context-free" and "grammar" in the same sentence, I
> assume he knows what he's talking about, ...

What a silly assumption.  I almost never assume that people who post
to the net know what they are talking about.  But what you really
mean, I think, is that you want to use the term "context-free" exactly
the way Chomsky did.  Okay, that's your right.  But don't be surprised
or offended if someone doesn't know it.  Just explain.

I taught two terms of formal language theory during a two year stint 
as Visiting Associate Prof at Ohio University.  I'm sure I brought the
subject up, but only to try to avoid the kind of discussion that is
going on here. We never talked about (Chomsky) context-sensitive
grammars again.  I suspect that for most students, that small piece of
information got lost along the way.

Context-sensitive grammars, (in the Chomsky sense), are just not useful.
I really don't know why people continue to use his definitions.

I just now pulled a new compiler book off the shelf, completely at random.
It does pay lip-service to the Chomsky definitions. But  of its 811 pages,
it devotes less than one quarter of page 97 to context-sensitive
grammars!  First it gives a very brief definition, then the following:

      While context-sensitive grammars and type-0 grammars are more
      powerful than context-free grammars, they are also far less
      useful. The problem is that efficient parsers for these extended
      grammar classes do not exist, and without a parser there is no
      way to use a grammar definition to drive a compiler.

      [ Two sentences about parsers for context-free grammars omitted... ]

      Whenever we mention a grammar (without saying which kind), the
      grammar will be assumed to be context-free.

> [...]
> I've received no end of mail and flames for that simple correction!

Take my advice. Quit worrying about the flames you receive. Flamers
are small minded bigots, pedants, and idiots. Just ignore them.
(Or better yet, *apologize* sincerely.  God, that irritates them!)

                        Dave J.

P.S. Spring has sprung.  I probably won't be around the net much
in the next few weeks.

holt@turing.toronto.edu (Ric Holt) (03/23/89)

#>>A context-free grammar G is a 4-tuple  G = <N,T,P,S> where N is a finite
#>>set of nonterminal symbols, T (the alphabet) is a finite set of terminal 
#>>symbols, P is a finite set of production rules of the form  A -> X1 X2 X3 ...
#>>where A is in N, and X1, X2, X3 are in Union(N,T).  S in N is the Start symbol.
#>"Why are even bothering to argue with these idiots?"
#>Good question.
#>
#>Bob Hearn
#>hearn@claris.com

In my modest opinion, Bob Hearn is doing a fine job of entertaining us
with his emotional outbursts, but a considerable disservice with his
claims.  Likely, Cormack is correct and Hearn is wrong.  Likely
C's grammar is not Context Free.

I believe Cormack (in his academic way) has proven Hearn wrong.  I believe
C's grammar has the appearance of a context free grammar but that it
violates the mathematical def of context free.

This question (is C's grammar context free) is 
important and we should get a clear answer to it rather than outbursts.
Could someone please settle this definitely?
		Ric Holt

ge@phoibos.UUCP (Ge Weijers) (03/23/89)

From article <9122@claris.com>, by hearn@claris.com (Bob Hearn):
>>I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm
>>seeing context-sensitive stuff in the grammar.
> Enough of this bullshit!  How many times do I have to say it???  The grammar
> in K&R is context-free.  Do you know what a context-free grammar *is*?

Yeah, here is my regular C grammar (a regular expression, really):

c	:	
	|	letter c
	|	digit c
	|	/* etc. */
	;

Okay, it matches a lot more than C, but all legal C programs are acceptable.
So are all programs in all other programming languages. Useful!
The grammar in K&R is in the same category. It accepts all legal C programs,
plus a lot more. Because of current compiler technology we use an artificial
separation between syntax and static semantics. Static semantics can be
expressed in more powerful grammar types (2-level van Wijngaarden grammars etc.)
Your note is correct, but so is the observation that some context-sensitive
features are suggested by using different names for the same terminal, nl.
identifier. K&R use non-terminal names judiciously to suggest more than they
are really telling.
C can not be described exactly by a CFG, so C is not a context-free language.
Usually this type of confusion starts because one party is talking about
a grammar, whilst the other party is really talking about (an exact grammar of)
the language.

Ge' Weijers, KUN Nijmegen, the Netherlands
ge@cs.kun.nl

hearn@claris.com (Bob Hearn) (03/24/89)

In article <89Mar23.094504est.4328@turing.toronto.edu> holt@turing.toronto.edu (Ric Holt) writes:
>In my modest opinion, Bob Hearn is doing a fine job of entertaining us
>with his emotional outbursts, but a considerable disservice with his
>claims.  Likely, Cormack is correct and Hearn is wrong.  Likely
>C's grammar is not Context Free.

OK, OK, I'm sorry.  I'm new to the net, and I wasn't expecting to get flamed
*quite* so much for one simple correction... it just became very annoying.
I'll try to restrict my responses to computer science from now on.

>I believe Cormack (in his academic way) has proven Hearn wrong.  I believe
>C's grammar has the appearance of a context free grammar but that it
>violates the mathematical def of context free.
>
>This question (is C's grammar context free) is 
>important and we should get a clear answer to it rather than outbursts.
>Could someone please settle this definitely?
>		Ric Holt

Good idea.  Although I think this whole argument is founded on
misunderstandings.  It all depends on what you mean by "context-free
grammar."  If you follow the technical definition, then there is no
question that C's grammar (as specified in K&R) is context-free.  As
for Cormack's claim that C does not have a finite set of terminals, I
can't say that I really understand what he means.  Perhaps he is 
referring to the fact that there is an infinite number of identifiers?
Well, if we restrict ourselves to K&R, then the terminal in question is
simply (identifier), and it does not decompose further.  If we want
to make this grammar more concrete and have it generate the actual
language that a compiler accepts, then we would have identifiers producing
strings over the char. set.  In this case, the char. set is the set
of terminals, and it is certainly finite.

BTW, the real argument here is not whether C's grammar satisfies some
mathematical definition.  It does, but it is not a particularly relevant
piece of information.  The only reason I claimed it *did* was that
someone, a long time ago, made some comment on how screwed up C's grammar
was, and that it wasn't context-free.  Well, as I don't think C's grammar
is particularly screwed up relative toother languages' grammars, I
responded.  I happen to like C, and I don't like C-bashing.  Guess I
learned my lesson, huh?

Bob Hearn
hearn@claris.com

stevev@tekchips.LABS.TEK.COM (Steve Vegdahl) (03/24/89)

In article <9122@claris.com>, hearn@claris.com (Bob Hearn) writes:
> Enough of this bullshit!  How many times do I have to say it???  The grammar
> in K&R is context-free.
 ...
> Now, before anyone else wants to tell me that K&R's grammar for C is not
> context-free, will you *please* go and look up the definition of context-free
> grammar???  Hint: anything that can be expressed in BNF is context-free!!

I detect that we are picking nits here.  I think that we all know what is
meant by the claim that the K&R grammar is not context-free.  It has been
noted that the K&R grammar is *ambiguous*; hence, its specification of
how programs should be parsed is *ill-defined*.  Additional
"context-sensitive" specifications are added to the K&R grammar in the form
of English prose (in the second edition, anyway; the first edition
seems to ignore the ambiguity problem).

Effectively, the grammar in K&R is "BNF + English prose".  Taking this
view, the K&R grammar is not context-free.

		Steve Vegdahl
		Computer Research Lab
		Tektronix Labs
		Beaverton, Oregon

gudeman@arizona.edu (David Gudeman) (03/24/89)

In article  <9165@claris.com> hearn@claris.com (Bob Hearn) writes:
>Good idea.  Although I think this whole argument is founded on
>misunderstandings.  It all depends on what you mean by "context-free
>grammar."  If you follow the technical definition, then there is no
>question that C's grammar (as specified in K&R) is context-free.

I agree, that this is founded on misunderstandings, but I think you've
got the wrong misunderstanding.  I believe that C's grammar is _not_,
in fact context free, though the grammar in K&R is.  How can this be?
I'm glad you asked.  The grammar given in K&R isn't technically
correct.  Don't take offense, I'm not making a judgement on this
practice, just pointing it out.

One inaccuracy is in the production

	typedef-name : identifier

I don't know if there are others.  "typedef-name" occurs in the production

	type-specifier: ... | typedef-name

and a mechanical reading of this implies that _any_ identifier can
occur as a type-specifier.  This is not the case.  In fact, an
identifier can only be a type-specifier if it has previously been
declared with a typedef.  I can't prove it, but I believe you need a
context-sensitive grammar to specify this.

So both claims made in this argument are correct:

(1) the grammar of C is not context free
and
(2) the grammar given in K&R is context free

hearn@claris.com (Bob Hearn) (03/24/89)

In article <9861@megaron.arizona.edu> gudeman@arizona.edu (David Gudeman) writes:
>I agree, that this is founded on misunderstandings, but I think you've
>got the wrong misunderstanding.  I believe that C's grammar is _not_,
>in fact context free, though the grammar in K&R is.  How can this be?
>I'm glad you asked.  The grammar given in K&R isn't technically
>correct.  Don't take offense, I'm not making a judgement on this
>practice, just pointing it out.
>
>One inaccuracy is in the production
>
>	typedef-name : identifier
>
>I don't know if there are others.  "typedef-name" occurs in the production
>
>	type-specifier: ... | typedef-name
>
>and a mechanical reading of this implies that _any_ identifier can
>occur as a type-specifier.  This is not the case.  In fact, an
>identifier can only be a type-specifier if it has previously been
>declared with a typedef.  I can't prove it, but I believe you need a
>context-sensitive grammar to specify this.
>

Yes, this is another big misunderstanding in this discussion... I would
say that the *grammar* stipulates that a type-specifier can be an
identifier, but the *semantics* specifies that this results in a valid
program only when the identifier is a valid type name.  This is really
the same problem as

{
    .
    .
    x = 1;
    . 
    .
}

which is correct only if x has been previously declared.  I think here
one would always call this an undeclared identifier error, a type of
semantic error, rather than a syntax error.  And you're quite right that
it takes a context-sensitive grammar to generate the set of VALID C programs
in this sense!  My point is that this is true for just about any language
you can name, and there is nothing particularly odd about this aspect
of C.

In any case, this discussion is getting old... I vote we talk about something
new.  This is comp.lang.misc, not comp.lang.C.grammar, right? :-)
So, anyone designed any good languages lately?

Bob Hearn
hearn@claris.com

gudeman@arizona.edu (David Gudeman) (03/24/89)

In article  <9170@claris.com> hearn@claris.com (Bob Hearn) writes:
]In article <9861@megaron.arizona.edu> gudeman@arizona.edu (David Gudeman) writes:
]>
]>One inaccuracy is in the production
]>
]>	typedef-name : identifier
]>
]>I don't know if there are others.  "typedef-name" occurs in the production
]>
]>	type-specifier: ... | typedef-name
]>
]>and a mechanical reading of this implies that _any_ identifier can
]>occur as a type-specifier.  This is not the case.  In fact, an
]>identifier can only be a type-specifier if it has previously been
]>declared with a typedef.
]
]Yes, this is another big misunderstanding in this discussion... I would
]say that the *grammar* stipulates that a type-specifier can be an
]identifier, but the *semantics* specifies that this results in a valid
]program only when the identifier is a valid type name.  This is really
]the same problem as  [undeclared identifier]

It sounds as though you are defining the semantics of a language as
any part that cannot be described with a context free grammar.  Of
course this makes you trivially correct, but does not affect the
opinions of people who criticize C based on a different definition of
syntax and semantics.  Also, you cannot treat the C grammar as a
formal grammar that is only meant to accept strings from a formal
language.  The grammar is intended to show syntactic categories with
semantic attachments.

The problem of undeclared identifiers is common to many programming
languages, but the typedef problem is more serious.  An identifier is
always parsed into category "identifier" regardless of whether it has
been declared.  However, typedefs are ambigous.  For example:

typedef int foo;
int i;

void f()
{ foo *i;
  ...
}

The K&R grammar is ambigous on how to parse this, while any C
programmer would expect "i" to be a local pointer to an int.  Once an
identifier has been declared as a type, it should no longer be an
identifier, but rather have a different syntactic class.  No, this
does not affect the language accepted by the grammar.  But it does
affect the syntactic category of the whole string "foo *i;".

Because of the differences in syntactic categories, most people would
say that

static foo x;

should be a syntax error if foo has not been typedefed.  You can argue
that, but like I said before, you are only defining terms to fit your
views.

>In any case, this discussion is getting old... I vote we talk about something
>new.  This is comp.lang.misc, not comp.lang.C.grammar, right? :-)

The following should not be taken as a flame, I'm just making an
observation: If you want to stop the discussion, you don't have to
reply.  It doesn't work to write an article and then suggest that no
one respond to it after you have gotten in the last word.

norvell@csri.toronto.edu (Theodore Stevens Norvell) (03/25/89)

In article <12606@watdragon.waterloo.edu> gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) writes:
>A context-free grammar G is a 4-tuple  G = <N,T,P,S> where N is a finite
>set of nonterminal symbols, T (the alphabet) is a finite set of terminal 
>symbols, P is a finite set of production rules of the form  A -> X1 X2 X3 ...
>where A is in N, and X1, X2, X3 are in Union(N,T).  S in N is the Start symbol.
>
>The K&R grammar does not have a finite alphabet. 

I'm sure that Gordon Cormack has something in mind here because I know
he's a smart guy, but I'm darned if I can figure out what it is.  In
any case, I'm sure that the K&R grammar only uses a finite subset of its
alphabet, because otherwise it wouldn't fit in my desk.

So Gordon, if your still there after being insulted in such an uncalled for
way, please enlighten us.

As I recall, the grammar in K&R [1978] has a single terminal "identifier"
which includes all identifiers, a nonterminal called "typedef_name"
defined by the production
	typedef_name --> identifier
This leads to an ambiguity because phrases like
	t * i ;
can be understood to be either declarations or statements. But ambiguous
grammars have been with us a long time.  People even use parser generators
that accept ambiguous grammars! In fact there are other ambiguities
in the grammar (e.g. the dangling else).  Ambiguities don't change the
set of strings admitted by a grammar they merely admit some strings in
more than one way.  

The real question is what do you want to call syntax and what do you
want to call static semantics.  In the absence of any definitive definition,
this is a matter of taste and convenience to the implementor.  Is
the following (complete) compilation unit syntactically correct?
	f() { t * i ; int a; }
The grammar says yes; most compliers say "syntax error".  If we
change the grammar to make typedef_name a terminal and remove its
only production, then the grammar agrees with the compilers.
The grammar is of course still context free, but the ambiguity is
gone.

Ric Holt raises the separate question of whether the C language is
context free.  The syntax of C certainly is.  The problem with
C is that it is not "feedback free" in the following sense. The
parsing of C depends on the lexing of C (this is only natural),
but the lexing depends on the parsing.  Of course, with the original
K&R grammar, C _is_ feedback free which is no doubt why Ritchie
wrote the grammar that way.

Summary:
-The K&R [1978] grammar is context free, but describes the wrong language.
-The C language syntax (abstracted from lexing and static semantics)
 is also context free.
-The usual implementations and most useful definitions of C language
 are not "feedback free".
-We're still in the dark about infinite alphabets.

Theo Norvell

(The term "feedback free" is used in a completely different sense by Holt.
My apologies for stealing it.  By the time I realized I was using it
differently, I liked it too much to give it back.  In Holt's sense, C is
feedback free. See the Turing Programming Language Design and Definition
for his definition.  In fact, see it anyway, if you're interested in
design or definition.)

norman@a.cs.okstate.edu (Norman Graham) (03/25/89)

From article <9861@megaron.arizona.edu>, by gudeman@arizona.edu (David Gudeman):
> [stuff deleted about how the grammar specifies that
>  any identifier can serve as a type-specifier]
> 
> So both claims made in this argument are correct:
> 
> (1) the grammar of C is not context free
> and
> (2) the grammar given in K&R is context free

This is an interesting point (ie. one I hadn't thought of before :-).
In theory, a grammar is a generator for strings in a language, not
a recognizer for a language (eg. various forms of automata).  So
the grammar in K&R is context-free, but it doesn't generate the C
language--in addition to valid C programs it also generates invalid
C programs. Of course, when this grammar is used as the basis of
a compiler, the semantic analyzer should detect these invalid
programs. This makes sense to me... how about anyone else out there?


-- 
Norman Graham                            Oklahoma State University
  Internet:  norman@a.cs.okstate.edu     Computing and Information Sciences
      UUCP:  {cbosgd, ihnp4,             219 Mathematical Sciences Building
              rutgers}!okstate!norman    Stillwater, OK  74078-0599

pa1162@sdcc15.ucsd.edu (pa1162) (03/27/89)

Bob,
I'm only a student, so don't think I have pretensions to telling you folks
you're business, but the grammar in K&R probably isn't even the grammar
for C... as I think you or someone else mentioned... it gets tweaked
a bit if you want to try to write a compiler from it... so worry about
whether it is context-free is immaterial.  Why don't we all open our
K&R (old edition) to page 214 and read the note at the heading of section 18
"Syntax Summary".  It says that the syntax summary given isn't
meant as an exact statement of the language.  This isn't meant to
be a flame... it's an attempt at a distinction because I think you and
whoever are arguing at cross-purposes.  The discussion is either
about context-freeness and grammars or about non-precise syntax summaries
and it's really not clear which one.  As I said I am yet a student...
but I can see the structure of an argument when it hits me in the face.
I recommend somone clarify the and separate the matters at hand so that
all this "stuff" can be separated.
-Thomas Kammeyer (A student who likes to stick his nose into discussions)

john@frog.UUCP (John Woods) (03/27/89)

In article <9165@claris.com>, hearn@claris.com (Bob Hearn) writes:
> >C's grammar is not Context Free.
> grammar."  If you follow the technical definition, then there is no
> question that C's grammar (as specified in K&R) is context-free.

I think the crux of the argument of the non-context-free crowd is that
the grammar in K&R does NOT actually specify the C language by itself, not
as a grammar.  It relies on the trick of telling the lexer about typedefs,
which is a form of back-door context.

Consider:  if you wrote a YACC specification of C according to the context-free
grammar, but omitted any kind of action clauses (perhaps you want to turn on
debugging and watch the state machine play, a fine entertainment for a rainy
day), it would NOT be able to correctly parse a C program with typedefs (unless,
of course, you hid a "real" C grammar (with actions and the ever present
back-door gimmick) in the lexer itself).

When the pro-context-free crowd's argument is presented as "So what; it is
close enough, and it works", the argument in convincing.  When it is presented
as "Yeah, well, your mother wears army boots!", it isn't.
-- 
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

			Remainder Khomeini!

clive@ixi.UUCP (Clive) (03/29/89)

In article <8903250115.AA08564@yorkville.csri.toronto.edu>, norvell@csri.toronto.edu (Theodore Stevens Norvell) writes:
> The real question is what do you want to call syntax and what do you
> want to call static semantics.  In the absence of any definitive definition,
> this is a matter of taste and convenience to the implementor.

For a new approach to this, look at:

"The Revised Report on the Algorithmic Language Algol 68".
- van Wijngaarden et al

This approached this problem by defining the grammar using another (context-free)
grammar (hence the term 'two-level grammar'). This meant that things like not using
a variable before it was defined were specified in the *grammar* - very nifty.
To do this, the second level grammar had a non-terminal called NEST which expanded
to sequences like "int a char b" and so on. Productions of the first level grammar
had NESTs scattered in them (translating to C):

    TYPE NEST variable : TYPE NEST IDENTIFIER, where TYPE IDENTIFIER declared in NEST .

(comma means "and"). The production

    where TYPE IDENTIFIER declared in NEST

expands to "empty" if the variable was declared, and to a nonterminal with no
production rule if it isn't - thus the grammar can't generate a program with an
undeclared variable. (Algol 68 had user-definable types, like C).

They did all kinds of things with this - type checking of assignments and operator
arguments, and context sensitive automatic type conversion come to mind.

Pity it was so hard to parse :-)
-- 
Clive D.W. Feather           clive@ixi.uucp
IXI Limited                  ...!mcvax!ukc!acorn!ixi!clive (untested)
                             +44 223 462 131

norman@a.cs.okstate.edu (Norman Graham) (04/01/89)

From article <125@ixi.UUCP>, by clive@ixi.UUCP (Clive):
> This approached this problem by defining the grammar using another (context-free)
> grammar (hence the term 'two-level grammar'). This meant that things like not using
> a variable before it was defined were specified in the *grammar* - very nifty.


Of course these two-level grammars are *substantially* more powerful
than context-free grammars.  Two-level grammars are capable of generating
recursively-enumerable sets (ie. those things that can be recognized only
by something with the power of a Turing machine).  This class of languages
is much larger than both the Context-Free and Context-Sensitive languages.

However, Two-level grammars can also be used to generate the Context-Free
and Context-Sensitive languages that we use for programming.  Two-level
grammars provide a compact and elegant (some would say 'cryptic' :-)
notation for the description of programming language syntax without
the need to escape to a natural language.  Personally, I would love
to see more language designers take advantage of this tool.


-- 
Norman Graham                            Oklahoma State University
  Internet:  norman@a.cs.okstate.edu     Computing and Information Sciences
      UUCP:  {cbosgd, rutgers}           219 Mathematical Sciences Building
              !okstate!norman            Stillwater, OK  74078-0599

chris@mimsy.UUCP (Chris Torek) (04/03/89)

In article <1163@frog.UUCP> john@frog.UUCP (John Woods) writes:
>I think the crux of the argument of the non-context-free crowd is that
>the grammar in K&R does NOT actually specify the C language by itself, not
>as a grammar.  It relies on the trick of telling the lexer about typedefs,
>which is a form of back-door context.

For practical parsing (using, e.g., yacc), yes; for CF-ness, no.  The
*real* crux of the argument is that while you can write a decent, simple,
no-feedback-through-lexer CFG that accepts all C programs, you can *not*
write a decent, simple, no-feedback LALR(1) parser to do this, because
the simple CFG is ambiguous.

Those who say `C does not have a CFG' are really saying `C does not
have a trivially yacc-able grammar', which is true (but, as I keep
saying, really means only that we need a better parser generator than
yacc).

(And no, it does not take a nondeterministic automaton to parse C.  The
little back-door trick would work wonderfully well if only you could
know, each time a yacc parser examines a token, which rule yacc was
attempting.  When it tries `typedef-name: identifier', the lexer should
answer `identifier' if and only if the identifier happens to be a
typedef.  Then make sure that yacc tries this rule before trying
`primary: identifier' [this part is easy] and voila!  you have a
parser.  Of course, yacc actually only calls the parser once for both
rules; you would need a way to change the token on the fly.  A better
gimmick---less expensive in compute power, at least---would be to
associate a function (predicate) with each rule, and allow the
predicate to abort the rule.  You then attach an `identifier-really-is-
typedef' predicate to the `typedef: identifier' rule, making the parser
pass over this resolution and choose instead the `primary: identifier'
rule.  Indeed, the same trick would allow one to reject undeclared
identifiers, making `program foo; begin bar := 1 end.' a syntax error
instead of a semantic one---not that this is a good thing to do!)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (04/04/89)

In article <16674@mimsy.UUCP> I wrote:
>(And no, it does not take a nondeterministic automaton to parse C. ...
[gimmick A deleted; gimmick B:]
>associate a function (predicate) with each rule, and allow the
>predicate to abort the rule.

gvcormack points out (as I conveniently forgot to mention) that this
is actually an affix grammar, and affix grammars are not context free.

I think this is a good point for a summary.

	- There is no context free grammar that recognises only valid
	  C programs (but this is true of many languages).

	- There is a trivial context free grammar that recognises all
	  valid C programs, along with a number of invalid ones, but it
	  is ambiguous.  There is probably no unambiguous CFG for C
	  (consider arbitrarily many parentheses around an id: is it a
	  cast or a value?).

	- There is a trivial affix grammar (gimmick B above) that
	  recognises all valid C programs, along with a number of
	  invalid ones, which is not ambiguous and can be handled with
	  a modified yacc (you would have to augment yacc's output
	  tables and parser to retain both reductions and to know what
	  to call to select which is to occur).

and of course,

	- There is always the wrong way. :-)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris