wsmith@m.cs.uiuc.edu (03/16/89)
Should the language designers be making work for the language-oriented editor designers or should the language-oriented editor designers be making work for the language designers? I believe it should be the latter. The language oriented-editor designers should provide a technology base that may be taken as given by the language designer who may then design languages without ugly lexical or syntactic structures that make it impossible for the compiler to give sensible error messages. (For example, in the language I'm learning now, C++, if you forget to declare a class, it is a *syntax* error. Not misuse of identifier or undefined class, *syntax* error.) (Define language-oriented editor to by the hypertext object-oriented WYSIWYG formating editor or equivalent that you wish *you'd* written.) Bill Smith wsmith@cs.uiuc.edu uiucdcs!wsmith
gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) (03/17/89)
In article <5200040@m.cs.uiuc.edu>, wsmith@m.cs.uiuc.edu writes: > > > (For example, in the language I'm learning now, C++, if you forget to > declare a class, it is a *syntax* error. Not misuse of identifier or > undefined class, *syntax* error.) > Don't blame C++. Blame C. C is not context-free parsable [actually I believe it is, but not with any sensible grammar]. -- Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1 gvcormack@waterloo.EDU gvcormack@uwaterloo.CA gvcormac@water.BITNET
flaps@dgp.toronto.edu (Alan J Rosenthal) (03/18/89)
In article <9091@claris.com> hearn@claris.com (Bob Hearn) writes: >Well, the *grammar* defining C programs is certainly context-free! Just >look in the back of K&R. I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm seeing context-sensitive stuff in the grammar. Suppose you're looking at a compound-statement and you've just seen the `{' opening it. Now suppose the next three tokens are `extern', "x", and `;'. This is produced by a declaration-list, expanding to declaration, which expands to decl-specifiers init-declarator-list[opt] ;. `extern' is an sc-specifier, and decl-specifiers -> sc-specifier decl_specifiers[opt]. Now is "x" a declarator (init-declarator-list -> init-declarator -> declarator -> identifier) or a typedef-name (decl-specifiers -> type-specifier -> typedef-name -> identifier)? It's ambiguous unless you use non-context-free information to tell you whether or not "x" was previously declared via typedef. ajr -- "The goto statement has been the focus of much of this controversy." -- Aho & Ullman, Principles of Compiler Design, A-W 1977, page 54.
hearn@claris.com (Bob Hearn) (03/18/89)
>> (For example, in the language I'm learning now, C++, if you forget to >> declare a class, it is a *syntax* error. Not misuse of identifier or >> undefined class, *syntax* error.) >> > >Don't blame C++. Blame C. > >C is not context-free parsable [actually I believe it is, but not >with any sensible grammar]. >-- >Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1 >gvcormack@waterloo.EDU gvcormack@uwaterloo.CA gvcormac@water.BITNET Well, the *grammar* defining C programs is certainly context-free! Just look in the back of K&R. This doesn't mean that any grammatically correct program is semantically correct. I don't know of any non-trivial language with a context-free grammar for semantically correct programs. (I may be wrong here, but certainly this is correct for most languages.) Actually, I've always thought parts of the grammar (e.g. type declarations) were hacked to facilitate easy parsing. Many compilers, however, do consider some semantic errors syntax errors, because it's easier that way. As for the C++ syntax error described above, that's compiler-dependent. MPW C++ tells me that the identifier is not a valid type name. Bob Hearn hearn@claris.com
steve@oakhill.UUCP (steve) (03/21/89)
In article <12443@watdragon.waterloo.edu>, gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) writes: > In article <5200040@m.cs.uiuc.edu>, wsmith@m.cs.uiuc.edu writes: > > > > > > (For example, in the language I'm learning now, C++, if you forget to > > declare a class, it is a *syntax* error. Not misuse of identifier or > > undefined class, *syntax* error.) > > > > Don't blame C++. Blame C. > > C is not context-free parsable [actually I believe it is, but not > with any sensible grammar]. > -- > Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1 > gvcormack@waterloo.EDU gvcormack@uwaterloo.CA gvcormac@water.BITNET Don't blame C either. The real cause of this error is a conspiracy between the compiler writer and YACC. YACC produces this message for every syntax error unless you specifically design your grammer to catch a specific error and output a different message. The writer of your compiler probably used YACC, and then didn't design his grammer to do this in this case. He is probably not too much to blame though. To design a languge to catch every possible generic grammerical error would make the compiler so slow you might as well code in assembly. enough from this mooncalf - Steven ---------------------------------------------------------------------------- To flame someone for bad spelling or grammer is a discouragement to net discussion in a timely fashion. ---------------------------------------------------------------------------- These opinions aren't necessarily Motorola's or Remora's - but I'd like to think we share some common views. ---------------------------------------------------------------------------- Steven R Weintraub cs.utexas.edu!oakhill!devsys!steve Motorola Inc. Austin, Texas (512) 440-3023 (office) (512) 453-6953 (home) ----------------------------------------------------------------------------
hearn@claris.com (Bob Hearn) (03/21/89)
>---------------------------------------------------------------------------- >To flame someone for bad spelling or grammer is a discouragement to net >discussion in a timely fashion. >---------------------------------------------------------------------------- Uh, that's 'grammar'. :-) Bob Hearn hearn@claris.com
hearn@claris.com (Bob Hearn) (03/21/89)
In article <8903172225.AA05088@explorer.dgp.toronto.edu> flaps@dgp.toronto.edu (Alan J Rosenthal) writes: > >In article <9091@claris.com> hearn@claris.com (Bob Hearn) writes: >>Well, the *grammar* defining C programs is certainly context-free! Just >>look in the back of K&R. > >I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm >seeing context-sensitive stuff in the grammar. > >Suppose you're looking at a compound-statement and you've just seen the `{' >opening it. Now suppose the next three tokens are `extern', "x", and `;'. >This is produced by a declaration-list, expanding to declaration, which expands >to decl-specifiers init-declarator-list[opt] ;. `extern' is an sc-specifier, >and decl-specifiers -> sc-specifier decl_specifiers[opt]. Now is "x" a >declarator (init-declarator-list -> init-declarator -> declarator -> >identifier) or a typedef-name (decl-specifiers -> type-specifier -> >typedef-name -> identifier)? > >It's ambiguous unless you use non-context-free information to tell you whether >or not "x" was previously declared via typedef. > >ajr > Enough of this bullshit! How many times do I have to say it??? The grammar in K&R is context-free. Do you know what a context-free grammar *is*? You're confusing the *class* of the grammar (ever heard of the Chomsky hierarchy?) with what categories of parsers will work on it. You're also confusing *syntactic correctness* with *semantic correctness*. This is a little more acceptable, since, as I was saying, most compilers do to make parsing easier. Most C compilers use LALR(1) parsers (stands for "look-ahead left to right, order 1), which are a subcategory of shift-reduce parsers. The 1 means they can look at the next 1 token on the input stream when deciding whether to shift or reduce. And you are quite correct in saying that, in this case, that one token is not enough; you *do* need symbol table information. But (1) No one says all context-free grammars have to be LALR(1) parsable without context information, (2) no one says that the grammar in the back of K&R is necessarily the one that's used in practice (grammars are often tweaked (without modifying the language generated!) no make them easier to parse), and (3) just because your compiler gives you a syntax error when you do something like that, that doesn't necessarily mean your program is syntactically incorrect. It just means that treating that kind of semantic error as a syntax error makes the language easier to parse. Now, before anyone else wants to tell me that K&R's grammar for C is not context-free, will you *please* go and look up the definition of context-free grammar??? Hint: anything that can be expressed in BNF is context-free!! Bob Hearn hearn@claris.com
djones@megatest.UUCP (Dave Jones) (03/21/89)
From article <9122@claris.com>, by hearn@claris.com (Bob Hearn): ... > > Now, before anyone else wants to tell me that K&R's grammar for C is not > context-free, will you *please* go and look up the definition of context-free > grammar??? Hint: anything that can be expressed in BNF is context-free!! > > Bob Hearn > hearn@claris.com > Bob, Bob, Bob. Chill out, Dude. You're going to blow a brain vain if you're not careful. Now then. The term "context-free", as Chomsky defined it is not very useful to us computer types, because no one ever, ever writes grammars any other way. To him, "context-free" meant that a rule could only have one symbol on the left-hand side. "Context-sensitive" meant that production rules could have strings of symbols on the left: a X b <= a x y b In the above, the a and b form a "context" for the reduction X <= xy. I haven't been following the discussion, but I suspect that the correspondents were not alluding to the unuseful Chomsky definitions, but instead were using the terms quite literally: "context-free" meaning "context-free". I think that is a perfectly reasonable thing to do, especially if one is unfamiliar with the confusing Chomsky definition. (I never did like the practice of "defining" something that already had a meaning in normal English.) The grammar as published in K&R (both additions) requires that the lexical analyzer must use left-context to determine whether or not a token is recognized as a type-name identifier or a regular identifier. In that sense, it is not "context-free". I seem to remember that I once figured out that it is possible to modify the grammar so that the scanner and parser do not have to know about left-context. But I think I remember that such a parser would require a two token-lookahead, making it impossible to use with most compiler-compilers.
norman@a.cs.okstate.edu (Norman Graham) (03/22/89)
From article <8903172225.AA05088@explorer.dgp.toronto.edu>, by flaps@dgp.toronto.edu (Alan J Rosenthal): > > In article <9091@claris.com> hearn@claris.com (Bob Hearn) writes: >>Well, the *grammar* defining C programs is certainly context-free! Just >>look in the back of K&R. > > I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm > seeing context-sensitive stuff in the grammar. > > [Stuff deleted about ambiguous derivation sequence.] > > It's ambiguous unless you use non-context-free information to tell you whether > or not "x" was previously declared via typedef. I have to agree with Bob on this one, but for a different reason. The grammar in the back of K&R certainly is context-free... but it is also ambiguous. To paraphrase Aho & Ullman's "The Theory of Parsing, Translation, and Compiling, Volume I: Parsing", page 91: A grammar is Context-free if every production has a single nonterminal on the left and a sequence of 0 or more terminals or nonterminals on the right. It is obvious that the grammars in both editions of K&R fit this description. Thus they are context-free. Now on to ambiguity. According to Aho & Ullman (same as above), pagee 143: We say that a CFG G is ambiguous if there is at least one sentence w in L(G) for which there is more than one distinct derivation tree with frontier w. This is equivalent to saying that G is ambiguous if there is a sentence w in L(G) with two or more distinct leftmost (or rightmost) derivations. The classic example of ambiguity is the "dangling else" problem, as illustrated by the grammars in both editions of K&R. So we see that K&R's grammars are ambigous context-free grammars. That's no big deal--the ambiguity doesn't affect the language generated by the grammar. The problem comes in when compilers base the semantic meaning of a sentence on the derivation sequence the grammar uses to generate that sentence. For some reason :-) compiler writers like to choose deterministically between competing derivation sequences. Thus they either statically choose the proper derivation sequence (eg. the "else" always matches the last else-less "if") or they use run-time information to help them choose the "proper" derivation sequence. -- Norman Graham Oklahoma State University Internet: norman@a.cs.okstate.edu Computing and Information Sciences UUCP: {cbosgd, ihnp4, 219 Mathematical Sciences Building rutgers}!okstate!norman Stillwater, OK 74078-0599
hearn@claris.com (Bob Hearn) (03/23/89)
In article <2970@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes: >The term "context-free", as Chomsky defined it is not very useful >to us computer types, because no one ever, ever writes grammars any >other way. To him, "context-free" meant that a rule could only >have one symbol on the left-hand side. "Context-sensitive" meant that >production rules could have strings of symbols on the left: > > a X b <= a x y b > >In the above, the a and b form a "context" for the reduction >X <= xy. Precisely! But, gee... I've considered myself a computer type for, oh, 12 years or so, and it's useful to me. How you write grammars depends on what you're going to use the grammars for. I don't think anyone could write a compiler, for example, without being at least familiar with terms of formal language theory like "context-free" and "context-sensitive." > >I haven't been following the discussion, but I suspect that the >correspondents were not alluding to the unuseful Chomsky definitions, >but instead were using the terms quite literally: "context-free" meaning >"context-free". I think that is a perfectly reasonable thing to >do, especially if one is unfamiliar with the confusing Chomsky >definition. (I never did like the practice of "defining" something >that already had a meaning in normal English.) If you *had* been following the discussion, you'd have seen that my original posting was a followup to someone complaining about how screwed-up C's grammar was, that it wasn't context-free. When someone uses the words "context-free" and "grammar" in the same sentence, I assume he knows what he's talking about, and what he said was incorrect... K&R's grammar for C *is* CF, and not particularly screwed-up. I've received no end of mail and flames for that simple correction! As for "context-free" already having a meaning in normal English, I disagree. One could say that you need context to parse C, but if you use the phrase "context-free" in the realm of computer science, it's because you've heard it before, and you should find out what it means before using it. Bob Hearn hearn@claris.com P.S. I'll chill when I stop getting flamed.
hearn@claris.com (Bob Hearn) (03/23/89)
In article <12606@watdragon.waterloo.edu> gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) writes: > >A context-free grammar G is a 4-tuple G = <N,T,P,S> where N is a finite >set of nonterminal symbols, T (the alphabet) is a finite set of terminal >symbols, P is a finite set of production rules of the form A -> X1 X2 X3 ... >where A is in N, and X1, X2, X3 are in Union(N,T). S in N is the Start symbol. > >The K&R grammar does not have a finite alphabet. >-- >Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1 >gvcormack@waterloo.EDU gvcormack@uwaterloo.CA gvcormac@water.BITNET Oh. That's nice to know. From Scott, who is looking over my shoulder: "Why are even bothering to argue with these idiots?" Good question. Bob Hearn hearn@claris.com
djones@megatest.UUCP (Dave Jones) (03/23/89)
From article <9140@claris.com>, by hearn@claris.com (Bob Hearn): > In article <2970@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes: > > If you *had* been following the discussion, you'd have seen that my > original posting was a followup to someone complaining about how > screwed-up C's grammar was, that it wasn't context-free. When someone > uses the words "context-free" and "grammar" in the same sentence, I > assume he knows what he's talking about, ... What a silly assumption. I almost never assume that people who post to the net know what they are talking about. But what you really mean, I think, is that you want to use the term "context-free" exactly the way Chomsky did. Okay, that's your right. But don't be surprised or offended if someone doesn't know it. Just explain. I taught two terms of formal language theory during a two year stint as Visiting Associate Prof at Ohio University. I'm sure I brought the subject up, but only to try to avoid the kind of discussion that is going on here. We never talked about (Chomsky) context-sensitive grammars again. I suspect that for most students, that small piece of information got lost along the way. Context-sensitive grammars, (in the Chomsky sense), are just not useful. I really don't know why people continue to use his definitions. I just now pulled a new compiler book off the shelf, completely at random. It does pay lip-service to the Chomsky definitions. But of its 811 pages, it devotes less than one quarter of page 97 to context-sensitive grammars! First it gives a very brief definition, then the following: While context-sensitive grammars and type-0 grammars are more powerful than context-free grammars, they are also far less useful. The problem is that efficient parsers for these extended grammar classes do not exist, and without a parser there is no way to use a grammar definition to drive a compiler. [ Two sentences about parsers for context-free grammars omitted... ] Whenever we mention a grammar (without saying which kind), the grammar will be assumed to be context-free. > [...] > I've received no end of mail and flames for that simple correction! Take my advice. Quit worrying about the flames you receive. Flamers are small minded bigots, pedants, and idiots. Just ignore them. (Or better yet, *apologize* sincerely. God, that irritates them!) Dave J. P.S. Spring has sprung. I probably won't be around the net much in the next few weeks.
holt@turing.toronto.edu (Ric Holt) (03/23/89)
#>>A context-free grammar G is a 4-tuple G = <N,T,P,S> where N is a finite #>>set of nonterminal symbols, T (the alphabet) is a finite set of terminal #>>symbols, P is a finite set of production rules of the form A -> X1 X2 X3 ... #>>where A is in N, and X1, X2, X3 are in Union(N,T). S in N is the Start symbol. #>"Why are even bothering to argue with these idiots?" #>Good question. #> #>Bob Hearn #>hearn@claris.com In my modest opinion, Bob Hearn is doing a fine job of entertaining us with his emotional outbursts, but a considerable disservice with his claims. Likely, Cormack is correct and Hearn is wrong. Likely C's grammar is not Context Free. I believe Cormack (in his academic way) has proven Hearn wrong. I believe C's grammar has the appearance of a context free grammar but that it violates the mathematical def of context free. This question (is C's grammar context free) is important and we should get a clear answer to it rather than outbursts. Could someone please settle this definitely? Ric Holt
ge@phoibos.UUCP (Ge Weijers) (03/23/89)
From article <9122@claris.com>, by hearn@claris.com (Bob Hearn): >>I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm >>seeing context-sensitive stuff in the grammar. > Enough of this bullshit! How many times do I have to say it??? The grammar > in K&R is context-free. Do you know what a context-free grammar *is*? Yeah, here is my regular C grammar (a regular expression, really): c : | letter c | digit c | /* etc. */ ; Okay, it matches a lot more than C, but all legal C programs are acceptable. So are all programs in all other programming languages. Useful! The grammar in K&R is in the same category. It accepts all legal C programs, plus a lot more. Because of current compiler technology we use an artificial separation between syntax and static semantics. Static semantics can be expressed in more powerful grammar types (2-level van Wijngaarden grammars etc.) Your note is correct, but so is the observation that some context-sensitive features are suggested by using different names for the same terminal, nl. identifier. K&R use non-terminal names judiciously to suggest more than they are really telling. C can not be described exactly by a CFG, so C is not a context-free language. Usually this type of confusion starts because one party is talking about a grammar, whilst the other party is really talking about (an exact grammar of) the language. Ge' Weijers, KUN Nijmegen, the Netherlands ge@cs.kun.nl
hearn@claris.com (Bob Hearn) (03/24/89)
In article <89Mar23.094504est.4328@turing.toronto.edu> holt@turing.toronto.edu (Ric Holt) writes: >In my modest opinion, Bob Hearn is doing a fine job of entertaining us >with his emotional outbursts, but a considerable disservice with his >claims. Likely, Cormack is correct and Hearn is wrong. Likely >C's grammar is not Context Free. OK, OK, I'm sorry. I'm new to the net, and I wasn't expecting to get flamed *quite* so much for one simple correction... it just became very annoying. I'll try to restrict my responses to computer science from now on. >I believe Cormack (in his academic way) has proven Hearn wrong. I believe >C's grammar has the appearance of a context free grammar but that it >violates the mathematical def of context free. > >This question (is C's grammar context free) is >important and we should get a clear answer to it rather than outbursts. >Could someone please settle this definitely? > Ric Holt Good idea. Although I think this whole argument is founded on misunderstandings. It all depends on what you mean by "context-free grammar." If you follow the technical definition, then there is no question that C's grammar (as specified in K&R) is context-free. As for Cormack's claim that C does not have a finite set of terminals, I can't say that I really understand what he means. Perhaps he is referring to the fact that there is an infinite number of identifiers? Well, if we restrict ourselves to K&R, then the terminal in question is simply (identifier), and it does not decompose further. If we want to make this grammar more concrete and have it generate the actual language that a compiler accepts, then we would have identifiers producing strings over the char. set. In this case, the char. set is the set of terminals, and it is certainly finite. BTW, the real argument here is not whether C's grammar satisfies some mathematical definition. It does, but it is not a particularly relevant piece of information. The only reason I claimed it *did* was that someone, a long time ago, made some comment on how screwed up C's grammar was, and that it wasn't context-free. Well, as I don't think C's grammar is particularly screwed up relative toother languages' grammars, I responded. I happen to like C, and I don't like C-bashing. Guess I learned my lesson, huh? Bob Hearn hearn@claris.com
stevev@tekchips.LABS.TEK.COM (Steve Vegdahl) (03/24/89)
In article <9122@claris.com>, hearn@claris.com (Bob Hearn) writes: > Enough of this bullshit! How many times do I have to say it??? The grammar > in K&R is context-free. ... > Now, before anyone else wants to tell me that K&R's grammar for C is not > context-free, will you *please* go and look up the definition of context-free > grammar??? Hint: anything that can be expressed in BNF is context-free!! I detect that we are picking nits here. I think that we all know what is meant by the claim that the K&R grammar is not context-free. It has been noted that the K&R grammar is *ambiguous*; hence, its specification of how programs should be parsed is *ill-defined*. Additional "context-sensitive" specifications are added to the K&R grammar in the form of English prose (in the second edition, anyway; the first edition seems to ignore the ambiguity problem). Effectively, the grammar in K&R is "BNF + English prose". Taking this view, the K&R grammar is not context-free. Steve Vegdahl Computer Research Lab Tektronix Labs Beaverton, Oregon
gudeman@arizona.edu (David Gudeman) (03/24/89)
In article <9165@claris.com> hearn@claris.com (Bob Hearn) writes: >Good idea. Although I think this whole argument is founded on >misunderstandings. It all depends on what you mean by "context-free >grammar." If you follow the technical definition, then there is no >question that C's grammar (as specified in K&R) is context-free. I agree, that this is founded on misunderstandings, but I think you've got the wrong misunderstanding. I believe that C's grammar is _not_, in fact context free, though the grammar in K&R is. How can this be? I'm glad you asked. The grammar given in K&R isn't technically correct. Don't take offense, I'm not making a judgement on this practice, just pointing it out. One inaccuracy is in the production typedef-name : identifier I don't know if there are others. "typedef-name" occurs in the production type-specifier: ... | typedef-name and a mechanical reading of this implies that _any_ identifier can occur as a type-specifier. This is not the case. In fact, an identifier can only be a type-specifier if it has previously been declared with a typedef. I can't prove it, but I believe you need a context-sensitive grammar to specify this. So both claims made in this argument are correct: (1) the grammar of C is not context free and (2) the grammar given in K&R is context free
hearn@claris.com (Bob Hearn) (03/24/89)
In article <9861@megaron.arizona.edu> gudeman@arizona.edu (David Gudeman) writes: >I agree, that this is founded on misunderstandings, but I think you've >got the wrong misunderstanding. I believe that C's grammar is _not_, >in fact context free, though the grammar in K&R is. How can this be? >I'm glad you asked. The grammar given in K&R isn't technically >correct. Don't take offense, I'm not making a judgement on this >practice, just pointing it out. > >One inaccuracy is in the production > > typedef-name : identifier > >I don't know if there are others. "typedef-name" occurs in the production > > type-specifier: ... | typedef-name > >and a mechanical reading of this implies that _any_ identifier can >occur as a type-specifier. This is not the case. In fact, an >identifier can only be a type-specifier if it has previously been >declared with a typedef. I can't prove it, but I believe you need a >context-sensitive grammar to specify this. > Yes, this is another big misunderstanding in this discussion... I would say that the *grammar* stipulates that a type-specifier can be an identifier, but the *semantics* specifies that this results in a valid program only when the identifier is a valid type name. This is really the same problem as { . . x = 1; . . } which is correct only if x has been previously declared. I think here one would always call this an undeclared identifier error, a type of semantic error, rather than a syntax error. And you're quite right that it takes a context-sensitive grammar to generate the set of VALID C programs in this sense! My point is that this is true for just about any language you can name, and there is nothing particularly odd about this aspect of C. In any case, this discussion is getting old... I vote we talk about something new. This is comp.lang.misc, not comp.lang.C.grammar, right? :-) So, anyone designed any good languages lately? Bob Hearn hearn@claris.com
gudeman@arizona.edu (David Gudeman) (03/24/89)
In article <9170@claris.com> hearn@claris.com (Bob Hearn) writes: ]In article <9861@megaron.arizona.edu> gudeman@arizona.edu (David Gudeman) writes: ]> ]>One inaccuracy is in the production ]> ]> typedef-name : identifier ]> ]>I don't know if there are others. "typedef-name" occurs in the production ]> ]> type-specifier: ... | typedef-name ]> ]>and a mechanical reading of this implies that _any_ identifier can ]>occur as a type-specifier. This is not the case. In fact, an ]>identifier can only be a type-specifier if it has previously been ]>declared with a typedef. ] ]Yes, this is another big misunderstanding in this discussion... I would ]say that the *grammar* stipulates that a type-specifier can be an ]identifier, but the *semantics* specifies that this results in a valid ]program only when the identifier is a valid type name. This is really ]the same problem as [undeclared identifier] It sounds as though you are defining the semantics of a language as any part that cannot be described with a context free grammar. Of course this makes you trivially correct, but does not affect the opinions of people who criticize C based on a different definition of syntax and semantics. Also, you cannot treat the C grammar as a formal grammar that is only meant to accept strings from a formal language. The grammar is intended to show syntactic categories with semantic attachments. The problem of undeclared identifiers is common to many programming languages, but the typedef problem is more serious. An identifier is always parsed into category "identifier" regardless of whether it has been declared. However, typedefs are ambigous. For example: typedef int foo; int i; void f() { foo *i; ... } The K&R grammar is ambigous on how to parse this, while any C programmer would expect "i" to be a local pointer to an int. Once an identifier has been declared as a type, it should no longer be an identifier, but rather have a different syntactic class. No, this does not affect the language accepted by the grammar. But it does affect the syntactic category of the whole string "foo *i;". Because of the differences in syntactic categories, most people would say that static foo x; should be a syntax error if foo has not been typedefed. You can argue that, but like I said before, you are only defining terms to fit your views. >In any case, this discussion is getting old... I vote we talk about something >new. This is comp.lang.misc, not comp.lang.C.grammar, right? :-) The following should not be taken as a flame, I'm just making an observation: If you want to stop the discussion, you don't have to reply. It doesn't work to write an article and then suggest that no one respond to it after you have gotten in the last word.
norvell@csri.toronto.edu (Theodore Stevens Norvell) (03/25/89)
In article <12606@watdragon.waterloo.edu> gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) writes: >A context-free grammar G is a 4-tuple G = <N,T,P,S> where N is a finite >set of nonterminal symbols, T (the alphabet) is a finite set of terminal >symbols, P is a finite set of production rules of the form A -> X1 X2 X3 ... >where A is in N, and X1, X2, X3 are in Union(N,T). S in N is the Start symbol. > >The K&R grammar does not have a finite alphabet. I'm sure that Gordon Cormack has something in mind here because I know he's a smart guy, but I'm darned if I can figure out what it is. In any case, I'm sure that the K&R grammar only uses a finite subset of its alphabet, because otherwise it wouldn't fit in my desk. So Gordon, if your still there after being insulted in such an uncalled for way, please enlighten us. As I recall, the grammar in K&R [1978] has a single terminal "identifier" which includes all identifiers, a nonterminal called "typedef_name" defined by the production typedef_name --> identifier This leads to an ambiguity because phrases like t * i ; can be understood to be either declarations or statements. But ambiguous grammars have been with us a long time. People even use parser generators that accept ambiguous grammars! In fact there are other ambiguities in the grammar (e.g. the dangling else). Ambiguities don't change the set of strings admitted by a grammar they merely admit some strings in more than one way. The real question is what do you want to call syntax and what do you want to call static semantics. In the absence of any definitive definition, this is a matter of taste and convenience to the implementor. Is the following (complete) compilation unit syntactically correct? f() { t * i ; int a; } The grammar says yes; most compliers say "syntax error". If we change the grammar to make typedef_name a terminal and remove its only production, then the grammar agrees with the compilers. The grammar is of course still context free, but the ambiguity is gone. Ric Holt raises the separate question of whether the C language is context free. The syntax of C certainly is. The problem with C is that it is not "feedback free" in the following sense. The parsing of C depends on the lexing of C (this is only natural), but the lexing depends on the parsing. Of course, with the original K&R grammar, C _is_ feedback free which is no doubt why Ritchie wrote the grammar that way. Summary: -The K&R [1978] grammar is context free, but describes the wrong language. -The C language syntax (abstracted from lexing and static semantics) is also context free. -The usual implementations and most useful definitions of C language are not "feedback free". -We're still in the dark about infinite alphabets. Theo Norvell (The term "feedback free" is used in a completely different sense by Holt. My apologies for stealing it. By the time I realized I was using it differently, I liked it too much to give it back. In Holt's sense, C is feedback free. See the Turing Programming Language Design and Definition for his definition. In fact, see it anyway, if you're interested in design or definition.)
norman@a.cs.okstate.edu (Norman Graham) (03/25/89)
From article <9861@megaron.arizona.edu>, by gudeman@arizona.edu (David Gudeman): > [stuff deleted about how the grammar specifies that > any identifier can serve as a type-specifier] > > So both claims made in this argument are correct: > > (1) the grammar of C is not context free > and > (2) the grammar given in K&R is context free This is an interesting point (ie. one I hadn't thought of before :-). In theory, a grammar is a generator for strings in a language, not a recognizer for a language (eg. various forms of automata). So the grammar in K&R is context-free, but it doesn't generate the C language--in addition to valid C programs it also generates invalid C programs. Of course, when this grammar is used as the basis of a compiler, the semantic analyzer should detect these invalid programs. This makes sense to me... how about anyone else out there? -- Norman Graham Oklahoma State University Internet: norman@a.cs.okstate.edu Computing and Information Sciences UUCP: {cbosgd, ihnp4, 219 Mathematical Sciences Building rutgers}!okstate!norman Stillwater, OK 74078-0599
pa1162@sdcc15.ucsd.edu (pa1162) (03/27/89)
Bob, I'm only a student, so don't think I have pretensions to telling you folks you're business, but the grammar in K&R probably isn't even the grammar for C... as I think you or someone else mentioned... it gets tweaked a bit if you want to try to write a compiler from it... so worry about whether it is context-free is immaterial. Why don't we all open our K&R (old edition) to page 214 and read the note at the heading of section 18 "Syntax Summary". It says that the syntax summary given isn't meant as an exact statement of the language. This isn't meant to be a flame... it's an attempt at a distinction because I think you and whoever are arguing at cross-purposes. The discussion is either about context-freeness and grammars or about non-precise syntax summaries and it's really not clear which one. As I said I am yet a student... but I can see the structure of an argument when it hits me in the face. I recommend somone clarify the and separate the matters at hand so that all this "stuff" can be separated. -Thomas Kammeyer (A student who likes to stick his nose into discussions)
john@frog.UUCP (John Woods) (03/27/89)
In article <9165@claris.com>, hearn@claris.com (Bob Hearn) writes: > >C's grammar is not Context Free. > grammar." If you follow the technical definition, then there is no > question that C's grammar (as specified in K&R) is context-free. I think the crux of the argument of the non-context-free crowd is that the grammar in K&R does NOT actually specify the C language by itself, not as a grammar. It relies on the trick of telling the lexer about typedefs, which is a form of back-door context. Consider: if you wrote a YACC specification of C according to the context-free grammar, but omitted any kind of action clauses (perhaps you want to turn on debugging and watch the state machine play, a fine entertainment for a rainy day), it would NOT be able to correctly parse a C program with typedefs (unless, of course, you hid a "real" C grammar (with actions and the ever present back-door gimmick) in the lexer itself). When the pro-context-free crowd's argument is presented as "So what; it is close enough, and it works", the argument in convincing. When it is presented as "Yeah, well, your mother wears army boots!", it isn't. -- John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101 ...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu Remainder Khomeini!
clive@ixi.UUCP (Clive) (03/29/89)
In article <8903250115.AA08564@yorkville.csri.toronto.edu>, norvell@csri.toronto.edu (Theodore Stevens Norvell) writes: > The real question is what do you want to call syntax and what do you > want to call static semantics. In the absence of any definitive definition, > this is a matter of taste and convenience to the implementor. For a new approach to this, look at: "The Revised Report on the Algorithmic Language Algol 68". - van Wijngaarden et al This approached this problem by defining the grammar using another (context-free) grammar (hence the term 'two-level grammar'). This meant that things like not using a variable before it was defined were specified in the *grammar* - very nifty. To do this, the second level grammar had a non-terminal called NEST which expanded to sequences like "int a char b" and so on. Productions of the first level grammar had NESTs scattered in them (translating to C): TYPE NEST variable : TYPE NEST IDENTIFIER, where TYPE IDENTIFIER declared in NEST . (comma means "and"). The production where TYPE IDENTIFIER declared in NEST expands to "empty" if the variable was declared, and to a nonterminal with no production rule if it isn't - thus the grammar can't generate a program with an undeclared variable. (Algol 68 had user-definable types, like C). They did all kinds of things with this - type checking of assignments and operator arguments, and context sensitive automatic type conversion come to mind. Pity it was so hard to parse :-) -- Clive D.W. Feather clive@ixi.uucp IXI Limited ...!mcvax!ukc!acorn!ixi!clive (untested) +44 223 462 131
norman@a.cs.okstate.edu (Norman Graham) (04/01/89)
From article <125@ixi.UUCP>, by clive@ixi.UUCP (Clive): > This approached this problem by defining the grammar using another (context-free) > grammar (hence the term 'two-level grammar'). This meant that things like not using > a variable before it was defined were specified in the *grammar* - very nifty. Of course these two-level grammars are *substantially* more powerful than context-free grammars. Two-level grammars are capable of generating recursively-enumerable sets (ie. those things that can be recognized only by something with the power of a Turing machine). This class of languages is much larger than both the Context-Free and Context-Sensitive languages. However, Two-level grammars can also be used to generate the Context-Free and Context-Sensitive languages that we use for programming. Two-level grammars provide a compact and elegant (some would say 'cryptic' :-) notation for the description of programming language syntax without the need to escape to a natural language. Personally, I would love to see more language designers take advantage of this tool. -- Norman Graham Oklahoma State University Internet: norman@a.cs.okstate.edu Computing and Information Sciences UUCP: {cbosgd, rutgers} 219 Mathematical Sciences Building !okstate!norman Stillwater, OK 74078-0599
chris@mimsy.UUCP (Chris Torek) (04/03/89)
In article <1163@frog.UUCP> john@frog.UUCP (John Woods) writes: >I think the crux of the argument of the non-context-free crowd is that >the grammar in K&R does NOT actually specify the C language by itself, not >as a grammar. It relies on the trick of telling the lexer about typedefs, >which is a form of back-door context. For practical parsing (using, e.g., yacc), yes; for CF-ness, no. The *real* crux of the argument is that while you can write a decent, simple, no-feedback-through-lexer CFG that accepts all C programs, you can *not* write a decent, simple, no-feedback LALR(1) parser to do this, because the simple CFG is ambiguous. Those who say `C does not have a CFG' are really saying `C does not have a trivially yacc-able grammar', which is true (but, as I keep saying, really means only that we need a better parser generator than yacc). (And no, it does not take a nondeterministic automaton to parse C. The little back-door trick would work wonderfully well if only you could know, each time a yacc parser examines a token, which rule yacc was attempting. When it tries `typedef-name: identifier', the lexer should answer `identifier' if and only if the identifier happens to be a typedef. Then make sure that yacc tries this rule before trying `primary: identifier' [this part is easy] and voila! you have a parser. Of course, yacc actually only calls the parser once for both rules; you would need a way to change the token on the fly. A better gimmick---less expensive in compute power, at least---would be to associate a function (predicate) with each rule, and allow the predicate to abort the rule. You then attach an `identifier-really-is- typedef' predicate to the `typedef: identifier' rule, making the parser pass over this resolution and choose instead the `primary: identifier' rule. Indeed, the same trick would allow one to reject undeclared identifiers, making `program foo; begin bar := 1 end.' a syntax error instead of a semantic one---not that this is a good thing to do!) -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
chris@mimsy.UUCP (Chris Torek) (04/04/89)
In article <16674@mimsy.UUCP> I wrote: >(And no, it does not take a nondeterministic automaton to parse C. ... [gimmick A deleted; gimmick B:] >associate a function (predicate) with each rule, and allow the >predicate to abort the rule. gvcormack points out (as I conveniently forgot to mention) that this is actually an affix grammar, and affix grammars are not context free. I think this is a good point for a summary. - There is no context free grammar that recognises only valid C programs (but this is true of many languages). - There is a trivial context free grammar that recognises all valid C programs, along with a number of invalid ones, but it is ambiguous. There is probably no unambiguous CFG for C (consider arbitrarily many parentheses around an id: is it a cast or a value?). - There is a trivial affix grammar (gimmick B above) that recognises all valid C programs, along with a number of invalid ones, which is not ambiguous and can be handled with a modified yacc (you would have to augment yacc's output tables and parser to retain both reductions and to know what to call to select which is to occur). and of course, - There is always the wrong way. :-) -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris