gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) (03/25/89)
Is C context-free? The question is too ill-defined to support the discussion that has been going on in this medium. I think the ground rules need to be established before anything useful can come of it. Since I have been misquoted as saying "C is not context-free", I will repeat my original statement that seems to have started this whole thing. > Don't blame C++. Blame C. > > C is not context-free parsable [actually I believe it is, but not > with any sensible grammar]. > -- > Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1 > gvcormack@waterloo.EDU gvcormack@uwaterloo.CA gvcormac@water.BITNET So what does it mean for something to be context-free? A context-free language is a set of strings (a formal language) that is generated by a context-free grammar. It has been shown that the context-free languages are exactly those languages that can be recognised by a single-stack nondeterministic automaton. There is a well known theorem, known as the "pumping lemma" that can be used to show whether a given language is context-free, without having to write a context-free grammar (or prove non-existence of same). When we say "is C context-free", I suppose we can consider the set of all ascii strings that are correct C programs to be the formal language in question. Under this assumption, C is not context free, and neither is any useful programming language. We qualify what we say to "is the syntax of C context-free?". Then what is syntax? If we consider syntax to mean some context-free language that contains all valid C programs, then C is context-free, and so are all programming languages. Here is a context-free grammar for all programming languages: S -> any_character | S any_character any_character -> a | b | c | ... | 0 | 1 | ... | , | & | ... I don't really believe this fits our working definition of "syntax". We therefore have to agree on what constitutes a syntactically valid C program, and what constitutes an invalid one. Then we can start arguing about whether the set of syntactically valid C programs is a context-free language or not. (The original posting to which i replied was complaining that a missing type definition should not be reported as a syntax error.) In any event, in order to show that such a language is context-free, it is necessary to give a CF grammar that generates all strings in the language. The language is a set of ascii strings, not a set of meta-symbols like Identifier, Type_identifier, etc. With Pascal and other langauges that I say have context-free syntax, this distinction is moot, because one can trivially write a context-free grammar to describe each of the meta-symbols, and by substitution, write a context-free grammar to describe the language. It happens that some things, like white space, are a bit tedious to describe, but it is possible [cf. Salomon and Cormack, "Scannerless parsing of programming languages, Proc. Sigplan 89]. With C, the "language" represented by Type_identifier is not only not context-free, it is not a language because it does not represent a fixed set of strings. (The set of Type_identifiers changes throughout the program). And it is my opinion that there are C programs for which we cannot answer "is this program syntactically valid?" without knowing whether a particular substring represents a Type_identifier or not. If you allow meta-symbols with arbitrary definitions, then there is an absurd context-free grammar that "describes" any programming language: S -> first_valid_character | S next_valid_character where next_valid_character is a meta-symbol that at changes its definition every time it is used, to match only valid upcoming input characters. Furthermore, I said "context-free parsable", not just "context-free". I won't claim this has a formal meaning, but I certainly meant to imply it to mean a unique parse in which the parse tree reflects the precedence, associativity, and nesting of language constructs. My qualificiation "any sensible grammar" was meant to disqualify grammars that disregard this. Conclusions: Detailed technical arguments are useless without a testable hypothesis. We need to agree on a definition of "C syntax" before we can proceed. Once we have agreed on that we can try to answer the question Is C's syntax context-free? -- Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1 gvcormack@waterloo.EDU gvcormack@uwaterloo.CA gvcormac@water.BITNET