mmengel@cuuxb.ATT.COM (Marc W. Mengel) (04/10/89)
The reason this discussion has generated so much discussion is that we have a clash between the theory-of-computation folks and the pragmatic programmer types. (Please pardon the arbitrary distinction, many people belong to both classes) The theoretical folks are talking about Context Free Grammars i.e. grammars for sets of strings recognizable by a Pushdown Automata. This is a strict set-theoritical definition. CFG's *cannot* recognize C, as C requires a variable to be declared before it is used, and the Pumping lemma tells us that this is beyond the scope of a CFG when there are n>3 occurences of the variable name. 'nuff said. To the theoretical folks, syntax is the form of the code; requirements like "previously declared" are syntactic, for example, as are requirements like balanced parenthesis, etc. Semantics to them is the meaning associatecd with the text, i.e. "i = j;" has the effect of changing the value of the storage associated with the variable "i". The pragmatic programmers are referring to syntax and semantics based upon the tools they use to build compilers. To them, the stuff built by lex and yacc (or flex and bison, or whatever tools they use to build parsers) is the "syntactic code" and any stuff they add onto it (code generation, etc.) is "semantic code". If our parser generators were powerfull enough to really parse the languages involved, this definition would coincide fairly well with the theoretical one; the code generated for a given sequence of characters expresses the semantics of that sequence. However, we have languages that our parser generators cannot parse into the parse trees we want without help, and that cannot by their very nature solve some problems like checking if variables have been declared, etc. Therefore we get code that is really checking syntax, but that our pragmatic programmers have fallen into the habit of calling "semantic code", because it wasn't built by the parser generator. So basically, to sum up the long discussion here: 1) you cannot precisely recognize C with just a Context Free Grammar. 2) you can *extend* a yacc/lex parser by adding a symbol table that maintains typedef names and variable types so that it properly recognizes C *and* builds a parse tree you like. 3) you can build a plain yacc/lex parser that recognizes a superset of C, but it is incapable of distinguishing declared variables from undeclared variables, and therefore incapable of distinguishing between the various parse trees constructable from "word2 ( word1 );" which require that sort of information. -- Marc Mengel mmengel@cuuxb.att.com attmail!mmengel ...!{lll-crg|att}!cuuxb!mmengel
kramer@ruuinf.cs.ruu.nl (mark kramer) (04/13/89)
In article <2700@cuuxb.ATT.COM>, mmengel@cuuxb.ATT.COM (Marc W. Mengel) writes: | The reason this discussion has generated so much discussion | is that we have a clash between the theory-of-computation | folks and the pragmatic programmer types. .... | | The theoretical folks are talking about Context Free Grammars Surely, CFG's are not powerful enough for programming languages, but there are other types of grammars (type 0 or unrestricted grammars are as powerful as any conceivable computer program; Church's hypothesis for Turing Machines). | To the theoretical folks .... .... Semantics to them | is the meaning associatecd with the text, i.e. "i = j;" has the | effect of changing the value of the storage associated with the | variable "i". | | The pragmatic programmers are referring to syntax and semantics | based upon the tools they use to build compilers. .... | .... If our parser generators were powerfull enough | to really parse the languages involved, this definition would | coincide fairly well with the theoretical one; the code generated | for a given sequence of characters expresses the semantics of that | sequence. No, the code generated is just another string of symbols. The semantics is given by the execution on a target machine. Thus the definition above 'coincides fairly well' with the theoretical definition of *syntax*; code generation may be regarded as a purely *syntactical* transformation, so (substitute transform for check): | .... we get code | that is really checking syntax, but that our pragmatic programmers | have fallen into the habit of calling "semantic code", because | it wasn't built by the parser generator. A member(element) of {theoretic folks} INTERSECT {pragmatic programmers} Mark R. Kramer, Department of Computer Science, University of Utrecht Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands Phone: +31 - 30 - 531937 | UUCP : ...!hp4nl!ruuinf!kramer | INTERNET: kramer@cs.ruu.nl