mmengel@cuuxb.ATT.COM (Marc W. Mengel) (04/10/89)
The reason this discussion has generated so much discussion
is that we have a clash between the theory-of-computation
folks and the pragmatic programmer types. (Please pardon the
arbitrary distinction, many people belong to both classes)
The theoretical folks are talking about Context Free Grammars
i.e. grammars for sets of strings recognizable by a Pushdown
Automata. This is a strict set-theoritical definition. CFG's
*cannot* recognize C, as C requires a variable to be declared
before it is used, and the Pumping lemma tells us that this
is beyond the scope of a CFG when there are n>3 occurences
of the variable name. 'nuff said.
To the theoretical folks, syntax is the form of the code; requirements
like "previously declared" are syntactic, for example, as are
requirements like balanced parenthesis, etc. Semantics to them
is the meaning associatecd with the text, i.e. "i = j;" has the
effect of changing the value of the storage associated with the
variable "i".
The pragmatic programmers are referring to syntax and semantics
based upon the tools they use to build compilers. To them,
the stuff built by lex and yacc (or flex and bison, or whatever
tools they use to build parsers) is the "syntactic code" and
any stuff they add onto it (code generation, etc.) is
"semantic code". If our parser generators were powerfull enough
to really parse the languages involved, this definition would
coincide fairly well with the theoretical one; the code generated
for a given sequence of characters expresses the semantics of that
sequence.
However, we have languages that our parser generators cannot
parse into the parse trees we want without help, and that
cannot by their very nature solve some problems like checking
if variables have been declared, etc. Therefore we get code
that is really checking syntax, but that our pragmatic programmers
have fallen into the habit of calling "semantic code", because
it wasn't built by the parser generator.
So basically, to sum up the long discussion here:
1) you cannot precisely recognize C with just a Context Free
Grammar.
2) you can *extend* a yacc/lex parser by adding a symbol
table that maintains typedef names and variable types
so that it properly recognizes C *and* builds a parse
tree you like.
3) you can build a plain yacc/lex parser that recognizes a
superset of C, but it is incapable of distinguishing
declared variables from undeclared variables, and therefore
incapable of distinguishing between the various parse trees
constructable from "word2 ( word1 );" which require that
sort of information.
--
Marc Mengel mmengel@cuuxb.att.com
attmail!mmengel
...!{lll-crg|att}!cuuxb!mmengelkramer@ruuinf.cs.ruu.nl (mark kramer) (04/13/89)
In article <2700@cuuxb.ATT.COM>, mmengel@cuuxb.ATT.COM (Marc W. Mengel) writes: | The reason this discussion has generated so much discussion | is that we have a clash between the theory-of-computation | folks and the pragmatic programmer types. .... | | The theoretical folks are talking about Context Free Grammars Surely, CFG's are not powerful enough for programming languages, but there are other types of grammars (type 0 or unrestricted grammars are as powerful as any conceivable computer program; Church's hypothesis for Turing Machines). | To the theoretical folks .... .... Semantics to them | is the meaning associatecd with the text, i.e. "i = j;" has the | effect of changing the value of the storage associated with the | variable "i". | | The pragmatic programmers are referring to syntax and semantics | based upon the tools they use to build compilers. .... | .... If our parser generators were powerfull enough | to really parse the languages involved, this definition would | coincide fairly well with the theoretical one; the code generated | for a given sequence of characters expresses the semantics of that | sequence. No, the code generated is just another string of symbols. The semantics is given by the execution on a target machine. Thus the definition above 'coincides fairly well' with the theoretical definition of *syntax*; code generation may be regarded as a purely *syntactical* transformation, so (substitute transform for check): | .... we get code | that is really checking syntax, but that our pragmatic programmers | have fallen into the habit of calling "semantic code", because | it wasn't built by the parser generator. A member(element) of {theoretic folks} INTERSECT {pragmatic programmers} Mark R. Kramer, Department of Computer Science, University of Utrecht Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands Phone: +31 - 30 - 531937 | UUCP : ...!hp4nl!ruuinf!kramer | INTERNET: kramer@cs.ruu.nl