[comp.lang.misc] syntax and semantics

mmengel@cuuxb.ATT.COM (Marc W. Mengel) (04/10/89)

	The reason this discussion has generated so much discussion
	is that we have a clash between the theory-of-computation
	folks and the pragmatic programmer types. (Please pardon the
	arbitrary distinction, many people belong to both classes)

	The theoretical folks are talking about Context Free Grammars
	i.e. grammars for sets of strings recognizable by a Pushdown 
	Automata.  This is a strict set-theoritical definition.  CFG's
	*cannot* recognize C, as C requires a variable to be declared
	before it is used, and the Pumping lemma tells us that this
	is beyond the scope of a CFG when there are n>3 occurences
	of the variable name.  'nuff said.

	To the theoretical folks, syntax is the form of the code; requirements
	like "previously declared" are syntactic, for example, as are
	requirements like balanced parenthesis, etc.  Semantics to them
	is the meaning associatecd with the text, i.e. "i = j;" has the
	effect of changing the value of the storage associated with the
	variable "i".

	The pragmatic programmers are referring to syntax and semantics
	based upon the tools they use to build compilers.  To them,
	the stuff built by lex and yacc (or flex and bison, or whatever
	tools they use to build parsers) is the "syntactic code" and
	any stuff they add onto it (code generation, etc.) is 
	"semantic code".  If our parser generators were powerfull enough
	to really parse the languages involved, this definition would
	coincide fairly well with the theoretical one; the code generated
	for a given sequence of characters expresses the semantics of that
	sequence.

	However, we have languages that our parser generators cannot
	parse into the parse trees we want without help, and that
	cannot by their very nature solve some problems like checking
	if variables have been declared, etc.  Therefore we get code
	that is really checking syntax, but that our pragmatic programmers
	have fallen into the habit of calling "semantic code", because
	it wasn't built by the parser generator.

	So basically, to sum up the long discussion here:

	1) you cannot precisely recognize C with just a Context Free
	   Grammar.

	2) you can *extend* a yacc/lex parser by adding a symbol
	   table that maintains typedef names and variable types
	   so that it properly recognizes C *and* builds a parse
	   tree you like.

	3) you can build a plain yacc/lex parser that recognizes a
	   superset of C, but it is incapable of distinguishing
	   declared variables from undeclared variables, and therefore
	   incapable of distinguishing between the various parse trees 
	   constructable from "word2 ( word1 );" which require that 
	   sort of information.
	
-- 
 Marc Mengel					mmengel@cuuxb.att.com
 						attmail!mmengel
 						...!{lll-crg|att}!cuuxb!mmengel

kramer@ruuinf.cs.ruu.nl (mark kramer) (04/13/89)

In article <2700@cuuxb.ATT.COM>, mmengel@cuuxb.ATT.COM (Marc W. Mengel) writes:
| The reason this discussion has generated so much discussion
| is that we have a clash between the theory-of-computation
| folks and the pragmatic programmer types. ....
| 
| The theoretical folks are talking about Context Free Grammars
Surely, CFG's are not powerful enough for programming languages, but
there are other types of grammars (type 0 or unrestricted grammars are as
powerful as any conceivable computer program; Church's hypothesis for
Turing Machines).

| To the theoretical folks ....            .... Semantics to them
| is the meaning associatecd with the text, i.e. "i = j;" has the
| effect of changing the value of the storage associated with the
| variable "i".
| 
| The pragmatic programmers are referring to syntax and semantics
| based upon the tools they use to build compilers.  ....
| ....  If our parser generators were powerfull enough
| to really parse the languages involved, this definition would
| coincide fairly well with the theoretical one; the code generated
| for a given sequence of characters expresses the semantics of that
| sequence.
No, the code generated is just another string of symbols.
The semantics is given by the execution on a target machine.
Thus the definition above 'coincides fairly well' with the theoretical
definition of *syntax*; code generation may be regarded as a purely
*syntactical* transformation, so (substitute transform for check):
|                                             .... we get code
| that is really checking syntax, but that our pragmatic programmers
| have fallen into the habit of calling "semantic code", because
| it wasn't built by the parser generator.

A member(element) of {theoretic folks} INTERSECT {pragmatic programmers}

Mark R. Kramer, Department of Computer Science, University of Utrecht
Padualaan 14,   P.O. Box 80.089,   3508 TB  Utrecht,  The Netherlands
Phone: +31 - 30 - 531937        |  UUCP    : ...!hp4nl!ruuinf!kramer
                                |  INTERNET: kramer@cs.ruu.nl