[comp.lang.misc] Complexity of syntax

diamond@tkou02.enet.dec.com ("diamond@tkovoa") (12/06/90)

Most of the syntax of C is not particularly difficult.
However, there is some pretty solid evidence about the declaration syntax.
For how many languages have people written, and recommended to others to
use, programs that translate declarations to English and vice-versa?
 
Also I vaguely recall some customer of the Unix operating system and C
language having a small problem, because one of their programmers put a
"break" statement in a compound statement, but it was the wrong kind of
compound statement so the "break" exited from a larger one instead.
 
Other syntactic structures that are equally complex could provide better
detection and recovery from user errors.
 
--
Norman Diamond       diamond@tkov50.enet.dec.com
If this were the company's opinion, I wouldn't be allowed to post it.

thornley@cs.umn.edu (David H. Thornley) (12/09/90)

In article <9012061208.AA08577@decpa.pa.dec.com> diamond@tkou02.enet.dec.com ("diamond@tkovoa") writes:
>Most of the syntax of C is not particularly difficult.
>However, there is some pretty solid evidence about the declaration syntax.
>For how many languages have people written, and recommended to others to
>use, programs that translate declarations to English and vice-versa?
> 
How many languages support arrays of pointers to functions returning
pointers to functions that return pointers to integers?  If you limit
yourself to data structures that would work in FORTRAN or Pascal, you
have no need for such programs.
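
(For the record, the declaration in question, with `a' as an
illustrative name, is exactly the sort of thing cdecl exists to
untangle:

	int *(*(*a[10])())();

reads as: a is an array of 10 pointers to functions returning
pointers to functions returning pointer to int.)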

>Also I vaguely recall some customer of the Unix operating system and C
>language having a small problem, because one of their programmers put a
>"break" statement in a compound statement, but it was the wrong kind of
>compound statement so the "break" exited from a larger one instead.
> 
AT&T, I think.  (Some phone company, anyway.)  It wasn't a "small"
problem either :-).  From what I saw of it, it looked like the sort of
mistake that could happen with any language, and was due to
misunderstanding the language and failing to test the part that failed.

I assume you've heard about DO 10 I = 1. 10 (parsed as DO10I = 1.10)
ad nauseam.  Weinberg, in _The_Psychology_of_Computer_Programming_,
had a nice story about a FORTRAN program where a variable in a massive
simulation program had been spelled two different ways in the program.
The bug was found after the simulations had been conducted, a book
written, and a couple of systems designed based on that information.

>Other syntactic structures that are equally complex could provide better
>detection and recovery from user errors.
> 
No doubt.  How much better?  Enough to justify moving *lots* of software
from C?  

How about FORTRAN?  Actually, simply requiring *all* variables to be
declared would solve a *lot* of problems, and break a *lot* of existing
code.

DHT

Bruce.Hoult@bbs.actrix.gen.nz (12/10/90)

In article <1990Dec9.013923.14456@cs.umn.edu> thornley@cs.umn.edu (David H. Thornley) writes:

>>Most of the syntax of C is not particularly difficult.
>>However, there is some pretty solid evidence about the declaration syntax.
>>For how many languages have people written, and recommended to others to
>>use, programs that translate declarations to English and vice-versa?
>> 
>How many languages support arrays of pointers to functions returning
>pointers to functions that return pointers to integers?  If you limit
>yourself to data structures that would work in FORTRAN or Pascal, you
>have no need for such programs.

Pascal only narrowly misses out on being able to do this, and its cousins
Modula-2 and Oberon can both do it far more readably than can C.
-- 
Bruce.Hoult@bbs.actrix.gen.nz   Twisted pair: +64 4 772 116
BIX: brucehoult                 Last Resort:  PO Box 4145 Wellington, NZ

smryan@garth.UUCP (Steven Ryan) (12/18/90)

>How many languages support arrays of pointers to functions returning
>pointers to functions that return pointers to integers?  If you limit
>yourself to data structures that would work in FORTRAN or Pascal, you
>have no need for such programs.

I already answered this kind of question before. In Algol 68, it's
	[]proc proc ref int

How many languages support arrays of functions returning arrays of
functions returning arrays of functions..... [*]

Two recursive mode definitions in Algol 68 are
	(1)  mode cell = struct(int data, ref cell link)
	(2)  mode list = proc(list)list
Translated into C these would be
	(1)  typedef struct cell{int data; struct cell *link;} cell;
	(2)  cannot be expressed in C

I suppose if you limit yourself to data structures that would work in C,
you have no need for such programs using (2).
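
The nearest C can get to (2) is to break the recursion with a struct
wrapper, which changes the type rather than expressing it. A sketch,
with invented names:

	struct list;                                /* incomplete so far */
	typedef struct list (*listfn)(struct list); /* fn over the wrapper */
	struct list { listfn fn; };                 /* now complete */

	struct list identity(struct list l) { return l; }

Every application must then go through the .fn member, so the
recursion lives in the struct, not in the type of the function.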

Of course, CLU and other more recent languages would permit you to write
something akin to
	(3)  mode cell m = struct(m data, ref cell m link)
although in fact this cannot be written in C or Algol 68, so I suppose
if you limit.......

_______________________________________________________________________
* I don't have a copy of revised definition at hand, so I'm not sure if
	mode func = []proc func
is well formed. However
	mode func = []proc(int)func
is.
-- 
...!uunet!ingr!apd!smryan                                       Steven Ryan
...!{apple|pyramid}!garth!smryan              2400 Geng Road, Palo Alto, CA

kinnersley@kuhub.cc.ukans.edu (Bill Kinnersley) (12/19/90)

In article <20@garth.UUCP>, smryan@garth.UUCP (Steven Ryan) writes:
> 
> How many languages support arrays of functions returning arrays of
> functions returning arrays of functions..... [*]
> 

But it seems to me that this will single out languages that fit a particular
mold, rather than languages with power. If you want a language with

1) Functions as first-class data (most applicative languages)

2) Arrays (mostly imperative languages)

3) Strong type system

there's nothing left but Algol-68 and C.


If you're satisfied with lists in place of arrays, that makes it easier.

Icon, for example, will let you make lists of functions and return them
from functions.  But it has no type declarations.

Otherwise, ML, Miranda and Russell.

-- 
--Bill Kinnersley

dts@cs.edinburgh.ac.uk (Don Sannella) (12/20/90)

In article <27534.276e5bd3@kuhub.cc.ukans.edu>, kinnersley@kuhub.cc.ukans.edu (Bill Kinnersley) writes:
> ...  If you want a language with
> 1) Functions as first-class data (most applicative languages)
> 2) Arrays (mostly imperative languages)
> 3) Strong type system
> there's nothing left but Algol-68 and C.

... and ML.

Don Sannella
University of Edinburgh

P.S. Does C have a strong type system?  This must be some new development
I haven't heard about.

augustss@cs.chalmers.se (Lennart Augustsson) (12/20/90)

In article <27534.276e5bd3@kuhub.cc.ukans.edu> kinnersley@kuhub.cc.ukans.edu (Bill Kinnersley) writes:
> ..., rather than languages with power. If you want a language with
>
>1) Functions as first-class data (most applicative languages)
>
>2) Arrays (mostly imperative languages)
>
>3) Strong type system
>
>there's nothing left but Algol-68 and C.
>
C does not have functions as first class data, you can only use functions
pointers as data.  To see the difference just try to write a function
that does function composition.  For simplicity assume that all functions
take and return int, i.e. write an
	int (*compose(f, g))()
	int (*f)();
	int (*g)();
(ANSI-fy if you like)
so that (*compose(f,g))(x) == f(g(x)).

I claim that this is impossible to do in a portable way.
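
To make the difficulty concrete, here is a sketch of the closest
portable attempt (ANSI-fied, names invented): the returned function
must smuggle f and g through file-scope variables, so only one
composition can be live at a time -- which is exactly why it is not
a real solution.

	static int (*comp_f)(int), (*comp_g)(int);

	static int composed(int x) { return comp_f(comp_g(x)); }

	int (*compose(int (*f)(int), int (*g)(int)))(int)
	{
		comp_f = f;
		comp_g = g;
		return composed;
	}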


	-- Lennart Augustsson
Email:	augustss@cs.chalmers.se

gudeman@cs.arizona.edu (David Gudeman) (12/20/90)

In article  <1990Dec19.222059.6878@mathrt0.math.chalmers.se> Lennart Augustsson writes:
]>
]C does not have functions as first class data, you can only use functions
]pointers as data.  To see the difference just try to write a function
]that does function composition.

C does not let you construct functions at run time (which makes the
first-class functions rather anemic) but C does have first class
functions by the usual definition of the term.  The term "first-class"
just means that the type of object in question can be passed as an
argument to procedures and returned from procedures as a result.  In
procedural languages like C, it also means that you can assign the
values to variables.  First-class does not imply any particular
primitive operations on the objects (like construction), and the only
operation that an object needs to make it a function is function
application.

It is not really relevant that the things you pass around are
described in the language documentation as "function pointers".  They
behave exactly like a function.
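
For the record, here is everything C does let you do with function
values, in a few lines (ANSI C, names invented):

  int inc(int x) { return x + 1; }

  int apply(int (*f)(int), int x) { return f(x); }  /* passed in */

  int (*choose(void))(int) { return inc; }          /* returned */

  int demo(void)
  {
    int (*f)(int) = choose();                       /* assigned */
    return apply(f, 41);                            /* yields 42 */
  }

Everything, that is, except constructing a new function at run time.
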
--
					David Gudeman
gudeman@cs.arizona.edu
noao!arizona!gudeman

thornley@cs.umn.edu (David H. Thornley) (12/21/90)

In article <20@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>>How many languages support arrays of pointers to functions returning
>>pointers to functions that return pointers to integers?  If you limit
>>yourself to data structures that would work in FORTRAN or Pascal, you
>>have no need for such programs.
>
>[Listing several neat things to do with Algol 68 declarations]
>
>I suppose if you limit yourself to data structures that would work in C,
>you have no need for such programs using (2).
>

Actually, my comment was in reference to the existence of cdecl
to help declare data structures.

Is there an algol68decl program?  :-)
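
For those who haven't met it, a cdecl session looks roughly like this
(the exact phrasing is from memory, so treat it as approximate):

	cdecl> declare a as array 10 of pointer to function returning pointer to int
	int *(*a[10])()
	cdecl> explain int *(*a[10])()
	declare a as array 10 of pointer to function returning pointer to int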

DHT

sommar@enea.se (Erland Sommarskog) (12/26/90)

Also sprach Bill Kinnersley (kinnersley@kuhub.cc.ukans.edu):
>1) Functions as first-class data (most applicative languages)
>2) Arrays (mostly imperative languages)
>3) Strong type system
>there's nothing left but Algol-68 and C.

I speak neither Algol-68 nor C, but from what I know of C, it does
not have a very strong type system. Neither do I know exactly what
is required for functions to count as first-class data, but I would
guess that Modula-2 and Ob(solet)eron would qualify at least as well
as C. I wouldn't say that any one of them provides a type system
strong enough for my taste, though. By the way, I have never
understood how strong the type system is in Cobol, but it is
possible that it would qualify as well.

(A type system strong enough: a language where I can declare types
that have the same implementation, e.g. integer, and which are only
compatible through explicit conversions.)
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
"There is only one success, namely to lead your life in your own way"
Anyone who can give a source for this?

augustss@cs.chalmers.se (Lennart Augustsson) (12/29/90)

In article <304@coatimundi.cs.arizona.edu> gudeman@cs.arizona.edu (David Gudeman) writes:
>In article  <1990Dec19.222059.6878@mathrt0.math.chalmers.se> Lennart Augustsson writes:
>]>
>]C does not have functions as first class data, you can only use functions
>]pointers as data.  To see the difference just try to write a function
>]that does function composition.
>
>C does not let you construct functions at run time (which makes the
>first-class functions rather anemic) but C does have first class
>functions by the usual definition of the term.  The term "first-class"
>just means that the type of object in question can be passed as an
>argument to procedures and returned from procedures as a result.  In
>procedural languages like C, it also means that you can assign the
>values to variables.  First-class does not imply any particular
>primitive operations on the objects (like construction), and the only
>operation that an object needs to make it a function is function
>application.
>
I was a bit sloppy in my posting since I did not make clear what I mean
by "first class".  There are several definitions, but the one I have in
mind is that for a type to be first class it should be possible to do the
same things with its objects as with objects of other types, such as
integers.  What this means depends on the language, but typical things would
be: send as an argument, return as a result, be expressible without giving it a
name (this is a crucial point where C fails for functions: you can write 5 as it is
(i.e. you don't have to write "int five=5;"), but you cannot write an unnamed function),
read, write, etc.

C allows passing and returning, but not the others (well, C as such does not allow
reading/writing of integers either).  C also lacks first class structs (in my
definition of the term), because you cannot write unnamed struct values (you
can in GCC).
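
A sketch of what the GCC extension allows, as I understand it (GCC's
``constructor expression'' syntax; treat the details as approximate):

	struct point { int x, y; };

	struct point origin(void)
	{
		/* an unnamed struct value; standard C forces a
		   named temporary here */
		return (struct point) { 0, 0 };
	}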

	-- Lennart Augustsson
[This signature is intentionally left blank.]

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/03/91)

In article <1990Dec29.101233.1894@mathrt0.math.chalmers.se> augustss@cs.chalmers.se (Lennart Augustsson) writes:
  [ included in a definition of ``first-class'': ]
> be expressible without giving it a
> name

First-class refers to semantics alone. Why are you trying to include
syntax?

> (this is a crucial point where C fails for functions,
  [ you can't write an unnamed function ]

Huh? Who cares? To get an unnamed function you just have to declare the
function and not use its name. The fact that you can't write an unnamed
function is not a problem for programmers.

In contrast, pre-ANSI C structures were truly not first-class objects.
You couldn't pass them by value. This made some problems genuinely more
difficult.

This is just another example of why syntax is a lot less important than
semantics.

---Dan

anw@maths.nott.ac.uk (Dr A. N. Walker) (01/04/91)

In article <18747:Jan220:02:1091@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>Huh? Who cares? To get an unnamed function you just have to declare the
>function and not use its name. The fact that you can't write an unnamed
>function is not a problem for programmers.

	Well, it *is* a problem.  You have to invent a name.  Whether
that is a *serious* problem depends on how many of these things there
are, and on how inventive the programmers are, but having programs
liberally bespattered with f1, f2, f3, ..., f17, ..., f923, ... really
isn't very pretty.  What's more, in many languages, including C and
Pascal, the declarations of f1 and its friends cannot usually be put
close to their uses, which means that when you encounter something like

	z = 2 * sin (theta) * integral (f17, 0, pi) + ... f18, etc, ...;

in a program, you're forever scanning around trying to find out what
f17, f18, ... mean.  If declarations *can* be put close to uses, then
we get something like

	{ float f17 (float x) { return blah; },
		f18 (...),
		...;
	  z = ...; }

which is better, but why *not* allow

	z = 2 * sin (theta) * integral ((float x)(blah), 0, pi) + ...;

(or some such)?  [A quick flip through some dusty decks reveals many,
many such anonymous procedures in my own programming practice, and very
convenient they were, too, in a wide variety of applications.]

>This is just another example of why syntax is a lot less important than
>semantics.

	That's a matter of semantics! [:-)]

	Firstly, the line between the two is distinctly blurred.  For
example, is the restriction that you can't declare the same identifier
twice in the same declaration syntactic or semantic?  Are the C
pointer/array relationships syntax or semantics?  One could put forward
respectable arguments either way.

	Secondly, the syntax seriously influences the "software
engineering" quality of a language.  For example, in C, you have a
good chance of getting a procedure and everything you need to know
to understand it onto one screen;  in Pascal, this is very frequently
impossible, both because of the verbosity of Pascal and because of the
restrictions on order of declaration;  in APL, a screenful of code is
likely to be impossible to understand [:-)].

	To be effectively used, a language must be both powerful
(within its chosen problem domain) and understandable (to the
appropriate users).  Both syntax and semantics (wherever the line
is drawn) are absolutely vital for this.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/05/91)

In article <1991Jan4.152846.15917@maths.nott.ac.uk> anw@maths.nott.ac.uk (Dr A. N. Walker) writes:
> In article <18747:Jan220:02:1091@kramden.acf.nyu.edu>
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >Huh? Who cares? To get an unnamed function you just have to declare the
> >function and not use its name. The fact that you can't write an unnamed
> >function is not a problem for programmers.
> 	Well, it *is* a problem.  You have to invent a name.

Every coding stylebook I've seen recommends that you give names to
everything. It's a very good idea, you know.

> but having programs
> liberally bespattered with f1, f2, f3, ..., f17, ..., f923, ... really
> isn't very pretty.

Which is why you choose more meaningful names. Like fxpc, a function
that adds c to x.

> What's more, in many languages, including C and
> Pascal, the declarations of f1 and its friends cannot usually be put
> close to their uses,

Huh?

  #define FXPC \
  double fxpc(x) double x; \
  { return x + c; }

  z = 2 * sin(theta) * integral(fxpc, 0, pi) + ...;

Then just dump FXPC at the end of the program, and of course an fxpc
declaration in a header file. That looks pretty close to me, and I find
it more readable than these ludicrous attempts to put everything on one
line.

> but why *not* allow
> 	z = 2 * sin (theta) * integral ((float x)(blah), 0, pi) + ...;

Who cares? How about because it's a pain to parse, compile, and read?

  [ syntax versus semantics ]
> For
> example, is the restriction that you can't declare the same identifier
> twice in the same declaration syntactic or semantic?

Syntax. It has no effect on semantics.

> Are the C
> pointer/array relationships syntax or semantics?

Depends which one you're talking about. The fact that the evaluation of
x[n] is defined as the evaluation of *((x) + (n)) is syntax.

> 	Secondly, the syntax seriously influences the "software
> engineering" quality of a language.

I agree that syntax is an extremely important part of a language. That's
why I'm spending a lot of time on Q's syntax.

---Dan

jwl@garnet.berkeley.edu (James Wilbur Lewis) (01/05/91)

In article <11883:Jan502:21:0191@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
-  [ syntax versus semantics ]
-> For
-> example, is the restriction that you can't declare the same identifier
-> twice in the same declaration syntactic or semantic?
-
-Syntax. It has no effect on semantics.

Gee, all the C grammars I've seen allow the same identifier to be declared
any number of times.  If you can show me a C declaration grammar that 
disallows multiple occurrences of the same identifier, but generates all
valid C declarations, I'll eat my hat!

-- Jim Lewis (computer scientist, and damn proud of it)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/05/91)

In article <1991Jan5.044445.15116@agate.berkeley.edu> jwl@garnet.berkeley.edu (James Wilbur Lewis) writes:
> In article <11883:Jan502:21:0191@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> -  [ syntax versus semantics ]
> -> For
> -> example, is the restriction that you can't declare the same identifier
> -> twice in the same declaration syntactic or semantic?
> -Syntax. It has no effect on semantics.
> Gee, all the C grammars I've seen allow the same identifier to be declared
> any number of times.

Who said he was referring specifically to C?

---Dan

jwl@garnet.berkeley.edu (James Wilbur Lewis) (01/05/91)

In article <13857:Jan506:12:5891@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
-In article <1991Jan5.044445.15116@agate.berkeley.edu> jwl@garnet.berkeley.edu (James Wilbur Lewis) writes:
-> In article <11883:Jan502:21:0191@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
-> -  [ syntax versus semantics ]
-> -> For
-> -> example, is the restriction that you can't declare the same identifier
-> -> twice in the same declaration syntactic or semantic?
-> -Syntax. It has no effect on semantics.
-> Gee, all the C grammars I've seen allow the same identifier to be declared
-> any number of times.
-
-Who said he was referring specifically to C?

No one did, although it was a reasonable thing to infer from the context.
Perhaps the poster who asked the above question can verify that he was 
referring to C.  It doesn't really matter; my challenge still stands.

Choose any language you like, or make one up, subject to the conditions of
the original question: that it allows a large number of identifiers to be 
declared in a single declaration, with the restriction that each identifier 
may appear at most once in that declaration.  I believe that you're wrong to 
refer to that restriction as "syntactic", and am interested in seeing what 
kind of grammar you can write that will generate all the valid declarations 
for that language (and no invalid declarations).  Please show *all* your 
work. :-)

-- Jim Lewis

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/05/91)

In article <1991Jan5.081755.23488@agate.berkeley.edu> jwl@garnet.berkeley.edu (James Wilbur Lewis) writes:
> Choose any language you like, or make one up, subject to the conditions of
> the original question: that it allows a large number of identifiers to be 
> declared in a single declaration, with the restriction that each identifier 
> may appear at most once in that declaration.  I believe that you're wrong to 
> refer to that restriction as "syntactic", and am interested in seeing what 
> kind of grammar you can write that will generate all the valid declarations 
> for that language (and no invalid declarations).

One of the failures of the modern computer science curriculum is that it
teaches people to think that ``syntax'' refers to a certain type of
formal grammar.

To answer the (trivial) challenge: Parse each declaration into its
multiset of identifiers. If the multiset contains any duplicates, the
declaration is illegal. Otherwise it passes this stage of parsing. Done.
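
In code, the test is a dozen lines. A sketch in C, assuming the
identifiers have already been extracted into an array:

  #include <stdlib.h>
  #include <string.h>

  static int cmp(const void *a, const void *b)
  { return strcmp(*(char *const *)a, *(char *const *)b); }

  /* returns 1 iff some identifier occurs more than once */
  int has_duplicates(char **ids, size_t n)
  {
    size_t i;
    qsort(ids, n, sizeof *ids, cmp);
    for (i = 1; i < n; i++)
      if (strcmp(ids[i-1], ids[i]) == 0) return 1;
    return 0;
  }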

---Dan

jwl@garnet.berkeley.edu (James Wilbur Lewis) (01/06/91)

In article <14679:Jan509:13:1391@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
-In article <1991Jan5.081755.23488@agate.berkeley.edu> jwl@garnet.berkeley.edu (James Wilbur Lewis) writes:
-> Choose any language you like, or make one up, subject to the conditions of
-> the original question: that it allows a large number of identifiers to be 
-> declared in a single declaration, with the restriction that each identifier 
-> may appear at most once in that declaration.  I believe that you're wrong to 
-> refer to that restriction as "syntactic", and am interested in seeing what 
-> kind of grammar you can write that will generate all the valid declarations 
-> for that language (and no invalid declarations).
-
-One of the failures of the modern computer science curriculum is that it
-teaches people to think that ``syntax'' refers to a certain type of
-formal grammar.

I didn't restrict you to any particular *type* of grammar!  I would accept
BNF, yacc input, a regular expression, a recursive transition network,
a context-sensitive grammar...but make no mistake, syntax IS about
*grammar*, both in linguistics and computer science!  To quote from
one of my NLP texts:

  ``Traditional notions of syntax include ideas like "part of speech" and
"phrase marker" in discussing the structure of a sentence.''  [Birnbaum
and Selfridge, from _Inside Computer Understanding_, p. 331]

It is meaningless to talk about parts of speech, phrase markers, or (I claim)
any other syntactic issues, without at least implicit reference to a grammar.

The construction "int x, y, z, x;" is an invalid C declaration not because
it violates C syntax -- it doesn't, for any reasonable grammar -- but
simply because C compilers do not assign meaning (semantics!) to a program
containing this construct (even though a reasonable interpretation exists).
And what are we to make of "int x; float x;"?  This is *clearly* a
semantic error, sort of a C version of "Colorless green ideas sleep furiously."

-To answer the (trivial) challenge: Parse each declaration into its
-multiset of identifiers. If the multiset contains any duplicates, the
-declaration is illegal. Otherwise it passes this stage of parsing. Done.

I asked for a grammar and you gave me a decision procedure.  *BZZZZZT*,
thank you for playing; your consolation prize is a one-week subscription
to sci.lang. :-)

-- Jim Lewis

oz@yunexus.yorku.ca (Ozan Yigit) (01/06/91)

In article <14679:Jan509:13:1391@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>One of the failures of the modern computer science curriculum is that it
>teaches people to think that ``syntax'' refers to a certain type of
>formal grammar.

... and some people are destined to eat the menu instead of the meal.

Dan, I am eagerly awaiting your ``fix'' to the past or present computer
science curriculum. If you have something important (!) to say, why not
just say it, instead of repeatedly implying that you have something important?

Here, I'll be nice, and give you something on the topic I happen to have
around, just to get you started:

Minsky, Computation: Finite and Infinite Machines, p. 226: [1]
   
   ... we defined a rule of inference to be an effective test to decide
   whether a string s can be deduced from a set of strings s1,...sn. We
   required the test to be effective so that we could require the test of a
   proof to be effective. But we did not really tie things down securely
   enough, for one might still make a rule of inference depend (in some
   effective way) upon some understanding of what the strings ``mean''.
   That is, one might have some rule of inference depend upon a certain
   ``interpretation'' of the strings as asserting things about some
   well-understood subject matter; the strings might, for example, be
   sentences in English. (For example, the strings in chapter 4 - the
   ``regular expressions'' were understood to represent the ``regular sets'',
   and our proofs about regular expressions used references to these
   understood meanings.)
   
   To avoid such dangerous questions, we propose to restrict ourselves to
   rules of inference that concern themselves entirely with the arrangement
   of symbols within strings -- i.e., to the visible form of the strings as
   printed on a page -- and we rule out reference to meanings. This will
   force us, for the present, to direct our attention toward what is often
   called the domain of ``syntax'' -- questions about how expressions are
   assembled and analysed -- rather than the domain of ``semantics'' --
   questions of the meanings of expressions.

[deja vu !!] 

cheers... oz
---
[1] Minsky, Marvin, Computation: Finite and Infinite Machines,
    Prentice-hall Series in Automatic Computation, Englewood Cliffs, 
    N.J., 1967
---
The king: If there's no meaning	   	    Usenet:    oz@nexus.yorku.ca
in it, that saves a world of trouble        ......!uunet!utai!yunexus!oz
you know, as we needn't try to find any.    Bitnet: oz@[yulibra|yuyetti]
Lewis Carroll (Alice in Wonderland)         Phonet: +1 416 736-5257x3976

thorinn@diku.dk (Lars Henrik Mathiesen) (01/07/91)

jwl@garnet.berkeley.edu (James Wilbur Lewis) writes:

>Choose any language you like, or make one up, subject to the conditions of
>the original question: that it allows a large number of identifiers to be 
>declared in a single declaration, with the restriction that each identifier 
>may appear at most once in that declaration.  I believe that you're wrong to 
>refer to that restriction as "syntactic", and am interested in seeing what 
>kind of grammar you can write that will generate all the valid declarations 
>for that language (and no invalid declarations).  Please show *all* your 
>work. :-)

Being a quite unoriginal type of person (or perhaps I'm just lazy)
I'll give a reference instead: A. van Wijngaarden et al., Revised
Report on the Algorithmic Language ALGOL 68, _Acta_Informatica_ 5,
1975; reprinted Springer-Verlag, Berlin and New York, 1976.

In ALGOL 68 all context dependencies are described in a two-level (van
Wijngaarden) grammar; the rest of the formal description consists of
some basic definitions and an operational semantics. The semantics
does have some restrictions, but violating them just gives undefined
runtime behaviour (it's stuff like dereferencing null pointers and
dangling pointers).

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcsun!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn@diku.dk

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/07/91)

In article <19876@yunexus.YorkU.CA> oz@yunexus.yorku.ca (Ozan Yigit) writes:
>    ... we defined a rule of inference to be an effective test to decide
>    whether a string s can be deduced from a set of strings s1,...sn.

In the article that you're responding to I gave exactly such an
effective test. I conclude that the repeated-variable-in-declaration
problem is syntactic, by Minsky's definition (which I agree with). Done.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/07/91)

In article <1991Jan6.033646.9847@agate.berkeley.edu> jwl@garnet.berkeley.edu (James Wilbur Lewis) writes:
> -One of the failures of the modern computer science curriculum is that it
> -teaches people to think that ``syntax'' refers to a certain type of
> -formal grammar.
> I didn't restrict you to any particular *type* of grammar!  I would accept
> BNF, yacc input, a regular expression, a recursive transition network,
> a context-sensitive grammar...but make no mistake, syntax IS about
> *grammar*, both in linguistics and computer science!  To quote from
> one of my NLP texts:

See, this is exactly what I'm talking about.

> The construction "int x, y, z, x;" is an invalid C declaration not because
> it violates C syntax -- it doesn't, for any reasonable grammar -- but
> simply because C compilers do not assign meaning (semantics!) to a program
> containing this construct (even though a reasonable interpretation exists).

By that argument, every syntactic restriction is a semantic restriction,
because C compilers won't assign meaning to things that aren't
syntactically correct. What does this have to do with standard
definitions of ``syntax''?

> And what are we to make of "int x; float x;"?  This is *clearly* a
> semantic error,

What does this have to do with the question at hand?

> -To answer the (trivial) challenge: Parse each declaration into its
> -multiset of identifiers. If the multiset contains any duplicates, the
> -declaration is illegal. Otherwise it passes this stage of parsing. Done.
> I asked for a grammar and you gave me a decision procedure.  *BZZZZZT*,
> thank you for playing; your consolation prize is a one-week subscription
> to sci.lang. :-)

This is an effective parsing procedure independent of semantics, so by
Minsky's definitions it has to do with syntax. You have heard of Minsky,
I hope? Anyway, syntax does *not* mean any particular type of formal
grammar, or even a combination of such types.

---Dan

hilfingr@rama.cs.cornell.edu (Paul N. Hilfinger) (01/07/91)

In article <1991Jan6.033646.9847@agate.berkeley.edu> jwl@garnet.berkeley.edu (James Wilbur Lewis) writes:
>In article <14679:Jan509:13:1391@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
 ...
>-One of the failures of the modern computer science curriculum is that it
>-teaches people to think that ``syntax'' refers to a certain type of
>-formal grammar.
>
>I didn't restrict you to any particular *type* of grammar!  I would accept
>BNF, yacc input, a regular expression, a recursive transition network,
>a context-sensitive grammar...but make no mistake, syntax IS about
>*grammar*, both in linguistics and computer science!

Context-sensitive and type-0 grammars are indeed capable of expressing scope
rules (e.g., "identifiers must be declared in an enclosing scope
before use", "an identifier may only be declared once") and other
static restrictions (such as type restrictions).  The Algol 68
grammar, for example, completely describes what we generally call the
"static semantics" of that language.  Attribute grammars have similar
capabilities.  As a result, the term "syntax" really is used by some
to include static semantic properties.

Of course, the extensive use of context-free grammars of languages
augmented with semi-formal English text has resulted in the imprecise
use of the term "syntax" or "grammar" to refer to "context-free
syntax".  However, since Mr. Lewis appears not to be committing that
particular imprecision, I must disagree with his conclusion.

The distinction between "syntax" and "semantics" is fuzzy.  Syntax is
supposed to be "structure" and semantics is supposed to be "meaning".
However, one can explain the incorrectness of an undeclared identifier
either by saying "it is nonsense to use an identifier that has no
defined meaning"---a semantic explanation---or "an identifier that
forms a primary-expression must be textually identical to a preceding
direct-declarator immediately within an enclosing translation-unit, or
compound-statement"---which makes no reference to the meaning of
anything and is structural or syntactic.  As an even more concrete
example of this fuzziness, consider the description of a language that
has only a fixed number of possible identifiers, but requires
definition before use.  It is possible, in principle, to define the
scope rules in this language either in (context-free) BNF or with
(context-sensitive) traditional English semantics.
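
To make the last point concrete: with only two possible identifiers,
a and b, the "declare once" rule fits in ordinary context-free BNF (a
sketch):

	decl:	int a ; | int b ; | int a , b ; | int b , a ;

With an unbounded identifier set the same trick needs infinitely many
productions, which is where the context-sensitive machinery (or the
English prose) comes in.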

Paul Hilfinger

oz@yunexus.yorku.ca (Ozan Yigit) (01/07/91)

In article <50360@cornell.UUCP> hilfingr@cs.cornell.edu
(Paul N. Hilfinger) writes:

|> ... As a result, the term "syntax" really is used by some
|> to include static semantic properties.

I think that, for the purposes of the current discussion (whatever
that was), the terminology must converge at some point, from a study
in the subtlety of meaning [which I sincerely doubt is attributable
to any of the posters in the current discussion] to something we can
all use to communicate and accomplish whatever we are after.

|> The distinction between "syntax" and "semantics" is fuzzy.  Syntax is
|> supposed to be "structure" and semantics is supposed to be "meaning".

Right. Here is what Stoy [1] has to say on this topic:

   Syntax deals with the free properties of a language. The sorts of
   question that arise here are: is X a grammatically correct program?
   What are the proper subexpressions of the expression E?
   
   The semantics of a language attempts to give the language an
   interpretation, and to supply a meaning, or value, to the expressions,
   programs etc. in the language.
   
   The boundary between syntax and semantics is a bit fuzzy in places.
   For example, is the question whether the occurrence of a name is within
   the scope of some definition of that name a matter of syntax or
   semantics?  We could regard it as a (context-sensitive) syntactic
   question, or we could do as most compilers do, and make it part of the
   semantics. We should not be worried by the fuzziness, however: the
   distinction remains obvious, important and useful for the majority of
   the cases.
   ...


I hope this is good enough to get to some sort of agreement as to what we
mean when we say "syntax", and "semantics".

oz
---
[1] Stoy, Joseph E., Denotational Semantics: The Scott-Strachey
Approach to Programming Language Theory, MIT Press, Cambridge, Mass. 1977
---
The king: If there's no meaning             Usenet:    oz@nexus.yorku.ca
in it, that saves a world of trouble        ......!uunet!utai!yunexus!oz
you know, as we needn't try to find any.    Bitnet: oz@[yulibra|yuyetti]
Lewis Carroll (Alice in Wonderland)         Phonet: +1 416 736-5257x3976

jwl@garnet.berkeley.edu (James Wilbur Lewis) (01/08/91)

In article <23484:Jan619:01:5391@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <19876@yunexus.YorkU.CA> oz@yunexus.yorku.ca (Ozan Yigit) writes:
>>    ... we defined a rule of inference to be an effective test to decide
>>    whether a string s can be deduced from a set of strings s1,...sn.
>
>In the article that you're responding to I gave exactly such an
>effective test. I conclude that the repeated-variable-in-declaration
>problem is syntactic, by Minsky's definition (which I agree with). Done.

There's something fishy with this line of reasoning.  You seem to think
that the existence of an effective procedure to distinguish valid from
non-valid strings implies that the property of "validity" is purely
syntactic.  Now in a sense, this is true because any effective recognition
procedure corresponds to some Type 0 grammar.  But this, to me, stretches
the traditional notion of "syntax" well past the breaking point.  Does the
existence of a Type 0 grammar for prime numbers imply that primeness is
a syntactic property?  Or what about an effective procedure for computing
the meaning (object code) of a C source file?  In principle, we could pair 
off the C strings with their meanings in some suitable notation and capture the 
relationship in an unrestricted grammar, and so (by this fishy definition)
render all discussions of "C semantics" meaningless.  

This is where the fuzziness between syntax and semantics comes from.  For
a given language, one can either specify it with a relatively simple
grammar and allow the possibility of syntactically valid but meaningless
strings, or use a more powerful grammar formalism and reduce most questions of
validity to purely syntactic issues.   

To me, this implies that when someone claims "X is a syntactic property
of language L", it is a reasonable thing to ask them what grammar they're 
using.  To be honest, I *was* surprised to hear that Algol 68 used a
context-sensitive grammar to syntactically enforce its declaration rules.
I'm still curious to see what the grammar rules look like -- that challenge
of mine wasn't totally facetious, though I'd better retract the part about
"eating my hat" if Dan comes up with a grammar having the desired
properties. :-)  But I'm still not going to be impressed with a 
non-constructive existence proof for a Type 0 grammar, for the reasons
outlined above.

We should also be aware of the difference between formal language specification
and the practical realities of compiler construction.  If feature X is
specified by a context-sensitive grammar, but implemented via semantic
actions in a parser using a context-free grammar, is X "syntax" or
"semantics"?  

I think any linguist would take a dim view of resorting to the existence
of a Type 0 grammar as an argument for some language feature being a
syntactic property.  According to Chomsky (quoted in Winograd, _Language
as a Cognitive Process_): "Reduction of the class of grammars is the
major goal of linguistic theory".  His characterization of the problem:

   "The gravest defect of the theory of transformational grammar is
    its enormous latitude and descriptive power.  Virtually anything
    can be expressed as a phrase-marker...virtually any imaginable
    rule can be described in transformational terms.  Therefore, a
    critical problem in making transformational grammar a substantive
    theory with explanatory force is to restrict the category of 
    admissible phrase-markers, admissible transformations, and admissible 
    derivations."

These arguments apply just as well to the languages that computer
scientists are accustomed to dealing with, so it's not surprising that
we tend to think in terms of CFL's when dealing with issues of syntax.
Not only is it not a defect in the CS curriculum (most of us, after all,
*have* been exposed to the more powerful classes of formal grammars), but
as Chomsky argues above, it's good linguistic sense.  And *that* is
about as close to the horse's mouth as we are going to get in this
discussion!

-- Jim Lewis

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/08/91)

In article <1991Jan7.202423.23420@agate.berkeley.edu> jwl@garnet.berkeley.edu (James Wilbur Lewis) writes:
> In article <23484:Jan619:01:5391@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >In article <19876@yunexus.YorkU.CA> oz@yunexus.yorku.ca (Ozan Yigit) writes:
> >>    ... we defined a rule of inference to be an effective test to decide
> >>    whether a string s can be deduced from a set of strings s1,...sn.
> >In the article that you're responding to I gave exactly such an
> >effective test. I conclude that the repeated-variable-in-declaration
> >problem is syntactic, by Minsky's definition (which I agree with). Done.
> There's something fishy with this line of reasoning.  You seem to think
> that the existence of an effective procedure to distinguish valid from
> non-valid strings implies that the property of "validity" is purely
> syntactic.

Well, not really. I think the procedure has to be not only effective in
the theoretical sense but also practical. I don't care for a parser that
doesn't run at a reasonable speed. In this case, though, recognizing
doubled strings is quite practical.

> In principle, we could pair 
> off the C strings with their meanings in some suitable notation and capture the 
> relationship in an unrestricted grammar,  and so (by this fishy defininition)
> render all discussions of "C semantics" meaningless.  

I see what you're saying, but I don't agree. It really is possible to
define a language feature in such a way that it cannot be turned into
syntax.

Let's come down out of the clouds for a moment and consider a real
example. Is ``(a % b) + (a / b) * b always equals a,'' for integral a
and non-zero integral b, a semantic property or a syntactic property?

``Obviously semantic'' is everyone's first response. But consider the
integers from -32768 to 32767. An operation on integers in this range
doesn't have to be specified mathematically. It can be specified as just
one really big switch statement. Inside that switch statement, numbers
lose their meaning as such. ``57'' as a string reflects everything that
57 as a number does, because ``57'' only appears at a finite number of
spots inside our big switch statement---our parser---and has properties
entirely determined by the fixed symbols near it in the statement. (I'm
not explaining this too well, but bear with me.)

Now Programmer Dan chimes in. ``Real programmers don't use big, slow,
sloppy syntactic switch statements in place of efficient, concise
mathematical semantics,'' he says. That's true but it's beside the
point. Just because something is an awful mess when expressed
syntactically doesn't change the fact that it's syntactic!

But there is a problem with our switch statement. *It depends on the
integer size.* If the integer size is different, our semantic model of
integer operations stays the same. Our syntactic model explodes. *This*
is what makes the integer operations semantics and not syntax: no single
syntactic definition suffices for arbitrarily large integers.

Anyone understand what I'm getting at? Please try to explain it better
than I just did.
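
A toy version of the big switch, for addition on two-bit integers
(0..3 each; a sketch, with the pair packed into a single index):

  int add2(int a, int b)
  {
    switch (a * 4 + b) {
      case 0: return 0;
      case 1: case 4: return 1;
      case 2: case 5: case 8: return 2;
      case 3: case 6: case 9: case 12: return 3;
      case 7: case 10: case 13: return 4;
      case 11: case 14: return 5;
      case 15: return 6;
      default: return -1; /* outside the modeled range */
    }
  }

Grow the integer range and this table explodes; the semantic
definition ``a + b'' does not.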

> This is where the fuzziness between syntax and semantics comes from.  For
> a given language, one can either specify it with a relatively simple
> grammar and allow the possibility of syntactically valid but meaningless
> strings, or use a more powerful grammar formalism and reduce most questions of
> validity to purely syntactic issues.   

Agreed. I'd say that a feature is semantic if the most powerful *fixed*
grammar for the language can't express the feature.

---Dan

anw@maths.nott.ac.uk (Dr A. N. Walker) (01/09/91)

In article <11883:Jan502:21:0191@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Every coding stylebook I've seen recommends that you give names to
>everything. It's a very good idea, you know.

Really?  Do you write

	#define ONE	1
	#define TWO	2
	#define TWENTYFOUR	24
	#define SIXTY	60
	secs = ((days*TWENTYFOUR + hours)*SIXTY + minutes)*SIXTY + seconds;
	for (j = ONE; ; j += TWO) ...;  /* step through odd integers */

in preference to

	secs = ((days * 24 + hours) * 60 + minutes) * 60 + seconds;
	for (j = 1; ; j += 2) ...;

?  Somewhere between that and

	#define BUFSIZE 4096
	char buffer[BUFSIZE];

which I hope everyone prefers to

	char buffer[4096]

there is a grey area, in which it's a matter of taste.  The vast majority
of functions fall one side of the line, but to deny the possibility that
a few fall the other [especially when, in some languages, extra expressive
power is obtained that way] seems unrealistic.

>Which is why you choose more meaningful names. Like fxpc, a function
>that adds c to x.

	I think I'd find "faddc" more expressive;  but anyway, it all gets
rather messy when the function delivers the absolute value of the difference
between the reciprocal third powers of r1 and r2 -- "famr3r1r2"?  Ugh!

>  #define FXPC \
>  double fxpc(x) double x \
>  { return x + c; }
>
>  z = 2 * sin(theta) * integral(fxpc, 0, pi) + ...;

	OK, that's quite neat for some simple cases (though I personally
don't like multi-line defines).  It has a snag:  I can imagine some very
hard-to-find bugs of the sort

	{ double c = some very messy expression;
	#define FXPC ... as above ...
	  z = ... integral (fxpc, 0, pi) ... + c + ... lots of uses of c;
	}

where the "c" that occurs in "fxpc" is not the one that occurs elsewhere
in the compound statement -- not too obscure if it leads to a compilation
error, but tough if there just happens to be some global with the same name
and an appropriate type.

>> [anonymous functions]
>Who cares? How about because it's a pain to parse, compile, and read?

	Why is it easier to parse, compile or read a function body when
it's tacked on to a declaration than when it occurs elsewhere?  Is "7"
easier to read in "int i = 7;" than in "j += 7;"?

	[I'll get back to syntax and to anonymous functions in separate
articles, as the threads have diverged somewhat.]

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

anw@maths.nott.ac.uk (Dr A. N. Walker) (01/09/91)

In article <1991Jan5.081755.23488@agate.berkeley.edu>
jwl@garnet.berkeley.edu (James Wilbur Lewis) writes:

	[I asked ...]
>-> ->		is the restriction that you can't declare the same identifier
>-> -> twice in the same declaration syntactic or semantic?

	[and Dan Bernstein replied ...]

>-> -Syntax. It has no effect on semantics.

>Perhaps the poster who asked the above question can verify that he was
>referring to C.

	Not particularly.  My point was simply that it's a grey area.
When I was learning to program, we were taught [this was in the days
when BNF was the greatest thing since sliced bread] that (the moral
equivalent of)

	main () { int i, i; exit (0);}

was "syntactically correct" ('cos it was produced by the BNF), but
"semantically incorrect" (there was a section labelled "semantics"
after each bit of syntax that described what the syntax meant, and
what conditions had to be attached).  It was "impossible to prevent
such programs syntactically" (ie, using BNF).  There you are, then;
it's semantics.

	Well, I've loaned out my Algol 68 Report, but the Draft (MR93)
refers instead to "context conditions" -- the (moral equivalent of the)
above is a "program" but not a "proper program" (one that satisfies
conditions such as "No proper program contains a reach containing
two defining occurrences of a given identifier"), and even less a
"meaningful program" (one whose elaboration is defined by the Report).
MR93 makes it clear that whether the distinction between programs and
proper programs is syntactic or not is a matter of taste.

	By the Revised Report, the above text isn't even a "program" --
it can't be generated by the syntax --, and therefore the restriction is
clearly syntactic (though in a real compiler, it will be checked by
exactly the same piece of code that it always was:  "is this identifier
already in the current chunk of the symbol table?").

	Later work with 2-level grammars (or with attribute grammars, as
others have pointed out) makes it clear that even the "meaningful" programs
can be described completely by syntax, and there is thus no need for
semantics at all.  However, you will find no trace of this in most
definitions of more recent languages (including C, Modula-N and Pascal);
almost universally, there is a syntax that generates improper programs,
and a set of statements in natural language about proper and meaningful
programs.  Even the use of denotational semantics seems to be confined
to formalising this natural language;  in this model, a language is
defined by its semantics, and the syntax is sugar for the reader.

	Since both extremes are espoused by reasonable people, I deduce
that, like so much else, it's a matter of taste.

>Choose any language you like, or make one up, subject to the conditions of
>the original question: that it allows a large number of identifiers to be
>declared in a single declaration, with the restriction that each identifier
>may appear at most once in that declaration.

	As others have pointed out, this is easy with a two-level grammar.
For details, see the Revised Report, or [better!] "Grammars of Programming
Languages" by Cleaveland and Uzgalis (Elsevier, 1977).

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/09/91)

In article <1991Jan8.165344.13870@maths.nott.ac.uk> anw@maths.nott.ac.uk (Dr A. N. Walker) writes:
> In article <11883:Jan502:21:0191@kramden.acf.nyu.edu>
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >Every coding stylebook I've seen recommends that you give names to
> >everything. It's a very good idea, you know.
  [ rebuttal ]

Sorry. What I meant by ``everything'' was ``every object.'' You want
references? How about Ledgard's books?

> (though I personally
> don't like multi-line defines).

Me neither. The C preprocessor sucks. But every language has its share
of syntactic problems.

> It has a snag:  I can imagine some very
> hard-to-find bugs of the sort
> 	{ double c = some very messy expression;
> 	#define FXPC ... as above ...

That isn't a snag: if c is local then you'll have to pass it to fxpc as well.
You can't criticize every language construct just because programs can
abuse global variables. (I note that most compilers will warn you about
the redefinition of c anyway.)

> >> [anonymous functions]
> >Who cares? How about because it's a pain to parse, compile, and read?
> 	Why is it easier to parse, compile or read a function body when
> it's tacked on to a declaration than when it occurs elsewhere?

Because juxtaposition is heavily overloaded, and when you allow function
definitions within expressions you start having to assign meanings to
(a)(b)(c)(d)(e).

---Dan

gudeman@cs.arizona.edu (David Gudeman) (01/09/91)

In article  <1991Jan7.185658.20240@basho.uucp> John Lacey writes:
]brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
]>Every coding stylebook I've seen recommends that you give names to
]>everything. It's a very good idea, you know.
]...
]Do you really gives names to everything?  Every integer constant,
]every string, every character?  Or do you let them go nameless?

More importantly, do you always give names to intermediate objects?
In C you have to give a name to a struct within a function when all
you are going to do is return it.  For integers you write

  return i + 1;

not 

  res = i + 1; return res;

But for structs you have to write

  res = malloc(sizeof(foo)); res->f1 = x1; res->f2 = x2; ...; return res;

instead of

  return res(x1,x2,...);
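
(The usual workaround is a constructor function, which just moves the
named temporary out of sight. A sketch with invented names:

  struct foo { int f1, f2; };

  struct foo mkfoo(int x1, int x2)
  {
    struct foo res;  /* the name you are forced to invent */
    res.f1 = x1;
    res.f2 = x2;
    return res;
  }

after which "return mkfoo(x1, x2);" reads the way one would like.)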

Personally, I think this constitutes an important difference between
integers and structs in C.  I'm not sure that the term "first class"
should be used to make this distinction, though (a few weeks ago I
_knew_ that this usage of the term was wrong, but now I'm not so
sure).
--
					David Gudeman
gudeman@cs.arizona.edu
noao!arizona!gudeman

anw@maths.nott.ac.uk (Dr A. N. Walker) (01/10/91)

In article <14305:Jan800:43:4391@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>[...]					      It really is possible to
>define a language feature in such a way that it cannot be turned into
>syntax.
>
[ example of "a%b + a/b*b = a" expressed as a switch deleted ]
>
> [...]				    Our syntactic model explodes. *This*
>is what makes the integer operations semantics and not syntax: no single
>syntactic definition suffices for arbitrarily large integers.

	But this depends on what you mean by a syntactic definition.
A two-level grammar is perfectly capable of dealing with arbitrarily
large integers (and so are other models of syntactic structure).  But
it starts to look like a programming language -- in other words, at one
extreme, a syntax is merely a program that when fed your program as input
will halt and either print "OK" or print "KO", and the *real* problem is
bootstrapping the syntax program ('cos we would like to write it in a
sensible language, and we can't until we know enough of the syntax of
the sensible language).  At the other extreme, we all agree that syntax
is (say) BNF, and then we find, as you say, that some things can't be
expressed.

>	 I'd say that a feature is semantic if the most powerful *fixed*
>grammar for the language can't express the feature.

	Explain "fixed".  A finite two-level grammar can express all
features of C (for example), including all those normally thought of as
semantic (such as the undesirability of following null pointers, or
indexing off the end of an array).  The only reason why this is more
interesting than the fact that a finite C program (such as a very
careful interpreting load-and-go compiler) can do the same thing is that
the two-level grammar *itself* has rather a simple syntax, so that most
people would [after some study] agree whether or not a 2LG is syntactically
correct, and it is probably rather easy to automate this.  (I don't know
whether anyone has tried.)

	To give a flavour of what a 2LG would look like for Dan's problem,
we would have an upper level which is very like traditional BNF, thus

		INT:	DIGIT | INT DIGIT
		DIGIT:  zero | one | two | ...

On the lower level, the higher level constructs can appear *inside the
LHS*, so that a finite collection of rules can represent an unbounded
collection of actual rules.  For example, we might have things like

	expression INT1 minus INT2: expression dec INT1 minus dec INT2.
	expression zero minus INT: negative INT.
	expression INT minus zero: INT.
	ANYTHING dec INT one: ANYTHING INT zero.
	ANYTHING dec INT two: ANYTHING INT one.
	ANYTHING dec INT zero: ANYTHING borrow dec INT.
	ANYTHING borrow INT: ANYTHING INT nine.
	...

In the lower level, occurrences of higher constructs such as INT must
be replaced identically on both sides, so, for example, the third rule
generates things like "expression one seven minus zero: one seven.", but
not "expression zero zero four three minus zero: six nine.".  Note that
you will also need some clever-ish rules to get rid of leading zeros, and
if you want to prevent chasing down blind alleys, you will need predicates
on the lines of

	expression INT1 minus INT2: where positive INT1 and positive INT2,
		expression dec INT1 minus dec INT2.

	After a page or three of such rules, we can have every expectation
that "expression one seven mod three plus one seven div three times three"
will eventually yield "one seven".  Full details are left to the reader.
[:-)]

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

karl@ima.isc.com (Karl Heuer) (01/11/91)

In article <1991Jan8.165344.13870@maths.nott.ac.uk> anw@maths.nott.ac.uk (Dr A. N. Walker) writes:
>In article <11883:Jan502:21:0191@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>>Every coding stylebook I've seen recommends that you give names to
>>everything. It's a very good idea, you know.

I'm afraid I can't agree with that statement in its full generality.

>Really?  Do you write [abridged --kwzh]
>	#define TWENTYFOUR	24
>	#define SIXTY	60
>	secs = ((days*TWENTYFOUR + hours)*SIXTY + minutes)*SIXTY + seconds;
>in preference to
>	secs = ((days * 24 + hours) * 60 + minutes) * 60 + seconds;
>?

On the other hand, I can't agree with this part of the counterexample either.
The first version is silly, but there does exist a non-silly version&:
	#define HOUR_DAY  24
	#define MIN_HOUR  60
	#define SEC_MIN   60
	secs = ((days*HOUR_DAY+hours)*MIN_HOUR+minutes)*SEC_MIN+seconds;
Note that the constant "60" deserves *two* names, since it has two independent
uses.  It's just a historical coincidence that they have the same value.

>Somewhere between that and
>	#define BUFSIZE 4096
>	char buffer[BUFSIZE];
>which I hope everyone prefers to
>	char buffer[4096]

No, I would not (necessarily).  BUFSIZE is already so horribly overloaded as
to be virtually meaningless.  ("Why does linbuf[] have BUFSIZE elements?"
"Oh, all the arrays do.  It's just some number that we hope is big enough to
prevent overflow."  "Is there any reason why linbuf[] and genbuf[] have to
have the same size?"  "No.")  If it is specific to this one buffer, then there
is little reason to prefer
	#define BUFSIZE 4096
	char buffer[BUFSIZE];
	read(fd, buffer, BUFSIZE);
to
	char buffer[4096];
	read(fd, buffer, sizeof(buffer));
which avoids the need to invent a separate name for the size.$

A good first approximation is to name something iff it gets used more than
once.  Anonymous functions would be useful for once-only instances such as
handing them off to qsort().
	qsort((void *)sa, n, sizeof(char *), (int (*)(void const *, void \
	const *)){ return strcmp(*(char const **)%0, *(char const **)%1); });

Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint
________
& This is not meant to imply that I always use the named form.  Since these
  particular constants are extremely unlikely to change and will probably be
  understood in context even without being named, I would not object to the
  unnamed version of the code fragment.
$ I'm assuming here that all the references are within the same file.

pcg@cs.aber.ac.uk (Piercarlo Grandi) (01/11/91)

On 8 Jan 91 17:33:02 GMT, anw@maths.nott.ac.uk (Dr A. N. Walker) said:

	[ ... on whether invalid declarations can be forbidden
	on purely syntactic grounds, which cannot be done
	with a context free grammar ... ]

anw> As others have pointed out, this is easy with a two-level grammar.
anw> For details, see the Revised Report, or [better!]  "Grammars of
anw> Programming Languages" by Cleaveland and Uzgalis (Elsevier, 1977).

Ah, here you take out the book which is the nuclear weapon of context
sensitive grammar debates. I agree with what you say, as far as it goes,
but you omit to remind us that that book clearly demonstrates that the
*semantics* of a language can easily be formalized with a two-level
grammar, which throws a spanner in the well-oiled structure of this type
of debate (Bernstein writes provocative, edgy but sensible things,
everybody else stops chewing their sandwich and jumps on him with glee
:->).

Context sensitive grammar generators (they do exist!) can do pretty
impressive things -- just consider that people have been using simple
context free grammars to do, in favourable conditions, code generation!

Two level or attribute grammars are strange beasts indeed, at least from
the usual CS perspective under which grammars stop at context free.

End of debate while a lot of people stop jumping on Bernstein and rush
to the library (not the bookshop -- out of print for a long time) to get
a copy of the book over which to finish munching their sandwich? :-)
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

jeff@aiai.ed.ac.uk (Jeff Dalton) (02/01/91)

In article <28228:Jan902:38:1891@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>I'm still pretty sure that ``first-class'' means what we think it does.
>Unfortunately, it appears that the Scheme literature has gone out of its
>way to abuse the term, so we can't use the standard definition of
>``first-class'' without causing a ruckus. It's always a shame when
>useful terms are killed in this way.

Actually, it seems that some C fans are going out of their way
to abuse the term, because for some reason they can't stand the
idea that functions might not be first class in C.

Look at some non-Scheme literature such as Stoy's book on
denotational semantics, for example.