[comp.compilers] Low-Rent Syntax

Donald.Lindsay@MATHOM.GANDALF.CS.CMU.EDU (08/10/90)

In article <1990Jul27.034115.8747@esegue.segue.boston.ma.us> 
	moss@cs.umass.edu (Eliot Moss) writes:
>Related to BEGIN/END blocks is the argument over whether the ";" that follows
>some statements should be a separator or a terminator...

>CLU was actually able to *eliminate* the ";" statement separator/terminator,
>through *very* careful syntax design (and maybe it required more than one
>token lookahead, too; I don't recall clearly).

The Icon language (Arizona) and the Turing language (Toronto) both
have "low rent" syntax - that is, the ";" is only needed (as a
separator) when one writes multiple statements on a single line.  In
all other cases, it can be omitted.

Is there now a "usual" way to implement this ?

Don		D.C.Lindsay
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

norvell@csri.toronto.edu (Theo Norvell) (08/12/90)

In article <1990Aug09.180536.18782@esegue.segue.boston.ma.us> Donald Lindsay writes:
>In article <1990Jul27.034115.8747@esegue.segue.boston.ma.us> 
>	moss@cs.umass.edu (Eliot Moss) writes:
>>CLU was actually able to *eliminate* the ";" statement separator/terminator,
>>through *very* careful syntax design (and maybe it required more than one
>>token look-ahead, too; I don't recall clearly).
>
>The Icon language (Arizona) and the Turing language (Toronto) both
>have "low rent" syntax - that is, the ";" is only needed (as a
>separator) when one writes multiple statements on a single line.  In
>all other cases, it can be omitted.
>
>Is there now a "usual" way to implement this ?

In Turing, the semicolon is never needed.  The reason is careful syntax
design.  There is no trickery in the lexical analysis and the grammar
is LL(1).  Consider the following LL(1) grammar, where [...] marks an
optional part and {...} a part repeated zero or more times:
	SS --> (empty)
	     | S SS
	S --> var name : T
	    | procedure name A SS end name
	    | function name A : T SS end name
	    | E := E
	    | if E then SS [else SS] end if
	    | case E of {label E: SS} [label : SS] end case
	    | loop SS end loop
	    | for name : T SS end for
	    | exit [when E]
	E --> etc
	T --> etc
	A --> etc
No semicolons.  In Turing, the second alternative for SS is really
	SS --> S [ ; ] SS
So you can sprinkle semicolons to taste.  Euclid had a similar syntax,
but a semicolon was still required in one obscure case.
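
To make the LL(1) claim concrete, here is a minimal recursive-descent sketch
in Python for a subset of the grammar above (var, assignment, if, loop and
exit).  The token set, the flat expression rule (no precedence) and the use
of plain names for types are simplifying assumptions, not Turing's actual
definitions.  Every decision looks at one token only, newlines are ordinary
white space, and the optional ";" follows the SS --> S [ ; ] SS rule.

```python
import re

# Simplified lexical sets: assumptions for this sketch, not Turing's real rules.
TOKEN_RE = re.compile(r":=|[A-Za-z_]\w*|\d+|[-+*/<>=:;]")
KEYWORDS = {"var", "if", "then", "else", "end", "loop", "exit", "when"}
STMT_STARTERS = {"var", "if", "loop", "exit"}
BINARY_OPS = {"+", "-", "*", "/", "<", ">", "="}

def is_name(tok):
    return tok not in KEYWORDS and re.fullmatch(r"[A-Za-z_]\w*", tok) is not None

def starts_statement(tok):
    # FIRST(S): a statement keyword, or a name (the target of an assignment).
    return tok in STMT_STARTERS or is_name(tok)

class Parser:
    def __init__(self, src):
        # Newlines are ordinary white space: findall simply skips them.
        self.toks = TOKEN_RE.findall(src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else "<eof>"

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def name(self):
        if not is_name(self.peek()):
            raise SyntaxError(f"expected a name, got {self.peek()!r}")
        return self.advance()

    def stmts(self):
        # SS --> (empty) | S [ ; ] SS, decided by one token of lookahead.
        out = []
        while starts_statement(self.peek()):
            out.append(self.stmt())
            if self.peek() == ";":  # optional, "to taste"
                self.advance()
        return out

    def stmt(self):
        tok = self.peek()
        if tok in STMT_STARTERS:
            self.advance()
        if tok == "var":                # var name : T
            n = self.name()
            self.expect(":")
            return ("var", n, self.name())
        if tok == "if":                 # if E then SS [else SS] end if
            cond = self.expr()
            self.expect("then")
            then_part, else_part = self.stmts(), []
            if self.peek() == "else":
                self.advance()
                else_part = self.stmts()
            self.expect("end")
            self.expect("if")
            return ("if", cond, then_part, else_part)
        if tok == "loop":               # loop SS end loop
            body = self.stmts()
            self.expect("end")
            self.expect("loop")
            return ("loop", body)
        if tok == "exit":               # exit [when E]
            if self.peek() == "when":
                self.advance()
                return ("exit", self.expr())
            return ("exit", None)
        target = self.name()            # name := E
        self.expect(":=")
        return ("assign", target, self.expr())

    def expr(self):
        # E --> operand { op operand }.  No statement begins with an operator,
        # so an operator after an operand always continues the expression.
        left = self.operand()
        while self.peek() in BINARY_OPS:
            op = self.advance()
            left = (op, left, self.operand())
        return left

    def operand(self):
        if self.peek().isdigit():
            return int(self.advance())
        return self.name()

def parse(src):
    p = Parser(src)
    prog = p.stmts()
    if p.peek() != "<eof>":
        raise SyntaxError(f"unexpected {p.peek()!r}")
    return prog
```

In this sketch "var x : int x := 1" on one line parses the same as
"var x : int; x := 1;", "a := b" followed by "+ c" on the next line is a
single assignment, and "a - b" on its own is rejected, since a statement
that starts with a name must be an assignment.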

In Icon, the newline takes the place of the semicolon when a semicolon
is syntactically allowed.  This means you have to write
	a := b + 
	     c
rather than
	a := b
	   + c
if you mean 
	a := b + c
The second is syntactically correct, but means something else.  Griswold has
a book on the implementation of Icon.  Perhaps it explains the
implementation of this rule.
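
As we understand Icon's documented rule, it is lexical rather than
grammatical: at a line break a semicolon is inserted if the last token on
the line could end an expression and the first token on the next line
could begin one.  The Python sketch below applies that test; the token set
and the two "can end"/"can begin" predicates are simplified assumptions
(reserved words, for instance, are not modelled), not Icon's full tables.

```python
import re

# Toy token set (an assumption for this sketch): names, integers, ":=",
# ";", parentheses and the four arithmetic operators.
TOKEN_RE = re.compile(r":=|[A-Za-z_]\w*|\d+|[-+*/();]")

def is_operand(tok):
    return re.fullmatch(r"[A-Za-z_]\w*|\d+", tok) is not None

def can_end(tok):
    # Simplified: a name, a literal or ")" can end an expression.
    return is_operand(tok) or tok == ")"

def can_begin(tok):
    # Simplified: a name, a literal, "(" or a prefix "+"/"-" can begin one.
    return is_operand(tok) or tok in ("(", "+", "-")

def tokens_with_semicolons(src):
    """Tokenize src, inserting ';' at line breaks per the rule above."""
    out = []
    for line in src.splitlines():
        toks = TOKEN_RE.findall(line)
        if not toks:
            continue  # blank lines change nothing
        if out and can_end(out[-1]) and can_begin(toks[0]):
            out.append(";")  # this newline acts as a separator
        out.extend(toks)
    return out
```

With the first layout the line ends in "+", which cannot end an expression,
so no semicolon is inserted and the tokens read a := b + c.  With the second
layout "b" can end an expression and "+" can begin one (as unary plus), so
the tokens read a := b ; +c.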

I haven't a clue about CLU.

Theo Norvell
U of T

doug@nixtdc.UUCP (Doug Moen) (08/12/90)

Donald.Lindsay@MATHOM.GANDALF.CS.CMU.EDU:
>The Icon language (Arizona) and the Turing language (Toronto) both
>have "low rent" syntax - that is, the ";" is only needed (as a
>separator) when one writes multiple statements on a single line.  In
>all other cases, it can be omitted.
>
>Is there now a "usual" way to implement this ?

Actually, Icon and Turing are quite different in this respect.

In Icon, ";" is required as a separator in order to write multiple
statements on a single line.  If you need to split a single statement over
several lines, you must be careful about where you put the line break: the
part of the statement which precedes the newline must not look like a
valid statement.

Turing, on the other hand, doesn't rely on such kludges.  Turing does not
use ";" as a statement separator or terminator, and it treats newline as
ordinary white space.  Instead, the grammar is designed so that you can
always unambiguously tell when one statement stops and another begins.

Some examples:
 - an if statement always begins with "if", and ends with "end if".
 - a procedure call statement begins with an identifier,
   and ends with a ) or an identifier.
 - an assignment statement begins with an identifier,
   and ends with an identifier, a literal constant, or ).
Note that in Turing, you can't write an arbitrary expression as a statement,
as you can in Icon or C.  If you could, then the grammar would
be highly ambiguous.  For example, you would not be able to
tell whether "a - b" was one statement (i.e., a-b;) or two (i.e., a; -b;).
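
To see the ambiguity concretely, assume a hypothetical language in which any
expression may stand alone as a statement and no separator is required, with
a toy expression rule: an optional leading "-", then operands joined by "+"
or "-".  The Python sketch below (the grammar is an illustration, not Icon's
or C's) lists every way to carve a token sequence into statements.

```python
def is_operand(tok):
    return tok.isidentifier() or tok.isdigit()

def is_expr(toks):
    # Hypothetical rule:  E --> [ "-" ] operand { ("+" | "-") operand }
    i = 1 if toks and toks[0] == "-" else 0
    if i >= len(toks) or not is_operand(toks[i]):
        return False
    i += 1
    while i < len(toks):
        if toks[i] not in ("+", "-") or i + 1 >= len(toks) or not is_operand(toks[i + 1]):
            return False
        i += 2
    return True

def statement_splits(toks):
    """Every way to read toks as a sequence of expression statements."""
    if not toks:
        return [[]]
    results = []
    for cut in range(1, len(toks) + 1):
        head = toks[:cut]
        if is_expr(head):
            for rest in statement_splits(toks[cut:]):
                results.append([head] + rest)
    return results
```

For "a - b" it finds exactly the two readings above, [a] [-b] and [a - b];
"a - b - c" already has four.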
[Any idea whether it's particularly easy or hard to diagnose syntax errors
in Turing programs? -John]

steve@taumet.com (Stephen D. Clamage) (08/13/90)

There has been some discussion about *how* to design languages which do
not need semicolons to separate or end statements.  No one has brought up
why you would want to.

I read a study a while back about 'semicolon' errors made by student
programmers.  Two groups used two languages whose only difference was
that one used ';' to end statements and the other used ';' to separate
statements.  About the same number of ';' errors were made by
both groups.  One could possibly conclude that a language which did not
need ';' at all would be beneficial.

This also calls to mind the story about the professor who got tired of
seeing so many syntax errors in students' programs, as well as in his own.
So he designed a language in which there were no syntax errors -- that is,
every sequence of tokens was legal.  Good idea?

Some redundancy is helpful in verifying that a program does what is
intended.  That is why modern languages require declarations even in
contexts where default typing could be used (as in FORTRAN and BASIC).
Required declarations protect against misspelling of names as one class of
error, and against misuse of types as another.

One example was given of
	a = b
	    + c
as being a legal sequence of statements in one language.  Almost certainly
this was meant to be a single statement.  To avoid bothering the
programmer with piddling semicolon errors, an undetectable semantic error
was allowed to slip through -- one which would be very hard to find.
Would you rather have to go through a single edit/compile cycle to add the
semicolon, or spend days trying to find out why the program doesn't work?

Additionally, the use of possibly redundant semicolons allows the compiler
to better isolate errors and issue better error messages.
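
One common way a compiler exploits that redundancy is panic-mode error
recovery: after an error, discard tokens up to the next ";" and resume
parsing there, so one mistake does not hide the next.  A minimal Python
sketch for a toy language of "name := operand {op operand} ;" statements
(the grammar and the messages are assumptions for illustration):

```python
import re

TOKEN_RE = re.compile(r":=|[A-Za-z_]\w*|\d+|[-+*/;]")

def is_operand(tok):
    return re.fullmatch(r"[A-Za-z_]\w*|\d+", tok) is not None

def parse_statement(toks, i):
    """Parse one statement starting at toks[i]; return the index just past
    its ';' or raise SyntaxError."""
    def at(k):
        return toks[k] if k < len(toks) else "<eof>"
    if not re.fullmatch(r"[A-Za-z_]\w*", at(i)):
        raise SyntaxError(f"expected a name, got {at(i)!r}")
    if at(i + 1) != ":=":
        raise SyntaxError(f"expected ':=', got {at(i + 1)!r}")
    i += 2
    if not is_operand(at(i)):
        raise SyntaxError(f"expected an operand, got {at(i)!r}")
    i += 1
    while at(i) in ("+", "-", "*", "/"):
        if not is_operand(at(i + 1)):
            raise SyntaxError(f"expected an operand, got {at(i + 1)!r}")
        i += 2
    if at(i) != ";":
        raise SyntaxError(f"expected ';', got {at(i)!r}")
    return i + 1

def check(src):
    """Return one message per bad statement, recovering at each ';'."""
    toks = TOKEN_RE.findall(src)
    errors, i, stmt_no = [], 0, 1
    while i < len(toks):
        try:
            i = parse_statement(toks, i)
        except SyntaxError as e:
            errors.append(f"statement {stmt_no}: {e.args[0]}")
            # Panic mode: skip to just past the next ';' and resume there.
            while i < len(toks) and toks[i] != ";":
                i += 1
            i += 1
        stmt_no += 1
    return errors
```

Given "x := 1 + ;  y := 2;  z 3;  w := z * 4;" it reports two independent
errors, in statements 1 and 3, and still accepts statements 2 and 4.
Without the ";" there would be no reliable place to resynchronize.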

Finally, beginning programmers are going to make all kinds of errors, for
all kinds of reasons.  For more experienced programmers, are semicolon
errors a real problem -- as big as other kinds of syntax errors?  I'd say
no.

-- 
Steve Clamage, TauMetric Corp, steve@taumet.com