[comp.lang.misc] Algol-style semicolons

jejones@mcrware.UUCP (James Jones) (12/15/88)

I never could understand what the difficulty is with Algol-style semicolons.
Maybe it was the way it was explained to me (thank you, Professor Feyock,
wherever you are!).  In Algol, semicolons *separate* statements.  The glyph
"end" is *not* a statement; it and "begin" serve the same purpose as paren-
theses, and indeed, Algol 68 makes this even more obvious by allowing the
use of parentheses in place of begin and end if one wishes.

(By the way, not all statements in C are terminated with semicolons; otherwise,
there wouldn't be all that grief over

#define woof(x, y, z) {/* various statements */}
...

	if (p)
		woof(a, b, c);
	else
		/* etc. */

and the world would be a happier place, but then, isn't saving keystrokes
the true goal of all C programmers? :-)

People don't seem to have trouble with the convention that in English, items
in a list are separated by commas, so why do they have trouble with Algol
semicolon usage?  I don't know.

		James Jones

firth@sei.cmu.edu (Robert Firth) (12/15/88)

In article <868@mcrware.UUCP> jejones@mcrware.UUCP (James Jones) writes:
>I never could understand what the difficulty is with Algol-style semicolons..

I've used languages with various kinds of semicolon conventions, and must
admit that I like the Algol-68 way least.

The best, in my opinion, is to have the semicolon optional: a newline
will terminate the statement if possible.  So you can write

	x := x+1
	y := y+1

and it works, but

	x := a0+a1+a2+
	     a3+a4+a5

also works.  One hardly ever gets this wrong.
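
A minimal sketch of the rule described above, assuming a toy language in which a line that ends with an operator cannot be a complete statement. The operator list and the name split_statements are illustrative assumptions, not taken from any particular compiler.

```python
# Hypothetical sketch: a newline ends a statement unless the text so far
# ends with a token that obviously needs a right-hand operand.
CONTINUATION_TOKENS = ("+", "-", "*", "/", ":=", ",", "(")

def split_statements(source):
    statements = []
    pending = ""
    for line in source.splitlines():
        text = line.strip()
        if not text:
            continue  # blank lines neither end nor continue anything
        pending = (pending + " " + text).strip()
        if not pending.endswith(CONTINUATION_TOKENS):
            statements.append(pending)
            pending = ""
    if pending:
        # Input ended on a dangling operator; a real parser would report it.
        statements.append(pending)
    return statements
```

Under these assumptions the two-line assignment pair yields two statements, while the line ending in "+" is joined with the line after it.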

If the semicolon is mandatory, I prefer it to be a terminator, precisely
because I then don't have to edit two lines to insert or remove one
statement.

Finally, if the semicolon is a separator, the language should be kind
enough to allow an extra semicolon before a syntactic bracket, either
by having a "null statement" or by some other fudge.

Much the nastiest part, though, is that the compilers I have used have
been very obtuse about recovering from the simplest errors.  The Algol-68
compiler had a great repertoire of error messages, from

	OBJECT OF MODE PROC()VOID CANNOT BE WEAKLY COERCED TO REF CHAR

to some really frightening ones; what most of them meant was either
"missing semicolon on previous line" or "extra semicolon on previous
line".  And I remember the Algol-60 compiler that could not recover
from

	IF c THEN s; ELSE s;

which is a single-token correction.  The real booby prize, though, goes
to the C compiler where omitting a semicolon before a right brace:

	if (...) { s1; s2 }

causes an indefinite number of bogus consequential errors, until either
the compiler gives up "too many errors - goodbye" or your program text
ends.  Since the repair here is perhaps the simplest possible, this
compiler feature represents incompetence of a rare order.

gary@milo.mcs.clarkson.edu (Gary Levin) (12/16/88)

In article <8008@aw.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
   ...
   The best, in my opinion, is to have the semicolon optional: a newline
   will terminate the statement if possible.  So you can write

	   x := x+1
	   y := y+1

   and it works, but

	   x := a0+a1+a2+
		a3+a4+a5

   also works.  One hardly ever gets this wrong.

However, if you are in an expression oriented language, it is possible
to get really strange effects here.  I would write the above statement
as:

	   x := a0+a1+a2
		+a3+a4+a5;

Having submitted this to Icon, I found that the program didn't work as
expected.  It took me quite a bit of debugging to find that this was
being treated as two statements, and so x was only assigned a0+a1+a2.
Further, I then tried to write this as

	   x := (a0+a1+a2
		+a3+a4+a5);

and I was informed that I was missing a right parenthesis after a2
(and before the inserted semicolon).  Note that Icon would correctly
handle the case of the trailing + as Firth wrote his example.

Generally, I prefer systems that have me say what I mean, rather than
guess.  And if they must guess, they should have the grace to indicate
their guesses.

It is certainly possible that Icon no longer suffers from this
particular lexical/syntactic quirk; I haven't used it in quite a while.
--

-----
Gary Levin/Dept of Math & CS/Clarkson Univ/Potsdam, NY 13676/(315) 268-2384
BitNet: gary@clutx   Internet: gary@clutx.clarkson.edu

chase@Ozona.orc.olivetti.com (David Chase) (12/16/88)

In article <GARY.88Dec15142623@milo.mcs.clarkson.edu> gary@milo.mcs.clarkson.edu (Gary Levin) writes:
>In article <8008@aw.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>>   The best, in my opinion, is to have the semicolon optional: a newline
>>   will terminate the statement if possible.

>Further, I then tried to write this as
>
>	   x := (a0+a1+a2
>		+a3+a4+a5);
>
>and I was informed that I was missing a right parenthesis after a2
>(and before the inserted semicolon).  Note that Icon would correctly
>handle the case of the trailing + as Firth wrote his example.

A flaw in Icon's implementation of optional terminators is hardly a
condemnation of optional terminators; other parsers have been getting
it right for years.  I agree rather completely with Mr. Firth's
message; I've used a language with optional semicolon syntax, and
liked it very much.  It is astounding how much the little things often
matter.  (It's also astounding what horrible parsers some compilers
have.)

I should add that the trailing operator to indicate continuation
rapidly becomes a part of your style when working in such a language.

One actually wonders, after reading the keystroke-minimizing arguments
on "=/:=" vs "==/=" in Kernighan and Ritchie, why they made semicolons
mandatory.  BCPL didn't have them.  They could have saved one whole
character per statement.  Enquiring minds want to know.

David

chl@r1.uucp (Charles Lindsey) (12/16/88)

In article <208100002@s.cs.uiuc.edu> carroll@s.cs.uiuc.edu (Alan Carroll) writes:
>/* Written 10:10 am  Dec  4, 1988 by bct@lfcs.ed.ac.uk in s.cs.uiuc.edu:comp.lang.misc */
>/* ---------- "Re: What makes a language successfu" ---------- */
> Another sad point is that even dyed-in-the-wool Algol 68 people like Charles
>Lindsey have drifted into putting semi-colons before ENDs.

Time for "dyed-in-the-wool Charles Lindsey" to say what he really does.

>I always thought that the 'no ; before END' part of PASCAL was one of the worst
>'features' of the language.

First let us understand the rules:
	PASCAL has a dummy-statement, so ';' before END is OK
	ALGOL 68 has no dummy-statement, so ';' before END is forbidden

In article <868@mcrware.UUCP> jejones@mcrware.UUCP (James Jones) writes:
>I never could understand what the difficulty is with Algol-style semicolons.
>...  In Algol, semicolons *separate* statements. ...

Yes, but Carroll still has a point

>... I can't count how many times I had to recompile
>because I had added a statement before an END and forgotten to put a ; on
>the *previous* statement.

Here is what Charles Lindsey actually does:
When writing in PASCAL, I always put a ';' at the end of each
	statement, even before an END, for the reason Carroll gave.
When writing in ALGOL 68, I always put a ';' BEFORE each statement (except the
	first). Here is the example that started all the fuss (so far as I
	remember it).

	BEGIN LOC INT i := 0, j := 1
	  ;   LOC REF INT ptr := i
	  ;   ptr := j
	  ;   print(i)
	END

>{James Jones again} "end" and "begin" serve the same purpose as paren-
>theses, and indeed, Algol 68 makes this even more obvious by allowing the
>use of parentheses in place of begin and end if one wishes.

Indeed, and it looks even better (and is easier to type) this way:

	( LOC INT i := 0, j := 1
	; LOC REF INT ptr := i
	; ptr := j
	; print(i)
	)

Charles Lindsey	chl@ux.cs.man.ac.uk

perelgut@turing.toronto.edu (Stephen Perelgut) (12/16/88)

Another method of handling semi-colons is to not have them.  All you
need to do is terminate multi-part statements.  For example:
	if x then
	    s1
	    s2
	elsif y then
	    s3
	    s4
	else
	    s5
	end if

There are no ambiguities there!  And the only cost is terminating a statement.
For example: if/end if; for/end for; loop/end loop; case/end case; etc.
This also has the effect of making programs more readable.

kwalker@arizona.edu (Kenneth Walker) (12/17/88)

In article <34366@oliveb.olivetti.com>, chase@Ozona.orc.olivetti.com (David Chase) writes:
> 
> A flaw in Icon's implementation of optional terminators is hardly a
> condemnation of optional terminators; other parsers have been getting
> it right for years.

The problem is more the fact that Icon's syntax was not designed with the
idea of making trailing semicolons optional. As Gary Levin pointed out, it
is an expression-oriented language. Suppose you write the rather strange
expression

   return {
      y := a
      -b
      }

It is translated as

   return {
      y := a;
      -b;
      }

which is meaningful. y is assigned a and -b is returned. On the other hand

   y := a
   -b
   x := 2

is also translated similarly

   y := a;
   - b;
   x := 2

It is not until you get deeply into the semantics that you determine that
-b has no effect. At that point it is getting a little late to change your
mind on how to parse the expression (not that it is impossible with a
hand coded parser, but I wouldn't care to do it). Icon actually adds
semicolons in the lexical analyser and does so based only on the last
token of the line and the first one of the next. This is not a perfect
solution, but as David points out:

> I should add that the trailing operator to indicate continuation
> rapidly becomes a part of your style when working in such a language.
  
The syntax of C is actually worse than Icon if you want to make semicolons
optional "after the fact" (of language design). It has suffix operators, so
you cannot use the trailing operator rule to continue a line.
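
A minimal sketch of a lexer-level rule of the kind described above: a semicolon is inserted at a line break only when the last token of one line could end an expression and the first token of the next could begin one. The token classes here are assumptions chosen for illustration, not Icon's actual tables.

```python
import re

# Toy tokenizer: assignment, identifiers, integers, single-character symbols.
TOKEN = re.compile(r":=|[A-Za-z_]\w*|\d+|[-+*/(){};]")

def can_end(tok):
    # Assumed: operands and closing brackets may end an expression.
    return tok[0].isalnum() or tok[0] == "_" or tok in (")", "}")

def can_begin(tok):
    # Assumed: operands, opening brackets and prefix +/- may begin one.
    return tok[0].isalnum() or tok[0] == "_" or tok in ("(", "{", "-", "+")

def insert_semicolons(source):
    lines = [TOKEN.findall(line) for line in source.splitlines()]
    lines = [toks for toks in lines if toks]
    out = []
    for i, toks in enumerate(lines):
        out.extend(toks)
        if i + 1 < len(lines) and can_end(toks[-1]) and can_begin(lines[i + 1][0]):
            out.append(";")
    return " ".join(out)
```

Because the decision looks at only two tokens, a leading "+" on a continuation line is split off as a separate expression, even inside parentheses, while a trailing "+" keeps the lines together. That matches the behaviour Gary Levin reported.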

(I find I have no problems with Icon's approach to not requiring trailing
semicolons, except that the C compiler doesn't like my coding style
any more :-)

   Ken Walker / Computer Science Dept / Univ of Arizona / Tucson, AZ 85721
   +1 602 621 2858  kwalker@Arizona.EDU   {uunet|allegra|noao}!arizona!kwalker

toma@tekgvs.GVS.TEK.COM (Tom Almy) (12/20/88)

The question in my mind is *not* "Is a semicolon a statement separator
or statement delimiter?" but *is* "Why do we need semicolons at all?".

Many years ago (about 20) I took a compiler writing class where we had
to write a compiler for an "Algol" like language.  A few weeks into the
project my partner and I noticed that the only thing our recursive descent
parser did with semicolons was to issue an error message if they were
absent!  So we altered the language spec to eliminate semicolons altogether!

In the same time frame, I was using BCPL (the predecessor of C).  BCPL would
assume a semicolon at the end of each line which could be successfully parsed
as a statement.  This eliminated virtually all the semicolons in every BCPL
program I wrote, yet the rule cost virtually nothing in parsing overhead.

I would contend that most, if not all, Algol-derived languages would work
just fine without semicolons!  The only problem is that the way some languages
define control structures requires the existence of the semicolon as a null
statement.

Tom Almy
toma@tekgvs.tek.com
Standard Disclaimers Apply

smryan@garth.UUCP (Xxxxxx Xxxx) (12/21/88)

>The question in my mind is *not* "Is a semicolon a statement separator
>or statement delimiter?" but *is* "Why do we need semicolons at all?".

Parse this:

     begin
       a
       +b
     end

Is that a+b or a;+b?

>I would contend that most, if not all, Algol-derived languages would work
>just fine without semicolons!

Most could also do without declarations, as does Fortran.

Redundancy is used to detect, maybe correct, errors in the presence of noise
(ever use a backspace key?). Natural language is chock-full of redundancy, much
more than programming languages.

nevin1@ihlpb.ATT.COM (Liber) (12/21/88)

In article <88Dec16.100919est.4327@turing.toronto.edu> perelgut@turing.toronto.edu (Stephen Perelgut) writes:
|Another method of handling semi-colons is to not have them.  All you
|need to do is terminate multi-part statements.  For example:
|	if x then
|	    s1
|	    s2
|	elsif y then
|	    s3
|	    s4
|	else
|	    s5
|	end if
|
|There are no ambiguities there!  And the only cost is terminating a statement.
|For example: if/end if; for/end for; loop/end loop; case/end case; etc.
|This also has the effect of making programs more readable.

Yes, but what do you do with statements that span more than one line?
You still need a way of saying that an expression is to be continued on
the next line (or that you don't need to continue a statement by
use of a separator or terminator; this amounts to the same thing).
-- 
NEVIN ":-)" LIBER  AT&T Bell Laboratories  nevin1@ihlpb.ATT.COM  (312) 979-4751

eugene@eos.UUCP (Eugene Miya) (12/21/88)

Some random ponderings based on comments to ALGOL like languages
(gross oversimplifications):

I wonder if the COBOL people argue over the '.' [period] as a statement
terminator the way we argue over the semicolon as a separator or terminator?
{Perlis had an Epigram on this (I sent them to comp.lang.sigplan but have not
seen them reposted).}

Modula-2 appears closer to Mesa than Modula[-1].  You only need read
the text.  Wonders what a sabbatical at PARC will do for you 8-).

On ":=" versus "=": Altos had a backward arrow assignment key, not "<-" for
Mesa.  (this could correspond to other language arrows (UP) ^ and (RIGHT)
"->".)  Just standardize some extra characters.  8-) It will never happen.

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups.  If enough, I'll summarize."

anw@nott-cs.UUCP (12/21/88)

In article <4396@tekgvs.GVS.TEK.COM> toma@tekgvs.GVS.TEK.COM (Tom Almy) writes:
>
>[...] "Why do we need semicolons at all?".

	[He writes a compiler that doesn't use semicolons at all ...]

>I would contend that most, if not all, Algol-derived languages would work
>just fine without semicolons!  The only problem is that the way some languages
>define control structures requires the existence of the semicolon as a null
>statement.

	When I was running a "Compilation Techniques" course, a few years
back, one of the assignments was to alter the Pascal syntax in various ways,
Yacc the results, and comment.  One of the ways was to throw away semicolons.
In Pascal, as TomA asserts, the *only* problem is that it becomes difficult
to see where the null statements are.  If null statements are -- as they
should be -- explicitly written, eg as "SKIP", even that problem goes away,
and the semicolon becomes totally redundant.

	In C and Algol, semicolons are harder to dispose of.  For example,
consider the Algol:

	UNION (REAL, PROC (REAL) REAL) foo = cos #;# (pi)
		# is "foo" set to "cos" or to "-1.0"? #
    or
	PROC REAL bar: (x := pi #;# - 2)
		# return "-2" and assign "pi", or assign & return "pi-2"? #
    or
	STRING s := "Hello World!" #;# [7] #;# REAL x
		# "s" is "Hello World!" and "x" is an array, or "s" is "W"
		  and "x" is a real variable? #

C examples left to the reader!

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

perelgut@csri.toronto.edu (Stephen Perelgut) (12/22/88)

> |perelgut@turing.toronto.edu (Stephen Perelgut) writes:
> | Another method of handling semi-colons is to not have them.  All you
> | need to do is terminate multi-part statements.  For example:
> | 
> | <Code fragment deleted>
> |
> 
> NEVIN LIBER  AT&T Bell Laboratories  nevin1@ihlpb.ATT.COM  (312) 979-4751
>  Yes, but what do you do with statements that span more than one line?
>  You still need a way of saying that an expression is to be continued on
>  the next line (or that you don't need to continue a statement by
>  use of a separator or terminator; this amounts to the same thing).

My example wasn't entirely clear on this point.  Careful definition of a
language will allow statements to unambiguously extend across any arbitrary
amounts of whitespace including blanks, tabs, form-feeds and newlines.  For
example, in the Turing programming language I might be acting dumb and write
a statement that looks like
	if x
    / 7 >
    12 then put "Hi there." else put
    "It's not quite right" end
    if

You might prefer to see this written as
	if x/7 > 12 then
	    put "Hi there."
	else
	    put "It's not quite right"
	end if

The Turing language definition (and the various interpreters and compilers)
would treat both exactly the same.  The paragraphing feature of the environment
supplied with the compilers would try to turn the first into a semblance of
the second.

One caveat, atomic elements cannot cross line boundaries so you can't split
a real number or a string literal across two lines.  But you can break up
statements at any other point, even expressions
	"This string contains the value" +
	" of 7 + 5 as a string literal: " + intstr ( 7
	+ 5)

phipps@garth.UUCP (Clay Phipps) (12/22/88)

In article <9235@ihlpb.ATT.COM> nevin1@ihlpb.UUCP (55528-Liber,N.J.) writes:
>In article <88Dec16.100919est.4327@turing.toronto.edu>, 
>perelgut@turing.toronto.edu (Stephen Perelgut) writes:
>|Another method of handling semi-colons is to not have them.  
>|All you need to do is terminate multi-part statements.  [example shown] 
>|There are no ambiguities there!  And the only cost is terminating a statement.

>You still need a way of saying that an expression is to be continued on
>the next line (or that you don't need to continue a statement by
>use of a separator or terminator; this amounts to the same thing).

I have long been fond of using the opposite of terminators; one might call 
them triggers, introductory tokens or "introducers" ("initiator", intended
as opposite of "terminator", might be confused with value initialization).

Most Algol-derived languages that I know of have an introducer -- a keyword
-- for every statement type except assignment and (usually) procedure calls
and (sometimes) value returns.  Treated as statements, "end if", "end loop",
&c. fit well into such a scheme: they are brackets at the semantic level.

The language CMS-2 uses few special characters; the almighty dollar "$" 
is its terminator symbol (one particularly appropriate for the military :-).
Unlike most languages on the Algol side of COBOL, CMS-2 uses 
an introductory token for every statement (except procedure calling?).
When I first saw the assignment statement form:
 
	"SET" TargetObject "TO" SourceValue "$"

I immediately thought of COBOL verbosity, but was surprised to realize that 
in practice, I could type "SET" and "TO" faster than I could find and type
the (shifted) ":" and (unshifted) "=" to form an Algol assignment token 
(we used 3 different keyboards on that project, none with ASCII layout).
In the CMS-2Y "structured programming" dialect of CMS-2,
the "$" was more of an artifact than a necessity, even for error recovery.

I believe that a syntax style that uses introducers rather than
terminators for all (or all but one) statement types combines 
the source-layout and error-recovery advantages 
of always-terminated statements, and 
the easy modifiability (lack of "%#@!  I forgot to insert|remove...!") 
and typing speed advantages of never-terminated statements,
to the greatest extent possible for those conflicting goals.

Fleshing out the example by perelgut@turing.toronto.edu (Stephen Perelgut)
along these lines, I have
 
 	if x then
	    get p from p^.next  -- compute the value p^.next; store in p
	    give p^             -- return the value p^
	elsif y then
	    get p from p^.prev  -- compute the value p^.prev; store in p
	    give p^             -- return the value p^
	else
	    give nil            -- return the nil value
	end if

Notes: The "--" introduces a comment, which continues to CR | LF.
I replaced "set" by "get" to avoid confusion with Pascal powersets, and 
chose the shortest sensible English words for keywords, to minimize typing.

The scheme allows not only statements continued over multiple lines, 
but also multiple statements on a single line (e.g., when nonpurists 
would agree that keeping everything on a page promotes comprehension
to a greater extent than isolating each statement on 1 line).
The only thing that does not work smoothly is imbedded assignment,
although that could be done via parentheses.
The style shown here best fits statement-oriented, rather than
expression-oriented, languages.
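
A minimal sketch of how introducers alone can delimit statements, assuming every statement starts with a reserved keyword and that "end if" is treated as one bracket. The keyword set and the name split_on_introducers are hypothetical, and comment handling is left out.

```python
# Hypothetical reserved words that start a statement in the toy syntax.
INTRODUCERS = {"if", "elsif", "else", "end", "get", "give"}

def split_on_introducers(source):
    statements = []
    for word in source.split():
        # "if" directly after a bare "end" closes the bracket "end if"
        # rather than opening a new conditional.
        closes_bracket = word == "if" and statements and statements[-1] == ["end"]
        if word in INTRODUCERS and not closes_bracket:
            statements.append([word])
        elif statements:
            statements[-1].append(word)
        else:
            raise SyntaxError("statement must start with an introducer: " + word)
    return [" ".join(stmt) for stmt in statements]
```

Line breaks play no part here, so the same statement boundaries are found whether the text is spread over many lines or packed onto one.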

I recognize that all this is not earth-shaking programming language theory;
it is just my 2 bits worth on reducing common sources of aggravation.
It may already have been done, perhaps by the ABC folks at CWI/Amsterdam.
-- 
[The foregoing may or may not represent the position, if any, of my employer]
 
Clay Phipps                       {ingr,pyramid,sri-unix!hplabs}!garth!phipps
Intergraph APD, 2400#4 Geng Road, Palo Alto, CA 93403            415/494-8800

pokey@well.UUCP (Jef Poskanzer) (12/22/88)

In the referenced message, eugene@eos.UUCP (Eugene Miya) grossly generalizes:
>Modula-2 appears closer to Mesa than Modula[-1].

But not close enough!  Modula-3 appears much closer.  Doesn't matter, though;
since it doesn't look like C, it won't catch on.  Perhaps someday someone
will graft Mesa's functionality onto C's syntax.  We can call the result
C Plus Mesa, or CP/M for short...

>On ":=" versus "=": Altos had a backward arrow assignment key, not "<-" for
>Mesa.  (this could correspond to other language arrows (UP) ^ and (RIGHT)
>"->".)  Just standardize some extra characters.  8-) It will never happen.

Old-timers will recall that on old teletypes, '_' was a back-arrow and '^'
was an up-arrow.  The switch was made around 1971, and for a while some
people wanted to look for old-style print heads for the new ttys we were
getting.  But eventually we decided it was wiser to stick with the standard.
Xerox went the other way, and kept arrow glyphs for those characters in
(most of) their fonts.  That was not too bad, but around 1984 they decided
to diverge even further from ASCII, and defined a bunch of characters in
the range 128-255.  They added arrows in all four directions, '<<' and
'>>' (French quotation characters, used in Mesa as comment delimiters),
and some other chud.  It made mail gatewaying rather, um, interesting.
---
Jef

             Jef Poskanzer   jef@rtsg.ee.lbl.gov   ...well!pokey
    "C combines the power of assembly language ... with the flexibility of
                             assembly language."

mike@arizona.edu (Mike Coffin) (12/25/88)

For an example of a language that tries to make everyone happy, see SR
[TOPLAS, Jan 1988].  All semicolons are optional, so you can pretend
they terminate statements, or separate statements, or you can leave
them out entirely.  The grammar was designed with this in mind, so
there is no ambiguity.

-- 
Mike Coffin				mike@arizona.edu
Univ. of Ariz. Dept. of Comp. Sci.	{allegra,cmcl2}!arizona!mike
Tucson, AZ  85721			(602)621-2858

djones@megatest.UUCP (Dave Jones) (12/28/88)

> In article <4396@tekgvs.GVS.TEK.COM> toma@tekgvs.GVS.TEK.COM (Tom Almy) writes:
>
>[...] "Why do we need semicolons at all?".
> 


They are "gobble-stoppers" -- symbols which clue the compiler as to
how to recover from a syntax error. A typical LR technique is the following:
On an error, the compiler discards states from the parse stack until a
state which can shift "error" is uncovered. It shifts the error. 
It then makes any necessary reductions before it discards (gobbles)
tokens which can not be shifted. Success is declared when some fixed
number of consecutive tokens have been shifted since the last error. The
usual number is three.  But when a gobble stopper is found, you
can say, "That's enough."
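
The gobble-stopper idea can be sketched outside yacc. The following is a simplified panic-mode recovery loop in a hand-written parser, not the LR mechanism described above: on an error it discards tokens up to and including the next semicolon, then resumes. The toy grammar (NAME := NAME ;) and the name parse_assignments are assumptions for illustration.

```python
def parse_assignments(tokens):
    """Parse a token list of statements shaped NAME ':=' NAME ';'.

    Returns (statements, error_positions).
    """
    statements, errors = [], []
    pos = 0
    while pos < len(tokens):
        window = tokens[pos:pos + 4]
        if (len(window) == 4
                and window[0].isidentifier()
                and window[1] == ":="
                and window[2].isidentifier()
                and window[3] == ";"):
            statements.append((window[0], window[2]))
            pos += 4
            continue
        errors.append(pos)  # remember where the bad statement started
        # Gobble tokens until the stopper, then step past it.
        while pos < len(tokens) and tokens[pos] != ";":
            pos += 1
        pos += 1
    return statements, errors
```

With a semicolon after every statement, one malformed statement costs one error report and the parser is back in step at the next statement.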

Here's a short yacc example from a production Pascal compiler currently
under construction.  The nonterminal "semicolon" is used all over the place. 
When a semicolon is actually found, yyerrok tells the parser that no 
more tokens need be shifted in order to consider that an error has been 
"recovered from".  If the semicolon is missing, we could be silent, but 
for the sake of portability, we issue a warning message.

(Note to gurus: The version of yacc being used is a repaired version
which does not have the bogus default reduction bug, which would
invalidate the error-recovery.)

semicolon:  ';' { yyerrok; }
         |  '.' { Mpc_warning("Period replaced with semicolon."); }
         |      { Mpc_warning("Semicolon inserted."); }

;


label_declaration
                : LABEL integer_list semicolon
                        { Mpc_add_labels($2); }
;

integer_list    : INTEGER
                     { $$ = List_new(); List_append($$, $1); }
                | integer_list ',' INTEGER
                      { List_append($1, $3); $$ = $1; }
                | error
                      { $$ = List_new(); }
                | integer_list ',' error
                      { $$ = $1; }