[comp.compilers] Semicolons

bart@videovax.tv.tek.com (Bart Massey) (08/14/90)

In article <1990Aug12.205529.11691@esegue.segue.boston.ma.us> Stephen D. Clamage <steve@taumet.com> writes:
> There has been some discussion about *how* to design languages which do
> not need semicolons to separate or end statements.  No one has brought up
> why you would want to.
...
> Finally, beginning programmers are going to make all kinds of errors, for
> all kinds of reasons.  For more-experienced programmers, are semicolon
> errors a real problem -- as big as other kinds of syntax errors?  I'd say
> no.

Well, I'll admit it :-).  I've been programming in C professionally for
about 5 years, and about 1/2 my extra trips into the editor are still to
correct missing semicolons.  I almost never *add* semicolons, though.
This is true of most C programmers I know.  And note that when the added
semicolon does appear, it usually takes the insidious form

	x = 100;
	while( x );/* a nasty case */
	/* 
	 * these comments are intended only to obscure the fact
	 * that the next statement will never be executed...
	 */
		x--;
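
A minimal C sketch of the fragment above, reworked so that it terminates
and can be run.  The function names are hypothetical, not from the post.
It counts how often the indented "body" actually executes with and
without the stray semicolon.

```c
/* With the stray ';' the loop body is the empty statement, so the
 * indented line below it is an ordinary statement that runs once,
 * after the loop has finished. */
int body_runs_with_stray_semicolon(int x)
{
    int runs = 0;
    while (x-- > 0);    /* the ';' is the whole loop body */
        runs++;         /* looks like the body, executes exactly once */
    return runs;
}

/* Same text minus one character: now the indented line is the body. */
int body_runs_as_intended(int x)
{
    int runs = 0;
    while (x-- > 0)
        runs++;
    return runs;
}
```

Called with 5, the first function returns 1 and the second returns 5;
the two differ only by the semicolon after the condition.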

I claim that a moment's thought tells us why it is that semis are such a big
problem in traditional languages -- we all have been taught a "structured"
style which emphasizes regular use of whitespace, but traditional compiled
languages are completely insensitive to the use of whitespace.

> One example was given of
> 	a = b
> 	    + c
> as being a legal sequence of statements in one language.  Almost certainly
> this was meant to be a single statement.

What made you think this?  The exactly-one-assignment-statement-per-line
convention, a very common idiom across many otherwise widely different
languages (LISP and friends being the most obvious exception I can think of
offhand).  And yet your compiler or interpreter probably wouldn't even
optionally whine about the above, much less refuse to generate code.

If I ever design a C-like language (which is unlikely, since C is
pretty good for this :-), it'll be spec'ed in such a way as to generate
warnings if line breaks appear in funny places, or the indentation
looks wrong.  It's a bit harder to implement, but I'm just plain tired
of debugging code (of my own and others' :-) like

	if( v )
		w;
		if( x )
			y;
	else /*XXX*/;
		z;
					Bart Massey
					..tektronix!videovax.tv.tek.com!bart
					..tektronix!reed.bitnet!bart
[I never found rogue semicolons to be such a problem, but I suspect that
my style uses a lot more braces than yours. -John]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

peterd@uunet.UU.NET (Peter Desnoyers) (08/21/90)

bart@videovax.tv.tek.com (Bart Massey) writes:

>In article <1990Aug12.205529.11691@esegue.segue.boston.ma.us> Stephen D. Clamage <steve@taumet.com> writes:
>> There has been some discussion about *how* to design languages which do
>> not need semicolons to separate or end statements.  No one has brought up
>> why you would want to.
>...
>> Finally, beginning programmers are going to make all kinds of errors, for
>> all kinds of reasons.  For more-experienced programmers, are semicolon
>> errors a real problem -- as big as other kinds of syntax errors?  I'd say
>> no.

My experience with CLU as an undergraduate was that terminating statements
with a semicolon is a good thing, as it makes it easier for a parser to 
detect errors quickly and accurately. My impression was that a syntax error
would cause the parser to trip up somewhere in the middle of the next page :-)
(slightly exaggerated, but the combination of a permissive grammar and a
non-industrial-grade compiler made for difficult detection of errors.)

				Peter Desnoyers

thomasm@llama.ingres.com (Tom Markson) (08/22/90)

> Well, I'll admit it :-).  I've been programming in C professionally for
> about 5 years, and about 1/2 my extra trips into the editor are still to
> correct missing semicolons.  I almost never *add* semicolons, though.
> This is true of most C programmers I know.  And note that when the added
> semicolon does appear, it usually takes the insidious form

> 	x = 100;
> 	while( x );/* a nasty case */
> 	/* 
> 	 * these comments are intended only to obscure the fact
> 	 * that the next statement will never be executed...
> 	 */
> 		x--;


The real problem here is not with the semicolon, it is with the grammar 
production that dominates C and Pascal:

while:	WHILE (expr) statement
	;

statement: '{' statements '}'
	 | while ';'
	 | /* etc */
	 ;

Thus, if WHILE were terminated with ENDWHILE, this error would not occur:

while:  WHILE (expr) 
		statements ';' 

	ENDWHILE ';'
	;

statements: statements ';' statement
	  | statement
	  ;


The semicolon in the erroneous code example above would simply be an empty 
statement.
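
The effect of an explicit terminator can be imitated in C, purely as an
illustration.  The WHILE and ENDWHILE macros below are hypothetical (they
are not part of C or of the grammar above, and are not a recommended
style); they only show that once the body has a closing keyword, a stray
semicolon after the condition becomes a harmless empty statement inside
the body.

```c
/* Hypothetical macros: WHILE opens a block, ENDWHILE closes it. */
#define WHILE(cond) while (cond) {
#define ENDWHILE    }

int countdown_iterations(int x)
{
    int iterations = 0;
    WHILE (x > 0) ;     /* stray ';' is just an empty statement in the body */
        x--;
        iterations++;
    ENDWHILE
    return iterations;
}
```

Here countdown_iterations(3) still runs the body three times, because
the loop extends to ENDWHILE regardless of the extra semicolon.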

> I'm just plain tired of debugging code (of my own and others' :-) like
>
>	if( v )
>		w;
>		if( x )
>			y;
>	else /*XXX*/;
>		z;

Again, this is the same problem as above.  As John mentioned, one safe
way to do it in C is to ALWAYS use braces after an if, for, etc.  You'll
find it makes your life much easier.
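
To make the difference concrete, here is a sketch that writes the quoted
if/else fragment out with full braces twice: once the way a C compiler
groups it, and once following one plausible reading of its indentation.
The function names and the RAN_* flags are made up for the example; w, y
and z are modelled as setting a bit.

```c
/* Bits recording which of the statements w, y, z were executed. */
enum { RAN_W = 4, RAN_Y = 2, RAN_Z = 1 };

/* How the compiler groups the fragment: the else pairs with the
 * nearest if, its body is the stray ';', and z is unconditional. */
int as_parsed(int v, int x)
{
    int ran = 0;
    if (v) {
        ran |= RAN_W;
    }
    if (x) {
        ran |= RAN_Y;
    } else {
        ;               /* the stray ';' is the entire else branch */
    }
    ran |= RAN_Z;       /* always executed */
    return ran;
}

/* What the indentation suggests (an assumption about the intent). */
int as_indented(int v, int x)
{
    int ran = 0;
    if (v) {
        ran |= RAN_W;
        if (x) {
            ran |= RAN_Y;
        }
    } else {
        ran |= RAN_Z;
    }
    return ran;
}
```

With v false and x true, as_parsed runs y and z while as_indented runs
only z, so the two readings are observably different programs.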

-- 
Tom Markson					Unix Systems
email: thomasm@llama.rtech.com			Ingres Corp
phone: 415-748-250

trt@rti.rti.org (Thomas Truscott) (08/24/90)

> >	while( x );/* a nasty case */
> >		x--;
> 
> The real problem here is not with the semicolon, it is with the grammar 
> production that dominates C and Pascal:

The problem is with the compiler that fails to issue warnings about the
obvious problems with the above code:

prog.c: line 12: degenerate "while" loop
prog.c: line 13: dubious indentation level.


Tinkering with language syntax can go only a short distance towards
stamping out the bugs that typical code contains.  Syntax restrictions
would be hard put to catch bugs such as:

prog.c: line 20: variable "i" may be used before set

	p = malloc(strlen(s));
			    ^
prog.c: line 32: was "strlen(s) + 1" intended?
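
For reference, a sketch of the allocation the second diagnostic is
hinting at, assuming p is meant to hold a NUL-terminated copy of s.
duplicate_string is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated copy of s, or NULL if malloc fails.
 * The caller is responsible for calling free() on the result. */
char *duplicate_string(const char *s)
{
    size_t size = strlen(s) + 1;    /* + 1 for the terminating '\0' */
    char *p = malloc(size);
    if (p != NULL)
        memcpy(p, s, size);         /* copies the '\0' as well */
    return p;
}
```

With malloc(strlen(s)) instead, copying the terminator would write one
byte past the end of the allocation.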

I believe this type of diagnostic assistance, fueled by examples of
typical errors, is far more effective than adding syntactic handcuffs.  I
was quite surprised to learn, this evening, of a C compiler that actually
does this level of diagnostic analysis.  (Along with convenient options to
control the degree of nit-picking).  Perhaps if it becomes widely
available (and demanded), compilers for other languages will do it too, to
catch errors such as:

	while ( x )
		y--;
	endwhile;

Tom Truscott
[Which compiler is it?  And when I really do want to write a degenerate
while loop, how do I tell it not to complain? -John]

mjr@decuac.DEC.COM (Marcus J. Ranum) (08/26/90)

In article <4032@rtifs1.UUCP>, trt@rti.rti.org (Thomas Truscott) writes:

> prog.c: line 13: dubious indentation level.

	No! No, please! Not indentation in the compiler! :) Next we'd
see things like:

prog.c: line 666: complicated expression, please comment it

> 	p = malloc(strlen(s));
> 			    ^
> prog.c: line 32: was "strlen(s) + 1" intended?

	What if 'p' was part of (or going to be part of) some data
structure that contained the string size encoded in some other manner ? I
can imagine building such capabilities into a pre-processor of some sort,
that did idiot checking based on some information about the kind of things
you wanted to do, but I don't think it belongs in the compiler (please!).
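
A sketch of the kind of case described above, where allocating exactly
strlen(s) bytes is correct because the length is kept alongside the
data.  The struct and function names are invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

struct counted_string {
    size_t length;      /* number of bytes stored in data */
    char  *data;        /* NOT NUL-terminated */
};

/* Stores a copy of s in *cs.  Returns 0 on success, -1 if malloc fails. */
int counted_string_set(struct counted_string *cs, const char *s)
{
    size_t n = strlen(s);
    /* No "+ 1" here: the length field replaces the terminator.
     * Ask for at least one byte so an empty string still gets a
     * valid pointer on every implementation. */
    char *p = malloc(n > 0 ? n : 1);
    if (p == NULL)
        return -1;
    memcpy(p, s, n);
    cs->length = n;
    cs->data = p;
    return 0;
}
```

A checker that flags every malloc(strlen(s)) would complain about this
code too, which is the argument for making such checks optional.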

	I'm really impressed with the functionality something like Saber-C
provides for catching stupid errors like the one above. Somehow, support
for catching that kind of thing needs to be built into the development
process at code-writing-time, not at compile-time. Do any syntax-directed
editors handle such things ? I can imagine a syntax directed editor that
understood The One True Brace Style :), had lint built into it, and so on.
That way I wouldn't have to use it!

mjr.

bart@videovax.tv.tek.com (Bart Massey) (08/27/90)

In article <3263@decuac.DEC.COM> mjr@decuac.DEC.COM (Marcus J. Ranum) writes:
...
> I can imagine building such capabilities into a pre-processor of some sort,
> that did idiot checking based on some information about the kind of things
> you wanted to do, but I don't think it belongs in the compiler (please!).
...

I, on the other hand, am quite happy with the general direction GNU CC has
taken, which is to provide a bunch of warnings for the kinds of suspect
constructs discussed in the above-referenced articles (although nothing that
clever, yet), but to require that the warnings be separately and explicitly
enabled with compile-time switches.  Thus, I tend to use these things the
way I use lint, which is to turn them on only when I really can't understand
why some code is broken, and have been staring at it for hours.  Often, GCC
or lint will point out a legal but questionable line of code which will
suddenly make the whole problem clear to me -- especially if I was trying to
read someone else's code :-)...

					Bart Massey
					..tektronix!videovax.tv.tek.com!bart
					..tektronix!reed.bitnet!bart

ado@uunet.UU.NET (Arthur David Olson) (08/28/90)

> I was quite surprised to learn. . .of a C compiler that actually [will]
> catch errors such as:
> 
> 	while ( x )
> 		y--;
> 	endwhile;
> 
> [Which compiler is it?  And when I really do want to write a degenerate
> while loop, how do I tell it not to complain? -John]

A compiler used here complains about
	if ( expr ) ;
but not about
	if ( expr ) {}
since the second is much less likely to be a typo than the first.
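
The same convention works for loops.  A small sketch (string_length is a
hypothetical name) of a deliberately empty while body written with
braces, so that a reader, or a compiler applying the rule above, can
tell it is intentional:

```c
#include <stddef.h>

/* Length of s, not counting the terminating '\0'. */
size_t string_length(const char *s)
{
    const char *end = s;
    while (*end++ != '\0') {
        /* deliberately empty: all the work is in the condition */
    }
    /* end now points one past the terminator */
    return (size_t)(end - s) - 1;
}
```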
-- 
		Arthur David Olson	ado@elsie.nci.nih.gov
		ADO and Elsie are Ampex and Borden trademarks
[Once again, which compiler is it? Inquiring minds want to know. -John]

anw@maths.nott.ac.uk (Dr A. N. Walker) (08/29/90)

|> >	while( x );/* a nasty case */
|> >		x--;
|
|prog.c: line 12: degenerate "while" loop
|
|Tom Truscott
|[Which compiler is it?  And when I really do want to write a degenerate
|while loop, how do I tell it not to complain? -John]

	while (x)
		skip;		/* Down with empty null statements! */
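
C has no skip statement, but one can be approximated.  The macro below
is a hypothetical sketch, not a standard facility: it gives the "do
nothing" statement a visible name, so a bare semicolon never has to
serve as a loop body.

```c
/* Hypothetical: a named no-op that expands to an expression with no effect. */
#define skip ((void)0)

int count_leading_spaces(const char *s)
{
    int n;
    for (n = 0; s[n] == ' '; n++)
        skip;           /* intentional empty body */
    return n;
}
```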

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

liam@cs.qmw.ac.uk (William Roberts) (09/04/90)

In <1990Aug29.140407.28378@maths.nott.ac.uk> anw@maths.nott.ac.uk (Dr A. N. Walker) writes:

>       while (x)
>               skip;           /* Down with empty null statements! */

The thing I hate most about Modula-2 is that it won't allow a semicolon on
the last statement of a loop-body, on the dubious grounds that you don't need
one. Languages which do that clearly aren't written to be edited: at least,
not without clever editors that won't let you forget the very necessary
semicolon between the old last-statement-of-the-loop and the new one you have
just inserted.

Language designers should try to remember that a) programs get modified, and
b) the syntax of the language is in part its "user interface" and should be
given some thought as to its usability, not just its denotational purity!
-- 
William Roberts                 ARPA: liam@cs.qmw.ac.uk
Queen Mary & Westfield College  UUCP: liam@qmw-cs.UUCP
Mile End Road                   AppleLink: UK0087
LONDON, E1 4NS, UK              Tel:  071-975 5250 (Fax: 081-980 6533)
[I have seen many reports that the semicolon as separator, as in Algol 60
and Pascal, is much harder to get right than the semicolon as terminator,
as in PL/I and C. -John]

firth@sei.cmu.edu (Robert Firth) (09/05/90)

In article <2753@sequent.cs.qmw.ac.uk> liam@cs.qmw.ac.uk (William Roberts) writes:
>The thing I hate most about Modula-2 is that it won't allow a semicolon on
>the last statement of a loop-body...

It will according to the syntax in PIM-2.  Well, technically the semicolon is
followed by a null statement, but the effect is what you want.

lins@apple.com (Chuck Lins) (09/05/90)

>The thing I hate most about Modula-2 is that it won't allow a semicolon on
>the last statement of a loop-body, on the dubious grounds that you don't need
>one.

There is considerable difference between 'not allowed' and 'not required'.
Modula-2 does not *require* the semicolon on the last statement of a loop body
(or any other place where a statement sequence is allowed). For the reason you
mention (maintainability) it's useful to always use the semicolon. I've been
doing this for 10 years now. If your compiler won't let you write

WHILE (x > 0) DO
    DEC(x);      (* <-- a semicolon, gentle reader *)
END;

your compiler is broken. In Modula-2, empty statements may freely occur within
a statement sequence. An empty statement consists of just a semicolon since
it's merely a separator.
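
A rough sketch in C (not Modula-2) of the rule just described: split a
statement sequence at each ';' and treat a blank segment as an empty
statement.  count_statements is a hypothetical helper written for this
illustration; it ignores nesting, string literals and comments.

```c
#include <ctype.h>

/* Counts the non-empty and empty statements in a ';'-separated sequence. */
void count_statements(const char *seq, int *nonempty, int *empty)
{
    int has_text = 0;   /* has the current segment any non-blank character? */

    *nonempty = 0;
    *empty = 0;
    for (;; seq++) {
        if (*seq == ';' || *seq == '\0') {
            /* a separator or the end of input closes the current segment */
            if (has_text)
                ++*nonempty;
            else
                ++*empty;
            has_text = 0;
            if (*seq == '\0')
                break;
        } else if (!isspace((unsigned char)*seq)) {
            has_text = 1;
        }
    }
}
```

Under this reading "DEC(x);" is one real statement, a separator, and
one empty statement, so the trailing semicolon is accepted rather than
rejected.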

--
Chuck Lins (lins@apple.com)
Apple Computer, Inc
20525 Mariani Ave
MS 37-BD
Cupertino, CA 95014 USA

rob@baloo.eng.ohio-state.edu (Rob Carriere) (09/06/90)

In article <2753@sequent.cs.qmw.ac.uk> you write:
[agreeing with liam@cs.qmw.ac.uk]
>[I have seen many reports that the semicolon as separator, as in Algol 60
>and Pascal, is much harder to get right than the semicolon as terminator,
>as in PL/I and C. -John]

The people at the math department of the Eindhoven University of Technology
claimed that this was because everybody _wrote_ semicolons as terminators.
They said that in a language with semicolon separators one should not write

S1;
S2;
S3

but

 S1
;S2
;S3

I tried this with some Algol 60 programs, and my experience is that it looks
funny for a day or so, but after that you won't have any problems with
forgotten or extra semi's (you may have other problems :-)

Comment1: Accidentally forgetting to insert a semi when you add a statement at
the _top_ is also visually obvious:

 S0
 S1
;S2
;S3

Comment2: I never bothered to find out who invented this convention, so I
can't give proper credit.

Rob Carriere, rob@kaa.eng.ohio-state.edu
"De semicolonius non est disputandum"  :-)
[Ugh.  I rest my case. - John]

MERRIMAN@ccavax.camb.com (George Merriman -- CCA/NY) (09/06/90)

In article <2753@sequent.cs.qmw.ac.uk>, liam@cs.qmw.ac.uk (William Roberts) writes:
> [I have seen many reports that the semicolon as separator, as in Algol 60
> and Pascal, is much harder to get right than the semicolon as terminator,
> as in PL/I and C. -John]

I find it much easier to deal with semicolon-as-separator  (actually
semicolon-as-sequence-operator) if I use a coding convention I first
noticed in an E. Dijkstra paper:

	While somesuch_boolean
    	    begin
      	      statement_1
    	    ; statement_2
    	    ; statement_3
    	    end

While I find this easy for me, and find the idea of
semicolon-as-sequence-operator very helpful, I don't dare use it in any real
code because the next poor sod will probably have no idea what I'm up to!

dik@cwi.nl (Dik T. Winter) (09/06/90)

In article <9009052108.AA15597@baloo.eng.ohio-state.edu> rob@baloo.eng.ohio-state.edu (Rob Carriere) writes:
[ about the convention to put semicolons in front of a statement ]
...
 > Comment2: I never bothered to find out who invented this convention, so I
 > can't give proper credit.

I believe it was Charles Lindsey in some issue of the Algol Bulletin.

An advantage is also that it clearly shows the pairs of opening and closing
delimiters of a sequence of statements.  I.e. what is the closing brace
belonging to this opening brace?
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

rhl@grendel.princeton.edu (Robert Lupton (the Good)) (09/06/90)

I don't quite see the worry about syntax errors due to missing/extra
semicolons. I've been writing C for 10 years, and many tens of thousands
of lines later I can only remember one set of problems due to making
semicolon mistakes, namely in the days when I used

	float x,
	      y,	      
	      z;
	      
I'd sometimes change z's name to yy and switch the last two lines (had
to be alphabetic, you know |-) and end up with an int by mistake. Since
I stopped using this declaration style I don't remember a single ; error.
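
As a sketch only: the leading-separator layout from earlier in this
thread can be applied to the commas of a C declarator list.  The
function name and the values are made up.  With the ';' on a line of
its own, any of the comma-led lines can be renamed or reordered without
touching the punctuation.

```c
float sum_xyz(void)
{
    float x = 1.0f
        , y = 2.0f
        , z = 3.0f
        ;               /* the terminator never moves */
    return x + y + z;
}
```

Swapping the y and z lines here leaves a valid declaration of three
floats.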

			Robert
[C has semicolon as terminator rather than as separator, which seems easier
to get right.  -John]



dolf@idca.tds.philips.nl (Dolf Grunbauer) (09/07/90)

>In article <2753@sequent.cs.qmw.ac.uk> Rob Carriere wrote:
>[re putting the semicolon at the front of this line in Algol-like languages]

I think it was Edsger Dijkstra but I am not quite sure. The purpose of these
leading semicolons was that you can easily see which closing bracket belongs
to which opening bracket as the semicolons make some sort of a line. Example:

if condition
then  statement1
;     statement2
;     if other_condition
      then statement3
      ;    statement4
      else statement5
      ;    statement6
      fi
;     statement7
fi
-- 
Dolf Grunbauer  Tel: +31 55 433233 Internet dolf@idca.tds.philips.nl
Philips Information Systems        UUCP     ...!mcsun!philapd!dolf

