[comp.lang.misc] improve language by dropping ;

cosc13gb@jetson.uh.edu (02/14/91)

Discussion invited, not flames.
  Why do languages have the separator ; ? Is it for the benefit of the compiler?
  Are compilers now smart enough to drop the ; ?
  The ; will always slow down (human) programmers, who will forget them.
  If the compiler must have ; , how about a tool, included with every compiler,
  that inserts the ; semi-automatically?

torbenm@diku.dk (Torben Ægidius Mogensen) (02/15/91)

cosc13gb@jetson.uh.edu writes:

>Discussion invited, not flames.
>  Why do languages have the separator ; ? Is it for the benefit of the compiler?
>  Are compilers now smart enough to drop the ; ?
>  The ; will always slow down (human) programmers, who will forget them.
>  If the compiler must have ; , how about a tool, included with every compiler,
>  that inserts the ; semi-automatically?

There was a long discussion about this on comp.lang.functional recently.

The ability to drop the ;'s on line-breaks is called off-side rules.
Languages that use off-side rules include Occam, Miranda and Haskell.

	Torben Mogensen (torbenm@diku.dk)

norvell@csri.toronto.edu (Theo Norvell) (02/16/91)

In article <1991Feb15.143327.22402@odin.diku.dk> torbenm@diku.dk (Torben Ægidius Mogensen) writes:
>
>There was a long discussion about this on comp.lang.functional recently.
>
>The ability to drop the ;'s on line-breaks is called off-side rules.
>Languages that use off-side rules include Occam, Miranda and Haskell.
>
>	Torben Mogensen (torbenm@diku.dk)


Not really.  These languages use indentation to express grouping
(parenthesization).  Getting rid of semicolons is an orthogonal issue
and can be done very easily without any reference to white space.
Consider the following language:
	Program --> S
	S --> empty | S S | var id : T | V := E | if E then S else S fi
	    | while E do S od
No semicolons, no whitespace tricks.  Euclid and related languages use such
syntax.
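
As a rough illustration of the grammar above, here is a minimal
recursive-descent sketch in Python.  Several things are assumptions that the
post does not specify: the token set, that T and V are plain identifiers, and
that E is a flat chain of operands joined by binary operators (kept as text,
with no precedence).  The point is only that a statement ends wherever the next
token cannot continue it, so no separator is needed.

```python
import re

# Assumed token set; the post leaves T, V and E unspecified.
TOKEN = re.compile(r"\s*(:=|[A-Za-z_]\w*|\d+|[-+*/<>=:()])")
KEYWORDS = {"var", "if", "then", "else", "fi", "while", "do", "od"}
OPERATORS = {"+", "-", "*", "/", "<", ">", "="}


def tokenize(text):
    """Split source text into tokens; newlines count as ordinary whitespace."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"bad character near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_identifier(tok):
    return (tok is not None and tok not in KEYWORDS
            and (tok[0].isalpha() or tok[0] == "_"))


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {tok!r}")
        self.pos += 1
        return tok

    def name(self):
        tok = self.take()
        if not is_identifier(tok):
            raise SyntaxError(f"expected an identifier, found {tok!r}")
        return tok

    def statements(self):
        # S --> empty | S S: keep going while the next token can start a statement.
        result = []
        while self.peek() in ("var", "if", "while") or is_identifier(self.peek()):
            result.append(self.statement())
        return result

    def statement(self):
        tok = self.peek()
        if tok == "var":                      # var id : T
            self.take("var")
            name = self.name()
            self.take(":")
            return ("var", name, self.name())
        if tok == "if":                       # if E then S else S fi
            self.take("if")
            cond = self.expression()
            self.take("then")
            then_part = self.statements()
            self.take("else")
            else_part = self.statements()
            self.take("fi")
            return ("if", cond, then_part, else_part)
        if tok == "while":                    # while E do S od
            self.take("while")
            cond = self.expression()
            self.take("do")
            body = self.statements()
            self.take("od")
            return ("while", cond, body)
        target = self.name()                  # V := E
        self.take(":=")
        return ("assign", target, self.expression())

    def expression(self):
        # operand {operator operand}; it ends at the first non-operator token.
        parts = [self.operand()]
        while self.peek() in OPERATORS:
            parts.append(self.take())
            parts.append(self.operand())
        return " ".join(parts)

    def operand(self):
        tok = self.take()
        if tok == "(":
            inner = self.expression()
            self.take(")")
            return f"( {inner} )"
        if is_identifier(tok) or tok.isdigit():
            return tok
        raise SyntaxError(f"expected an operand, found {tok!r}")


def parse_program(text):
    parser = Parser(tokenize(text))
    program = parser.statements()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return program
```

With this sketch, parse_program("a := b c := d") yields two assignments, and
line breaks inside a statement make no difference.  Note that the trick relies
on no statement beginning with an operator; a unary minus would need more care.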

Theo Norvell					norvell@csri.toronto.edu

sommar@enea.se (Erland Sommarskog) (02/18/91)

Also sprach cosc13gb@jetson.uh.edu:
>  Why do languages have the separator ; ? Is it for the benefit of the compiler?
>  Are compilers now smart enough to drop the ; ?
>  The ; will always slow down (human) programmers, who will forget them.
>  If the compiler must have ; , how about a tool, included with every compiler,
>  that inserts the ; semi-automatically?

Obviously you haven't programmed in Eiffel. The general rule
seems to be that if you get a syntax error you don't understand,
remove a semicolon.
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se

rh@smds.UUCP (Richard Harter) (02/18/91)

In article <8507.27b91f9e@jetson.uh.edu>, cosc13gb@jetson.uh.edu writes:
> Discussion invited, not flames.
>   Why do languages have the separator ; ? Is it for the benefit of the compiler?
>   Are compilers now smart enough to drop the ; ?
>   The ; will always slow down (human) programmers, who will forget them.
>   If the compiler must have ; , how about a tool, included with every compiler,
>   that inserts the ; semi-automatically?

It is possible to design languages without separators; however it is
almost universally agreed that the hair isn't worth it.  There is, however,
a real split between languages which use end-of-line as a statement
separator and those which use semicolon.  (Fortran vs Algol.)  Since it
is normal practice to put one statement per line it is natural to use
end-of-line as a separator.  However the semicolon does let you put more
than one statement per line or to use more than one line for a statement.
One can get into some very heated arguments about which is better --
it's mostly a religious matter.  It all goes back to the days when some
people used cards and others used paper tapes.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

tchrist@convex.COM (Tom Christiansen) (02/19/91)

From the keyboard of rh@smds.UUCP (Richard Harter):
:It is possible to design languages without separators; however it is
:almost universally agreed that the hair isn't worth it.  There is, however,
:a real split between languages which use end-of-line as a statement
:separator and those which use semicolon.  (Fortran vs Algol.)  

Or awk vs C.  Personally, I prefer the C method.  I don't like
having to escape my newlines at the end of the line if it's not
the end of the statement.  But yes, some people have different
tastes.  They're wrong. :-)

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
 "All things are possible, but not all expedient."  (in life, UNIX, and perl)

norvell@csri.toronto.edu (Theo Norvell) (02/19/91)

In article <329@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
>However the semicolon does let you put more
>than one statement per line or to use more than one line for a statement.

As was recently pointed out in this thread it is not hard to design a
language in which
	(a) The syntax is quite conventional except that it has no
	    statement separators and simple statements have no
	    terminators.  (I.e. no semicolons.)
	(b) The treatment of whitespace follows the algol tradition:
	    newline = blank, two blanks = one blank.
	(c) The syntax is presented with a plain context free (even LL(1))
	    grammar.
One can thus put several statements on the same line (sometimes with no
whitespace between them at all) or put one statement on several lines without
semicolons or anything similar.  See my previous article for an
example language.

lavinus@csgrad.cs.vt.edu (02/19/91)

There's a reason for the semicolon that seems to be missed here.  Yes, as
several of you have pointed out, it is perfectly feasible to define a language
which requires *no* end-of-statement separators - no semicolons, end-of-line's,
or anything.  One can even drop the semicolons from Pascal pretty easily. 
But.. this assumes that a compiler for this language is always given correct
programs.  The semicolons are there so that if the compiler hits an error, and
gets really confused, it may easily find the end of the statement and recover
to continue parsing the program.
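
The recovery idea described above is often called panic-mode recovery.  Here is
a minimal Python sketch of it, assuming a toy language of statements of the
form `name := operand {+ operand} ;` that has already been split into tokens.
The names ParseError, parse_assignment and check are hypothetical, and a real
compiler would do considerably more.

```python
class ParseError(Exception):
    """Syntax error that remembers the index of the offending token."""

    def __init__(self, pos, message):
        super().__init__(message)
        self.pos = pos


def is_operand(tok):
    return tok.isidentifier() or tok.isdigit()


def parse_assignment(tokens, pos):
    """Parse `name := operand {+ operand} ;` at pos; return the index after ';'."""

    def expect(accepts, what):
        nonlocal pos
        found = tokens[pos] if pos < len(tokens) else "end of input"
        if pos >= len(tokens) or not accepts(tokens[pos]):
            raise ParseError(pos, f"expected {what}, found {found}")
        pos += 1

    expect(str.isidentifier, "a name")
    expect(lambda t: t == ":=", "':='")
    expect(is_operand, "an operand")
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        expect(is_operand, "an operand")
    expect(lambda t: t == ";", "';'")
    return pos


def check(tokens):
    """Return (token index, message) for every statement that fails to parse."""
    errors, pos = [], 0
    while pos < len(tokens):
        try:
            pos = parse_assignment(tokens, pos)
        except ParseError as err:
            errors.append((err.pos, str(err)))
            # Panic mode: discard tokens up to and including the next ';',
            # then resume with what is assumed to be a fresh statement.
            pos = err.pos
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1
    return errors
```

For example, check("a := 1 + ; b := 2 ; c := := 3 ; d := 4 ;".split()) reports
the two broken statements and still accepts the two good ones.  If a ';' is
simply missing, the same recovery swallows the following statement as well,
which is one way spurious or lost diagnostics arise.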

-- Joe Lavinus, Virginia Tech

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
= Joseph W. Lavinus                           =     \  /      =
= Virginia Tech, Blacksburg, Virginia         =      \/__V    =
= email: lavinus@csgrad.cs.vt.edu             =      /\       =

bevan@cs.man.ac.uk (Stephen J Bevan) (02/19/91)

> There's a reason for the semicolon that seems to be missed here. 
[deleted]
> The semicolons are there so that if the compiler hits an error, and
> gets really confused, it may easily find the end of the statement and recover
> to continue parsing the program.

If your language has blocks, just recover to the end of the block
rather than the end of the statement.  If you don't have blocks, skip
to the end of what you do have (the file if necessary).

I don't particularly agree with the argument that you should minimise
the amount of code skipped.  If you have an error, then it should be
reported and after that all bets are off.

In order to continue parsing the compiler writer has to insert damage
control code to try and reduce the effect of an error.  As a user,
there is no simple way of knowing that the writer was successful and
that any subsequent errors aren't in fact caused by the original error.
That is, if you believe subsequent errors, you can waste time looking for
errors that don't actually exist.  I grant that you can apply
intelligent filtering to this and make guesses as to what may or may
not be affected by previous errors, but they are just that, guesses.

If you are submitting batch jobs for compilation on some system, then
I can see a possible reason for trying to detect as many errors as
possible, but what percentage of people still write code like this?
I would have thought that most code is now developed on PCs,
workstations and the like, where the edit-compile-run cycle is quite
short.

Stephen J. Bevan		bevan@cs.man.ac.uk

kers@hplb.hpl.hp.com (Chris Dollin) (02/20/91)

There have been lots of comments about to-semi-or-not-to-semi, including:
lavinus@csgrad.cs.vt.edu:

   There's a reason for the semicolon that seems to be missed here.  Yes, as
   several of you have pointed out, it is perfectly feasible to define a
   language which requires *no* end-of-statement separators - no semicolons,
   end-of-line's, or anything.  One can even drop the semicolons from Pascal 
   pretty easily. 

   But.. this assumes that a compiler for this language is always given correct
   programs.  The semicolons are there so that if the compiler hits an error,
   and gets really confused, it may easily find the end of the statement and 
   recover to continue parsing the program.

I implemented a semicolon-free language (it happened to be a "real-time
blackboard system" language). The compiler never seemed to have any problems
finding the end of statements and recovering; neither did the programmer. The
only usability issue we noticed was that where a statement was incomplete (but
not visibly so) and was followed by a long comment, the compiler would object
at the first token following the comment, rather than at the nonexistent
semicolon after the incomplete statement.

This seemed to me a reasonable trade; the programmer *never* made trivial
semicolon errors, and semicolon errors seem to occur with monotonous regularity
in Pascal or C.
--

Regards, Kers.      | "You're better off  not dreaming of  the things to come;
Caravan:            | Dreams  are always ending  far too soon."

rgh@inmet.inmet.com (02/21/91)

The Icon language has an interesting approach to the semicolon issue.
The language has a somewhat C-like syntax, and includes semicolon as
a statement separator.  However, end-of-line is treated as a statement
separator if the statement could end there.  That is,

	a := b
	c := d

is two statements, since "a := b" is a complete statement, but

	a := b +
	     c

is one statement, since a statement can't end with "+".
In practice, this lets you get away without any semicolons, except to
separate multiple statements on the same line.
This sensible and workable rule could be adapted to most Algol-ish languages.
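
A minimal Python sketch of the rule as described here, not of Icon's actual
lexer: a line is taken to continue onto the next one when it ends in a token
that cannot end a statement.  The list of such endings below is an assumption
for illustration, and the function name split_statements is hypothetical.
Semicolons inside string literals are not handled.

```python
# Assumed set of line endings after which a statement cannot be complete.
CONTINUATION_ENDINGS = ("+", "-", "*", "/", ":=", ",", "(", "[")


def split_statements(source):
    """Split source into statements, treating end-of-line as a separator
    unless the line ends in a token that demands a continuation."""
    statements, pending = [], []
    for raw_line in source.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        pending.append(line)
        if line.endswith(CONTINUATION_ENDINGS):
            continue  # the statement cannot end here; keep reading
        joined = " ".join(pending)
        pending = []
        # Explicit semicolons still separate statements on the same line.
        statements.extend(p.strip() for p in joined.split(";") if p.strip())
    if pending:
        raise SyntaxError("input ends in the middle of a statement")
    return statements
```

Under these assumptions the two-line "a := b" / "c := d" input gives two
statements, while "a := b +" followed by "c" gives the single statement
"a := b + c".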


Randy Hudson  rgh@inmet.inmet.com

nick@cs.edinburgh.ac.uk (Nick Rothwell) (02/21/91)

In article <21900005@inmet>, rgh@inmet.inmet.com writes:
> 
> 	a := b +
> 	     c
> 
> This sensible and workable rule could be adapted to most Algol-ish languages.

...and was used in the Algol-ish language IMP back in the mid 70's.

-- 
Nick Rothwell,	Laboratory for Foundations of Computer Science, Edinburgh.
		nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcsun!ukc!lfcs!nick
~~ ~~ ~~ ~~  Captain Waldorf has analogue filters. You do not.  ~~ ~~ ~~ ~~
~~ ~~ ~~ ~~ Do not try to imitate them or any of their actions. ~~ ~~ ~~ ~~

dww@math.fu-berlin.de (Debora Weber-Wulff) (02/22/91)

Now what do I do about statements like

  a := b
       + c ?

I gotta backtrack over the statement I thought was complete. 
And how do I express this mathematically? 

When you write prose (flames, etc.) you use punctuation to
show where one thought ends and another one begins, or to
set off collections of words to make the sentence less
ambiguous. Since programming is (weakly) linked to expressing
thought, it shouldn't be too much to ask for you to let
the compiler know when a statement is completed, and for you to
explicitly mark your blocks. As a side-effect, humans who
read your code have a fighting chance of figuring out what
you meant. 

Try finding out (without using a ruler or other aid) to which
block the last statement belongs:

-- apologize for all the lines, but it has to go off your screen to be effective
SEQ
  x := 1
  SEQ
    x := x + y
    PAR
      y := y - 1
      x := x - 1
      SEQ
	-- and all kinds of stuff 
	--
	--
	--
	--
	--
	--
	SEQ
	  --
	  --
	  --
	  --
	  SEQ
	    --
	    --
	    --
	    --
	    SEQ
	      ALT
		--
		  --
		  --
		  --
		  --
                --
		  SEQ
		    --
		    --
		    --
		    --
		    SEQ
		      --
		      --
		      --
		      --
		      --
		      --
		      --
		      --
		      --
		      PAR
			--
			--
			--
			--
			--
      -- Which one?

Debbie Weber-Wulff
weberwu@fubinf.uucp

oz@yunexus.yorku.ca (Ozan Yigit) (02/22/91)

In article <UUEO3RO@math.fu-berlin.de>
dww@math.fu-berlin.de (Debora Weber-Wulff) writes:

>Now what do I do about statements like
>
>  a := b
>       + c ?

I am not sure what you mean. What is there to do? The ``a := b'' statement is
complete, and the ``+ c'' statement is in error, at least in most of the
languages I know of that do away with the semicolon. In such languages
[e.g. Icon, Bell Labs S etc] a statement is allowed to continue only when
it is clear that the end of the expression cannot have occurred yet. In
practice, this is easy to implement, and works rather well. 

oz
---
Why should the status of my code depend on  | Internet: oz@nexus.yorku.ca
what RMS had for breakfast? -- Jay Maynard  | Uucp: utzoo/utai!yunexus!oz

denelsbe@hatteras.cs.unc.edu (Kevin Denelsbeck) (02/24/91)

This is just a quick plug for Turing, a language recently developed at the
University of Toronto.  Turing's grammar was designed so that no statement
separators (other than whitespace) are needed.  So you can write stuff like

	x := x + y +
	z * sin(
	phi
	) / pi

or somesuch, if you wanted, as well as squeeze all sorts of statements on
one line, such as

	x:=x+y y:=x-y x:=x-y put x,y

which swaps x and y and displays their values (working properly if x and y
aren't aliased to the same var).  Indenting is left up to the programmer.
Turing handles its control structures by requiring them to follow the
form

	<command>
	   ...
	   {code}
	   ...
	end <command>

so we have "if ... end if", "loop ... end loop", and so on.  This gives a
great deal of flexibility to the programmer to format things the way he/she
wants.  A possible drawback, especially with large programs, is that syntax
checking may take longer if the compiler thinks an expression is continuing
over several lines when in fact the programmer forgot something silly like
a right parenthesis.

Kev @ UNC

denelsbe@hatteras.cs.unc.edu (Kevin Denelsbeck) (02/26/91)

In article <8531@plains.NoDak.edu> person@plains.NoDak.edu (Brett G. Person) writes:
>
>A BAD idea!  The semi-colon is there to denote the end of a particular piece
>of code.  I find it a lot easier to read a language that uses the semi-colon
>because it acts like a place holder for the little code-parser in my head.
>Try reading "efficient" BASIC code sometime.  
>

I disagree!  Semicolons, braces, and "anonymous" begin-end pairs are just a
lot of clutter, as far as I can see.  I'm willing to bet that most programmers
operate with the "one-command-per-line" mindset (except for a few little hacks
like putting a simple if-then or loop structure on one line) and that, in the
long run, eliminating semicolons and other such detritus makes for an easier-
to-read program.  In other words, I admire languages that rely on whitespace
for statement separators and end<command> statements for statement terminators.
It makes for a cleaner format and automatically does some documentation at the
same time (as opposed to the anonymous begin-ends or braces alluded to above).
It takes a little getting used to, but so do semicolons, and I think the
approach I mentioned is more intuitive.
	By the way, BASIC has come a long way, and I feel it's an unfairly
maligned language.  I'm not familiar with the details of True BASIC, but I
do know that a lot of dialects of BASIC have undergone some Algolization
and have some rather elegant features, such as conditional looping, additional
implicit types, etc.  The one I use, GFA BASIC for the Atari ST, has a lot of
Algolisms, as well as automatic code formatting (indenting, etc.) and some
C-like stuff to take advantage of incrementing vars and so on.  The compiler
generates some good code and the source is quite readable.
	Any other BASIC fans out there?

>-- 
>Brett G. Person
>North Dakota State University
>uunet!plains!person | person@plains.bitnet | person@plains.nodak.edu

Kev @ UNC

lavinus@csgrad.cs.vt.edu (02/26/91)

Well, people seem to have gotten the wrong idea about my previous posting:
I do not advocate, like, or justify the presence of semicolons in any
language.  The less I have to worry about cosmetics (that includes semicolons,
whitespace, and other stuff which has no semantic usefulness), the happier
I am.  However, I was trying to point out that that is (at least part of) the
reason why semicolons are in these languages.

BTW, what do people think of languages which use *no* separators whatsoever - 
for that matter, I can't even think of one right now; the closest one being
Lisp, which still requires some whitespace to separate elements of lists, etc.

So long...

Joe Lavinus

P.S.WhitespaceisnotstrictlynecessaryinEnglish,either.
--
_______________________________________________________________
                                                   _  _  __
  Joseph W. Lavinus (lavinus@csgrad.cs.vt.edu)     | / \ |_
  Virginia Tech, Blacksburg, Virginia            __| \_/ |_

igl@ecs.soton.ac.uk (Ian Glendinning) (02/27/91)

In <UUEO3RO@math.fu-berlin.de> dww@math.fu-berlin.de (Debora Weber-Wulff) writes:

>Try finding out (without using a ruler or other aid) to which
>block the last statement belongs:

>SEQ
>  x := 1
>  SEQ
>    x := x + y
>    PAR
>      y := y - 1
>      x := x - 1
       ...  all kinds of stuff in here
       ...  lots more stuff tucked away in here
>      -- Which one?

Easy when you have a folding editor, isn't it?  No one in their right
mind would try to read a 'flat' occam program of any non-trivial
nature.  In fact, I wouldn't write code in *any* language without one
nowadays.  If you don't have one already, you can get a copy of the Origami
folding editor sources (C version, for Unix) by sending a message
containing the following lines:

archiver tar
pack compress
send origamic

to archive-server@inmos.com

A Turbo Pascal version and PC executable version can be obtained from
the same place.  Full information about the server can be obtained by
sending a message containing a single line saying "help" (no quotes).

Enjoy!
--
I.Glendinning@ecs.soton.ac.uk        Ian Glendinning
Tel: +44 703 593081                  Electronics and Computer Science
Fax: +44 703 593045                  University of Southampton SO9 5NH England

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (03/01/91)

BCPL's rule about treating the end of a line as a ';' token unless it is
"obvious" that the line must be continued is extremely easy to add to a
parser.  For example, I once had a "mid-processor" for C that would do
things like inserting ';' before '}' unless there was already a ';' or
'}' there, and it implemented the BCPL rule.  (It also filtered := to =.)
The scheme was
	cc -E foo.c | midprocessor >temp.c; cc temp.c
It took an afternoon, and was really tiny.  I eventually threw it away,
because it was rather like the strange case of the Dvorak keyboard.
(Hint:  people trained with the Dvorak keyboard are all of 5% faster
than people trained on QWERTY.  Big deal.)  It really wasn't much of an
improvement.

Don't forget one important thing that distinguishes C from Turing and
even from BCPL and S:  the C pre-processor.  Should one apply the BCPL
rule before or after macro expansion?  It can make a huge difference.
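
For the curious, the "insert ';' before '}'" part of such a filter might look
roughly like the Python sketch below.  This is not the original mid-processor,
only a guess at the stated rule; the function name close_statements is
hypothetical.  It works on raw characters, so it would mangle a '}' inside a
string literal or comment and would also break brace-enclosed initializer
lists; a real tool would work on tokens.

```python
def close_statements(text):
    """Insert ';' before each '}' unless the previous non-blank
    character is already ';' or '}'."""
    out = []
    last_solid = -1  # index in `out` of the last non-whitespace character
    for ch in text:
        if ch == "}" and last_solid >= 0 and out[last_solid] not in ";}":
            # Place the ';' directly after the last real character,
            # ahead of any whitespace that precedes the brace.
            out.insert(last_solid + 1, ";")
        out.append(ch)
        if not ch.isspace():
            last_solid = len(out) - 1
    return "".join(out)
```

Given "if (x) { y = 1 }" this returns "if (x) { y = 1; }", and text that already
has its semicolons passes through unchanged.  An empty block "{ }" becomes
"{; }", which is still legal C because ';' alone is an empty statement.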
-- 
The purpose of advertising is to destroy the freedom of the market.

rst@cs.hull.ac.uk (Rob Turner) (03/01/91)

Randy Hudson writes:

> ... end-of-line is treated as a statement
>separator if the statement could end there. That is,
>
>      a := b
>      c := d
>
>is two statements, since "a := b" is a complete statement, but
>
>      a := b +
>           c
>
>is one statement, since a statement can't end with "+".

This is similar to the way BCPL parses programs.

Let's carry it further, though. What about

       a := b c := d

A parser could read in the "c" after the "b", and, instead of issuing
an error message, assume that the previous statement has ended.

I acknowledge that debugging programs would become a lot harder if
this technique was used. Let's stick with semicolons.

Rob

sw@smds.UUCP (Stephen E. Witham) (03/02/91)

In article <1875@borg.cs.unc.edu>, denelsbe@hatteras.cs.unc.edu (Kevin Denelsbeck) writes:
     Semicolons, braces, and "anonymous" begin-end pairs are just a
     lot of clutter, as far as I can see.  I'm willing to bet that most 
     programmers operate with the "one-command-per-line" mindset (except for 
     a few little hacks like putting a simple if-then or loop structure on one 
     line) and that, in the long run, eliminating semicolons and other such 
     detritus makes for an easier-to-read program.  
In printed English, indentation is used (as above) for grouping, and statements
are separated by punctuation.  More than one statement can fit on a line, and
a single statement can take up multiple lines.  I kinda like it that way.
My C code does things like:
     x = 0;  y = 0;
and
     z = a + veryLongNamedFunctionOf( b + importantOffset )
           + anotherVeryLongNamedFunction( d ) 
           - theExcess;

Kev asks,
     Any other BASIC fans out there?
Yeah.  People pick on BASIC too much, especially now that it's
come out of the dark ages.  Basic is the only language in which
I can write extemporaneously, off-the-cuff, top-to-bottom, without
writer's block.

--Steve Witham sw@smds.uucp
Not-the-fault-of: SMDS, Concord, MA

gudeman@cs.arizona.edu (David Gudeman) (03/02/91)

In article  <5622.9103011021@olympus.cs.hull.ac.uk> Rob Turner writes:
]Randy Hudson writes:
]
]> ... end-of-line is treated as a statement
]>separator if the statement could end there. That is,...
]
]Let's carry it further, though. What about
]
]       a := b c := d
]
]A parser could read in the "c" after the "b", and, instead of issuing
]an error message, assume that the previous statement has ended.
]
]I acknowledge that debugging programs would become a lot harder if
]this technique was used. Let's stick with semicolons.

That is a fallacious argument:

(1) Y is bad,
(2) Y is an extension of X,
therefore
(3) X is bad.

Statement (3) does not follow logically from (1) and (2).
--
					David Gudeman
gudeman@cs.arizona.edu
noao!arizona!gudeman

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/02/91)

In article <6968@ecs.soton.ac.uk> igl@ecs.soton.ac.uk (Ian Glendinning) writes:
> Easy when you have a folding editor, isn't it?

Even easier when you're using WEB or something equivalent. The only
reason C and Fortran make this difficult is that they don't let you move
a block of code somewhere else without changing the control structure
and data visibility of the block.

---Dan

shack@cs.arizona.edu (David Shackelford) (03/08/91)

NO, please don't take away my ;'s
My right pinkie is programmed to type ; <return> and
I have to tell it not to put the ;'s in if they;
aren't needed.                                 ^ oops :-)

Why mess with something that works perfectly OK?

Dave   | shack@cs.arizona.edu;

lavinus@csgrad.cs.vt.edu (03/24/91)

Hello out there!

I'm going to open myself up to a huge deluge of flames by saying that that's
why IBM/DOS computers are so big these days as well.  It's an awful thing, but
perhaps an inevitable one - I completely agree that Dvorak is a better layout,
but (a) I type 90wpm on a qwerty, and it'd take me quite a while to get up to
that speed on a Dvorak, and more importantly (b) I'd have to switch back to
qwerty anytime I went anywhere that didn't use it (i.e., anywhere).

Joe

P.S. BTW, the qwerty layout was designed so that keys which were commonly hit
consecutively weren't next to one another on the strikers - so that they didn't
get stuck.

--
_______________________________________________________________
                                                   _  _  __
  Joseph W. Lavinus (lavinus@csgrad.cs.vt.edu)     | / \ |_
  Virginia Tech, Blacksburg, Virginia            __| \_/ |_