jejones@mcrware.UUCP (James Jones) (12/15/88)
I never could understand what the difficulty is with Algol-style
semicolons. Maybe it was the way it was explained to me (thank you,
Professor Feyock, wherever you are!).

In Algol, semicolons *separate* statements. The glyph "end" is *not* a
statement; it and "begin" serve the same purpose as parentheses, and
indeed, Algol 68 makes this even more obvious by allowing the use of
parentheses in place of begin and end if one wishes.

(By the way, not all statements in C are terminated with semicolons;
otherwise, there wouldn't be all that grief over

    #define woof(x, y, z)   {/* various statements */}
    ...
    if (p)
        woof(a, b, c);
    else
        /* etc. */

and the world would be a happier place, but then, isn't saving
keystrokes the true goal of all C programmers? :-)

People don't seem to have trouble with the convention that in English,
items in a list are separated by commas, so why do they have trouble
with Algol semicolon usage? I don't know.

    James Jones
firth@sei.cmu.edu (Robert Firth) (12/15/88)
In article <868@mcrware.UUCP> jejones@mcrware.UUCP (James Jones) writes:
>I never could understand what the difficulty is with Algol-style semicolons.

I've used languages with various kinds of semicolon conventions, and
must admit that I like the Algol-68 way least.

The best, in my opinion, is to have the semicolon optional: a newline
will terminate the statement if possible. So you can write

    x := x+1
    y := y+1

and it works, but

    x := a0+a1+a2+
         a3+a4+a5

also works. One hardly ever gets this wrong.

If the semicolon is mandatory, I prefer it to be a terminator,
precisely because I then don't have to edit two lines to insert or
remove one statement.

Finally, if the semicolon is a separator, the language should be kind
enough to allow an extra semicolon before a syntactic bracket, either
by having a "null statement" or by some other fudge.

Much the nastiest part, though, is that the compilers I have used have
been very obtuse about recovering from the simplest errors. The
Algol-68 compiler had a great repertoire of error messages, from

    OBJECT OF MODE PROC()VOID CANNOT BE WEAKLY COERCED TO REF CHAR

to some really frightening ones; what most of them meant was either
"missing semicolon on previous line" or "extra semicolon on previous
line". And I remember the Algol-60 compiler that could not recover from

    IF c THEN s; ELSE s;

which is a single-token correction.

The real booby prize, though, goes to the C compiler where omitting a
semicolon before a right brace:

    if (...) { s1; s2 }

causes an indefinite number of bogus consequential errors, until either
the compiler gives up "too many errors - goodbye" or your program text
ends. Since the repair here is perhaps the simplest possible, this
compiler feature represents incompetence of a rare order.
gary@milo.mcs.clarkson.edu (Gary Levin) (12/16/88)
In article <8008@aw.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
...
>The best, in my opinion, is to have the semicolon optional: a newline
>will terminate the statement if possible. So you can write
>
>    x := x+1
>    y := y+1
>
>and it works, but
>
>    x := a0+a1+a2+
>         a3+a4+a5
>
>also works. One hardly ever gets this wrong.

However, if you are in an expression oriented language, it is possible
to get really strange effects here. I would write the above statement
as:
    x := a0+a1+a2
        +a3+a4+a5;
Having submitted this to Icon, I found that the program didn't work as
expected. It took me quite a bit of debugging to find that this was
being treated as two statements, and so x was only assigned a0+a1+a2.
Further, I then tried to write this as
    x := (a0+a1+a2
        +a3+a4+a5);
and I was informed that I was missing a right parenthesis after a2
(and before the inserted semicolon). Note that Icon would correctly
handle the case of the trailing + as Firth wrote his example.
Generally, I prefer systems that have me say what I mean, rather than
guess. And if they must guess, they should have the grace to indicate
their guesses.
It is certainly possible that Icon no longer suffers from this
particular lexical/syntactic quirk; I haven't used it in quite a while.
--
-----
Gary Levin/Dept of Math & CS/Clarkson Univ/Potsdam, NY 13676/(315) 268-2384
BitNet: gary@clutx Internet: gary@clutx.clarkson.edu
chase@Ozona.orc.olivetti.com (David Chase) (12/16/88)
In article <GARY.88Dec15142623@milo.mcs.clarkson.edu> gary@milo.mcs.clarkson.edu (Gary Levin) writes:
>In article <8008@aw.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>> The best, in my opinion, is to have the semicolon optional: a newline
>> will terminate the statement if possible.

>Further, I then tried to write this as
>
>    x := (a0+a1+a2
>        +a3+a4+a5);
>
>and I was informed that I was missing a right parenthesis after a2
>(and before the inserted semicolon). Note that Icon would correctly
>handle the case of the trailing + as Firth wrote his example.

A flaw in Icon's implementation of optional terminators is hardly a
condemnation of optional terminators; other parsers have been getting
it right for years. I agree rather completely with Mr. Firth's message;
I've used a language with optional semicolon syntax, and liked it very
much. It is astounding how much the little things often matter. (It's
also astounding what horrible parsers some compilers have.)

I should add that the trailing operator to indicate continuation
rapidly becomes a part of your style when working in such a language.

One actually wonders, after reading the keystroke-minimizing arguments
on "=/:=" vs "==/=" in Kernighan and Ritchie, why they made semicolons
mandatory. BCPL didn't have them. They could have saved one whole
character per statement. Enquiring minds want to know.

David
chl@r1.uucp (Charles Lindsey) (12/16/88)
In article <208100002@s.cs.uiuc.edu> carroll@s.cs.uiuc.edu (Alan Carroll) writes:
>/* Written 10:10 am Dec 4, 1988 by bct@lfcs.ed.ac.uk in s.cs.uiuc.edu:comp.lang.misc */
>/* ---------- "Re: What makes a language successfu" ---------- */
> Another sad point is that even died-in-the wool Algol 68 people like Charles
>Lindsey have drifted into putting semi-colons before ENDs.

Time for "died-in-the-wool Charles Lindsey" to say what he really does.

>I always thought that the 'no ; before END' part of PASCAL was one of the worst
>'features' of the language.

First let us understand the rules:

    PASCAL has a dummy-statement, so ';' before END is OK
    ALGOL 68 has no dummy-statement, so ';' before END is forbidden

In article <868@mcrware.UUCP> jejones@mcrware.UUCP (James Jones) writes:
>I never could understand what the difficulty is with Algol-style semicolons.
>... In Algol, semicolons *separate* statements. ...

Yes, but Carroll still has a point

>... I can't count how many times I had to recompile
>because I had added a statement before an END and forgotten to put a ; on
>the *previous* statement.

Here is what Charles Lindsey actually does:

When writing in PASCAL, I always put a ';' at the end of each
statement, even before an END, for the reason Carroll gave.

When writing in ALGOL 68, I always put a ';' BEFORE each statement
(except the first). Here is the example that started all the fuss (so
far as I remember it).

    BEGIN   LOC INT i := 0, j := 1
    ;       LOC REF INT ptr := i
    ;       ptr := j
    ;       print(i)
    END

>{James Jones again} "end" and "begin" serve the same purpose as
>parentheses, and indeed, Algol 68 makes this even more obvious by allowing
>the use of parentheses in place of begin and end if one wishes.

Indeed, and it looks even better (and is easier to type) this way:

    (       LOC INT i := 0, j := 1
    ;       LOC REF INT ptr := i
    ;       ptr := j
    ;       print(i)
    )

Charles Lindsey         chl@ux.cs.man.ac.uk
perelgut@turing.toronto.edu (Stephen Perelgut) (12/16/88)
Another method of handling semi-colons is to not have them. All you
need to do is terminate multi-part statements. For example:

    if x then
        s1
        s2
    elsif y then
        s3
        s4
    else
        s5
    end if

There are no ambiguities there! And the only cost is terminating a
statement. For example: if/end if; for/end for; loop/end loop;
case/end case; etc. This also has the effect of making programs more
readable.
kwalker@arizona.edu (Kenneth Walker) (12/17/88)
In article <34366@oliveb.olivetti.com>, chase@Ozona.orc.olivetti.com (David Chase) writes:
>
> A flaw in Icon's implementation of optional terminators is hardly a
> condemnation of optional terminators; other parsers have been getting
> it right for years.

The problem is more the fact that Icon's syntax was not designed with
the idea of making trailing semicolons optional. As Gary Levin pointed
out, it is an expression-oriented language. Suppose you write the
rather strange expression

    return {
       y := a
       -b
       }

It is translated as

    return {
       y := a;
       -b;
       }

which is meaningful. y is assigned a and -b is returned. On the other
hand

    y := a
       -b
    x := 2

is also translated similarly

    y := a;
       - b;
    x := 2

It is not until you get deeply into the semantics that you determine
that -b has no effect. At that point it is getting a little late to
change your mind on how to parse the expression (not that it is
impossible with a hand-coded parser, but I wouldn't care to do it).

Icon actually adds semicolons in the lexical analyser and does so based
only on the last token of the line and the first one of the next. This
is not a perfect solution, but as David points out:

> I should add that the trailing operator to indicate continuation
> rapidly becomes a part of your style when working in such a language.

The syntax of C is actually worse than Icon if you want to make
semicolons optional "after the fact" (of language design). It has
suffix operators, so you cannot use the trailing operator rule to
continue a line.

(I find I have no problems with Icon's approach to not requiring
trailing semicolons, except that the C compiler doesn't like my coding
style any more :-)

    Ken Walker / Computer Science Dept / Univ of Arizona / Tucson, AZ 85721
    +1 602 621 2858  kwalker@Arizona.EDU  {uunet|allegra|noao}!arizona!kwalker
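
The rule described above, adding a semicolon by looking only at the
last token of one line and the first token of the next, can be sketched
in a few lines. This is a minimal Python illustration, not Icon's
actual lexer: the tokenizer, the function names and the two token
classes (what may end an expression, what may begin one) are
simplifying assumptions made for the example, and keywords are not
treated specially.

```python
import re

# Hypothetical, much-simplified token classes; Icon's real tables are larger.
TOKEN = re.compile(r"\s*(:=|[A-Za-z_]\w*|\d+|.)")
OPERAND = re.compile(r"[A-Za-z_]\w*|\d+")


def tokens(line):
    """Split one source line into identifiers, numbers, ':=' and single chars."""
    return TOKEN.findall(line.strip())


def can_end(tok):
    """Could an expression end with this token?"""
    return bool(OPERAND.fullmatch(tok)) or tok in {")", "]", "}"}


def can_begin(tok):
    """Could an expression begin with this token? '+' and '-' are prefix operators."""
    return bool(OPERAND.fullmatch(tok)) or tok in {"(", "[", "{", "+", "-"}


def insert_semicolons(lines):
    """Append ' ;' to a line when it could end an expression and the next
    line could begin one. Only the two boundary tokens are inspected."""
    rows = [tokens(line) for line in lines if line.strip()]
    out = []
    for i, row in enumerate(rows):
        text = " ".join(row)
        if i + 1 < len(rows) and can_end(row[-1]) and can_begin(rows[i + 1][0]):
            text += " ;"
        out.append(text)
    return out


for example in (["y := a", "-b", "x := 2"],
                ["x := a0+a1+a2+", "a3+a4+a5"],
                ["x := (a0+a1+a2", "+a3+a4+a5)"]):
    print(insert_semicolons(example))
```

Under these assumptions the first example gets a semicolon after
"y := a" and after "- b", the trailing-operator form is left as one
statement, and the parenthesized form gets a semicolon after "a2",
which leaves the parenthesis unclosed and matches the "missing right
parenthesis" complaint reported earlier in the thread.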
toma@tekgvs.GVS.TEK.COM (Tom Almy) (12/20/88)
The question in my mind is *not* "Is a semicolon a statement separator
or statement delimiter?" but *is* "Why do we need semicolons at all?".

Many years ago (about 20) I took a compiler writing class where we had
to write a compiler for an "Algol"-like language. A few weeks into the
project my partner and I noticed that the only thing our recursive
descent parser did with semicolons was to issue an error message if
they were absent! So we altered the language spec to eliminate
semicolons altogether!

In the same time frame, I was using BCPL (the predecessor of C). BCPL
would assume a semicolon at the end of each line which could be
successfully parsed as a statement. This eliminated virtually all the
semicolons in every BCPL program I wrote, yet the rule cost virtually
nothing in parsing overhead.

I would contend that most, if not all, Algol-derived languages would
work just fine without semicolons! The only problem is that the way
some languages define control structures requires the existence of the
semicolon as a null statement.

Tom Almy
toma@tekgvs.tek.com
Standard Disclaimers Apply
smryan@garth.UUCP (Xxxxxx Xxxx) (12/21/88)
>The question in my mind is *not* "Is a semicolon a statement separator
>or statement delimiter?" but *is* "Why do we need semicolons at all?".

Parse this:

    begin
        a
        +b
    end

Is that a+b or a;+b?

>I would contend that most, if not all, Algol-derived languages would work
>just fine without semicolons!

Most could also do without declarations, as does Fortran.

Redundancy is used to detect, maybe correct, errors in the presence of
noise (ever use a backspace key?). Natural language is chock-full of
redundancy, much more than programming languages.
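
The ambiguity in the fragment above can be made concrete by counting
the ways a separator-free token stream splits into statements. The
sketch below is a Python illustration over an assumed toy grammar (a
statement is an expression, and an expression is an optional sign
followed by operands joined by "+" or "-"); the function names are
hypothetical and no real Algol grammar is implied.

```python
SIGNS = {"+", "-"}


def is_expr(toks):
    """True if toks match: [sign] operand { sign operand }."""
    i = 1 if toks and toks[0] in SIGNS else 0
    expect_operand = True
    while i < len(toks):
        if expect_operand:
            if not toks[i].isidentifier():
                return False
        elif toks[i] not in SIGNS:
            return False
        expect_operand = not expect_operand
        i += 1
    # A valid expression must stop right after an operand.
    return not expect_operand


def parses(toks):
    """Return every way of splitting toks into a sequence of expressions."""
    if not toks:
        return [[]]
    result = []
    for cut in range(1, len(toks) + 1):
        head = toks[:cut]
        if is_expr(head):
            for rest in parses(toks[cut:]):
                result.append([" ".join(head)] + rest)
    return result


print(parses(["a", "+", "b"]))   # two readings: 'a' then '+ b', or 'a + b'
print(parses(["a", "b"]))        # one reading: 'a' then 'b'
```

With a prefix "+" in the grammar, the three tokens admit two readings;
without an explicit separator (or a rule such as "newline ends the
statement if possible") the parser has nothing to choose between them.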
nevin1@ihlpb.ATT.COM (Liber) (12/21/88)
In article <88Dec16.100919est.4327@turing.toronto.edu> perelgut@turing.toronto.edu (Stephen Perelgut) writes:
|Another method of handling semi-colons is to not have them. All you
|need to do is terminate multi-part statements. For example:
|    if x then
|        s1
|        s2
|    elsif y then
|        s3
|        s4
|    else
|        s5
|    end if
|
|There are no ambiguities there! And the only cost is terminating a statement.
|For example: if/end if; for/end for; loop/end loop; case/end case; etc.
|This also has the effect of making programs more readable.

Yes, but what do you do with statements that span more than one line?
You still need a way of saying that an expression is to be continued on
the next line (or that you don't need to continue a statement by use of
a separator or terminator; this amounts to the same thing).
--
NEVIN ":-)" LIBER  AT&T Bell Laboratories  nevin1@ihlpb.ATT.COM  (312) 979-4751
eugene@eos.UUCP (Eugene Miya) (12/21/88)
Some random ponderings based on comments on ALGOL-like languages (gross
oversimplifications):

I wonder if the COBOL people argue over the '.' [period] as a statement
terminator the way others argue over the semicolon as a separator or
terminator? {Perlis had an Epigram on this (I sent them to
comp.lang.sigplan but have not seen them reposted).}

Modula-2 appears closer to Mesa than Modula[-1]. You only need read the
text. Wonders what a sabbatical at PARC will do for you 8-).

On ":=" versus "=": Altos had a backward arrow assignment key, not "<-"
for Mesa. (This could correspond to other language arrows (UP) ^ and
(RIGHT) "->".) Just standardize some extra characters. 8-) It will
never happen.

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups. If enough, I'll summarize."
anw@nott-cs.UUCP (12/21/88)
In article <4396@tekgvs.GVS.TEK.COM> toma@tekgvs.GVS.TEK.COM (Tom Almy) writes:
>
>[...] "Why do we need semicolons at all?".

[He writes a compiler that doesn't use semicolons at all ...]

>I would contend that most, if not all, Algol-derived languages would work
>just fine without semicolons! The only problem being the way some languages
>define control structures require the existence of the semicolon as a null
>statement.

When I was running a "Compilation Techniques" course, a few years back,
one of the assignments was to alter the Pascal syntax in various ways,
Yacc the results, and comment. One of the ways was to throw away
semicolons. In Pascal, as TomA asserts, the *only* problem is that it
becomes difficult to see where the null statements are. If null
statements are -- as they should be -- explicitly written, e.g. as
"SKIP", even that problem goes away, and the semicolon becomes totally
redundant.

In C and Algol, semicolons are harder to dispose of. For example,
consider the Algol:

    UNION (REAL, PROC (REAL) REAL) foo = cos #;# (pi)
        # is "foo" set to "cos" or to "-1.0"? #

or

    PROC REAL bar: (x := pi #;# - 2)
        # return "-2" and assign "pi", or assign & return "pi-2"? #

or

    STRING s := "Hello World!" #;# [7] #;# REAL x
        # "s" is "Hello World!" and "x" is an array, or
          "s" is "W" and "x" is a real variable? #

C examples left to the reader!

--
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk
perelgut@csri.toronto.edu (Stephen Perelgut) (12/22/88)
> |perelgut@turing.toronto.edu (Stephen Perelgut) writes:
> | Another method of handling semi-colons is to not have them. All you
> | need to do is terminate multi-part statements. For example:
> |
> | <Code fragment deleted>
> |
>
> NEVIN LIBER  AT&T Bell Laboratories  nevin1@ihlpb.ATT.COM  (312) 979-4751
> Yes, but what do you do with statements that span more than one line?
> You still need a way of saying that an expression is to be continued on
> the next line (or that you don't need to continue a statement by
> use of a separator or terminator; this amounts to the same thing).

My example wasn't entirely clear on this point. Careful definition of a
language will allow statements to unambiguously extend across any
arbitrary amounts of whitespace including blanks, tabs, form-feeds and
newlines. For example, in the Turing programming language I might be
acting dumb and write a statement that looks like

    if x / 7 > 12 then put "Hi there." else put "It's not quite right" end if

You might prefer to see this written as

    if x/7 > 12 then
        put "Hi there."
    else
        put "It's not quite right"
    end if

The Turing language definition (and the various interpreters and
compilers) would treat both exactly the same. The paragraphing feature
of the environment supplied with the compilers would try to turn the
first into a semblance of the second.

One caveat: atomic elements cannot cross line boundaries, so you can't
split a real number or a string literal across two lines. But you can
break up statements at any other point, even expressions

    "This string contains the value"
        + " of 7 + 5 as a string literal: "
        + intstr ( 7 + 5)
phipps@garth.UUCP (Clay Phipps) (12/22/88)
In article <9235@ihlpb.ATT.COM> nevin1@ihlpb.UUCP (55528-Liber,N.J.) writes:
>In article <88Dec16.100919est.4327@turing.toronto.edu>,
>perelgut@turing.toronto.edu (Stephen Perelgut) writes:
>|Another method of handling semi-colons is to not have them.
>|All you need to do is terminate multi-part statements. [example shown]
>|There are no ambiguities there! And the only cost is terminating a statement.
>You still need a way of saying that an expression is to be continued on
>the next line (or that you don't need to continue a statement by
>use of a separator or terminator; this amounts to the same thing).

I have long been fond of using the opposite of terminators; one might
call them triggers, introductory tokens or "introducers" ("initiator",
intended as opposite of "terminator", might be confused with value
initialization).

Most Algol-derived languages that I know of have an introducer -- a
keyword -- for every statement type except assignment and (usually)
procedure calls and (sometimes) value returns. Treated as statements,
"end if", "end loop", &c. fit well into such a scheme: they are
brackets at the semantic level.

The language CMS-2 uses few special characters; the almighty dollar "$"
is its terminator symbol (one particularly appropriate for the military
:-). Unlike most languages on the Algol side of COBOL, CMS-2 uses an
introductory token for every statement (except procedure calling ?).
When I first saw the assignment statement form:

    "SET" TargetObject "TO" SourceValue "$"

I immediately thought of COBOL verbosity, but was surprised to realize
that in practice, I could type "SET" and "TO" faster than I could find
and type the (shifted) ":" and (unshifted) "=" to form an Algol
assignment token (we used 3 different keyboards on that project, none
with ASCII layout). In the CMS-2Y "structured programming" dialect of
CMS-2, the "$" was more of an artifact than a necessity, even for error
recovery.

I believe that a syntax style that uses introducers rather than
terminators for all (or all but one) statement type combines the
source-layout and error-recovery advantages of always-terminated
statements, and the easy modifiability (lack of "%#@! I forgot to
insert|remove...!") and typing speed advantages of never-terminated
statements, to the greatest extent possible for those conflicting
goals.

Fleshing out the example by perelgut@turing.toronto.edu (Stephen
Perelgut) along these lines, I have

    if x then
        get p from p^.next  -- compute the value p^.next; store in p
        give p^             -- return the value p^
    elsif y then
        get p from p^.prev  -- compute the value p^.prev; store in p
        give p^             -- return the value p^
    else
        give nil            -- return the nil value
    end if

Notes: The "--" introduces a comment, which continues to CR | LF. I
replaced "set" by "get" to avoid confusion with Pascal powersets, and
chose the shortest sensible English words for keywords, to minimize
typing.

The scheme allows not only statements continued over multiple lines,
but also multiple statements on a single line (e.g., when nonpurists
would agree that keeping everything on a page promotes comprehension to
a greater extent than isolating each statement on 1 line).

The only thing that does not work smoothly is imbedded assignment,
although that could be done via parentheses. The style shown here best
fits statement-oriented, rather than expression-oriented, languages.

I recognize that all this is not earth-shaking programming language
theory; it is just my 2 bits worth on reducing common sources of
aggravation. It may already have been done, perhaps by the ABC folks at
CWI/Amsterdam.
--
[The foregoing may or may not represent the position, if any, of my employer]
Clay Phipps  {ingr,pyramid,sri-unix!hplabs}!garth!phipps
Intergraph APD, 2400#4 Geng Road, Palo Alto, CA 93403  415/494-8800
pokey@well.UUCP (Jef Poskanzer) (12/22/88)
In the referenced message, eugene@eos.UUCP (Eugene Miya) grossly generalizes:
>Modula-2 appears closer to Mesa than Modula[-1].

But not close enough! Modula-3 appears much closer. Doesn't matter,
though; since it doesn't look like C, it won't catch on. Perhaps
someday someone will graft Mesa's functionality onto C's syntax. We can
call the result C Plus Mesa, or CP/M for short...

>On ":=" versus "=": Altos had a backward arrow assignment key, not "<-" for
>Mesa. (this could correspond to other language arrows (UP) ^ and (RIGHT)
>"->".) Just standardize some extra characters. 8-) It will never happen.

Old-timers will recall that on old teletypes, '_' was a back-arrow and
'^' was an up-arrow. The switch was made around 1971, and for a while
some people wanted to look for old-style print heads for the new ttys
we were getting. But eventually we decided it was wiser to stick with
the standard.

Xerox went the other way, and kept arrow glyphs for those characters in
(most of) their fonts. That was not too bad, but around 1984 they
decided to diverge even further from ASCII, and defined a bunch of
characters in the range 128-255. They added arrows in all four
directions, '<<' and '>>' (French quotation characters, used in Mesa as
comment delimiters), and some other chud. It made mail gatewaying
rather, um, interesting.
---
Jef

Jef Poskanzer  jef@rtsg.ee.lbl.gov  ...well!pokey
"C combines the power of assembly language ... with the flexibility of
assembly language."
mike@arizona.edu (Mike Coffin) (12/25/88)
For an example of a language that tries to make everyone happy, see SR [TOPLAS, Jan 1988]. All semicolons are optional, so you can pretend they terminate statements, or separate statements, or you can leave them out entirely. The grammar was designed with this in mind, so there is no ambiguity. -- Mike Coffin mike@arizona.edu Univ. of Ariz. Dept. of Comp. Sci. {allegra,cmcl2}!arizona!mike Tucson, AZ 85721 (602)621-2858
djones@megatest.UUCP (Dave Jones) (12/28/88)
> In article <4396@tekgvs.GVS.TEK.COM> toma@tekgvs.GVS.TEK.COM (Tom Almy) writes:
> >[...] "Why do we need semicolons at all?".
>

They are "gobble-stoppers" -- symbols which clue the compiler as to how
to recover from a syntax error.

A typical LR technique is the following: On an error, the compiler
discards states from the parse stack until a state which can shift
"error" is uncovered. It shifts the error. It then makes any necessary
reductions before it discards (gobbles) tokens which cannot be shifted.
Success is declared when some fixed number of consecutive tokens have
been shifted since the last error. The usual number is three. But when
a gobble-stopper is found, you can say, "That's enough."

Here's a short yacc example from a production Pascal compiler currently
under construction. The nonterminal "semicolon" is used all over the
place. When a semicolon is actually found, yyerrok tells the parser
that no more tokens need be shifted in order to consider that an error
has been "recovered from". If the semicolon is missing, we could be
silent, but for the sake of portability, we issue a warning message.

(Note to gurus: The version of yacc being used is a repaired version
which does not have the bogus default reduction bug, which would
invalidate the error-recovery.)

    semicolon
        : ';'   { yyerrok; }    /* a real ';' ends error recovery at once */
        | '.'   { Mpc_warning("Period replaced with semicolon."); }
        |       { Mpc_warning("Semicolon inserted."); }
        ;

    label_declaration
        : LABEL integer_list semicolon  { Mpc_add_labels($2); }
        ;

    integer_list
        : INTEGER                       { $$ = List_new(); List_append($$, $1); }
        | integer_list ',' INTEGER      { List_append($1, $3); }
        | error                         { $$ = List_new(); }
        | integer_list ',' error        { $$ = $1; }
        ;
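
The "gobble-stopper" idea can also be shown outside yacc. The sketch
below is a deliberately simplified Python analogue, not the LR
procedure described above: it is a hand-written loop over an assumed
toy statement form (name := number ;), and parse_statements is a
hypothetical name. On a syntax error it discards tokens up to and
including the next ';' and then resumes, so one mistake produces one
message instead of a cascade.

```python
def parse_statements(tokens):
    """Parse a token list of `name := number ;` statements.

    Returns (statements, errors). After a syntax error, tokens are
    gobbled up to and including the next ';', which acts as the
    synchronizing token, and parsing resumes from there.
    """
    statements, errors = [], []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 4]
        if (len(window) == 4
                and window[0].isidentifier()
                and window[1] == ":="
                and window[2].isdigit()
                and window[3] == ";"):
            statements.append((window[0], int(window[2])))
            i += 4
            continue
        errors.append(f"syntax error near token {i} ({tokens[i]!r})")
        # Gobble until the stopper, then step past it.
        while i < len(tokens) and tokens[i] != ";":
            i += 1
        i += 1
    return statements, errors


good, bad = parse_statements("a := 1 ; b := := 2 ; c := 3 ;".split())
print(good)
print(bad)
```

In this run the malformed middle statement is reported once, and the
statements on either side of it are still accepted, because the ';'
after the bad statement marks the point at which it is safe to resume.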