[comp.lang.functional] concrete syntax

acha@CS.CMU.EDU (Anurag Acharya) (01/23/91)

It seems that in focussing on abstract syntax, folks designing lazy functional
programming languages like Miranda and Haskell have pretty much ignored 
concrete syntax. I have heard the comment - "who cares about concrete syntax?
parsing is a problem of the 60s and 70s." Personally, I think that this is
a rather callous approach since it pays little or no attention to the 
usability of the language. Attitudes like this result in anachronisms like
the "off-side rule"  and white-space significance hanging around long
after the rest of the world has gone on.

A great deal of attention has been paid to ensuring that the semantics of these 
languages is rigorously specified and "clean". I would expect that a fraction
of this rigor and cleanliness would carry over to the specification of syntax.
....


anurag

jgk@osc.COM (Joe Keane) (01/24/91)

I think the people arguing for significant use of whitespace have good
intentions, but they're a little misguided.  Basically i think they're solving
a problem that doesn't exist.

In C the syntax of statements and blocks is so simple, i can state it in less
than a line: a statement ends with `;', and `{' and `}' delimit blocks.  I've
been using C for a long time and i have a number of gripes about the language,
but this is not one of them.  The syntax is simple and easy to use.

Because of the simple syntax, editing C code is easy.  You can cut and paste
arbitrary blocks, without having to adjust the indentation if you don't feel
like it.  A neat property is that you can hit M-q (fill-paragraph) in Emacs,
and the code may become very hard to read but it still works fine.

In English the rule is simple: sentences end in `.' and questions end in `?'.
You could propose a scheme where periods are optional at the end of a line,
and add some way to indicate that a sentence continues to the next line.  But
if someone did this, people would laugh at him.  Why fix what isn't broken?

Also there is the issue of readability.  Some people claim that the form
without a semicolon `looks better'.  This is clearly a matter of taste, and it
probably depends on whether you're used to reading pseudo-code or C code.  I
actually like the other form better.  If you use a normal indenting style,
it's true that the semicolon is redundant, but a little redundancy can make it
easier to read.

In FLs we have a number of schemes which make use of whitespace.  But what do
you gain from this?  What it comes down to is that you can avoid typing a
semicolon or braces in some cases.  But these schemes are more complicated
than those they replace.  Some languages say that you can use a semicolon to
separate statements, but that it's optional in certain cases depending on
surrounding whitespace.  By the time you have rules like this, the elegance
has been lost, and you have to wonder if it wouldn't be easier to just always
put the semicolon.

S.Clayman@cs.ucl.ac.uk (01/26/91)

I have just read about 20 messages concerning off-side rules,
white space having meaning etc...

Firstly I would like to know how many of the people doing
the criticism have actually USED a language with off-side rules.

Secondly, languages aren't necessarily designed so that the compiler
writer's task is made easier.  Programming language design should
consider the users of the language; ease of expressing abstract ideas
should be more important than how many minutes Eric Compiler-Writer
saved when doing the parser.

The most important thing I want to say is I have NEVER had a problem
with off-side rules and white space introducing bugs, but I have
introduced bugs in C programs by having single statements and then
adding another statement at the same indentation thinking they are
the same block of code. Having both lines indented to the same place
has caused the confusion.

I have just written a 2000 line Miranda program and off-side and white
space aided me in the expression of my ideas, helped avoid silly
errors, and made writing bug free code easier.

Indentation is used for local definitions; if the compiler complained
about things being off-side I went straight to them, and easily saw
what the problems were.

Also, I have been teaching students Miranda; they easily grasp the
concept of left hand-sides and right hand-sides of expressions
with local definitions to the right of the =.
These are 1st years, some of whom have never seen a computer before.
They can write working programs within a day, and have done complex
projects after 1 term (e.g. text formatters, symbolic differentiators,
stock control systems)
They are now learning C, whose layout and syntax are not so easily
learnt, and which has a large syntactic overhead for its expressiveness.
I can't imagine any of them writing similar projects after 1 term of C.

I would like to add that the Haskell approach of a formal translation
to a form with no off-side is a good idea, someone somewhere
is to be commended for that.

stuart

dww@informatik.uni-kiel.dbp.de (Debbie Weber-Wulff) (01/28/91)

Just my 2pf worth in the offside-rule/concrete-syntax discussion:

Since concrete-syntax and parsing are "solved problems" :-)
I am attempting to formally prove properties about a scanner and
parser for a subset of OCCAM, a language using the offside rule.

You would not believe the amount of effort needed to prove that the
simple algorithm of counting spaces works properly! The problem is
always in "going back", i.e. reducing the indentation level. One must
prove that the algorithm doesn't go back past 0, etc. This property
makes the algorithm "non-compositional" : you can take 2 properly
indented pieces of code that are syntactically legal, and after
concatenating them get an indented piece of code that is not legal.
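
A minimal sketch of the space-counting idea, written in Python purely for
illustration. It is not occam's actual rule (occam fixes the indentation
step), and the function name `indent_tokens` and the token names INDENT,
DEDENT and LINE are assumptions made here, not anything from a real scanner.
It shows the "going back" check and one way two fragments that are each
accepted can be rejected once concatenated.

```python
def indent_tokens(lines):
    """Turn leading-space counts into INDENT/DEDENT/LINE tokens.

    Hypothetical sketch: the first non-blank line fixes the base level,
    and a dedent must land on a level that is still open.
    """
    stack = []      # indentation widths of the currently open levels
    tokens = []
    for text in lines:
        if not text.strip():
            continue                        # blank lines carry no level
        width = len(text) - len(text.lstrip(" "))
        if not stack:
            stack.append(width)             # base level of this fragment
        elif width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        else:
            # "Going back": pop levels, but never past the base level.
            while stack and stack[-1] > width:
                stack.pop()
                tokens.append("DEDENT")
            if not stack or stack[-1] != width:
                raise SyntaxError(f"bad indentation: {text!r}")
        tokens.append(("LINE", text.strip()))
    # Close whatever is still open above the base level.
    tokens.extend("DEDENT" for _ in stack[1:])
    return tokens


piece1 = ["a", "    b"]     # accepted on its own
piece2 = ["  c", "  d"]     # accepted on its own (base level 2)
# indent_tokens(piece1 + piece2) raises SyntaxError: the line "  c"
# goes back to width 2, which matches no open level (0 or 4).
```

Under these assumptions the failure comes entirely from the second fragment's
base level, which is the kind of context a purely local argument about either
fragment cannot see.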

The continuation algorithm for occam is unpleasant : one may only
use a new line after a binary operator (because we know that something
else must be coming...) or after certain keywords, and the indentation
can be *more* but not less than the previous line. 

Comments have to be indented at least as much as the *following* line,
and what do you do with blank lines and tab characters and on and on.

For the folks that feel this is context-free for a fixed k : have you
ever written out the LR(80) tables for such a grammar? Theoretically,
one can write a nice van Wijngaarden grammar (aka two-level grammar) that
has an infinite number of members of the token class "level parenthesis",
but unfortunately I have found no efficient way of transforming such
a grammar into code except by LL methods. 

As others have said: why on earth would one introduce so much muck
just to save a keystroke? Belonging to the Ann-Landers-School of
if-you-turned-it-on-turn-it-off-if-you-opened-it-shut-it, I like
the clarity of proper begins and ends. But then I am just a Lisp
hacker which explains a lot.

Debbie Weber-Wulff
FU Berlin
weberwu@fubinf.uucp

dll@ut-emx.uucp (Don Loflin) (01/29/91)

In article <4167@osc.COM> jgk@osc.COM (Joe Keane) writes:
[In 'C' notation]:
--------
{ { I think the people arguing for significant use of whitespace
have good intentions, but they're a little misguided;  Basically
i think they're solving a problem that doesn't exist; } { In C the
syntax of statements and blocks is so simple, i can state it in
less than a line: a statement ends with `;', and `{' and `}' delimit
blocks;  I've been using C for a long time and i have a number of
gripes about the language, but this is not one of them;  The syntax
is simple and easy to use;} { Because of the simple syntax, editing
C code is easy;  You can cut and paste arbitrary blocks, without
having to adjust the indentation if you don't feel like it;  A neat
property is that you can hit M-q (fill-paragraph) in Emacs, and
the code may become very hard to read but it still works fine;} {
In English the rule is simple:  sentences end in `.' and questions
end in `?'; You could propose a scheme where periods are optional
at the end of a line, and add some way to indicate that a sentence
continues to the next line; But if someone did this, people would
laugh at him;  Why fix what isn't broken?} { Also there is the
issue of readability;  Some people claim that the form without a
semicolon `looks better'; This is clearly a matter of taste, and
it probably depends on whether you're used to reading pseudo-code
or C code;  I actually like the other form better;  If you use a
normal indenting style, it's true that the semicolon is redundant,
but a little redundancy can make it easier to read;} { In FLs we
have a number of schemes which make use of whitespace;  But what
do you gain from this; What it comes down to is that you can avoid
typing a semicolon or braces in some cases;  But these schemes are
more complicated than those they replace;  Some languages say that
you can use a semicolon to separate statements, but that it's
optional in certain cases depending on surrounding whitespace;  By
the time you have rules like this, the elegance has been lost, and
you have to wonder if it wouldn't be easier to just always put the
semicolon;}}
--------


You really think that's more readable?  Well, to each his own...:-)
OK, sure, I took out all the indentation, but then, I could do the
same in C, right?  And the fact is, because you CAN leave it out,
many DO.  Take a look at any of Lee Adams'(TAB) books, the C versions
-- that code is unreadable.  He might as well have left it in BASIC.

Yes, English uses periods, and if C used them instead of semicolons, I'd
be overjoyed.  And look! English uses whitespace (!) to delimit blocks
(paragraphs) instead of ugly {}'s.  And can you imagine if sub-paragraphs
were used in English (not just Legalese) to the extent they are in
C or Lisp, and we had to use {}'s or ()'s to delimit them all?
Yuck.  I'd go learn something easier, like Mandarin(!).

Basically, explicit statement delimitation is probably a good idea, but
block delimitation should use whitespace.  In off-side rule languages,
you do run into the problem of indenting yourself off the page.  You
could avoid that, but still reap the benefits of the off-side rule by
using it only for block delimitation.  I also suggest a modification
to the rule such that the amount of white space in the first line of a
block determines the amount that delimits.

I.E: the C code:

if (test) {
  statement1;
  while (test) {
    statement2; statement3;
    statement4;
  }
  if (!test) {
    statement5;
  } else {
    statement6;
  }
}

becomes: 

if test
  statement1.
  while test
      statement2. statement3.
      statement4. 
  if !test
    statement5.
  else
    statement6.
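
A rough sketch of how the whitespace form could be turned back into explicit
blocks, in Python for illustration only. The function name `add_braces` is
made up here, and the sketch assumes spaces rather than tabs and does no
validation of odd indentation. The first line of each block sets that block's
width, as suggested above.

```python
def add_braces(lines):
    """Rewrite an indentation-delimited listing with explicit { } blocks."""
    rows = [(len(s) - len(s.lstrip(" ")), s.strip())
            for s in lines if s.strip()]
    out = []
    open_blocks = []    # (body_width, header_width) for each open block
    for i, (width, text) in enumerate(rows):
        # Close every block whose body is indented deeper than this line.
        while open_blocks and width < open_blocks[-1][0]:
            _, header_width = open_blocks.pop()
            out.append(" " * header_width + "}")
        # A line followed by a deeper line is a block header; the deeper
        # line's own indentation becomes that block's width.
        if i + 1 < len(rows) and rows[i + 1][0] > width:
            open_blocks.append((rows[i + 1][0], width))
            out.append(" " * width + text + " {")
        else:
            out.append(" " * width + text)
    # End of input closes everything that is still open.
    while open_blocks:
        _, header_width = open_blocks.pop()
        out.append(" " * header_width + "}")
    return out


source = [
    "if test",
    "  statement1.",
    "  while test",
    "      statement2. statement3.",
    "      statement4.",
    "  if !test",
    "    statement5.",
    "  else",
    "    statement6.",
]
print("\n".join(add_braces(source)))
```

The `while` body is indented by four and the others by two, and both come out
as properly closed blocks, since only the first line of each block is
consulted for its width.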



---Don Loflin, dll@emx.utexas.edu

kinnersley@kuhub.cc.ukans.edu (Bill Kinnersley) (01/29/91)

For all of those who've been saying that programming languages such as
Miranda and occam, where leading whitespace is used in place of begin..end
are difficult to implement, not context free, beyond the capability of
lex and yacc, a crime against nature, etc...

You might want to ftp to watserv1.uwaterloo.edu, and pick up the occam
compiler there which uses lex and yacc, and see for yourself how really
*trivial* it is to implement this feature!

--
--Bill Kinnersley

thorinn@diku.dk (Lars Henrik Mathiesen) (01/30/91)

dll@ut-emx.uucp (Don Loflin) writes:
>Basically, explicit statement delimitation is probably a good idea, but
>block delimitation should use whitespace.  In off-side rule languages,
>you do run into the problem of indenting yourself off the page.  You
>could avoid that, but still reap the benefits of the off-side rule by
>using it only for block delimitation.  I also suggest a modification
>to the rule such that the amount of white space in the first line of a
>block determines the amount that delimits.

By definition (Landin's) a syntactic entity governed by the off-side
rule extends up to a line less indented than its _first_token_. It
gets a little tiresome to see repeated suggestions that it should be
modified to do what it has always done. (There is a widespread _style_
in Miranda that starts off-side blocks without going to a new line.
It looks nice in typeset examples....)
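
The rule as stated in the previous paragraph can be sketched mechanically.
The Python below is only an illustration: `offside_extent` is a made-up name,
it assumes spaces rather than tabs, and it gives the furthest an entity could
reach under the rule, not where a real parser would actually end it.

```python
def offside_extent(lines, start_line, start_col):
    """Return the index one past the last line that may still belong to
    the entity whose first token is at lines[start_line][start_col]."""
    end = start_line + 1
    for i in range(start_line + 1, len(lines)):
        text = lines[i]
        if not text.strip():
            continue                # blank lines neither end nor extend it
        indent = len(text) - len(text.lstrip(" "))
        if indent < start_col:
            break                   # off-side: the entity ends before here
        end = i + 1
    return end


lines = [
    "f x = a + b",
    "      + c",
    "g y = d",
]
# The right-hand side starts at "a" (line 0, column 6); the second line
# is indented to column 6 and so still belongs to it.
print(offside_extent(lines, 0, 6))   # prints 2
```

Note that the column of the first token, not the indentation of the line it
sits on, is what the later lines are compared against.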

Further, both Miranda and Haskell use the off-side rule for a very
limited number of syntactic constructs. Haskell comes quite close to
your block-only idea, but allows implicit ``statement'' delimiters by
returning to the indentation of the first line in the ``block''.
Miranda wants to use the off-side rule to separate the uses of "=" as
definition and equality, so it's a little weird. (Guess where the
three off-side blocks are in
	foo x y = 0, if e
	        = 1, otherwise
	          where e = x = y
.)

Lastly, occam is not a functional language, and it doesn't use
Landin's off-side rule. Criticism should go to comp.sys.transputer, if
anywhere.

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcsun!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn@diku.dk