[comp.arch] Algol was an advance, was He's not the only one at it again!

johnl@esegue.segue.boston.ma.us (John R. Levine) (07/31/90)

In article <58372@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> One of the most important aspects of ALGOL is the grammar on which
>> is was based. Context free grammars have since been almost universal
>> for high level languages, ...

>Context free means something different than you are using it for here.
>What you are talking about (aparently) is free-form syntax (which is
>a mixed bag - at least not all of the aspects of it as ALGOL defined
>them are a good idea).  Context free has to do with the formal specification
>of the syntax - Fortran is context free (in fact, it's LR(k) - it _would_
>be LR(1) if blanks had been significant).

Algol 60 was the first language to be defined with a formal syntax, specified
in what has come to be called BNF.  The syntax of Fortran was given informally
in the manual.  Parsing Fortran is quite context dependent (I know, because
I've written some real Fortran parsers.)  Since you have to ignore blanks, at
least until F90 becomes widely accepted, the token boundaries depend entirely
on the context, e.g. 10E5 is sometimes a floating point number, and sometimes
not, as in these:

	DO 10E5 = 10E5
	DO 10E5 = 1, 10

The first is an assignment to DO10E5, and the second is a DO loop using E5 as
a variable name.  If your Fortran allows long variable names, this is usually
the first line of a function:

	REAL FUNCTION X(N)

unless it is preceded by a PARAMETER statement that defines N, in which case
it's a declaration of the array FUNCTIONX.  Even if you made spaces
significant and token boundaries were obvious, Fortran appears to me not to be
LR(k) for any finite k since you can write this sort of thing:

	FORMAT(I1,I2,I3,I4,I5) = 3

and until you see the = sign, you don't know whether it's a statement function
or a format statement.  (A limit on K is provided by the maximum length of a
statement, but that's a pretty cheesy out.)

My goal here is not to dump on Fortran, its syntactic peculiarities are well
known and entirely tractable, albeit not by conventional tools like lex and
yacc without a tremendous amount of kludgery.  The point is that formally
specifying Algol was a big step forward, both since formal specification
languages are the cornerstone of every modern compiler writing system, and
because it is my observation that context free languages make it a lot harder
accidentally to write a program in which small typos make it mean something
entirely different from what it looks like it means.

(Of course, just because Algol has a context-free easy-to-tokenize syntax
doesn't mean that actual implementations all do.  There was IBM's Algol F,
which ignored blanks and required you to put quotes around your keywords,
proving once again that Real programmers can write Fortran programs in any
language.

-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
Marlon Brando and Doris Day were born on the same day.

tom@stl.stc.co.uk (Tom Thomson) (07/31/90)

In article <1990Jul30.174035.26412@esegue.segue.boston.ma.us> johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
>(Of course, just because Algol has a context-free easy-to-tokenize syntax
>doesn't mean that actual implementations all do.  There was IBM's Algol F,
>which ignored blanks and required you to put quotes around your keywords,
>proving once again that Real programmers can write Fortran programs in any
>language.

What's been forgotten here is that ALL the early implementations of Algol
had this problem, it's NOT an IBM Algol F problem.
 
The Algol report used BOLD TYPE to distinguish keyword tokens from identifiers,
so asking ffor quotes around keywords (or "@" symbols, or "_" symbols, or "%"
symbols, as in some versions) was just asking the programmer to indicate
which bits were supposed to be in bold type.
 
Also, the only reason you can claim the syntax is context free is to choose to
assert that programs that use undeclared identifiers are syntactically correct.
Probably the best way of handling the syntax is to put hooks into the parser
that adds identifiers to the appropriate (real, integer, boolean) class as they
are declared and not recognise arbitrary identifier-like strings as identifiers
as valid identifiers unless they have been declared. This results in a parser
that behaves nothing like a context free one.
 
Tom Thomson   [tom @ nw.stl.stc.co.uk

boebert@sctc.com (Earl Boebert) (08/01/90)

tom@stl.stc.co.uk (Tom Thomson) writes:

>In article <1990Jul30.174035.26412@esegue.segue.boston.ma.us> johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
>>(Of course, just because Algol has a context-free easy-to-tokenize syntax
>>doesn't mean that actual implementations all do.  There was IBM's Algol F,
>>which ignored blanks and required you to put quotes around your keywords,
>>proving once again that Real programmers can write Fortran programs in any
>>language.

>What's been forgotten here is that ALL the early implementations of Algol
>had this problem, it's NOT an IBM Algol F problem.
>
 
[Stuff deleted ...]

>Tom Thomson   [tom @ nw.stl.stc.co.uk

I don't recall GEIR Algol as having special delimeters for reserved words;
will check my old class notes from Naur's course to be sure and report
later ...

Earl

boebert@sctc.com (Earl Boebert) (08/01/90)

boebert@sctc.com (Earl Boebert) writes:

>tom@stl.stc.co.uk (Tom Thomson) writes:

>>In article <1990Jul30.174035.26412@esegue.segue.boston.ma.us> johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
>>>(Of course, just because Algol has a context-free easy-to-tokenize syntax
>>>doesn't mean that actual implementations all do.  There was IBM's Algol F,
>>>which ignored blanks and required you to put quotes around your keywords,
>>>proving once again that Real programmers can write Fortran programs in any
>>>language.

>>What's been forgotten here is that ALL the early implementations of Algol
>>had this problem, it's NOT an IBM Algol F problem.
>>
> 
>[Stuff deleted ...]

>>Tom Thomson   [tom @ nw.stl.stc.co.uk

>I don't recall GEIR Algol as having special delimeters for reserved words;
>will check my old class notes from Naur's course to be sure and report
>later ...

>Earl

Well I misremembered, including the spelling of the machine's name ...
GIER Algol *underlined* reserved words, an elegance made possible
because I/O on the machine was through a Flexowriter. This also gave
them upper and lower case; we 026-limited Yankees were pretty envious
of that!  For the historically minded, the project was started in
January of 1962 and a running compiler was available in February of
1963. GIER was a Danish machine: 1024 42-bit words of core w/8.8
microseconds access and 12,800 words on drum with 20 milliseconds to
transfer a 40-word track; rather a speedly little devil for its time.
The compiler was, to say the least, a beautiful piece of work, and all
us budding compiler-writers went out of our way to sit at Naur's feet
and hear how he did it.

kc@oz.rci.dk (Knud Christensen) (08/02/90)

boebert@sctc.com (Earl Boebert) writes:

>tom@stl.stc.co.uk (Tom Thomson) writes:

>>In article <1990Jul30.174035.26412@esegue.segue.boston.ma.us> johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
>>>(Of course, just because Algol has a context-free easy-to-tokenize syntax
>>>doesn't mean that actual implementations all do.  There was IBM's Algol F,
>>>which ignored blanks and required you to put quotes around your keywords,
>>>proving once again that Real programmers can write Fortran programs in any
>>>language.

>>What's been forgotten here is that ALL the early implementations of Algol
>>had this problem, it's NOT an IBM Algol F problem.
>>
> 
>[Stuff deleted ...]

>>Tom Thomson   [tom @ nw.stl.stc.co.uk

>I don't recall GEIR Algol as having special delimeters for reserved words;
>will check my old class notes from Naur's course to be sure and report
>later ...

>Earl


GIER Algol used underscored keywords to distinguish them from identifiers.
This was possible because the GIER computer used the flexowriter character
set (the underscore character was non spacing).


Knud Christensen
kc@oz.rci.dk

Inside every large program is a small program struggling to get out. - Hoare -