[net.lang] Lexical analyzers and parsers

steven@mcvax.UUCP (Steven Pemberton) (04/26/84)

> Curiously [in FORTRAN], it was decided to ignore blanks,
> EVEN within identifiers.

I don't find this at all curious, but in fact eminently reasonable.
Many, maybe most, identifiers in the real world contain spaces,
so why not include it in programming languages?
For instance my identifier (Steven Pemberton) contains a space.
I don't find StevenPemberton or Steven_Pemberton or whatever
any more readable.

There's much discussion about how best to represent spaces in identifiers in
programming languages, but I frankly find the FORTRAN/Algol way of just using
spaces the best.

Steven Pemberton, CWI, Amsterdam; steven@mcvax.

liberte@uiucdcs.UUCP (04/29/84)

uiucdcs!liberte    Apr 29 15:48:00 1984

I have a feeling that parsing will be used less often as smarter editors
are developed that avoid parsing altogether by having the user enter the
structure of the program directly.  Thus arbitrary identifiers may be used.

Parsers will still be used to get programs in and out of the editor, but
they may take longer cracking more ambiguous syntax.  Oh yes, the 
editor would output intermediate code to the language compiler/interpreter.

Daniel LaLiberte,  U of Illinois, Urbana-Champaign, Computer Science
{moderation in all things - including moderation}

rcd@opus.UUCP (Dick Dunn) (05/01/84)

>> Curiously [in FORTRAN], it was decided to ignore blanks,
>> EVEN within identifiers.
>
>I don't find this at all curious, but in fact eminently reasonable.
>Many, maybe most, identifiers in the real world contain spaces,
>so why not include it in programming languages?
>For instance my identifier (Steven Pemberton) contains a space.

Come on!  You can figure out that your name is two identifiers!
Doyoureallythinkthatspacesdon'tmakethatmuchdifference?
Con side rit careful lying eneralbe fore jump ingt other espons etha
tyoudon't.
-- 
...Relax...don't worry...have a homebrew.		Dick Dunn
{hao,ucbvax,allegra}!nbires!rcd				(303) 444-5710 x3086

ntt@dciem.UUCP (Mark Brader) (05/03/84)

>>> Curiously [in FORTRAN], it was decided to ignore blanks,
>>> EVEN within identifiers.
>>
>>I don't find this at all curious, but in fact eminently reasonable.
>>Many, maybe most, identifiers in the real world contain spaces,
>>so why not include it in programming languages?
>>For instance my identifier (Steven Pemberton) contains a space.
>
>Come on!  You can figure out that your name is two identifiers!
>Doyoureallythinkthatspacesdon'tmakethatmuchdifference?
>Con side rit careful lying eneralbe fore jump ingt other espons etha
>tyoudon't.

Even if you don't admit that "Mark Brader" is one identifier (which is
how I, too, interpret it--"Mark" is a potentially ambiguous abbreviation),
what about a surname such as "de Broglie"?  But to get back to programming
languages, the point is that FORTRAN *ignores* the spaces, obliterating
any distinction between THE RAPIST and THERAPIST.  (On the other hand, on
formatted numeric input, it treats spaces as zeros...)
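
Both behaviours are easy to see in a few lines. What follows is only a rough sketch in Python, not any real compiler's scanner: the function names are made up for illustration, character constants are ignored, and the blank-as-zero rule is assumed to apply to a plain fixed-width integer field.

```python
def squeeze_blanks(statement: str) -> str:
    """Drop every blank, the way a fixed-form scanner does outside character constants."""
    return statement.replace(" ", "")


def read_blank_as_zero(field: str) -> int:
    """Read a fixed-width integer field, counting each blank as a zero digit."""
    return int(field.replace(" ", "0"))


# The two spellings collapse to the same identifier.
print(squeeze_blanks("THE RAPIST") == squeeze_blanks("THERAPIST"))  # True

# A four-column field holding "1 2 " is read as 1020, not 12.
print(read_blank_as_zero("1 2 "))  # 1020
```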

Mark Brader

grass@uiuccsb.UUCP (05/05/84)

uiuccsb!grass    May  4 20:18:00 1984

I can't resist... Some languages get along fine with no blanks.
Japanese doesn't use them, for instance.  I think it's what you're
used to.
	- Judy

dont@tekig1.UUCP (05/05/84)

     I think the question to answer is 'what were the driving forces behind the
decisions made at the time?'  This whole field grew out of two different camps,
the math professors and the electrical engineering professors, and a lot of
schools STILL reflect that fact, but that is neither here nor there.  I would
think that the products of the time would reflect this.  I have read comments
that Von Neumann and colleagues of the math department were so good at mapping
problems into a fixed point notation space that they considered floating point
support a waste of precious resources.  The sorts of things I have wondered are
   assuming mathematicians' familiarity with functions and arguments, where are
      the early examples of 'functional syntax' languages?
   I assume, considering we are looking at early '50s work, the power of the
      idea of a stack had not been discovered, and that is why we do not see
      early stack oriented languages out of the math depts, true?
   With the fixed format decisions that went into FORTRAN, seemingly making
      an easier job, what drove the design of the language?  We have to 
      recognize that at the time, there was absolutely no theory to support
      what was being done, I seem to remember somewhere that the very first
      FORTRAN, a fairly simple language by undergrad compiler class standards,
      took 18 man years to complete.

Don Taylor
tektronix!tekig1!dont

richard@sequent.UUCP (05/06/84)

>>> Curiously [in FORTRAN], it was decided to ignore blanks,
>>> EVEN within identifiers.

When I was tutoring CSc in college, I helped an engineering student
whose instructor had the curious notion that any and all spaces were
*illegal* in fortran.  This was in '81.  The guy hadn't programmed in
so long, maybe once he was right.  But I pity his students: his sample
programs were about 1/5 to 1/4 "goto"s.  What astonished me even more
were the flow-charts he handed out as "coding problems."  They very
clearly illustrated his lack of mental structure.  Imagine a flow-chart
with 1/5 to 1/4 "branches."  (Not a complicated program either). This
is one of the possible results when instructors aren't "encouraged"
to stay on top of their field.

___________________________________________________________________________
The preceding should not be construed as the statement or opinion of the
employers or associates of the author.   It might not even be the author's.

I try to make a point of protecting the innocent,
	but none of them can be found...
							...!sequent!richard

bwm@ccieng2.UUCP (05/07/84)

Fortran, simple???

Any language that you have to parse most of a line and then back up to find the
token (like fortran) has severe brain damage. Consider:

IF(I.EQ.2)22,23,22

It isn't until we hit the first comma that fortran realizes it is not looking
at an identifier! It then realizes that an identifier cannot be followed by
a comma, and backs up looking for a reserved word!!!!!!

GAG!


-- 
...[cbrma, rlgvax, ritcv]!ccieng5!ccieng2!bwm

guy@rlgvax.UUCP (05/07/84)

While we're on the subject of Fortran and lexical analysis, here's the worst
consequence of the "blanks are ignored" rule I've seen yet, which appeared
recently in net.unix-wizards:

> Newsgroups: net.unix-wizards,net.bugs.usg,net.lang.f77
> Subject: weird f77 bug; anyone seen this one?

> We are running USG 5.0 on a VAX 11/780.  Here is part of a subroutine
> from an f77 program that compiles properly:

> 	subroutine scale (buff2,f2,ctscl2,type2)

> 	character buff2*132,cont2*2,type2*4
> 	integer a2,d,e2,f2,ctscl2

> 	cont2 = ' 	'
> 	a2 = index (buff2,'scale')
> 	if (a2 .ne. 0) then
> 	  do 100 d = a2+5,len(buff2)
> 	    if (f2 .eq. 1) goto 200
> 	     if ((buff2(d,d) .eq. ' ') .or.(buff2(d,d) .eq. '	'))
>      &		go to 100
> 	     e2=d
> 	     f2=1
> 	     go to 100
> 200	    if ((buff2(d,d) .ne. ' ') .or. (buff2(d,d) .ne. '	'))
>      &		go to 100
> c	    i = strtok (ctscl2,buff2,cont2,'i')
> 100	  continue
> 	endif


> If, however, you change the name of the variable 'd' to 'd2' throughout,
> the compiler flags a syntax error on the 'do 100 ...' statement.

> Any guesses?

> Newsgroups: net.unix-wizards,net.bugs.usg,net.lang.f77
> Subject: weird f77 bug solved....

> The reason for this is the f77 extension which allows a double
> precision exponent, which is a multiplier base 10.  In other words,
> '1d2' equates to '100' (1 * 10^2).  f77, on the statement above,
> was squeezing out the space between the '100' and the 'd2' and
> parsing it as a double precision constant '100d2'; thus the
> syntax error.

How'd ya like that one, folks?  A program that, by all rights, *should*
be perfectly legal is made illegal by an unfortunate (but otherwise legal!)
choice of variable name.  ("e2" would have been just as bad, for the same
reason.)
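
The effect can be reproduced outside f77. The sketch below is Python and is only a guess at the mechanism described in the quoted explanation: blanks are squeezed out first, then a greedy scanner grabs the longest numeric constant it can. The NUMBER pattern and the function name are assumptions made for illustration, not f77's actual code.

```python
import re

# Assumed token shape: digits followed by an optional D or E exponent.
NUMBER = re.compile(r"\d+(?:[de][+-]?\d+)?", re.IGNORECASE)


def first_token_after_do(statement: str) -> str:
    """Squeeze out blanks and tabs, then greedily scan the token that follows DO."""
    squeezed = statement.replace(" ", "").replace("\t", "").lower()
    if not squeezed.startswith("do"):
        raise ValueError("not a DO statement")
    match = NUMBER.match(squeezed, 2)  # start scanning right after "do"
    return match.group(0) if match else ""


# With 'd' the scanner stops at the label, as intended.
print(first_token_after_do("do 100 d = a2+5,len(buff2)"))   # 100

# With 'd2' the label swallows the variable name as an exponent.
print(first_token_after_do("do 100 d2 = a2+5,len(buff2)"))  # 100d2
```

A scanner that knew it was expecting a statement label at that point, and so accepted digits only, would not trip over this.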

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

jlg@lanl-a.UUCP (05/10/84)

The statement 

      IF ( I.EQ.J ) 50,60,70

normally parses to <ident> <lparen> <ident> <point> ...
which is illegal in Fortran, so the IF is identified when the point is
found; it is NOT delayed until the first comma is shifted.  An alternate
way of parsing Fortran is to assume that every statement begins with a keyword
(all but the assignment statement do).  If a keyword is not found, then assume 
it's an identifier.  This means you only have to backtrack when an identifier 
has a prefix that exactly matches a keyword.

For example:

      IF(X.EQ.Y)GOTO 50
      ABC=3.
      DONT(55)=1.

Backtrack is only required for the third line since it is first parsed as
<do> <ident> ...
which causes an error.  Backtrack can then find the correct parse:
<ident> <lparen> <intconst> <rparen> <equal> <floatconst>

This method of parsing is as fast as parsing a language with reserved keywords
and significant blanks except for those statements which cause the backtrack.
Since most identifiers don't begin with a keyword (many keywords are too long
to be mistaken for identifiers anyway), the speed penalty is only rarely paid.

Don't think from this that I'm in favor of ignoring blanks, I'm not.  There
are ambiguities caused by ignoring blanks that can't be resolved by any
parsing scheme.  Besides, the complexity of coding a parser that can backtrack
is unnecessary for languages with significant blanks.  
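
For the curious, the keyword-first scheme can be sketched in a few lines of Python. This is a toy, not a Fortran parser: the keyword table is a small assumed subset, the statement shapes are crude regular expressions invented for the illustration, and classify() is a hypothetical name. It only shows when the backtrack is paid for.

```python
import re

# Crude statement shapes for a tiny keyword subset (assumed, not a full grammar).
KEYWORD_FORMS = {
    "IF": re.compile(r"IF\(.+\)(?!=).+"),
    "DO": re.compile(r"DO\d+[A-Z][A-Z0-9]*=[^,]+,.+"),
    "GOTO": re.compile(r"GOTO\d+"),
}
ASSIGNMENT = re.compile(r"[A-Z][A-Z0-9]*(\(.+\))?=.+")


def classify(statement: str):
    """Return (statement kind, whether a keyword parse had to be abandoned)."""
    text = statement.replace(" ", "").upper()
    backtracked = False
    for keyword, form in KEYWORD_FORMS.items():
        if text.startswith(keyword):
            if form.fullmatch(text):
                return keyword, backtracked
            backtracked = True  # keyword parse failed; fall back to identifier
    if ASSIGNMENT.fullmatch(text):
        return "ASSIGNMENT", backtracked
    raise SyntaxError(statement)


print(classify("IF(X.EQ.Y)GOTO 50"))  # ('IF', False)
print(classify("ABC=3."))             # ('ASSIGNMENT', False)
print(classify("DONT(55)=1."))        # ('ASSIGNMENT', True)
```

Only the third statement takes the slow path, which matches the claim that the penalty is rarely paid.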

henry@utzoo.UUCP (Henry Spencer) (05/12/84)

Don Taylor asks, in part:

   With the fixed format decisions that went into FORTRAN, seemingly making
      an easier job, what drove the design of the language?  We have to 
      recognize that at the time, there was absolutely no theory to support
      what was being done, I seem to remember somewhere that the very first
      FORTRAN, a fairly simple language by undergrad compiler class standards,
      took 18 man years to complete.

According to Backus's paper in the symposium a few years ago on the
history of programming languages, Fortran was barely "designed" at all.
The idea was to make a programming language based on mathematical equations;
everything else was kind of an ad-hoc afterthought.  They just sort of made
it up as they went along.

He also explained, in detail, why the first Fortran compiler took so long.
Some of it was lack of previous work to draw on, but a lot of it was the
required quality of object code.  There was great fear that Fortran would
not be taken seriously unless the compiler produced *very* good code.  Most
"programming languages" of the time were interpretive, and everybody knew
that they were useless for serious number-crunching.  As a result, the
first real compiler also had to be the first super-optimizing compiler.
The amazing thing was that they succeeded!  According to Backus, the
production compiler could produce (correct) code whose derivation from
the source could be understood only with the assistance of three or four
members of the compiler team.  The very first Fortran compiler produced
better code than most of its successors.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

debray@sbcs.UUCP (05/15/84)

	> Any language that you have to parse most of a line and then back
	> up to find the token (like fortran) has severe brain damage.
	> Consider:
	>
	> IF(I.EQ.2)22,23,22
	>
	> It isn't until we hit the first comma that fortran realizes it is
	> not looking at an identifier! It then realizes that an identifier
	> cannot be followed by a comma, and backs up looking for a reserved
	> word!!!!!!

(1) I agree that FORTRAN isn't a dream language.  It's been years since I've
touched the language, but if memory serves me right, FORTRAN identifiers
can't have special symbols like '(' and '.' in them, so I'm not sure how
valid the above example is.  The DO statement, 

	DO 100 J = 1, 10

might be the example you're looking for.

(2) In general, backing up in execution is a pretty messy affair, and to my
knowledge, only one language, Prolog, does a decent job of it. The problem,
of course, is with undoing global changes, e.g. to the symbol table. In the
above example, though, it would seem that simply increasing the amount of
lookahead would take care of the problem.
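
To make the lookahead idea concrete, here is a rough Python sketch. It assumes the usual rule of thumb that a DO statement has a comma outside parentheses after the '=', while an assignment to a variable such as DO100J does not. The function name is hypothetical, and character and Hollerith constants are ignored.

```python
def is_do_statement(statement: str) -> bool:
    """Scan ahead for a top-level comma after '=' instead of parsing and backing up."""
    text = statement.replace(" ", "").upper()
    if not text.startswith("DO"):
        return False
    equals = text.find("=")
    if equals == -1:
        return False
    depth = 0
    for ch in text[equals + 1:]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return True  # comma outside parentheses: loop bounds
    return False


print(is_do_statement("DO 100 J = 1, 10"))  # True
print(is_do_statement("DO 100 J = 1.10"))   # False: assigns 1.10 to DO100J
```

Nothing is undone here because nothing is committed until the whole statement has been scanned, at the cost of reading each DO-like line twice.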
-- 
Saumya Debray, 	SUNY at Stony Brook

	uucp:
	    {cbosgd, decvax, ihnp4, mcvax, cmcl2}!philabs \
		    {amd70, akgua, decwrl, utzoo}!allegra  > !sbcs!debray
	       		{teklabs, hp-pcd, metheus}!ogcvax /
	CSNet: debray@suny-sbcs@CSNet-Relay

scw@cepu.UUCP (05/15/84)

I recall, from back in the dim days of learning, that the first FORTRAN
compiler was supposed to generate code at least 90% as fast as well-designed
Asm code.  That's impressive!!
-- 
Stephen C. Woods (VA Wadsworth Med Ctr./UCLA Dept. of Neurology)
uucp:	{ {ihnp4, uiucdcs}!bradley, hao, trwrb, sdcsvax!bmcg}!cepu!scw
ARPA: cepu!scw@ucla-locus       location: N 34 06'37" W 118 25'43"

ron@brl-vgr.ARPA (Ron Natalie <ron>) (05/15/84)

If you want a good one, how about

	INTEGER E7
	DO 10 E7 = 1, 10, 1

Doesn't work.  Line numbers can't be over 99999.

-Ron