[comp.lang.fortran] FORTRAN 8x parser/lexer .NE. compiler

rowan@mozart.uucp (Steve Rowan) (10/19/88)

Up to this point I have been satisfied sitting on the sidelines listening
to the debate on 8x.  I cannot sit back and stay quiet when I find comments
in the WG5 reports like the following:

>An 8x lexer and parser was demonstrated for the full Fortran
>8x language.  The demonstration showed that the
>implementation of full Fortran 8x can be done effectively
>and in a timely manner.

I realize that person was just reporting what happened at the WG 5 meeting
but the statement is still wrong.

Anyone that expects the lexer and parser for any language, even FORTRAN
8x or Ada, to take more than 5% of the development time does not
understand the problem.  Anyone who has taken an undergraduate compiler
course should be able to create a lexer and parser for FORTRAN 8x in six
months.  I don't think I would hire someone right out of school if I
didn't think they could do it in three months.

In the early days of computing, when parsers were written by hand and
code generators were naive, parsers were half the job.  Those days ended
in the late 60's however.

Let me give some real world examples from our Ada complier.  A much
closer approximation to a FORTRAN 8x compiler than this example.
Granted, an Ada compiler is simpler than a FORTRAN 8x compiler but this
should provide some idea of the magnitude of the problem.

The front-end of the CONVEX Ada compiler, the portion that performs
lexing, parsing, and semantic analysis is 100,000 lines of C code.
Roughly the size of our ENTIRE vectorizing/parallelizing FORTRAN
compiler.  The Ada parser and lexer come to less than 5,000 lines of the
front-end total.  Now lets add the vectorized and code generator.  Let's
say another 80,000 lines of C again taking the CONVEX Ada compiler as an
example.  Before it's said that we don't need a vectorized let me point
out that one has to do exactly the same data dependency analysis to
determine when the extra temps introduced by FORTRAN 8x's copy semantics
can be removed.  One also has the fuse the implied 8x array notation
loops into larger loops so you don't pay the loop overhead so often, then
you have to vectorize the merged loops.  Remember we want a good FORTRAN
8x compiler for whatever the target machine may be.

Now lets add say 20,000 lines of code for the library managers necessary
for dependent compilation.  Now we have a compilation system that is
twice as big as the compiler Callahan, Dongarra and Levine found to be
the best vectorizing FORTRAN compiler in the world (See "Vectorizing
Compilers: A Test Suite and Results, Technical Memorandum No. 109,
Mathematics and Computer Science Division, Argonne National Laboratory).

To develop a full FORTRAN 8x compiler which will optimize and do
parallelization (without vectorization) will take many person-years to
complete.  I think one can see how the numbers add up.

One other Ada example and I will stop beating this dead horse.  Ada
lexers and parsers were available in 1979.  The first production quality
Ada compiler was not validated until 1985.  I choose the DEC compiler as
the first real Ada compiler.  The academic Ada-ED compiler just does not
count in my book.

If I have missed something here I would be interested in hearing the
counter argument.  I have only been writing vectorizing compilers,
FORTRAN, C and, Ada for the past ten years.

I hope this did not come off as too much of a flame but as my public
review comment stated, I feel we are doing the FORTRAN community a
disservice by proposing a language with features the average FORTRAN user
does not want or need.  I also think WG5 is doing the community a
disservice by proposing the next ALGOL 68, an elegant language that no
one implements a compiler for, but that is their business.  I assume I
will get a change to voice my opinions about the ISO FORTRAN 88 draft
early next year.


Steve Rowan
rowan@convex.COM

johnl@ima.ima.isc.com (John R. Levine) (10/19/88)

In article <656@convex.UUCP> rowan@convex.COM (Steve Rowan) writes:
>Anyone that expects the lexer and parser for any language, even FORTRAN
>8x or Ada, to take more than 5% of the development time does not
>understand the problem.  Anyone who has taken an undergraduate compiler
>course should be able to create a lexer and parser for FORTRAN 8x in six
>months.  ...

Six months? I wrote a lexer and parser for F77 last week, for some program
analysis experiments I'm doing. Parsing Fortran is, if anything, easier than
parsing other languages because once you've found the token boudaries, the
syntax is very simple. Tokenizing is kludgy but by the application of some
well-known (to Fortran compiler writers, at least) techniques, it's no big
deal. F88 syntax is doubtless more complicated, but I doubt it's an order of
magnitude more complicated.

Based on some experience writing INfort, an early F77 compiler, I'd say that
the mandatory data analysis (common, equivalence, adjustable dimensioned
arrays) would take another week or two, a simple code generator a month or
two, optimized code another year or two, and correct optimized code a year
after that.

Don't expect to see any working F88 compilers before 1991, even if they
approve the standard tomorrow.
-- 
John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869
{ bbn | think | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
Rome fell, Babylon fell, Scarsdale will have its turn.  -G. B. Shaw

link@stew.ssl.berkeley.edu (Richard Link) (10/20/88)

In article <656@convex.UUCP> rowan@convex.COM (Steve Rowan) writes:
>
>Granted, an Ada compiler is simpler than a FORTRAN 8x compiler

Oh my God! Is this really true?

The biggest gripe against Ada is it's size; in the same vein that an
elephant is a mouse designed to mil specs.

Dr. Richard Link
Space Sciences Laboratory
University of California, Berkeley
link@ssl.berkeley.edu

rowan@mozart.uucp (Steve Rowan) (10/20/88)

In article <15739@agate.BERKELEY.EDU> link@stew.ssl.berkeley.edu (Richard Link) writes:
>>
>>Granted, an Ada compiler is simpler than a FORTRAN 8x compiler
>
>Oh my God! Is this really true?
>
>The biggest gripe against Ada is it's size; in the same vein that an
>elephant is a mouse designed to mil specs.

My assertion that an Ada compiler will be simpler that a FORTRAN 8x
compiler is based on the following observations.

1. FORTRAN 8x contains every feature of Ada except tasking.

2. EVERY legal FORTRAN 77 program is a legal FORTRAN 8x program.  That
   means one has to deal with insignificant blanks, interspersing DATA
   statements with executable code, no reserved words, and so forth.  No
   one has proven to me that all the modern language features can be
   parsed in program units with statements like

	 DO10I=1.5
	 IF(I)=5
	 DO10I=1,5
	 IF(I)5,5,6
   
3. Any production quality FORTRAN 8x compiler will have to accept all the
   standard extensions like REAL*4 and INCLUDE statements that X3J3 has
   chosen not to standardize.

4. FORTRAN 8x has features like the powerful array notation that do not
   exist in Ada.

I could go on but I am sure X3J3 will address many of these issues when
it responds to my public review comment and those of the 600+ other
concerned FORTRAN users and vendors.

Steve Rowan
convex.COM

david@june.cs.washington.edu (David Callahan) (10/21/88)

The following is a reply to Steve Rown (rowan@convex.COM) who used the
Convex Ada implementation to measure the difficulties in implementing
Fortran 8x.  I am neither an expert on Fortran 8X nor on Ada so please bare
with my mistakes and questions.

#Granted, an Ada compiler is simpler than a FORTRAN 8x compiler but this
#should provide some idea of the magnitude of the problem.

Would someone please motivate this statement? Ada requires some
type-inference, generic packages, and handling of multi-tasking and
execptions. What are the features (aside from array-valued expressions)in
Fortran 8X that require similar amounts if implementation effort?

#The front-end of the CONVEX Ada compiler, the portion that performs
#lexing, parsing, and semantic analysis is 100,000 lines of C code.
# ...  The Ada parser and lexer come to less than 5,000 lines of the
#front-end total.  

It does not follow that a Fortran 8X front end will have similar 
characteristics.

#Now lets add the vectorized and code generator.  Let's
#say another 80,000 lines of C again taking the CONVEX Ada compiler as an
#example.  
#Before it's said that we don't need a vectorized let me point
#out that one has to do exactly the same data dependency analysis to
#determine when the extra temps introduced by FORTRAN 8x's copy semantics
#can be removed. 

(I don't want to argue against using vectorization technology for
optimizing Fortran 8X, but the author implies that it is necessary
for essentially any efficient implementation).

Dependence analysis is only part of a vectorizer. In particular, to optimize
array operations, you don't have to recognize induction variables and loop
invariant expressions. You don't have to pattern-match for things like sum
reductions and min/max reductions. Hence, the 80k figure may be very high.
The necessity of removing the implied temporary is unclear since its
lifetime is short. Of coarse, when adding one to every element of a
gig-word array you don't want to copy it, but it seems reasonable that for a
compiler to handle the easy cases well and the let the programmer help
(eliminate copies by adding loops) when things get out of hand. Explicit
array syntax is intended to improve the expressiveness of the language not
allow programmers to be oblivous to the efficiency of processing huge
amounts of data.

#Now lets add say 20,000 lines of code for the library managers necessary
#for dependent compilation.  

What are the aspects of Fortran8x that require more library management than
"incude" files? The UNIX world learned how to handle "compilation
dependencies" outside of the compiler. What about Fortran 8x invalidates
those techniques?.

# ... the compiler Callahan, Dongarra and Levine found to be
#the best vectorizing FORTRAN compiler in the world (See "Vectorizing
#Compilers: A Test Suite and Results, Technical Memorandum No. 109,
#Mathematics and Computer Science Division, Argonne National Laboratory).

Not to disagree with the statement, but this is not a conclusion the
authors make in the cited tech report.

#To develop a full FORTRAN 8x compiler which will optimize and do
#parallelization (without vectorization) will take many person-years to
#complete.  I think one can see how the numbers add up.

Why is parallelization necessary? You work for a company that markets
a parallel computer and I work for one that intends to, but that is 
certainly not a requirement of most implementations of Fortran 8X.

#One other Ada example and I will stop beating this dead horse.  Ada
#lexers and parsers were available in 1979.  The first production quality
#Ada compiler was not validated until 1985.  I choose the DEC compiler as
#the first real Ada compiler.  The academic Ada-ED compiler just does not
#count in my book.

Ada was developed from scatch without existing experience with a subset and
includes a very complex runtime environment to support multitasking and
execption handling. Aside from recursion and dynamic allocation, does
Fortran 8x have any feature that requies similar support?

Won't any substantive change to the language (like user defined types,
recursion, bit types, and dynamic memory allocation) require several years
before compilers are available and reliable? Aside from the array syntax,
what features can be deleted that will significantly reduce the development
time?

#If I have missed something here I would be interested in hearing the
#counter argument.  I have only been writing vectorizing compilers,
#FORTRAN, C and, Ada for the past ten years.
#
#I hope this did not come off as too much of a flame but as my public
#review comment stated, I feel we are doing the FORTRAN community a
#disservice by proposing a language with features the average FORTRAN user
#does not want or need.  

A couple of  final questions in reply: is it necessary that the "average"
Fortran user (if such a beast exists) want or need a feature for that
feature to be added to the language? Shouldn't it be sufficient that some
substantial portion of the community would find it useful, particularly
given that Fortran 77 is a subset so that the only noticable change would be
in compile speed? 

#Steve Rowan
#rowan@convex.COM

David Callahan
Tera Computer Company
(david@june.cs.washington.edu)