richmon@astrovax.UUCP (Michael Richmond) (07/17/85)
Can anyone point to a company that supplies a UNIX Fortran compiler which
executes much faster than f77 (say, on par with the VMS compilers or better)?
Please reply via mail and I will post a summary to the net if one is
warranted.  We run 4.2BSD on an 11/750 if it makes any difference.
-- 
Michael Richmond
Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,noao,princeton,vax135}!astrovax!richmon
conor@Glacier.ARPA (Conor Rafferty) (07/21/85)
>Can anyone point to a company that supplies a UNIX Fortran compiler which
>executes much faster than f77 (say, on par with the VMS compilers or better)?

Actually there is limited room for improvement.  The 4.2BSD compiler is
considerably better than the original f77 in that respect.  Published work
(by Jack Dongarra at Argonne National Laboratory [dongarra@anl-mcs]) shows
about 25-30% slower runtimes for the 4.2BSD compiler over the VMS 4.1
compiler, for dense linear algebra.  I've also coded some sparse linear
algebra (essentially Yalepack) in assembly and found only a 30-35% speedup.
Let's give credit where credit is due!

Cheers, conor rafferty == decwrl!glacier!conor == conor@su-glacier.arpa
wls@astrovax.UUCP (William L. Sebok) (07/22/85)
In article <9871@Glacier.ARPA> conor@Glacier.UUCP (Conor Rafferty) writes:
>Actually there is limited room for improvement.  The 4.2BSD compiler is
>considerably better than the original f77 in that respect.  Published
>work (by Jack Dongarra at Argonne National Laboratory
>[dongarra@anl-mcs]) shows about 25-30% slower runtimes for the 4.2BSD
>compiler over the VMS 4.1 compiler, for dense linear algebra.  I've
>also coded some sparse linear algebra (essentially Yalepack) in
>assembly and found only 30-35% speed up.  Let's give credit where
>credit is due!

Somehow 25-35% sounds like a fair amount of room for improvement to me.
However, that is better than the factor of two that VMS advocates have
been shoving in my face.
-- 
Bill Sebok
Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,noao,princeton,vax135}!astrovax!wls
sra@oddjob.UUCP (Scott R. Anderson) (07/23/85)
In article <9871@Glacier.ARPA> conor@Glacier.UUCP (Conor Rafferty) writes:
>Published work (by Jack Dongarra at Argonne National Laboratory
>[dongarra@anl-mcs]) shows about 25-30% slower runtimes for the 4.2BSD
>compiler over the VMS 4.1 compiler, for dense linear algebra.

I have a copy of Jack Dongarra's Technical Memorandum No. 23 (dated July
18, 1985) which was passed around at a recent conference here on
high-performance computing.  For those who are interested, here are the
results of solving a linear system of equations of order 100 with LINPACK
on a VAX 11/780 with FPA:

	Precision   Operating System   MFLOPS   Time (sec)   Unit (usec)
	---------   ----------------   ------   ----------   -----------
	Double      VMS v4.1            0.14       4.96         14.4
	            UNIX 4.2BSD         0.13       5.67         16.5
	Single      VMS v4.1            0.25       2.74          7.98
	            UNIX 4.2BSD         0.21       3.25          9.47

	(Unit is the execution time for the statement y(i) = y(i) + t * x(i).)

These results verify that the UNIX f77 compiler is not as efficient as
the VMS Fortran compiler, but the "slowness" factor is actually 14-19%,
not 25-30%.

				Scott Anderson
				ihnp4!oddjob!kaos!sra
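[Editorial aside: the "unit" statement Dongarra times is the inner loop of
LINPACK's AXPY routine; the whole benchmark largely hinges on how well a
compiler translates this one loop.  A minimal C rendering (mine, not from
the memo) of the double-precision version:]

```c
#include <stddef.h>

/* Sketch of the LINPACK "unit" operation y(i) = y(i) + t * x(i),
 * i.e. the DAXPY kernel.  The per-element cost of this loop is
 * what the Unit column above reports. */
static void daxpy(size_t n, double t, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += t * x[i];
}
```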
grandi@noao.UUCP (Steve Grandi) (07/25/85)
> >Can anyone point to a company that supplies a UNIX Fortran compiler which
> >executes much faster than f77 (say, on par with the VMS compilers or better)?
>
> Actually there is limited room for improvement.  The 4.2BSD compiler is
> considerably better than the original f77 in that respect.  Published
> work (by Jack Dongarra at Argonne National Laboratory
> [dongarra@anl-mcs]) shows about 25-30% slower runtimes for the 4.2BSD
> compiler over the VMS 4.1 compiler, for dense linear algebra.

Let's consider two cases.  First, pure floating point crunching as
exemplified by the double precision LINPACK benchmark from Jack Dongarra
(DONGARRA@ANL-MCS).  On a VAX-11/750 with FPA, this program runs some 30%
faster on VMS (compiled with the VMS v4.1 Fortran compiler) than on
4.2BSD Unix (with the optimizer on and with the Donn Seeley f77 patches
applied).

Second, let's consider the Whetstone benchmark.  For single precision
calculations, the VMS program (v3 compiler) ran 220% faster than the 4.2
program!  Why the difference between the Whetstone and LINPACK results?
I think the difference is largely due to the terribly inefficient Unix
math library functions: the loop

	      T1 = 0.50025
	      X = 0.75
	      DO 110 I = 1, N11
	         X = SQRT(EXP(ALOG(X)/T1))
	110   CONTINUE

runs 4.9 times faster on VMS than on Unix.  4.3BSD supposedly has a math
library optimized for the VAX; let's hope so!!

Another related issue: 4.2BSD f77 with the optimizer on is VERY SLOW; it
takes about 2-4 times longer to compile a program than VMS Fortran.

The bottom line is that VMS provides a significantly more efficient
Fortran system than 4.2BSD for VAXes.  Our users note the difference!
As for general software development and general timesharing, I will
choose Unix 4.2BSD any day of the week over VMS; but maybe all this
explains why our 8600 runs VMS.
-- 
Steve Grandi, National Optical Astronomy Observatories, Tucson, AZ, 602-325-9228
{arizona,decvax,hao,ihnp4,seismo}!noao!grandi  noao!grandi@lbl-csam.ARPA
conor@Glacier.ARPA (Conor Rafferty) (07/27/85)
In article <867@oddjob.UUCP> sra@oddjob.UUCP (Scott R. Anderson) writes:
>These results verify that the UNIX f77 compiler is not as efficient
>as the VMS fortran compiler, but the "slowness" factor is actually
>14-19%, not 25-30%.

The comparison between compilers is quite machine dependent, hence my
vague "25-30%".  You quote the numbers for the 780/FPA, which reflect
best on the BSD compiler.  I think the rest of the numbers are also
interesting.  This is Dongarra again; I have added the time differences
in parentheses.

	                                   RATIO   MFLOPS   TIME   UNIT
	Double:
	======
	VAX 11/785 FPA  VMS v4.1              63    .20     3.50   10.2
	                UNIX 4.2 bsd f77      67    .18     3.75   10.9  (7%)
	VAX 11/780 FPA  VMS v4.1              89    .14     4.96   14.4
	                UNIX 4.2 BSD f77     101    .13     5.67   16.5  (14%)
	VAX 11/750 FPA  VMS v4.1              99    .12     5.52   16.1
	                UNIX 4.2 bsd f77     128    .096    7.15   20.8  (30%)
	VAX 11/750      VMS v4.1             215    .057   12.1    35.1
	                UNIX 4.2 bsd f77     422    .029   23.7    69.0  (96%)

	Single:
	======
	VAX 11/785 FPA  VMS v4.1              31    .40     1.72    5.02
	                UNIX 4.2 bsd f77      40    .31     2.27    6.50  (31%)
	VAX 11/780 FPA  VMS v4.1              49    .25     2.74    7.98
	                UNIX 4.2 BSD f77      58    .21     3.25    9.47  (19%)
	VAX 11/750 FPA  VMS v4.1              67    .18     3.75   10.9
	                UNIX 4.2 bsd f77      91    .13     5.12   14.9   (37%)
	VAX 11/750      VMS v4.1             138    .089    7.71   22.5
	                UNIX 4.1 bsd f77     204    .060   11.4    33.3   (48%)

Can anyone explain to me why the comparison comes out so different on
the 780/FPA and the 750, for instance?  Anyway.

Cheers, conor rafferty == decwrl!glacier!conor == conor@su-glacier.arpa
chris@umcp-cs.UUCP (Chris Torek) (07/28/85)
>Can anyone explain to me why the comparison comes out so different
>on the 780/FPA and the 750, for instance?

(Fools rush in...)

I believe that the VMS compiler goes to great pains to avoid using the
floating point instructions when another instruction will (at least 95%
of the time) achieve the same result.  The original 4.2 f77 used "movf"
and "movd" to copy real*4 and real*8 variables back and forth; VMS
Fortran uses "movl" and "movq".

Amusing anecdote: this apparently caused someone grief when a program
using real*8 datatypes to move integer or string values around ``worked
just fine under VMS, and gives me a "Floating exception (core dumped)"
under Unix.  What's wrong with the Unix compiler?''  The problem, of
course, was that the values being moved were illegal floating point
numbers, and were causing reserved operand faults when "movd" tried to
read them.

The f77 compiler that will be in 4.3BSD now uses movl and movq....
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland
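[Editorial aside: in C terms (my illustration; Torek is describing VAX
code generation, not C source), the movl/movq trick amounts to copying
the bit pattern without ever loading it as a floating point value.  A
movd of a VAX reserved operand faults; a byte copy cannot, because it
never interprets the bits:]

```c
#include <string.h>
#include <stdint.h>

/* Copy a real*8-sized object the way movq does: as raw bits.
 * A floating point move (movd on the VAX) inspects the operand
 * and faults on a reserved-operand bit pattern; this byte copy
 * never loads the bytes as a double, so no fault is possible. */
static void copy_as_bits(double *dst, const double *src)
{
    memcpy(dst, src, sizeof(double));
}
```

This is why a program stashing integer or string data in REAL*8
variables "worked just fine" under the movq-style code and trapped
under the movd-style code.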
doug@escher.UUCP (Douglas J Freyburger) (07/28/85)
> >Can anyone point to a company that supplies a UNIX Fortran compiler which
> >executes much faster than f77 (say, on par with the VMS compilers or better)?
>
> Actually there is limited room for improvement.  The 4.2BSD compiler is
> considerably better than the original f77 in that respect.  Published
> work (by Jack Dongarra at Argonne National Laboratory
> [dongarra@anl-mcs]) shows about 25-30% slower runtimes for the 4.2BSD
> compiler over the VMS 4.1 compiler, for dense linear algebra.  I've
> also coded some sparse linear algebra (essentially Yalepack) in
> assembly and found only 30-35% speed up.  Let's give credit where
> credit is due!

I'm sorry, but I have trouble giving credit to a compiler that is fully
30% slower than a competitor's compiler for the same machine architecture
in the same language.  Does going from blind translation into assembler
really cost 30% more execution time?  I always thought that good
optimization was less than that.  Does the Berkeley compiler do no
loop-invariant migration, common expression elimination, or ANYTHING?
It is true that DEC worked very hard optimizing its ForTran's output,
but 30%?

I haven't done ForTran work on any of my unix machines yet, just C and
Pascal, and this makes me pretty happy about it.  Now I really understand
the motivation behind the original posting.  The only one I know about
for unix is Green Hills.  They have C, Pascal, ForTran (and PL/M?) for
assorted machines, especially unix VAXen.

Doug Freyburger		DOUG@JPL-VLSI, DOUG@JPL-ROBOTICS
JPL 171-235		...escher!doug, doug@aerospace
Pasadena, CA 91106	etc.
pavlov@hscfvax.UUCP (840033@G.Pavlov) (07/29/85)
Fortran (unfortunately?) is important to us; we've looked at Fortran
execution times closely on our Sys III derivative, compared to BSD 4.2,
Tops-20, and VAX VMS.  The primary problems are most definitely in the
math libraries.  A loop such as

	      DO .....
	         j = j + k
	         r = a*b
	      ......

will stay within the 30% performance range mentioned previously.  But
insert cos(), atan(), etc., and Unix Fortran slows to a crawl.  I/O
isn't much better.................

   greg pavlov, FSTRF, Amherst, N.Y.
conor@Glacier.ARPA (Conor Rafferty) (07/29/85)
>Amusing anecdote: this apparently caused someone grief when a
>program using real*8 datatypes to move integer or string values
>around...

Even more amusing, one such program was SPICE (Berkeley's famous circuit
simulation program).

conor rafferty == decwrl!glacier!conor == conor@su-glacier.arpa
richmon@astrovax.UUCP (Michael Richmond) (07/29/85)
>Now I really understand the motivation behind the original
>posting.  The only one I know about for unix is Green
>Hills.  They have C, Pascal, ForTran (and PL/M?) for
>assorted machines especially unix VAXen.
>
>Doug Freyburger		DOUG@JPL-VLSI, DOUG@JPL-ROBOTICS

Sorry, but in the course of my search I called up Green Hills.  Although
they advertise such a compiler, or seem to, anyway, I was told that they
had nothing of the sort on the market now.  Any other ideas?
-- 
Michael Richmond
Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,noao,princeton,vax135}!astrovax!richmon
michael@python.UUCP (M. Cain) (07/29/85)
It is educational to consider the origins of the f77 program when asking
questions about why the code it produces runs so slowly.  According to
the original BTL documentation, the author's real motivation was to have
the first full '77 Standard compiler.  It was generated in a hurry using
lex and yacc, and the I/O library was thrown together very quickly.  I
believe that it met the author's goal, but the approach is not one that
generally leads to a production-quality compiler.

As a benchmark of the quality control, the distributed, "supported" f77
that I used soon after I joined BTL had a bug in the format-free input
routines for floating point numbers.  It caused a value like -1.2 to be
stored as -0.8.  Why?  Because the minus sign was applied only to the
integer part of the number, and then the integer and fractional parts
were added together.  Fixing the source for this routine not only made
it correct, but reduced its size considerably.

My own experience is that recoding routines in C results in a 30-35%
improvement in speed -- about the same as people are quoting for the VMS
compiler.

				Michael Cain
				Bell Communications Research
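[Editorial aside: the bug as described is a one-liner.  This toy
reconstruction is mine, not the actual f77 library source; it assumes
the input has already been split into a sign, an integer part, and a
fractional part.  Applying the sign before the fraction is added gives
-1.0 + 0.2 = -0.8 instead of -(1.0 + 0.2) = -1.2:]

```c
/* The sign bug: negate only the integer part, then add the
 * (always positive) fraction.  "-1.2" comes out as -0.8. */
static double parse_buggy(double sign, double intpart, double frac)
{
    return sign * intpart + frac;
}

/* The fix: apply the sign to the completed magnitude. */
static double parse_fixed(double sign, double intpart, double frac)
{
    return sign * (intpart + frac);
}
```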
arnold@gatech.CSNET (Arnold Robbins) (07/31/85)
Has anyone cosidered writing an f77 front end for the Amsterdam Compiler Kit? I understand that the back end(s) generates pretty good code. 'All' you'd need to do is write (or port) the front end for it. (If only someone wanted to pay me to do it...) -- Arnold Robbins CSNET: arnold@gatech ARPA: arnold%gatech.csnet@csnet-relay.arpa UUCP: { akgua, allegra, hplabs, ihnp4, seismo, ut-sally }!gatech!arnold Hello. You have reached the Coalition to Eliminate Answering Machines. Unfortunately, no one can come to the phone right now....
donn@utah-cs.UUCP (Donn Seeley) (08/01/85)
I wasn't going to get into this discussion, since my opinions on f77 are
well known and (in part) unprintable, but I decided I ought to contribute
a few remarks on Mr. Freyburger's article...

	From: doug@escher.UUCP (Douglas J Freyburger)

	I'm sorry, but I have trouble giving credit to a compiler
	that is fully 30% slower than a competitor's compiler for
	the same machine architecture in the same language.

I'm not sure where Mr. Freyburger comes from...  He doesn't seem to be
acquainted with bad software.  The 4.1 BSD f77 compiler was quite capable
of producing code that ran 2 or 3 times slower than code from VMS
Fortran.  The VMS compiler ran faster, too.  Bad code quality is not
uncommon in the industry, unfortunately, and VMS Fortran is a shining
example of how to do the job right.  Also, unless the numbers have
changed since I last saw them, the margin VMS Fortran has over 4.2 f77
on the LINPACK benchmark is not as great as 30%; someone else has
mentioned this as well.

	Does going from blind translation into assembler really
	cost 30% more execution time?  I always thought that good
	optimization was less than that.

It depends on what kind of machine you're on, what kind of compiler
you've got, and how good you are at assembly programming.  When I
rewrite routines in assembler, I'm disappointed if I can't double their
speed; if I can't do that well, I rarely bother.  This applies even to C
routines.  The VAX has a number of peculiar instructions that make
assembly coding more useful, however...  (For example, I recently
doubled the speed of Berkeley Mail by recoding fgets() and fputs() in
assembler.  A Fortran program which did direct I/O improved in speed by
a full order of magnitude when loaded with the 4.3 BSD
fread()/fwrite().  Due to the peculiarities of the VAX architecture,
writing these routines in C would have been very machine-dependent at
best and impossible at worst.)

	Does the Berkeley compiler do no loop-invariant migration,
	common expression elimination or ANYTHING?

The Berkeley compiler does loop optimization, common subexpression
elimination, register allocation and a number of other optimizations.
As I said above, if you don't do these things, your difference from the
optimum can be 300% instead of 30%.

	It is true that DEC worked very hard optimizing its
	ForTran's output, but 30%? ...

To say that 'DEC worked very hard' is a gross understatement.  Have you
priced their compiler?  The fact that their Fortran compiler costs
considerably more than an entire distribution from Berkeley should tell
you something...  And I will say right here that if your site does
nothing but number crunching with Fortran, your money is better spent
on the VMS Fortran compiler than on 4.3 BSD.  If you run Unix you must
have other reasons for doing so (and apparently many people do, or you
most likely wouldn't be reading this message).

Have any of you ever wondered what it takes to get someone to work on
free software?  Anybody who knows anything goes off to start their own
company...  Getting people to produce a fast math library for Unix on
the VAX is not unlike getting people to contribute to Harold Stassen's
campaign fund.  (And yes, I still consider the math library to be the
worst aspect of computing with f77.)  Only a sucker would waste their
time writing free software for Unix when they could go private and cash
in.

Still a sucker,

Donn Seeley    University of Utah CS Dept    donn@utah-cs.arpa
40 46' 6"N 111 50' 34"W    (801) 581-5668    decvax!utah-cs!donn
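[Editorial aside: for readers who haven't met the jargon, here is what
loop-invariant code motion, one of the optimizations under discussion,
does to a loop.  The transformation is written out by hand in C (my
illustration; an optimizing compiler performs the equivalent rewrite on
its intermediate code, not on the source):]

```c
/* Naive form: a*b is recomputed on every iteration even though
 * neither a nor b changes inside the loop.  This is what "blind
 * translation" emits. */
static double sum_naive(const double *x, int n, double a, double b)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i] * (a * b);
    return s;
}

/* After loop-invariant code motion: the invariant product is
 * hoisted out of the loop and computed once. */
static double sum_hoisted(const double *x, int n, double a, double b)
{
    double s = 0.0;
    double ab = a * b;          /* hoisted invariant */
    for (int i = 0; i < n; i++)
        s += x[i] * ab;
    return s;
}
```

Both forms compute the same result; only the number of multiplications
per iteration differs.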
peters@cubsvax.UUCP (Peter S. Shenkin) (08/02/85)
In article <> donn@utah-cs.UUCP writes:
>I wasn't going to get into this discussion, since my opinions on f77
>are well known and (in part) unprintable, but I decided I ought to
>contribute a few remarks on Mr. Freyburger's article...
>
>	From: doug@escher.UUCP (Douglas J Freyburger)
>
>	I'm sorry, but I have trouble giving credit to a compiler
>	that is fully 30% slower than a competitor's compiler for
>	the same machine architecture in the same language.
>
> [etc.]

I sympathize with the Freyburgers and am grateful to the Seeleys of the
world.  But what I wonder is why DEC doesn't market a version of their
FORTRAN compiler that will run under UNIX, or at least under ULTRIX!!
It seems that it shouldn't take them too much work (I speak from
blissful ignorance), and would require only a few extensions: the
ability to link to C programs, and the ability to get UNIX command-line
arguments.

Some context: we do biological image processing, for which all the
programs are in C, and we have many weird devices on line whose drivers
were easier to write under UNIX than they would have been under VMS.
We also do molecular modeling, for which most of the code is in FORTRAN,
much of it ported from VMS sites.  We're thinking of going to VMS, at
which point we'll have to shell out additional mucho bucks for DEC's
FORTRAN...  if we could shell it out now and run under UNIX, we'd have
the best of both worlds....

>>>>>>>>>>>>>>>>>>>  Are you listening, DEC?  <<<<<<<<<<<<<<<<<<<<<<<<<

Peter S. Shenkin	philabs!cubsvax!peters
Columbia Univ., Department of Biological Sciences