richmon@astrovax.UUCP (Michael Richmond) (07/17/85)
Can anyone point to a company that supplies a UNIX Fortran compiler which
executes much faster than f77 (say, on par with the VMS compilers or better)?
Please reply via mail and I will post a summary to the net if one is
warranted.  We run 4.2BSD on an 11/750 if it makes any difference.
-- 
Michael Richmond
Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,noao,princeton,vax135}!astrovax!richmon
conor@Glacier.ARPA (Conor Rafferty) (07/21/85)
>Can anyone point to a company that supplies a UNIX Fortran compiler which
>executes much faster than f77 (say, on par with the VMS compilers or better)?

Actually there is limited room for improvement.  The 4.2BSD compiler is
considerably better than the original f77 in that respect.  Published work
(by Jack Dongarra at Argonne National Laboratory [dongarra@anl-mcs]) shows
about 25-30% slower runtimes for the 4.2BSD compiler over the VMS 4.1
compiler, for dense linear algebra.  I've also coded some sparse linear
algebra (essentially Yalepack) in assembly and found only a 30-35% speedup.
Let's give credit where credit is due!

Cheers, conor rafferty == decwrl!glacier!conor == conor@su-glacier.arpa
wls@astrovax.UUCP (William L. Sebok) (07/22/85)
In article <9871@Glacier.ARPA> conor@Glacier.UUCP (Conor Rafferty) writes:
>Actually there is limited room for improvement.  The 4.2BSD compiler is
>considerably better than the original f77 in that respect.  Published
>work (by Jack Dongarra at Argonne National Laboratory
>[dongarra@anl-mcs]) shows about 25-30% slower runtimes for the 4.2BSD
>compiler over the VMS 4.1 compiler, for dense linear algebra.  I've
>also coded some sparse linear algebra (essentially Yalepack) in
>assembly and found only 30-35% speed up.  Let's give credit where
>credit is due!

Somehow 25-35% sounds like a fair amount of room for improvement to me.
However, that is better than the factor of two that VMS advocates have
been shoving in my face.
-- 
Bill Sebok
Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,noao,princeton,vax135}!astrovax!wls
sra@oddjob.UUCP (Scott R. Anderson) (07/23/85)
In article <9871@Glacier.ARPA> conor@Glacier.UUCP (Conor Rafferty) writes:
>Published work (by Jack Dongarra at Argonne National Laboratory
>[dongarra@anl-mcs]) shows about 25-30% slower runtimes for the 4.2BSD
>compiler over the VMS 4.1 compiler, for dense linear algebra.

I have a copy of Jack Dongarra's Technical Memorandum No. 23 (dated July
18, 1985) which was passed around at a recent conference here on
high-performance computing.  For those who are interested, here are the
results of solving a linear system of equations of order 100 with LINPACK
on a VAX 11/780 with FPA:

	Precision   Operating System   MFLOPS   Time (sec)   Unit (usec)
	---------   ----------------   ------   ----------   -----------
	Double      VMS v4.1            0.14       4.96         14.4
	            UNIX 4.2BSD         0.13       5.67         16.5
	Single      VMS v4.1            0.25       2.74          7.98
	            UNIX 4.2BSD         0.21       3.25          9.47

	(Unit is the execution time for the statement y(i) = y(i) + t * x(i).)

These results verify that the UNIX f77 compiler is not as efficient as
the VMS Fortran compiler, but the "slowness" factor is actually 14-19%,
not 25-30%.

				Scott Anderson
				ihnp4!oddjob!kaos!sra
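[Editorial aside: the "unit" statement Dongarra times is the inner loop of
LINPACK's AXPY routine; the whole benchmark largely hinges on how well a
compiler translates this one loop.  A minimal C rendering (mine, not from
the memo) of the double-precision version:]

```c
#include <stddef.h>

/* Sketch of the LINPACK "unit" operation y(i) = y(i) + t * x(i),
 * i.e. the DAXPY kernel.  The per-element cost of this loop is
 * what the Unit column above reports. */
static void daxpy(size_t n, double t, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += t * x[i];
}
```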
grandi@noao.UUCP (Steve Grandi) (07/25/85)
> >Can anyone point to a company that supplies a UNIX Fortran compiler which
> >executes much faster than f77 (say, on par with the VMS compilers or better)?
>
> Actually there is limited room for improvement.  The 4.2BSD compiler is
> considerably better than the original f77 in that respect.  Published
> work (by Jack Dongarra at Argonne National Laboratory
> [dongarra@anl-mcs]) shows about 25-30% slower runtimes for the 4.2BSD
> compiler over the VMS 4.1 compiler, for dense linear algebra.

Let's consider two cases.  First, pure floating point crunching as
exemplified by the double precision LINPACK benchmark from Jack Dongarra
(DONGARRA@ANL-MCS).  On a VAX-11/750 with FPA, this program runs some 30%
faster on VMS (compiled with the VMS v4.1 Fortran compiler) than on
4.2BSD Unix (with the optimizer on and with the Donn Seeley f77 patches
applied).

Second, let's consider the Whetstone benchmark.  For single precision
calculations, the VMS program (v3 compiler) ran 220% faster than the 4.2
program!  Why the difference between the Whetstone and LINPACK results?
I think the difference is largely due to the terribly inefficient Unix
math library functions: the loop

	      T1 = 0.50025
	      X = 0.75
	      DO 110 I = 1, N11
	         X = SQRT(EXP(ALOG(X)/T1))
	110   CONTINUE

runs 4.9 times faster on VMS than on Unix.  4.3BSD supposedly has a math
library optimized for the VAX; let's hope so!!

Another related issue: 4.2BSD f77 with the optimizer on is VERY SLOW; it
takes about 2-4 times longer to compile a program than VMS Fortran.

The bottom line is that VMS provides a significantly more efficient
Fortran system than 4.2BSD for VAXes.  Our users note the difference!
As for general software development and general timesharing, I will
choose Unix 4.2BSD any day of the week over VMS; but maybe all this
explains why our 8600 runs VMS.
-- 
Steve Grandi, National Optical Astronomy Observatories, Tucson, AZ, 602-325-9228
{arizona,decvax,hao,ihnp4,seismo}!noao!grandi  noao!grandi@lbl-csam.ARPA
conor@Glacier.ARPA (Conor Rafferty) (07/27/85)
In article <867@oddjob.UUCP> sra@oddjob.UUCP (Scott R. Anderson) writes:
>These results verify that the UNIX f77 compiler is not as efficient
>as the VMS fortran compiler, but the "slowness" factor is actually
>14-19%, not 25-30%.

The comparison between compilers is quite machine dependent, hence my
vague "25-30%".  You quote the numbers for the 780/FPA, which reflect
best on the BSD compiler.  I think the rest of the numbers are also
interesting.  This is Dongarra again; I have added the time differences
in parentheses.

	                                   RATIO   MFLOPS   TIME   UNIT
	Double:
	======
	VAX 11/785 FPA  VMS v4.1              63    .20     3.50   10.2
	                UNIX 4.2 bsd f77      67    .18     3.75   10.9  (7%)
	VAX 11/780 FPA  VMS v4.1              89    .14     4.96   14.4
	                UNIX 4.2 BSD f77     101    .13     5.67   16.5  (14%)
	VAX 11/750 FPA  VMS v4.1              99    .12     5.52   16.1
	                UNIX 4.2 bsd f77     128    .096    7.15   20.8  (30%)
	VAX 11/750      VMS v4.1             215    .057   12.1    35.1
	                UNIX 4.2 bsd f77     422    .029   23.7    69.0  (96%)

	Single:
	======
	VAX 11/785 FPA  VMS v4.1              31    .40     1.72    5.02
	                UNIX 4.2 bsd f77      40    .31     2.27    6.50  (31%)
	VAX 11/780 FPA  VMS v4.1              49    .25     2.74    7.98
	                UNIX 4.2 BSD f77      58    .21     3.25    9.47  (19%)
	VAX 11/750 FPA  VMS v4.1              67    .18     3.75   10.9
	                UNIX 4.2 bsd f77      91    .13     5.12   14.9   (37%)
	VAX 11/750      VMS v4.1             138    .089    7.71   22.5
	                UNIX 4.1 bsd f77     204    .060   11.4    33.3   (48%)

Can anyone explain to me why the comparison comes out so different on
the 780/FPA and the 750, for instance?  Anyway.

Cheers, conor rafferty == decwrl!glacier!conor == conor@su-glacier.arpa
chris@umcp-cs.UUCP (Chris Torek) (07/28/85)
>Can anyone explain to me why the comparison comes out so different
>on the 780/FPA and the 750, for instance?

(Fools rush in...)

I believe that the VMS compiler goes to great pains to avoid using the
floating point instructions when another instruction will (at least 95%
of the time) achieve the same result.  The original 4.2 f77 used "movf"
and "movd" to copy real*4 and real*8 variables back and forth; VMS
Fortran uses "movl" and "movq".

Amusing anecdote: this apparently caused someone grief when a program
using real*8 datatypes to move integer or string values around ``worked
just fine under VMS, and gives me a "Floating exception (core dumped)"
under Unix.  What's wrong with the Unix compiler?''  The problem, of
course, was that the values being moved were illegal floating point
numbers, and were causing reserved operand faults when "movd" tried to
read them.

The f77 compiler that will be in 4.3BSD now uses movl and movq....
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland
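[Editorial aside: in C terms (my illustration; Torek is describing VAX
code generation, not C source), the movl/movq trick amounts to copying
the bit pattern without ever loading it as a floating point value.  A
movd of a VAX reserved operand faults; a byte copy cannot, because it
never interprets the bits:]

```c
#include <string.h>
#include <stdint.h>

/* Copy a real*8-sized object the way movq does: as raw bits.
 * A floating point move (movd on the VAX) inspects the operand
 * and faults on a reserved-operand bit pattern; this byte copy
 * never loads the bytes as a double, so no fault is possible. */
static void copy_as_bits(double *dst, const double *src)
{
    memcpy(dst, src, sizeof(double));
}
```

This is why a program stashing integer or string data in REAL*8
variables "worked just fine" under the movq-style code and trapped
under the movd-style code.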
doug@escher.UUCP (Douglas J Freyburger) (07/28/85)
> >Can anyone point to a company that supplies a UNIX Fortran compiler which
> >executes much faster than f77 (say, on par with the VMS compilers or better)?
>
> Actually there is limited room for improvement.  The 4.2BSD compiler is
> considerably better than the original f77 in that respect.  Published
> work (by Jack Dongarra at Argonne National Laboratory
> [dongarra@anl-mcs]) shows about 25-30% slower runtimes for the 4.2BSD
> compiler over the VMS 4.1 compiler, for dense linear algebra.  I've
> also coded some sparse linear algebra (essentially Yalepack) in
> assembly and found only 30-35% speed up.  Let's give credit where
> credit is due!

I'm sorry, but I have trouble giving credit to a compiler that is fully
30% slower than a competitor's compiler for the same machine architecture
in the same language.  Does going from blind translation into assembler
really cost 30% more execution time?  I always thought that good
optimization was less than that.  Does the Berkeley compiler do no
loop-invariant migration, common expression elimination, or ANYTHING?
It is true that DEC worked very hard optimizing its ForTran's output,
but 30%?

I haven't done ForTran work on any of my unix machines yet, just C and
Pascal, and this makes me pretty happy about it.  Now I really understand
the motivation behind the original posting.  The only one I know about
for unix is Green Hills.  They have C, Pascal, ForTran (and PL/M?) for
assorted machines, especially unix VAXen.

Doug Freyburger		DOUG@JPL-VLSI, DOUG@JPL-ROBOTICS
JPL 171-235		...escher!doug, doug@aerospace
Pasadena, CA 91106	etc.
pavlov@hscfvax.UUCP (840033@G.Pavlov) (07/29/85)
Fortran (unfortunately?) is important to us; we've looked at Fortran
execution times closely on our Sys III derivative, compared to BSD 4.2,
Tops-20, and VAX VMS.  The primary problems are most definitely in the
math libraries.  A loop such as

	      DO .....
	         j = j + k
	         r = a*b
	      ......

will stay within the 30% performance range mentioned previously.  But
insert cos(), atan(), etc., and Unix Fortran slows to a crawl.  I/O
isn't much better.................

   greg pavlov, FSTRF, Amherst, N.Y.
conor@Glacier.ARPA (Conor Rafferty) (07/29/85)
>Amusing anecdote: this apparently caused someone grief when a
>program using real*8 datatypes to move integer or string values
>around...

Even more amusing, one such program was SPICE (Berkeley's famous circuit
simulation program).

conor rafferty == decwrl!glacier!conor == conor@su-glacier.arpa
richmon@astrovax.UUCP (Michael Richmond) (07/29/85)
>Now I really understand the motivation behind the original
>posting.  The only one I know about for unix is Green
>Hills.  They have C, Pascal, ForTran (and PL/M?) for
>assorted machines especially unix VAXen.
>
>Doug Freyburger		DOUG@JPL-VLSI, DOUG@JPL-ROBOTICS

Sorry, but in the course of my search I called up Green Hills.  Although
they advertise such a compiler, or seem to, anyway, I was told that they
had nothing of the sort on the market now.  Any other ideas?
-- 
Michael Richmond
Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,noao,princeton,vax135}!astrovax!richmon
michael@python.UUCP (M. Cain) (07/29/85)
It is educational to consider the origins of the f77 program when asking
questions about why the code it produces runs so slowly.  According to
the original BTL documentation, the author's real motivation was to have
the first full '77 Standard compiler.  It was generated in a hurry using
lex and yacc, and the I/O library was thrown together very quickly.  I
believe that it met the author's goal, but the approach is not one that
generally leads to a production-quality compiler.

As a benchmark of the quality control, the distributed, "supported" f77
that I used soon after I joined BTL had a bug in the format-free input
routines for floating point numbers.  It caused a value like -1.2 to be
stored as -0.8.  Why?  Because the minus sign was applied only to the
integer part of the number, and then the integer and fractional parts
were added together.  Fixing the source for this routine not only made
it correct, but reduced its size considerably.

My own experience is that recoding routines in C results in a 30-35%
improvement in speed -- about the same as people are quoting for the VMS
compiler.

				Michael Cain
				Bell Communications Research
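[Editorial aside: the bug as described is a one-liner.  This toy
reconstruction is mine, not the actual f77 library source; it assumes
the input has already been split into a sign, an integer part, and a
fractional part.  Applying the sign before the fraction is added gives
-1.0 + 0.2 = -0.8 instead of -(1.0 + 0.2) = -1.2:]

```c
/* The sign bug: negate only the integer part, then add the
 * (always positive) fraction.  "-1.2" comes out as -0.8. */
static double parse_buggy(double sign, double intpart, double frac)
{
    return sign * intpart + frac;
}

/* The fix: apply the sign to the completed magnitude. */
static double parse_fixed(double sign, double intpart, double frac)
{
    return sign * (intpart + frac);
}
```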
arnold@gatech.CSNET (Arnold Robbins) (07/31/85)
Has anyone cosidered writing an f77 front end for the Amsterdam Compiler Kit? I understand that the back end(s) generates pretty good code. 'All' you'd need to do is write (or port) the front end for it. (If only someone wanted to pay me to do it...) -- Arnold Robbins CSNET: arnold@gatech ARPA: arnold%gatech.csnet@csnet-relay.arpa UUCP: { akgua, allegra, hplabs, ihnp4, seismo, ut-sally }!gatech!arnold Hello. You have reached the Coalition to Eliminate Answering Machines. Unfortunately, no one can come to the phone right now....
donn@utah-cs.UUCP (Donn Seeley) (08/01/85)
I wasn't going to get into this discussion, since my opinions on f77 are
well known and (in part) unprintable, but I decided I ought to contribute
a few remarks on Mr. Freyburger's article...

	From: doug@escher.UUCP (Douglas J Freyburger)

	I'm sorry, but I have trouble giving credit to a compiler
	that is fully 30% slower than a competitor's compiler for
	the same machine architecture in the same language.

I'm not sure where Mr. Freyburger comes from...  He doesn't seem to be
acquainted with bad software.  The 4.1 BSD f77 compiler was quite capable
of producing code that ran 2 or 3 times slower than code from VMS
Fortran.  The VMS compiler ran faster, too.  Bad code quality is not
uncommon in the industry, unfortunately, and VMS Fortran is a shining
example of how to do the job right.  Also, unless the numbers have
changed since I last saw them, the margin VMS Fortran has over 4.2 f77
on the LINPACK benchmark is not as great as 30%; someone else has
mentioned this as well.

	Does going from blind translation into assembler really
	cost 30% more execution time?  I always thought that good
	optimization was less than that.

It depends on what kind of machine you're on, what kind of compiler
you've got, and how good you are at assembly programming.  When I
rewrite routines in assembler, I'm disappointed if I can't double their
speed; if I can't do that well, I rarely bother.  This applies even to C
routines.  The VAX has a number of peculiar instructions that make
assembly coding more useful, however...  (For example, I recently
doubled the speed of Berkeley Mail by recoding fgets() and fputs() in
assembler.  A Fortran program which did direct I/O improved in speed by
a full order of magnitude when loaded with the 4.3 BSD
fread()/fwrite().  Due to the peculiarities of the VAX architecture,
writing these routines in C would have been very machine-dependent at
best and impossible at worst.)

	Does the Berkeley compiler do no loop-invariant migration,
	common expression elimination or ANYTHING?

The Berkeley compiler does loop optimization, common subexpression
elimination, register allocation and a number of other optimizations.
As I said above, if you don't do these things, your difference from the
optimum can be 300% instead of 30%.

	It is true that DEC worked very hard optimizing its
	ForTran's output, but 30%? ...

To say that 'DEC worked very hard' is a gross understatement.  Have you
priced their compiler?  The fact that their Fortran compiler costs
considerably more than an entire distribution from Berkeley should tell
you something...  And I will say right here that if your site does
nothing but number crunching with Fortran, your money is better spent
on the VMS Fortran compiler than on 4.3 BSD.  If you run Unix you must
have other reasons for doing so (and apparently many people do, or you
most likely wouldn't be reading this message).

Have any of you ever wondered what it takes to get someone to work on
free software?  Anybody who knows anything goes off to start their own
company...  Getting people to produce a fast math library for Unix on
the VAX is not unlike getting people to contribute to Harold Stassen's
campaign fund.  (And yes, I still consider the math library to be the
worst aspect of computing with f77.)  Only a sucker would waste their
time writing free software for Unix when they could go private and cash
in.

Still a sucker,

Donn Seeley    University of Utah CS Dept    donn@utah-cs.arpa
40 46' 6"N 111 50' 34"W    (801) 581-5668    decvax!utah-cs!donn
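[Editorial aside: for readers who haven't met the jargon, here is what
loop-invariant code motion, one of the optimizations under discussion,
does to a loop.  The transformation is written out by hand in C (my
illustration; an optimizing compiler performs the equivalent rewrite on
its intermediate code, not on the source):]

```c
/* Naive form: a*b is recomputed on every iteration even though
 * neither a nor b changes inside the loop.  This is what "blind
 * translation" emits. */
static double sum_naive(const double *x, int n, double a, double b)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i] * (a * b);
    return s;
}

/* After loop-invariant code motion: the invariant product is
 * hoisted out of the loop and computed once. */
static double sum_hoisted(const double *x, int n, double a, double b)
{
    double s = 0.0;
    double ab = a * b;          /* hoisted invariant */
    for (int i = 0; i < n; i++)
        s += x[i] * ab;
    return s;
}
```

Both forms compute the same result; only the number of multiplications
per iteration differs.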
peters@cubsvax.UUCP (Peter S. Shenkin) (08/02/85)
In article <> donn@utah-cs.UUCP writes:
>I wasn't going to get into this discussion, since my opinions on f77
>are well known and (in part) unprintable, but I decided I ought to
>contribute a few remarks on Mr. Freyburger's article...
>
>	From: doug@escher.UUCP (Douglas J Freyburger)
>
>	I'm sorry, but I have trouble giving credit to a compiler
>	that is fully 30% slower than a competitor's compiler for
>	the same machine architecture in the same language.
>
> [etc.]

I sympathize with the Freyburgers and am grateful to the Seeleys of the
world.  But what I wonder is why DEC doesn't market a version of their
FORTRAN compiler that will run under UNIX, or at least under ULTRIX!!
It seems that it shouldn't take them too much work (I speak from
blissful ignorance), and would require only a few extensions: the
ability to link to C programs, and the ability to get UNIX command-line
arguments.

Some context: we do biological image processing, for which all the
programs are in C, and we have many weird devices on line whose drivers
were easier to write under UNIX than they would have been under VMS.
We also do molecular modeling, for which most of the code is in FORTRAN,
much of it ported from VMS sites.  We're thinking of going to VMS, at
which point we'll have to shell out additional mucho bucks for DEC's
FORTRAN...  if we could shell it out now and run under UNIX, we'd have
the best of both worlds....

>>>>>>>>>>>>>>>>>>>  Are you listening, DEC?  <<<<<<<<<<<<<<<<<<<<<<<<<

Peter S. Shenkin	philabs!cubsvax!peters
Columbia Univ., Department of Biological Sciences