[net.unix-wizards] f77 performance

T S Lande <bassen@oslo-vax.ARPA> (11/30/84)

	 A user on our VAX-11/780 is running performance tests on
	 different machines. The program is approx. 7000 lines of
	 FORTRAN. The performance on the 780 (under UNIX) is only 0.75
	 of that of a 750 running VMS. The comparison was done between
	 compute-bound parts of the program. When it comes to I/O it's
	 even worse.

	 I have heard that the f77 compiler is not very good. Is this
	 the main reason, or is UNIX giving less throughput than VMS?
	 What about C? Would C-coded programs increase performance
	 significantly?
	 
	 Bassen

donn@utah-gr.UUCP (Donn Seeley) (11/30/84)

From T S Lande <bassen@oslo-vax.ARPA>:

	A user on our VAX-11/780 is running performance tests on
	different machines. The program is approx. 7000 lines of
	FORTRAN. The performance on the 780 (under UNIX) is only 0.75
	of that of a 750 running VMS. The comparison was done between
	compute-bound parts of the program. When it comes to I/O it's
	even worse.

I assume you're talking about the 4.2 BSD compiler, since your machine
prompts with '4.2 BSD UNIX (oslo-vax)' when I telnet to it...  There
are a few things you should know about f77 performance:

 +	The distributed f77 compiler is a mess.  I have installed more than
	50 bug fixes in the compiler since I got it a year or so ago.
	Some of these fixes and some minor performance enhancements
	have been posted to the net.

 +	The performance of f77 routines which don't use I/O or math
	functions is reasonable if the optimizer is enabled (f77 -O)
	when you compile.  Reasonable means within 10-20% of code
	compiled under VMS Fortran.  Optimized f77 code can run twice
	as fast as unoptimized code, or faster.

 +	I/O is slow.  Formatted I/O is beastly, although that is partly
	due to the nature of the problem.  Unformatted I/O will get a
	big boost when the new Berkeley C library appears, because the
	limiting factor currently is the speed of fread() and
	fwrite().  For large writes, the improvement is amazing: on the
	order of 10 times faster.  The new fread() and fwrite() have
	essentially the same modifications as the System V Release 2
	versions.  Not much can be done to help until the next
	release, though...

 +	The portable math library is slow.  Any program that calls it
	loses horribly compared to the same program compiled with the
	very nice math library on VMS.  The speed of math functions can
	be approximately doubled by using the 'native math library'
	(/usr/lib/libnm.a).  Unfortunately both of these libraries do
	all of their computations in double precision, another major
	lose.  I have spent some time recently hacking at f77 to use a
	new single-precision library sqrt() from Berkeley and have
	found that at least one benchmark runs 5.5 times faster when it
	uses the enhanced single-precision native sqrt() instead of the
	portable math library sqrt().  There is some hope that these
	hacks will find their way into f77 in the next Berkeley
	distribution.
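On the unformatted I/O point above (fread() and fwrite() being the bottleneck, with large writes getting roughly 10 times faster): here is a minimal C sketch of one plausible way such a speedup can be obtained, assuming the gain comes from handing large requests straight to write(2) instead of copying them through the stdio buffer. The function name write_large and the BUFSIZ threshold are hypothetical choices for illustration; this is not the actual Berkeley or System V code.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not the Berkeley or System V source:
 * write nbytes from buf to stream, sending large requests
 * directly to write(2) rather than copying them through the
 * stdio buffer. Returns the number of bytes written.
 */
size_t write_large(const void *buf, size_t nbytes, FILE *stream)
{
    const char *p = buf;
    size_t done = 0;

    /* Small requests: ordinary buffered stdio is fine. */
    if (nbytes < BUFSIZ)
        return fwrite(buf, 1, nbytes, stream);

    /* Flush anything already buffered so the output stays in order. */
    if (fflush(stream) != 0)
        return 0;

    /* Hand the whole block to the kernel, looping on partial writes. */
    while (done < nbytes) {
        ssize_t n = write(fileno(stream), p + done, nbytes - done);
        if (n <= 0)
            break;
        done += (size_t)n;
    }
    return done;
}
```

The design choice is simply to avoid a memory-to-memory copy per buffer-full: small writes still benefit from buffering, while a big array goes out in one system call.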

I distribute f77 fixes via ftp to ARPAnet sites; contact me by mail if
you want a set.  I have been providing tapes to the occasional
desperate person without ARPAnet access but as my boss reminds me on a
regular basis, no one is paying me (or him) for the work I do on the
f77 (or C!) compilers.  This is also my excuse for the fact that the
production of bug fixes and bug reports is irregular at best...  If
it's any reassurance, most of the work that I have done, together with
work from numerous other people who have contributed to the current
version of the compiler, will be incorporated in the next Berkeley
release.

	What about C? Would C-coded programs increase performance
	significantly?

It depends on the nature of the program.  Some programs will benefit
vastly, others less so, and if you fail to optimize your C by hand then
you can certainly do much worse than f77.  f77 does the job of
allocating register variables, moving invariant code out of loops,
finding common subexpressions and so on automatically, but programmers
must do these things to a C program themselves (with most current C
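compilers).  C loses badly for applications which require
single-precision floating point, because C as currently defined does
all floating-point arithmetic in double precision and passes float
arguments as double; the ANSI C committee is addressing this problem.
It's a difficult decision to make and it can depend a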
lot on issues other than raw compute speed, such as portability,
programmer productivity and so on.
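Here is a small, hypothetical C example of the hand optimization described above. The naive loop recomputes i * n and s * w[i] on every inner iteration; the hand-tuned version hoists the loop-invariant product, computes the row address once per row (a common subexpression), and declares register variables. The function names and the row-scaling task are made up for illustration. With a modern optimizing C compiler the two versions often compile to similar code, and register is only a hint.

```c
/*
 * Hypothetical task: scale each row i of an n-by-n matrix a
 * (stored row-major in a flat array) by s * w[i].
 */

/* Naive version: relies on the compiler to clean up. */
void scale_naive(float *a, const float *w, int n, float s)
{
    int i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[i * n + j] = a[i * n + j] * (s * w[i]);
}

/* Hand-optimized version: what an f77-style optimizer would do for you. */
void scale_by_hand(float *a, const float *w, int n, float s)
{
    register int i, j;
    register float *row;
    register float f;

    for (i = 0; i < n; i++) {
        row = a + i * n;    /* row address computed once per row */
        f = s * w[i];       /* loop-invariant product hoisted out of the inner loop */
        for (j = 0; j < n; j++)
            row[j] *= f;
    }
}
```

Both functions produce the same results; the difference is only in how much redundant work the inner loop does when the compiler does not optimize it.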

Hope this helps,

Donn Seeley    University of Utah CS Dept    donn@utah-cs.arpa
40 46' 6"N 111 50' 34"W    (801) 581-5668    decvax!utah-cs!donn

mikem@uwstat.UUCP (11/30/84)

> 
> 	 A user on our VAX-11/780 is running performance tests on
> 	 different machines. The program is approx. 7000 lines of
> 	 FORTRAN. The performance on the 780 (under UNIX) is only 0.75
> 	 of that of a 750 running VMS. The comparison was done between
> 	 compute-bound parts of the program. When it comes to I/O it's
> 	 even worse.

I don't know about the I/O, but compute times could easily be like this
if one machine had a Floating Point Accelerator (FPA) and the other did
not.  My limited timings indicate that with FPAs, a 750 and a 780 are
about equivalent on floating-point-intensive code.

-- 

Mike Meyer --  Phone (608) 262-1157

EASY ARPA:	mikem@statistics
CORRECT ARPA:	mikem@wisc-stat.arpa
UUCP	...!{allegra,ihnp4,seismo,ucbvax,
	     pyr_chi,heurikon,uwm-evax}!uwvax!uwstat!mikem