[comp.sys.atari.st] Comparison of C Compilers on Different Machines

jmk@asr2.UUCP (06/05/87)

Several articles have been posted here which give benchmark results of
the popular C compilers for the ST. I have Lattice version 3.03.01, and am
extremely dissatisfied with its performance on the only benchmark that
really counts: what *I* am trying to do with it. The program predicts
Earth satellite orbit positions and spends 99% of its time number-crunching.

While most benchmarks show Lattice to be somewhat slow in terms
of execution speed, what I am looking for is a factor-of-twenty improvement.
My wish is not unreasonable, since I have compiled the identical program
on a PC6300+ (80286 @6 MHz) using Microsoft C, and it executed 22 times
faster. Is there any reason why the ST can't outcrunch an AT-clone?

My "benchmark" results are given below for all the machines/compilers
I have tried. If owners of Megamax, Mark Williams, and other compilers would
be so kind as to compile the program for me, please send me e-mail and
I will mail you the source.

The basic problem is: given the orbital parameters of a near-earth 
satellite, the location of the observer, and the day of interest,
compute the position (azimuth, elevation, range, latitude, longitude)
of the satellite for the whole day in two-minute increments. A record
is printed to a file every time the satellite is above the horizon.

Here is a sample of the output:

	noaa-9 Element Set 158

	Doppler calculated for freq = 137.500000 MHz
	Saturday  9 May 87  ----Orbit # 12384-----
	U.T.C.   Az   El  Doppler  Range  Height  Lat  Long  Phase(360)
	0822:00   26    3    3020    3052     866   61    61   160
	0824:00   34   12    2884    2342     866   54    65   167
	0826:00   50   23    2501    1725     866   47    69   174
	0828:00   83   34    1498    1356     866   40    71   181
	0830:00  127   31    -367    1446     866   34    73   188
	0832:00  153   19   -1974    1933     866   27    75   195
	0834:00  165    8   -2699    2598     865   20    77   202
	0836:00  172    1   -2971    3330     865   13    79   209
	Saturday  9 May 87  ----Orbit # 12385-----
	U.T.C.   Az   El  Doppler  Range  Height  Lat  Long  Phase(360)
	1002:00    2    0    3051    3373     865   68    80   153

	(etc.)

The source program is public-domain and is written in a generic style
that compiles with very few changes on all compilers I have tried so
far. Here is a summary of the results:

machine		compiler	run time (s)	accuracy
===========	============	========	==========
520ST		Lattice 3.01	480		very good
AT&T 6300+	Microsoft 	22		excellent
VAX 11/785	Unix V cc	4		excellent
Amiga 1000	Manx 		25		poor

Notes -
	My only means of checking the accuracy was to actually pick
	up the VHF beacon of a polar-orbiting satellite as it passed
	overhead. The Lattice results predict the appearance of the
	beacon to within a minute or so and thus my only gripe is
	the abysmally slow speed.

	Curiously, the 6300+ and VAX outputs were *identical*. The
	output is line after line of numbers, yet 'diff' produces
	no output! Obviously, the two compilers are using some kind
	of floating-point standard (IEEE) as well as standard techniques
	of computing trigonometric functions. I am therefore assuming that
	they set the accuracy standard. The Lattice results were
	slightly different.
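Part of that agreement may also come from the output format: every field in the table is printed as a whole number, so results that differ only in their last few bits almost always print the same. The sketch below assumes a format like "%.0f" (the program's actual format strings aren't shown); the helper name is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if a and b print identically when rounded to whole numbers
 * (assumed "%.0f" format, as in the orbit table), 0 otherwise.
 * Illustrative helper, not part of the original program. */
int same_when_printed(double a, double b)
{
    char sa[64], sb[64];
    snprintf(sa, sizeof sa, "%.0f", a);
    snprintf(sb, sizeof sb, "%.0f", b);
    return strcmp(sa, sb) == 0;
}
```

The exception is a value sitting right at a rounding boundary (3052.4999 versus 3052.5001, say), where a tiny difference flips the printed digit; a few such lines may be all that separates two slightly different math libraries in a diff.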

	The Amiga/Manx results were obtained using the '-lm' library,
	which the documentation calls the 'Motorola Fast Floating Point
	Package.' Well, it's fast. Unfortunately, the numeric results
	were unusable for this application. They were completely off.
	Manx has other math packages, including IEEE standard. When
	compiled with this package, execution time shot up to 400 seconds,
	but I must've done something wrong, because the output
	was even more ridiculous. If anyone who has an Amiga can try
	out the program and do it right, drop me a line.
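The Motorola FFP format packs a number into 32 bits with a 24-bit mantissa, about 7 significant digits, roughly like IEEE single precision. For orbit work that can be fatal before the trig functions even come into it. The sketch below uses C `float` as a stand-in for FFP (an assumption: FFP rounding is not identical to IEEE single) and a Julian date for 9 May 1987, JD 2446924.5, purely as an example time stamp; whether this program keeps time that way is not known. Function names are illustrative.

```c
/* Sketch: how much of a time step survives when a time stamp is stored
 * in a 32-bit format. C float (24-bit significand) stands in for
 * Motorola FFP here; that substitution is an assumption. Times are in
 * days (for example a Julian date), steps in minutes. The volatile
 * qualifiers force each value to be stored at its declared precision. */
double step_minutes_float(double day, double step_minutes)
{
    volatile float t0 = (float)day;                            /* rounded to float */
    volatile float t1 = (float)(day + step_minutes / 1440.0);  /* one step later */
    return ((double)t1 - (double)t0) * 1440.0;                 /* step actually kept */
}

/* The same measurement with the time stamp held in double precision. */
double step_minutes_double(double day, double step_minutes)
{
    volatile double t0 = day;
    volatile double t1 = day + step_minutes / 1440.0;
    return (t1 - t0) * 1440.0;
}
```

At a magnitude of about 2.4 million, a 24-bit significand can only resolve steps of 0.25 day (6 hours), so a two-minute increment vanishes entirely, while double precision resolves the same time stamp to roughly 40 microseconds. Measured from a nearby epoch instead (say 10 days), the float step survives to within about a thousandth of a minute, so where the time origin sits matters as much as the format.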

	An indication of where at least the VAX was spending all its
	time is given by prof:

	%Time  Seconds  Cumsecs  #Calls  msec/call  Name
	 39.9     1.98     1.98    5775      0.343  _sin
	 11.1     0.55     2.53      83        6.6  _write
	  8.4     0.42     2.95    3875      0.108  _sqrt
	  4.4     0.22     3.17    7944      0.027  _cos
	  4.4     0.22     3.38     722       0.30  _asin
	  3.4     0.17     3.55     722       0.23  _Kepler
	  3.4     0.17     3.72       1       167.  _main
	  3.0     0.15     3.87     725       0.21  _exp
	  2.7     0.13     4.00     722       0.18  _GetSubSatPoint
	  2.7     0.13     4.13     722       0.18  _GetBearings
	  2.7     0.13     4.27     722       0.18  _tan
	  2.7     0.13     4.40                     mcount
	  2.0     0.10     4.50                     _frexp
	  2.0     0.10     4.60    4613      0.022  _ldexp
	  1.0     0.05     4.65     725       0.07  _log
	  1.0     0.05     4.70    1444      0.035  _atan
	  0.7     0.03     4.73    1444      0.023  _acos
	(etc.)

	Isn't it weird how sin takes so much longer than cos?
	
Any help on this problem is appreciated. I am leaning towards the conclusion
that the reason the 6300+ performs so well is that IBM PC software is
written to professional standards. What use is the computing power of
the ST if there isn't any means of tapping it?

ihnp4!asr2!jmk	Joe Knapp AT&T Bell Labs, Columbus OH (614)860-3547

"68000" "8.0 MHz"  [ST trees falling in the middle of a forest]

ali@rocky.STANFORD.EDU (Ali Ozer) (06/05/87)

In article <106@asr2.UUCP> jmk@asr2.UUCP (Joe Knapp) writes:
>My "benchmark" results are given below for all the machines/compilers
>I have tried. The Amiga/Manx results were obtained using the '-lm' library,
>which the documentation calls the 'Motorola Fast Floating Point Package.' ...
>Manx has other math packages, including IEEE standard. When compiled with
>this package, execution time shot up to 400 seconds, but I must've done
>something wrong, because the output was even more ridiculous. 

You said to send email to request the source, except I couldn't get email
through to you... I'd appreciate it if you mailed the source to me;
I'll try to get it working with Amiga's IEEE FP (which does work and which 
should generate the same results as the other IEEE results) and also maybe
get it in the hands of some people with a 68020/68881 Amiga.

Ali Ozer, ali@rocky.stanford.edu, ...decwrl!rocky.stanford.edu!ali

hpai%rolls.uucp@utah-cs.UUCP (HP AI login) (06/06/87)

Oops.  Sorry for the null message.

In article <106@asr2.UUCP> jmk@asr2.UUCP (Joe Knapp) writes:
>machine		compiler	run time (s)	accuracy
>---------------------------------------------------------------
>Amiga 1000	Manx 		25		poor
>

There is a bad problem with one of the header files in the Manx compiler if
you used version 3.02a: PI is misdefined!!  It is only correct to about 3 decimal
places, which does a good job of really messing things up if your equations need
a very small tolerance.  I think PI is correct in Manx 3.40a for the Amiga.
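Until the header is fixed, one defensive workaround (not something suggested in this post, and untested with Manx) is to derive pi from the math library at run time, or at least to check a header's PI against it. The function names below are illustrative.

```c
#include <math.h>

/* Derive pi from the math library instead of trusting a header macro:
 * atan(1) is pi/4. */
double runtime_pi(void)
{
    return 4.0 * atan(1.0);
}

/* Returns 1 if a header's PI constant agrees with the library's value
 * to about 12 decimal places, 0 otherwise. */
int pi_constant_ok(double header_pi)
{
    return fabs(header_pi - runtime_pi()) < 1e-12;
}
```

Calling pi_constant_ok(PI) once at startup would flag a misdefined constant like the one described above, since a value good to only 3 or 4 decimal places fails the check.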

If you want to send email, use the address below, since this is being posted
from a group account.

 /|  |    /|||  /\|		|	John M. Olsen
 \|()|\|\_ |||. \/|/)@|\_	|	1547 Jamestown Drive
  |				|	Salt Lake City, UT  84121-2051
u-jmolse@ug.utah.edu	or  ...!{seismo,ihnp4}!utah-cs!utah-ug!uto-to-t.

braner@batcomputer.UUCP (06/07/87)


The only way to get decent number-crunching performance, especially when
transcendental functions are needed, is to use a floating-point chip.
(That AT&T machine must have a '287 chip that the MS compiler uses.)
While the 32081 chip I'm using on the ST yields speeds comparable with the AT
(when used via Absoft FORTRAN), the very best choice is the 68020/68881
combo, the most powerful chip set in existence for personal machines.
(I can see the flames coming from fans of the '386...)  That combo is used
in the Mac II and in "workstations" such as the Sun.  Now if Atari ever dropped
that stupid IBM-PC clone project (which is now 2 or 3 generations behind the
times!) and developed a 68020/68881 box under $2000...  (wishful thinking).

Note: for many applications (e.g. graphics output routines) you don't need
high precision.  I wrote 4.5-digit precision routines for sin(), cos(), exp(),
log(), sqrt() and erf() that use table-lookup with linear interpolation.
(Each table is only 514 bytes long.)  They are written in 68000 assembly language and take
about 150 microseconds each (with no FP HW!), faster than the VAX...

The VAX spent much less time in the cos() function than in the sin() because
(I guess) it just adds PI/2 to the argument and calls sin()...

- Moshe Braner

pes@bath63.UUCP (06/10/87)

I'd suggest (since you seem to use Lattice C already) that you quickly go
get the latest upgrade, version 3.04; it should be out now, since I got mine
3 weeks ago.  I haven't formally measured it, but some quick and dirty tests
indicate that the math stuff has apparently been sped up by roughly an order
of magnitude (well, somewhere between an octal and a decimal
OoM, anyway).  Might well make you happier.