prl@eiger.uucp (03/01/89)
If you use libm (especially transcendental functions such as sin(), cos(),
log(), exp() etc.) you can get nearly a 10* speedup by using the clumsy code
inlining facility provided by Sun's C compiler.
For example (on a Sun 3/60, SunOS 4.0.1), the following code:
#include <math.h>
main()
{
    register int i;
    register double x, y;
    for(i = 0, x = 0; i < 100000; i++, x += 2*M_PI/100000.0)
        y = cos(x);
}
Compiled with:
cc -O -f68881 -o cos cos.c -lm
Runs in:
real 0m30.16s
user 0m24.56s
sys 0m0.58s
Compiled with (but how incredibly *UGLY*):
cc -O -f68881 -o cos cos.c /usr/lib/f68881/libm.il
Runs in:
real 0m4.33s
user 0m3.65s
sys 0m0.20s
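The two sets of timings can be turned into a per-iteration cost and a speedup figure with a little arithmetic. A minimal C sketch, assuming the 100000-iteration loop of the test program; `usec_per_call`, `ratio` and `CALLS` are hypothetical names, not part of any Sun library:

```c
#include <assert.h>

/* Iterations of the cos() loop in the test program above. */
#define CALLS 100000L

/* Hypothetical helper: microseconds per loop iteration, given the total
 * seconds for `calls` iterations. The figure includes the loop's own
 * overhead (the x += ... increment) as well as the cos() call. */
double usec_per_call(double secs, long calls)
{
    assert(calls > 0);
    return secs * 1e6 / (double)calls;
}

/* Hypothetical helper: how many times faster `fast` is than `slow`. */
double ratio(double slow, double fast)
{
    assert(fast > 0.0);  /* a zero time would mean a failed measurement */
    return slow / fast;
}
```

With the user times above, usec_per_call(24.56, CALLS) is about 245.6 microseconds per iteration with -lm and usec_per_call(3.65, CALLS) about 36.5 with libm.il, so ratio(24.56, 3.65) is about 6.7 for this particular program (about 7.0 on real time).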
REASON:
Although Sun went to the trouble of making the assembly inline
file /usr/lib/f68881/libm.il, and a 68881 version of the
maths library, they *DID NOT* make assembly versions of
the maths functions to put into the maths library!
(and similarly for the FPA)
THIS IS STUPID!
NOTA BENE, Sun:
We discovered this while benchmarking Sony workstations. Trivial
loops with sin(), cos() etc. calls run 10* faster on the Sony
(20MHz, 68020 version) than on the Sun 3/60, without the need for
the inlining muck.
If Sony can do it right first time, why can't Sun get it right
on their Nth release after introducing the 68881?
Peter Lamb uucp: seismo!mcvax!ethz!prl
Tel: (01) 256 5241 (Switzerland) eunet: prl@iis.ethz.ch
+411 256 5241 (International)
Integrated Systems Laboratory
ETH-Zentrum
8092 Zurich
prl@eiger.uucp (03/02/89)
I have constructed a replacement for libm.a in which most of the C library
routines have had their code replaced by code out of the inline replacement
library. I will be making this code available in sun-spots or in
comp.sources.sun (as appropriate), in a form which requires no distribution
of Sun source or binaries, as soon as it has been tested locally.

You can expect sqrt() to be more than 10 times faster in this library than
in the standard -lm!

The speedups are as follows (Sun 3/280, SunOS 4.0.1):

Motorola 68881

             -lm -lmfast         libm.il
Func        secs    secs     rel    secs     rel     rel
                             -lm             -lm -lmfast
cos()       9.43    1.85    5.10    1.67    5.65    1.11
sin()       8.87    1.77    5.01    1.65    5.38    1.07
tan()      12.98    1.97    6.59    1.83    7.09    1.08
acos()      5.13    2.37    2.16    2.27    2.26    1.04
asin()      6.40    2.37    2.70    2.20    2.91    1.08
atan()      9.22    1.82    5.07    1.70    5.42    1.07
log()      10.25    2.25    4.56    2.10    4.88    1.07
log10()     6.47    2.35    2.75    2.22    2.91    1.06
log2()      5.33    2.33    2.29    1.48    3.60    1.57
exp()       8.42    1.95    4.32    1.80    4.68    1.08
exp10()     9.37    2.12    4.42    2.02    4.64    1.05
exp2()      6.50    2.10    3.10    1.97    3.30    1.07
sqrt()     14.32    1.20   11.93    1.03   13.90    1.17
cosh()      4.82    2.23    2.16    2.10    2.30    1.06
sinh()      5.32    2.15    2.47    1.98    2.69    1.09
tanh()      5.57    2.33    2.39    2.23    2.50    1.04
atanh()     3.95    2.52    1.57    1.63    2.42    1.55

Weitek FPA

             -lm -lmfast         libm.il
Func        secs    secs     rel    secs     rel     rel
                             -lm             -lm -lmfast
cos()       3.45    0.83    4.16    0.82    4.21    1.01
sin()       3.23    0.75    4.31    0.73    4.42    1.03
tan()       4.60    1.63    2.82    1.57    2.93    1.04
acos()      3.55    2.00    1.77    1.97    1.80    1.02
asin()      3.95    2.05    1.93    1.90    2.08    1.08
atan()      2.97    1.23    2.41    1.23    2.41    1.00
log()       4.42    1.27    3.48    1.28    3.45    0.99
log10()     4.17    2.07    2.01    1.93    2.16    1.07
log2()      3.43    1.93    1.78    1.95    1.76    0.99
exp()       3.15    1.42    2.22    1.25    2.52    1.14
exp10()     4.90    1.75    2.80    1.67    2.93    1.05
exp2()      3.32    1.75    1.90    1.68    1.98    1.04
sqrt()     12.37    1.18   10.48    1.18   10.48    1.00
cosh()      2.75    1.95    1.41    1.82    1.51    1.07
sinh()      3.03    1.75    1.73    1.68    1.80    1.04
tanh()      3.33    2.00    1.67    1.93    1.73    1.04
atanh()     2.32    2.12    1.09    2.10    1.10    1.01

NOTES:
1) -lmfast is my modified libm.a; libm.il is with the use of the
   Sun-supplied inline code file.
2) Columns entitled `rel' are the speedup relative to the named column.
3) The times for each routine are for 50000 calls, with parameter values
   in the range from slightly more than 0.0 to slightly more than 1.0,
   spaced linearly by 1.0/50000.0.
4) Loop overhead has been subtracted, but not subroutine call overhead.

Peter Lamb
uucp: uunet!mcvax!ethz!prl    eunet: prl@ethz.uucp
Tel: +411 256 5241
Integrated Systems Laboratory
ETH-Zentrum, 8092 Zurich
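Note 2 defines each `rel` entry as a ratio of two `secs` entries in the same row: the time in the named column divided by the time for the library being rated. A minimal C sketch of that arithmetic, assuming the column layout of the tables; the struct and function names are hypothetical:

```c
#include <assert.h>

/* One row of the tables: seconds for 50000 calls with each library. */
struct timing_row {
    const char *func;
    double lm;      /* standard -lm                        */
    double lmfast;  /* the modified libm.a (-lmfast)       */
    double il;      /* Sun's libm.il inline code file      */
};

/* The three `rel` columns of a row. */
struct rel_cols {
    double lmfast_vs_lm;   /* -lmfast  rel -lm      */
    double il_vs_lm;       /* libm.il  rel -lm      */
    double il_vs_lmfast;   /* libm.il  rel -lmfast  */
};

/* Each `rel` value is the named column's time divided by the time of the
 * library being rated, so values above 1.0 mean a speedup. */
struct rel_cols compute_rel(const struct timing_row *r)
{
    struct rel_cols c;
    assert(r->lmfast > 0.0 && r->il > 0.0);
    c.lmfast_vs_lm = r->lm / r->lmfast;
    c.il_vs_lm     = r->lm / r->il;
    c.il_vs_lmfast = r->lmfast / r->il;
    return c;
}
```

Fed the 68881 cos() row (9.43, 1.85, 1.67), compute_rel() gives about 5.097, 5.647 and 1.108, which round to the 5.10, 5.65 and 1.11 in the table. A value below 1.0, such as the 0.99 for log() under the FPA, means the inline version was marginally slower than -lmfast.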
self@bayes.arc.nasa.gov (Matthew Self) (04/19/89)
John Schultz compiled the following timings for Sun's math libraries using
GCC and CC with various options:

> My results, running on Sun 3/60, Sun OS 3.5, GNU CC 1.32 built using
> default switches were
>
> gcc -lm                  ===   4.6 real   4.2 user 0.0 sys
> gcc -m68881 -lm          ===   4.4 real   4.2 user 0.0 sys
> gcc -O -m68881 -lm       ===   4.4 real   4.1 user 0.0 sys
> gcc -O -g -m68881 -lm    ===   5.4 real   4.2 user 0.1 sys
> cc -lm                   === 159.4 real 146.7 user 0.6 sys
> cc -O -lm                === 155.4 real 146.2 user 0.4 sys
> cc -f68881 -lm           ===   6.6 real   4.6 user 0.1 sys
> cc /usr/lib/f68881.il    ===   9.9 real   6.7 user 0.1 sys
> cc -O /usr/lib/f68881.il ===   6.5 real   6.4 user 0.0 sys
>
> #include <math.h>
>
> main()
> {
>     register int i;
>     register double x, y;
>     for(i = 0, x = 0; i < 100000; i++, x += 2*M_PI/100000.0)
>         y = cos(x);
> }

I have written an inline math library for GCC which is more than twice as
fast as any of these options for this test program. In fact, it permits GCC
to determine that the program does nothing at all, so it optimizes it away
entirely! I modified the test program slightly to make the return value
depend on the computations in the loop so this won't happen. Even with the
extra addition I introduced, the program now executes in only 2.5s, more
than twice as fast as before. Here is the new test program:

#include <math.h>   /* my inline ANSI math library */

#ifndef M_PI        /* this isn't defined in ANSI C's math.h */
#define M_PI 3.14159265358979323846
#endif

main()
{
    int i;          /* GCC doesn't need register declarations */
    double x, y = 0;

    for(i = 0, x = 0; i < 100000; i++, x += 2*M_PI/100000.0)
        y += cos(x);
    if (y == 0)
        return 0;
    else
        return 1;
}

Availability of this inline math library will be announced soon on the
info-gcc mailing list. Mail to info-gcc-request@prep.ai.mit.edu to subscribe.

Matthew Self
NASA Ames Research Center
self@bayes.arc.nasa.gov
self@bayes.arc.nasa.gov (Matthew Self) (04/21/89)
This inline math library was recently posted to the info-gcc mailing list
(gnu.gcc newsgroup). If you can't obtain a copy there, let me know and I
will send you a copy.

Matthew Self
NASA Ames Research Center
self@bayes.arc.nasa.gov
dav@hplabs.hp.com (David L. Markowitz) (05/06/89)
self@bayes.arc.nasa.gov (Matthew Self) writes:
> John Schultz compiled the following timings for Sun's math libraries using
> GCC and CC with various options:
>
> > My results, running on Sun 3/60, Sun OS 3.5, GNU CC 1.32 built using
> > default switches were
> >
> > [gcc timings deleted]
> > cc -lm                   === 159.4 real 146.7 user 0.6 sys
> > cc -O -lm                === 155.4 real 146.2 user 0.4 sys
> > cc -f68881 -lm           ===   6.6 real   4.6 user 0.1 sys

Is this ^^^ a typo? Maybe 6.4?

> > cc /usr/lib/f68881.il    ===   9.9 real   6.7 user 0.1 sys
> > cc -O /usr/lib/f68881.il ===   6.5 real   6.4 user 0.0 sys
[program deleted]
> [discussion about GCC inline library reducing user time to 2.5s deleted]

I would like to point out an error in the timing tests done above. The
inline expansions only help math library function calls, not built-in
floating-point operations (like * and /). The -f68881 option does the
opposite: it helps built-in math, but not math library function calls.
Both are needed here. The optimal compiler command is therefore

    cc -O -f68881 /usr/lib/f68881.il cos.c

which on my Sun 3/60 under SunOS 3.4 yields 3.7s of user time. While not
as good as GCC with inlines, that is still a lot better than anything in
the above matrix.

Will this GCC inline stuff make it into a future GCC distribution?

-- 
David L. Markowitz
Rockwell International
...!sun!sunkist!arcturus!dav    dav@arcturus.UUCP
The above opinions are merely that, and only mine.