[comp.sys.sun] Floating-Point Computation on Suns

dgh@sun.com (David Hough) (04/05/89)

I just went through a visit with a customer who complained bitterly about
the quality of our math library and then admitted that he hadn't read or
even opened the shrink wrap on the Floating-Point Programmer's Guide in
the SunOS 4.0 document crate.  This is too bad; he might have found some
useful information there.

Some recent sunspots postings suggest this customer is not unique.  If you
want to get the most out of a floating-point-intensive computation on a
Sun, it would be an excellent idea to read the Floating-Point Programmer's
Guide, part number 800-1552-10.  It was written for SunOS 3.1 but most of
the information is still relevant.  And don't forget to also read the
addendum for SunOS 4.0, contained in part number 800-1789-10, which
has the somewhat misleading title "Software READ THIS FIRST Programmer's
Guides Minibox".  That addendum was kluged together at the last minute
when it became clear that I wouldn't have time for a complete rewrite for
4.0.  It's not as comprehensive but the information density is high.

Once again I have promised a complete rewrite for 4.1.  In the hope that I
will fulfill the promise, I'd be glad to hear comments from anybody who's
looked over the existing documentation.

David Hough 					dgh@sun.com

prl@eiger.uucp (04/27/89)

In article <8903240124.AA03033@dgh.sun.com> you write:
>X-Sun-Spots-Digest: Volume 7, Issue 225, message 8 of 11
>
>I just went through a visit with a customer who complained bitterly about
>the quality of our math library and then admitted that he hadn't read or
>even opened the shrink wrap on the Floating-Point Programmer's Guide in
>the SunOS 4.0 document crate.  This is too bad; he might have found some
>useful information there....
>Some recent sunspots postings suggest this customer is not unique....

As one of the people who have sharply criticised Sun's C maths library
(most of the important functions in it run 3 to 10 times slower than
they should), I would like to respond to this.

I agree that it is a good idea to read the Floating-Point Programmer's
Guide, and I had read it before posting.

Neither the guide nor the READ THIS FIRST addendum in fact says anything
relevant to my criticism of the C maths library (in SunOS 4.0 and 4.0.1,
and probably in earlier releases as well).

The problem is that the C maths library doesn't use the functions
available as assembly-language instructions on either the 68881 or the
Sun3 FPA.

Functions such as sqrt(), sin() and cos() are simply generic C versions,
compiled with the appropriate floating-point compiler flag.

If sqrt() is replaced by a function that uses the 68881 fsqrtx
instruction, square root evaluation becomes 10 times faster.

I think this is very good reason to complain (even bitterly) about the
quality of the library.

The FP Programmer's Guide concentrates almost entirely on FORTRAN, and
reading it would not yield any useful information about this problem.

	"The /usr/lib/f*.il files' primary application is to accelerate
	calculations involving complex and doublecomplex data types in
	FORTRAN. ... intensive complex arithmetic may be twice as fast
	with inline expansion" p. 112

	"With cc, use of almost any of the functions defined in <math.h>
	invokes switched floating point [not true in 4.0 and later,
	corrected in 4.0 Software Read Me First - prl] using [inlining]
	causes these calls to switched floating point to be replaced by
	inline code or calls to appropriate unswitched routines." p. 112

The quotes are from the Sun Floating-Point Programmer's Guide, Part Number
800-1552-10, Revision A, of 19 September 1986. This rather old document
is what was supplied with our SunOS 4.0 documentation set.

The only other place where the comparative performance of the library
functions and inline code is mentioned is in some instructions about how
to hand-inline one of the functions in the FORTRAN Whetstone benchmark (p.
56).

I have sent a copy of my kit for creating a faster maths library to the
moderator; I have not yet checked the archive server's index to see
whether it arrived. The kit gives a 2- to 10-fold speed improvement for
functions when going from Sun's -lm to my -lmfast.  There is one minor
incompatibility (the same one you get with the inlining facility): the
SysV matherr() function is never invoked.

>Once again I have promised a complete rewrite for 4.1.  In the hope that I
>will fulfill the promise, I'd be glad to hear comments from anybody who's
>looked over the existing documentation.

Do you mean the documentation, or the library? Both could do with a good
overhaul.

Sony manages to run these functions 5-10 times faster on nearly the same
hardware, by having a decent implementation of the maths library and not
forcing the user to depend on obscure and poorly documented hacks.

My timings indicate that if libm is reasonably implemented, there is
*little* speedup to be gained from using inlined code over calling the
assembly instruction via a subroutine call!
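Each speedup column in the timings is just the ratio of two measured times, t_slow / t_fast. As a small check, the sketch below recomputes the three ratios for three rows copied from the table (the `struct timing` layout and the `speedup` helper are only for illustration). A last-column ratio close to 1.0 is what backs up the claim that inlining adds little once the library calls the hardware directly.

```c
#include <stdio.h>

/* Times in seconds for 50000 calls, copied from three rows of the table. */
struct timing {
    const char *name;
    double lm;      /* Sun's -lm */
    double lmfast;  /* the -lmfast kit */
    double inl;     /* inlined via /usr/lib/f68881/libm.il */
};

static const struct timing rows[] = {
    { "cos()",   3.45, 0.83, 0.82 },
    { "exp()",   3.15, 1.42, 1.25 },
    { "sqrt()", 12.37, 1.18, 1.18 },
};

/* How many times faster variant b is than variant a. */
static double speedup(double t_a, double t_b)
{
    return t_a / t_b;
}

static void print_speedups(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        const struct timing *r = &rows[i];
        printf("%-7s %6.2f %6.2f %6.2f\n", r->name,
               speedup(r->lm, r->lmfast),    /* lm/lmfast */
               speedup(r->lm, r->inl),       /* lm/inline */
               speedup(r->lmfast, r->inl));  /* lmfast/inline */
    }
}
```

Printed to two decimals, these ratios match the corresponding speedup entries in the table.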


                                                        VVVVVVVVVVVVV
           -lm  -lmfast    speedup   inline    speedup        speedup
         (sec)    (sec)  lm/lmfast    (sec)  lm/inline  lmfast/inline
cos()     3.45     0.83       4.16     0.82       4.21           1.01
sin()     3.23     0.75       4.31     0.73       4.42           1.03
tan()     4.60     1.63       2.82     1.57       2.93           1.04
acos()    3.55     2.00       1.77     1.97       1.80           1.02
asin()    3.95     2.05       1.93     1.90       2.08           1.08
atan()    2.97     1.23       2.41     1.23       2.41           1.00
log()     4.42     1.27       3.48     1.28       3.45           0.99
log10()   4.17     2.07       2.01     1.93       2.16           1.07
log2()    3.43     1.93       1.78     1.95       1.76           0.99
exp()     3.15     1.42       2.22     1.25       2.52           1.14
exp10()   4.90     1.75       2.80     1.67       2.93           1.05
exp2()    3.32     1.75       1.90     1.68       1.98           1.04
sqrt()   12.37     1.18      10.48     1.18      10.48           1.00
cosh()    2.75     1.95       1.41     1.82       1.51           1.07
sinh()    3.03     1.75       1.73     1.68       1.80           1.04
tanh()    3.33     2.00       1.67     1.93       1.73           1.04
atanh()   2.32     2.12       1.09     2.10       1.10           1.01

All times are for 50000 calls, with loop overhead subtracted but
subroutine call overhead (naturally) not subtracted. Note that the
10-fold improvement is not idle talk: sqrt() really is that bad in -lm!
The routines in -lmfast could be optimised further; they were created
using the Sun inline library. The timings are for SunOS 4.0 on a Sun3/260:
	cc -O -f68881 ... {-lm, -lmfast, /usr/lib/f68881/libm.il}



Peter Lamb				uucp:  seismo!mcvax!ethz!prl
Tel: (01) 256 5241 (Switzerland)	eunet: prl@iis.ethz.ch
     +411 256 5241 (International)

Integrated Systems Laboratory
ETH-Zentrum
8092 Zurich