[comp.arch] Transcendental functions and microcoded instructions

jimv@radix (Jim Valerio) (09/27/87)

I am probably one of the few people reading this newsgroup who have
microcoded transcendental floating-point instructions.  I wrote significant
portions of the transcendental microcode for the 80387.  I have also written
significant portions of math libraries that implement transcendental
functions.

In this article, I speak as an implementor of the functions, rather than
as a user of the functions.  As a user of the functions, I couldn't care
less whether they are in software, microcode, or hardware, just as long
as they are provided and don't surprise me with their results.


In article <705@gumby.UUCP> earl@mips.UUCP (Earl Killian) writes:
>The 68881 transcendentals are not implemented in hardware; they are
>implemented in microcode.  I believe the extra 0.5-1.5ulp of accuracy
>of the 68881 is due to the use of extended precision calculations, not
>to either hardware or algorithm (simple rational approximations are
>very accurate too when evaluated in extended precision).

The advantages I found in microcoding transcendental instructions, rather
than implementing the functions in software, were twofold.

One advantage was that the mantissa computations were done using the unrounded
intermediate results, which had approximately 3 extra bits of significance.
The extra bits in a few critical places allowed simpler approximation functions
to be used that, when rounded, delivered accurate (i.e. correctly rounded)
results.  Without the extra bits, a software implementation would need to
effectively compute double-precision operations.
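
To make this concrete, here is a minimal sketch in C (an illustration of the
software trick, not the 80387's microcode) of how an implementation without
extra bits recovers them: Knuth's two-sum computes the rounding error of an
addition exactly, effectively simulating a wider intermediate result.

#include <stdio.h>

/* Knuth's two-sum: s + e == a + b exactly, where s = round(a + b).
 * The term e is precisely what the rounded addition threw away. */
static void two_sum(double a, double b, double *s, double *e)
{
    double sum = a + b;
    double bv  = sum - a;         /* the part of b that made it into sum */
    double av  = sum - bv;        /* the part of a that made it into sum */
    *s = sum;
    *e = (a - av) + (b - bv);     /* the lost low-order bits */
}

int main(void)
{
    double s, e;
    two_sum(1.0, 1e-17, &s, &e);  /* 1e-17 vanishes in a plain add */
    printf("s = %.17g  e = %.17g\n", s, e);
    return 0;
}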

The other advantage I found with the microcoded implementation over a
software implementation was that it was more convenient to do non-standard
operations.  By this I mean that in a software implementation, I would be
obliged to do add and multiply types of operations (or whatever the instruction
set gave me).  The only alternative would be to break the numbers apart and do
integer operations on the mantissas and exponents.  In the microcoded
implementation, the hardware support for manipulating pieces of floating-point
numbers was easily accessible, and I was not encouraged to think of every
operation I was performing as an arithmetic operation on a floating-point
data type.
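
For contrast, here is roughly what "breaking the numbers apart" looks like in
portable C, where frexp and ldexp are the sanctioned way to get at the pieces
(the microcode, of course, reads the fields directly):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 1536.0;
    int    e;
    double m = frexp(x, &e);      /* x == m * 2^e, with 0.5 <= m < 1 */
    printf("x = %g = %g * 2^%d\n", x, m, e);

    /* Reassemble after adjusting the exponent, e.g. scale by 2^-4. */
    double y = ldexp(m, e - 4);
    printf("y = %g\n", y);        /* prints 96 */
    return 0;
}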

The bottom line is that, given the sort of hardware support I had available
(this means area constraints, folks), I was able to get significantly faster
and better approximations using microcode than I could get from software.


Earl goes on to say:
> implementing transcendentals in 68881 microcode did
>nothing to make them fast.  The cycle counts for sin, cos, tan, atan,
>log, exp, etc. average about 3.5 times longer for 68881 instructions than
>for MIPS R2000 libm subroutines.

The MIPS implementation is laudable, but there are many more issues than
speed involved here.  One is accuracy.  Often more important than accuracy
is monotonicity.  (If the mathematical function is monotonic over a region,
is the approximated function monotonic over the same region?)  Polynomial
approximation techniques often have monotonicity problems.  Other issues
include working on making simple transcendental identities hold in
floating-point computations (e.g. sin(-x) = -sin(x), exp(-x) = 1/exp(x)).
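
Those identities are easy to probe, though a probe proves nothing.  The spot
check below (just an illustration) shows the typical situation: sin(-x) =
-sin(x) holds in any implementation whose argument reduction is symmetric in
sign, while exp(-x) = 1/exp(x) generally fails because the division
introduces a rounding of its own.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 0.7853981633974483;  /* an arbitrary test point, ~pi/4 */
    /* Each line prints 0 only if the identity holds exactly at x. */
    printf("sin(-x) + sin(x)   = %g\n", sin(-x) + sin(x));
    printf("exp(-x) - 1/exp(x) = %g\n", exp(-x) - 1.0/exp(x));
    return 0;
}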

Now, I suspect that MIPS used the 4.3bsd libm.  This is a very good math
library, with man-years of work put into it by floating-point experts to
give it many of the important attributes of a good library.  It boasts of
high accuracy and no observed monotonicity errors.


Earl expresses some doubt that a microcoded implementation of the
transcendental functions is the right way to gain extra accuracy, and
suggests that providing hardware extended precision would be the better
approach.  Unfortunately, the 4.3bsd library would need to be largely rewritten
in non-trivial ways to take advantage of the extended precision.
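
The flavor of that rewrite, sketched in C with long double standing in for
hardware extended precision (the coefficients are just the Taylor series for
sin, for illustration; a real libm would use minimax coefficients and do
argument reduction first):

#include <math.h>
#include <stdio.h>

/* Evaluate the core polynomial in extended precision so that the one
 * final rounding to double absorbs the accumulated evaluation error.
 * Only valid for small |x|; argument reduction is omitted. */
static double sin_poly(double x)
{
    long double t  = x;
    long double t2 = t * t;
    long double r  =
        t * (1.0L - t2/6.0L * (1.0L - t2/20.0L * (1.0L - t2/42.0L)));
    return (double)r;             /* one rounding, at the very end */
}

int main(void)
{
    double x = 0.05;
    printf("poly %.17g\nlibm %.17g\n", sin_poly(x), sin(x));
    return 0;
}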

I believe that a company committed to excellent floating-point, given the
choice of implementing transcendental functions in software when no excellent
library is available, or implementing them with significant hardware support,
would be crazy to go the software route.  Of course, most companies choose to
forego the excellence, and hope that the users don't notice.


I would like to make a few miscellaneous comments on the transcendental
functions.  The intent here is to indirectly say something about the
difficulty and intricacy of implementing these functions.

I said that the 4.3bsd libm has no observed monotonicity errors.  That means
that test programs running a few million points haven't found one.  That
doesn't mean that the error doesn't exist.  A few million points is a very
small subset of all double precision floating-point numbers.  Most
floating-point libraries haven't been tested even this well.
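
For scale, a "few million points" test amounts to something like the loop
below, which walks adjacent doubles with nextafter() and looks for a decrease
where sin should be increasing.  Five million consecutive points cover about
5e-10 of the real line here, a vanishing fraction of all 2^64 bit patterns.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x    = 0.5;            /* sin is increasing on this whole range */
    double prev = sin(x);
    long   i, errors = 0;

    for (i = 0; i < 5000000; i++) {
        x = nextafter(x, 2.0);    /* step to the next representable double */
        double cur = sin(x);
        if (cur < prev)           /* a monotonicity violation */
            errors++;
        prev = cur;
    }
    printf("%ld violations in 5 million adjacent points\n", errors);
    return 0;
}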

As a double-precision library, the 4.3bsd libm can do double duty as a single
precision library, but one must check very carefully that the rounding of the
double-precision result to single precision doesn't introduce monotonicity
errors.  The same concern does not apply to the 80387 and 68881, since the
programmer has presumably set the rounding controls first.

The double precision libm would make a slow single precision library.  This
observation also holds for the 68881 and 80387: these chips compute an
extended-precision result, and consequently are not as fast as an optimized
double precision implementation might be.  (Honestly, though, a
double-precision CORDIC implementation wouldn't make that much difference.)

The 68881 uses a CORDIC algorithm, based on some work done by Steve Walther
at HP.  Unfortunately, the 68881 doesn't carry around enough internal
precision to guarantee high accuracy or monotonicity.  My understanding is
that their results are accurate to about 56 bits (note double precision is
53 bits), and that monotonicity errors are rare but not unheard of.  I do not
recall if any monotonicity errors have been observed in single or double
precision.

The 80387 uses a different CORDIC algorithm (a modification of that used
in the 8087).  This algorithm requires less precision than the one
used in the 68881, and is accurate to 62 or 63 bits (depending on the
function).  In addition, this CORDIC algorithm has been proved monotonic.
However, the microcoded instruction set that uses these CORDIC primitives has
not been proved monotonic, so it is not clear what the proof buys you.
The last I heard, there have been no observed monotonicity errors in the
single, double, or extended precisions.
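
For readers who have never seen one, here is the shape of a CORDIC rotation
in plain double-precision C.  This is the textbook algorithm, not the
68881's or 80387's microcode; the real implementations work on wide
fixed-point mantissas, where the multiply by 2^-i is just a shift.

#include <math.h>
#include <stdio.h>

#define ITERS 40

int main(void)
{
    double angle = 0.6;           /* radians; needs |angle| < ~1.74 */
    double atn[ITERS];
    double k = 1.0;               /* reciprocal of the CORDIC gain */
    double x, y, z;
    int i;

    for (i = 0; i < ITERS; i++) {
        double p = ldexp(1.0, -i);        /* 2^-i */
        atn[i] = atan(p);
        k /= sqrt(1.0 + p * p);
    }

    x = k; y = 0.0; z = angle;    /* start on the axis, pre-scaled by 1/gain */
    for (i = 0; i < ITERS; i++) {
        double d  = (z >= 0.0) ? 1.0 : -1.0;  /* rotate toward z */
        double xs = ldexp(x, -i);
        double ys = ldexp(y, -i);
        x -= d * ys;
        y += d * xs;
        z -= d * atn[i];
    }
    printf("cos: %.15f  (libm %.15f)\n", x, cos(angle));
    printf("sin: %.15f  (libm %.15f)\n", y, sin(angle));
    return 0;
}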
--
Jim Valerio	{verdix,intelca!mipos3,intel-iwarp.arpa}!omepd!radix!jimv

bct@its63b.ed.ac.uk (B Tompsett) (09/30/87)

In article <8@radix> jimv@radix.UUCP (Jim Valerio) writes:
>I am probably one of the few people reading this newsgroup who have
>microcoded transcendental floating-point instructions.

  The Computervision CDS 4000 has microcoded transcendental floating point
instructions.  I was responsible for the Fortran compiler for this machine,
and it certainly helped with accuracy and performance to have these functions
in microcode.  The microcoders had access to more machine facilities than I
would have had if I had to write them in a regular run-time library.  For
example, intermediate computations could be performed to more precision than
the normally provided floating point operations allow.
  Brian.
-- 
> Brian Tompsett. Department of Computer Science, University of Edinburgh,
> JCMB, The King's Buildings, Mayfield Road, EDINBURGH, EH9 3JZ, Scotland, U.K.
> Telephone:         +44 31 667 1081 x3332.
> JANET:  bct@uk.ac.ed.ecsvax  ARPA: bct%ecsvax.ed.ac.uk@cs.ucl.ac.uk
> USENET: bct@ecsvax.ed.ac.uk  UUCP: ...!mcvax!ukc!ecsvax.ed.ac.uk!bct
> BITNET: psuvax1!ecsvax.ed.ac.uk!bct or bct%ecsvax.ed.ac.uk@earn.rl.ac.uk

lamaster@pioneer.arpa (Hugh LaMaster) (09/30/87)

In article <8@radix> jimv@radix.UUCP (Jim Valerio) writes:

>The MIPS implementation is laudable, but there are many more issues than
>speed involved here.  One is accuracy.  Often more important than accuracy
>is monotonicity.
:
>I said that the 4.3bsd libm has no observed monotonicity errors.  That means
>that test programs running a few million points haven't found one.  That
>doesn't mean that the error doesn't exist.
:

Are people familiar with the Kahan et al. "paranoia" program (available from
netlib), and, if so, what do people think of:

1) The validity of the error test results that it provides (in other words,
does it complain about things that are not valid complaints), and

2) The completeness of the tests (how good it is at testing things which
should be tested, such as monotonicity of certain functions)?
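
For what it's worth, the general shape of such a test is easy to state even
if paranoia's internals are not (as I understand it, paranoia chiefly probes
the basic arithmetic): compare a narrower function against a wider reference
and report the error in ulps.  The sketch below checks sinf against sin; it
is only an illustration of the idea, not what paranoia itself does.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double worst = 0.0;
    int i;

    for (i = 1; i <= 1000000; i++) {
        float  x   = 3.0f * (float)i / 1000000.0f;      /* sweep (0, 3] */
        float  got = sinf(x);
        double ref = sin((double)x);                    /* wider reference */
        float  rf  = (float)ref;
        double ulp = (double)(nextafterf(rf, 4.0f) - rf);  /* ulp at rf */
        double err = fabs((double)got - ref) / ulp;
        if (err > worst) worst = err;
    }
    printf("worst observed error: %.3f ulps\n", worst);
    return 0;
}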

  Hugh LaMaster, m/s 233-9,  UUCP {topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov

(Disclaimer: "All opinions solely the author's responsibility")