[comp.arch] Query about miserable M68882 performance (really: 80-bit notes)

mash@mips.COM (John Mashey) (04/04/89)

In article <7829@pyr.gatech.EDU> mccalpin@loligo.cc.fsu.edu (John McCalpin) writes:
...
>On the bright side, both the Intel and Motorola chips make it easy
>to implement a robust IEEE-compliant floating-point system, since just
>about everything is done by the hardware.  If the compiler is smart about
>keeping intermediate results on the co-processor's internal stack, then
>the extra accuracy can be helpful....

Just out of curiosity, can people out there in netland say:
1) Which software systems do this [keep intermediate results this way]?
2) How well does it work?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

khb@fatcity.Sun.COM (Keith Bierman Sun Tactical Engineering) (04/06/89)

In article <16543@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>In article <7829@pyr.gatech.EDU> mccalpin@loligo.cc.fsu.edu (John McCalpin) writes:
>...
>>On the bright side, both the Intel and Motorola chips make it easy
>>to implement a robust IEEE-compliant floating-point system, since just
>>about everything is done by the hardware.  If the compiler is smart about
>>keeping intermediate results on the co-processor's internal stack, then
>>the extra accuracy can be helpful....
>
>Just out of curiosity, can people out there in netland say:
>1) Which software systems do this [keep intermediate results this way]?

?? Most register allocators _should_ be doing this ... for the obvious
optimization reasons.  On the VAX FPA, IBM, and UNIVAC machines this used
to be the case.

Tom Lahey's Fortran for the PC did this.  Example:

	sum = 0
	do i = 1, n
	   sum = sum + x(i)*y(i)
	end do

produces the same answer whether sum is declared double precision or
single precision (x and y are single precision, chosen to make this
interesting).

>2) How well does it work?

Tom's compiler did this well enough that some customers complained.
So there is a compiler option that inserts gratuitous stores to memory
to force the conversion to 32 bits.

Of course, this makes the results both less accurate _AND_ slower.
There is no accounting for taste.
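
In C terms, here is a sketch (mine, not Lahey's) of what that option
amounts to on an x87-style FPU, where the register naturally holds 80
bits; the volatile store is hypothetical shorthand for the compiler flag:

	/* Without the store, "sum" lives in an 80-bit register for the
	   whole loop and is rounded to 32 bits only once, at the end.
	   Declaring it volatile forces a store to memory, and hence a
	   rounding to 32 bits, on every iteration. */
	float dot(int n, const float *x, const float *y)
	{
	    volatile float sum = 0.0f;	/* the "gratuitous store" */
	    int i;
	    for (i = 0; i < n; i++)
	        sum = sum + x[i]*y[i];
	    return sum;
	}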

cheers




Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus

jimv@radix (Jim Valerio) (04/08/89)

In article <16543@winchester.mips.COM> John Mashey asks about systems
that keep intermediate floating-point results in an extended format.
I feel compelled to respond, seeing that I'm an advocate of systems
supporting extended precision.  I'll first give an example of how extended
precision helps, and then try to answer John's question of how well keeping
intermediate results in extended precision works.

Think of extended precision as a data format that provides slightly better
error control without the hardware or execution time costs of going to the
next larger format with twice the precision.

Extended precision gives you two advantages over the extended number's base
format: a wider exponent range and a wider significand.  When computing the
intermediate results of an expression:
  (1) the wider exponent makes overflow and underflow unlikely, and
  (2) the wider significand masks rounding errors.

So what?

Here's an example.  Let's compute cabs(x+yi), or hypot(x,y).

With an extended exponent range available and intermediate results all stored
in extended precision, hypot(x,y) can be computed without fear of overflow or
underflow with the simple formula:

		sqrt(x*x + y*y)
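
As a minimal sketch in C (mine; assuming an implementation where long
double maps onto the 80-bit extended format, as on the x87):

	#include <math.h>

	/* The 15-bit extended exponent has room for x*x even when x is
	   near the top (or bottom) of double's range, so no pre-scaling
	   is needed; only the final conversion back to double rounds. */
	double hypot_ext(double x, double y)
	{
	    long double xx = (long double)x * x;
	    long double yy = (long double)y * y;
	    return (double)sqrtl(xx + yy);
	}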

Here's what the 4.3bsd libm suggests as an algorithm to get around the
overflow and underflow problem when extended precision is not available.

		x = |x|
		y = |y|
		swap x,y if y>x
		if (x==0) return(0);
		if (y==0) return(x);
		exp = logb(x);
		x = scalb(x,-exp);
		y = scalb(y,-exp);
		return scalb(sqrt(x*x+y*y),exp);
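
In C, that suggestion transcribes roughly as follows (my sketch, using
the standard logb/scalbn, which scale by powers of two exactly):

	#include <math.h>

	double hypot_scaled(double x, double y)
	{
	    double t;
	    int e;
	    x = fabs(x);
	    y = fabs(y);
	    if (y > x) { t = x; x = y; y = t; }	/* ensure x >= y */
	    if (x == 0.0) return 0.0;
	    if (y == 0.0) return x;
	    e = (int)logb(x);			/* exponent of larger operand */
	    x = scalbn(x, -e);			/* exact scaling by 2^-e */
	    y = scalbn(y, -e);
	    return scalbn(sqrt(x*x + y*y), e);
	}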

However, the last line suffers from an undesirably large accumulated roundoff
error.  So here's how the 4.3bsd libm actually implements hypot(x,y):

	1. replace x by |x| and y by |y|, and swap x and
	   y if y > x (hence x is never smaller than y).
	2. Hypot(x,y) is computed by:

   Case I, x/y > 2:

	hypot = x + y / ( sqrt(1 + (x/y)^2) + x/y )

   Case II, x/y <= 2:

	hypot = x + y / ( (sqrt(2)+1) + (x-y)/y
	                  + ((x/y)^2 - 2) / ( sqrt(1 + (x/y)^2) + sqrt(2) ) )

Pretty frightful cost just for a little accuracy, isn't it?
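
A sketch of those two cases in C (mine; the real 4.3bsd code adds the
exponent scaling shown earlier plus further correction steps, so this
is only the shape of the computation):

	#include <math.h>

	double hypot_cases(double x, double y)
	{
	    double t, r;
	    x = fabs(x);
	    y = fabs(y);
	    if (y > x) { t = x; x = y; y = t; }	/* ensure x >= y */
	    if (y == 0.0) return x;
	    r = x / y;				/* r >= 1 */
	    if (r > 2.0)			/* Case I */
	        return x + y / (sqrt(1.0 + r*r) + r);
	    return x + y / ((sqrt(2.0) + 1.0) + (x - y)/y	/* Case II */
	                    + (r*r - 2.0)/(sqrt(1.0 + r*r) + sqrt(2.0)));
	}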

My experience is that this problem crops up in most math library functions.


Now back to John's question:
>2) How well [does keeping intermediate results in extended precision] work?

The LINPACK benchmark reports on the error of the arithmetic operations.
Here are the results when run on the 80960KB, a machine that implements
operations on extended precision operands.

				norm. resid	resid		machep
Double precision (extended)	1.50		6.66e-14	2.22e-16
Double precision (double)	1.67		7.46e-14	2.22e-16

The only difference between the extended and the double entries is that
in the extended entry, the intermediate result of A*X(I) is in extended
precision while the double case keeps the result in double precision.
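
Concretely, the kernel in question is LINPACK's daxpy loop, Y = Y + a*X;
here is my sketch of the two variants (assuming long double is the
extended format):

	/* "Extended": the product a*x[i] and the sum are carried in
	   extended precision and rounded to double only on the store. */
	void daxpy_ext(int n, double a, const double *x, double *y)
	{
	    int i;
	    for (i = 0; i < n; i++)
	        y[i] = (double)((long double)a * x[i] + y[i]);
	}

	/* "Double": a*x[i] is rounded to double before the add. */
	void daxpy_dbl(int n, double a, const double *x, double *y)
	{
	    int i;
	    for (i = 0; i < n; i++)
	        y[i] = a * x[i] + y[i];
	}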

Some months ago, John McCalpin (mccalpin@loligo.fsu.edu) reported in
<350@loligo.fsu.edu> some results from the 1000x1000 LINPACK benchmark
that might confirm the same observation using results from the Sun 3/280.
I say might because I am just presuming the results were run with a 68881,
and may or may not have been compiled using an extended-precision evaluation
model.

>machine		precision	    RMS error in solution (*)
>---------------------------------------------------------------------
>Cyber 205 / ETA-10	32-bit			2.22521256e-01
>IBM 3081		32-bit			2.37465184e-03
>IEEE standard		32-bit			2.82104476e-04
>---------------------------------------------------------------------
>Cyber 205 / ETA-10	64-bit			1.32111221e-08
>Cray X/MP		64-bit			2.47078473e-11
>IEEE standard		64-bit			2.27274978e-13
>---------------------------------------------------------------------
>NOTES:
>   (*) The solution vector consists of 1000 identical elements = 1.0
>   (1) The IEEE standard was run on a Sun 3/280, which had passed
>	the PARANOIA benchmark test.


In general, I don't know of any comprehensive data on how well extended
precision hides anomalies in the expression evaluation found in practice
in typical floating-point applications.  Just as a well-implemented math
library doesn't attract attention either for being too slow or for being
too inaccurate, I believe that the benefit of extended precision in
expression evaluation is bound to be noticed only as a lack of observed
problems.
-- 
Jim Valerio	jimv%radix@omepd.intel.com, {verdix,tektronix,omepd}!radix!jimv

mccalpin@loligo.cc.fsu.edu (John McCalpin) (04/09/89)

In article <81@radix> jimv@radix.UUCP (Jim Valerio) writes:
>The LINPACK benchmark reports on the error of the arithmetic operations.
>Here are the results when run on the 80960KB
>				norm. resid	resid		machep
>Double precision (extended)	1.50		6.66e-14	2.22e-16
>Double precision (double)	1.67		7.46e-14	2.22e-16

	I assume that "extended precision" refers to the 80-bit format
	commonly used in the 80x87 family - is this correct?

>Some months ago, John McCalpin (mccalpin@loligo.fsu.edu) reported in
><350@loligo.fsu.edu> some results from the 1000x1000 LINPACK benchmark
>that might confirm the same observation using results from the Sun 3/280.
>I say might because I am just presuming the results were run with an 68881,
>and may or may not have been compiled using an extended-precision evaluation
>model.  Jim Valerio jimv%radix@omepd.intel.com

	The original test was run with the Weitek FPA, since that is
	what people are going to use for number-crunching.  I did
	verify that the Sun passed the Paranoia test suite with all
	three floating-point options (Weitek, 68881, and software).

	I actually forgot about the extended-precision stack of the 68881
	when I ran the tests. I am re-running the 32-bit case now to see
	how much the extended precision changes the RMS error of the
	solution.
-- 
----------------------   John D. McCalpin   ------------------------
Dept of Oceanography & Supercomputer Computations Research Institute
mccalpin@masig1.ocean.fsu.edu		mccalpin@nu.cs.fsu.edu
--------------------------------------------------------------------

dik@cwi.nl (Dik T. Winter) (04/09/89)

In article <81@radix> jimv@radix.UUCP (Jim Valerio) writes:
 > I feel compelled to respond, seeing that I'm an advocate of systems
 > supporting extended precision.  I'll first give an example how extended
 > precision helps, and then try to answer John's question of how well keeping
 > intermediate results in extended precision works.
I feel compelled to respond as well; I too like extended precision, but I
think it should be an option for the programmer.
 > 
 > Here's what the 4.3bsd libm suggests as an algorithm to get around the
 > overflow and underflow problem when extended precision is not available.
 > 
 > 		x = |x|
 > 		y = |y|
 > 		swap x,y if y>x
 > 		if (x==0) return(0);
 > 		if (y==0) return(x);
 > 		exp = logb(x);
 > 		x = scalb(x,-exp);
 > 		y = scalb(y,-exp);
 > 		return scalb(sqrt(x*x+y*y),exp);
 > 
 > However, the last line suffers from an undesirably large accumulated roundoff

And this is incorrect.  The last line does suffer some roundoff, but
certainly not a large amount: less than two bits of precision are lost
with a good implementation of sqrt and reliable floating-point
multiplication and addition (all the other operations are exact).

 > error.  So here's how the 4.3bsd libm actually implements hypot(x,y):

This is not entirely true.  The implementation above is the one used for
the VAX; the implementation that follows it is the one for IEEE machines.
That second implementation loses at most one bit of precision.

So indeed, extended precision and extended exponent range do help, but
not always.  Sometimes you want exact control over the precision in which
expressions are evaluated, so it must be left as an option.  Or, more
correctly, there ought to be an option to turn off calculations in
extended precision.  On the other hand, doing operations in extended
precision is only a valid option if it carries no great performance
penalty (as with K&R C, which evaluates all float arithmetic in double).
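
As a sketch of the kind of source-level control I mean (my example; note
that treating an assignment as a rounding point, as below, is precisely
what many extended-precision compilers do not guarantee):

	double sum_of_products(double a, double b, double c, double d)
	{
	    double t = a * b;	/* request: round to double here */
	    return t + c * d;	/* rather than one long extended expression */
	}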
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax