mash@mips.COM (John Mashey) (04/04/89)
In article <7829@pyr.gatech.EDU> mccalpin@loligo.cc.fsu.edu (John McCalpin) writes:
...
>On the bright side, both the Intel and Motorola chips make it easy
>to implement a robust IEEE-compliant floating-point system, since just
>about everything is done by the hardware.  If the compiler is smart about
>keeping intermediate results on the co-processor's internal stack, then
>the extra accuracy can be helpful....

Just out of curiosity, can people out there in netland say:
1) Which software systems do this [keep intermediate results this way]?
2) How well it works?
-- 
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
khb@fatcity.Sun.COM (Keith Bierman Sun Tactical Engineering) (04/06/89)
In article <16543@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>In article <7829@pyr.gatech.EDU> mccalpin@loligo.cc.fsu.edu (John McCalpin) writes:
>...
>>On the bright side, both the Intel and Motorola chips make it easy
>>to implement a robust IEEE-compliant floating-point system, since just
>>about everything is done by the hardware.  If the compiler is smart about
>>keeping intermediate results on the co-processor's internal stack, then
>>the extra accuracy can be helpful....
>
>Just out of curiosity, can people out there in netland say:
>1) Which software systems do this [keep intermediate results this way]?

?? Most register allocators _should_ be doing this ... for the obvious
optimization reasons.  On the VAX FPA, IBM, and UNIVAC machines this used
to be the case.  Tom Lahey's Fortran for the PC did this.  Example:

        sum = 0
        do i = 1, n
           sum = sum + x(i)*y(i)
        end do

produces the same answer whether sum is declared double precision or single
precision (x and y are single precision, chosen to make this interesting).

>2) How well it works?

Tom's compiler did it well enough that some customers complained.  So there
is a compiler option that forces gratuitous stores to memory, which force
the conversion to 32 bits.  Of course, this made the results both less
accurate _AND_ slower.  There is no accounting for taste.

cheers
Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus
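[As an illustration of the behaviour Keith describes, here is a minimal C
sketch -- not from the original post -- that accumulates the same
single-precision dot product two ways: once in an extended-precision
variable, and once with every partial sum forced back to 32 bits, the way a
"gratuitous store" option would.  The array sizes and contents are made up
for the example, and the difference only shows up on hardware whose long
double is genuinely wider than float, e.g. an x87-style 80-bit format.]

#include <stdio.h>

/* Illustration: the same single-precision dot product accumulated
 * (a) in an extended-precision register and (b) with every partial sum
 * stored back to a 32-bit variable.  On hardware with a wide long double
 * the two results can differ in the low-order digits.
 */
int main(void)
{
    enum { N = 100000 };
    static float x[N], y[N];
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f / (float)(i + 1);
        y[i] = (float)(i + 1) / (float)(i + 3);
    }

    long double extended_sum = 0.0L;   /* kept in the wide format throughout */
    volatile float narrow_sum = 0.0f;  /* volatile forces a 32-bit store each trip */
    for (int i = 0; i < N; i++) {
        extended_sum += (long double)x[i] * y[i];
        narrow_sum   = narrow_sum + x[i] * y[i];
    }

    printf("extended accumulation: %.9Lg\n", extended_sum);
    printf("32-bit accumulation:   %.9g\n", (double)narrow_sum);
    return 0;
}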
jimv@radix (Jim Valerio) (04/08/89)
In article <16543@winchester.mips.COM> John Mashey asks about systems that
keep intermediate floating-point results in an extended format.  I feel
compelled to respond, seeing that I'm an advocate of systems supporting
extended precision.  I'll first give an example of how extended precision
helps, and then try to answer John's question of how well keeping
intermediate results in extended precision works.

Think of extended precision as a data format that provides slightly better
error control without the hardware or execution-time costs of going to the
next larger format with twice the precision.  Extended precision gives you
two advantages over the extended number's base format: a wider exponent
range and a wider significand.  When computing the intermediate results of
an expression: (1) the wider exponent makes overflow and underflow unlikely,
and (2) the wider significand masks rounding errors.

So what?  Here's an example.  Let's compute cabs(x+yi), or hypot(x,y).
With an extended exponent range available and intermediate results all
stored in extended precision, hypot(x,y) can be computed without fear of
overflow or underflow with the simple formula:

        sqrt(x*x + y*y)

Here's what the 4.3bsd libm suggests as an algorithm to get around the
overflow and underflow problem when extended precision is not available:

        x = |x|
        y = |y|
        swap x,y if y>x
        if (x==0) return(0);
        if (y==0) return(x);
        exp = logb(x);
        x = scalb(x,-exp);
        y = scalb(y,-exp);
        return scalb(sqrt(x*x+y*y),exp);

However, the last line suffers from an undesirably large accumulated
roundoff error.  So here's how the 4.3bsd libm actually implements
hypot(x,y):

1. Replace x by |x| and y by |y|, and swap x and y if y > x
   (hence x is never smaller than y).
2. Hypot(x,y) is computed by:

    Case I, x/y > 2

                                  y
        hypot = x + -----------------------------
                    sqrt( 1 + (x/y)^2 )  +  x/y

    Case II, x/y <= 2

                                              y
        hypot = x + -----------------------------------------------------
                                                     (x/y)^2 - 2
                    (sqrt(2)+1) + (x-y)/y + -----------------------------
                                            sqrt( 1 + (x/y)^2 ) + sqrt(2)

Pretty frightful cost just for a little accuracy, isn't it?  My experience
is that this problem crops up in most math library functions.

Now back to John's question:

>2) How well [does keeping intermediate results in extended precision] work?

The LINPACK benchmark reports on the error of the arithmetic operations.
Here are the results when run on the 80960KB, a machine that implements
operations on extended-precision operands:

                             norm. resid    resid       machep
Double precision (extended)     1.50      6.66e-14    2.22e-16
Double precision (double)       1.67      7.46e-14    2.22e-16

The only difference between the extended and the double entries is that in
the extended case the intermediate result of A*X(I) is kept in extended
precision, while the double case keeps the result in double precision.

Some months ago, John McCalpin (mccalpin@loligo.fsu.edu) reported in
<350@loligo.fsu.edu> some results from the 1000x1000 LINPACK benchmark that
might confirm the same observation using results from the Sun 3/280.  I say
"might" because I am just presuming the results were run with a 68881, and
they may or may not have been compiled using an extended-precision
evaluation model.
>machine              precision    RMS error in solution (*)
>---------------------------------------------------------------------
>Cyber 205 / ETA-10    32-bit      2.22521256e-01
>IBM 3081              32-bit      2.37465184e-03
>IEEE standard         32-bit      2.82104476e-04
>---------------------------------------------------------------------
>Cyber 205 / ETA-10    64-bit      1.32111221e-08
>Cray X/MP             64-bit      2.47078473e-11
>IEEE standard         64-bit      2.27274978e-13
>---------------------------------------------------------------------
>NOTES:
> (*) The solution vector consists of 1000 identical elements = 1.0
> (1) The IEEE standard was run on a Sun 3/280, which had passed
>     the PARANOIA benchmark test.

In general, I don't know of any comprehensive data on how well extended
precision hides anomalies in expression evaluation as found in practice in
typical floating-point applications.  A well-implemented math library
doesn't attract attention; a library attracts attention either because it
is too slow or because it is too inaccurate.  Likewise, I believe that the
benefit of extended precision in expression evaluation is bound to be
noticed only by the lack of observed problems.
--
Jim Valerio    jimv%radix@omepd.intel.com, {verdix,tektronix,omepd}!radix!jimv
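[As an illustration of the "simple formula" Jim describes, here is a minimal
C sketch -- not from the original post -- assuming a long double format that
is genuinely wider than double (as on the 80x87 or 80960 extended format;
the C language does not guarantee this).  With the wider exponent range,
x*x + y*y cannot spuriously overflow or underflow for finite double
arguments, so no scaling is needed.]

#include <math.h>

/* Sketch of hypot() evaluated with extended-precision intermediates.
 * Assumes long double has a wider exponent range than double, so the
 * squares and their sum stay in range for any finite double x and y.
 */
double hypot_extended(double x, double y)
{
    long double xx = x, yy = y;            /* promote before squaring */
    return (double)sqrtl(xx * xx + yy * yy);
}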
mccalpin@loligo.cc.fsu.edu (John McCalpin) (04/09/89)
In article <81@radix> jimv@radix.UUCP (Jim Valerio) writes:
>The LINPACK benchmark reports on the error of the arithmetic operations.
>Here are the results when run on the 80960KB
>                             norm. resid    resid       machep
>Double precision (extended)     1.50      6.66e-14    2.22e-16
>Double precision (double)       1.67      7.46e-14    2.22e-16

I assume that "extended precision" refers to the 80-bit format commonly
used in the 80x87 family - is this correct?

>Some months ago, John McCalpin (mccalpin@loligo.fsu.edu) reported in
><350@loligo.fsu.edu> some results from the 1000x1000 LINPACK benchmark
>that might confirm the same observation using results from the Sun 3/280.
>I say "might" because I am just presuming the results were run with a 68881,
>and they may or may not have been compiled using an extended-precision
>evaluation model.    Jim Valerio    jimv%radix@omepd.intel.com

The original test was run with the Weitek FPA, since that is what people
are going to use for number-crunching.  I did verify that the Sun passed
the Paranoia test suite with all three floating-point options (Weitek,
68881, and software).  I actually forgot about the extended-precision
stack of the 68881 when I ran the tests.  I am re-running the 32-bit case
now to see how much the extended precision changes the RMS error of the
solution.
--
---------------------- John D. McCalpin ------------------------
Dept of Oceanography & Supercomputer Computations Research Institute
mccalpin@masig1.ocean.fsu.edu    mccalpin@nu.cs.fsu.edu
--------------------------------------------------------------------
dik@cwi.nl (Dik T. Winter) (04/09/89)
In article <81@radix> jimv@radix.UUCP (Jim Valerio) writes:
> I feel compelled to respond, seeing that I'm an advocate of systems
> supporting extended precision.  I'll first give an example of how extended
> precision helps, and then try to answer John's question of how well keeping
> intermediate results in extended precision works.

I feel compelled to respond as well.  I too like extended precision, but I
think it should be an option for the programmer.

> Here's what the 4.3bsd libm suggests as an algorithm to get around the
> overflow and underflow problem when extended precision is not available.
>
>        x = |x|
>        y = |y|
>        swap x,y if y>x
>        if (x==0) return(0);
>        if (y==0) return(x);
>        exp = logb(x);
>        x = scalb(x,-exp);
>        y = scalb(y,-exp);
>        return scalb(sqrt(x*x+y*y),exp);
>
> However, the last line suffers from an undesirably large accumulated roundoff

And this is incorrect.  The last line does suffer some roundoff, but
certainly not a large amount; less than two bits of precision are lost with
a good implementation of sqrt and reliable floating-point multiplication and
addition (all the other operations are exact).

> error.  So here's how the 4.3bsd libm actually implements hypot(x,y):

This is not entirely true.  The implementation above is the one used for
the VAX; the implementation that follows is for the IEEE machines.  That
other implementation has at most one bit loss of precision.
>
So indeed, extended precision and extended exponent range do help, but not
always.  Sometimes you want to have exact control over the precision in
which expressions are evaluated, so it must be left as an option.  Or, more
correctly, there ought to be an option to turn off calculations in extended
precision.  On the other hand, doing operations in extended precision (like
K&R C's evaluation of float expressions in double) is only a valid option
if it carries no great penalty in performance.
--
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax
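[As an illustration of the algorithm Dik defends above, here is a minimal
runnable C rendering of the quoted scalb/logb method -- a sketch using the
C99 names logb/scalbn, not the actual 4.3bsd source.  It shows why the
accumulated roundoff is small: the exponent extraction and the power-of-two
scaling introduce no error of their own.]

#include <math.h>

/* Scaled hypot, after the algorithm quoted above.  Extracting the exponent
 * with logb() and rescaling with scalbn() are exact (except when the much
 * smaller operand is scaled into underflow, in which case it is too small
 * to affect the result anyway), so the only rounding comes from the two
 * multiplies, the addition, and sqrt().
 */
double hypot_scaled(double x, double y)
{
    double t;
    int e;

    x = fabs(x);
    y = fabs(y);
    if (y > x) { t = x; x = y; y = t; }   /* ensure x >= y >= 0 */
    if (x == 0.0) return 0.0;
    if (y == 0.0) return x;

    e = (int)logb(x);            /* exponent of the larger operand */
    x = scalbn(x, -e);           /* x is now in [1, 2) */
    y = scalbn(y, -e);           /* y <= x, scaled by the same power of two */
    return scalbn(sqrt(x * x + y * y), e);
}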