[comp.arch] 80-bit notes

rik@june.cs.washington.edu (Rik Littlefield) (04/07/89)

In article <97749@sun.Eng.Sun.COM>, khb@fatcity.Sun.COM (Keith Bierman Sun Tactical Engineering) writes:

<  Tom Lahey's fortran for the pc did this [kept 80-bit vars on the 80x87
<  stack during loop execution]. Example:
<  
<  	sum = 0
<  	do i = 1, n
<  	   sum = x(i)*y(i)
<  	end do
<  
<  produces the same answer when sum is dp or sp (x,y, sp..chosen to make
<  this interesting).
<  ........
<  Tom's compiler worked it well enough that some customer's complained.
<  So there is a compiler option that forces gratuitous stores to memory
<  to force the conversion to 32-bits.
<  
<  Of course, this caused the results to be less accurate _AND_ slower.
<  there is no accounting for taste.
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^              

Perhaps these customers wanted to compare their results against other
compilers/machines.  Maybe I'm just pessimistic, but when a new compiler
produces answers that are much different from what I'm used to, "increased
accuracy" is just about the *last* possibility that comes to mind.

--Rik

khb%chiba@Sun.COM (chiba) (04/07/89)

In article <7806@june.cs.washington.edu> rik@june.cs.washington.edu (Rik Littlefield) writes:
>In article <97749@sun.Eng.Sun.COM>, khb@fatcity.Sun.COM (Keith Bierman Sun Tactical Engineering) writes:
>
><  Tom Lahey's fortran for the pc did this [kept 80-bit vars on the 80x87
><  stack during loop execution]. Example:
><  
><  	sum = 0
><  	do i = 1, n
><  	   sum = x(i)*y(i)
><  	end do

Bug. meant to type sum = sum+ ... i.e. dot product
>
>Perhaps these customers wanted to compare their results against other
>compilers/machines.  Maybe I'm just pessimistic, but when a new compiler
>produces answers that are much different from what I'm used to, "increased
>accuracy" is just about the *last* possibility that comes to mind.
>

Before the ieee standard, this is ALWAYS different. 64-bit math on vax
yields very different answers than IBM or Cray or (new) CDC's , etc.
32-bit math is even "worse". 

Stable algorithms always benefit from improved quality arithmetic. Unstable
algorithms are sensitive (adversely) to _improvement_ in the
arithmetic. The reason one should prefer the extended computation is
that a perfectly stable algorithm may be fed an ill-conditioned
problem. In that event, the extended math can be a godsend.

If you understand FP arithmetic, and your algorithm, the fact that
improved arithmetic quality changes your answer should be no surprise.
If the difference is really significant, it means your algorithm (or
its implementation) sucks. 

This really does happen. I remember porting a FE code once (not to
suns) and getting radically different answers depending on which
compiler I used on the vaxen, and got a totally different answer on
the ieee box I was porting to. Turned out that what they were
computing was a tiny residual, but scaled to be large, and then
printed (and used in suceeding computations). Needless to say, I was
not impressed. Fortunately, it is not a very popular package, so most
bridges and airplanes are probably OK.

Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus

david@june.cs.washington.edu (David Callahan) (04/07/89)

In article <7806@june.cs.washington.edu> rik@june.cs.washington.edu (Rik Littlefield) writes:
%In article <97749@sun.Eng.Sun.COM>, khb@fatcity.Sun.COM (Keith Bierman Sun Tactical Engineering) writes:

%<  Tom Lahey's fortran for the pc did this [kept 80-bit vars on the 80x87
%<  stack during loop execution]. Example:
  
%<  	sum = 0
%<  	do i = 1, n
%<  	   sum = x(i)*y(i)
%<  	end do
%<  
%<  produces the same answer when sum is dp or sp (x,y, sp..chosen to make
%<  this interesting).
%<  ........
%<  Tom's compiler worked it well enough that some customer's complained.
%<  So there is a compiler option that forces gratuitous stores to memory
%<  to force the conversion to 32-bits.
%<  
%<  Of course, this caused the results to be less accurate _AND_ slower.
%<  there is no accounting for taste.
%   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^              
%
%Perhaps these customers wanted to compare their results against other
%compilers/machines.  Maybe I'm just pessimistic, but when a new compiler
%produces answers that are much different from what I'm used to, "increased
%accuracy" is just about the *last* possibility that comes to mind.

Another reason involves portable numerical codes.
Consider a routine such as MCHEPS (?) from LINPACK
I think. It experimentally determines machine precision with a loop
which is roughly:
	COUNT = 1
	EPS = 1.0
	WHILE (EPS+1.0 > 1.0) 
		COUNT = COUNT + 1
		EPS = EPS / 2
After execution COUNT will be some number.

Since such a number will influence error estimates it would 
be nice if it reflected the precision of EPS in memory.

Strikes me that this represent a use of "volatile" and it
doesn't even invole exception handlers, device drivers or 
parallel processing.

David Callahan 	  Tera Computer Co. Seattle WA.

khb@fatcity.Sun.COM (fatcity) (04/08/89)

In article <7811@june.cs.washington.edu> david@uw-june.UUCP (David Callahan) writes:
>..... < 80-bit stuff > ...

>Another reason involves portable numerical codes.
>Consider a routine such as MCHEPS (?) from LINPACK
>I think. It experimentally determines machine precision with a loop
>which is roughly:
>	COUNT = 1
>	EPS = 1.0
>	WHILE (EPS+1.0 > 1.0) 
>		COUNT = COUNT + 1
>		EPS = EPS / 2
>After execution COUNT will be some number.
>
>Since such a number will influence error estimates it would 
>be nice if it reflected the precision of EPS in memory.

There are better routines than the old one in linpack. 

Here is one:

      SUBROUTINE MACHAR(IBETA,IT,IRND,NGRD,MACHEP,NEGEP,IEXP,MINEXP,
     1                   MAXEXP,EPS,EPSNEG,XMIN,XMAX)
C-----------------------------------------------------------------------
C  This Fortran 77 subroutine is intended to determine the parameters
C   of the floating-point arithmetic system specified below.  The
C   determination of the first three uses an extension of an algorithm
C   due to M. Malcolm, CACM 15 (1972), pp. 949-951, incorporating some,
C   but not all, of the improvements suggested by M. Gentleman and S.
C   Marovich, CACM 17 (1974), pp. 276-277.  An earlier version of this
C   program was published in the book Software Manual for the
C   Elementary Functions by W. J. Cody and W. Waite, Prentice-Hall,
C   Englewood Cliffs, NJ, 1980.  The present version is documented in 
C   W. J. Cody, "MACHAR: A subroutine to dynamically determine machine
C   parameters," TOMS 14, December, 1988.
C
C  The program as given here must be modified before compiling.  If
C   a single (double) precision version is desired, change all
C   occurrences of CS (CD) in columns 1 and 2 to blanks.
C
C  Parameter values reported are as follows:
C
C       IBETA   - the radix for the floating-point representation
C       IT      - the number of base IBETA digits in the floating-point
C                 significand
C       IRND    - 0 if floating-point addition chops
C                 1 if floating-point addition rounds, but not in the
C                   IEEE style
C                 2 if floating-point addition rounds in the IEEE style
C                 3 if floating-point addition chops, and there is
C                   partial underflow
C                 4 if floating-point addition rounds, but not in the
C                   IEEE style, and there is partial underflow
C                 5 if floating-point addition rounds in the IEEE style,
C                   and there is partial underflow
C       NGRD    - the number of guard digits for multiplication with
C                 truncating arithmetic.  It is
C                 0 if floating-point arithmetic rounds, or if it
C                   truncates and only  IT  base  IBETA digits
C                   participate in the post-normalization shift of the
C                   floating-point significand in multiplication;
C                 1 if floating-point arithmetic truncates and more
C                   than  IT  base  IBETA  digits participate in the
C                   post-normalization shift of the floating-point
C                   significand in multiplication.
C       MACHEP  - the largest negative integer such that
C                 1.0+FLOAT(IBETA)**MACHEP .NE. 1.0, except that
C                 MACHEP is bounded below by  -(IT+3)
C       NEGEPS  - the largest negative integer such that
C                 1.0-FLOAT(IBETA)**NEGEPS .NE. 1.0, except that
C                 NEGEPS is bounded below by  -(IT+3)
C       IEXP    - the number of bits (decimal places if IBETA = 10)
C                 reserved for the representation of the exponent
C                 (including the bias or sign) of a floating-point
C                 number
C       MINEXP  - the largest in magnitude negative integer such that
C                 FLOAT(IBETA)**MINEXP is positive and normalized
C       MAXEXP  - the smallest positive power of  BETA  that overflows
C       EPS     - the smallest positive floating-point number such
C                 that  1.0+EPS .NE. 1.0. In particular, if either
C                 IBETA = 2  or  IRND = 0, EPS = FLOAT(IBETA)**MACHEP.
C                 Otherwise,  EPS = (FLOAT(IBETA)**MACHEP)/2
C       EPSNEG  - A small positive floating-point number such that
C                 1.0-EPSNEG .NE. 1.0. In particular, if IBETA = 2
C                 or  IRND = 0, EPSNEG = FLOAT(IBETA)**NEGEPS.
C                 Otherwise,  EPSNEG = (IBETA**NEGEPS)/2.  Because
C                 NEGEPS is bounded below by -(IT+3), EPSNEG may not
C                 be the smallest number that can alter 1.0 by
C                 subtraction.
C       XMIN    - the smallest non-vanishing normalized floating-point
C                 power of the radix, i.e.,  XMIN = FLOAT(IBETA)**MINEXP
C       XMAX    - the largest finite floating-point number.  In
C                 particular  XMAX = (1.0-EPSNEG)*FLOAT(IBETA)**MAXEXP
C                 Note - on some machines  XMAX  will be only the
C                 second, or perhaps third, largest number, being
C                 too small by 1 or 2 units in the last digit of
C                 the significand.
C
C     Latest revision - December 4, 1987
C
C     Author - W. J. Cody
C              Argonne National Laboratory
C
C-----------------------------------------------------------------------
      INTEGER I,IBETA,IEXP,IRND,IT,ITEMP,IZ,J,K,MACHEP,MAXEXP,
     1        MINEXP,MX,NEGEP,NGRD,NXRES
#ifdef DP
      DOUBLE PRECISION
#else
      REAL
#endif

     1   A,B,BETA,BETAIN,BETAH,CONV,EPS,EPSNEG,ONE,T,TEMP,TEMPA,
     2   TEMP1,TWO,XMAX,XMIN,Y,Z,ZERO
C-----------------------------------------------------------------------
#ifdef DP
      CONV(I) = DBLE(I)
#else
      CONV(I) = REAL(I)
#endif

      ONE = CONV(1)
      TWO = ONE + ONE
      ZERO = ONE - ONE
C-----------------------------------------------------------------------
C  Determine IBETA, BETA ala Malcolm.
C-----------------------------------------------------------------------
      A = ONE
   10 A = A + A
         TEMP = A+ONE
         TEMP1 = TEMP-A
         IF (TEMP1-ONE .EQ. ZERO) GO TO 10
      B = ONE
   20 B = B + B
         TEMP = A+B
         ITEMP = INT(TEMP-A)
         IF (ITEMP .EQ. 0) GO TO 20
      IBETA = ITEMP
      BETA = CONV(IBETA)
C-----------------------------------------------------------------------
C  Determine IT, IRND.
C-----------------------------------------------------------------------
      IT = 0
      B = ONE
  100 IT = IT + 1
         B = B * BETA
         TEMP = B+ONE
         TEMP1 = TEMP-B
         IF (TEMP1-ONE .EQ. ZERO) GO TO 100
      IRND = 0
      BETAH = BETA / TWO
      TEMP = A+BETAH
      IF (TEMP-A .NE. ZERO) IRND = 1
      TEMPA = A + BETA
      TEMP = TEMPA+BETAH
      IF ((IRND .EQ. 0) .AND. (TEMP-TEMPA .NE. ZERO)) IRND = 2
C-----------------------------------------------------------------------
C  Determine NEGEP, EPSNEG.
C-----------------------------------------------------------------------
      NEGEP = IT + 3
      BETAIN = ONE / BETA
      A = ONE
      DO 200 I = 1, NEGEP
         A = A * BETAIN
  200 CONTINUE
      B = A
  210 TEMP = ONE-A
         IF (TEMP-ONE .NE. ZERO) GO TO 220
         A = A * BETA
         NEGEP = NEGEP - 1
      GO TO 210
  220 NEGEP = -NEGEP
      EPSNEG = A
C-----------------------------------------------------------------------
C  Determine MACHEP, EPS.
C-----------------------------------------------------------------------
      MACHEP = -IT - 3
      A = B
  300 TEMP = ONE+A
         IF (TEMP-ONE .NE. ZERO) GO TO 320
         A = A * BETA
         MACHEP = MACHEP + 1
      GO TO 300
  320 EPS = A
C-----------------------------------------------------------------------
C  Determine NGRD.
C-----------------------------------------------------------------------
      NGRD = 0
      TEMP = ONE+EPS
      IF ((IRND .EQ. 0) .AND. (TEMP*ONE-ONE .NE. ZERO)) NGRD = 1
C-----------------------------------------------------------------------
C  Determine IEXP, MINEXP, XMIN.
C
C  Loop to determine largest I and K = 2**I such that
C         (1/BETA) ** (2**(I))
C  does not underflow.
C  Exit from loop is signaled by an underflow.
C-----------------------------------------------------------------------
      I = 0
      K = 1
      Z = BETAIN
      T = ONE + EPS
      NXRES = 0
  400 Y = Z
         Z = Y * Y
C-----------------------------------------------------------------------
C  Check for underflow here.
C-----------------------------------------------------------------------
         A = Z * ONE
         TEMP = Z * T
         IF ((A+A .EQ. ZERO) .OR. (ABS(Z) .GE. Y)) GO TO 410
         TEMP1 = TEMP * BETAIN
         IF (TEMP1*BETA .EQ. Z) GO TO 410
         I = I + 1
         K = K + K
      GO TO 400
  410 IF (IBETA .EQ. 10) GO TO 420
      IEXP = I + 1
      MX = K + K
      GO TO 450
C-----------------------------------------------------------------------
C  This segment is for decimal machines only.
C-----------------------------------------------------------------------
  420 IEXP = 2
      IZ = IBETA
  430 IF (K .LT. IZ) GO TO 440
         IZ = IZ * IBETA
         IEXP = IEXP + 1
      GO TO 430
  440 MX = IZ + IZ - 1
C-----------------------------------------------------------------------
C  Loop to determine MINEXP, XMIN.
C  Exit from loop is signaled by an underflow.
C-----------------------------------------------------------------------
  450 XMIN = Y
         Y = Y * BETAIN
C-----------------------------------------------------------------------
C  Check for underflow here.
C-----------------------------------------------------------------------
         A = Y * ONE
         TEMP = Y * T
         IF (((A+A) .EQ. ZERO) .OR. (ABS(Y) .GE. XMIN)) GO TO 460
         K = K + 1
         TEMP1 = TEMP * BETAIN
         IF ((TEMP1*BETA .NE. Y) .OR. (TEMP .EQ. Y)) THEN
               GO TO 450
            ELSE
               NXRES = 3
               XMIN = Y
         END IF
  460 MINEXP = -K
C-----------------------------------------------------------------------
C  Determine MAXEXP, XMAX.
C-----------------------------------------------------------------------
      IF ((MX .GT. K+K-3) .OR. (IBETA .EQ. 10)) GO TO 500
      MX = MX + MX
      IEXP = IEXP + 1
  500 MAXEXP = MX + MINEXP
C-----------------------------------------------------------------
C  Adjust IRND to reflect partial underflow.
C-----------------------------------------------------------------
      IRND = IRND + NXRES
C-----------------------------------------------------------------
C  Adjust for IEEE-style machines.
C-----------------------------------------------------------------
      IF (IRND .GE. 2) MAXEXP = MAXEXP - 2
C-----------------------------------------------------------------
C  Adjust for machines with implicit leading bit in binary
C  significand, and machines with radix point at extreme
C  right of significand.
C-----------------------------------------------------------------
      I = MAXEXP + MINEXP
      IF ((IBETA .EQ. 2) .AND. (I .EQ. 0)) MAXEXP = MAXEXP - 1
      IF (I .GT. 20) MAXEXP = MAXEXP - 1
      IF (A .NE. Y) MAXEXP = MAXEXP - 2
      XMAX = ONE - EPSNEG
      IF (XMAX*ONE .NE. XMAX) XMAX = ONE - BETA * EPSNEG
      XMAX = XMAX / (BETA * BETA * BETA * XMIN)
      I = MAXEXP + MINEXP + 3
      IF (I .LE. 0) GO TO 520
      DO 510 J = 1, I
          IF (IBETA .EQ. 2) XMAX = XMAX + XMAX
          IF (IBETA .NE. 2) XMAX = XMAX * BETA
  510 CONTINUE
  520 RETURN
C---------- LAST CARD OF MACHAR ----------
      END




Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus