[comp.arch] RS6000 Multiply/Accumulate instruction

dik@cwi.nl (Dik T. Winter) (03/21/90)

About:
    Errors in solution of the 1000x1000 system of equations
machine                 precision               RMS error
---------------------------------------------------------
IBM RS/6000              64-bit                 1.2e-12  <-- new ibm
IEEE (Sun-3)             64-bit                 2.3e-13  <-- ieee
 
I asked how this was generated; John McCalpin answers:
 > The RMS error that I have been posting is based on the fact that the
 > matrix is chosen in such a way that the solution consists of 1000
 > identical elements equal to 1.0d0.  It does this by generating a 
 > pseudo-random matrix and then re-scaling by the row sums (I think this
 > is how it was done).
(I assume this is from a uniform distribution on [0.0,1.0], but that does
not matter very much.  Also I assume the right-hand side is random.)
 >                       There are therefore two possible sources of error:
 > --> The first is that the matrix may not have been scaled correctly because
 > of roundoff error in the calculation of the row sums.
Right, but if we use backward error analysis that does not matter.
 > --> The second is that the solution contains propagated roundoff error from
 > the LU-decomposition and back-substitution steps.
Agreed.
 > 
 > I have consistently ignored the first source of error, and defined the
 > RMS error to be:
(The definition follows in the original.)

The problem I see is that to compare arithmetic properties of processors
you must use *identical* input data on which to perform the operations.
When using random numbers from the system-supplied random number generator
you cannot be sure of this (and in general it is not true).  In the case
of solving a set of linear equations, that can change the results a lot.
I just did some tests with a 100*100 system on our Alliant FX/4 (near-IEEE
arithmetic).  I used LINPACK's DGEFA (for decomposition) and DGESL (for
solving).  Even when I used the same system, but only reordered the
right-hand side, the errors ranged from 4.3e-14 to 1.2e-13.  The next
system I tried (also random) gave errors ranging from 8.3e-14 to 1.4e-13.
In all cases these errors are in the range to be expected.  (The numbers
are better than those given above because the errors scale with the order
of the matrix, and this system is 100x100 rather than 1000x1000.)
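
For concreteness, a minimal sketch of such a test (this is not the exact
code I ran; dgefa and dgesl are the standard LINPACK routines, and here
the right-hand side is built from a known solution so that the error can
be measured directly):

C   Minimal sketch: factor, solve, and measure the RMS error
C   against a known solution xtrue.  b is formed as a*xtrue
C   before dgefa overwrites a with its LU factors.
      subroutine errtst(a, lda, n, xtrue, b, ipvt, rms)
      integer lda, n, ipvt(n), i, j, info
      double precision a(lda,n), xtrue(n), b(n), rms, e
      do 20 i = 1, n
         b(i) = 0.0d0
         do 10 j = 1, n
            b(i) = b(i) + a(i,j)*xtrue(j)
   10    continue
   20 continue
      call dgefa(a, lda, n, ipvt, info)
      call dgesl(a, lda, n, ipvt, b, 0)
      rms = 0.0d0
      do 30 i = 1, n
         e = b(i) - xtrue(i)
         rms = rms + e*e
   30 continue
      rms = dsqrt(rms/dble(n))
      end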

So my conclusion still is that the difference in errors between the IBM and
Sun/IEEE implementations is within the noise.  It may be that there is
indeed some flaw in the IBM implementation, but that requires more
convincing figures.
-- 
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

mccalpin@vax1.acs.udel.EDU (John D Mccalpin) (03/21/90)

I wrote about the RMS error in the solution of the LINPACK 1000x1000
system of equations on the new IBM RS/6000 vs various other machines.

The RMS error that I have been posting is based on the fact that the
matrix is chosen in such a way that the solution consists of 1000
identical elements equal to 1.0d0.  It does this by generating a 
pseudo-random matrix and then re-scaling by the row sums (I think this
is how it was done).

In article <8907@boring.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
> The problem I see is that to compare arithmetic properties of processors
> you must use *identical* input data on which to perform the operations.
> When using random numbers from the system-supplied random number generator
> you cannot be sure of this (and in general it is not true). 

The pseudo-random numbers are generated by the LINPACK test code and
are identical on all machines.  The two errors that I was referring to
are (1) the error in calculating the sum of each row, and (2) the
error in the LU-decomposition and back-substitution.

The code is:

      subroutine matgen(a,lda,n,b,norma)
      double precision a(lda,1),b(1),norma
      init = 1325
      norma = 0.0
      do 30 j = 1,n
         do 20 i = 1,n
            init = mod(3125*init,65536)		! pseudo-random number
            a(i,j) = (init - 32768.0)/16384.0	! generator
            norma = dmax1(a(i,j), norma)	! infinity norm of matrix
   20    continue
   30 continue
      do 35 i = 1,n
          b(i) = 0.0
   35 continue
      do 50 j = 1,n
         do 40 i = 1,n
            b(i) = b(i) + a(i,j)		! RHS=row sums of Matrix
   40    continue
   50 continue
      end

I have used the subroutine DGECO from LINPACK to estimate the condition
number of this matrix and it is about 200,000, as I recall.
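
For reference, a sketch of how such an estimate is obtained (assuming the
LINPACK library is linked; matgen is the generator quoted above, here with
n = 100 to keep the run short):

C   Sketch: estimate the condition number of the benchmark matrix.
C   dgeco overwrites a with its LU factors and returns rcond, the
C   reciprocal of the estimated condition number.
      integer ipvt(100)
      double precision a(100,100), b(100), z(100), norma, rcond
      call matgen(a, 100, 100, b, norma)
      call dgeco(a, 100, 100, ipvt, rcond, z)
      write(*,*) ' estimated condition number = ', 1.0d0/rcond
      stop
      end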

> So my conclusion still is that the difference in errors between the IBM and
> Sun/IEEE implementations is within the noise.  It may be that there is
> indeed some flaw in the IBM implementation, but that requires more
> convincing figures.
> -- 
> dik t. winter, cwi, amsterdam, nederland
> dik@cwi.nl

I have been very careful not to call the IBM implementation flawed.
I have merely been pointing out that on identical calculations, the
answers differ from an IEEE calculation.  My experience has been that
even the direction of the error (i.e. better or worse answers) is
data-dependent.  

I am still trying to get the machine to run with the multiply/add
function separated, but the new O/S that they put on the FSU machine
seems to be broken....
-- 
John D. McCalpin                               mccalpin@vax1.acs.udel.edu
Assistant Professor                            mccalpin@delocn.udel.edu
College of Marine Studies, U. Del.             mccalpin@scri1.scri.fsu.edu

dik@cwi.nl (Dik T. Winter) (03/22/90)

In article <5900@udccvax1.acs.udel.EDU> mccalpin@vax1.acs.udel.EDU (John D Mccalpin) writes:
 > I wrote about the RMS error in the solution of the LINPACK 1000x1000
 > system of equations on the new IBM RS/6000 vs various other machines.
 > 
 > The RMS error that I have been posting is based on the fact that the
 > matrix is chosen in such a way that the solution consists of 1000
 > identical elements equal to 1.0d0.  It does this by generating a 
 > pseudo-random matrix and then re-scaling by the row sums (I think this
 > is how it was done).
 > 
And he shows that I was wrong when I assumed the input data might be
different.  Thanks for the info.

 > I have used the subroutine DGECO from LINPACK to estimate the condition
 > number of this matrix and it is about 200,000, as I recall.
 > 
This is interesting; it is not extremely high, but not very low either.
(Singular values on the order of 1.0e-5 are very reasonable.)  On the
other hand, such a condition number indicates a certain sensitivity to
the way the results are calculated.  So even if more precision is used
for some of the calculations, the end result may be less favourable.
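
As a rough check: with a condition number near 2.0e5 and a 64-bit IEEE
rounding unit of about 2.2e-16, the usual rule of thumb (forward error of
the order of the condition number times the unit roundoff) allows errors
up to roughly 2.0e5 * 2.2e-16 = 4.4e-11, so both the 2.3e-13 and the
1.2e-12 results quoted above sit comfortably inside that bound.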

 >                                           My experience has been that
 > even the direction of the error (i.e. better or worse answers) is
 > data-dependent.  
Very true.
 > 
 > I am still trying to get the machine to run with the multiply/add
 > function separated, but the new O/S that they put on the FSU machine
 > seems to be broken....
I look forward to the results.

A concluding remark.  In my opinion the use of linear system solvers to
assess the accuracy of arithmetic on different machines is at the very
least debatable.  All such comparisons depend on a forward analysis (I
have got this result; it should be so-and-so; this is the error).  For
linear systems this is not very reliable.  To get a historical
perspective: in the forties and early fifties it was thought that direct
methods of solving linear systems (i.e. LU decomposition) could not give
good results, because forward error analysis fails on this problem.  It
was J. H. Wilkinson who showed (in 1956 or thereabouts) that such methods
could be used favorably, by means of backward error analysis (i.e. my
result is the exact solution of a system that is very close to the
original problem).  So any comparison of the true solution and the
numerical solution should be viewed with some restraint.
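
For illustration, such a backward-error check is cheap to make after the
fact: the normwise measure below is of the order of the unit roundoff
exactly when the computed solution is the exact solution of a slightly
perturbed system.  A minimal sketch using infinity norms (this is not
part of the benchmark code):

C   Normwise backward error of an approximate solution xhat of
C   a*x = b, using infinity norms: the residual norm divided by
C   norm(a)*norm(xhat) + norm(b).
      double precision function backerr(a, lda, n, xhat, b)
      integer lda, n, i, j
      double precision a(lda,n), xhat(n), b(n)
      double precision rnorm, anorm, xnorm, bnorm, r, s
      rnorm = 0.0d0
      anorm = 0.0d0
      xnorm = 0.0d0
      bnorm = 0.0d0
      do 20 i = 1, n
         r = b(i)
         s = 0.0d0
         do 10 j = 1, n
            r = r - a(i,j)*xhat(j)
            s = s + dabs(a(i,j))
   10    continue
         rnorm = dmax1(rnorm, dabs(r))
         anorm = dmax1(anorm, s)
         xnorm = dmax1(xnorm, dabs(xhat(i)))
         bnorm = dmax1(bnorm, dabs(b(i)))
   20 continue
      backerr = rnorm/(anorm*xnorm + bnorm)
      end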
-- 
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

troy@troyhix.austin.ibm.com (03/23/90)

>>
>>Here is the result of the "non-IEEE-ness" of the rounding of the
>>
>>---------------------------------------------------------
>>   Errors in solution of the 1000x1000 system of equations
>>           from the LINPACK benchmark suite
>>machine                 precision               RMS error
>>---------------------------------------------------------
>>ETA-10                   32-bit                 2.2e-01
>>IBM 3081                 32-bit                 2.4e-03
>>VAX 8700                 32-bit                 3.9e-04
>>IBM RS/6000              32-bit                 2.8e-04  <-- new ibm
>>IEEE (Sun-3)             32-bit                 2.8e-04  <-- ieee
>>---------------------------------------------------------
>>ETA-10                   64-bit                 1.3e-08
>>Cray X/MP                64-bit                 2.5e-11
>>IBM 3090                 64-bit                 5.8e-12
>>IBM RS/6000              64-bit                 1.2e-12  <-- new ibm
>>IEEE (Sun-3)             64-bit                 2.3e-13  <-- ieee
>>VAX "D"-format           64-bit                 7.2e-14
>>---------------------------------------------------------
>>ETA-10                  128-bit                 1.6e-22
>>Cray X/MP               128-bit                 4.2e-26
>>---------------------------------------------------------
>>
>>(1) The IBM has no extra error in the 32-bit case, but
>>    loses a factor of 5 (about 2 bits) in the 64-bit case.
>>    This is still a factor of almost 5 better than the
>>    results from the IBM mainframe hex-exponent perversity.
>>(2) The Cray and ETA-10 have larger exponent fields for
>>    64-bit numbers than the other formats.  This accounts
>>    for several bits of the difference.  The trouble with
>>    the ETA-10 format is discussed in my article in
>>    Supercomputer, issue 24.
>>
>>--
>>John D. McCalpin - mccalpin@vax1.acs.udel.edu
>>                   mccalpin@delocn.udel.edu
>>                   mccalpin@scri1.scri.fsu.edu

    Needless to say, this has caused quite a stir here. :-(
I brought this to the attention of our resident floating-point math
guru, Dr. Peter Markstein of IBM Yorktown research, who was one of the
original 801 (first RISC) team members.



   Lately, reports of RMS errors from LINPACK have been circulating
in comp.arch.  The reports show that the IBM RiscSystem/6000 gets a
larger RMS error than the Sun-3 using standard double precision IEEE
arithmetic.  How can this be?
   Computing residuals is a very difficult task.  To do it well, it must
be done in a greater precision than the precision used to solve the
equations.  The reason for this is that in computing residuals, there
is a lot of cancellation of significant bits.  After all, if we were
computing with infinite precision, the addition of the very last
term of the inner product used to compute the residual would produce
a zero (cancelling all the digits).  We can say very little about the
residuals that the machines report using their own
maximum precision arithmetic.
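
   The need for the extra precision is easy to demonstrate one precision
level down.  The sketch below (not one of the two programs that follow)
solves the trivial equations i*x = 1 in 32-bit arithmetic and forms the
residual once in 32-bit and once in 64-bit arithmetic; the 32-bit
residual frequently rounds to zero, while the 64-bit residual is exact
here, because the product i*(1/i) fits in a 64-bit fraction.

C   Residuals of a 32-bit computation, formed in 32-bit and in
C   64-bit arithmetic.
      real*4 x, r4
      real*8 xd, r8
      integer i
      do 10 i = 1, 1000
         x = 1.0/real(i)
C   residual formed in the working (32-bit) precision:
         r4 = 1.0 - real(i)*x
C   the same residual formed in twice the working precision:
         xd = dble(x)
         r8 = 1.0d0 - dble(i)*xd
         if (r4 .eq. 0.0 .and. r8 .ne. 0.0d0) write(*,20) i
   10 continue
      stop
   20 format(' the 32-bit residual misses the error for i =', i5)
      end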
   However, we observe that a machine with standard IEEE arithmetic can
very easily understate the size of the residual.  Attached below are
two programs:  the first solves a trivial set of equations and then
computes the RMS of the residuals.  Indeed, the IBM '6000 gets a larger RMS
than does a standard IEEE machine using the same precision.  But the IBM '6000
computes each term of the residual vector EXACTLY, so there is no doubt
as to which computation is more trustworthy.  The second program works
explicitly with the components of the residuals of the first.  It
computes the reciprocals of all the integers between 1 and 1000.  It
then reports the reciprocal to be exact if i * (1/i) - 1.0 is exactly 0.0.
Now we KNOW that i * (1/i) computes to EXACTLY 1.0 using floating
point arithmetic only when i is a power of two.  Yet, on a standard IEEE
machine, the second program produces far more than 10 lines of output!
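
(To see why, take i = 3: the stored reciprocal fl(1/3) is slightly below
1/3, so the true product 3 * fl(1/3) falls just below 1.0, by about half
a unit in the last place.  A standard IEEE multiply must round that
product to working precision, and it rounds to exactly 1.0, so the
computed residual is zero.  The '6000 subtracts from the exact
double-length product before any rounding takes place, and so reports
the true nonzero residual.)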
   To get back to the accuracy of residuals: one method of solving equations
in the days when it was done by desk calculator was first to solve
    AX = B
getting an approximate solution X'.  The residual was computed as
   R = B - AX', with the arithmetic done in double the usual precision
and the residual R then rounded to standard precision.  A correction term
was then computed by solving AY = R, yielding an approximate solution Y'.
Then X' + Y' was an improved solution, usually giving full precision accuracy.
This method IS NOT used in computers because, using full precision floating
point, the residual R has zero significant bits.  Using it to refine the
solution does no good at all.  Only if R can be computed by machine the
way it was on paper, with extra precision, is iterative refinement useful.
On most computers, this is simply too slow.  On the '6000, however,
the residual CAN be computed with reasonable effort, and iterative
refinement may make a comeback as a useful computational technique.
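
In outline, one step of the method with the LINPACK routines looks as
follows.  This is only a sketch: the residual loop is exactly where the
extra precision (or the '6000's exact multiply-add) is needed, and as
written below each term incurs an ordinary rounding.

C   One step of iterative refinement.  lu holds the dgefa
C   factorization of a, and x holds the approximate solution
C   of a*x = b.  r is a work vector.
      subroutine refine(a, lu, lda, n, ipvt, b, x, r)
      integer lda, n, ipvt(n), i, j
      double precision a(lda,n), lu(lda,n), b(n), x(n), r(n)
C   form the residual r = b - a*x (the critical computation):
      do 20 i = 1, n
         r(i) = b(i)
   20 continue
      do 40 j = 1, n
         do 30 i = 1, n
            r(i) = r(i) - a(i,j)*x(j)
   30    continue
   40 continue
C   solve a*y = r using the existing factors; dgesl returns y in r:
      call dgesl(lu, lda, n, ipvt, r, 0)
C   update the solution:
      do 50 i = 1, n
         x(i) = x(i) + r(i)
   50 continue
      end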
    Student
    School of Hard Knocks

C   Program 1 - Solving a trivial set of linear equations
      implicit real*8(a-h,o-z)
      dimension a(1000), x(1000), b(1000)
C   This program solves the set of 1000 equations AX = B
C   for the very simple case in which A is the diagonal
C   matrix with A(i,i) = i, and B(i) = 1.  Of course, we
C   know that the result is X(i) = 1/i, as best that 1/i
C   can be represented.  Since we all are IEEE machines,
C   we will all get the same results for X.
C      Then we compute the residuals and sum the squares.
C   We know that most of the X(i) are not exact, because
C   only powers of two have reciprocals that can be repre-
C   sented exactly in binary.  So the sum of the squares
C   of the residuals should be non-zero.  The IBM
C   RiscSystem/6000 computes all the residuals exactly.
C   However, a standard IEEE machine will report zero
C   for many of the components of the residual vector,
C   and thus reports a lower RMS!
      do 1 i = 1, 1000
         a(i) = i
         b(i) = 1.0
    1    x(i) = b(i) / a(i)
C   The above loop sets up the equations and solves them.
C   Now we compute the residuals and sum the squares.
      sum = 0.0
      do 2 i = 1, 1000
    2    sum = sum + (b(i) - a(i) * x(i))**2
C   Now we get the RMS ...
      rms = sqrt(sum / 1000.0)
      write(*,10) rms
   10 format(' The RMS for solving nx = 1, n = 1...1000, is', e15.7)
      stop
      end

C   Program 2 -- Determining if reciprocals have an exact representation
      implicit real*8(a-h,o-z)
C   We compute the reciprocals of the first 1000 integers, and
C   report a reciprocal to be exact if the product of the integer
C   and its computed reciprocal is exactly 1.0.
      do 100 i = 1, 1000
         x = i
         r = 1.0/x
         if ((x*r - 1.0) .eq. 0.0) write(*,10) i
  100 continue
      stop
   10 format(' The integer ', i4, ' has an exact reciprocal in binary')
      end


                              Peter Markstein
   To reply directly to Dr. Markstein, his email address is:
   USA InterNet: 
uunet!cs.utexas.edu!ibmaus!auschs!mrkstn.austin.ibm.com!mrkstn
       or         ...@cs.utexas.edu:ibmaus!auschs!mrkstn.austin.ibm.com!mrkstn

><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><
 IBM Corp. - Adv. Workstation Div
  Troy N. Hicks                                      IBM Tieline:  678-7318
   Mail Stop: ZIP 4359                                USA Phone: (512)838-7318
   Austin, Texas  78758
   IBM VNet:  TROYHIX at AUSVM9        IBM InterNet:
troy@troyhix.austin.ibm.com
   USA InterNet:  uunet!cs.utexas.edu!ibmaus!auschs!troyhix.austin.ibm.com!troy
       or         ...@cs.utexas.edu:ibmaus!auschs!troyhix.austin.ibm.com!troy
              /*   I DO NOT speak for IBM, only for MYSELF   */
><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><

mccalpin@vax1.acs.udel.EDU (John D Mccalpin) (03/23/90)

I wrote:
---------------------------------------------------------
   Errors in solution of the 1000x1000 system of equations
           from the LINPACK benchmark suite
machine                 precision               RMS error
---------------------------------------------------------
ETA-10                   64-bit                 1.3e-08
Cray X/MP                64-bit                 2.5e-11
IBM 3090                 64-bit                 5.8e-12
IBM RS/6000              64-bit                 1.2e-12  <-- new ibm
IEEE (Sun-3)             64-bit                 2.3e-13  <-- ieee
VAX "D"-format           64-bit                 7.2e-14
---------------------------------------------------------

In article <1872@awdprime.UUCP> ...@cs.utexas.edu:ibmaus!auschs!troyhix.austin.ibm.com!troy writes:
>    Needless to say this has caused quite a stir here. :-(

I have said before, and I will say again, that I do not believe that
the results that I have shown imply that there is anything *wrong* with
the IBM machine.  The question of error in this case came up because
of the unusual add/multiply instruction of the RS/6000 and I simply
wanted to see how much difference it made in this test case.  It looks
like IBM has a surprisingly strong product here....

The actual accumulation of roundoff error is an inherently data-dependent
process, and the above results are not strong enough to imply that
the IBM results are "bad".  The one data point above does not say
anything useful about whether the IBM floating-point is better or
worse than the IEEE floating-point; it *does* say that there is a
non-trivial difference (~2 bits) in the RMS error of the solution
of this system of equations.  Other matrices will generate other results,
some in the opposite direction (with the IBM more accurate than IEEE).

>I brought this to the attention of our resident floating-point math
>guru, Dr. Peter Markstein of IBM Yorktown research, who was one of the
>original 801 (first RISC) team members.

Glad to get some attention!

[... example programs and explanation deleted ...]
>                              Peter Markstein
>uunet!cs.utexas.edu!ibmaus!auschs!mrkstn.austin.ibm.com!mrkstn
>       or         ...@cs.utexas.edu:ibmaus!auschs!mrkstn.austin.ibm.com!mrkstn
>troy@troyhix.austin.ibm.com
-- 
John D. McCalpin                               mccalpin@vax1.acs.udel.edu
Assistant Professor                            mccalpin@delocn.udel.edu
College of Marine Studies, U. Del.             mccalpin@scri1.scri.fsu.edu