[comp.sys.apollo] Problem with FPX on 580-T

i91@nikhefh.hep.nl (Fons Rademakers) (06/08/88)

I have the following problem, see below the dashed line.

What I would like to know is 
  1) does this happen on all DN5(8|9)0-T with a FPX
  2) how come that it also goes wrong in the case when compiled without
     -cpu fpx, is the FPX always used?
  3) is there maybe a microcode fix

Thanks in advance for answering.
   -- Fons Rademakers


-------------------------------------------------
      PROGRAM AAP

      REAL PA(3)

      PA(1) = 100.00001
      PA(2) = 100.00001
      PA(3) = 100.00001

      SAVXYZ = PA(1)*PA(2)*PA(3)

      IF (PA(1)*PA(2)*PA(3) .NE. SAVXYZ) THEN
         PRINT *, '1 NOT OK'
         PRINT 10000, PA(1)*PA(2)*PA(3) - SAVXYZ
      ELSE
         PRINT *, '1 OK'
      ENDIF

      SAVXY2 = PA(1)*PA(2)*PA(3)

      IF (SAVXY2 .NE. SAVXYZ) THEN
         PRINT *, '2 NOT OK'
         PRINT 10000, SAVXY2 - SAVXYZ
      ELSE
         PRINT *, '2 OK'
      ENDIF

10000 FORMAT(E40.15)

      END

*------------------------------ RESULTS -------------------------------
* [86] ftn aap -save -indexl -zero -dba
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/08 16:29:38 MET (Wed)
* [87] aap.bin
*  1 NOT OK
*                   -0.211181500000000E-01
*  2 OK
* [88] ftn aap -save -indexl -zero -dba -cpu fpx
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/08 16:29:47 MET (Wed)
* [89] aap.bin
*  1 NOT OK
*                   -0.211181500000000E-01
*  2 OK
* [90] ftn aap -save -indexl -zero -cpu fpx
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/08 16:29:59 MET (Wed)
* [91] aap.bin
*  1 OK
*  2 NOT OK
*                   -0.211181500000000E-01
* [92] ftn aap -save -indexl -zero
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/08 16:30:11 MET (Wed)
* [93] aap.bin
*  1 OK
*  2 OK
* [94] ftn aap -save -indexl -zero -dbs
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/08 17:47:09 MET (Wed)
* [95] aap.bin
*  1 OK
*  2 OK

*--------------------------------- CONFIGURATION ---------------------
* [140] /com/netstat -config
* 
* 
*      The node ID of this node is 4950.
* 
* **** Node 4950 ****  //mars
* Time 1988/06/08.17:50:35  Up since 1988/05/30.17:04:10  
* 
* Net I/O:         total= 2447728   rcvs = 1323752   xmits = 1123976
* Winchester I/O:  total= 265714   reads= 110012   writes= 155702
* No ring hardware failure report.
* System configured with 8.0 mb of memory.
* 
*   NODE CONFIGURATION
*     Node Type:  DN580-T
*     Hardware  Version:  1.50
*     Display type:  1280 x 1024 color display
*     Graphics Accelerator Board present.
*     Floating Point Accelerator Unit present.
*     Microcode Version:  2.5
*     Peripheral configuration:
*         Disks:  winchester
*         Networks: Ring
*         Tapes:  1/4" cartridge tape
*     Disk types:  MSD-190M  

*--------------------------------- REMARKS -------------------------
* The error occurs only on the machine with the FPX installed.
* 
* On one of our other DN580-T without FPX as well as on all our other
* DN330/3000/4000 the program produces no errors in any of the above cases.
* 
* Note that moving the SAVXY2 = ... statement to just below SAVXYZ = ...
* makes the result of [91] look like [89], i.e. '2 NOT OK' does not
* occur anymore.
*
* I know it is dangerous to compare REALs, but an error in the second IF
* is still quite worrisome.
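
As a side note on comparing REALs: a tolerance-based comparison is the
usual alternative to .NE. on floating-point values. The following is a
minimal Python sketch of that idea, not a fix for the behaviour reported
above. The name nearly_equal, the tolerance factor of 4, and the sample
magnitude 1.0e6 are illustrative assumptions; only the difference
0.02111815 is taken from the output of [87].

```python
import math

# Assumed tolerance: a few units of single-precision epsilon (2**-23).
# The factor 4 is an arbitrary illustrative choice.
REL_TOL = 4 * 2.0 ** -23

def nearly_equal(x, y, rel_tol=REL_TOL):
    """Return True when x and y agree to roughly single precision."""
    return math.isclose(x, y, rel_tol=rel_tol)

saved = 1.0e6                      # illustrative magnitude, roughly 100**3
recomputed = saved - 0.02111815    # differs by the amount printed in [87]

print(saved != recomputed)               # exact comparison sees a difference
print(nearly_equal(saved, recomputed))   # tolerance comparison does not
```

With a relative tolerance the comparison scales with the size of the
operands, so the same test works for products near 1e6 and near 1e3.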
-- 
Org:    NIKHEF-H, National Institute for Nuclear and High-Energy Physics.
Mail:   Kruislaan 409, P.O. Box 41882, 1009 DB Amsterdam, the Netherlands
Phone:  (20)5925018 or 5925003                      Telex: 10262 (hef nl)
UUCP:   i91@nikhefh.hep.nl               BITNET: nikhefh!i91@mcvax.bitnet

oj@apollo.uucp (Ellis Oliver Jones) (06/15/88)

In article <471@nikhefh.hep.nl> i91@nikhefh.hep.nl (Fons Rademakers) writes:
>I have the following problem...
>      PA(1) = 100.00001
>      PA(2) = 100.00001
>      PA(3) = 100.00001
>      SAVXYZ = PA(1)*PA(2)*PA(3)
No doubt you realize what you're asking for arithmetically...
Let's analyze it as follows:  

You want to compute (a+b) ^ 3 , where a = 1e2 and b = 1e-5

The result is  a^3 + 3*(a^2*b + a*b^2) + b^3
or 1e6 + 0.3 + 3e-8 + 1e-15
or 1000000.300000030000001

Single-precision IEEE floating point has a 24-bit mantissa (counting
the "hidden" bit), so the best possible accuracy is 1 part in 2^24, or
about 5.96e-8.  For a result near 1e6 that is roughly 0.06 in absolute terms.

So the precision of your PA(1)*PA(2)*PA(3) calculation always gets sawed
off, and more so when you move it from a register to memory.
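
The arithmetic above can be checked with exact rational numbers. This is
a small Python sketch, assuming nothing beyond the values a = 1e2 and
b = 1e-5 used in the analysis; the helper name single_spacing is
hypothetical. It also estimates the gap between neighbouring
single-precision values near 1e6.

```python
from fractions import Fraction
import math

a = Fraction(100)
b = Fraction(1, 100000)          # 1e-5

# Binomial expansion of (a + b)^3, term by term, in exact arithmetic.
terms = [a**3, 3 * a**2 * b, 3 * a * b**2, b**3]
exact = sum(terms)
assert exact == (a + b) ** 3

def single_spacing(x):
    """Gap between adjacent single-precision values near x (24-bit mantissa)."""
    _, exponent = math.frexp(x)  # x = m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 24)

print([float(t) for t in terms])   # the four terms of the expansion
print(single_spacing(1.0e6))       # 0.0625
```

A spacing of 0.0625 near 1e6 means that only the 1e6 term and part of the
0.3 term can survive in a REAL; the smaller terms are far below the
resolution.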

>What I would like to know is 
>  1) does this happen on all DN5(8|9)0-T with a FPX 
It happened on the one I use.

It has to do with the optimization strategy used by the compiler.
The fpx's internal registers are good to double precision, whereas
once the number's been stored in memory precision is lost.  Code
generation for the FPX is different than for -cpu any 
or -cpu 580  (which gets you 68881 code).  Register-lifetime
differences could account for the differences you saw.
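
The register-versus-memory effect can be imitated in Python, whose floats
are IEEE doubles. This is only a sketch under the assumption that the
product is formed in double precision and then rounded to single when it
is stored; it does not model the FPX itself. The helper to_single is a
hypothetical name.

```python
import struct

def to_single(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

pa = to_single(100.00001)            # PA(1), PA(2), PA(3) as stored REALs
in_register = pa * pa * pa           # product kept in double precision
in_memory = to_single(in_register)   # product after being stored (like SAVXYZ)

print(in_memory)                     # 1000000.25
print(in_register - in_memory)       # about -0.0211, the size of the posted difference
```

Under that assumption, comparing a freshly computed product with a stored
copy gives a nonzero difference of the same size as the one in the
original posting.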

Try using the "-exp" option in your compilation command, and you
can see the compiled code for yourself.

>  2) how come that it also goes wrong in the case when compiled without
>     -cpu fpx, is the FPX always used?
Yes.  When you don't specify -cpu fpx, you get generic code which
calls syslib to do floating point operations.  The version of syslib 
which runs on an fpx-node uses the fpx.

> 3) Microcode fix?   

I'm (for what my opinion's worth) not convinced the FPX is broken,
considering the requirements of your computation compared to the
precision provided by single-precision floating point.
We can check it out, though.  Would you mind submitting your net posting
in APR form?

/Oliver Jones     (speaking for myself, not necessarily for Apollo Computer, Inc).

i91@nikhefh.hep.nl (Fons Rademakers) (07/27/88)

This message is mainly intended for Oliver Jones @ Apollo. I mailed
this message to him directly but never got a reply. Maybe something
went wrong with the mail, who knows? Any comments from other qualified
persons are also welcome.

=========================== original message ====================
Hi Oliver,

   Thanks for your extensive reply concerning the Fpx problem. However...

Concerning the original values I gave
>      PA(1) = 100.00001
>      PA(2) = 100.00001
>      PA(3) = 100.00001
>      SAVXYZ = PA(1)*PA(2)*PA(3)
I understand clearly that this is arithmetically extreme and can be
subject to truncation when moving from register to memory.

Making the numbers a little less extreme, to bring them within 
IEEE floating point precision (+/- 5.96e-8), still results in
the same kind of problem.

Take the following numbers:

      PA(1) = 10.1      or even 1.1
      PA(2) = 10.1
      PA(3) = 10.1

Now PA(1)*PA(2)*PA(3) = 1030.301, well within any limits.

Still I get the following errors:

*------------------------------ RESULTS -------------------------------
* [86] ftn aap -save -indexl -zero -dba
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/24 12:33:50 MET (Fri)
* [87] aap.bin
*  1 NOT OK
*                   -0.307197500000000E-04
*  2 OK
* [88] ftn aap -save -indexl -zero -dba -cpu fpx
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/24 12:33:50 MET (Fri)
* [89] aap.bin
*  1 NOT OK
*                   -0.307197500000000E-04
*  2 OK
* [90] ftn aap -save -indexl -zero -cpu fpx
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/24 12:33:50 MET (Fri)
* [91] aap.bin
*  1 OK
*  2 NOT OK
*                   -0.307197500000000E-04
* [92] ftn aap -save -indexl -zero
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/24 12:33:50 MET (Fri)
* [93] aap.bin
*  1 OK
*  2 OK
* [94] ftn aap -save -indexl -zero -dbs
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/24 12:33:50 MET (Fri)
* [95] aap.bin
*  1 OK
*  2 OK
*-----------------------------------------------------------------------

This error pattern (depending on the compiler options) is exactly as
it was in the extreme case. The error only changed from  -0.211181500000000E-01
to -0.307197500000000E-04.
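
For comparison, the relative size of these differences can be estimated
with a short Python sketch. It assumes that the product is formed in
double precision and then rounded to single when stored, which is an
assumption about the mechanism rather than a statement about the FPX
hardware; the helper to_single is a hypothetical name.

```python
import struct

def to_single(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

HALF_ULP = 2.0 ** -24   # relative rounding bound for a 24-bit mantissa, about 5.96e-8

results = {}
for value in (100.00001, 10.1, 1.1):
    pa = to_single(value)            # none of these is exactly representable in binary
    product = pa * pa * pa           # kept in double precision
    stored = to_single(product)      # rounded to single, as when saved in a REAL
    diff = product - stored
    results[value] = (diff, abs(diff) / abs(stored))
    print(value, diff, abs(diff) / abs(stored) <= HALF_ULP)
```

In this model the absolute difference shrinks with the size of the
product, while the relative difference stays within the single-precision
rounding bound in all three cases.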

To summarize, I think that there is still something wrong with the Fpx
(although [93] and [95] behave correctly).

In case you did not save the program I attach another copy here, so you can
try it out yourself.

Let me know what you think of this.
If there is an FPX problem, please pass it on to the people involved.

Cheers, Fons Rademakers.
*-----------------------------------------------------
      PROGRAM AAP

      REAL PA(3)

      PA(1) = 10.1
      PA(2) = 10.1
      PA(3) = 10.1

      SAVXYZ = PA(1)*PA(2)*PA(3)

      IF (PA(1)*PA(2)*PA(3) .NE. SAVXYZ) THEN
         PRINT *, '1 NOT OK'
         PRINT 10000, PA(1)*PA(2)*PA(3) - SAVXYZ
      ELSE
         PRINT *, '1 OK'
      ENDIF

      SAVXY2 = PA(1)*PA(2)*PA(3)

      IF (SAVXY2 .NE. SAVXYZ) THEN
         PRINT *, '2 NOT OK'
         PRINT 10000, SAVXY2 - SAVXYZ
      ELSE
         PRINT *, '2 OK'
      ENDIF

10000 FORMAT(E40.15)

      END
*------------------------------------------------------
