i91@nikhefh.hep.nl (Fons Rademakers) (06/08/88)
I have the following problem, see below the dashed line.
What I would like to know is
  1) does this happen on all DN5(8|9)0-T with an FPX
  2) how come that it also goes wrong in the case when compiled without
     -cpu fpx, is the FPX always used?
  3) is there maybe a microcode fix

Thanks in advance for answering.

-- Fons Rademakers
-------------------------------------------------
      PROGRAM AAP
      REAL PA(3)
      PA(1) = 100.00001
      PA(2) = 100.00001
      PA(3) = 100.00001
      SAVXYZ = PA(1)*PA(2)*PA(3)
      IF (PA(1)*PA(2)*PA(3) .NE. SAVXYZ) THEN
         PRINT *, '1 NOT OK'
         PRINT 10000, PA(1)*PA(2)*PA(3) - SAVXYZ
      ELSE
         PRINT *, '1 OK'
      ENDIF
      SAVXY2 = PA(1)*PA(2)*PA(3)
      IF (SAVXY2 .NE. SAVXYZ) THEN
         PRINT *, '2 NOT OK'
         PRINT 10000, SAVXY2 - SAVXYZ
      ELSE
         PRINT *, '2 OK'
      ENDIF
10000 FORMAT(E40.15)
      END
*------------------------------ RESULTS -------------------------------
* [86] ftn aap -save -indexl -zero -dba
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/08 16:29:38 MET (Wed)
* [87] aap.bin
*  1 NOT OK
*                  -0.211181500000000E-01
*  2 OK
* [88] ftn aap -save -indexl -zero -dba -cpu fpx
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/08 16:29:47 MET (Wed)
* [89] aap.bin
*  1 NOT OK
*                  -0.211181500000000E-01
*  2 OK
* [90] ftn aap -save -indexl -zero -cpu fpx
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/08 16:29:59 MET (Wed)
* [91] aap.bin
*  1 OK
*  2 NOT OK
*                  -0.211181500000000E-01
* [92] ftn aap -save -indexl -zero
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/08 16:30:11 MET (Wed)
* [93] aap.bin
*  1 OK
*  2 OK
* [94] ftn aap -save -indexl -zero -dbs
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/08 17:47:09 MET (Wed)
* [95] aap.bin
*  1 OK
*  2 OK
*--------------------------------- CONFIGURATION ---------------------
* [140] /com/netstat -config
*
*
* The node ID of this node is 4950.
*
*     **** Node 4950 ****     //mars
* Time 1988/06/08.17:50:35    Up since 1988/05/30.17:04:10
*
* Net I/O:        total= 2447728  rcvs = 1323752  xmits = 1123976
* Winchester I/O: total= 265714   reads= 110012   writes= 155702
* No ring hardware failure report.
* System configured with 8.0 mb of memory.
*
* NODE CONFIGURATION
*   Node Type: DN580-T
*   Hardware Version: 1.50
*   Display type: 1280 x 1024 color display
*   Graphics Accelerator Board present.
*   Floating Point Accelerator Unit present.
*   Microcode Version: 2.5
*   Peripheral configuration:
*     Disks: winchester
*     Networks: Ring
*     Tapes: 1/4" cartridge tape
*     Disk types: MSD-190M
*--------------------------------- REMARKS -------------------------
* The error occurs only on the machine with the FPX installed.
*
* On one of our other DN580-T without FPX as well as on all our other
* DN330/3000/4000 the program produces no errors in any of the above cases.
*
* Note that moving the SAVXY2 = ... statement to just below SAVXYZ = ...
* makes the result of [91] look like [89]. I.e. '2 NOT OK' does not
* occur anymore.
*
* I know it is dangerous to compare REALs, but an error in the second IF
* is still quite worrisome.

--
Org:    NIKHEF-H, National Institute for Nuclear and High-Energy Physics.
Mail:   Kruislaan 409, P.O. Box 41882, 1009 DB Amsterdam, the Netherlands
Phone:  (20)5925018 or 5925003
Telex:  10262 (hef nl)
UUCP:   i91@nikhefh.hep.nl
BITNET: nikhefh!i91@mcvax.bitnet

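On the closing remark about comparing REALs: a minimal sketch of a tolerance-based comparison, written in the same fixed-form Fortran as the program above. The names PROD and RELTOL and the factor 4.0 are assumptions chosen for illustration. They are not part of the original program and not an Apollo recommendation.

```fortran
      PROGRAM CMPTOL
C     Sketch only: compare two REALs with a relative tolerance
C     instead of .NE.  PROD, RELTOL and the factor 4.0 are
C     illustrative assumptions, not from the original program.
      REAL PA(3), SAVXYZ, PROD, RELTOL
      PA(1) = 100.00001
      PA(2) = 100.00001
      PA(3) = 100.00001
      SAVXYZ = PA(1)*PA(2)*PA(3)
      PROD = PA(1)*PA(2)*PA(3)
C     2.0**(-23) is the spacing of single-precision numbers near 1.0
      RELTOL = 4.0 * 2.0**(-23)
      IF (ABS(PROD - SAVXYZ) .GT. RELTOL*ABS(SAVXYZ)) THEN
         PRINT *, 'NOT OK'
      ELSE
         PRINT *, 'OK'
      ENDIF
      END
```

With this tolerance, the reported discrepancy of about 0.0211 on a value near 1.0E6 (roughly 2 parts in 10**8) would count as equal. This only sidesteps the symptom; it does not explain why the two evaluations differ.
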
oj@apollo.uucp (Ellis Oliver Jones) (06/15/88)
In article <471@nikhefh.hep.nl> i91@nikhefh.hep.nl (Fons Rademakers) writes:
>I have the following problem...
>      PA(1) = 100.00001
>      PA(2) = 100.00001
>      PA(3) = 100.00001
>      SAVXYZ = PA(1)*PA(2)*PA(3)

No doubt you realize what you're asking for arithmetically...
Let's analyze it as follows:
You want to compute (a+b) ^ 3 , where a = 1e2 and b = 1e-5
The result is
      a^3 + 3*(a^2*b + a*b^2) + b^3
or    1e6 + 0.3 + 3e-8 + 1e-15
or    1000000.300000030000001

Single-precision IEEE floating point has a 24-bit mantissa (counting
the "hidden" bit), so accuracy is at best 1 part in 2^24, or about
5.96e-8. So, the precision of your PA(1)*PA(2)*PA(3) calculation is
always getting sawed off, and more so when you move it from a register
to memory.

>What I would like to know is
> 1) does this happen on all DN5(8|9)0-T with a FPX

It happened on the one I use. It has to do with the optimization
strategy used by the compiler. The fpx's internal registers are good
to double precision, whereas once the number's been stored in memory
precision is lost. Code generation for the FPX is different than for
-cpu any or -cpu 580 (which gets you 68881 code). Register-lifetime
differences could account for the differences you saw. Try using the
"-exp" option in your compilation command, and you can see the
compiled code for yourself.

> 2) how come that it also goes wrong in the case when compiled without
>    -cpu fpx, is the FPX always used?

Yes. When you don't specify -cpu fpx, you get generic code which calls
syslib to do floating point operations. The version of syslib which
runs on an fpx-node uses the fpx.

> 3) Microcode fix?

I'm (for what my opinion's worth) not convinced the FPX is broken,
considering the requirements of your computation compared to the
precision provided by single-precision floating point. We can check
it out, though. Would you mind submitting your net posting in APR form?

/Oliver Jones (speaking for myself, not necessarily for Apollo Computer, Inc).

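The register-versus-memory explanation above can be checked numerically. The sketch below emulates a wide register with DOUBLE PRECISION and then rounds the product to a REAL, as a store to memory would. That the FPX behaves this way is an assumption taken from the reply. The program name STORND and the variable WIDE are made up for illustration.

```fortran
      PROGRAM STORND
C     Sketch only: DOUBLE PRECISION stands in for a wide register.
C     That the FPX keeps intermediates at this width is assumed
C     from the reply above; the names here are hypothetical.
      REAL PA, SAVXYZ
      DOUBLE PRECISION WIDE
      PA = 100.00001
C     Product kept at double precision, as it would be in a register
      WIDE = DBLE(PA)*DBLE(PA)*DBLE(PA)
C     Storing into a REAL rounds the product to a 24-bit mantissa
      SAVXYZ = SNGL(WIDE)
C     Register value minus stored value
      PRINT 10000, SNGL(WIDE - DBLE(SAVXYZ))
10000 FORMAT(E15.7)
      END
```

On an IEEE machine this difference comes to about -0.0211181, which agrees with the -0.211181500000000E-01 in the original report to the digits a REAL carries. That is consistent with one side of the .NE. being compared at register precision and the other after rounding to single precision.
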
i91@nikhefh.hep.nl (Fons Rademakers) (07/27/88)
This message is mainly intended for Oliver Jones @ Apollo. I mailed him
this message directly but I never got any reply. Maybe something went
wrong with the mail, who knows? Any comments from other qualified persons
are also welcome.

=========================== original message ====================

Hi Oliver,

Thanks for your extensive reply concerning the Fpx problem. However...

Concerning the original values I gave

>      PA(1) = 100.00001
>      PA(2) = 100.00001
>      PA(3) = 100.00001
>      SAVXYZ = PA(1)*PA(2)*PA(3)

I understand clearly that this is arithmetically extreme and can be
subject to truncation when moving from register to memory.

Making the numbers a little less extreme, to bring them within IEEE
floating point precision (+/- 5.96e-8), still results in the same kind
of problem. Take the following numbers:

      PA(1) = 10.1       or even 1.1
      PA(2) = 10.1
      PA(3) = 10.1

Now PA(1)*PA(2)*PA(3) = 1030.301, well within any limits. Still I get
the following errors:

*------------------------------ RESULTS -------------------------------
* [86] ftn aap -save -indexl -zero -dba
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/24 12:33:50 MET (Fri)
* [87] aap.bin
*  1 NOT OK
*                  -0.307197500000000E-04
*  2 OK
* [88] ftn aap -save -indexl -zero -dba -cpu fpx
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/24 12:33:50 MET (Fri)
* [89] aap.bin
*  1 NOT OK
*                  -0.307197500000000E-04
*  2 OK
* [90] ftn aap -save -indexl -zero -cpu fpx
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/24 12:33:50 MET (Fri)
* [91] aap.bin
*  1 OK
*  2 NOT OK
*                  -0.307197500000000E-04
* [92] ftn aap -save -indexl -zero
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/24 12:33:50 MET (Fri)
* [93] aap.bin
*  1 OK
*  2 OK
* [94] ftn aap -save -indexl -zero -dbs
* no errors, no warnings in AAP, Fortran version 9.66 1988/06/24 12:33:50 MET (Fri)
* [95] aap.bin
*  1 OK
*  2 OK
*-----------------------------------------------------------------------

This error pattern (depending on the compiler options) is exactly as it
was in the extreme case. The error only changed from
-0.211181500000000E-01 to -0.307197500000000E-04.

To summarize, I think that there is still something wrong with the Fpx
(although [93] and [95] behave correctly).

In case you did not save the program I attach another copy here, so you
can try it out yourself. Let me know what you think of this. If there
is an FPX problem, please direct it to the people involved.

Cheers, Fons Rademakers.

*-----------------------------------------------------
      PROGRAM AAP
      REAL PA(3)
      PA(1) = 10.1
      PA(2) = 10.1
      PA(3) = 10.1
      SAVXYZ = PA(1)*PA(2)*PA(3)
      IF (PA(1)*PA(2)*PA(3) .NE. SAVXYZ) THEN
         PRINT *, '1 NOT OK'
         PRINT 10000, PA(1)*PA(2)*PA(3) - SAVXYZ
      ELSE
         PRINT *, '1 OK'
      ENDIF
      SAVXY2 = PA(1)*PA(2)*PA(3)
      IF (SAVXY2 .NE. SAVXYZ) THEN
         PRINT *, '2 NOT OK'
         PRINT 10000, SAVXY2 - SAVXYZ
      ELSE
         PRINT *, '2 OK'
      ENDIF
10000 FORMAT(E40.15)
      END
*------------------------------------------------------

--
Org:    NIKHEF-H, National Institute for Nuclear and High-Energy Physics.
Mail:   Kruislaan 409, P.O. Box 41882, 1009 DB Amsterdam, the Netherlands
Phone:  (20)5925018 or 5925003
Telex:  10262 (hef nl)
UUCP:   i91@nikhefh.hep.nl
BITNET: nikhefh!i91@mcvax.bitnet
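
The 10.1 case can be examined against the spacing of single-precision numbers. 10.1 has no exact binary representation, and the product of three such REALs needs more than the 24 bits a REAL holds, so it is rounded when stored. The sketch below prints the value PA actually holds, half the spacing between adjacent REALs near 1030, and the difference between a double-precision product and its single-precision rounding. Treating DOUBLE PRECISION as a stand-in for the FPX register width is an assumption; ULPCHK, WIDE and ULP are illustrative names.

```fortran
      PROGRAM ULPCHK
C     Sketch only: DOUBLE PRECISION stands in for a wide register
C     (an assumption); ULPCHK, WIDE and ULP are hypothetical names.
      REAL PA, SAVXYZ, ULP
      DOUBLE PRECISION WIDE
      PA = 10.1
      WIDE = DBLE(PA)*DBLE(PA)*DBLE(PA)
C     Storing into a REAL rounds the product to a 24-bit mantissa
      SAVXYZ = SNGL(WIDE)
C     Spacing of REALs in [1024,2048) is 2**10 * 2**(-23)
      ULP = 2.0**(-13)
C     10.1 is not exactly representable; show what PA really holds
      PRINT *, 'PA as stored      ', DBLE(PA)
      PRINT *, 'half spacing      ', 0.5*ULP
      PRINT *, 'wide minus stored ', SNGL(WIDE - DBLE(SAVXYZ))
      END
```

On an IEEE machine PA holds about 10.10000038, half the spacing is about 6.10E-05, and the difference is about -3.07198E-05. That is the same size as the -0.307197500000000E-04 reported above and lies below half the spacing, which points to ordinary rounding on a store rather than a wrong product.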