jbs@WATSON.IBM.COM (05/24/91)
Hugh LaMaster states:

  But, the reason that the standard is so entrenched is that it is so
  useful! Manufacturers are forced to use it because users have
  requirements for it.

I believe the standard has become entrenched for reasons other than
technical merit. If the IEEE had chosen some other standard it would
probably be equally entrenched by now.

Hugh LaMaster also states:

  There are always programs which run on the ragged edge of precision,
  and which don't work right on a slightly poorer implementation.

I am skeptical about this statement. Do you have some non-contrived
examples? I believe 64 bits with any reasonable format is enough for
most problems. Certainly I have not noticed any great demand for more
than 64 bits, which suggests to me there is still some precision to
spare. Many users get by with 32 bits.

Hugh LaMaster also states:

  2) IEEE is very well behaved, compared with other representations. I
  won't bother to substantiate this, other than to state that many
  experts agreed at the time it was developed that there was no known
  way to improve it numerically.

I do not consider this a point in favor of the IEEE standard. It is
very rare that the optimum design, in cases where there is a trade-off
between qualities A and B, consists of maximizing A and ignoring B.
This is because there is usually a diminishing-returns effect, in
which one must give up greater and greater amounts of B to obtain
further improvements in A. The above statement suggests the IEEE
standard gave undue weight to numerical quality to the exclusion of
factors such as performance and ease of implementation.

Couldn't we do without some or all of the following features of the
IEEE standard? 4 rounding modes (why only 4? why not 8?). Inf and NaN.
Denormalized numbers. The distinction between +0 and -0.

Will NASA ever build a spacecraft of which experts will say "there is
no known way to make this vehicle safer"?
Hugh LaMaster also states:

  The down side of IEEE is a performance hit in heavily pipelined FP
  units, for some input values. On the other hand, it is nice to get
  the right answer, even if some cases slow down.

You are understating the downside. The standard is complex, which
increases design time and cost. This added design cost applies to
software such as compilers as well as to hardware. The performance
hits are not confined to heavily pipelined floating point units on
denormalized inputs.

As to wrong answers, wrong answers are generally caused by users not
knowing what they are doing. Users who do know what they are doing
don't need IEEE arithmetic to get right answers; users who don't know
what they are doing will have no problems getting wrong answers using
IEEE arithmetic.

                                James B. Shearer
lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) (05/24/91)
Jim Patterson writes:

>In article <1991May22.221824.16887@riacs.edu> lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:
>>1) It is a standard. It is very important to certain users to be able to
>>move binary data between machines without conversion.
>
>This isn't quite correct. While IEEE specifies quite a bit about the
>format of its floating point representations, it leaves unsaid a few
>details such as endian-ness. The 80x87 chips, for example, follow
>INTEL convention

I was certainly aware of the "little-endian" problem when I wrote the
above :-) I am still wondering which DEC employee writes 10^7 as
00000001, or is that 00010000? I always forget. (Add a few smilies
here for the humor impaired :-) :-) :-) Even *Digital Review*
commented on DEC's choice of little endian for the DECsystem 5xxx
product line a while back.

However, in any case, I was referring to FP representation conversion,
which is much more costly than byte swapping. (But, it can be
vectorized for large arrays!) Endian-ness is indeed a problem. As I
have stated before, I also support a standard for endian-ness for
exactly this reason. However, byte-swapping is a relatively low
overhead, even in the kinds of applications I am dealing with. The
major problem that it causes is one of convenience, rather than
performance: it would be nice to move binary files with mixed data
types. I note only that it takes more to do this than agree on an
endian standard. It would also be nice to not have to convert
somewhere, or write special code to figure out whether the machine you
are on is byte-swapped, so that you can write out data in standard
metafile order. BTW, you get one guess regarding whether I favor big
or little-endian as the standard :-)

>Representations
>of the extended precision formats are also not pinned down; 80x87 uses
>80 bit extended while other implementations e.g. HP-PA use a 128 bit
>extended precision.
>(This is an internal format, though; not a lot of
>need to transfer it around between machines).

I agree that these are not as pinned down as they should be, and are
also controversial for other reasons as well. However, as you note,
they are primarily an internal/computation issue, rather than an
external standard issue.

>So, while you might get by with doing at most a byte-swap to move IEEE
>floats around, you can't blindly assume that all IEEE floats are the
>same.

Believe me, I am very aware of byte swapping.

>--
>Jim Patterson                     Cognos Incorporated
>UUNET:uunet!cognos.uucp!jimp      P.O. BOX 9707
>BITNET:ccs.carleton.ca!cognos.uucp!jimp   3755 Riverside Drive
>PHONE:(613)738-1440 x6112         Ottawa, Ont K1G 3Z4

In article <9105240158.AA02761@ucbvax.Berkeley.EDU>, jbs@WATSON.IBM.COM writes:
|>
|> Hugh LaMaster states:
|> But, the reason that the standard is so entrenched is that it is so useful!
|> Manufacturers are forced to use it because users have requirements for it.
|>
|> I believe the standard has become entrenched for reasons other
|> than technical merit. If the IEEE had chosen some other standard it
|> would probably be equally entrenched by now.

Hard to prove or disprove either way. But I can think of lots of
standards which people ignore because they are not useful.

|> Hugh LaMaster also states:
|> There are always programs which run on the ragged edge of precision,
|> and which don't work right on a slightly poorer implementation.
|>
|> I am skeptical about this statement. Do you have some non-
|> contrived examples?

Over the years I have run into a number of problems like this.
Programs which got right answers on a VAX, and didn't converge on an
IBM. Other programs which got right answers on a Cray, and didn't
converge on a Cyber 205. I have seen such programs even at 64 bits,
although 32 is more common. Sometimes, users would get convergence on
one machine at 32 bits, and have to go to 64 bits on a different
machine to get convergence.
Fact is, it happens all the time. What is more dangerous are programs
which are verified for correctness on one machine, but don't have
built-in error checks. Sometimes, when they are moved to another
machine, such programs give wrong answers. Just ask any numerical
analyst in a consulting group in a big computer center for examples;
they can always give you horror stories on demand.

|> I believe 64 bits with any reasonable format is
|> enough for most problems.

I agree. However, users do not always know how much precision they
need, or what they might be doing which increases the precision needed
by their program. If it converges to the right answer, why should
they? Not everyone understands all the subtleties of error analysis,
including myself. The user is frequently not the programmer. When the
program stops converging on another machine with a different FP
arithmetic, they may be in for a costly delay. The fact is that
different FP formats are an obstacle to portability, and I have plenty
of personal experience with this problem.

|> Hugh LaMaster also states:
|> 2) IEEE is very well behaved, compared with other representations. I won't
|> bother to substantiate this, other than to state that many experts agreed
|> at the time it was developed that there was no known way to improve it
|> numerically.
|>
|> I do not consider this a point in favor of the IEEE standard.

Well, I certainly do. I guess we disagree. . . .

|> ... The above statement suggests the IEEE standard gave
|> undue weight to numerical quality to the exclusion of factors such as
|> performance and ease of implementation.

True, it does. This was cause for considerable concern at the time.
However, aren't people clever? Pipelined FP units are now available
for IEEE.

|> Couldn't we do without some or all of the following features of
|> the IEEE standard? 4 rounding modes (why only 4? why not 8?). Inf and
|> NaN. Denormalized numbers. Distinction between +0 and -0.
Inf and NaN are extremely useful, and cost nothing to implement.
Denormalized numbers are also extremely useful. They are,
unfortunately, very costly to implement, and have been the source of
most of the controversy over IEEE. However, see above.

|> Will Nasa ever build a spacecraft of which experts will say
|> "there is no known way to make this vehicle safer"?

I don't speak for NASA.

|> Hugh LaMaster also states:
|> The down side of IEEE is a performance hit in heavily pipelined FP units, for
|> some input values. On the other hand, it is nice to get the right answer, even
|> if some cases slow down.
|>
|> You are understating the downside. The standard is complex
|> which increases design time and cost. This added design cost applies
|> to software such as compilers as well hardware. The performance hits
|> are not confined to heavily pipelined floating point units on denorm-
|> alized inputs.

I am not aware of any additional design cost to compilers or other
system software. Quite the contrary: it is easier to implement a
standard math library on a system with a well behaved and well
understood FP arithmetic. As for user software, well, I see the
benefits of having IEEE FP as the workstation standard every day.

As for hardware, there has been a tremendous benefit to
microprocessor-based systems, from PCs to workstations, in having
standard parts available which implement IEEE. It means that both
system designers and users have available high performance IEEE parts.
Does your system have a Weitek, TI, Cypress, or whatever, FP unit? Do
you know, or care?

|> As to wrong answers, wrong answers are generally caused by
|> users not knowing what they are doing.

This argument was addressed above.

|> James B. Shearer
--
Hugh LaMaster, M/S 233-9,    UUCP:             ames!lamaster
NASA Ames Research Center    Internet:         lamaster@ames.arc.nasa.gov
Moffett Field, CA 94035      With Good Mailer: lamaster@george.arc.nasa.gov
Phone:  415/604-1056         #include <std.disclaimer>
henry@zoo.toronto.edu (Henry Spencer) (05/25/91)
In article <9105240158.AA02761@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>As to wrong answers, wrong answers are generally caused by
>users not knowing what they are doing. Users who do know what they are
>doing don't need IEEE arithmetic to get right answers, users who don't
>know what they are doing will have no problems getting wrong answers
>using IEEE arithmetic.

Unfortunately, numerical computing is too useful and too widespread to
remain the plaything of the small number of people who "know what they
are doing" in all respects. The fact is, almost nobody who is using
computers to get real work done has time for an in-depth study of all
the fine points of numerical mathematics. The notion that such people
shouldn't try to do numerical computing at all is hopelessly
unrealistic, not to mention obnoxiously elitist.

Without lengthy analysis by experts, it is not possible to say *for
sure* that the answers are right. However, well-designed tools like
IEEE FP improve the odds a lot. In particular, they make it much more
likely that problems will be obvious and predictable rather than
subtle and mysterious. This is very important in the real world, where
experts are expensive and in short supply, and a great deal of
numerical computing simply has to be done without them.
--
And the bean-counter replied,        | Henry Spencer @ U of Toronto Zoology
"beans are more important".          | henry@zoo.toronto.edu   utzoo!henry
sjc@borland.com (Steve Correll) (05/25/91)
In article <1991May24.161833.20530@riacs.edu> lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:
>I am not aware of any additional design cost to compilers or other system
>software.

(I'm not the original complainant, but I do compilers.) Pre-IEEE
compilers generally require modification to conform to the letter of
the standard. Optimizers which rely on a family of axioms like the
following:

    !(a > b)  implies  a <= b

must be changed lest they produce incorrect code in the presence of
NaN. The existence of two representations for zero rules out certain
code optimizations.

What to do about NaN in a Fortran arithmetic IF is a quandary when
traps are disabled. It's neither less than, nor equal to, nor greater
than zero. The obvious solution--fall through to the next
statement--is likely to surprise and not delight the average
programmer.

At least one compiler vendor went to great lengths to obey IEEE rules
when folding constant expressions at compile time, warning the user
whenever a result was inexact, since the compiler could not know what
the state of inexact traps or rounding would be at execution time. The
average programmer may or may not appreciate a warning about every
"1./10" in the code, but I'm told it was an effective selling point in
competition with other compilers.

>As for hardware, there has been a tremendous benefit to microprocessor
>based systems, from PC's to workstations, in having standard parts
>available which implement IEEE. It means that both system designers
>and users have available high performance IEEE parts.

That's a good argument for _some_ standard, though not for IEEE in
particular. When you build an emulator or cross-compiler, it is much
less work if the host and target machines implement floating point
identically.

>Does your System
>have a Weitek, TI, Cypress, or whatever, FP unit? Do you know, or care?

Never underestimate entropy.
Though Sun has pretty much standardized on IEEE, their "f77" manual page lists the following options, several of which require you to know who made your floating-point chip: -cg87, -cg89, -fnonstd, -f3167, -f68881, -f80387, -ffpa, -ffpaplus, -fsoft, -fswitch, -fstore. :-)
jbs@WATSON.IBM.COM (05/25/91)
Henry Spencer writes:

  Unfortunately, numerical computing is too useful and too widespread
  to remain the plaything of the small number of people who "know what
  they are doing" in all respects. The fact is, almost nobody who is
  using computers to get real work done has time for an in-depth study
  of all the fine points of numerical mathematics. The notion that
  such people shouldn't try to do numerical computing at all is
  hopelessly unrealistic, not to mention obnoxiously elitist.

Well, many users apparently feel they don't have time for a
superficial review of the basics of numerical mathematics either. Even
willingness to use a little common sense would avoid a lot of
problems. I believe it is the notion that such people are offered any
significant protection by the IEEE standard that is hopelessly
unrealistic.

Henry Spencer also writes:

  Without lengthy analysis by experts, it is not possible to say *for
  sure* that the answers are right. However, well-designed tools like
  IEEE FP improve the odds a lot. In particular, they make it much
  more likely that problems will be obvious and predictable rather
  than subtle and mysterious.

Well, to repeat myself, I don't believe IEEE floating point
(particularly its more esoteric aspects) improves the odds a lot,
because I don't believe it addresses the main sources of error. I also
don't believe that IEEE FP is well-designed. Exactly why do you
believe numerical problems using IEEE FP are more likely to be obvious
and predictable than numerical problems using IBM hex (for example)?

As long as we are on this subject: using IBM hex, 0*x is always 0,
x.le.y is the same test as .not.(x.gt.y), and (x.ne.x) is always
false. None of the preceding properties hold with IEEE FP. Which
behavior would you call "obvious and predictable" as opposed to
"subtle and mysterious"? Have you considered the problems this sort of
thing causes compiler writers?
Henry Spencer also writes:

  This is very important in the real world, where experts are
  expensive and in short supply, and a great deal of numerical
  computing simply has to be done without them.

Expert hardware designers and expert compiler writers are also
expensive and in short supply. Why allow compromises in the area which
is the major source of problems, while demanding perfection in an area
which is a minor source of problems?

                                James B. Shearer
lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (05/26/91)
In article <9105240158.AA02761@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>As to wrong answers, wrong answers are generally caused by
>users not knowing what they are doing.

I disagree. Dr. Kahan tells excellent anecdotes on this point, with
examples of well-respected products (e.g. Mathematica, hand
calculators) giving wrong answers to simple problems.

In article <1991May24.173747.1483@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>However, well-designed tools like IEEE FP improve the odds a lot.

Seconded. People who think that the common alternatives are as well
designed are probably unfamiliar with the sore points. For example,
Cray format allows a value which will overflow if multiplied by 1 ...
--
Don		D.C.Lindsay 	Carnegie Mellon Robotics Institute
henry@zoo.toronto.edu (Henry Spencer) (05/26/91)
In article <9105250030.AA08036@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>...don't believe it addresses the main sources of error. I also don't
>believe that IEEE FP is well-designed. Exactly why do you believe nu-
>merical problems using IEEE FP are more likely to be obvious and pre-
>dictable than numerical problems using IBM hex (for example)? ...

A proper response to this would basically be a detailed defence of
IEEE FP's more controversial design decisions. I have neither the time
nor, really, the expertise to do this. However, salvation arriveth
from an unexpected direction. :-) Go read "What Every Computer
Scientist Should Know About Floating-Point Arithmetic", by David
Goldberg, in the latest (March) issue of ACM Computing Surveys; doing
so will enlighten you in detail on the subject.

I will confine myself to observing that IBM hex FP is the only FP
format I know of that made half the FP instructions -- the
single-precision ones -- just about useless to most programmers.
--
"We're thinking about upgrading from   | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 to SunOS 3.5."             | henry@zoo.toronto.edu   utzoo!henry
hrubin@pop.stat.purdue.edu (Herman Rubin) (05/26/91)
In article <1991May25.222551.16365@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
......................
> I will confine myself to observing that IBM hex FP is the only FP format
> I know of that made half the FP instructions -- the single-precision ones --
> just about useless to most programmers.

The IBM hex FP is not that much worse than any other FP with 32-bit
words. Some of the IEEE implementations make the 32-bit FP operations
actually more expensive than the longer ones, whereas on a machine
like the CYBER 205, especially for vectors, they are much less
expensive. Of course the CYBER quite properly calls the 32-bit ones
half precision and the 64-bit ones full precision, and the hardware
does make provision for double precision.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin (UUCP)
dwells@fits.cx.nrao.edu (Don Wells) (05/26/91)
In article <12805@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
> ... The IBM hex FP is not that much worse than any other FP with 32-bit words.

The average precision of SP hex normalization, something like 22.5
bits (compared to the 24 bits of VAX or IEEE), does not determine the
overall precision of complicated numerical operations like FFTs;
instead, the 21-bit worst case causes random truncation errors in
intermediate sums, and thereby eventually corrupts most of the
answers. The loss of precision is easily detectable when SP results
from 370-architecture CPUs are compared with SP results computed on
VAXen and IEEE-FP machines. For signal processing applications where
the signals and results have high dynamic range (e.g., more than
10,000:1), hex-normalized SP FP must be viewed with suspicion.
--
Donald C. Wells             Associate Scientist      dwells@nrao.edu
National Radio Astronomy Observatory                 +1-804-296-0277
Edgemont Road                                   Fax= +1-804-296-0278
Charlottesville, Virginia 22903-2475 USA     78:31.1W, 38:02.2N
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (05/26/91)
>>> On 25 May 91 22:25:51 GMT, henry@zoo.toronto.edu (Henry Spencer) said:
Henry> I will confine myself to observing that IBM hex FP is the only
Henry> FP format I know of that made half the FP instructions -- the
Henry> single-precision ones -- just about useless to most
Henry> programmers.
I have to point out that the Cyber 205/ETA-10 32-bit FP is worse than
the IBM's hex FP --- at least for the cases I tested. I discussed
some of this in a paper in Supercomputer in 1987 or 1988 (issue 24, I
believe). The more interesting part was an analysis of the roundoff
error in the 1000x1000 LINPACK benchmark, which unfortunately did not
make it into the paper in its completed form.
Without looking up my answers in detail, on the 1000x1000 test case,
IEEE machines got 4 significant digits of accuracy, the IBM hex format
got about 3, and the Cyber 205 got 1.
Since the error is due to rounding modes and truncation, this same
loss of accuracy extends to the 64-bit format, which got about 8
digits of accuracy compared to 12 for IEEE. This comparison is not
completely fair, since the Cyber 205 64-bit has a bigger exponent
and smaller mantissa, but the difference is nothing like 4 decimal
digits....
--
John D. McCalpin mccalpin@perelandra.cms.udel.edu
Assistant Professor mccalpin@brahms.udel.edu
College of Marine Studies, U. Del. J.MCCALPIN/OMNET
mbk@jacobi.ucsd.edu (Matt Kennel) (05/27/91)
Yes, once again, the great IEEE floating-point debate. I'm looking at
this from a non-expert's point of view: I'm a 'naive scientist' who
doesn't really know much about the subtleties of numerical
mathematics, but I have to do a lot of numerical computing for my
work. Here is what I see as the two sides:

Pro: Using carefully designed standard rules for FP is crucial for
getting the 'right' answer, and you can't expect mere mortals to just
do it any which way that seems convenient. All the complexity is
imperative. Anybody who cheats, or doesn't bother (e.g. Cray), is a
silly fool who only cares how fast the program runs, not whether it
gives the "correct" answer. IEEE arithmetic is the key to consistency
for running programs across different machines.

Con: Floating-point is only an approximation to reality anyway, so the
business of "correctness" is silly. The answer will never be truly
Right, so who gives a flying rat's ass about eking out that last bit
of 'precision'. The IEEE standard only serves to make programs "wrong"
in the same artificial way (instilling a false sense of security about
the results), but ends up making computers a lot slower and more
expensive, only to please some anal retentive nerds in some
ivory-tower committee. (:-))

From a practical scientist's point of view, I'd have to put myself in
the second category. I mean, if one's models are only valid to 1%
anyway, the end limits of precision are useless. And if something in
that last bit makes the algorithm go unstable, then choosing some
"standard" way of dealing with FP arithmetic won't solve the essential
problem.

Of course, I should relate this to computer architecture, and so I ask
the experts: what exactly are the costs and benefits of IEEE
arithmetic?

Matt Kennel
mbk@inls1.ucsd.edu
cet1@cl.cam.ac.uk (C.E. Thompson) (05/27/91)
In article <12805@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
>
>The IBM hex FP is not that much worse than any other FP with 32-bit words.
>
>Some of the IEEE implementations make the 32-bit FP operations actually
>more expensive than the longer ones, ...

This isn't unknown in implementations of IBM/360 FP either. On the
370/165, for example, all SP adds and subtracts took one (80ns) cycle
longer than the corresponding DP ones. (In fact, the extra cycle was
the first one, which converted the operands to DP, and set a bit to
cause the subsequent register write to transfer only the first 32 bits
of the result.) Multiply and divide were faster in SP than DP, though.

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (05/27/91)
> On 26 May 91 14:02:57 GMT, mccalpin@perelandra.cms.udel.edu, I said:
>>> On 25 May 91 22:25:51 GMT, henry@zoo.toronto.edu (Henry Spencer) said:

Henry> I will confine myself to observing that IBM hex FP is the only
Henry> FP format I know of that made half the FP instructions -- the
Henry> single-precision ones -- just about useless to most
Henry> programmers.

Me> I have to point out that the Cyber 205/ETA-10 32-bit FP is worse than
Me> the IBM's hex FP --- at least for the cases I tested. I discussed
Me> some of this in a paper in Supercomputer in 1987 or 1988 (issue 24, I
Me> believe). The more interesting part was an analysis of the roundoff
Me> error in the 1000x1000 LINPACK benchmark, which unfortunately did not
Me> make it into the paper in its completed form.

Just so no one can complain too much about "anecdotal evidence", here
are the results from my study:

        Errors in solution of 1000x1000 system of equations
               from the LINPACK benchmark suite

    machine                  precision    RMS error
    ---------------------------------------------------------
    ETA-10                    32-bit      2.2e-01
    IBM 3081                  32-bit      2.4e-03
    VAX 8700                  32-bit      3.9e-04
    IEEE (Sun-3)              32-bit      2.8e-04
    IBM RS/6000               32-bit      2.8e-04
    ETA-10                    64-bit      1.3e-08
    IBM RS/6000               64-bit      1.3e-10
    Cray X/MP                 64-bit      2.5e-11
    IBM 3090                  64-bit      5.8e-12
    IBM RS/6000 (-qnomaf)     64-bit      2.2e-12
    IEEE (Sun-3)              64-bit      2.3e-13
    VAX "D"-format            64-bit      7.2e-14
    ETA-10                   128-bit      1.6e-22
    Cray X/MP                128-bit      4.2e-26
    ---------------------------------------------------------

Notes:

(1) The LINPACK 1000x1000 matrix is set up so that the solution is a
vector whose elements are all equal to 1.0. The RMS error is
calculated by:

      err2 = 0.0d0
      do i=1,1000
         err2 = err2 + (x(i)-one)**2
      end do
      rms = sqrt(err2/999.)

(2) The reason for the difference between the IEEE and IBM RS/6000
errors is unclear to me. I got a long explanation from someone at IBM
Austin a year or two ago trying to explain why the enhanced accuracy
led to a larger error calculation -- but I never did really understand
it.
In an effort to check the accuracy of the IEEE error estimates, I
sorted the errors and then summed them in order of increasing
magnitude, and got identical results.... Note that the -qnomaf flag
prohibits the compiler from using the enhanced-accuracy combined
multiply/add instruction. This should give IEEE-compliant results, but
it does not.

(3) The IEEE results are from a Sun-3, but have been duplicated on
Sun-4 and SGI (MIPS) machines.

(4) I will post the answers for 128-bit arithmetic on the IBM RS/6000
as soon as I get the new compiler installed....
--
John D. McCalpin                    mccalpin@perelandra.cms.udel.edu
Assistant Professor                 mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.  J.MCCALPIN/OMNET
khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (05/28/91)
In article <1991May24.221018.18582@borland.com> sjc@borland.com (Steve Correll) writes:
>Never underestimate entropy. Though Sun has pretty much standardized on IEEE,
>their "f77" manual page lists the following options, several of which require
>you to know who made your floating-point chip: -cg87, -cg89, -fnonstd, -f3167,
>-f68881, -f80387, -ffpa, -ffpaplus, -fsoft, -fswitch, -fstore. :-)
Don't forget the most handy of them all
-fast
Which among other things, selects the optimal mode for the machine
that you are compiling on.
Note that for most of the options, what is selected is performance
.... if you don't care how fast it goes you don't need to say anything.
--
----------------------------------------------------------------
Keith H. Bierman keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33 | (415 336 2648)
Mountain View, CA 94043
dik@cwi.nl (Dik T. Winter) (05/28/91)
In article <MCCALPIN.91May27092950@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
> Just so no one can complain too much about "anecdotal evidence", here
> are the results from my study:

I did believe you before you posted these results. The remaining
question is: how much of the bad ETA-10 results is due to software,
and how much is due to hardware? I have seen the 205 compiler at work,
and the results were mediocre at least. Testing elementary functions I
found that the 'HALF PRECISION' (32 bits) arcsine gave better results
than the 'REAL' (64 bits) arcsine! I presume a software problem. On
the other hand, it is provable that the Cray can lose 5 ULPs in
precision on a division. I do not think the ETA-10 goes beyond 1.5 ULP
(and IEEE gives 0.5 ULP of course).

In general, when giving such a table you ought to give the precision
in mantissa size. I think it is:

    number             size       mantissa size
    ETA-10             32 bits    23 bits
    IBM 3081           32 bits    24 bits (6 hex digits)
    VAX/IEEE/RS6000    32 bits    24 bits
    ETA-10             64 bits    47 bits
    Cray               64 bits    48 bits
    IBM 3090           64 bits    56 bits (14 hex digits)
    VAX D              64 bits    56 bits
    IEEE/RS6000        64 bits    53 bits
    ETA-10            128 bits    95 bits (software)
    Cray              128 bits    96 bits (software)

However, due to the way the arithmetic works, I generally give
penalties both to ETA-10 (Cyber 205) and Cray arithmetic. In the case
of the ETA-10 (Cyber 205) I generally say that precision is 22 bits,
resp. 46 bits. For the Cray the penalty is 3 bits if division is
heavily used, 1 bit otherwise. For the 128-bit precision the penalty
is larger. For both, the arithmetic failures are documented, although
fairly inaccessibly.

I have no idea about the bad results of the RS6000. My only suggestion
is that there is probably some problem with the translation to and
from IEEE from and to the internal hex code used. (Yes, contrary to
most machines, the externally visible representation of FP numbers
does not match the internal representation on the RS6000.)
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
bob@tera.com (Bob Alverson) (05/28/91)
In article <MCCALPIN.91May27092950@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
>(2) The reason for the difference between the IEEE and IBM RS/6000
>errors is unclear to me. I got a long explanation from someone at IBM
>Austin a year or two ago trying to explain why the enhanced accuracy
>led to a larger error calculation -- but I never did really understand
>it. In an effort to check the accuracy of the IEEE error estimates,
>I sorted the errors and then summed them in order of increasing
>magnitude and got identical results....
>Note that the -qnomaf flag prohibits the compiler from using the
>enhanced accuracy combined multiply/add instruction. This should give
>IEEE compliant results, but it does not.

As an example of how MAF can give worse error, consider this:

    a = pi*pi;
    b = a - pi*pi;

If the second line is done with a single round, b is not zero.
However, if each operation is separately rounded, b comes out as zero,
as you might expect.

This situation is most troubling, since MAF is seen to both enhance
and reduce accuracy. The trick is figuring out which it does when.

Bob
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (05/28/91)
In article <5397@network.ucsd.edu> mbk@jacobi.ucsd.edu (Matt Kennel) writes:

| Con: Floating-point is only an approximation to reality anyway, so the
| business of "correctness" is silly. THe answer will never be truly Right,
| so who gives a flying rat's ass about eeking out that last bit of 'precision'.
| The IEEE standard only serves to make programs "wrong" in the same artificial
| way (instilling a false sense of security about the results), but ends up
| making computers alot slower and more expensive, only to please some anal ret-
| entive nerds in some ivory-tower committee. (:-))

Take a course in numerical analysis. A small reduction in accuracy can
result in a drop from a few significant digits to none, depending on
what you're doing, and how you're doing it. What good is an answer if
you lose that "last bit of precision?" In analysis of some problems
you may not have more than a few significant bits for starters, and I
would rather not trust my bits to a computer designed with the
marketing department winning compromises between "right" and "fast"
answers.
--
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"Most of the VAX instructions are in microcode, but halt and no-op are
in hardware for efficiency"
khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (05/29/91)
In article <3421@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
| Con: Floating-point is only an approximation to reality anyway, so the
| business of "correctness" is silly. The answer will never be truly Right,
| so who gives a flying rat's ass about eking out that last bit of 'precision'.
| The IEEE standard only serves to make programs "wrong" in the same artificial
| way (instilling a false sense of security about the results), but ends up
| making computers a lot slower and more expensive, only to please some anal
| retentive nerds in some ivory-tower committee. (:-))
| Take a course in numerical analysis. A small reduction in accuracy can

I second the motion. Ideally the course will expose you to case studies where you will see bad algorithms, bad implementations of good algorithms, ill-conditioned systems, and bad computer arithmetic (among other things). The really interesting lesson is that you often can't tell one failure from another from the symptoms. Having 1% bad fp arithmetic is like playing Russian roulette every morning. It will bite you someday. Unlike RR, the bad fp will just assist you in doing bad science. Perhaps that isn't a problem, just think of all the extra papers/research we all get to do to disprove the resulting drivel...
--
----------------------------------------------------------------
Keith H. Bierman keith.bierman@Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33 | (415 336 2648) Mountain View, CA 94043
news@inews.intel.com (news accounting id) (05/29/91)
> >In article <3421@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
> >> Take a course in numerical analysis. A small reduction in accuracy can
>
>I second the motion. Ideally the course will expose you to case
>studies where you will see bad algorithms, bad implementations of good
>algorithms, ill-conditioned systems, and bad computer arithmetic (among
>other things); the really interesting lesson is that you often can't
>tell one failure from another from the symptoms.

What you should do is use interval arithmetic. It uses twice the memory and less than 3 times the time*gates. By using interval arithmetic you at least know when something is messed up. In fact, use typed data and you'll know if your algorithms make sense.

From: mfineman@cadev6.intel.com (Mark Fineman ~) Path: cadev6!mfineman
(408) 765-4277; MS SC3-36, 3065 Bowers Ave, Santa Clara, CA 95052
{decwrl,hplabs,oliveb,amd,qantel}!intelca!mipos3!cadev6!mfineman
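Fineman's suggestion can be made concrete. A minimal sketch in Python (names iadd/imul are hypothetical; outward rounding is emulated with math.nextafter, available in Python 3.9+, where a hardware implementation would use the IEEE directed rounding modes):

```python
import math

# A toy closed interval is a (lo, hi) pair of doubles.  Widening each
# bound by one ulp with nextafter is coarser than the half-ulp a
# directed-rounding mode would give, but it guarantees the exact result
# is always bracketed.

def iadd(x, y):
    """Interval addition with outward rounding."""
    return (math.nextafter(x[0] + y[0], -math.inf),
            math.nextafter(x[1] + y[1], math.inf))

def imul(x, y):
    """Interval multiplication: extremes of the cross products, widened."""
    ps = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (math.nextafter(min(ps), -math.inf),
            math.nextafter(max(ps), math.inf))

d = 0.1                       # a double, not exactly one tenth
x = (d, d)                    # the degenerate interval [d, d]
lo, hi = imul(iadd(x, x), x)  # (d + d) * d, error bracketed
print(lo, hi)                 # a tight interval; the width reports the damage
```

This is the "you at least know when something is messed up" property: the true result always lies inside the interval, and a wide interval is a visible symptom rather than silently wrong digits.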
jbs@WATSON.IBM.COM (05/29/91)
I said:
>As to wrong answers, wrong answers are generally caused by
>users not knowing what they are doing.

D.C. Lindsay commented: I disagree. Dr. Kahan tells excellent anecdotes on this point, with examples of well-respected products (e.g. Mathematica, hand calculators) giving wrong answers to simple problems.

I don't see your point here. In the first place I said generally, not always. In the second place these examples support my point, since the users here are the writers of Mathematica and the power function on calculators, and they are making errors which have nothing to do with the quality of the floating point arithmetic they are using.

D.C. Lindsay also said: Seconded. People who think that the common alternatives are as well designed are probably unfamiliar with the sore points. For example, Cray format allows a value which will overflow if multiplied by 1 ...

IEEE format allows values which when multiplied by 0 do not equal 0, which is in my opinion worse.

I said:
>...don't believe it addresses the main sources of error. I also don't
>believe that IEEE FP is well-designed. Exactly why do you believe nu-
>merical problems using IEEE FP are more likely to be obvious and pre-
>dictable than numerical problems using IBM hex (for example)? ...

Henry Spencer replies: A proper response to this would basically be a detailed defence of IEEE FP's more controversial design decisions. I have neither the time nor, really, the expertise to do this. However, salvation arriveth from an unexpected direction. :-) Go read "What Every Computer Scientist Should Know About Floating-Point Arithmetic", by David Goldberg, in the latest (March) issue of ACM Computing Surveys; doing so will enlighten you in detail on the subject. I will confine myself to observing that IBM hex FP is the only FP format I know of that made half the FP instructions -- the single-precision ones -- just about useless to most programmers.

I checked the Goldberg reference.
He states in effect that the purpose of his paper is to explain IEEE floating point, not defend it. Your statements about IBM hex also are not responsive to the question I asked, which, to repeat, is: given that you have exceeded the precision provided, why does this produce "obvious and predictable" effects with IEEE as opposed to "subtle and mysterious" effects with IBM hex (or any other alternative)?

Bill Davidsen states: may not have more than a few significant bits for starters, and I would rather not trust my bits to a computer designed with the marketing department winning compromises between "right" and "fast" answers.

I believe you have things backwards here. Marketing departments generally love IEEE. After all, they don't have to implement it.

Keith H. Bierman states: Having 1% bad fp arithmetic is like playing russian roulette every morning. It will bite you someday. Unlike RR, the bad fp will just assist you in doing bad science. Perhaps that isn't a problem, just think of all the extra papers/research we all get to do to disprove the resulting drivel...

This would imply people using Crays are more likely to produce drivel than people using IEEE systems. I doubt this is true.

James B. Shearer
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (05/29/91)
In article <4451@inews.intel.com>, news@inews.intel.com (news accounting id) writes:
> What you should do is use interval arithmetic. It uses twice the
> memory and less than 3 times the time*gates. By using interval
> arithmetic you at least know when something is messed up. In fact,
> use typed data and you'll know if your algorithms make sense.

Support for interval arithmetic is precisely why IEEE arithmetic includes all those rounding modes.
--
Should you ever intend to dull the wits of a young man and to incapacitate his brains for any kind of thought whatever, then you cannot do better than give him Hegel to read. -- Schopenhauer.
henry@zoo.toronto.edu (Henry Spencer) (05/29/91)
In article <9105290600.AA28145@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
> I checked the Goldberg reference. He states in effect the pur-
>pose of his paper is to explain IEEE floating point not defend it.

Did you read the paper, or just the introduction? He does a fairly good job of justifying the various decisions as he goes.
--
"We're thinking about upgrading from | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 to SunOS 3.5." | henry@zoo.toronto.edu utzoo!henry
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (05/30/91)
> On 27 May 91 13:29:50 GMT, mccalpin@perelandra.cms.udel.edu (John D. McCalpin) said:
> On 26 May 91 14:02:57 GMT, mccalpin@perelandra.cms.udel.edu, I said:

Me> Errors in solution of 1000x1000 system of equations
Me> from the LINPACK benchmark suite
Me> machine                precision    RMS error
Me> ---------------------------------------------------------
Me> IBM RS/6000            64-bit       1.3e-10   <-- WRONG!
Me> IBM RS/6000 (-qnomaf)  64-bit       2.2e-12
Me> IEEE (Sun-3)           64-bit       2.3e-13
Me> ---------------------------------------------------------

Please note that the number indicated is WRONG. At least, I am not able to reproduce it. The reproducible number is 1.2e-12 for the RMS error. This is very close to the -qnomaf result, but still inexplicably about 5 times larger than the IEEE number. I apologize for the misunderstanding, and any resulting heartburn at IBM :-).....
--
John D. McCalpin  mccalpin@perelandra.cms.udel.edu
Assistant Professor  mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.  J.MCCALPIN/OMNET
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (05/30/91)
> On 29 May 91 20:46:41 GMT, mccalpin@perelandra.cms.udel.edu I said:

Me> RMS ERRORS IN SOLUTION OF LINPACK ORDER 1000 SYSTEM
Me> machine                precision    RMS error
Me> ---------------------------------------------------------
Me> IBM RS/6000            64-bit       1.3e-10   <-- WRONG!
Me> IBM RS/6000 (-qnomaf)  64-bit       2.2e-12   <-- WRONG!
Me> IEEE (Sun-3)           64-bit       2.3e-13
Me> ---------------------------------------------------------

The mystery is solved, thanks to James Shearer of IBM, who found a bug in one of the BLAS routines I was using. The IBM RS/6000 machines now reproduce the IEEE results and show an insignificant difference in accuracy between the results with the multiply-accumulate instruction and without it.

>Date: Wed, 29 May 91 20:30:40 EDT
>From: jbs@watson.ibm.com
>To: mccalpin
>Subject: linpack 1000
> The code you sent me appears to contain a bug. In the isamax
>routine there is a statement:
>      dmax=abs(dx(ix))
>I believe ix should be i. This causes the find pivot step to possibly
>find an incorrect pivot. This would explain an increased error in the
>result. When I change ix to i, I now get a rms error using the nomaf
>option of 2.27*10**-13 (2.24*10**-13 using maf) which agrees with the
>other IEEE machines. The results for the 3090 (IBM hex) also change
>(7.08*10**-13). It took me longer to find this than it should have
>since the VS Fortran compiler was warning me about possibly uninitial-
>ized variables (ddot appears to have the same problem although it is
>not used).
> James B.
Shearer

The table of results is now:
---------------------------------------------------------
Errors in solution of 1000x1000 system of equations
from the LINPACK benchmark suite
machine           precision    RMS error
---------------------------------------------------------
ETA-10            32-bit       2.2e-01
IBM 3081          32-bit       2.4e-03
VAX 8700          32-bit       3.9e-04
IEEE              32-bit       2.8e-04
ETA-10            64-bit       1.3e-08
Cray X/MP         64-bit       2.5e-11
IBM 3090          64-bit       7.1e-13
IEEE              64-bit       2.3e-13
VAX "D"-format    64-bit       7.2e-14
ETA-10            128-bit      1.6e-22
Cray X/MP         128-bit      4.2e-26
---------------------------------------------------------
John D. McCalpin - mccalpin@perelandra.cms.udel.edu
---------------------------------------------------------

Machines which have been tested with IEEE formats include: Sun-3, Sun-4, IRIS 4D, and IBM RS/6000. Curiously, the IRIS 3000 series, which used IEEE formats but did not use IEEE-compliant rounding, got slightly better results (1 bit or so). Note that the error which Shearer found in my code was only in my port of the DOUBLE PRECISION BLAS, so the 32-bit results and the 64-bit results on the machines for which REAL is 64 bits are probably good.

In Summary:
-----------
The original purpose of the post was to show that the Cyber 205/ETA-10 32-bit format was *much* worse in accuracy than the IBM HEX format, which these results show clearly (for this one test case). The IBM HEX results are only 2-3 bits worse than IEEE, which is seldom disastrous. If this 2-3 bit difference does cause your application serious trouble, then you should be running at the next higher precision on all the platforms....
--
John D. McCalpin  mccalpin@perelandra.cms.udel.edu
Assistant Professor  mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.  J.MCCALPIN/OMNET
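The indexing bug Shearer found is easy to see in a sketch of the isamax logic. This is a hypothetical Python rendering of the unit-stride BLAS loop (0-based indices), not the actual Fortran source; the comment marks where the broken port read the wrong element:

```python
def isamax(dx):
    """Return the index of the entry of largest magnitude --
    the pivot search used by the LINPACK factorization."""
    imax = 0
    dmax = abs(dx[0])
    for i in range(1, len(dx)):
        if abs(dx[i]) > dmax:
            imax = i
            dmax = abs(dx[i])  # the buggy port updated dmax from a stale
                               # strided index (dx(ix) instead of dx(i)),
                               # so later comparisons tested against the
                               # wrong running maximum and the routine
                               # could return a non-maximal pivot row
    return imax

print(isamax([1.0, -7.0, 3.0, 2.0, 5.0]))   # 1: |-7| is the largest
```

A wrong pivot does not make elimination fail outright; it just degrades stability, which is exactly why the symptom was a quietly inflated RMS error rather than a crash.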
dwells@fits.cx.nrao.edu (Don Wells) (05/31/91)
In article <MCCALPIN.91May30105154@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
... The IBM HEX results are only 2-3 bits worse than IEEE, which is
seldom disastrous. If this 2-3 bit difference does cause your
application serious trouble, then you should be running at the next higher
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
precision on all the platforms....
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I disagree with this statement. The 2-3 bit difference is very nearly one whole decimal digit. You need at least one digit for truncation guard (two is better), and you want to resolve the noise (about one more digit), so if your dynamic range is 4 digits (and quite a few measurement systems are this good today), you generally have just enough digits for signal processing with IEEE, but quite possibly not enough with hex normalization.
--
Donald C. Wells  Associate Scientist  dwells@nrao.edu
National Radio Astronomy Observatory  +1-804-296-0277
Edgemont Road  Fax= +1-804-296-0278
Charlottesville, Virginia 22903-2475 USA  78:31.1W, 38:02.2N
jbs@WATSON.IBM.COM (05/31/91)
I said:
> I checked the Goldberg reference. He states in effect the pur-
>pose of his paper is to explain IEEE floating point not defend it.

Henry Spencer said: Did you read the paper, or just the introduction? He does a fairly good job of justifying the various decisions as he goes.

I have looked through the paper ("What Every Computer Scientist Should Know About Floating-Point Arithmetic" by D. Goldberg, ACM Computing Surveys, Vol 23, 1991, p5-48). He provides examples of ways in which various features of IEEE arithmetic can be used. I don't consider the fact that a feature has some use justification for including it. Justification would also consider the cost of providing a feature and discussion of alternative ways of providing some or all of the benefits. A feature should be included only if this sort of analysis indicates a favorable cost/benefit ratio. The paper does little of this, and indeed the author states this is not the purpose of his paper.

Additionally, some of the examples of how to use IEEE seem ill-advised to me. I will discuss one example. On page 22 the author says (I have made small changes to the notation; fmax is the largest finite floating point number):

>Here is a practical example that makes use of infinity arithmetic. Con-
>sider computing the function x/(x**2+1). This is a bad formula, because
>not only will it overflow when x is larger than <sqrt(fmax)>, but infin-
>ity arithmetic will give the wrong answer because it will yield 0 rather
>than a number near 1/x. However x/(x**2+1) can be rewritten as 1/(x+
>1/x). This improved expression will not overflow prematurely and be-
>cause of infinity arithmetic will have the correct value when x=0: 1/(0+
>1/0)=1/(0+inf)=1/inf=0. Without infinity arithmetic the expression 1/
>(x+1/x) requires a test for x=0, which not only adds extra instructions
>but may also disrupt a pipeline.
>This example illustrates a general
>fact; namely, that infinity arithmetic often avoids the need for special
>case checking; however formulas need to be carefully inspected to make
>sure they do not have spurious behavior...

I have the following comments.
1) If the user uses the original "bad" formula out of carelessness or a belief that x will never be large, then as the author notes, IEEE arithmetic may compute a totally bogus result with no error indication (other than the setting of an overflow bit which nobody looks at). On other systems the error is usually more obvious.
2) An alternative way of dealing with the large x problem is:

      if(abs(x).le.1.0d18)then
         y=x/(1+x*x)
      else
         y=1/x
      endif

Compared to the author's solution, this has the following virtues:
   a) It is more portable.
   b) It will be faster (at least on scalar machines) because it uses a single floating point divide.
   c) It will work properly on denormalized inputs. The author's code fails whenever 1/x overflows but x is not 0. Note the original "bad" code handles this properly.
3) Coding intentional harmless overflows makes it very difficult to detect unintentional harmful overflows (such as the original formula can produce). Thus coding in the author's suggested style would appear to be dangerous.

To sum up, in my opinion all that this example shows is that infs provide a new way to screw up.

Another quote from this paper. In a footnote on page 17 the author states:
>According to Kahan, extended precision has 64 bits of significand be-
>cause that was the widest precision across which carry propagation
>could be done on the Intel 8087 without increasing the cycle time...

This may constitute justification to 8087 designers; I don't see why it should be for the rest of us.

James B. Shearer
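Shearer's comparison can be run directly. The sketch below is hypothetical Python (function names goldberg/shearer are mine); a small fdiv helper emulates IEEE nonstop division, since Python itself raises on float division by zero. It reproduces both the payoff Goldberg claims for 1/(x+1/x) and the denormal failure of point 2c:

```python
import math

def fdiv(a, b):
    """IEEE nonstop division: nonzero/0 yields a signed infinity instead
    of trapping (emulated because Python raises ZeroDivisionError).
    The 0/0 case, NaN in IEEE, is not needed here."""
    if b == 0.0 and a != 0.0:
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b

def goldberg(x):
    """1/(x + 1/x): no premature overflow, and infinity arithmetic
    makes goldberg(0.0) come out 0 with no explicit test."""
    return fdiv(1.0, x + fdiv(1.0, x))

def shearer(x):
    """The guarded form from the text: one branch, one divide."""
    if abs(x) <= 1.0e18:
        return x / (1.0 + x * x)
    return 1.0 / x

print(goldberg(0.0))        # 0.0 -- the infinity-arithmetic payoff
print(goldberg(1.0e300))    # ~1e-300, where naive x/(x*x+1) overflows
tiny = 5.0e-324             # a denormal: 1/tiny overflows to inf...
print(goldberg(tiny))       # ...so goldberg() returns 0.0, losing the answer
print(shearer(tiny))        # the guarded form still returns ~5e-324
```

The last two lines are Shearer's point 2c in action: 1/x overflows although x is not 0, so the "improved" expression silently returns 0 for a nonzero argument, while the guarded version does not.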
glew@pdx007.intel.com (Andy Glew) (06/01/91)
James B. Shearer says [to handle overflow problems]:
> if(abs(x).le.1.0d18)then
>    y=x/(1+x*x)
> else
>    y=1/x
> endif
>Compared to the author's solution, this has the following virtues
> a) It is more portable.

Minor point, but I have trouble imagining anything with a hard-wired constant in it to be portable.
--
Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, Hillsboro, Oregon 97124-6497
This is a private posting; it does not indicate opinions or positions of Intel Corp.
henry@zoo.toronto.edu (Henry Spencer) (06/01/91)
In article <9105310523.AA15861@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>Justification would also consider the cost of providing a feature and discus-
>sion of alternative ways of providing some or all of the benefits. A
>feature should be included only if this sort of analysis indicates a
>favorable cost/benefit ratio...

My understanding is that the funnier-looking features, in particular the infinities, NaNs, and signed zeros, mostly cost essentially nothing. Getting the rounding right reportedly takes work, but that one seems easy to support. As far as I know, the *only* thing in IEEE arithmetic whose cost/benefit ratio has been seriously disputed is denormalized numbers.

>arithmetic may compute a totally bogus result with no error indication
>(other than the setting of an overflow bit which nobody looks at). On
>other systems the error is usually more obvious.

Only if the author goes to substantially more trouble than merely looking at an overflow bit. This really sounds to me like the old "people wrote better when they had to retype a whole page to fix an error, because they had to think about it more" fallacy. People who took care before will continue to take care; people who didn't before won't now. The difference is that both are somewhat less likely to get plausible-looking garbage now. Nobody is proposing that IEEE arithmetic is a replacement for care and understanding. All it does is improve the odds that a given amount of care and understanding will produce meaningful answers.

>>According to Kahan, extended precision has 64 bits of significand be-
>>cause that was the widest precision across which carry propagation
>>could be done on the Intel 8087 without increasing the cycle time...
>
> This may constitute justification to 8087 designers, I don't
>see why it should be for the rest of us.
To me it sounded like "the precise number of bits was not thought to be crucial, so the limitations of a very important existing implementation were taken into consideration". If you don't like this, explain why the resulting number of bits is wrong or inadequate. If it's satisfactory, why do you care how it was arrived at? Standards decisions often turn on such seemingly trivial or ephemeral issues; what matters is the result. -- "We're thinking about upgrading from | Henry Spencer @ U of Toronto Zoology SunOS 4.1.1 to SunOS 3.5." | henry@zoo.toronto.edu utzoo!henry
pshuang@ATHENA.MIT.EDU (06/04/91)
Reading this thread, I get the impression that several people feel that IEEE compliance exacts a large toll in speed. I do not doubt this, but I was wondering if anyone had comparative statistics on the speed of floating point operations when performed by an IEEE-compliant package versus one which was not. Of course, to be meaningful the data would have to be for the same piece of hardware. Signing off, UNIX:/etc/ping instantiated (Ping Huang).
hays@iSC.intel.com (Kirk Hays) (06/05/91)
In article <9106040308.AA20570@W20-575-91.MIT.EDU>, pshuang@ATHENA.MIT.EDU writes:
|> Reading this thread, I get the impression that several people feel that
|> IEEE compliance exacts a large toll in speed. I do not doubt this, but
|> I was wondering if anyone had comparative statistics on the speed of
|> floating point operations when performed by an IEEE-compliant package
|> versus one which was not. Of course, to be meaningful the data would
|> have to be for the same piece of hardware.

On the iPSC/860 (and the Delta Touchstone System) we provide an IEEE compliant mode as default, and a non-IEEE mode, available via compilation/linking switches. The non-IEEE mode forces denormals to 0.0 when they are encountered, and does lossy division (loss of 3 ulps). In addition, the routines in libm.a are substituted with analogues that have been compiled in the non-IEEE mode. Codes that encounter denormals, do division, or call the libm.a routines that encounter denormals or do division, experience speedups. Without citing codes or exact numbers, I have seen speedups ranging from nothing to 500x. It's very application/dataset dependent. Just anecdotal evidence, worth no more than the bits it's written with. Note that handling of denormals and division on the i860 is done in software, like many other RISC implementations. I don't speak for Intel - they don't pay me enough to do that.
--
Kirk Hays - NRA Life. [hays@ssd.intel.com]
"Good ideas become hardware, bad ideas stay in software." - who said that?
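The denormal half of that non-IEEE mode is simple to picture. A sketch in Python (the helper name ftz is mine; sys.float_info.min is the smallest normalized double, so anything smaller in magnitude is a denormal):

```python
import sys

def ftz(x):
    """Flush-to-zero: what a non-IEEE 'fast' mode does to a denormal
    operand -- anything smaller in magnitude than the smallest normal
    number is silently replaced by zero."""
    if x != 0.0 and abs(x) < sys.float_info.min:
        return 0.0
    return x

print(ftz(1.0e-310))            # 0.0: a denormal, flushed
print(ftz(sys.float_info.min))  # unchanged: the smallest normal survives
```

The speedup comes precisely from this: the hardware never has to take the slow software path that gradual underflow requires, at the cost of an abrupt (rather than gradual) loss of the smallest values.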
jbs@WATSON.IBM.COM (06/05/91)
Ping Huang asked: Reading this thread, I get the impression that several people feel that IEEE compliance exacts a large toll in speed. I do not doubt this, but I was wondering if anyone had comparative statistics on the speed of floating point operations when performed by an IEEE-compliant package versus one which was not. Of course, to be meaningful the data would have to be for the same piece of hardware.

The following quote from "Machine Organization of the IBM RISC System/6000 processor" by G.F. Grohoski, IBM Journal of Research and Development, vol 34, 1990, p37-58 (p56-57) may give some indication: "... the RISC System/6000 uses the IEEE floating-point arithmetic format, while AMERICA used the IBM System/370 format. ... This degraded floating point performance substantially in peak floating-point loops. For example, using the 2D graphics example above, the RS/6000 machine takes seven cycles per loop iteration as opposed to four in AMERICA. On balance however this degradation is less severe; while the potential AMERICA LINPACK performance was approximately 15 MFLOPS, the RISC System/6000 achieves nearly 11 MFLOPS."

The main reason for the performance degradation appears to be the need for rounding in IEEE.

James B. Shearer
jbs@WATSON.IBM.COM (06/08/91)
I said:
> 1. The question is not what fortran requires but what IEEE imple-
>mented in fortran requires. One of the problems with IEEE's infs and
>NaNs is that they do not fit well with fortran.

Robert Firth said: Yes, that's the problem. The Fortran standard describes the implementation of arithmetic expressions in terms of the "mathematical" rules, that is, the mathematical properties of real numbers. It is the business of the implementation to provide a reasonable approximation that, as far as possible, obeys those rules. If it fails to do so, it's not a legitimate "fortran machine" and shouldn't claim to support ANSI F-77. But a language implementation must follow the standard. That is the whole point of standards, after all, that they circumscribe the permissible implementations. You can't correct bogus hardware by writing a bogus compiler.

Clearly it would be possible to write a compiler for an IEEE machine which performed all floating point operations by simulating IBM hex arithmetic (for example) using fixed point instructions. Such a compiler might obey the fortran standard. However it would be ludicrous to claim it was obeying the IEEE floating point standard. Are you saying it is impossible to write a compiler which simultaneously obeys the fortran standard and the IEEE floating point standard? Are you saying the fortran standard forbids floating point models which include infs? Do you plan to claim your compiler complies with the IEEE standard? This thread started with my claim that it is expensive for compilers to comply with the IEEE standard. I do not claim it is impossible.

I said:
> Do you replace 0*x with 0 when compiling without optimization?
>I think that if a (legal) program behaves differently when compiled with
>and without optimization that then the compiler is broken. Perhaps you
>disagree.

Robert Firth said: Yes, I disagree. The issue is the relevance of the difference. Is a compiler broken because an optimised program runs faster?
That's a difference in behaviour, but you presumably think it an acceptable one. Likewise, if a program transformation is permitted by the standard, a compiler is entitled to apply that transformation; it is the responsibility of the programmer to be aware which transformations are legal and not assume any of them won't happen.

To clarify, I was not referring to timing-type differences. Perhaps I should have said it is undesirable for a program to behave differently when compiled with and without optimization, regardless of whether this is technically permitted by the standard. (Actually I think it would be desirable for the standard to exactly specify the behavior of any legal program, but apparently it does not.) An example may explain why I believe this. Consider a user with a 100000 line fortran package. It works fine when compiled on other machines (with and without optimization). It works fine when compiled with your compiler without optimization. However it fails when compiled with your compiler with optimization. I believe it will be difficult to convince this user that the bug is in his package and not in your compiler, whatever the technical validity of your position.

James B. Shearer
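The 0*x folding under discussion is a concrete instance of the whole dispute: under IEEE rules the rewrite is not value-preserving, so a compiler that applies it only under optimization makes the two compilations observably differ. A quick check in Python, whose floats are IEEE doubles:

```python
import math

# IEEE arithmetic: 0 * x == 0 only for finite x.
print(0.0 * 5.0)         # 0.0
print(0.0 * math.inf)    # nan
print(0.0 * math.nan)    # nan

# So the compile-time rewrite 0*x -> 0 changes observable results
# exactly when x is an infinity or a NaN, which is why an IEEE-careful
# compiler cannot apply it even though "mathematical" rules permit it.
```

This is also the "values which when multiplied by 0 do not equal 0" that Shearer objects to earlier in the thread; the two camps simply weigh that property differently.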
preston@rutherfordia.rice.edu (Preston Briggs) (06/08/91)
jbs@WATSON.IBM.COM writes:
>>I think that if a (legal) program behaves differently when compiled with
>>and without optimization that then the compiler is broken.
>It works fine when compiled on other machines (with
>and without optimization). It works fine when compiled with your
>compiler without optimization. However it fails when compiled with
>your compiler with optimization. I believe it will difficult to con-
>vince this user that the bug is in his package and not in your compiler
>whatever the technical validity of your position

Often, compiler and machine differences expose "implementation dependencies" in programs. These are sometimes marked as "implementation dependent" in the language spec, but programmers _will_ play with fire. Alternatively, some language specs say things like: "the behaviour of a program doing thus-and-so is undefined." Compiler writers are free to make use of these "don't cares" to simplify their lives and/or speed the object code. I agree that it's difficult to convince users of this, but a baseball bat often suffices. Alternatively, pointing out the problem in the code, along with the relevant portion of the manual, sometimes suffices. The behaviour of a program on other machines, or with other compilers, or simply with the optimizer turned off, is not sufficient reason to convict the optimizer.

Preston Briggs
boundy@apollo.hp.com (David Boundy) (06/11/91)
In article <3421@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>In article <5397@network.ucsd.edu> mbk@jacobi.ucsd.edu (Matt Kennel) writes:
>
>| Con: Floating-point is only an approximation to reality anyway, so the
>| business of "correctness" is silly. The answer will never be truly Right,
>| so who gives a flying rat's ass about eking out that last bit of 'precision'.
>| The IEEE standard only serves to make programs "wrong" in the same artificial
>| way (instilling a false sense of security about the results), but ends up
>| making computers a lot slower and more expensive, only to please some anal
>| retentive nerds in some ivory-tower committee. (:-))
>
> Take a course in numerical analysis. A small reduction in accuracy can
>result in a drop from a few significant digits to none, depending on
>what you're doing, and how you're doing it. What good is an answer if
>you lose that "last bit of precision?" In analysis of some problems you
>may not have more than a few significant bits for starters, and I would
>rather not trust my bits to a computer designed with the marketing
>department winning compromises between "right" and "fast" answers.
>--
>bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
> "Most of the VAX instructions are in microcode,
> but halt and no-op are in hardware for efficiency"

Kind of between the lines of Bill's posting is the notion that "fully compliant with the strictest possible reading of the IEEE standard" is the definition of correct. This isn't too far from another common definition, "agreement with this other machine that I've been using for years and years is the definition of correct." Both of these are just wrong. "Correct" is what you'd get if we could do all the computations in infinite precision. Thus if you put perfect data in, you'd get perfect answers out, and the error bars on the output would only be error bars on the input.
The IEEE standard guarantees "correctly rounded" results for each intermediate computation -- which is to say that it guarantees "minimally damaged" results for each intermediate computation. But I don't know of anybody who's willing to pay for that many gates. In the real world, where real dollars must be traded off against "close enough", for most computations the 53 bits of an IEEE significand is "enough". But no one must ever make the mistake of calling these "correct". Unless care is taken to control rounding errors, Matt Kennel's characterization that IEEE guarantees only the reproducibility of garbage is pretty much the case.

-boundless
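Boundy's distinction is easy to exhibit: every step correctly rounded does not mean the whole computation is. A small check in Python (the fractions module supplies the infinite-precision result):

```python
from fractions import Fraction

# Every individual addition below is correctly rounded -- IEEE requires
# it -- yet the accumulated sum is not the correctly rounded value of
# the exact sum, because each step's half-ulp damage compounds.
s = 0.0
for _ in range(10):
    s += 0.1
print(s)                  # 0.9999999999999999

# The exact sum of these ten doubles, rounded just once:
exact = float(sum(Fraction(0.1) for _ in range(10)))
print(exact)              # 1.0
```

Ten correctly rounded additions land one ulp away from the single-rounding result; longer computations drift further, which is why "correctly rounded per operation" must never be confused with "correct".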
boundy@apollo.hp.com (David Boundy) (06/13/91)
In article <9106080100.AA09901@ucbvax.Berkeley.EDU> James B. Shearer writes:
>Perhaps I should have said it is undesirable for a program to behave
>differently when compiled with and without optimization regardless
>of whether this is technically permitted by the standard. (Actually
>I think it would desirable for the standard to exactly specify the
>behavior of any legal program but apparently it does not.) An example
>may explain why I believe this. Consider an user with a 100000 line
>fortran package. It works fine when compiled on other machines (with
>and without optimization). It works fine when compiled with your
>compiler without optimization. However it fails when compiled with
>your compiler with optimization. I believe it will difficult to con-
>vince this user that the bug is in his package and not in your compiler
>whatever the technical validity of your position.

Undesirable, yes. But also unavoidable in a market where performance is what sells machines. (Correctness, as perceived by the individual programmer, is not allowed to be compromised in the name of performance.) Language standards are contracts between language implementers and language users -- "If you program within these rules, we promise to give you a 'correct' rendering into the target instruction set." There are responsibilities for both parties. Just as the programmer is allowed to assume that a given set of features will be available and will have certain behaviors, so compiler writers are assured by the standard that no program will ever be presented that does thus-and-so; if a "program" *does* violate the standard, we're allowed to do any bloody thing we like with it (though we try to find some friendly way of rejecting it...). Thus, standards do "exactly specify the behavior of any legal program:" that's the definition of a "legal program."
Standards do not, however, try to specify the behavior of an arbitrary collection of characters presented to the compiler, in particular those that Mr. Turing assures us cannot be statically distinguished from legal programs. Note that the quoted paragraph breaks down long before we get to floating point. When the Apollo optimizer is turned all the way up, we take full advantage of freedoms in the standards, like order of expression evaluation, the rule that parameters to FORTRAN routines may not overlap each other, etc. Compilers with timid optimizers may accept an illegal program and do what the programmer expects; our speed-hungry-and-language-conforming customers would be upset if we bowed to the pressure to make more conservative assumptions, penalizing their programs, instead of pushing the non-conforming program back as "not a bug". But to narrow back down to floating point, it should be noted that I've never encountered a case of arithmetic that can be done at compile time whose operands are outside the set of normalized numbers, denorms, and zero. In other words, all the compile time arithmetic we currently do is legal under the fortran standard. (There's the issue that on our 68000 machines, we do arithmetic at compile time in 64 bits, but the same expression unoptimized will be evaluated at run time in 80 bits. Sigh.) We have a plethora of switches that allow the programmer to tell our compilers how much of the standard they want to violate in their source programs, and how much of the standard they're willing to let the compiler violate in order to speed their programs up. The default, of course, is to force both parties to fully observe the terms of the ANSI contract.

-boundless
------------------------------------
The opinions here are mine, not HP's -- but since my opinions find their expression in HP's software, the difference probably isn't practically distinguishable...