jbs@WATSON.IBM.COM (06/01/91)
Henry Spencer says:
>My understanding is that the funnier-looking features, in particular the
>infinities, NaNs, and signed zeros, mostly cost essentially nothing.

        I believe this statement is incorrect. Infs and nans complicate
the design of the floating point unit. Even if there is ultimately no
performance impact, this added design effort represents a very real cost.
Perhaps some hardware designers could comment.
        Also, as has already been pointed out, there is a software cost
due to such features as 0*x no longer being 0. In some cases this will
have a performance impact in that compilers will generate inferior code.

Henry Spencer says:
>Getting the rounding right reportedly takes work, but that one seems easy
>to support.

        As far as round to nearest goes, this does provide a real benefit.
I am not totally convinced it's worth the cost but a case can be made.
The case for the round to even (when halfway) seems much more dubious. I
believe this represents a substantial portion of the cost (comments?) and
the additional benefits seem quite tenuous. I don't see any good reason
for the other 3 modes.

        I said:
>arithmetic may compute a totally bogus result with no error indication
>(other than the setting of an overflow bit which nobody looks at). On
>other systems the error is usually more obvious.

Henry Spencer said:
>Only if the author goes to substantially more trouble than merely looking
>at an overflow bit. This really sounds to me like the old "people wrote
>better when they had to retype a whole page to fix an error, because they
>had to think about it more" fallacy. People who took care before will
>continue to take care; people who didn't before won't now. The difference
>is that both are somewhat less likely to get plausible-looking garbage
>now. Nobody is proposing that IEEE arithmetic is a replacement for care
>and understanding. All it does is improve the odds that a given amount of
>care and understanding will produce meaningful answers.

        I guess I was a bit obscure here. On IBM hex systems when floating
overflow occurs the default behavior is for an error message to be printed
which includes a traceback pointing to the offending instruction.
Execution then continues (with the max floating number as a fixup) but
will terminate if more than a small number (5?) of overflows occur. I was
under the impression that this was typical behavior for pre-IEEE systems.
Is that not the case?
        I consider this behavior much friendlier than typical IEEE
behavior which is to run to completion then output infs and nans (or
worse, plausible but wrong numbers) with no indication of where the error
occurred. It seems to me that the existence of infs in IEEE is increasing
the likelihood of obtaining plausible-looking garbage.

        I said, quoting the paper:
>>According to Kahan, extended precision has 64 bits of significand because
>>that was the widest precision across which carry propagation could be
>>done on the Intel 8087 without increasing the cycle time...
>
>       This may constitute justification to 8087 designers, I don't
>see why it should be for the rest of us.

Henry Spencer said:
>To me it sounded like "the precise number of bits was not thought to be
>crucial, so the limitations of a very important existing implementation
>were taken into consideration". If you don't like this, explain why the
>resulting number of bits is wrong or inadequate. If it's satisfactory,
>why do you care how it was arrived at? Standards decisions often turn on
>such seemingly trivial or ephemeral issues; what matters is the result.
        I was unaware the Intel 8087 could be considered a "very important
existing implementation" as compared to the DEC format, for example. Do
you have some evidence for this?
        79 bits is clearly an absurd length for a floating point format
(the same is true of the 43 bit single extended format). As well as being
very awkward, the extended lengths are too close in precision to the
unextended lengths. The rationale behind the extended formats (providing
extra precision for intermediate terms) is ridiculous. If the added
precision is available it should always be used; otherwise it is of little
value (see various comments on hex format).
                                        James B. Shearer
henry@zoo.toronto.edu (Henry Spencer) (06/02/91)
In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes: >>My understanding is that the funnier-looking features, in particular the >>infinities, NaNs, and signed zeros, mostly cost essentially nothing. > > I believe this statement is incorrect. Infs and nans compli- >cate the design of the floating point unit... I've had some mail from people who've designed FPUs. They say that the infs make no difference to speak of, nor do the quiet NaNs. They do say that the signalling NaNs are a pain -- as is anything that can cause an exception -- on a pipelined machine with tight constraints on exception behavior. > I guess I was a bit obscure here. On IBM hex systems when >floating overflow occurs the default behavior is for an error message >to be printed which includes a traceback pointing to the offending >instruction. Execution then continues (with the max floating number as >a fixup) but will terminate if more than a small number (5?) of over- >flows occur. I was under the impression that this was typical behavior >for pre-IEEE systems. Is that not the case? Not the ones I've used. You could tell them to do this, but then you can tell an IEEE system to do it too. > It seems to me that the existence of infs in IEEE is in- >creasing the likelihood of obtaining plausible-looking garbage. I'm not sure how. Certainly forcing a result to inf is much better than forcing it to some random number (e.g. the largest representable one), because infs propagate through successive calculations, and generally don't get turned back into plausible-looking garbage. >>To me it sounded like "the precise number of bits was not thought to be >>crucial, so the limitations of a very important existing implementation >>were taken into consideration"... > I was unaware the Intel 8087 could be considered a "very >important existing implementation" as compared to the DEC format for >example. Do you have some evidence for this? Its implementation was well underway at the time and it plausibly was going to sell far more FPUs than any DEC design. Besides, you're missing the point slightly: *in the context of IEEE arithmetic*, the 8087 was an important implementation. The DEC implementations were irrelevant to the question of how long IEEE extended formats should be, since they were not IEEE at all, the campaign to base IEEE on DEC FP having failed (according to Goldberg, mostly because DEC format had no room for things like infs and NaNs). > 79 bits is clearly an absurd length for a floating point >format (the same is true of the 43 bit single extended format)... They are minima, not precise requirements, as witness the people who implement single extended as 64-bit double so they can claim conformance with the standard's very strong recommendation that there be an extended format for every "normal" format. >... The rationale behind the extended formats: >providing extra precision for intermediate terms is ridiculous. If the >added precision is available it should always be used otherwise it is >of little value... Having had some need to compute hypotenuses without overflow in code that otherwise wanted to use single precision for speed, I'm afraid I disagree. Whether it is "of little value" depends a lot on what you are doing. -- "We're thinking about upgrading from | Henry Spencer @ U of Toronto Zoology SunOS 4.1.1 to SunOS 3.5." | henry@zoo.toronto.edu utzoo!henry
dik@cwi.nl (Dik T. Winter) (06/02/91)
In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
> I guess I was a bit obscure here. On IBM hex systems when floating
> overflow occurs the default behavior is for an error message to be
> printed which includes a traceback pointing to the offending
> instruction. Execution then continues (with the max floating number as
> a fixup) but will terminate if more than a small number (5?) of
> overflows occur. I was under the impression that this was typical
> behavior for pre-IEEE systems. Is that not the case?

No, it was not. Amongst the systems I used I have seen:
1. Replace the overflowed value by the largest representable value
   without any notification.
2. Immediately trap on overflow, do not continue.
3. Replace the overflowed value by an infinity which traps when used.
And more strategies have probably been implemented.

> I consider this behavior much friendlier than typical IEEE behavior
> which is to run to completion then output infs and nans (or worse,
> plausible but wrong numbers) with no indication of where the error
> occurred.

IEEE allows the user to specify that overflow traps. However, especially
on the RS6000, this leads to a terrific decrease in performance. So
apparently it is not cheap to trap.

> It seems to me that the existence of infs in IEEE is increasing the
> likelihood of obtaining plausible-looking garbage.

The likelihood is increased if you, the user, do not want to trap. But it
also leads to some nice properties. I disagree with the default behaviour
of IEEE, but I agree with the possibilities.

> 79 bits is clearly an absurd length for a floating point format (the
> same is true of the 43 bit single extended format). As well as being
> very awkward the extended lengths are too close in precision to the
> unextended lengths.

Perhaps, but note that the IEEE requirements for extended precision are
minimal requirements, so you may provide more in your implementation.
That all systems that do provide extended double precision do provide the
exact requirements might tell something. Note also that if a system
implements both IEEE single and IEEE double precision that double
precision can be seen as extended single precision (and I indeed do not
know a single system that implements true extended single precision).

> The rationale behind the extended formats: providing extra precision
> for intermediate terms is ridiculous. If the added precision is
> available it should always be used otherwise it is of little value
> (see various comments on hex format).

Providing extra precision for intermediate terms is *not* ridiculous.
Witness the maf instructions on the RS6000. Witness also the many old
papers that use inner products calculated in extra precision.

The problems with hex format are the following:
1. The wobbling precision means you must use the worst case for error
   estimates.
2. This makes single precision much more useless than binary single
   precision is already.
Note also that Blaauw reportedly has said that were he to design FP again
he would not use hex.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
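For illustration, here is a minimal sketch of asking for a trap on
overflow, written in modern Fortran with the IEEE_EXCEPTIONS intrinsic
module (which postdates this thread; whether halting is actually honoured
is processor dependent, so treat it as a sketch, not a guarantee):

    program trapdemo
       use, intrinsic :: ieee_exceptions
       implicit none
       real :: x
       ! Request a halt on overflow instead of the quiet default of
       ! returning +Inf and raising the sticky overflow flag.
       call ieee_set_halting_mode(ieee_overflow, .true.)
       x = huge(1.0)
       x = x * 2.0
       ! If halting is supported, execution never reaches this line.
       print *, x
    end program trapdemo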
cet1@cl.cam.ac.uk (C.E. Thompson) (06/03/91)
In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes: > > I guess I was a bit obscure here. On IBM hex systems when >floating overflow occurs the default behavior is for an error message >to be printed which includes a traceback pointing to the offending >instruction. Execution then continues (with the max floating number as >a fixup) but will terminate if more than a small number (5?) of over- >flows occur. I was under the impression that this was typical behavior >for pre-IEEE systems. Is that not the case? Not particularly typical, as other postings have pointed out. More to the point, what you are describing is the actions of a piece of user-mode software: the IBM Fortran runtime system. There is nothing in the IBM/3x0 architecture that specifies these fixup/termination actions; it just generates a program interrupt. Indeed, doing the fixup involves grubbing around in the failing instruction to find which floating point register to modify. Chris Thompson JANET: cet1@uk.ac.cam.phx Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk
colwell@pdx023.ichips (Robert Colwell) (06/03/91)
In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
Henry Spencer says:
My understanding is that the funnier-looking features, in particular the
infinities, NaNs, and signed zeros, mostly cost essentially nothing.
I believe this statement is incorrect. Infs and nans compli-
cate the design of the floating point unit. Even if there is ultimately
no performance impact, this added design effort represents a very real
cost. Perhaps some hardware designers could comment.
Ok, I designed the square root/divide hardware in the first generation of
Multiflow's machines. I'm mostly on Henry's side -- you do need to detect
inputs that can cause wrong answers (square root of negative, 0/0, etc.),
and it does take extra hardware and design time, but it isn't that hard,
and it doesn't affect the cycle time. Of the "nanocode" that controlled
this floating point unit, something like half dealt with getting
single/double divide/sqrt results, and the other half looked for
exceptions, rounding, and the like.
Getting the exceptions right was harder than getting the basic
functionality right, and was tougher to validate. We found four bugs in
this functional unit over the two years it was in use, and three were in
exceptions, not basic functionality. But it still isn't all that hard: I
spent three months coming up with the basic design from a standing start.
And since there was a standard, I could run industry standard validation
suites against it to build my own confidence in it. To me, that alone
proved the worth of the standard.
Getting the denorms right is not easy, and it does affect performance.
Especially in a statically scheduled machine, where the compiler was
expecting the result in cycle N, but your floating point unit just figured
out that it generated an underflow in cycle N-1 and must now do some
additional processing. This means that the floating unit must be able to
stall the machine, something it previously didn't need, and stall is
inevitably on the critical timing path.
Bob Colwell colwell@ichips.intel.com 503-696-4550
Intel Corp. JF1-19
5200 NE Elam Young Parkway
Hillsboro, Oregon 97124
jcallen@Encore.COM (Jerry Callen) (06/03/91)
In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes: > I guess I was a bit obscure here. On IBM hex systems when >floating overflow occurs the default behavior is for an error message >to be printed which includes a traceback pointing to the offending >instruction. Execution then continues (with the max floating number as >a fixup) but will terminate if more than a small number (5?) of over- >flows occur. I was under the impression that this was typical behavior >for pre-IEEE systems. Is that not the case? > I consider this behavior much friendlier than typical IEEE >behavior which is to run to completion then output infs and nans (or >worse plausible but wrong numbers) with no indication of where the >error occurred. This may be obvious, but let's not confuse "compiler and runtime system behavior" with "floating point hardware behavior." Yes, Fortran on 360/370 systems behaves as you describe. In PL/1 it is possible to completely ignore exponent overflow if you wish and get meaningless results. Similarly, IEEE overflow behavior is under programmer control. If you WANT to ignore overflow and then just check to see if some error occurred at the end of the calculation you may do so; if you want to know immediately, you can do that, too. > James B. Shearer -- Jerry Callen jcallen@encore.com
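A sketch of the deferred check Callen describes, in modern Fortran (the
IEEE_EXCEPTIONS intrinsic module is a later addition to the language and
is assumed to be available here):

    program flagcheck
       use, intrinsic :: ieee_exceptions
       implicit none
       real :: x
       logical :: overflowed
       integer :: i
       ! Clear the sticky flag, run the whole loop untrapped, and ask
       ! once at the end whether an overflow happened anywhere in it.
       call ieee_set_flag(ieee_overflow, .false.)
       x = 2.0
       do i = 1, 200
          x = x * x
       end do
       call ieee_get_flag(ieee_overflow, overflowed)
       if (overflowed) print *, 'overflow occurred somewhere in the loop'
    end program flagcheck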
jbs@watson.ibm.com (06/04/91)
        Several people have challenged my characterization of typical
behavior on fp overflow in pre-IEEE environments. In my defense I was not
intending to claim the exact behavior of IBM fortran was typical, just the
fact that x/(x*x+1) cannot silently produce garbage. Even so it appears
that pre-IEEE behavior was more diverse than I had realized.

        I said:
>... The rationale behind the extended formats: providing extra precision
>for intermediate terms is ridiculous. If the added precision is available
>it should always be used otherwise it is of little value...

Henry Spencer says:
>Having had some need to compute hypotenuses without overflow in code that
>otherwise wanted to use single precision for speed, I'm afraid I
>disagree. Whether it is "of little value" depends a lot on what you are
>doing.

        First, a nitpick: you are using the extra exponent range, not
extra precision. In deciding whether a feature is of value one must look
at alternatives. In your example the obvious alternative is to compute the
hypotenuses using double precision. I do not see any reason for an
intermediate precision. Btw, how exactly would you expose an intermediate
format to the user?

Dik Winter said (responding to the same statement):
>Providing extra precision for intermediate terms is *not* ridiculous.
>Witness the maf instructions on the RS6000. Witness also the many old
>papers that use inner products calculated in extra precision.
>The problems with hex format are the following:
>1. The wobbling precision means you must use the worst case for error
>   estimates.
>2. This makes single precision much more useless than binary single
>   precision is already.
>Note also that Blaauw reportedly has said that were he to design FP
>again he would not use hex.

        I believe the maf instructions omit the intermediate rounding step
primarily to improve performance. I believe the papers to which you refer
basically require a way to obtain the double precision product of two
single precision numbers. I consider this a somewhat different issue.
        I agree that binary formats are superior to hex in that they
provide 2 extra bits of precision. I don't agree (particularly for 64-bit
formats) that this 2 bits makes it significantly easier to avoid numerical
problems. If I were to design a FP format I would probably use binary
also. I am not objecting to the use of binary.
                                        James B. Shearer
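For concreteness, a sketch of the expression being argued about: under
default (non-trapping) IEEE single precision the first form quietly loses
the answer when the intermediate product overflows, while the rewritten
form discussed later in the thread keeps it.

    program overflowdemo
       implicit none
       real :: x
       x = 1.0e30
       ! x*x overflows to +Inf, so the quotient silently becomes 0.0
       ! instead of roughly 1.0e-30 ...
       print *, 'x/(x*x+1) = ', x / (x*x + 1.0)
       ! ... while the algebraically equivalent rewriting stays accurate.
       print *, '1/(x+1/x) = ', 1.0 / (x + 1.0/x)
    end program overflowdemo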
henry@zoo.toronto.edu (Henry Spencer) (06/04/91)
In article <9106032144.AA15477@ucbvax.Berkeley.EDU> jbs@watson.ibm.com writes: >>Having had some need to compute hypotenuses without overflow in code that >>otherwise wanted to use single precision for speed, I'm afraid I disagree. > >... In deciding whether a feature is of value one must >look at alternatives. In your example the obvious alternative is to >compute the hypotenuses using double precision... Right, that is, using extended single precision. That's why the extended precisions are there, and why 754 very strongly recommends that an extended precision be available for any normal precision. What if I'd been using double precision already? -- "We're thinking about upgrading from | Henry Spencer @ U of Toronto Zoology SunOS 4.1.1 to SunOS 3.5." | henry@zoo.toronto.edu utzoo!henry
jbs@WATSON.IBM.COM (06/04/91)
Jerry Callen says:
>This may be obvious, but let's not confuse "compiler and runtime system
>behavior" with "floating point hardware behavior." Yes, Fortran on
>360/370 systems behaves as you describe. In PL/1 it is possible to
>completely ignore exponent overflow if you wish and get meaningless
>results. Similarly, IEEE overflow behavior is under programmer control.
>If you WANT to ignore overflow and then just check to see if some error
>occurred at the end of the calculation you may do so; if you want to
>know immediately, you can do that, too.

        I believe most users will find it difficult to do these things.
I, for example, do not know how to do this in fortran on the IBM Risc
System 6000. This is a problem for other users on other machines as the
"How to detect NaN's" thread in comp.lang.fortran shows.
        Also it is desirable that any method for doing this should be
portable across IEEE systems (since IEEE is supposed to promote
portability) and have no significant performance impact when exceptions
do not occur (since otherwise users are encouraged to run in an unsafe
way and IEEE is supposed to promote safety).
                                        James B. Shearer
dik@cwi.nl (Dik T. Winter) (06/04/91)
In article <9106032144.AA15477@ucbvax.Berkeley.EDU> jbs@watson.ibm.com writes:
> I believe the papers to which you refer basically require a way to
> obtain the double precision product of two single precision numbers.
> I consider this a somewhat different issue.

It would be a different issue if it were true. But it is not true; the
papers I refer to are about the accumulation of an inner product in double
(extra) precision. See for instance:
    J.H. Wilkinson, Rounding Errors in Numeric Processes, Notes on Applied
    Science No. 32, HMSO, London; Prentice Hall, New Jersey, 1963.
(Yes, fairly old; it was already known at that time!)

> I agree that binary formats are superior to hex in that they provide
> 2 extra bits of precision.

A nit; they provide 3 more; nearly a digit.

> I don't agree (particularly for 64-bit formats) that this 2 bits makes
> it significantly easier to avoid numerical problems.

Oh, but they are, see the paper I mentioned above.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
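A sketch of the technique Wilkinson analyses: single precision data, with
the inner product accumulated in double precision and rounded back only
once at the end (the array contents here are arbitrary):

    program innerprod
       implicit none
       integer, parameter :: n = 1000
       real :: x(n), y(n)
       double precision :: acc
       integer :: i
       call random_number(x)
       call random_number(y)
       ! Accumulate in double precision; round to single only at the end.
       acc = 0.0d0
       do i = 1, n
          acc = acc + dble(x(i)) * dble(y(i))
       end do
       print *, 'accumulated in double:  ', acc
       print *, 'rounded back to single: ', real(acc)
    end program innerprod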
dricejb@drilex.UUCP (Craig Jackson drilex1) (06/04/91)
In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes: > > Henry Spencer says: >My understanding is that the funnier-looking features, in particular the >infinities, NaNs, and signed zeros, mostly cost essentially nothing. > > I believe this statement is incorrect. Infs and nans compli- >cate the design of the floating point unit. Even if there is ultimately >no performance impact, this added design effort represents a very real >cost. Perhaps some hardware designers could comment. > Also as has already been pointed out there is a software cost >due to such features as 0*x no longer being 0. In some cases this will >have a performance impact in that compilers will generate inferior code. There is a significant software cost to not having NaNs--simulating them. The reality is that data are often missing, and systems need to deal with that fact. One can either store the information as to whether a given 'double' is a valid floating point number as part of its bit representation, or one can store it elsewhere. (Perhaps as a bitmap attached to each array in the application system.) Many older systems used some valid floating point number, such as '-999', as their missing-data specifier. Such a convention is highly portable, but runs the risk of valid data being interpreted as missing data. When NaNs are available, it becomes possible to use them to signify missing data. Generally, normal NaN-producing operations will only produce a few of the possible NaN bit patterns, and the remaining ones can be used to code application-specific information about missing data. On other systems there may be a bit that does not participate in arithmetic. There is such a bit in the Burroughs Large Systems architecture, a relic of the B5500 representation. This has successfully allowed us to represent missing data in the absence of NaNs. When the same system was ported to the IBM 370 architecture, I used non-zero unnormalized floating point numbers to represent these values, but with much less confidence. > I guess I was a bit obscure here. On IBM hex systems when >floating overflow occurs the default behavior is for an error message >to be printed which includes a traceback pointing to the offending >instruction. Execution then continues (with the max floating number as >a fixup) but will terminate if more than a small number (5?) of over- >flows occur. I was under the impression that this was typical behavior >for pre-IEEE systems. Is that not the case? As others have pointed out, this was the behavior of a particular run-time system on IBM 370s. By default, the hardware simply trapped on overflows. This behavior assumes that the user cares about the overflows. If the overflows were insignificant, then extra (possibly non-portable) coding had to be done to avoid the messages. > I consider this behavior much friendlier than typical IEEE >behavior which is to run to completion then output infs and nans (or >worse plausible but wrong numbers) with no indication of where the >error occurred. If running to completion, ignoring all creation of Infs and NaNs, was the IEEE default, then I would agree with you. I worked on a CDC 6400, where this was the behavior, and sometimes I really wished I could get a trap at the time of a divide-by-zero. However, I believe that the standard specifies that by default all traps are enabled. > James B. Shearer Also, we have software that does not care where in a calculation that an overflow or divide-by-zero occurred, just whether it occurred. 
Being able to turn off trapping greatly simplifies the coding of this application. -- Craig Jackson dricejb@drilex.dri.mgh.com {bbn,axiom,redsox,atexnet,ka3ovk}!drilex!{dricej,dricejb}
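A sketch of the missing-data convention Jackson describes, in modern
Fortran (the IEEE_ARITHMETIC intrinsic module is assumed; the data are
invented for the example):

    program missing
       use, intrinsic :: ieee_arithmetic
       implicit none
       real :: temp(5), s
       integer :: i, n
       temp = (/ 12.5, 13.1, 0.0, 14.0, 12.9 /)
       ! Mark the third observation as missing with a quiet NaN.
       temp(3) = ieee_value(temp(3), ieee_quiet_nan)
       s = 0.0
       n = 0
       do i = 1, 5
          if (.not. ieee_is_nan(temp(i))) then
             s = s + temp(i)
             n = n + 1
          end if
       end do
       print *, 'mean of the available observations:', s / real(n)
    end program missing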
dehnert@baalbek.asd.sgi.com (Jim Dehnert) (06/05/91)
In <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes: > Also as has already been pointed out there is a software cost >due to such features as 0*x no longer being 0. In some cases this will >have a performance impact in that compilers will generate inferior code. As a compiler writer who has worked with IEEE systems, I don't know why, at least in any area with an important impact on performance. The reason for the IEEE definition (0*Inf=NaN) is that zero may represent a result merely close to zero, and Inf really represents a very large number -- I can pick representatives of these sets which multiplied together give me a true result in the representable range (not zero or Inf). If I see a literal zero while compiling (the only case I can think of when the compiler does anything that might care about what 0*x equals), I feel quite free to assume that it is really zero, in which case 0*x is really zero even for very large x. So I will do the same thing I would have for non-IEEE. (This is the only case I'm aware of where IEEE defines 0*x != x.) > As far as round to nearest goes, this does provide a real >benefit. I am not totally convinced it's worth the cost but a case can >made. The case for the round to even (when halfway) seems much more >dubious. I believe this represents a substantial portion of the cost >(comments?) and the additional benefits seem quite tenuous. I don't >see any good reason for the other 3 modes. Round to +/- infinity give you the floor and ceiling functions, which come up frequently, for example in many transcendental function algorithms. > I said: >>arithmetic may compute a totally bogus result with no error indication >>(other than the setting of an overflow bit which nobody looks at). On >>other systems the error is usually more obvious. IEEE also provides for trapping bogus results if the appropriate flags are set -- then you don't need to look at the overflow bit. Of course, if you're using an IBM RS/6000, this goes many times slower, so perhaps you don't consider this an option... > I guess I was a bit obscure here. On IBM hex systems when >floating overflow occurs the default behavior is for an error message >to be printed which includes a traceback pointing to the offending >instruction. Execution then continues (with the max floating number as >a fixup) but will terminate if more than a small number (5?) of over- >flows occur. I was under the impression that this was typical behavior >for pre-IEEE systems. Is that not the case? It is the case. It's also perfectly achievable behavior for IEEE systems if exceptions are trapped. This has nothing to do with the IEEE standard. Instead, it's probably a result of default Unix software behavior, which is not to trap in most implementations. Of course, you don't have much choice on an RS/6000... > I consider this behavior much friendlier than typical IEEE >behavior which is to run to completion then output infs and nans (or >worse plausible but wrong numbers) with no indication of where the >error occurred. Much friendlier than non-trapping behavior on any FP system, that is... > It seems to me that the existence of infs in IEEE is in- >creasing the likelihood of obtaining plausible-looking garbage. No, not trapping increases that likelihood. Given that you're not trapping, the existence of Infs and NaNs makes it somewhat more likely that you'll get reasonable results (though personally I'd prefer the traps). 
> 79 bits is clearly an absurd length for a floating point >format (the same is true of the 43 bit single extended format). An interesting statement. Are you of the "only powers of two are real numbers" persuasion?
hrubin@pop.stat.purdue.edu (Herman Rubin) (06/05/91)
In article <1991Jun5.045615.8772@fido.wpd.sgi.com>, dehnert@baalbek.asd.sgi.com (Jim Dehnert) writes: > In <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes: ...................... > > 79 bits is clearly an absurd length for a floating point > >format (the same is true of the 43 bit single extended format). > > An interesting statement. Are you of the "only powers of two are real > numbers" persuasion? Why should it even be the case that the exponent and significand are in the same word? Or that multiple precision, or complex, numbers are in contiguous words? Even these give problems from the standpoint of vector machines. If we had the exponent and significand in different units, the separation of integer arithmetic from floating arithmetic might neve have occurred. Some machines do not even have separate units; they must have unnormalized arithmetic. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
dce@linus.mitre.org (Dale Earl) (06/05/91)
In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>
> Henry Spencer says:
>My understanding is that the funnier-looking features, in particular the
>infinities, NaNs, and signed zeros, mostly cost essentially nothing.
>
> I believe this statement is incorrect. Infs and nans complicate
>the design of the floating point unit. Even if there is ultimately
>no performance impact, this added design effort represents a very real
>cost. Perhaps some hardware designers could comment.

With my previous employer I helped design several floating point units,
and in my experience the most difficult part of the IEEE specification to
handle was the denormalized numbers. The infinities, NANs, signed zeros,
and rounding modes do not make the hardware substantially more complicated
or significantly decrease performance. As has been pointed out in other
related postings these features can be valuable for many numerical
algorithms. The value vs. cost for denormalized numbers though is not
favorable IMHO. I feel a representation for an infinitely small non-zero
value would be useful and much easier to implement in hardware. But if you
have to be able to say that your device is 100% IEEE floating point
compatible you get stuck implementing the entire standard no matter what
the hardware tradeoffs.

A related method we also employed on another device that did not have to
be 100% compliant may be of interest. This device performed both single
precision integer and floating point operations. Therefore we had a full
32 bit adder and multiplier available. For this device we made the on
board register file 40 bits wide, 8 bits for exponent and 32 bits for sign
and mantissa. Without going into all the details, the integer operands
used the 32 bit field and were processed in a normal manner (the exponent
field was passed unchanged). Floating point was performed using the full
32 bit mantissa with rounding only on output from the chip. This carried
an extra 8 bits of precision from intermediate operations which remained
within the chip. The chip also could extract the exponent field into an
integer so we could emulate multiple precision in software if desired. Not
exactly IEEE standard, but it probably would work well for many real
applications where better precision is more important than rigorous
numerical error bounding.
--
Dale C. Earl                                    dce@mitre.org
These are my opinions, only my opinions, and nothing but my opinions! DCE
jbs@WATSON.IBM.COM (06/06/91)
        I said:
> I believe the papers to which you refer basically require a way to
> obtain the double precision product of two single precision numbers.
> I consider this a somewhat different issue.

Dik Winter said:
>It would be a different issue if it were true. But it is not true; the
>papers I refer to are about the accumulation of an inner product in
>double (extra) precision. See for instance: J.H. Wilkinson, Rounding
>Errors in Numeric Processes, Notes on Applied Science No. 32, HMSO,
>London; Prentice Hall, New Jersey, 1963. (Yes, fairly old; it was
>already known at that time!)

        I have checked this reference (actually "Rounding Errors in
Algebraic Processes"). I believe it is consistent with what I said. For
example, p. 10 "Nearly all digital computers produce in the first
instance, the 2t-digit product of two t-digit numbers", p. 14 "The
subroutine accepts single-precision factors and gives a double precision
product", p. 32 "If we are working on a computer on which the inner
products can be accumulated, then there may be stages in the work when we
obtain results which are correct to 2t binary figures". Throughout this
section it is clear that what is being discussed is a way to obtain inner
products to twice working precision, not slightly more than working
precision as in the IEEE extended formats.

        I said:
> I agree that binary formats are superior to hex in that they provide
> 2 extra bits of precision.

Dik Winter said:
>A nit; they provide 3 more; nearly a digit.

        How do you get 3? Suppose the leading hex digit is 1, then we lose
3 bits from the leading zeros, 1 bit from the hidden 1, but gain 2 bits
from the smaller exponent field for a net loss of 2 bits.

        I said:
> I don't agree (particularly for 64-bit formats) that this 2 bits makes
> it significantly easier to avoid numerical problems.

Dik Winter said:
>Oh, but they are, see the paper I mentioned above.

        I found nothing in the above reference to justify this statement.
                                        James B. Shearer
jbs@WATSON.IBM.COM (06/06/91)
        I said:
> Also as has already been pointed out there is a software cost due to
> such features as 0*x no longer being 0. In some cases this will have a
> performance impact in that compilers will generate inferior code.

Jim Dehnert said:
>As a compiler writer who has worked with IEEE systems, I don't know why,
>at least in any area with an important impact on performance. The reason
>for the IEEE definition (0*Inf=NaN) is that zero may represent a result
>merely close to zero, and Inf really represents a very large number -- I
>can pick representatives of these sets which multiplied together give me
>a true result in the representable range (not zero or Inf). If I see a
>literal zero while compiling (the only case I can think of when the
>compiler does anything that might care about what 0*x equals), I feel
>quite free to assume that it is really zero, in which case 0*x is really
>zero even for very large x. So I will do the same thing I would have for
>non-IEEE. (This is the only case I'm aware of where IEEE defines
>0*x != x.)

        Some people write:
      IF(0.D0*X.NE.0.D0)THEN
to detect infs or NaNs (see recent posts to comp.lang.fortran). This works
on many systems. If you replace 0*x with 0 then it will not work on your
system and your system is broken.

Jim Dehnert also said:
>Round to +/- infinity give you the floor and ceiling functions, which
>come up frequently, for example in many transcendental function
>algorithms.

        This appears to apply just to convert to integer instructions.
Different modes could be offered for these instructions alone. In any case
I don't believe they are useful for transcendental function algorithms. In
fact the different rounding modes are a problem because the functions must
work for all modes.

Jim Dehnert also said (referring to trapping floating overflow):
>It is the case. It's also perfectly achievable behavior for IEEE systems
>if exceptions are trapped. This has nothing to do with the IEEE standard.
>Instead, it's probably a result of default Unix software behavior, which
>is not to trap in most implementations.

        Possibly I should be blaming Unix. I tend to associate the two.
Note in Goldberg's paper he suggests rewriting x/(1+x*x) in a way which
assumes trapping is not occurring.

        I said:
> 79 bits is clearly an absurd length for a floating point format (the
> same is true of the 43 bit single extended format).

Jim Dehnert also said:
>An interesting statement. Are you of the "only powers of two are real
>numbers" persuasion?

        I believe floating format sizes should be compatible with the
basic organization of the machine, i.e. if your machine is based on 8-bit
bytes the length best be a multiple of 8. I believe 32 or 64 bits are
typical widths of data paths in current machines. Therefore moving 79 bit
things around will clearly be wasteful. Consider a machine with 64 bit
floating registers and a 64 bit data path between the floating unit and
cache. I see no efficient way to include a 79 bit format. A 128 bit format
on the other hand is easily accommodated.

Herman Rubin asked:
>Why should it even be the case that the exponent and significand are in
>the same word?

        Because if they aren't every load or store will require 2 memory
references instead of 1, halving the speed in many cases.
                                        James B. Shearer
firth@sei.cmu.edu (Robert Firth) (06/06/91)
In article <9106052113.AA14439@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:

Jim Dehnert said:
> If I see a literal zero while compiling (the only
>case I can think of when the compiler does anything that might care
>about what 0*x equals), I feel quite free to assume that it is really
>zero, in which case 0*x is really zero even for very large x.

> Some people write:  IF(0.D0*X.NE.0.D0)THEN
>to detect infs or NaNs (see recent posts to comp.lang.fortran). This
>works on many systems. If you replace 0*x with 0 then it will not work
>on your system and your system is broken.

Well, as another compiler writer, I did what I always do when faced with a
conundrum such as this: refer to the standard. The relevant section in
this case is [ANSI X3.9-1978 sect 6.6.4], which says, very simply:

    Evaluation of Arithmetic Expressions
    ...
    Once the interpretation has been established in accordance with these
    rules, the processor may evaluate any mathematically equivalent
    expression, provided that the integrity of parentheses is not
    violated.

    Two arithmetic expressions are mathematically equivalent if, for all
    possible values of their primaries, their mathematical values are
    equal. However, mathematically equivalent arithmetic expressions may
    produce different computational results.

In other words: yes, Jim Dehnert is fully entitled to replace 0.0*X with
the mathematically equivalent expression 0.0; he is given explicit licence
to do so by the reference document, and anyone who writes rubbish like

    IF (0.0*X .NE. 0.0)

would be well advised to take the trouble to learn the language in which
he is attempting to write code.
henry@zoo.toronto.edu (Henry Spencer) (06/06/91)
In article <26594@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes: >In otherwords: yes, Jim Dehnert is fully entitled to replace 0.0*X >with the mathematically equivalent expression 0.0... Uh, sorry, if the number system you are using includes infinities, those two expressions are *not* mathematically equivalent. I don't remember whether X3.9 pins down the number system in sufficient detail to preclude this, although I doubt it, since infinities have been around in some floating-point boxes for a long time. -- "We're thinking about upgrading from | Henry Spencer @ U of Toronto Zoology SunOS 4.1.1 to SunOS 3.5." | henry@zoo.toronto.edu utzoo!henry
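Spencer's point is easy to demonstrate. A sketch in modern Fortran (the
IEEE_ARITHMETIC intrinsic module is assumed; a compiler that takes the
liberty Firth defends may of course fold the product, which is exactly the
dispute):

    program zerotimesinf
       use, intrinsic :: ieee_arithmetic
       implicit none
       real :: x, y
       ! Manufacture an infinity, as an earlier overflow or division by
       ! zero would have.
       x = ieee_value(x, ieee_positive_inf)
       y = 0.0 * x
       print *, '0.0 * x = ', y            ! NaN under IEEE, not 0.0
       print *, 'is it a NaN? ', ieee_is_nan(y)
    end program zerotimesinf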
seiler@vlsisj.uucp (%) (06/07/91)
I am not a numerical expert, but I was on the mailing list of the IEEE
floating point committee when the standard was created. At that time, I
was a chemistry grad student learning that Bevington's book did not
prepare a physical scientist for all the perverted things that computer
hardware did in the name of floating point.

In article <9106052113.AA14439@ucbvax.Berkeley.EDU>, jbs@WATSON.IBM.COM writes:
|>
|> Jim Dehnert also said
|> Round to +/- infinity give you the floor and ceiling functions, which
|> come up frequently, for example in many transcendental function
|> algorithms.
|>
|> This appears to apply just to convert to integer instructions.
|> Different modes could be offered for these instructions alone. In any
|> case I don't believe they are useful for transcendental function
|> algorithms. In fact the different rounding modes are a problem because
|> the functions must work for all modes.

The main justification for having all the rounding modes was that the
+/- infinity modes made creating an interval arith. package much easier.
As I recall, without them an interval package could run 300x's slower than
straight floating point. With them it was ca. 3x's slower. Besides, these
rounding modes were cheap.

|>
|> Possibly I should be blaming Unix. I tend to associate the
|> two.

What does floating point hardware have to do with an operating system?
Unix runs on mainframes and VAX's. IEEE floating point is used on PC's
(IBM clones, Mac's, Amiga's) as well as work stations.

|> I believe floating format sizes should be compatible with the
|> basic organization of the machine. IE if your machine is based on
|> 8-bit bytes the length best be a multiple of 8. I believe 32 or 64
|> bits are typical widths of data paths in current machines. Therefore
|> moving 79 bit things around will clearly be wasteful. Consider a
|> machine with 64 bit floating registers and a 64 bit data path between
|> the floating unit and cache. I see no efficient way to include a 79
|> bit format. A 128 bit format on the other hand is easily accommodated.
|> James B. Shearer

The floating point chips from Intel and Motorola have 80 bit floating
point registers. At the time, the standard developers decided that the
main formats should be a power of two in size so array addressing is fast,
that it is useful to have a format with a wider mantissa to have guard
bits, and that a hardware ALU finds it convenient to have the mantissa be
a power of two in size. Putting the last two points together gives a lower
limit of 79 bits. That is a weird number, so Intel and Moto. rounded up to
80. Actually, Moto. stores an extended to memory in 96 bits so it is long
word aligned. If your compiler has good register allocation, there will be
few extended loads and stores. I expect you know this already, but I am
trying to understand what you are saying above. Supporting any format
greater than the FP ALU is going to be slow. What IS so bad about
extended?

My experience in pre-IEEE FP days wasn't as nice as yours. CDC's that I
used did not trap at all on overflow. Instead the exponent wrapped around.
Not only did systems not convert numbers like 12.0 accurately, the error
was different for program literals than for numbers read by I/O. (For some
reason, this seemed unnatural to me then. After all, it can be exactly
expressed.) Generally, I found programs that worked on VAX's had problems
on IBM 360's and blew up badly on Cybers unless I was very paranoid.
Nowadays, we have occasionally found routines that work on IEEE
workstations (Apollo, Sun, HP) have had problems on VAX, which is my
favorite of the pre-IEEE formats.

Yours in number crunching,
Bruce Seiler
seiler@compass-da.com
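A sketch of why the directed roundings matter for interval arithmetic
(modern Fortran, IEEE_ARITHMETIC assumed; an aggressive optimizer could
precompute the sums, so this is illustrative only):

    program intervalsum
       use, intrinsic :: ieee_arithmetic
       implicit none
       real :: a, b, lo, hi
       a = 1.0 / 3.0
       b = 2.0 / 7.0
       ! Evaluate the same sum rounded down and then rounded up to get
       ! a guaranteed enclosure of the exact result.
       call ieee_set_rounding_mode(ieee_down)
       lo = a + b
       call ieee_set_rounding_mode(ieee_up)
       hi = a + b
       call ieee_set_rounding_mode(ieee_nearest)
       print *, 'a+b lies in [', lo, ',', hi, ']'
    end program intervalsum

Without the directed modes, each endpoint would have to be adjusted by
hand after every operation, which is presumably where the large slowdowns
Seiler remembers come from.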
jbs@WATSON.IBM.COM (06/07/91)
        Robert Firth quotes the fortran standard:
>    Evaluation of Arithmetic Expressions
>    ...
>    Once the interpretation has been established in accordance with
>    these rules, the processor may evaluate any mathematically
>    equivalent expression, provided that the integrity of parentheses
>    is not violated.
>
>    Two arithmetic expressions are mathematically equivalent if, for
>    all possible values of their primaries, their mathematical values
>    are equal. However, mathematically equivalent arithmetic
>    expressions may produce different computational results.
to conclude:
>In other words: yes, Jim Dehnert is fully entitled to replace 0.0*X with
>the mathematically equivalent expression 0.0;

        Some comments:
1. The question is not what fortran requires but what IEEE implemented in
fortran requires. One of the problems with IEEE's infs and NaNs is that
they do not fit well with fortran.
2. Appeals to a standard which predates IEEE are not totally convincing.
What does the Fortran 90 standard say?
3. In any case my reading of the standard is opposed to yours. Infinity is
a possible value of x, 0*infinity does not have mathematical value 0,
therefore 0*x is not mathematically equivalent to 0 in IEEE.

        Do you replace 0*x with 0 when compiling without optimization? I
think that if a (legal) program behaves differently when compiled with and
without optimization then the compiler is broken. Perhaps you disagree.
        How would you handle 0-x? Goldberg in his paper (p. 33)
specifically forbids replacing it with -x although the fortran standard
would appear to permit it since mathematically +0=-0.

Robert Firth also said:
>... anyone who writes rubbish like IF (0.0*X .NE. 0.0) would be well
>advised to take the trouble to learn the language in which he is
>attempting to write code.

        So how would you code this (detecting whether x is an inf or NaN)?
                                        James B. Shearer
firth@sei.cmu.edu (Robert Firth) (06/07/91)
In article <9106070109.AA02137@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:

A couple of quick responses:

> 1. The question is not what fortran requires but what IEEE implemented
>in fortran requires. One of the problems with IEEE's infs and NaNs is
>that they do not fit well with fortran.

Yes, that's the problem. The Fortran standard describes the implementation
of arithmetic expressions in terms of the "mathematical" rules, that is,
the mathematical properties of real numbers. It is the business of the
implementation to provide a reasonable approximation that, as far as
possible, obeys those rules. If it fails to do so, it's not a legitimate
"fortran machine" and shouldn't claim to support ANSI F-77.

But a language implementation must follow the standard. That is the whole
point of standards, after all, that they circumscribe the permissible
implementations. You can't correct bogus hardware by writing a bogus
compiler.

> 3. In any case my reading of the standard is opposed to yours.
>Infinity is a possible value of x, 0*infinity does not have mathematical
>value 0, therefore 0*x is not mathematically equivalent to 0 in IEEE.

I would rather say that IEEE introduces bit patterns that are not valid
representations of real numbers, and then prescribes largely arbitrary and
improper semantics for the behaviour of those bit patterns. The Fortran
standard refers to the real numbers, not to bit patterns.

> Do you replace 0*x with 0 when compiling without optimization?
>I think that if a (legal) program behaves differently when compiled with
>and without optimization then the compiler is broken. Perhaps you
>disagree.

Yes, I disagree. The issue is the relevance of the difference. Is a
compiler broken because an optimised program runs faster? That's a
difference in behaviour, but you presumably think it an acceptable one.
Likewise, if a program transformation is permitted by the standard, a
compiler is entitled to apply that transformation; it is the
responsibility of the programmer to be aware which transformations are
legal and not assume any of them won't happen.

> IF (0.0*X .NE. 0.0)
> So how would you code this (detecting whether x is an inf or NaN)?

At the correct level of abstraction. At the Fortran level, the abstraction
is that of the real numbers - as is made clear in the standard. At that
level, NaNs don't exist, so you can't look for them.

If you are asking about the bit-level representation of numbers, it is
necessary to go to the lower level of abstraction, the machine level. I
would write an Assembly code subroutine, call it something like

      LOGICAL FUNCTION ISANAN(X)

and comment it very, very carefully. Perhaps like this

C     On some machines, the representation of the Fortran
C     real numbers includes bit patterns that have no
C     valid interpretation as real numbers. The IEEE
C     representation is one such, for example. These
C     bit patterns are called 'NaN', "not-a-number".
C
C     Some simple algorithms used in numerical programming
C     will fail, giving bizarre and useless results, if any
C     of the supplied REAL input parameters in fact holds a
C     bit pattern that is NaN.
C
C     To guard against this, the programmer must test whether
C     a real variable holds a NaN. The following function
C     implements that test. Note that the test requires EXACT
C     KNOWLEDGE of the target representation of real numbers
C     and the semantics of target machine operations. The
C     implementation is therefore not portable.

As ever, that's just my opinion; feel free to disagree or criticise.
sjc@borland.com (Steve Correll) (06/08/91)
In article <9106070109.AA02137@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
> 2. Appeals to a standard which predates IEEE are not totally
>convincing. What does the Fortran 90 standard say?

The Fortran 90 standard (as of draft S8.115, at least) says the same
things the Fortran 77 standard does with regard to floating point. Fortran
existed long before the birth of IEEE floating point (and will probably
exist long after its death :-)) and the X3J3 committee, being sensitive to
the diversity of floating point implementations, was reluctant to
discriminate among them.

A previous posting quoted the Fortran standard somewhat selectively (ANSI
X3.9-1978 sect 6.6.4):

> ... Once the interpretation has been established in
> accordance with these rules, the processor may
> evaluate any mathematically equivalent expression,
> provided that the integrity of parentheses is not
> violated.
>
> Two arithmetic expressions are mathematically
> equivalent if, for all possible values of their
> primaries, their mathematical values are equal.
> However, mathematically equivalent arithmetic
> expressions may produce different computational
> results.

The missing rule is (ANSI X3.9-1978 sect 6.6): "Any numeric operation
whose result is not mathematically defined is prohibited in the execution
of an executable program. Examples are dividing by zero and raising a
zero-valued primary to a zero-valued or negative-valued power."

Who knows whether IEEE infinity is a "mathematically defined result"? If
you claim that it is, then a compiler which collapses 0*x => 0 violates
section 6.6.4. If you claim that it is not, then a program which even
generates IEEE infinity or NaN (not to mention a program which expects
them to propagate through computations in a well-defined fashion) violates
section 6.6.

The IEEE fp committee chose not to deal with existing language standards.
This hasn't prevented lots of people from trying to have their cake and
eat it too, by saying that IEEE floating point should have no impact on
compiler optimization, but the programmer should compute with infinity and
NaN anyway, and it's the programmer's fault if s/he doesn't know that some
IEEE-legal-but-Fortran-illegal features will burn you and others will not,
depending on the implementation in question. In practice, of course,
programmers will attempt to do exactly that, and lots of them will get
burned. You can do useful work with a compiler which doesn't de-optimize
its code to cater to the IEEE standard, but you do need to be careful
about it.
dab@ubitrex.mb.ca (Danny Boulet) (06/08/91)
In article <26665@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In article <9106070109.AA02137@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
> . . .
>Yes, I disagree. The issue is the relevance of the difference. Is a
>compiler broken because an optimised program runs faster? That's a
>difference in behaviour, but you presumably think it an acceptable one.
>Likewise, if a program transformation is permitted by the standard, a
>compiler is entitled to apply that transformation; it is the responsibility
>of the programmer to be aware which transformations are legal and not
>assume any of them won't happen.
>
>> IF (0.0*X .NE. 0.0)
>> So how would you code this (detecting whether x is an inf or NaN)?
>
>At the correct level of abstraction. At the Fortran level, the
>abstraction is that of the real numbers - as is made clear in the
>standard. At that level, NaNs don't exist, so you can't look for
>them.
>
>If you are asking about the bit-level representation of numbers, it
>is necessary to go to the lower level of abstraction, the machine
>level. I would write an Assembly code subroutine, call it something
>like
>
>     LOGICAL FUNCTION ISANAN(X)
>
>and comment it very, very carefully. Perhaps like this
>
>C    On some machines, the representation of the Fortran
>C    real numbers includes bit patterns that have no
>C    valid interpretation as real numbers. The IEEE
>C    representation is one such, for example. These
>C    bit patterns are called 'NaN', "not-a-number".
>C
>C    Some simple algorithms used in numerical programming
>C    will fail, giving bizarre and useless results, if any
>C    of the supplied REAL input parameters in fact holds a
>C    bit pattern that is NaN.
>C
>C    To guard against this, the programmer must test whether
>C    a real variable holds a NaN. The following function
>C    implements that test. Note that the test requires EXACT
>C    KNOWLEDGE of the target representation of real numbers
>C    and the semantics of target machine operations. The
>C    implementation is therefore not portable.

Assuming that your Fortran compiler isn't too clever, the following
detects a NaN:

      IF ( X .NE. X ) THEN
          PRINT *, 'X is a NaN'
      ENDIF

This takes advantage of the rule that says that NaNs are unordered.
Unfortunately, a clever Fortran compiler is strictly speaking (I think)
free to notice that "X .NE. X" is always false 'mathematically' and apply
the obvious optimization.

If you're willing to live with the risk that such an optimization might be
applied, you could probably come up with an equivalent trick for Infs. I'm
not totally sure, but this would probably work (there is almost certainly
an easier way but I don't have the IEEE spec handy):

      IF ( X .NE. X ) THEN
          PRINT *, 'X is a NaN'
      ELSE
          Z = { some expression that always yields a +Inf }
          Y1 = Z + X              { yields NaN if X is -Inf }
          Y2 = Z - X              { yields NaN if X is +Inf }
          IF ( Y1 .NE. Y1 .OR. Y2 .NE. Y2 ) THEN
              PRINT *, 'X is an Inf'
          ENDIF
      ENDIF

All bets are off if the target machine isn't IEEE or if the compiler is
too smart for your own good! Isn't writing portable code fun!!
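For completeness: the IEEE intrinsic module added to Fortran long after
this exchange makes both tests explicit, so an optimizer cannot legally
fold them away. A sketch, assuming the module is supported:

    program checknan
       use, intrinsic :: ieee_arithmetic
       implicit none
       real :: x
       x = ieee_value(x, ieee_quiet_nan)     ! manufacture a NaN for the demo
       if (ieee_is_nan(x)) print *, 'X is a NaN'
       x = ieee_value(x, ieee_positive_inf)  ! and now an infinity
       if (.not. ieee_is_finite(x)) print *, 'X is an Inf'
    end program checknan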
jbs@WATSON.IBM.COM (06/08/91)
Bruce Seiler says: The main justification for having all the rounding modes was the +/- infinity modes made creating an interval arith. package much easier. As I recall, without them an interval package could run 300x's slower than straight floating point. With them it was ca. 3x's slower. Besides, these rounding modes were cheap. I don't believe interval arithmetic is used enough to justify any hardware support. Providing quad precision (much more useful in my opinion) would also provide some support of interval arithmetic. In any case I believe your estimate of 3x slower than straight floating point is unrealistic. Does anyone have any examples of interval arith- metic packages? What is their speed compared to straight floating point? The rounding modes may appear cheap but they have a large hidden cost. Every intrinsic function subroutine for example must worry about handling all rounding modes. I said: |> I believe floating format sizes should be compatible with the |> basic organization of the machine. IE if your machine is based on |> 8-bit bytes the length best be a multiple of 8. I believe 32 or 64 |> bits are typical widths of data paths in current machines. Therefore |> moving 79 bit things around will clearly be wasteful. Consider a |> machine with 64 bit floating registers and a 64 bit data path between |> the floating unit and cache. I see no efficient way to include a 79 |> bit format. An 128 bit format on the other hand is easily accomodated. Bruce Seiler said: The floating point chips from Intel and Motorola have 80 bit floating point registers. At the time, the standard developers decided that the main formats should be a power of two in size so array addressing is fast, it is useful to have a format with a wider mantissa to have guard bits, and a hardware ALU finds it convenient to have the mantissa to be a power of two in size. Putting the last two points together gives a lower limit of 79 bits. That is a wierd number so Intel and Moto. rounded up to 80. Actually, Moto. stores an extended to memory in 96 bits so it is long word aligned. If your compiler has good register allocation, there will be little extended loads and stores. I expect you know this already, but I am trying to understand what you saying above. Supporting any format greater than the FP ALU is going to be slow. What IS so bad about extended? According to "Computer Architecture a Quantitative Approach" (Hennessy and Patterson, p.A-53, p. E-2) none of the MIPS R3010, Weitek 3364, TI 8847, Intel i860, Sparc or Motorola M88000 architectures implement IEEE double ex- tended. This represents a lot of independent support for my position that the double extended format is not cost effective. Apparently these vendors do not agree that it is convenient for the fraction to be a power of 2 in length. If your chip has 80 bit registers I fail to see how you can avoid performing 80 bit loads and stores while saving and restoring these registers. My objection to the IEEE double extended format is that it should be 128 bits wide and called quad precision. Also I do not agree that it is useful to have a format with a few extra (guard) bits in the fraction. James B. Shearer
chased@rbbb.Eng.Sun.COM (David Chase) (06/08/91)
jbs@WATSON.IBM.COM writes:
>If your chip has 80 bit registers I fail to see how you can avoid
>performing 80 bit loads and stores while saving and restoring these registers.

Other people have managed in the past.  Perhaps you are insufficiently imaginative.  The only problem comes when "machine epsilon" computed in registers does not match the "machine epsilon" stored in memory.  Otherwise, it is a big help when accumulating into a register (as in inner product, for example) since the extra bits are most helpful in guarding against lost precision.
I'm not certain I remember this correctly, but I recall that friends at Rice had this "problem" with a compiler that they wrote for a machine called the "PC/RT".  Note that the same problem ("excess precision") occurs on the RS/6000 in certain situations.

David Chase
Sun
dik@cwi.nl (Dik T. Winter) (06/08/91)
In article <9106052113.AA14439@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >       I said:
 > >         Also as has already been pointed out there is a software cost
 > >due to such features as 0*x no longer being 0.  In some cases this will
 > >have a performance impact in that compilers will generate inferior code.
I still have a question here: do you know realistic cases where inferior code will be generated?  I do not think that 'a = 0.0 * b' is realistic.

 >       Jim Dehnert said:
 >       and Inf really represents a very large number
Or a true infinity!

 >                   If I see a literal zero while compiling (the only
 > case I can think of when the compiler does anything that might care
 > about what 0*x equals), I feel quite free to assume that it is really
 > zero, in which case 0*x is really zero even for very large x.  So I
 > will do the same thing I would have for non-IEEE.  (This is the only
 > case I'm aware of where IEEE defines 0*x != x.)
In that case your compiler is not IEEE conformant.  Setting it to 0.0 even for a literal zero is wrong because the infinity might have come from a division by zero (and in that case it is a true infinity).

 (about rounding modes:)
 >       This appears to apply just to convert to integer instructions.
 > Different modes could be offered for these instructions alone.  In any
 > case I don't believe they are useful for transcendental function
 > algorithms.  In fact the different rounding modes are a problem because
 > the functions must work for all modes.
I agree that floor and ceiling are not the reasons for the different rounding modes.  The reason is simply interval arithmetic.  How would you get reasonable performance with interval arithmetic if only one rounding mode is available?  (See the papers from Kulisch et al. from Karlsruhe University, where they not only require multiple rounding modes but also an inner product with only a single rounding, to provide algorithms that give results with proven error bounds that are very small.  Standard numerical analysis only provides error bounds that are extremely pessimistic estimates in most cases.)

 (About trapping overflow.)
 >       Possibly I should be blaming Unix.  I tend to associate the two.
(Associating Unix and IEEE arithmetic.)  An interesting remark when we know that the first IEEE conformant implementation was the 8087 (not completely; it conformed to a draft).  I would not associate the 8087 with Unix; other operating systems come to mind.  You should however not blame Unix (or any OS), you should blame the compiler (and its run-time system).  Unix does not know anything about IEEE because it also runs on quite a few non-IEEE machines (PDP-11).  The compiler ought to provide (through library functions) access to the quirks of IEEE.

 > >         79 bits is clearly an absurd length for a floating point
 > >format (the same is true of the 43 bit single extended format).
I am interested how you arrive at 79 bits.  I count 80: sign (1 bit), exponent (15 bits) and mantissa (64 bits) for double extended.  Note: there is no implicit bit in the standard extended formats!

 >       Herman Rubin asked:
 > Why should it even be the case that the exponent and significand are in
 > the same word?
 >
 >       Because if they aren't every load or store will require 2 memory
 > references instead of 1, halving the speed in many cases.
Oh well, von Neumann advocated the use of scaled fixed point numbers.  He did not even consider floating point hardware.  But I agree here: when we have fp hardware it makes sense to have exponent and mantissa in the same unit of information.
--
dik t. winter, cwi, amsterdam, nederland    dik@cwi.nl
pmontgom@euphemia.math.ucla.edu (Peter Montgomery) (06/09/91)
In article <3652@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>I still have a question here: do you know realistic cases where inferior
>code will be generated?  I do not think that 'a = 0.0 * b' is realistic.

	The literal zero will probably not appear, but it will not be unusual to have

	const double coef_friction = 0.0;	/* Ignore friction */

early in a program and coef_friction multiplied by other quantities later on.  Macro expansion (or inline expansion of procedures) is another possibility, if one argument is 0.0.  A third possibility is loop unrolling:

	do i = 0, 2
	    deriv(i) = REAL(i)*poly(i)
	end do

will expand into deriv(0) = 0.0*poly(0), deriv(1) = 1.0*poly(1), deriv(2) = 2.0*poly(2).  The benefit of this unrolling really comes when it simplifies this to deriv(0) = 0, deriv(1) = poly(1), deriv(2) = poly(2) + poly(2).
--
        Peter L. Montgomery    pmontgom@MATH.UCLA.EDU
        Department of Mathematics, UCLA, Los Angeles, CA 90024-1555
If I spent as much time on my dissertation as I do reading news, I'd graduate.
dik@cwi.nl (Dik T. Winter) (06/09/91)
Aren't we getting a bit confused?  There are (in this case) two standards.  The first one is the language standard, the second is the IEEE standard for floating point arithmetic.  Now we find:

In article <9106070109.AA02137@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >       Do you replace 0*x with 0 when compiling without optimization?
 > I think that if a (legal) program behaves differently when compiled with
 > and without optimization that then the compiler is broken.  Perhaps you
 > disagree.
It is perfectly allowed for a program to behave differently with optimization if the program goes beyond what the language standard prescribes.  However, in this case the compiler can not claim to be IEEE conformant (and should say so) even if the box it runs on is IEEE conformant.

In article <26665@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
 > >     IF (0.0*X .NE. 0.0)
 > >     So how would you code this (detecting whether x inf or NaN)?
 > At the correct level of abstraction.  At the Fortran level, the
 > abstraction is that of the real numbers - as is made clear in the
 > standard.  At that level, NaNs don't exist, so you can't look for
 > them.
Unless the compiler claims to fully support the arithmetic of the box it runs on.  In that case it should be IEEE conformant, and hence not optimize such expressions.

(write a routine:)
 >     LOGICAL FUNCTION ISANAN(X)
Certainly, if you want to be portable and have your program run with 1. compilers that are not IEEE conformant, and/or 2. boxes that are not IEEE conformant.
--
dik t. winter, cwi, amsterdam, nederland    dik@cwi.nl
dik@cwi.nl (Dik T. Winter) (06/09/91)
In article <9106072346.AA08023@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >       I don't believe interval arithmetic is used enough to justify
 > any hardware support.
Well, IBM thought it important enough to provide support in some models (43xx).  Acrith is the keyword.

 >       Providing quad precision (much more useful in
 > my opinion) would also provide some support of interval arithmetic.
Nope, it would not provide support at all.  You can not do interval arithmetic with any extended precision without extensive software support.  On the other hand, proper rounding modes are sufficient.

 >       In any case I believe your estimate of 3x slower than straight
 > floating point is unrealistic.
How come unrealistic?  Example, interval add a to b giving c:

	set round to - inf;
	c.low = a.low + b.low;
	set round to + inf;
	c.upp = a.upp + b.upp;

why would this be more than 3x slower than straight floating point?  Now consider the same in software (assuming reasonable arithmetic):

	c.low = a.low + b.low;
	if (a.low < b.low) {
	    if (c.low - b.low > a.low) c.low = nextbefore(c.low);
	} else {
	    if (c.low - a.low > b.low) c.low = nextbefore(c.low);
	}
	c.upp = a.upp + b.upp;
	if (a.upp < b.upp) {
	    if (c.upp - b.upp < a.upp) c.upp = nextafter(c.upp);
	} else {
	    if (c.upp - a.upp < b.upp) c.upp = nextafter(c.upp);
	}

and consider the additional effort to implement nextbefore and nextafter.  And do not try this on a hex machine; it will not work.  So how would this benefit from some form of extended precision arithmetic?

 >       The rounding modes may appear cheap but they have a large hidden cost.
 > Every intrinsic function subroutine for example must worry about handling
 > all rounding modes.
Only if they are expecting overflow or underflow in some particular situations.  In that case they better worry anyhow, to prevent uncalled-for overflows and underflows.  The design of intrinsic functions with the correct rounding modes is of course a different art.  I think the Karlsruhe people have gone a long way along this path.

 >       According to "Computer Architecture a Quantitative Approach" (Hennessy
 > and Patterson, p. A-53, p. E-2) none of the MIPS R3010, Weitek 3364, TI 8847,
 > Intel i860, Sparc or Motorola M88000 architectures implement IEEE double
 > extended.  ...
 >       My objection to the IEEE double extended format is that it should be
 > 128 bits wide and called quad precision.
In that case none of the above machines would have implemented quad precision.  However, Motorola and Intel provide double extended, and the HP-PA architecture provides quad.  But on the current implementations quad on HP-PA is in software, so what is the advantage?
--
dik t. winter, cwi, amsterdam, nederland    dik@cwi.nl
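For the curious, the hardware-supported interval add above maps almost directly onto the C99 <fenv.h> rounding-mode interface; the following is a minimal sketch under that assumption (the interval struct and function names are invented, and how cheap the two mode switches are is exactly the machine-dependent question being argued):

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    typedef struct { double low, upp; } interval;

    /* Interval addition: round the lower bound down and the upper
       bound up, so the result is guaranteed to enclose the exact sum.
       Two rounding-mode changes plus two adds, versus one add for
       ordinary floating point. */
    interval interval_add(interval a, interval b)
    {
        interval c;
        int saved = fegetround();      /* preserve the caller's mode */

        fesetround(FE_DOWNWARD);
        c.low = a.low + b.low;
        fesetround(FE_UPWARD);
        c.upp = a.upp + b.upp;

        fesetround(saved);
        return c;
    }

    int main(void)
    {
        interval a = { 0.1, 0.1 };     /* 0.1 is not exact in binary */
        interval b = { 0.2, 0.2 };
        interval c = interval_add(a, b);
        printf("[%.17g, %.17g]\n", c.low, c.upp);  /* tight enclosure of 0.1 + 0.2 */
        return 0;
    }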
glew@pdx007.intel.com (Andy Glew) (06/10/91)
> > >         Also as has already been pointed out there is a software cost
> > >due to such features as 0*x no longer being 0.  In some cases this will
> > >have a performance impact in that compilers will generate inferior code.
>I still have a question here: do you know realistic cases where inferior
>code will be generated?  I do not think that 'a = 0.0 * b' is realistic.

Urghh....  I used to be of this opinion - until I started making heavy use of preprocessors so that a single "generic" code can serve multiple purposes.
Basically, I do poor man's partial evaluation - where I say "these parameters are fixed" - now give me an optimized program that evaluates the rest of the numbers I want.
In this programming model expressions like 0.0 * b arise frequently.  And it sure would be nice if they could be optimized away.
For an introduction to partial evaluation for scientific codes, read "Compiling Scientific Code Using Partial Evaluation", Andrew Berlin and Daniel Weise, IEEE Computer, December 1990.
--
Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, Hillsboro, Oregon 97124-6497
This is a private posting; it does not indicate opinions or positions of Intel Corp.
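To make the partial-evaluation scenario concrete, here is a hedged C sketch (the routine and parameter names are invented) of the kind of residue a specializer leaves behind: a generic routine with one coefficient pinned to 0.0, so 0.0 * b[i] terms appear that a compiler may only delete if it is allowed to assume b[i] is never an Inf or NaN:

    #include <stdio.h>

    /* Generic form: y[i] = alpha*a[i] + beta*b[i]. */
    void axpby(double alpha, const double *a,
               double beta,  const double *b, double *y, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = alpha * a[i] + beta * b[i];
    }

    /* "Partially evaluated" form with beta fixed at 0.0, roughly what a
       code generator emits.  The 0.0 * b[i] term is the expression under
       debate: under strict IEEE semantics it cannot simply be dropped,
       because b[i] might be an Inf or a NaN. */
    void axpby_beta0(double alpha, const double *a,
                     const double *b, double *y, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = alpha * a[i] + 0.0 * b[i];
    }

    int main(void)
    {
        double a[3] = {1.0, 2.0, 3.0}, b[3] = {4.0, 5.0, 6.0}, y[3];
        axpby_beta0(2.0, a, b, y, 3);
        printf("%g %g %g\n", y[0], y[1], y[2]);    /* 2 4 6 */
        return 0;
    }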
khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (06/10/91)
In article <9106072346.AA08023@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >       I don't believe interval arithmetic is used enough to justify
 > any hardware support.

This falls into the "H.R." chicken and egg rathole.  If interval arithmetic (or anything else new and claimed to be good) is 100x slower than the traditional approach, it will simply never get off the ground.
The members of IEEE 754 chose to include rounding modes; at this point we should either craft arguments for deleting them from a future version of the standard or debate something else.  Arguing for their inclusion in a system is now a moot point; they are part of the standard.

 >.... credentials...

Kahan was given the ACM Turing award for his contributions to floating point (most notably the IEEE standard).  I personally suspect this counts for more than some heated debates on usenet.  It is pointless for us to post our respective resumes.

 >... misc. fortran vs. ieee interactions ....

The ways various standards interact are usually unpleasant in edge cases (SVID vs. POSIX vs. ANSI C for example).  One hopes that in time, successful standards will see evolution _towards_ mutual compatibility.  In the meantime it is a series of "quality of implementation issues" for each vendor to cope with as best they can.
X3J3 and X3J11 both steered away from trying to make their languages fully IEEE 754 savvy.  Unfortunately, there does not yet seem to be a move afoot to correct this in their next efforts .... but "f90" is just out the door (standard-wise) so we have ample time to hope for the best from X3J3|WG5 next time ;>
It is unclear to _me_ if X3J11 or X3J16 is really the best forum for trying to fix C the next time .... there is a group called NCEG which has taken on the challenge of crafting proposals and finding an eventual venue.
As has been pointed out by others, there are very few computing environments which provide handles on the bulk of the IEEE 754 knobs.  This is unfortunate, as many of the benefits of IEEE 754 are lost thereby, which provides those who wish to propose weaker fp standards with fodder of the sort "no one uses those features anyway".
--
----------------------------------------------------------------
Keith H. Bierman    keith.bierman@Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33 | (415 336 2648)
Mountain View, CA 94043
dseal (David Seal) (06/11/91)
In article <1991Jun7.204910.878@borland.com> sjc@borland.com (Steve Correll) writes:
>...
>The missing rule is (ANSI X3.9-1978 sect 6.6):
>
>  "Any numeric operation whose result is not mathematically defined is
>  prohibited in the execution of an executable program.  Examples are
>  dividing by zero and raising a zero-valued primary to a zero-valued or
>  negative-valued power."
>
>Who knows whether IEEE infinity is a "mathematically defined result"?

I would think it isn't: it doesn't correspond to a single mathematical real number in the way that ordinary IEEE numbers do.

>If you claim that it is, then a compiler which collapses 0*x => x violates section 6.6.4.  If you claim that it is not, then a program which even generates IEEE infinity or NaN (not to mention a program which expects them to propagate through computations in a well-defined fashion) violates section 6.6.  The IEEE fp committee chose not to deal with existing language standards.
>
>This hasn't prevented lots of people from trying to have their cake and eat it too, by saying that IEEE floating point should have no impact on compiler optimization, but the programmer should compute with infinity and NaN anyway, and it's the programmer's fault if s/he doesn't know that some IEEE-legal-but-Fortran-illegal features will burn you and others will not, depending on the implementation in question.
>
>In practice, of course, programmers will attempt to do exactly that, and lots of them will get burned.  You can do useful work with a compiler which doesn't de-optimize its code to cater to the IEEE standard, but you do need to be careful about it.

I don't see any evidence here that the "IEEE fp committee chose not to deal with existing language standards".  The existence of a facility in IEEE arithmetic does *not* mean that every language has to supply it.
For instance, in this case the quoted part of the Fortran standard effectively means that non-numeric values like infinities and NaNs must not occur when a Fortran program is run, and that a run-time error should occur in cases where the result of an operation can only be non-numeric.  Unless the IEEE system's trap-handling is seriously deficient (e.g. by ignoring most of the "should" parts of the IEEE standard), this can be arranged by having the run-time system trap the IEEE invalid operation, divide-by-zero and overflow exceptions and produce appropriate run-time error messages.  As long as data is not initialised to non-numeric values, these trap handlers will ensure that a run-time error occurs as soon as any such value is produced.  (Ignoring such things as non-numeric values produced by writing to integer variables which are aliased with floating point variables - such tricks are highly system-dependent anyway.)  This means that non-numeric values are "quarantined", so that the compiler may safely use such optimisations as 0.0*X = 0.0.
In short, the IEEE fp committee seem to have dealt with existing language standards by providing a superset of their facilities, together with ways of trapping attempts to make use of the extra facilities when (as in the case of the quoted part of the Fortran standard) the language standard forbids their use.
If you don't want to make use of infinities and NaNs, a Fortran system that traps them as above will suit your purposes and will strictly conform to the Fortran standard.  If you do want to make use of them, the quoted part of the Fortran standard means that a strictly-conforming Fortran system is not for you.
You can't have it both ways...

Of course, some IEEE systems may be "broken" from this point of view.  E.g. if trapping the invalid operation, divide-by-zero or overflow exceptions slows down your IEEE system dramatically, you may be faced with the awkward choice of having your programs run slowly and strictly in accordance with the Fortran standard, or fast and in a non-conforming way.  I suspect many language implementors would choose the latter.  Then lots of programmers may get burned...

David Seal

Sorry about the bad "From" line.  This is on a list of things to be fixed here.  Correct email address is "dseal@armltd.co.uk".  All opinions are mine only...
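A minimal sketch of the trapping ("quarantine") arrangement described above, assuming a C system on IEEE hardware.  feenableexcept is a glibc extension rather than standard C, and a real Fortran run-time would install a handler that prints a traceback instead of taking the default fatal SIGFPE:

    #define _GNU_SOURCE
    #include <fenv.h>
    #include <stdio.h>

    int main(void)
    {
        /* Turn the three "non-numeric producing" IEEE exceptions into
           traps (SIGFPE).  After this, the first operation that would
           produce an Inf or a NaN aborts the run instead of letting a
           non-numeric value propagate silently. */
        feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW);

        volatile double x = 1.0, y = 0.0;
        printf("about to divide by zero...\n");
        double z = x / y;              /* traps here */
        printf("never reached: %g\n", z);
        return 0;
    }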
jpk@ingres.com (Jon Krueger) (06/11/91)
jbs@WATSON.IBM.COM (James B. Shearer) writes:
> One of the problems with IEEE's infs and
> NaNs is that they do not fit well with fortran.

Is this a problem with IEEE floats or with FORTRAN?  What exceptions for numeric computation would you define?  What language support would you specify for exception handling?  What default behaviors would you require at execution time?  What commercially available environments implement this?  Who has validated that they do?  For what languages?  On what platforms?

> I think that if a (legal) program behaves differently when compiled with
> and without optimization that then the compiler is broken.  Perhaps you
> disagree.

I take it you are unfamiliar with the concept of ambiguity in language definition.  It means that a single program can behave in two different ways without the language definition identifying either one as wrong.  Both ways are perfectly legal according to the standard.  That's why we call it ambiguous.  The optimizer is free to choose either way without violating the standard.
I take it you are also unfamiliar with programming language specifications which identify certain usages as undefined.  The specification states explicitly that the results are undefined, or not fully defined.  Programs that use them are not guaranteed to behave the same way on all compilers conforming to the language specification.  Legal programs can depend on them.  Unwise, perhaps; not very portable, certainly; but still perfectly legal programs in that language.  Again, the optimizer is free to generate a different behavior from the unoptimized one.
That this may in some cases be undesirable is not at issue here.  Providing consistent behavior for these programs falls under the category of quality of implementation, not standards conformance.  If inconsistent behavior is encountered here as a function of level of optimization, we may agree that the implementation is of poor quality.  But the compiler isn't broken; the code is.

> So how would you code this (detecting whether x inf or NaN)?
> IF (0.0*X .NE. 0.0)

In FORTRAN, I would code it

	IF (FLTINF(X) .OR. FLTNAN(X))

where FLTINF and FLTNAN are defined as returning true if X is set to IEEE infinity or IEEE not-a-number respectively, and false if X is set to anything else.  I know of no portable FORTRAN expression that can do this inline.
Implementation of FLTINF and FLTNAN would of course be system specific.  For instance on SunOS I might implement FLTINF as:

	LOGICAL FUNCTION FLTINF(X)
	REAL X
	IF (X .EQ. r_infinity()) THEN
	   FLTINF = .TRUE.
	ELSE
	   FLTINF = .FALSE.
	ENDIF
	RETURN
	END

On lesser operating systems correct implementation of FLTINF would require dropping into assembler.  This however is an implementation question for those who maintain FLTINF.  Those who call it may expect consistent behavior across compilers, operating systems, and levels of optimization.
I know of no relevant standard which makes the same guarantees about (0.0*X .NE. 0.0).  Perhaps you could help me by citing one.  Please explain exactly what it guarantees and where it applies.

-- Jon
--
Jon Krueger, jpk@ingres.com
henry@zoo.toronto.edu (Henry Spencer) (06/12/91)
In article <192@armltd.uucp> dseal (David Seal) writes: >>Who knows whether IEEE infinity is a "mathematically defined result"? If you > >I would think it isn't: it doesn't correspond to a single mathematical real >number in the way that ordinary IEEE numbers do. Why do you insist that the standard real numbers be the mathematical system used? Number systems which include infinities are neither novel nor somehow illegitimate. The real question about FORTRAN vs IEEE is how precise the FORTRAN standard is about the nature of the number system it is running on. Given that FP systems containing infinities are *not* new -- IEEE did not invent this idea -- I would be surprised if a careful reading of the standard mandated pure mathematical reals. (The real answer, of course, is that the FORTRAN standard was too early to even consider IEEE FP, and attempts to determine the interaction of the two by reading between the lines are a bit silly.) -- "We're thinking about upgrading from | Henry Spencer @ U of Toronto Zoology SunOS 4.1.1 to SunOS 3.5." | henry@zoo.toronto.edu utzoo!henry
jbs@WATSON.IBM.COM (06/12/91)
	I said:
>If your chip has 80 bit registers I fail to see how you can avoid
>performing 80 bit loads and stores while saving and restoring these registers.
	David Chase said:
Other people have managed in the past.  Perhaps you are insufficiently imaginative.
	Perhaps not.  Of course one way people have managed in the past is to only save and restore the first 64 bits of the register.  In my view this demonstrates excessive imagination.
                         James B. Shearer
jbs@WATSON.IBM.COM (06/12/91)
	I said:
>           I don't believe interval arithmetic is used enough to justify
> any hardware support.
	Dik Winter said:
Well, IBM thought it important enough to provide support in some models (43xx).  Acrith is the keyword.
	Ah Acrith, I will let Kahan comment ("Anomalies in the IBM Acrith Package" by W. Kahan and E. LeBlanc, version dated March 13, 1985, p. 9): "ACRITH is not at all typical of IBM products but seems instead first to have escaped prematurely from a research project and then to have evaded quality controls."
	I said:
>           In any case I believe your estimate of 3x slower than straight
> floating point is unrealistic.
	Dik Winter said:
How come unrealistic?  Example, interval add a to b giving c:
	set round to - inf;
	c.low = a.low + b.low;
	set round to + inf;
	c.upp = a.upp + b.upp;
why would this be more than 3x slower than straight floating point?
	Because it's 4 instructions vs. 1.  Also let's see your code for interval multiplication.
                         James B. Shearer
jbs@WATSON.IBM.COM (06/12/91)
	I said:
>           I don't believe interval arithmetic is used enough to justify
> any hardware support.
	Keith Bierman said:
This falls into the "H.R." chicken and egg rathole.  If interval arithmetic (or anything else new and claimed to be good) is 100x slower than the traditional approach, it will simply never get off the ground.
	Some comments:
	1.  Interval arithmetic is not "new".  People have been writing books and papers on it for at least 30 years.
	2.  Herman Rubin is still trying to get people to implement his proposed instructions.  The interval arithmetic people got their instructions into the IEEE standard and they have been implemented in hardware in many machines.  Interval arithmetic still hasn't gotten off the ground.  How long are we expected to wait until concluding it's not going to fly?
	3.  The real problem with interval arithmetic is not that it's slow but that on real applications its error estimates are generally so pessimistic as to be useless.  Extended precision generally gives better estimates with less work.
	There are many ideas which have seemed promising but never quite made it, bubble memory for example.  While judgement can never be final, automatic error estimation in general and interval arithmetic in particular seems unlikely to ever fulfill the hopes many had for it.
                         James B. Shearer
jbs@WATSON.IBM.COM (06/12/91)
	David Hough writes:
Part of the problem is that the benchmark programs in use measure only common cases.  Various versions of the Linpack benchmark all have in common that the data is taken from a uniform random distribution, producing problems of very good condition.  So the worst possible linear equation solver algorithm running on the dirtiest possible floating-point hardware should be able to produce a reasonably small residual, even for input problems of very large dimension.
	This is untrue.  Consider pivoting on the column element of smallest magnitude while using IEEE single precision.  I don't believe n has to be very large before you are in big trouble.
	David Hough writes:
Such problems may actually correspond to some realistic applications, but many other realistic applications tend at times to get data that is much closer to the boundaries of reliable performance.  This means that the same algorithms and input data will fail on some computer systems and not on others.  In consequence benchmark suites that are developed by consortia of manufacturers proceeding largely by consensus, such as SPEC, will tend to exclude applications, however realistic, for which results of comparable quality can't be obtained by all the members' systems.
	If you know of a single realistic application where the difference between 64 bit IEEE rounded to nearest and 64 bit IEEE chopped to zero makes a significant difference in the maximum performance obtainable I would like to see it.
                         James B. Shearer
msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) (06/12/91)
In <9106112357.AA18157@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes: > I said: >>If your chip has 80 bit registers I fail to see how you can avoid >>performing 80 bit loads and stores while saving and restoring these registers. > David Chase said: >Other people have managed in the past. Perhaps you are insufficiently >imaginative. > Perhaps not. Of course one way people have managed in the past >is to only save and restore the first 64 bits of the register. In my >view this demonstrates excessive imagination. > James B. Shearer Although 80 bit in-register intermediates can be useful, if all 80 bits aren't saved during context switches, you end up with random precision, and no one will be very happy. There have been machines in the past with this problem, if I remember correctly. -- < Michael Pereckas <> m-pereckas@uiuc.edu <> Just another student... > "This desoldering braid doesn't work. What's this cheap stuff made of, anyway?" "I don't know, looks like solder to me."
khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (06/12/91)
In article <9106120054.AA19903@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>   instructions into the IEEE standard and they have been implemented
>   in hardware in many machines.  Interval arithmetic still hasn't
>   gotten off the ground.  How long are we expected to wait until
>   concluding it's not going to fly.
Until vendors (other than apple and sun) provide hooks that people can
actually use. Had such hooks been part of _any_ language standard, I
suspect folks with such interests would have merrily coded up systems.
>   3. The real problem with interval arithmetic is not that it's
>   slow but that on real applications its error estimates are gener-
>   ally so pessimistic as to be useless.
At last, something we agree on! But I confess that I have not used it
extensively, so I am prepared to believe that it may have utility I am
not personally cognizant of. Interval techniques will have to be
deployed in large quantities before it can be counted out of the
running, IMHO.
--
----------------------------------------------------------------
Keith H. Bierman keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33 | (415 336 2648)
Mountain View, CA 94043
hrubin@pop.stat.purdue.edu (Herman Rubin) (06/12/91)
In article <KHB.91Jun11185538@chiba.Eng.Sun.COM>, khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
> In article <9106120054.AA19903@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>   instructions into the IEEE standard and they have been implemented
>   in hardware in many machines.  Interval arithmetic still hasn't
>   gotten off the ground.  How long are we expected to wait until
>   concluding it's not going to fly.
> Until vendors (other than apple and sun) provide hooks that people can
> actually use.  Had such hooks been part of _any_ language standard, I
> suspect folks with such interests would have merrily coded up systems.
>   3. The real problem with interval arithmetic is not that it's
>   slow but that on real applications its error estimates are generally
>   so pessimistic as to be useless.
> At last, something we agree on!  But I confess that I have not used it
> extensively, so I am prepared to believe that it may have utility I am
> not personally cognizant of.  Interval techniques will have to be
> deployed in large quantities before it can be counted out of the
> running, IMHO.

Interval techniques will often be needed when extreme accuracy is needed.  But in these cases, it will probably be necessary to use multiple precision anyway.  There are cheaper ways to analyze errors in most situations than interval arithmetic, but they involve using various precisions.  In some cases, a very good estimate can even be made in advance of any computation.
I suggest that someone who thinks interval arithmetic is that great try it on some such task as computing a convolution using the FFT.  There are many reasonable examples where one would even throw out the idea of using the faster FFT, even with multiprecision, in favor of methods not so prone to error.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054   hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)
dana@hardy.hdw.csd.harris.com (Dan Aksel) (06/12/91)
In article <9106112357.AA18157@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>       I said:
>>If your chip has 80 bit registers I fail to see how you can avoid
>>performing 80 bit loads and stores while saving and restoring these registers.
>-----------------------------------------------------------
>       Perhaps not.  Of course one way people have managed in the past
>is to only save and restore the first 64 bits of the register.  In my
>view this demonstrates excessive imagination.
>                        James B. Shearer

Solution is relatively simple.  Hardware provides a status register which indicates the precision of each register.  A computed GOTO tells which format to save each individual register in.  This may create too much overhead if a rapid context switch time is the critical path instead of memory space.
The only compromise I can see immediately is a two bit status field which tells the precision of the LARGEST format currently stored in the register file.  All registers are then saved in this format.  Not an optimal solution, but it works.

--- Dan Aksel
hrubin@pop.stat.purdue.edu (Herman Rubin) (06/13/91)
In article <3646@travis.csd.harris.com>, dana@hardy.hdw.csd.harris.com (Dan Aksel) writes:
> In article <9106112357.AA18157@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
| >       I said:
| >>If your chip has 80 bit registers I fail to see how you can avoid
| >>performing 80 bit loads and stores while saving and restoring these registers.
| >-----------------------------------------------------------
| >       Perhaps not.  Of course one way people have managed in the past
| >is to only save and restore the first 64 bits of the register.  In my
| >view this demonstrates excessive imagination.
| >                        James B. Shearer
> Solution is relatively simple.  Hardware provides a status register which
> indicates the precision of each register.  A computed GOTO tells which format
> to save each individual register in.  This may create too much overhead if a
> rapid context switch time is the critical path instead of memory space.
>
> The only compromise I can see immediately is a two bit status field which tells
> the precision of the LARGEST format currently stored in the register file.  All
> registers are then saved in this format.  Not an optimal solution but it works.

For saving and restoring individual 80-bit registers, why not use either a 64+16 or a 96 or even 128 bit memory unit?  If a bunch of registers are to be saved, to be restored later, the hardware could set up the blocks of 16 and 64 bit pieces.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054   hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/13/91)
In article <1991Jun10.234313.2157@ingres.Ingres.COM>, jpk@ingres.com (Jon Krueger) writes:
> jbs@WATSON.IBM.COM (James B. Shearer) writes:
> > One of the problems with IEEE's infs and
> > NaNs is that they do not fit well with fortran.
> Is this a problem with IEEE floats or with FORTRAN?

With all this noise about IEEE 754/854 and Fortran and 0.0*X and so on, I thought it was time to hear a voice from the past.

    We avoided algebraic constructions leading to known numerical
    problems.  For example, when floating-point arithmetic is truncated
    or rounding occurs before renormalization, the expression 2.0*x
    often contains error.  We used the expression x+x instead, because
    it is always error-free on binary machines and never worse than
    2.0*x on other machines.  Similarly, the expression 1.0-x was
    replaced, on machines that lacked guard digits, by the expression
    (0.5-x)+0.5, to protect precision for 0.0 < x < 1.0.
    ...
    Perhaps the most elegant example of error pinpointing occurred in
    initial tests at Wisconsin for the complete elliptic integral K(m).
    For m close to 1.0, function accuracy depends critically upon
    accurate evaluation of the expression (1.0 - m).  As described
    earlier, this expression was replaced by the more stable expression
    (0.5 - m) + 0.5.  Despite this tactic, the tests detected large
    errors for m close to 1.0.  ...  Univac's optimizing compiler had
    ``helpfully'' restored the earlier expression ... [we] designed a
    simple counter-measure.

	-- "FUNPACK -- a Package of Special Function Routines", W. J. Cody.

MORAL: _any_ compiler which applies "algebraic" identities that apply to mathematical reals but not to the target machine's floats is going to make it rather harder for people to write accurate numerical code.

In another chapter of the same book (Sources and Development of Mathematical Software), Cody observes

    High-quality software is supposed to be fail-safe and transportable;
    it must work properly regardless of quirks in the host arithmetic
    system.  Software production is seriously hampered when computer
    arithmetic violates simple arithmetic properties such as

	1.0 * X = X
	X * Y = Y * X
    and
	X + X = 2.0 * X

    There exist mainframes of recent [1980] design in which each of
    these properties fails for appropriate floating-point X and Y.
    [He's talking about *finite* values, folks.  Not inf, nan, or eps.]
    Worse yet, on some machines there exist floating-point X > 0.0
    such that

	1.0 * X = 0.0
	X + X = 0.0
    or
	SQRT(X)**2 = overflow or underflow.

    ALL THESE ANOMALIES ARE TRACEABLE TO ENGINEERING ECONOMIES.
    Computer architects repeatedly ignore complaints about such
    mathematical atrocities, and new anomalies seem to appear with
    each new machine.  ...  Not only is [IEEE 754] free of anomalies,
    but it also contains new features SPECIFICALLY REQUESTED AND
    DESIGNED BY NUMERICAL ANALYSTS WITH SOFTWARE EXPERIENCE.

Elsewhere in the same book (a collection of papers edited by W. R. Cowell and published in 1984) someone else complains that on the Cray-1 there is a _finite_ value X such that 1.0*X overflows.  What price the optimisation 1.0*X => X on that machine?

I suggest that if anyone wants to partially evaluate something like sum(i*c(i+1), i = 0,n) and have it turn first into 0*c(1) + 1*c(2) + 2*c(3) and then into c(2) + 2*c(3), the appropriate thing is for the partial evaluator to do that, not the compiler, and to do it only when licensed to by an explicit declaration "USE-REAL-AXIOMS(c(*))" (or when it knows that the target machine obeys the laws 0*x = 0, 1*x = x, 0+x = x).
It took me a couple of years to get over the "optimisation" I was taught to use of replacing X/Y by X*YINV, with YINV precalculated; the last thing I want is an ``optimising'' compiler putting the less accurate expression back. -- Q: What should I know about quicksort? A: That it is *slow*. Q: When should I use it? A: When you have only 256 words of main storage.
dgh@validgh.com (David G. Hough on validgh) (06/13/91)
Responding to my earlier comments, in article <9106120131.AA20868@ucbvax.Berkeley.EDU>, jbs@WATSON.IBM.COM writes:
>
>> Various versions of the Linpack benchmark all have in common
>> that the data is taken from a uniform random distribution, producing problems
>> of very good condition.  So the worst possible linear
>> equation solver algorithm running on the dirtiest possible floating-point
>> hardware should be able to produce a reasonably small residual, even for
>> input problems of very large dimension.
>
>       This is untrue.  Consider pivoting on the column element of
> smallest magnitude while using IEEE single precision.  I don't believe
> n has to be very large before you are in big trouble.

I should have said "the worst PROBABLE algorithm" - no pivoting at all rather than aggressively choosing the worst possible pivot.  It ought to be possible to try this out by disabling the pivot selection in the Linpack benchmark.

>> This means that the same algorithms and input data
>> will fail on some computer systems and not on others.
>
>       If you know of a single realistic application where the difference
> between 64 bit IEEE rounded to nearest and 64 bit IEEE chopped to zero
> makes a significant difference in the maximum performance obtainable
> I would like to see it.

I doubt you'll find anything like this in normal scientific computation - the difference in problem domain between floating-point systems with identical exponent and significand fields and rounding schemes comparable in cost and care is rarely likely to be extremely noticeable.  I would expect to notice the difference in problems like orbit computations, however - rounding should yield acceptable results for longer simulated times than chopping, even if everything else is equal, simply because the statistics of rounding are more favorable.  I have also heard that the details of rounding are important in financial calculations of bond yields.
More to the point of benchmarking defined by vendor consortia, the differences in applicable domain between 32-bit IBM 370 format and 32-bit IEEE, or between 64-bit Cray format and 64-bit IEEE, or even between HP and TI hand calculators with 10 displayed digits, have been a part of the experience of a number of comp.arch readers.  In the two former cases you can often get by with twice as much of the dirtier precision.  In the Cray case that's much slower, however, and may eliminate most of the supercomputer's advantage over a workstation.  The IBM 370 case may be slower too if the problem involves much more than +-*.
--
David Hough
dgh@validgh.com   uunet!validgh!dgh   na.hough@na-net.ornl.gov
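A toy demonstration of the "statistics of rounding" point, assuming C99 <fenv.h> and a compiler that does not reorder the loop (few fully honor FENV_ACCESS, so treat this as a sketch): accumulate the same value a million times in single precision under round-to-nearest and under chopping and compare the drift.

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    /* Add 'x' to an accumulator 'n' times in single precision under the
       given rounding mode; the volatile accumulator discourages the
       compiler from reassociating the loop. */
    static float sum_under_mode(int mode, float x, int n)
    {
        volatile float s = 0.0f;
        fesetround(mode);
        for (int i = 0; i < n; i++)
            s = s + x;
        fesetround(FE_TONEAREST);
        return s;
    }

    int main(void)
    {
        int   n = 1000000;
        float x = 0.1f;                 /* not exactly representable */

        printf("round to nearest: %.1f\n", sum_under_mode(FE_TONEAREST,  x, n));
        printf("chop toward zero: %.1f\n", sum_under_mode(FE_TOWARDZERO, x, n));
        printf("target (n * 0.1): %.1f\n", n * 0.1);
        /* Chopping pushes every rounding error the same way, so its sum
           drifts well below the target; nearest's errors partly cancel. */
        return 0;
    }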
dseal (David Seal) (06/13/91)
In article <1991Jun11.175639.22558@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes: >In article <192@armltd.uucp> dseal (David Seal) writes: >>>Who knows whether IEEE infinity is a "mathematically defined result"? If you >> >>I would think it isn't: it doesn't correspond to a single mathematical real >>number in the way that ordinary IEEE numbers do. > >Why do you insist that the standard real numbers be the mathematical system >used? Number systems which include infinities are neither novel nor somehow >illegitimate. The real question about FORTRAN vs IEEE is how precise the >FORTRAN standard is about the nature of the number system it is running on. >Given that FP systems containing infinities are *not* new -- IEEE did not >invent this idea -- I would be surprised if a careful reading of the >standard mandated pure mathematical reals. > >(The real answer, of course, is that the FORTRAN standard was too early >to even consider IEEE FP, and attempts to determine the interaction of the >two by reading between the lines are a bit silly.) A fair point, and I agree it's open to interpretation. I would favour (*not* "insist on") the 'subset of the unextended real numbers' interpretation because (a) the unextended real numbers are a mathematical system which is *comparatively* easy to work with; (b) in the absence of any clues to the contrary, I would expect the Fortran standard to mean either 'using whatever number system is most appropriate on your machine' or 'using a system based on the unextended real numbers'. There seems to be a reasonable chance of writing reasonably portable Fortran code for the latter, despite the problems in establishing e.g. how big rounding errors can be and what the limits of the exponent range are. There are more major portability problems with using whatever number system you like - in particular, this whole thread was started by such problems with NaNs and infinities! As a major reason for having language standards is to promote portability, allowing a completely free choice of number system seems a bit awkward. In any case, the main point of my previous posting is that *if* you interpret the Fortran standard as insisting on the 'subset of the unextended real numbers' interpretation, you can achieve the required effect inside an IEEE system. Furthermore, statements like "0.0 * X = 0.0" are unconditionally true in such a system, so they can be used safely by optimisers. You are of course free to interpret the Fortran standard in other ways. (I should probably have been a bit less dogmatic in talking about "strict" compliance with the standard.) But if you use the full IEEE model, you then have to solve quite a few problems: e.g. how does the programmer detect the use of an infinity/NaN; how do you achieve a reasonable level of optimisation without breaking the IEEE standard; how do you make code written for your system portable to systems without IEEE arithmetic, etc.? To sum up: I suggested a way to have your Fortran adhere to the Fortran standard and your arithmetic to the IEEE standard, in a portable fashion. As a result, I don't see any evidence that the IEEE committee ignored existing language standards. There may well be other ways of achieving the same effect which additionally give you more complete access to the facilities of an IEEE arithmetic system, but IMHO they are all likely to be more complex and to require more understanding by the programmer. David Seal Sorry about the bad "From" line. This is on a list of things to be fixed here. 
Correct email address is "dseal@armltd.co.uk". All opinions are mine only...
dik@cwi.nl (Dik T. Winter) (06/14/91)
In article <9106120018.AA18733@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 > >       I don't believe interval arithmetic is used enough to justify
 > >       any hardware support.
Me:
 > Well, IBM thought it important enough to provide support in some models (43xx).
 > Acrith is the keyword.
JBS:
 >       Ah Acrith, I will let Kahan comment ...
No need, I know Kahan's opinions about Acrith.  Still IBM thought it important enough to put it in hardware (actually microcode I think).  Perhaps under severe pressure from IBM Germany?  But at least as far as IBM saw it there was a niche where interval arithmetic is heavily used.

Me:
 > How come unrealistic?  Example, interval add a to b giving c:
 >      set round to - inf;
 >      c.low = a.low + b.low;
 >      set round to + inf;
 >      c.upp = a.upp + b.upp;
 > why would this be more than 3x slower than straight floating point?
JBS:
 >       Because it's 4 instructions vs. 1.
Thus it would be 4 times as slow if each instruction takes exactly the same time to issue.  This is not true on processors where the FPU is not pipelined (in that case the setting of the rounding mode is much faster than the addition).  It might also not be true on some pipelined processors, i.e. if the multiply unit is not pipelined.  Etc.  Do you indeed know machines where the above sequence would be four times as slow as a normal multiply?

JBS:
 >       Also let's see your code for interval multiplication.
Not difficult, it goes something like:

	if a.low >= 0.0 then
	    if b.low >= 0.0 then
	        4 operations
	    else if b.upp <= 0.0 then
	        4 operations
	    else if b.upp >= - b.low then
	        4 operations
	    else
	        4 operations

the remainder is left as an exercise for the reader.  The code is a bit hairy, and on modern RISC machines you would not put it inline (a subroutine jump to a leaf routine is extremely cheap).  We see that also here on many machines the multiply operations take most of the time.  (The code assumes that intervals do not contain inf; if you allow that, the code only gets hairier, not much more time consuming.)  Of course division is still hairier, but still reasonably possible.
--
dik t. winter, cwi, amsterdam, nederland    dik@cwi.nl
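For concreteness, here is a hedged C sketch of the sign-case multiplication outlined above, using directed rounding from <fenv.h>.  The struct and helper names are invented, only the two easy sign cases are written out, and the general case simply falls back to the min and max over all four products:

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    typedef struct { double low, upp; } interval;

    static double mul_down(double x, double y)
    {
        fesetround(FE_DOWNWARD);
        return x * y;
    }

    static double mul_up(double x, double y)
    {
        fesetround(FE_UPWARD);
        return x * y;
    }

    /* Interval multiplication: in the written-out sign cases two
       multiplies suffice (one rounded down, one rounded up); the
       fallback takes min/max over all four products. */
    interval interval_mul(interval a, interval b)
    {
        interval c;
        int saved = fegetround();

        if (a.low >= 0.0 && b.low >= 0.0) {          /* both nonnegative */
            c.low = mul_down(a.low, b.low);
            c.upp = mul_up(a.upp, b.upp);
        } else if (a.upp <= 0.0 && b.upp <= 0.0) {   /* both nonpositive */
            c.low = mul_down(a.upp, b.upp);
            c.upp = mul_up(a.low, b.low);
        } else {                                     /* general fallback */
            double lo[4] = { mul_down(a.low, b.low), mul_down(a.low, b.upp),
                             mul_down(a.upp, b.low), mul_down(a.upp, b.upp) };
            double hi[4] = { mul_up(a.low, b.low), mul_up(a.low, b.upp),
                             mul_up(a.upp, b.low), mul_up(a.upp, b.upp) };
            c.low = lo[0];
            c.upp = hi[0];
            for (int i = 1; i < 4; i++) {
                if (lo[i] < c.low) c.low = lo[i];
                if (hi[i] > c.upp) c.upp = hi[i];
            }
        }
        fesetround(saved);
        return c;
    }

    int main(void)
    {
        interval a = { -1.0, 2.0 }, b = { 3.0, 4.0 };
        interval c = interval_mul(a, b);
        printf("[%g, %g]\n", c.low, c.upp);          /* encloses [-4, 8] */
        return 0;
    }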
dik@cwi.nl (Dik T. Winter) (06/14/91)
In article <KHB.91Jun11185538@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
 > In article <9106120054.AA19903@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >   3. The real problem with interval arithmetic is not that it's
 >   slow but that on real applications its error estimates are generally
 >   so pessimistic as to be useless.
 > At last, something we agree on!  But I confess that I have not used it
 > extensively, so I am prepared to believe that it may have utility I am
 > not personally cognizant of.  Interval techniques will have to be
 > deployed in large quantities before it can be counted out of the
 > running, IMHO.
If we look at what the Karlsruhe people are doing there is no problem with the error estimates from interval arithmetic.  To give an example: to solve a system of linear equations, first a standard method is used to approximate a solution.  Next, using interval arithmetic (plus a dot product with only a single rounding), in an iterative process the approximate solution is enclosed in a solution interval that is made as small as possible.  I have seen results from this process where the lower and upper bounds of each solution component differed only in a single bit (and those bounds were guaranteed).
--
dik t. winter, cwi, amsterdam, nederland    dik@cwi.nl
dik@cwi.nl (Dik T. Winter) (06/14/91)
David Hough writes:
 > Part of the problem is that the benchmark programs in use measure only common cases.  Various versions of the Linpack benchmark all have in common that the data is taken from a uniform random distribution, producing problems of very good condition.  So the worst possible linear equation solver algorithm running on the dirtiest possible floating-point hardware should be able to produce a reasonably small residual, even for input problems of very large dimension.
In article <9106120131.AA20868@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >       This is untrue.  Consider pivoting on the column element of
 > smallest magnitude while using IEEE single precision.  I don't believe
 > n has to be very large before you are in big trouble.
But you can get reasonable results without any pivoting if the condition is very good!  Of course, if you do pivot you probably will get better results; if you do full pivoting you will probably get still better results.
From my experience: once I wrote a solver where, due to a coding error, no pivoting was ever performed.  It solved the 1024x1024 systems I initially tested it on very well.  Of course, those systems were extremely well conditioned.  The algorithm fell flat on its face when it was fed a not-so-random input.
--
dik t. winter, cwi, amsterdam, nederland    dik@cwi.nl
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (06/14/91)
In article <13450@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
>> At last, something we agree on!  But I confess that I have not used it
>> extensively, so I am prepared to believe that it may have utility I am
>> not personally cognizant of.  Interval techniques will have to be
>> deployed in large quantities before it can be counted out of the
>> running, IMHO.
>Interval techniques will often be needed when extreme accuracy is needed.
>But in these cases, it will probably be necessary to use multiple precision
>anyway.
Interval arithmetic is useful when robustness is necessary.  For example, in a paper "On Ray Tracing Parametric Surfaces" (D. L. Toth, SIGGRAPH '85 Conference Proceedings) interval arithmetic is used to reliably find the nearest intersection between a line and a parametric surface.
>There are cheaper ways to analyze errors in most situations than
>interval arithmetic, but they involve using various precisions.  In some
>cases, a very good estimate can even be made in advance of any computation.
Error estimation is not enough for robust computation, but error bounding by interval arithmetic is useful.  As in Toth's paper, it is possible to use non-interval Newton iteration if its unique convergence is assured by interval arithmetic.

						Masataka Ohta
mshute@cs.man.ac.uk (Malcolm Shute) (06/14/91)
In article <9106070109.AA02137@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>I think that if a (legal) program behaves differently when compiled with
>and without optimization that then the compiler is broken.  Perhaps you
>disagree.

	B = 5.0
	IF ( ( B .GT. 3.0 ) .OR. MYTEST(A) ) PRINT some diagnostic messages

	LOGICAL FUNCTION MYTEST(X)
	REAL X
	MYTEST = some boolean work
	PRINT a few diagnostics on the screen
	RETURN
	END

(If the compiler short-circuits the .OR. because B .GT. 3.0 is already known to be true, MYTEST is never called and its diagnostics never appear - a difference in behaviour between optimization levels.)
--
Malcolm SHUTE.  (The AM Mollusc:   v_@_ )        Disclaimer: all
peter@ficc.ferranti.com (peter da silva) (06/15/91)
In article <GLEW.91Jun9110817@pdx007.intel.com>, glew@pdx007.intel.com (Andy Glew) writes:
> In this programming model expressions like 0.0 * b arise frequently.
> And it sure would be nice if they could be optimized away.

Couldn't you do something like:

	#define foo(...,x,...) ...((x)?((x) * b):(0.0))...

This should give the compiler a clue, no?
--
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX 77487-5012; `-_-' "Have you hugged your wolf, today?"
peter@ficc.ferranti.com (peter da silva) (06/15/91)
In article <9106112357.AA18157@ucbvax.Berkeley.EDU>, jbs@WATSON.IBM.COM writes: > Of course one way people have managed in the past [to handle 80 bit registers on context switches] > is to only save and restore the first 64 bits of the register. In my > view this demonstrates excessive imagination. Another popular technique is to use a single-tasking system and to crash on floating point stack overflows. -- Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180; Sugar Land, TX 77487-5012; `-_-' "Have you hugged your wolf, today?"
jbs@WATSON.IBM.COM (06/15/91)
	David Seal (quoting someone quoting the fortran standard):
>The missing rule is (ANSI X3.9-1978 sect 6.6):
>  "Any numeric operation whose result is not mathematically defined is
>  prohibited in the execution of an executable program.  Examples are
>  dividing by zero and raising a zero-valued primary to a zero-valued or
>  negative-valued power."
>Who knows whether IEEE infinity is a "mathematically defined result"?
I would think it isn't: it doesn't correspond to a single mathematical real number in the way that ordinary IEEE numbers do.
	Some comments:
	1.  The result of multiplying 2 reals is always mathematically defined.  Therefore I don't see how this clause prohibits multiplies which will overflow or underflow.
	2.  Ordinary IEEE numbers don't really correspond to single mathematical real numbers, they correspond to intervals.  Inf and -inf are just the names of the intervals on the ends.
                         James B. Shearer
jbs@WATSON.IBM.COM (06/15/91)
	I said:
>Perhaps I should have said it is undesirable for a program to behave
>differently when compiled with and without optimization regardless
>of whether this is technically permitted by the standard.  (Actually
>I think it would be desirable for the standard to exactly specify the
>behavior of any legal program but apparently it does not.)
	David Boundy said:
Undesirable, yes.  But also unavoidable in a market where performance is what sells machines.
	Unavoidable is too strong.  I believe there are compilers (such as xlf) for which compiling with optimization does not change the behavior of legal programs.
	David Boundy said:
Thus, standards do "exactly specify the behavior of any legal program:" that's the definition of a "legal program."
	I believe this is incorrect.  If I write x=a+b+c and the standard does not specify the order of evaluation then the compiler is free to evaluate it as (a+b)+c or as a+(b+c) (or, according to some, in either way depending on the optimization level).  Thus the behavior of the program is not exactly specified.
	David Boundy said:
( There's the issue that on our 68000 machines, we do arithmetic at compile time in 64 bits, but the same expression unoptimized will be evaluated at run time in 80 bits.  Sigh. )
	Well if the behavior of any legal program is exactly specified how can this sort of thing occur.
                         James B. Shearer
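The x=a+b+c point is easy to see with numbers.  Here is a small C illustration (the values are chosen so the effect shows up in double precision) of how the two groupings the standard leaves open give different results:

    #include <stdio.h>

    int main(void)
    {
        /* Floating-point addition is not associative: the tiny terms
           are absorbed one at a time in one grouping but survive when
           combined first in the other. */
        volatile double a = 1.0, b = 1.0e-16, c = 1.0e-16;

        double left  = (a + b) + c;    /* 1.0                    */
        double right = a + (b + c);    /* 1.0000000000000002     */

        printf("(a+b)+c = %.17g\n", left);
        printf("a+(b+c) = %.17g\n", right);
        return 0;
    }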
jbs@WATSON.IBM.COM (06/15/91)
	Dik Winter said (the issue is whether 3x is a realistic estimate of how much slower interval arithmetic will be compared to straight floating point):
> How come unrealistic?  Example, interval add a to b giving c:
>      set round to - inf;
>      c.low = a.low + b.low;
>      set round to + inf;
>      c.upp = a.upp + b.upp;
> why would this be more than 3x slower than straight floating point?
	I said:
> Because it's 4 instructions vs. 1.
	Dik Winter:
Thus it would be 4 times as slow if each instruction takes exactly the same time to issue.  This is not true on processors where the FPU is not pipelined (in that case the setting of the rounding mode is much faster than the addition).  It might also not be true on some pipelined processors, i.e. if the multiply unit is not pipelined.  Etc.  Do you indeed know machines where the above sequence would be four times as slow as a normal multiply?
	It will be at least 4 times slower on the Risc System 6000.  It may be more if the set floating mode instructions screw up the pipeline (I don't know if they do or not).
	I said:
> Also let's see your code for interval multiplication.
	Dik Winter:
Not difficult, it goes something like:
	if a.low >= 0.0 then
	    if b.low >= 0.0 then
	        4 operations
	    else if b.upp <= 0.0 then
	        4 operations
	    else if b.upp >= - b.low then
	        4 operations
	    else
	        4 operations
the remainder is left as an exercise for the reader.  The code is a bit hairy, and on modern RISC machines you would not put it inline (a subroutine jump to a leaf routine is extremely cheap).  We see that also here on many machines the multiply operations take most of the time.  (The code assumes that intervals do not contain inf; if you allow that, the code only gets hairier, not much more time consuming.)
	Some comments:
	1.  Handling this by a subroutine call would seem to require 4 floating loads to pass the input arguments and 2 floating loads to retrieve the answers.  This is already expensive.
	2.  All the floating compares and branches will severely impact performance on the Risc System 6000.  The slowdown will be much more than 3x.
	3.  Do you know of any machine where the above code will average 3x (or less) the time of a single multiply?
	On another topic Dik Winter said:
But you can get reasonable results without any pivoting if the condition is very good!
	The need for pivoting is unrelated to the condition number.  This matrix

	0  1
	1  0

has condition number 1 and requires pivoting.  This matrix

	1+e  1-e     (e small)
	1-e  1+e

has a bad condition number and is not helped by pivoting.
                         James B. Shearer
dik@cwi.nl (Dik T. Winter) (06/16/91)
In article <9106150258.AA16308@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes, trying to sidetrack me again.

The origin of the discussion was a remark that interval arithmetic in software had been observed to be 300 times as slow as standard hardware floating point, while interval arithmetic with the rounding modes had been observed to be 3 times as slow.  Shearer questioned the number 3; I thought he questioned the order of magnitude but apparently he wants to know the exact number.  Of course that can not be given as it is entirely machine dependent.  Also the factor would be a function of the mix of operations performed.  I gave an example of interval addition in four instructions and asked why it would be more than 3x slower; forgetting heavily pipelined machines that do not take the current rounding mode state along in the pipeline, forcing a slow setting of the rounding mode (the pipe must be empty to modify it).  If the current rounding mode is carried along in the pipe there would be no problem with the setting of the rounding mode.

JBS:
 > It will be at least 4 times slower on the Risc System 6000.
Granted.  The particular rounding modes, and changing them on the fly, are not particularly well supported on the RS6000.  There are also enough machines where it will be less than 3 times slower.

About the multiplication routine, JBS:
 > 1.  Handling this by a subroutine call would seem to require 4
 > floating loads to pass the input arguments and 2 floating loads to
 > retrieve the answers.  This is already expensive.
But you have in most cases to load these 4 numbers anyhow, so why would this be more expensive?  Why loads to retrieve the answers?  Why not just let them sit in the FP registers?
 > 2.  All the floating compares and branches will severely impact
 > performance on the Risc System 6000.  The slowdown will be much more
 > than 3x.
O yes, on the RS6000.  There are machines that will behave differently.  Moreover, it was never said that there is an exact factor of 3 involved; that factor was simply observed, for a set of programs.
 > 3.  Do you know of any machine where the above code will average
 > 3x (or less) the time of a single multiply?
So this is irrelevant.  What has been forgotten is that the same stuff in software would be extremely expensive.  JBS, can you show a reasonable interval add in software?  And an interval multiply?

JBS:
 > On another topic Dik Winter said:
 > But you can get reasonable results without any pivoting if the condition
 > is very good!
I should have added the context, where not only the condition of the complete matrix is very good, but also of all its principal minors.  Of course you should never take a pivotal element that is too small.
 > This matrix   1+e  1-e   (e small) has a bad condition number and is not
 >               1-e  1+e   helped by pivoting.
Yes, the condition is 1/e.  Assume LDL' decomposition.  The relative errors in L are small compared to the elements; we have to look at the diagonal matrix D.  The second element is 4e/(1+e).  Given a machine precision t, the absolute error in that element is bounded by t*|(1-e)^2/(1+e)|, for a relative error bounded by t*|(1-e)^2/(4e)|.  So if e>0 the relative error is smaller than if e<0, showing that pivoting helps even here!  Slightly of course, because e is small, but as always with pivoting it does not matter so very much whether you pivot on the largest element or an element reasonably close to it.  But I am already severely sidetracked again.
--
dik t. winter, cwi, amsterdam, nederland    dik@cwi.nl
dik@cwi.nl (Dik T. Winter) (06/16/91)
In article <9106150206.AA15019@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
> Unavoidable is too strong. I believe there are compilers
> (such as xlf) for which compiling with optimization does not change
> the behavior of legal programs.
Right, and it does some surprising things to get at that goal. Now you have to explain to your customers the reason that
      DO 10 I = 1, N
      DO 10 J = 1, N
   10 X(I,J) = A * B(I) * C(J)
is faster than
      DO 10 I = 1, N
      DO 10 J = 1, N
   10 X(I,J) = A * B(J) * C(I)
If you see the expressions some people write... I thought we had left that distinction behind us some 15 years ago. More surprising is the code generated (non-optimized) for the statement:
      X(I) = 2.0 * A(I) + 2.0
it goes something like:
      load 2.0 in fr1
      load a(i) in fr2
      multiply fr1 and fr2 to fr0
      load 2.0 in fr1
      load 2.0 in fr1
      multiply fr1 and fr2 and add fr1 to fr0
granted, it is non-optimized, but even then it sees that the maf operation can be used (otherwise it would not be able to use it in the optimized version).
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
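The reason the first loop can be faster is that left-to-right evaluation makes A*B(I) a subexpression that is invariant in the inner loop, so it may be hoisted without changing any rounded result; in the second form the invariant factor is C(I), and pulling it out would require reassociating the product, which a compiler that promises unchanged behavior must not do. A rough sketch of the distinction in C, offered only as an illustration (flat indexing stands in for the Fortran array):

    /* First form: the partial product a*b[i] is invariant in the inner loop
       and can be hoisted without reassociation, so results are unchanged. */
    void fill1(int n, double a, const double *b, const double *c, double *x)
    {
        for (int i = 0; i < n; i++) {
            double t = a * b[i];                /* legal hoist, same rounding */
            for (int j = 0; j < n; j++)
                x[i * n + j] = t * c[j];
        }
    }

    /* Second form: hoisting would mean computing (a*c[i])*b[j] instead of
       (a*b[j])*c[i], a reassociation a strict compiler must not perform. */
    void fill2(int n, double a, const double *b, const double *c, double *x)
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                x[i * n + j] = a * b[j] * c[i]; /* evaluated left to right */
    }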
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (06/16/91)
>>>>> On 15 Jun 91 22:23:08 GMT, dik@cwi.nl (Dik T. Winter) said:
Dik> In article <9106150258.AA16308@ucbvax.Berkeley.EDU>
Dik> jbs@WATSON.IBM.COM writes:
JBS> [....] 3. Do you know of any machine where the above code will
JBS> average 3x (or less) the time of a single multiply?
Dik> So this is irrelevant.
Winter is claiming that Shearer is trying to obfuscate the issues about the performance of interval arithmetic (in terms of timing, not the usefulness of the results).
I see Shearer asking a very simple question: Please provide *specific* information on machines where Winter's algorithms for interval add and interval multiply perform less than 3 times slower than the equivalent simple operation with a single rounding mode.
Then I suppose that the next step is to compare the performance of interval arithmetic with 128-bit arithmetic on a machine like the RS/6000. To do this, we will first need to find someone who knows how expensive the rounding mode change is in the RS/6000 FPU pipe, and then someone who knows the timings of quad-precision arithmetic on the RS/6000. (I will be able to learn the latter as soon as I get xlf version 2 installed on my machine.)
--
John D. McCalpin                    mccalpin@perelandra.cms.udel.edu
Assistant Professor                 mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.  J.MCCALPIN/OMNET
glew@pdx007.intel.com (Andy Glew) (06/16/91)
Interval techniques will have to be deployed in large quantities before it can be counted out of the running, IMHO. ---------------------------------------------------------------- Keith H. Bierman keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM It will have to be successful before we can conclude that it is unsuccessful? -- Andy Glew, glew@ichips.intel.com Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, Hillsboro, Oregon 97124-6497 This is a private posting; it does not indicate opinions or positions of Intel Corp.
jbs@watson.ibm.com (06/17/91)
David Hough said:
> Various versions of the Linpack benchmark all have in common
> that the data is taken from a uniform random distribution, producing problems
> of very good condition.
I believe "very good condition" is an exaggeration. Problems of size 1000 appear to typically have condition number between 10**5 and 10**6. I would not consider this very good. The condition number appears to be growing considerably faster than n; does anyone know the expected asymptotic behavior?
David Hough said:
> The Linpack benchmark is intended to solve an equation A x = b
> where the elements of A are chosen from a uniform random distribution
> and b is computed so that the correct solution x is close to a vector
> of 1's. So looking at the size of components of x-1's is supposed to
> indicate how accurate the solution is, although
> the residual b - A x is a more reliable measure of quality.
Why do you consider the residual a more reliable measure of quality? If the condition is bad the residual will still be small when the answer is garbage.
I asked:
>> If you know of a single realistic application where the difference
>> between 64 bit IEEE rounded to nearest and 64 bit IEEE chopped
>> to zero makes a significant difference in the maximum performance
>> obtainable I would like to see it.
David Hough said:
> I doubt you'll find anything like this in normal scientific computation
> - the difference in problem domain between floating-point systems with
> identical exponent and significand fields and rounding schemes
> comparable in cost and care is rarely likely to be extremely
> noticeable. I would expect to notice the difference in problems like
> orbit computations, however - rounding should yield acceptable results
> for longer simulated times than chopping, even if everything else is
> equal, simply because the statistics of rounding are more favorable. I
> have also heard that the details of rounding are important in financial
> calculations of bond yields.
I believe IEEE chopped will be easier to implement and perform better than IEEE rounded. If rounding almost never matters why require it? I would expect the accuracy of orbit computations to be limited by the accuracy to which the initial parameters are known and by the accuracy of the physical model (if I am not mistaken the models of tidal effects for example are not too accurate). I find it hard to believe the rounding method matters in bond calculations given 64-bit arithmetic and sound methods (I believe there are foolish ways to compute internal rates of return for example).
                             James B. Shearer
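Shearer's reservation about residuals can be seen with a deliberately ill-conditioned 2x2 example. This is a small C sketch for illustration only (the matrix and the injected error are made up): a candidate solution that is wrong by 1e-3 in every component still has a residual near 1e-11.

    #include <stdio.h>

    int main(void)
    {
        double e = 1e-8;                          /* condition number ~ 1/e    */
        double a[2][2] = { {1 + e, 1 - e}, {1 - e, 1 + e} };
        double b[2] = { 2.0, 2.0 };               /* exact solution is (1, 1)  */
        double d = 1e-3;                          /* error injected on purpose */
        double x[2] = { 1.0 + d, 1.0 - d };
        double r0 = b[0] - (a[0][0] * x[0] + a[0][1] * x[1]);
        double r1 = b[1] - (a[1][0] * x[0] + a[1][1] * x[1]);
        printf("error in x = %g, residual = (%g, %g)\n", d, r0, r1);
        return 0;
    }

A small residual therefore says little about the error in x once the condition number is large, which is the point of the question above.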
dik@cwi.nl (Dik T. Winter) (06/17/91)
In article <MCCALPIN.91Jun16095840@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes: > >>>>> On 15 Jun 91 22:23:08 GMT, dik@cwi.nl (Dik T. Winter) said: > > Dik> In article <9106150258.AA16308@ucbvax.Berkeley.EDU> > Dik> jbs@WATSON.IBM.COM writes: > > JBS> [....] 3. Do you know of any machine where the above code will > JBS> average 3x (or less) the time of a single multiply? > > Dik> So this is irrelevant. > > Winter is claiming that Shearer is trying to obfuscate the issues > about the performance of interval arithmetic (in terms of timing, not > the usefulness of the results. > > I see Shearer asking a very simple question: Please provide *specific* > information on machines where Winter's algorithms for interval add and > interval multiply perform less than 3 times slower than the equivalent > simple operation with a single rounding mode. > > Then I suppose that the next step is to compare the performance of > interval arithmetic with 128-bit arithmetic on a machine like the > RS/6000. To do this, we will first need to find someone who knows how > expensive the rounding mode change is in the RS/6000 FPU pipe, and > then someone who knows the timings of quad-precision arithmetic on the > RS/6000. (I will be able to learn the latter as soon as I get xlf > version 2 installed on my machine). > -- > John D. McCalpin mccalpin@perelandra.cms.udel.edu > Assistant Professor mccalpin@brahms.udel.edu > College of Marine Studies, U. Del. J.MCCALPIN/OMNET -- dik t. winter, cwi, amsterdam, nederland dik@cwi.nl
dik@cwi.nl (Dik T. Winter) (06/17/91)
In article <3709@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
Sorry about that one; something went wrong.
In article <MCCALPIN.91Jun16095840@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
> I see Shearer asking a very simple question: Please provide *specific*
> information on machines where Winter's algorithms for interval add and
> interval multiply perform less than 3 times slower than the equivalent
> simple operation with a single rounding mode.
Oh yes, they exist. It depends on your programming model. I take the 80287/80387 and the 88100. They all have a fp control register that defines the rounding modes and the exception mask bits. To change rounding modes you can load that register with precalculated values, as the exception mask bits change only rarely. On those machines an interval add takes two load control register instructions and two adds, as compared to a single add. Cycle times:
            load fcr    fp add
   80287       10         85
   80387       19        24-32
   88100        1          3
So on the 80287 and the 88100 interval add is clearly better than 3 times a single add. On the 80387 it is only slightly slower. If we look at latency for the 88100, a single add has a latency of 5 cycles, an interval add a latency of 9 cycles. So given proper support for change of rounding modes the factor of 3 can easily be obtained for add. Multiply requires more than 3 due to compares and branches. Still I would expect a factor of about 3, especially for the 80287 and the 88100.
The main point here is that those machines have a fpu control register and a fpu status register. Machines that combine the two (SPARC, MIPS, RS6000, HPPA, 68k, 32k) will be at a disadvantage because change of rounding mode implies fetching the current status/control register (which in turn implies that the fp execute queue must be empty), changing a single field and restoring the register. The situation on the RS6000 is a bit unclear. There are instructions that modify only a subfield of the FPSCR. So it would have been simple to organize the FPSCR such that a single instruction would modify only the rounding mode bits. This ought to be possible even if instructions are still in progress. I do not know whether this is done in reality (I do not even know the layout of the FPSCR, as the stupid Assembler manual does not give it!).
>
> Then I suppose that the next step is to compare the performance of
> interval arithmetic with 128-bit arithmetic on a machine like the
> RS/6000.
Note: this is *not* a replacement for interval arithmetic. Granted, you get better results in more cases, but the guarantee on your result is lacking. I am *not* an advocate for interval arithmetic (the people at Karlsruhe are). I do not use it. But I object to the way Shearer handles this:
 a. Shearer asks: what is the justification for the different rounding modes.
 b. Many responses come: interval arithmetic.
 c. Shearer asks: would it not be better helped with quad arithmetic?
 d. Response: observed speed difference a factor 3 with hardware rounding
    modes, a factor 300 in software.
 e. Shearer questions the factor 3. Apparently he believes the factor 300
    (does he?).
Even if the factor 3 would degrade on other machines to a factor of 5 or even 10, the difference with 300 is still striking. I ask Shearer again: come up with an interval add assuming the base arithmetic is round to nearest only (or even worse, with truncating arithmetic, which you advocate in another article).
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
herrickd@iccgcc.decnet.ab.com (06/18/91)
In article <196@armltd.uucp>, dseal (David Seal) writes:
> A fair point, and I agree it's open to interpretation. I would favour (*not*
> "insist on") the 'subset of the unextended real numbers' interpretation
> because (a) the unextended real numbers are a mathematical system which is
> *comparatively* easy to work with;
One of the problems with this conception (and it dates back to Backus' team) is that any system of floating point numbers (ignoring newfangled ideas like nan and inf) is just a finite set of rationals. The REALs have odd features like
  1) when you add two of them together, you get another of them
  2) addition is associative
while the floating point number systems don't have such properties (because the underlying set is finite).
dan herrick
herrickd@iccgcc.decnet.ab.com
tim@proton.amd.com (Tim Olson) (06/18/91)
In article <3710@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
| Oh yes, they exist. It depends on your programming model. I take the
| 80287/80387 and the 88100. They all have a fp control register that defines
| the rounding modes and the exception mask bits. To change rounding modes
| you can load that register with precalculated values, as the exception mask
| bits change only rarely. On those machines an interval add takes two load
| control register instructions and two adds, as compared to a single add.
| Cycle times:
|             load fcr    fp add
|    80287       10         85
|    80387       19        24-32
|    88100        1          3
| So on the 80287 and the 88100 interval add is clearly better than 3 times a
| single add. On the 80387 it is only slightly slower. If we look at latency
| for the 88100, a single add has a latency of 5 cycles, an interval add a
| latency of 9 cycles.
One thing to remember, however, is that writes to control registers, such as a floating-point control register, are usually "serializing operations" in pipelined machines. This means that the FP pipeline will be flushed before the modification of the floating-point control register takes place.
This doesn't make a difference in your timing analysis if result dependencies already existed in the code stream (so that the FP adds executed at their result rate, rather than their issue rate), but if there was enough pipelining/parallelism to allow the FP adds to run at full issue rate, then interval adds will be significantly more expensive:
                          FP add
              load fcr  latency  issue
   Am29050        1        3       1

   Series of 5 dependent FADDs:
      normal:   15 cycles
      interval: 40 cycles (2.67 X)
   Series of 5 independent FADDs:
      normal:    5 cycles
      interval: 40 cycles (8 X)
I suppose that careful analysis of the code could point out where you can "batch" a series of independent calculations to reduce the round-mode switching, and thus the serialization penalty, but the penalty is still going to probably be much larger than 3X.
--
-- Tim Olson
Advanced Micro Devices
(tim@amd.com)
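The batching Olson suggests can be sketched directly (assuming C99 fenv.h and a compiler that honors FENV_ACCESS; an illustration only, not code from the thread): do all the lower bounds under round-down and all the upper bounds under round-up, so n interval adds cost two mode switches instead of 2n.

    #include <fenv.h>
    #pragma STDC FENV_ACCESS ON

    /* Batched interval add over arrays of bounds: only two rounding-mode
       changes (and thus at most two serializations) for the whole vector. */
    void iadd_batch(int n,
                    const double *alow, const double *aupp,
                    const double *blow, const double *bupp,
                    double *clow, double *cupp)
    {
        fesetround(FE_DOWNWARD);
        for (int i = 0; i < n; i++)
            clow[i] = alow[i] + blow[i];
        fesetround(FE_UPWARD);
        for (int i = 0; i < n; i++)
            cupp[i] = aupp[i] + bupp[i];
        fesetround(FE_TONEAREST);
    }

This only helps when the operations really are independent; a chain of dependent interval operations still pays the serialization cost on every mode change.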
dik@cwi.nl (Dik T. Winter) (06/18/91)
In article <1991Jun17.231640.3426@dvorak.amd.com> tim@amd.com (Tim Olson) writes:
> One thing to remember, however, is that writes to control registers,
> such as a floating-point control register, are usually "serializing
> operations" in pipelined machines. This means that the FP pipeline will
> be flushed before the modification of the floating-point control
> register takes place.
The key point here is that the 88k and 80x87 do have *separate* control and status registers. If they are separated there would (if the unit is well designed) be no need for a write to the control register to be serializing. And indeed, I do not think it is on those two (at least I have found it nowhere in the documentation). I do not know about the 29050, having no docs for it.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) (06/18/91)
In article <3736@charon.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
|>
|> The key point here is that the 88k and 80x87 do have *separate* control and
|> status registers. If they are separated there would (if the unit is well
|> designed) be no need for a write to the control register to be serializing.
|> And indeed, I do not think it is on those two (at least I have found it
|> nowhere in the documentation). I do not know about the 29050, having no
|> docs for it.
|>
The 88k does not serialize the processor when you write to the FP control register. You could add extra code to force serialization before a write.
-----------------------------------------------------------------------------
== jeff kenton        Consulting at kenton@decvax.dec.com        ==
== (617) 894-4508                  (603) 881-0011                ==
-----------------------------------------------------------------------------
dgh@validgh.com (David G. Hough on validgh) (06/18/91)
jbs@ibm.com says: > I believe IEEE chopped will be easier to implement and perform > better than IEEE rounded. This is true, but only marginally. The most important cost in IEEE (or VAX) floating-point arithmetic is getting the correct unrounded result, or at least enough of it to permit correct rounding. This requires provision of guard and (for chopping or unbiased rounding) sticky bits, postnormalization, development of full doubled-precision products, careful division and sqrt, etc. Once you've gotten the unrounded answer, rounding (as opposed to chopping) costs an extra add time and dealing with the possibility of a carry out of the most significant bit. Although these are real design costs, I doubt they've affected performance of modern pipelined IEEE designs compared to the cost of getting the unrounded result. I suspect that the reason IBM 370 arithmetic isn't correctly chopped is that the cost of doing it correctly - maintaining a sticky bit for subtraction - was deemed too high, and the computed result is more accurate in those cases where rounding is done instead of chopping. More accurate, but not in a useful way, however - the size of the error bounds you're likely to derive will have to reflect chopping, but any algorithms that might like to depend on correct chopping principles ( (x-z) < x if z is positive) will have to do without. In general, any process that involves repeated inexact conversions, particularly binary->decimal->binary->decimal, or addition of many small inexact quantities, will benefit from the better statistics of rounding over chopping - drifting tendencies will be reduced, and error bounds will be smaller. For instance, during the 1970's one of the minor regional stock exchanges noticed that its market stock average was declining in a bull market in which most of the component stocks were rising in value. There were a number of factors at work in this phenomenon; using chopping arithmetic was one of them. Using enough precision can sometimes overcome deficiencies of chopping or dirty arithmetic, otherwise Seymour Cray would have done things differently. But it isn't always enough, as most Cray customers find out occasionally. -- David Hough dgh@validgh.com uunet!validgh!dgh na.hough@na-net.ornl.gov
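The drifting tendency Hough describes is easy to observe directly. The small experiment below (a sketch assuming C99 fenv.h, an IEEE machine with selectable rounding, and no aggressive floating-point optimization) adds the same inexact increment a million times under round-to-nearest and under round-toward-zero:

    #include <fenv.h>
    #include <stdio.h>
    #pragma STDC FENV_ACCESS ON

    static double accumulate(int mode)
    {
        fesetround(mode);
        double sum = 1.0;
        for (int i = 0; i < 1000000; i++)
            sum += 0.001;                 /* 0.001 is inexact in binary */
        fesetround(FE_TONEAREST);
        return sum;
    }

    int main(void)
    {
        printf("round to nearest: %.17g\n", accumulate(FE_TONEAREST));
        printf("chopped:          %.17g\n", accumulate(FE_TOWARDZERO));
        return 0;
    }

The chopped sum typically drifts low by roughly the accumulated bias, while the rounded sum usually stays much closer to 1001; the advantage being claimed is statistical, not a smaller worst-case error bound.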
dseal@armltd.co.uk (David Seal) (06/18/91)
In article <9106150144.AA14442@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes: > David Seal (quoting someone quoting the fortran standard): >>The missing rule is (ANSI X3.9-1978 sect 6.6): >> "Any numeric operation whose result is not mathematically defined is >> prohibited in the execution of an executable program. Examples are >> dividing by zero and raising a zero-valued primary to a zero-valued or >> negative-valued power." >>Who knows whether IEEE infinity is a "mathematically defined result"? If you >I would think it isn't: it doesn't correspond to a single mathematical real >number in the way that ordinary IEEE numbers do. > > Some comments: > 1. The result of multiplying 2 reals is always mathematically de- >fined. Therefore I don't see how this clause prohibits multiplies which >will overflow or underflow. It doesn't prohibit the multiply itself. However, the multiply is followed by an implicit rounding operation. In the case of an overflowing multiply (and assuming we are restricting our "mathematical" number system to be the reals), the rounding error in this rounding operation generates a non-real: see below for my argument on this. IMHO this entitles an implementation to prohibit such multiplies. (Though I would of course expect any implementation that did so to produce a run-time "overflow" error rather than quietly ignoring it and producing rubbish results...) > 2. Ordinary IEEE numbers don't really correspond to single mathema- >tical real numbers, they correspond to intervals. Inf and -inf are just >the names of the intervals on the ends. > James B. Shearer I don't think it's quite as simple as this. IEEE numbers produced as results of operations do indeed correspond to intervals. On the other hand, IEEE numbers used as operands are treated as if they correspond to single real numbers. The process of using a result as the source to some other operation is essentially where rounding errors appear. When one of the intervals at the end is produced, this "rounding error" causes the IEEE system to treat the result (an unbounded real interval) as something which is *not* a real number. This is where IEEE departs from the real number system. If you want IEEE numbers to genuinely correspond to intervals, you've got to do interval arithmetic with them :-). David Seal I believe the "From" line problem here has been fixed now. In case it isn't, my correct email address is "dseal@armltd.co.uk". All opinions are mine only...
jbs@WATSON.IBM.COM (06/19/91)
I said:
> I think that if a (legal) program behaves differently when compiled with
> and without optimization then the compiler is broken. Perhaps you
> disagree.
Malcolm said:
      B = 5.0
      IF ( ( B .GT. 3.0 ) .OR. MYTEST(A) ) PRINT some diagnostic messages

      LOGICAL FUNCTION MYTEST(X)
      REAL X
      MYTEST = some boolean work
      PRINT a few diagnostics on the screen
      RETURN
      END
I modified this to:
      LOGICAL*4 MYTEST
      B = 5.0
      IF ( ( B .GT. 3.0 ) .OR. MYTEST(B) ) THEN
         WRITE(6,*)' MAIN'
      ENDIF
      END

      LOGICAL FUNCTION MYTEST(X)
      REAL X
      WRITE(6,*)' MYTEST'
      MYTEST = .TRUE.
      RETURN
      END
I checked the behavior with the IBM VS Fortran and Xlf compilers. In both cases the behavior was not changed by optimization. The behavior differs between the two compilers but that is a different issue. So this example will not convince me that I'm wrong. (Actually I later softened my original statement to say it is undesirable for optimization to affect a program's behavior.)
                             James B. Shearer
jbs@WATSON.IBM.COM (06/19/91)
Dik Winter said:
The origin of the discussion was a remark that interval arithmetic in software had been observed to be 300 times as slow as standard hardware floating point, while interval arithmetic with hardware support had been observed to be only 3 times as slow. Shearer questioned the number 3; I thought he questioned the order of magnitude but apparently he wants to know the exact number. Of course that can not be given as it is entirely machine dependent. Also the factor would be a function of the mix of operations performed.
I believe the original remark referred to estimates, not observations. I questioned in passing whether 3x was a realistic estimate. I continue to believe it is extremely optimistic and that 10x slower would be more reasonable.
Dik Winter:
About the multiplication routine, JBS:
> 1. Handling this by a subroutine call would seem to require 4
> floating loads to pass the input arguments and 2 floating loads to
> retrieve the answers. This is already expensive.
But you have in most cases to load these 4 numbers anyhow, so why would this be more expensive? Why loads to retrieve the answers? Why not just let them sit in the FP registers?
If you are doing your operations memory to memory then this is correct. However if you are keeping intermediate results in registers, as will often be the case, then you must move your 4 operands from the registers they are in to the registers the subroutine expects them in, and you must move your 2 answers out of the return registers into the registers you want them in (if you just let them sit they will be wiped out by the next call). In general it will be possible to avoid doing some of this movement but I don't see how to avoid all of it.
Dik Winter:
Moreover, it was never said that there is an exact factor of 3 involved; that factor was simply observed, for a set of programs.
> 3. Do you know of any machine where the above code will average
> 3x (or less) the time of a single multiply?
So this is irrelevant.
Who observed this? Under what conditions?
Dik Winter:
> On another topic Dik Winter said:
> But you can get reasonable results without any pivoting if the condition
> is very good!
I should have added the context, where not only the condition of the complete matrix is very good, but also of all its principal minors.
This still isn't right. Consider
      e 1      e small (but not zero).
      1 e
Dik Winter (in a later post):
I am *not* an advocate for interval arithmetic (the people at Karlsruhe are). I do not use it. But I object to the way Shearer handles this:
 a. Shearer asks: what is the justification for the different rounding modes.
 b. Many responses come: interval arithmetic.
 c. Shearer asks: would it not be better helped with quad arithmetic?
 d. Response: observed speed difference a factor 3 with hardware rounding
    modes, a factor 300 in software.
 e. Shearer questions the factor 3. Apparently he believes the factor 300
    (does he?).
Even if the factor 3 would degrade on other machines to a factor of 5 or even 10, the difference with 300 is still striking. I ask Shearer again: come up with an interval add assuming the base arithmetic is round to nearest only (or even worse, with truncating arithmetic, which you advocate in another article).
Some comments:
Regarding b, if interval arithmetic is the only reason for the different rounding modes then I think they may safely be junked.
Regarding c, what I actually said was that I thought quad precision would provide some support for interval arithmetic (not better support).
In any case upon further reflection I will withdraw this statement.
Regarding d, as I said above, I believe these were estimates.
Regarding e, I don't really believe the 300x either and I particularly don't believe the 100x ratio.
Regarding how I would implement interval arithmetic, it is not particularly difficult to do without the rounding modes as long as you don't insist on maintaining the tightest possible intervals. There is no great loss in being a little sloppy, since an extra ulp only matters for narrow intervals and the main problem with interval arithmetic is that the intervals don't stay narrow even if you are careful.
                             James B. Shearer
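One way to make the "a little sloppy" approach in the post above concrete is to add under the default round-to-nearest and then widen each bound outward by one unit in the last place with nextafter(). The true sum is still enclosed, at the cost of an interval up to one ulp wider per bound than directed rounding would give. This is a minimal sketch in C for illustration; the names are assumptions, not anyone's actual code.

    #include <math.h>

    typedef struct { double low, upp; } interval;   /* illustrative type */

    /* Interval add using only round-to-nearest: the rounded bound is within
       half an ulp of the exact one, so stepping one ulp outward preserves
       the enclosure (ignoring overflow at the very ends of the range). */
    interval iadd_sloppy(interval a, interval b)
    {
        interval c;
        c.low = nextafter(a.low + b.low, -HUGE_VAL);
        c.upp = nextafter(a.upp + b.upp, +HUGE_VAL);
        return c;
    }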
dce@mitre.org (Dale Earl) (06/20/91)
> Henry Spencer wrote (a while back now):
>
> My understanding is that the funnier-looking features, in particular the
> infinities, NaNs, and signed zeros, mostly cost essentially nothing.
>
> I believe this statement is incorrect. Infs and nans complicate
> the design of the floating point unit. Even if there is ultimately
> no performance impact, this added design effort represents a very real
> cost. Perhaps some hardware designers could comment.
With my previous employer I helped design several floating point units, and in my experience the most difficult part of the IEEE specification to handle was the denormalized numbers. The infinities, NANs, signed zeros, and rounding modes do not make the hardware substantially more complicated or significantly decrease performance. As has been pointed out in other related postings these features can be valuable for many numerical algorithms. The value vs. cost for denormalized numbers, though, is not favorable IMHO. I feel a representation for an infinitely small non-zero value would be useful and much easier to implement in hardware. But if you have to be able to say that your device is 100% IEEE floating point compatible you get stuck implementing the entire standard no matter what the hardware tradeoffs.
A related method we also employed on another device that did not have to be 100% compliant may be of interest. This device performed both single precision integer and floating point operations. Therefore we had a full 32 bit adder and multiplier available. For this device we made the on board register file 40 bits wide, 8 bits for exponent and 32 bits for sign and mantissa. Without going into all the details, the integer operands used the 32 bit field and were processed in a normal manner (the exponent field was passed unchanged). Floating point was performed using the full 32 bit mantissa with rounding only on output from the chip. This carried an extra 8 bits of precision from intermediate operations which remained within the chip. The chip also could extract the exponent field into an integer so we could emulate multiple precision in software if desired. Not exactly IEEE standard, but it would probably work well for many real applications where better precision is more important than rigorous numerical error bounding.
(Please excuse this resend if you have seen it before, distribution problems.)
--
Dale C. Earl                 dce@mitre.org
These are my opinions, only my opinions, and nothing but my opinions! DCE
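For readers weighing that cost argument against the benefit, the classic property gradual underflow buys is that x - y is zero only when x equals y. A small C demonstration on an IEEE machine (a sketch only; the particular constants are chosen just to force a subnormal difference):

    #include <stdio.h>

    int main(void)
    {
        double x = 3.0e-308, y = 2.5e-308;   /* both are normal doubles */
        double d = x - y;                    /* 5e-309 is subnormal     */
        printf("x == y      : %d\n", x == y);     /* 0                       */
        printf("x - y == 0.0: %d\n", d == 0.0);   /* 0 with gradual          */
                                                  /* underflow; 1 if the     */
                                                  /* result is flushed to 0  */
        printf("x - y       : %g\n", d);
        return 0;
    }

Whether that property is worth the extra hardware effort is exactly the tradeoff being questioned in the post above.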
dik@cwi.nl (Dik T. Winter) (06/20/91)
In article <9106190252.AA29755@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
> Regarding how I would implement interval arithmetic, it is not
> particularly difficult to do without the rounding modes as long as you
> don't insist on maintaining the tightest possible intervals. There is
> no great loss in being a little sloppy, since an extra ulp only matters
> for narrow intervals and the main problem with interval arithmetic is
> that the intervals don't stay narrow even if you are careful.
This is where your opinion differs from the opinion of the Karlsruhe people.
In another posting someone asked how to do pivoting when interval arithmetic is used. That also misrepresents the work of the Karlsruhe people. To reiterate, what the Karlsruhe people do to solve a set of linear equations is:
 1. Calculate an approximate solution using your favorite solver. No interval
    arithmetic is done in this stage!
 2. Next find an enclosing interval for the solution.
 3. Narrow the interval using iterative methods with the original matrix.
So we see that *no* interval pivoting is done. Also, to successfully narrow the interval in stage 3 it is important that the arithmetic provides the narrowest interval for every operation. In many cases they succeed in finding solutions where each component of the solution is pinned between two successive machine numbers. I have my doubts about it (many people have). But this is the most sensible approach to interval arithmetic I know of. Most work (finding the initial approximation) is done without any interval arithmetic. You use it only to squeeze out the last few bits.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
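One small concrete piece of stage 3 is computing an interval enclosure of the residual b - A*x for the approximate solution; everything else (the enclosure of the error and the fixed-point verification) is omitted here. The sketch below is an illustration only, assuming C99 fenv.h and a 2x2 system; note the negated products, since rounding a product down and then subtracting it would push the bound the wrong way.

    #include <fenv.h>
    #pragma STDC FENV_ACCESS ON

    /* Enclose r = b - A*x componentwise: with the products negated, every
       operation rounded down gives a lower bound and every operation
       rounded up gives an upper bound. */
    void residual_bounds(const double A[2][2], const double b[2],
                         const double x[2], double rlow[2], double rupp[2])
    {
        for (int i = 0; i < 2; i++) {
            fesetround(FE_DOWNWARD);
            rlow[i] = b[i] + (-A[i][0]) * x[0] + (-A[i][1]) * x[1];
            fesetround(FE_UPWARD);
            rupp[i] = b[i] + (-A[i][0]) * x[0] + (-A[i][1]) * x[1];
        }
        fesetround(FE_TONEAREST);
    }

It is precisely because these enclosures must be as tight as possible that the Karlsruhe scheme wants true directed rounding rather than the sloppier one-ulp widening discussed earlier.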
jpk@Ingres.COM (Jon Krueger) (06/20/91)
jbs@WATSON.IBM.COM writes: >I checked the behavior [of Malcom's code sample] with the IBM VS >Fortran and Xlf compilers. In both cases the behavior was not changed >by optimization ... So this example will not convince me that I'm wrong. I see. Therefore no standard FORTRAN compiler would be permitted to do any differently, right? These compilers define what is and is not a legal optimization, is that your assertion? -- Jon Jon Krueger, jpk@ingres.com
koenig@urz.unibas.ch (06/20/91)
In article <KHB.91Jun11185538@chiba.Eng.Sun.COM>, khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
>
> In article <9106120054.AA19903@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>
> instructions into the IEEE standard and they have been implemented
> in hardware in many machines. Interval arithmetic still hasn't
> gotten off the ground. How long are we expected to wait until
> concluding it's not going to fly.
>
> Until vendors (other than apple and sun) provide hooks that people can
> actually use. Had such hooks been part of _any_ language standard, I
> suspect folks with such interests would have merrily coded up systems.
>
> 3. The real problem with interval arithmetic is not that it's
> slow but that on real applications its error estimates are generally
> so pessimistic as to be useless.
>
> At last, something we agree on! But I confess that I have not used it
> extensively, so I am prepared to believe that it may have utility I am
> not personally cognizant of. Interval techniques will have to be
> deployed in large quantities before it can be counted out
Obviously, these discussions are about the *naive* approach to interval arithmetic: replacing all floating-point numbers by intervals and replacing all floating-point operations of an algorithm by interval operations. In consequence, a huge inflation occurs as the number of operations increases, because dependent quantities are handled like independent quantities.
A simple example: with X being an interval, X - X is not (in general) equal to zero, but yields an interval which lies symmetrically around zero and has a diameter twice as great as that of X. This is because the resulting interval must include all differences between any two values out of X; in interval subtraction the second operand is chosen independently of the first.
Hence, interval arithmetic must be applied to special algorithms for achieving highly accurate results. For more than a decade, mathematicians at the University of Karlsruhe (Germany) have designed such methods, which are based on fixed point theorems (Brouwer) and yield highly accurate verified solutions, even in ill-conditioned cases.
Example: The Hilbert matrix of dimension 14 has condition number 1.85e+19. The inverse of that matrix can be computed with maximum accuracy (all components are intervals without any floating-point number between the lower and the upper bound!) using the IEEE double extended format and an implementation of the scalar product with maximum accuracy. Furthermore the algorithm yields not only the optimal verified enclosure of the solution, but proves its existence and its uniqueness, too.
Stefan Koenig
--------------------------------------------------------------------------
Here are some references for details:
G. Bohlender, L.B. Rall, Chr. Ullrich, J. Wolff von Gudenberg, "PASCAL-SC: A Computer Language for Scientific Computation", Perspectives in Computing 17, Orlando, 1987.
G. Bohlender, "What do we need beyond IEEE Arithmetic?", in Chr. Ullrich (ed.), Computer Arithmetic and Self-Validating Numerical Methods, Academic Press, Boston, 1990.
H.C. Fischer, G. Schuhmacher, R. Haggenmüller, "Evaluation of Arithmetic Expression with Guaranteed High Accuracy", Computing, Suppl. 6, pp. 149-158, Springer, Wien, 1988.
E. Kaucher, W.L. Miranker, Self-Validating Numerics for Function Space Problems, Academic Press, New York, 1984.
U. Kulisch, W.L. Miranker, Computer Arithmetic in Theory and Practice, Academic Press, New York, 1981.
U. Kulisch, W.L. Miranker (eds.), A New Approach to Scientific Computation, Academic Press, New York, 1983.
U. Kulisch, H.J. Stetter, "Scientific Computation with Automatic Result Verification", Computing Supplementum 6, Springer, Wien, 1988.
E. Kaucher, "Self-Validating Methods for Special Differential-Equation-Problems", 11th IMACS World Congress on System Simulation and Scientific Computation, Oslo, 1985, p. 115.
L.B. Rall, "Computable Bounds for Solutions of Integral Equations", 11th IMACS World Congress on System Simulation and Scientific Computation, Oslo, 1985, p. 119.
S.M. Rump, "Solving Algebraic Problems with High Accuracy", 10th IMACS World Congress on System Simulation and Scientific Computation, Montreal, 1982, p. 389.
S. Markov, "Least-square Approximation under Interval Data", International Symposium on Computer Arithmetic and Self-Validating Numerical Methods (IMACS-GAMM-GI), Basel, 1989.
R. Lohner, "Computing Enclosures for the Solution of Nonlinear Parameter Dependent Systems and Ordinary Boundary Value Problems", International Symposium on Computer Arithmetic and Self-Validating Numerical Methods (IMACS-GAMM-GI), Basel, 1989.
G.F. Corliss, "How Can You Apply Interval Techniques in an Industrial Setting?", International Symposium on Computer Arithmetic and Self-Validating Numerical Methods (IMACS-GAMM-GI), Basel, 1989.
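Koenig's X - X example is worth seeing numerically. In the sketch below (plain C; rounding modes are omitted because these particular bounds are exact), X = [1,2] and the interval difference X - X comes out as [-1,1], twice the diameter of X, because the two operands are treated as independent.

    #include <stdio.h>

    typedef struct { double low, upp; } interval;

    interval isub(interval a, interval b)
    {
        /* [a.low - b.upp, a.upp - b.low]: covers every x1 - x2 with x1 in a
           and x2 in b chosen independently. */
        interval c = { a.low - b.upp, a.upp - b.low };
        return c;
    }

    int main(void)
    {
        interval x = { 1.0, 2.0 };
        interval d = isub(x, x);
        printf("X - X = [%g, %g]\n", d.low, d.upp);   /* prints [-1, 1] */
        return 0;
    }

This dependency effect, not the rounding per operation, is what makes naive interval replacement blow up and why the Karlsruhe methods restructure the algorithm instead.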
jbs@WATSON.IBM.COM (06/21/91)
I said:
> I checked the behavior with the IBM VS Fortran and Xlf compilers.
> In both cases the behavior was not changed by optimization. The
> behavior differs between the two compilers but that is a different
> issue. So this example will not convince me that I'm wrong. (Actually
> I later softened my original statement to say it is undesirable for
> optimization to affect a program's behavior.)
(My original statement was that a compiler which compiles a legal program so that it behaves differently when compiled with optimization is "broken".)
Jon Krueger selectively quoted this as:
> I checked the behavior <of Malcom's code sample> with the IBM VS
> Fortran and Xlf compilers. In both cases the behavior was not changed
> by optimization ... So this example will not convince me that I'm wrong.
and commented:
: I see. Therefore no standard FORTRAN compiler would be permitted
: to do any differently, right? These compilers define what is and
: is not a legal optimization, is that your assertion?
I have the following comments:
1. This is not my position, as I believe the final sentence in my post (which you tendentiously failed to quote) makes clear. Do you disagree that it is undesirable for optimization to affect a program's behavior?
2. I believe a compiler can obey the Fortran standard and still be properly called broken. For example it can take 100 hours to compile a trivial program.
3. The person who posted the code example didn't say what the example was supposed to show. If it was supposed to show that any compiler would produce code that behaved differently when compiling with optimization then it is incorrect.
                             James B. Shearer
prener@watson.ibm.com (Dan Prener) (06/22/91)
In article <9106190124.AA27410@ucbvax.Berkeley.EDU>, jbs@WATSON.IBM.COM writes:
|>
|> I said:
|> > I think that if a (legal) program behaves differently when compiled with
|> > and without optimization then the compiler is broken. Perhaps you
|> > disagree.
|> Malcolm said:
|>       B = 5.0
|>       IF ( ( B .GT. 3.0 ) .OR. MYTEST(A) ) PRINT some diagnostic messages
|>
|>       LOGICAL FUNCTION MYTEST(X)
|>       REAL X
|>       MYTEST = some boolean work
|>       PRINT a few diagnostics on the screen
|>       RETURN
|>       END
|> I modified this to:
|>       LOGICAL*4 MYTEST
|>       B = 5.0
|>       IF ( ( B .GT. 3.0 ) .OR. MYTEST(B) ) THEN
|>          WRITE(6,*)' MAIN'
|>       ENDIF
|>       END
|>
|>       LOGICAL FUNCTION MYTEST(X)
|>       REAL X
|>       WRITE(6,*)' MYTEST'
|>       MYTEST = .TRUE.
|>       RETURN
|>       END
|> I checked the behavior with the IBM VS Fortran and Xlf compilers.
|> In both cases the behavior was not changed by optimization. The
|> behavior differs between the two compilers but that is a different
|> issue. So this example will not convince me that I'm wrong. (Actually
|> I later softened my original statement to say it is undesirable for
|> optimization to affect a program's behavior.)
|> James B. Shearer
As immodest as it might be for me to agree with my colleague, I must agree with James Shearer. Indeed, the spirit of FORTRAN is, overwhelmingly, that a program whose behavior is changed by optimization is not a legal program.
In particular, I quote from the FORTRAN '77 standard
   It is not necessary for a processor to evaluate all of the operands
   of an expression if the value of the expression can be determined
   otherwise. This principle is most often applicable to logical
   expressions, but it applies to all expressions. For example, in
   evaluating the logical expression
         X .GT. Y .OR. L(Z)
   where X, Y, and Z are real, and L is a logical function, the
   function reference L(Z) need not be evaluated if X is greater than Y.
to show that the compiler may do it either way. And I cite another passage from the same standard
   For example the statements
         A(I) = F(I)
         Y = G(X) + X
   are prohibited if the reference to F defines I or the reference to
   G defines X.
which suggests that programs that admit more than one correct interpretation are illegal.
I would say that, since the program(s) offered above are, according to the language standard, non-deterministic, then they are illegal. Note that optimization has nothing to do with this. According to the standard, a compiler could correctly produce a program which would call the function MYTEST or not depending upon the time of day at which it was run.
--
Dan Prener (prener @ watson.ibm.com)
preston@ariel.rice.edu (Preston Briggs) (06/23/91)
prener@prener.watson.ibm.com (Dan Prener) writes: >Indeed, the spirit of FORTRAN is, overwhelmingly, that a program >whose behavior is changed by optimization is not a legal program. A wonderfully concise, quotable statement. So much nicer than Shearer's assertion that the optimizer must be broken.
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/24/91)
In article <1991Jun22.003029.16748@watson.ibm.com>, prener@watson.ibm.com (Dan Prener) writes:
> As immodest as it might be for me to agree with my colleague, I must
> agree with James Shearer. Indeed, the spirit of FORTRAN is, overwhelmingly,
> that a program whose behavior is changed by optimization is not a legal
> program.
In that case, a subroutine like
      SUBROUTINE FOO(X, A, B, C)
      REAL X, A, B, C
      X = A*B + A*C
      END
must not be legal. Darn it, I used to _like_ Fortran. (Yes, optimisation _may_ change the behaviour of this program in a significant way.)
Perhaps more obviously,
      SUBROUTINE FOO(X, A, B, C)
      REAL X, A, B, C
      X = A*B
      X = X + A*C
      END
may _also_ have its behaviour changed by certain optimisations. (Hint: Sun's compilers have a command-line flag related to this one.) The use of parentheses won't help here.
The moral of the story is that the values and operations provided by floating-point hardware _approximate_ the values and operations of real arithmetic, but do not obey the same laws, so that a language which licences transformations which are valid for the reals but invalid for the floats is a language which licences a compiler to produce different numeric results (possibly HUGELY different) depending on optimisation level.
--
I agree with Jim Giles about many of the deficiencies of present UNIX.
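A concrete instance of the "possibly HUGELY different" claim, in C rather than Fortran and with values chosen purely for illustration so that the distributed form overflows in an intermediate product while the factored form does not:

    #include <stdio.h>

    int main(void)
    {
        double a = 1e308, b = 1e308, c = -1e308;
        double distributed = a * b + a * c;   /* +inf + -inf gives NaN */
        double factored    = a * (b + c);     /* 1e308 * 0.0 gives 0.0 */
        printf("a*b + a*c = %g\n", distributed);
        printf("a*(b+c)   = %g\n", factored);
        return 0;
    }

A compiler that silently rewrites one form into the other has changed the program's observable behaviour, which is exactly the point being made for the Fortran subroutines above.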
jpk@Ingres.COM (Jon Krueger) (06/25/91)
jbs@WATSON.IBM.COM (James B. Shearer) writes: > Do you disagree that it is undesirable for optimization to affect > a program's behavior? Yes. It is a good and joyful thing, always and everywhere, for compilers to expose broken programs. Correct programs can be written in available languages. The meaning of such programs does not depend on whether compilers apply permissible optimizations. -- Jon Jon Krueger, jpk@ingres.com
jbs@WATSON.IBM.COM (06/26/91)
Dan Prener said:
> I would say that, since the program(s) offered above are, according to
> the language standard, non-deterministic, then they are illegal.
But any program with floating point arithmetic in it is, according to the language standard, non-deterministic (since the language standard does not specify the floating point model). Surely you don't believe all such programs are illegal.
Preston Briggs quotes Dan Prener:
:> Indeed, the spirit of FORTRAN is, overwhelmingly, that a program
:> whose behavior is changed by optimization is not a legal program.
and adds:
> A wonderfully concise, quotable statement.
> So much nicer than Shearer's assertion that the optimizer must be broken.
Isn't it however logically equivalent to my statement:
> I think that if a (legal) program behaves differently when compiled with
> and without optimization then the compiler is broken.
I asked:
> Do you disagree that it is undesirable for optimization to affect
> a program's behavior?
Jon Krueger replied:
> Yes.
> It is a good and joyful thing, always and everywhere, for
> compilers to expose broken programs.
> Correct programs can be written in available languages.
> The meaning of such programs does not depend on whether
> compilers apply permissible optimizations.
My comments:
1. I would not buy a compiler from someone who enjoys breaking user code.
2. As I pointed out above, the behavior of any Fortran program with floating point operations is not specified by the standard. If you believe optimizers may make any transformation permitted by the standard then the meaning of any program with floating point operations may depend on the optimization level. Hence the above quoted statement would imply all such programs are not correct.
In general I am disturbed by the attitude that just because the fortran standard permits some atrocity (such as evaluating a*x + a*y as a*(x+y)) it is improper to criticize a compiler for doing it. In my view the standard represents a least common denominator set of minimal requirements and any quality compiler will obey a stronger set of requirements.
In this regard, I am not a language lawyer, but it is my understanding that since the standard says nothing about the accuracy of intrinsic functions a compiler could replace sin(x) with 0 and cos(x) with 1 and be standard compliant. Is this the case?
                             James B. Shearer
states@artemis.nlm.nih.gov (David States) (06/26/91)
In article <9106252305.AA27299@ucbvax.Berkeley.EDU>, jbs@WATSON.IBM.COM writes:
|> In general I am disturbed by the attitude that just because
|> the fortran standard permits some atrocity (such as evaluating a*x +
|> a*y as a*(x+y)) it is improper to criticize a compiler for doing it.
|> In my view the standard represents a least common denominator set of
|> minimal requirements and any quality compiler will obey a stronger
|> set of requirements.
|> James B. Shearer
A compiler can optimize for many things (speed, space, I/O, etc.), and the use of feedback in optimization through profiling is established if not necessarily widespread, so why not allow a compiler to optimize for precision? Most readers of this group have undoubtedly seen their share of sloppy/ill considered/hasty code. Wouldn't a little help from the compiler be welcome?
Issues:
What is the appropriate measure? Average error, worst case error, worst observed error, some combination of the above?
Is the simple rearrangement of arithmetic sufficient, or would the compiler need to "know" trig identities and the properties of transcendental functions to be effective?
Is it OK to break a program which depends on imprecision/error to run as expected (correctly???)?
How effective could this be without profiling and feedback? What would you need to save in profiling if you used it?
David States
slackey@bbn.com (Stan Lackey) (06/27/91)
In article <1991Jun24.211407.17389@ingres.Ingres.COM> jpk@Ingres.COM (Jon Krueger) writes:
> jbs@WATSON.IBM.COM (James B. Shearer) writes:
> > Do you disagree that it is undesirable for optimization to affect
> > a program's behavior?
> Yes.
> It is a good and joyful thing, always and everywhere, for
> compilers to expose broken programs.
Many vector machines do summations in a different order than program order; thus optimization on vs. off can and does produce different answers. They can be very different depending upon the actual numbers.
-Stan
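The summation-order effect is easy to reproduce on any IEEE machine; a small C sketch for illustration, with values chosen to make the difference obvious:

    #include <stdio.h>

    int main(void)
    {
        double big = 1e16, small = 1.0;

        double s1 = big;                       /* program order: big term first */
        for (int i = 0; i < 10; i++)
            s1 += small;                       /* each 1.0 is lost to rounding  */

        double s2 = 0.0;                       /* reordered: small terms first  */
        for (int i = 0; i < 10; i++)
            s2 += small;
        s2 += big;

        printf("program order: %.1f\n", s1 - big);   /* 0.0  */
        printf("reordered:     %.1f\n", s2 - big);   /* 10.0 */
        return 0;
    }

A vectorizing compiler that sums in chunks or in reverse is performing exactly this kind of reordering, which is why the answers can differ with optimization on.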
jgreen@Alliant.COM (John C Green Jr) (06/28/91)
This discussion involves non `standard conforming' programs which get different answers with different compilers or with different optimization options on the same compiler. My favorite program of this type in genuine upper case FORTRAN IV was constructed by Stanley Rabinowitz in 1969. He got paid $50 by the magazine "Software Age" which published it in their column "X-Tran's Adventures In Troubletran". LOGICAL FOO I = 0 IF ( I .EQ. 0 .OR. FOO ( I ) ) I = I + 2 WRITE ( 6 , 99 ) I STOP 99 FORMAT ( 1X , 1I1 ) END LOGICAL FUNCTION FOO ( J ) J = J + 1 FOO = .FALSE. RETURN END This was run on an IBM System/360 Model 50 under three of the world's most popular compilers: Fortran G Answer: 1 Fortran H Answer: 2 Watfor Answer: 3 Watfor checked to see if I was 0 then called the function. Fortran H checked to see if I was 0 and then did not call the function. Fortran G called the function and then checked to see if I was 0. I found it amazing that such a small program can get three different answers with no compiler diagnostics and no runtime diagnostics on three of the most popular compilers on the same machine! The paragraph in the Fortran '66 standard says approximately: If a subroutine subprogram or a function subprogram changes any of its arguments or any element in COMMON then that argument or element of COMMON cannot appear anywhere else in any statement that calls the subroutine or invokes the function. In line 3 of the program the invocation of the function FOO changes I and I appears no less than 3 other times in this statement. The paragraph in the Fortran '77 standard section 6.6, `Evaluation of Expressions' says in part: The execution of a function reference in a statement may not alter the value of any other entity within the statement in which the function reference appears. The execution of a function reference in a statement may not alter the value of any entity in common that affects the value of any other function reference in that statement. However, execution of a function reference in the expression `e' of a logical IF statement is permitted to affect entities in the statement `st' that is executed when the value of the expression `e' is true. In line 3 of the program the invocation of the function FOO changes I and I appears in the expression `e' of the logical IF statement. In Fortran '77 there is no problem with the I = I + 2.
jpk@Ingres.COM (Jon Krueger) (06/28/91)
> 1. I would not buy a compiler from someone who enjoys breaking >user code. Neither would I. But a compiler which exposes broken user code is a different animal. -- Jon Jon Krueger, jpk@ingres.com
jbs@WATSON.IBM.COM (06/29/91)
I said: :> 1. I would not buy a compiler from someone who enjoys breaking :>user code. Jon Krueger replied: >Neither would I. >But a compiler which exposes broken user code is a different animal. User code which complies with a variant or extension of the Fortran 77 (or any other) standard may not be portable but it is not "broken". The DO, ENDDO construct is not in the Fortran 77 standard. Nevertheless many Fortran 77 compilers recognize it and many users use it. I would not advise omitting it from your compiler in order to "expose" "broken" user code. James B. Shearer
jpk@Ingres.COM (Jon Krueger) (06/29/91)
I wrote: >>It is a good thing for compilers to expose broken programs. Stan wrote: >Many vector machines do summations in different than program order; thus >optimization on vs. off can and does produce different answers. They >can be very different depending upon the actual numbers. Not what I was referring to. Such compilers are broken. The meaning of some programs depends on the ordering of some of their steps. To change the ordering is to change the meaning -- an impermissible optimization. A program in a sequential language may reasonably depend on the sequential ordering of its steps. Such programs are not broken. A program in a language where order of expression evaluation is undefined may not reasonably depend on left-to-right evaluation. Such programs are broken. -- Jon Jon Krueger, jpk@ingres.com
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (06/29/91)
>>>>> On 28 Jun 91 23:13:12 GMT, jbs@WATSON.IBM.COM said:
jbs> The DO, ENDDO construct is not in the Fortran 77 standard.
jbs> Nevertheless many Fortran 77 compilers recognize it and many users use
jbs> it. I would not advise omitting it from your compiler in order to
jbs> "expose" "broken" user code.
The word "broken" is clearly being used in this context to refer to
code whose behavior is undetermined under the standard --- therefore
*logically* "broken". The examples which have been the topic of
discussion here depend on either subtle issues of floating-point code
re-arrangement or (in the other thread) on a particular
implementation's treatment of functions with side effects.
The use of extensions is an entirely different issue. The resulting
codes are *syntactically* incorrect FORTRAN-77 --- their *logical*
interpretation under a *superset* of the FORTRAN language is
independent of the fact that extensions are used.
I certainly appreciate any efforts that a compiler makes to tell me my
code is either broken or non-portable. Most major vendors provide
compiler switches to flag ANSI non-compliance. Very few provide
robust and comprehensive extensions to detect common *logical* errors,
such as:
-- use of uninitialized variables
-- argument mismatch in calling sequences
-- use of undeclared variables (misspellings)
-- out-of-bounds array references in called routines
(often a result of an argument mismatch)
--
John D. McCalpin mccalpin@perelandra.cms.udel.edu
Assistant Professor mccalpin@brahms.udel.edu
College of Marine Studies, U. Del. J.MCCALPIN/OMNET
chuck@pluto.Harris-ATD.com (Chuck Musciano) (06/30/91)
In article <1991Jun27.215747.5329@ingres.Ingres.COM>, jpk@Ingres.COM (Jon Krueger) writes:
> > 1. I would not buy a compiler from someone who enjoys breaking
> > user code.
>
> Neither would I.
>
> But a compiler which exposes broken user code is a different animal.
Oh, people who can break user code make the very best compiler writers. As every compiler writer knows (and I'm proud to call myself one), the easy cases are the obvious cases. The real work in a compiler is in the borderline stuff, the strange combinations of language features, the weird usages. A person who is always thinking of the bizarre situation that can break user code is just the person to work on a compiler, because they are thinking of those situations and making sure the compiler can handle them.
I can recall cases where we were designing a multiprocessor version of Pascal, and someone wanted list processing primitives in the language. I opposed them, favoring instead more atomic primitives that could be used to construct list primitives. Only by coming up with strange and unusual, yet perfectly valid, usages could I convince the others that the atomic way was the way to go.
--
Chuck Musciano                      ARPA  : chuck@trantor.harris-atd.com
Harris Corporation                  Usenet: ...!uunet!x102a!trantor!chuck
PO Box 37, MS 3A/1912               AT&T  : (407) 727-6131
Melbourne, FL 32902                 FAX   : (407) 729-3363
A good newspaper is never good enough, but a lousy newspaper is a joy forever. -- Garrison Keillor