[comp.arch] IEEE arithmetic

jbs@WATSON.IBM.COM (06/01/91)

          Henry Spencer says:
My understanding is that the funnier-looking features, in particular the
infinities, NaNs, and signed zeros, mostly cost essentially nothing.

          I believe this statement is incorrect.  Infs and nans compli-
cate the design of the floating point unit.  Even if there is ultimately
no performance impact, this added design effort represents a very real
cost.  Perhaps some hardware designers could comment.
          Also as has already been pointed out there is a software cost
due to such features as 0*x no longer being 0.  In some cases this will
have a performance impact in that compilers will generate inferior code.
          Henry Spencer says:
Getting the rounding right reportedly takes work, but that one seems easy
to support.

          As far as round to nearest goes, this does provide a real
benefit.  I am not totally convinced it's worth the cost but a case can be
made.  The case for the round to even (when halfway) seems much more
dubious.  I believe this represents a substantial portion of the cost
(comments?) and the additional benefits seem quite tenuous.  I don't
see any good reason for the other 3 modes.
          I said:
>arithmetic may compute a totally bogus result with no error indication
>(other than the setting of an overflow bit which nobody looks at).  On
>other systems the error is usually more obvious.
          Henry Spencer said:
Only if the author goes to substantially more trouble than merely looking
at an overflow bit.  This really sounds to me like the old "people wrote
better when they had to retype a whole page to fix an error, because they
had to think about it more" fallacy.  People who took care before will
continue to take care; people who didn't before won't now.  The difference
is that both are somewhat less likely to get plausible-looking garbage now.
Nobody is proposing that IEEE arithmetic is a replacement for care and
understanding.  All it does is improve the odds that a given amount of
care and understanding will produce meaningful answers.

          I guess I was a bit obscure here.  On IBM hex systems when
floating overflow occurs the default behavior is for an error message
to be printed which includes a traceback pointing to the offending
instruction.  Execution then continues (with the max floating number as
a fixup) but will terminate if more than a small number (5?) of over-
flows occur.  I was under the impression that this was typical behavior
for pre-IEEE systems.  Is that not the case?
          I consider this behavior much friendlier than typical IEEE
behavior which is to run to completion then output infs and nans (or
worse plausible but wrong numbers) with no indication of where the
error occurred.
          It seems to me that the existence of infs in IEEE is in-
creasing the likelihood of obtaining plausible-looking garbage.
          I said quoting the paper:
>>According to Kahan, extended precision has 64 bits of significand be-
>>cause that was the widest precision across which carry propagation
>>could be done on the Intel 8087 without increasing the cycle time...
>         This may constitute justification to 8087 designers, I don't
>see why it should be for the rest of us.
          Henry Spencer said:
To me it sounded like "the precise number of bits was not thought to be
crucial, so the limitations of a very important existing implementation
were taken into consideration".  If you don't like this, explain why the
resulting number of bits is wrong or inadequate.  If it's satisfactory,
why do you care how it was arrived at?  Standards decisions often turn
on such seemingly trivial or ephemeral issues; what matters is the
result.

          I was unaware the Intel 8087 could be considered a "very
important existing implementation" as compared to the DEC format for
example.  Do you have some evidence for this?
          79 bits is clearly an absurd length for a floating point
format (the same is true of the 43 bit single extended format).  As well
as being very awkward, the extended lengths are too close in precision to
the unextended lengths.  The rationale behind the extended formats:
providing extra precision for intermediate terms is ridiculous.  If the
added precision is available it should always be used otherwise it is
of little value (see various comments on hex format).
                       James B. Shearer

henry@zoo.toronto.edu (Henry Spencer) (06/02/91)

In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>>My understanding is that the funnier-looking features, in particular the
>>infinities, NaNs, and signed zeros, mostly cost essentially nothing.
>
>          I believe this statement is incorrect.  Infs and nans compli-
>cate the design of the floating point unit...

I've had some mail from people who've designed FPUs.  They say that the infs
make no difference to speak of, nor do the quiet NaNs.  They do say that the
signalling NaNs are a pain -- as is anything that can cause an exception --
on a pipelined machine with tight constraints on exception behavior.

>          I guess I was a bit obscure here.  On IBM hex systems when
>floating overflow occurs the default behavior is for an error message
>to be printed which includes a traceback pointing to the offending
>instruction.  Execution then continues (with the max floating number as
>a fixup) but will terminate if more than a small number (5?) of over-
>flows occur.  I was under the impression that this was typical behavior
>for pre-IEEE systems.  Is that not the case?

Not the ones I've used.  You could tell them to do this, but then you
can tell an IEEE system to do it too.

>          It seems to me that the existence of infs in IEEE is in-
>creasing the likelihood of obtaining plausible-looking garbage.

I'm not sure how.  Certainly forcing a result to inf is much better than
forcing it to some random number (e.g. the largest representable one),
because infs propagate through successive calculations, and generally don't
get turned back into plausible-looking garbage.
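
A small C sketch makes the point concrete (my own illustration, assuming
IEEE doubles and the default non-trapping behaviour): two results that both
overflowed are later divided.  With infinities the damage stays visible;
with a largest-number fixup the quotient collapses to a very plausible 1.0.

    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double a = 1e300, b = 3e300;

        /* Both products overflow; under IEEE defaults they become +Inf. */
        double x = a * 1e20;
        double y = b * 1e20;
        printf("IEEE:    x/y = %g\n", x / y);             /* nan: obviously wrong */

        /* A "largest representable number" fixup would clamp both to DBL_MAX. */
        printf("clamped: x/y = %g\n", DBL_MAX / DBL_MAX); /* 1.0: plausible; true ratio is 1/3 */

        printf("isinf(x) = %d, isnan(x/y) = %d\n", isinf(x) != 0, isnan(x / y) != 0);
        return 0;
    }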

>>To me it sounded like "the precise number of bits was not thought to be
>>crucial, so the limitations of a very important existing implementation
>>were taken into consideration"...
>          I was unaware the Intel 8087 could be considered a "very
>important existing implementation" as compared to the DEC format for
>example.  Do you have some evidence for this?

Its implementation was well underway at the time and it plausibly was going
to sell far more FPUs than any DEC design.  Besides, you're missing the point
slightly:  *in the context of IEEE arithmetic*, the 8087 was an important
implementation.  The DEC implementations were irrelevant to the question of
how long IEEE extended formats should be, since they were not IEEE at all,
the campaign to base IEEE on DEC FP having failed (according to Goldberg,
mostly because DEC format had no room for things like infs and NaNs).

>          79 bits is clearly an absurd length for a floating point
>format (the same is true of the 43 bit single extended format)...

They are minima, not precise requirements, as witness the people who
implement single extended as 64-bit double so they can claim conformance
with the standard's very strong recommendation that there be an extended
format for every "normal" format.

>... The rationale behind the extended formats:
>providing extra precision for intermediate terms is ridiculous.  If the
>added precision is available it should always be used otherwise it is
>of little value...

Having had some need to compute hypotenuses without overflow in code that
otherwise wanted to use single precision for speed, I'm afraid I disagree.
Whether it is "of little value" depends a lot on what you are doing.
-- 
"We're thinking about upgrading from    | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 to SunOS 3.5."              |  henry@zoo.toronto.edu  utzoo!henry

dik@cwi.nl (Dik T. Winter) (06/02/91)

In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >           I guess I was a bit obscure here.  On IBM hex systems when
 > floating overflow occurs the default behavior is for an error message
 > to be printed which includes a traceback pointing to the offending
 > instruction.  Execution then continues (with the max floating number as
 > a fixup) but will terminate if more than a small number (5?) of over-
 > flows occur.  I was under the impression that this was typical behavior
 > for pre-IEEE systems.  Is that not the case?
No, it was not.  Amongst the systems I used I have seen:
1.	Replace overflown value by largest representable value without
	any notification.
2.	Immediately trap on overflow, do not continue.
3.	Replace overflown value by an infinity which traps when used.
And more strategies have probably been implemented.

 >           I consider this behavior much friendlier than typical IEEE
 > behavior which is to run to completion then output infs and nans (or
 > worse plausible but wrong numbers) with no indication of where the
 > error occurred.
IEEE allows the user to specify that overflow traps.  However, especially
on the RS6000, this leads to a terrific decrease in performance.  So
apparently it is not cheap to trap.

 >           It seems to me that the existence of infs in IEEE is in-
 > creasing the likelihood of obtaining plausible-looking garbage.
The likelihood is increased if you, the user, do not want to trap.  But
it also leads to some nice properties.  I disagree with the default
behaviour of IEEE, but I agree with the possibilities.

 >           79 bits is clearly an absurd length for a floating point
 > format (the same is true of the 43 bit single extended format).  As well
 > being very awkward the extended lengths are too close in precision to
 > the unextended lengths.
Perhaps, but note that the IEEE requirements for extended precision are
minimal requirements, so you may provide more in your implementation.  That
all systems that do provide extended double precision provide exactly the
minimum requirements might tell you something.  Note also that if a system implements
both IEEE single and IEEE double precision that double precision can be
seen as extended single precision (and I indeed do not know a single system
that implements true extended single precision).

 >                          The rationale behind the extended formats:
 > providing extra precision for intermediate terms is ridiculous.  If the
 > added precision is available it should always be used otherwise it is
 > of little value (see various comments on hex format).
Providing extra precision for intermediate terms is *not* ridiculous.
Witness the maf instructions on the RS6000.  Witness also the many old
papers that use inner products calculated in extra precision.
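
A minimal C sketch of that classical technique, accumulating a
single-precision inner product in double (the data are contrived to show
the effect):

    #include <stdio.h>

    /* Inner product of float vectors, accumulated in double. */
    static double dot_double(const float *x, const float *y, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += (double) x[i] * (double) y[i];   /* each product is exact in double */
        return s;
    }

    /* The same thing accumulated in float, for comparison. */
    static float dot_single(const float *x, const float *y, int n)
    {
        float s = 0.0f;
        for (int i = 0; i < n; i++)
            s += x[i] * y[i];
        return s;
    }

    int main(void)
    {
        float x[3] = { 3e7f, 1.0f, -3e7f };
        float y[3] = { 3e7f, 1.0f,  3e7f };            /* exact answer is 1 */
        printf("double accumulation: %g\n", dot_double(x, y, 3));          /* 1 */
        printf("single accumulation: %g\n", (double) dot_single(x, y, 3)); /* 0 or garbage, not 1 */
        return 0;
    }
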
The problems with hex format are the following:
1.	The wobbling precision means that you must use the worst case in
	error estimates.
2.	This makes single precision much more useless than binary single
	precision already is.
Note also that Blaauw reportedly has said that were he to design FP again
he would not use hex.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

cet1@cl.cam.ac.uk (C.E. Thompson) (06/03/91)

In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>
>          I guess I was a bit obscure here.  On IBM hex systems when
>floating overflow occurs the default behavior is for an error message
>to be printed which includes a traceback pointing to the offending
>instruction.  Execution then continues (with the max floating number as
>a fixup) but will terminate if more than a small number (5?) of over-
>flows occur.  I was under the impression that this was typical behavior
>for pre-IEEE systems.  Is that not the case?

Not particularly typical, as other postings have pointed out.

More to the point, what you are describing is the actions of a piece of
user-mode software: the IBM Fortran runtime system. There is nothing in
the IBM/3x0 architecture that specifies these fixup/termination actions;
it just generates a program interrupt. Indeed, doing the fixup involves
grubbing around in the failing instruction to find which floating point
register to modify.

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk

colwell@pdx023.ichips (Robert Colwell) (06/03/91)

In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:

	     Henry Spencer says:
   My understanding is that the funnier-looking features, in particular the
   infinities, NaNs, and signed zeros, mostly cost essentially nothing.

	     I believe this statement is incorrect.  Infs and nans compli-
   cate the design of the floating point unit.  Even if there is ultimately
   no performance impact, this added design effort represents a very real
   cost.  Perhaps some hardware designers could comment.

Ok, I designed the square root/divide hardware in the first generation of
Multiflow's machines.  I'm mostly on Henry's side -- you do need to detect
inputs that can cause wrong answers (square root of negative, 0/0, etc.),
and it does take extra hardware and design time, but it isn't that hard,
and it doesn't affect the cycle time.  Of the "nanocode" that controlled
this floating point unit, something like half dealt with getting
single/double divide/sqrt results, and the other half looked for
exceptions, rounding, and the like.

Getting the exceptions right was harder than getting the basic
functionality right, and was tougher to validate.  We found four bugs in
this functional unit over the two years it was in use, and three were in
exceptions, not basic functionality.  But it still isn't all that hard:  I
spent three months coming up with the basic design from a standing start.
And since there was a standard, I could run industry standard validation
suites against it to build my own confidence in it.  To me, that alone
proved the worth of the standard.

Getting the denorms right is not easy, and it does affect performance.
Especially in a statically scheduled machine, where the compiler was
expecting the result in cycle N, but your floating point unit just figured
out that it generated an underflow in cycle N-1 and must now do some
additional processing.  This means that the floating unit must be able to
stall the machine, something it previously didn't need, and stall is
inevitably on the critical timing path.

Bob Colwell  colwell@ichips.intel.com  503-696-4550
Intel Corp.  JF1-19
5200 NE Elam Young Parkway
Hillsboro, Oregon 97124

jcallen@Encore.COM (Jerry Callen) (06/03/91)

In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>          I guess I was a bit obscure here.  On IBM hex systems when
>floating overflow occurs the default behavior is for an error message
>to be printed which includes a traceback pointing to the offending
>instruction.  Execution then continues (with the max floating number as
>a fixup) but will terminate if more than a small number (5?) of over-
>flows occur.  I was under the impression that this was typical behavior
>for pre-IEEE systems.  Is that not the case?
>          I consider this behavior much friendlier than typical IEEE
>behavior which is to run to completion then output infs and nans (or
>worse plausible but wrong numbers) with no indication of where the
>error occurred.

This may be obvious, but let's not confuse "compiler and runtime system 
behavior" with "floating point hardware behavior." Yes, Fortran on 360/370
systems behaves as you describe. In PL/1 it is possible to completely ignore
exponent overflow if you wish and get meaningless results.

Similarly, IEEE overflow behavior is under programmer control. If you WANT to 
ignore overflow and then just check to see if some error occurred at the
end of the calculation you may do so; if you want to know immediately, you can 
do that, too.
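
In today's C the "check at the end" style looks roughly like the sketch
below, using the C99 <fenv.h> sticky flags (an interface that postdates
this thread; on a 1991 system the same thing goes through vendor-specific
library calls):

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    int main(void)
    {
        feclearexcept(FE_ALL_EXCEPT);

        volatile double x = 1e308;
        volatile double y = x * 10.0;      /* overflows quietly to +Inf     */
        volatile double z = y - y;         /* Inf - Inf: invalid, gives NaN */

        if (fetestexcept(FE_OVERFLOW | FE_INVALID | FE_DIVBYZERO))
            printf("something overflowed or went invalid (z = %g)\n", z);
        else
            printf("clean run: z = %g\n", z);
        return 0;
    }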

>                       James B. Shearer

-- Jerry Callen
   jcallen@encore.com

jbs@watson.ibm.com (06/04/91)

          Several people have challenged my characterization of typical
behavior on fp overflow in pre-IEEE environments.  In my defense I was
not intending to claim the exact behavior of IBM fortran was typical,
just the fact that x/(x*x+1) cannot silently produce garbage.  Even so
it appears that pre-IEEE behavior was more diverse than I had realized.
          I said:
>... The rationale behind the extended formats:
>providing extra precision for intermediate terms is ridiculous.  If the
>added precision is available it should always be used otherwise it is
>of little value...
          Henry Spencer says:
Having had some need to compute hypotenuses without overflow in code that
otherwise wanted to use single precision for speed, I'm afraid I disagree.
Whether it is "of little value" depends a lot on what you are doing.

          First a nitpick, you are using the extra exponent range not
extra precision.  In deciding whether a feature is of value one must
look at alternatives.  In your example the obvious alternative is to
compute the hypotenuses using double precision.  I do not see any reason
for an intermediate precision.  Btw how exactly would you expose an
intermediate format to the user?
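
(For what it is worth, one existing answer is C's long double, which on
8087-family hardware is the 80-bit double extended format; a hedged sketch,
assuming a machine where long double really is wider than double:)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double x = 1e200, y = 1e200;

        /* Intermediate terms carried in the wider format; the squares
           overflow double but not the wider exponent range of long double. */
        long double hyp = sqrtl((long double) x * x + (long double) y * y);

        printf("all in double:   %g\n", sqrt(x * x + y * y));   /* inf           */
        printf("via long double: %Lg\n", hyp);                  /* ~1.41421e+200 */
        return 0;
    }
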
          Dik Winter said (responding to the same statement):
Providing extra precision for intermediate terms is *not* ridiculous.
Witness the maf instructions on the RS6000.  Witness also the many old
papers that use inner products calculated in extra precision.
The problems with hex format are the following:
1.      The wobbling precision must let you use the worst case for error
        estimates.
2.      This makes single precision much more useles than binary single
        precision is already.
Note also that Blaauw reportedly has said that were he to design FP again
he would not use hex.

          I believe the maf instructions omit the intermediate round-
ing step primarily to improve performance.  I believe the papers to
which you refer basically require a way to obtain the double precis-
ion product of two single precision numbers.  I consider this a some-
what different issue.
          I agree that binary formats are superior to hex in that they
provide 2 extra bits of precision.  I don't agree (particularly for 64-bit
formats) that these 2 bits make it significantly easier to avoid numer-
ical problems.  If I were to design a FP format I would probably use
binary also.  I am not objecting to the use of binary.
                           James B. Shearer

henry@zoo.toronto.edu (Henry Spencer) (06/04/91)

In article <9106032144.AA15477@ucbvax.Berkeley.EDU> jbs@watson.ibm.com writes:
>>Having had some need to compute hypotenuses without overflow in code that
>>otherwise wanted to use single precision for speed, I'm afraid I disagree.
>
>... In deciding whether a feature is of value one must
>look at alternatives.  In your example the obvious alternative is to
>compute the hypotenuses using double precision...

Right, that is, using extended single precision.  That's why the extended
precisions are there, and why 754 very strongly recommends that an extended
precision be available for any normal precision.  What if I'd been using
double precision already?
-- 
"We're thinking about upgrading from    | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 to SunOS 3.5."              |  henry@zoo.toronto.edu  utzoo!henry

jbs@WATSON.IBM.COM (06/04/91)

          Jerry Callen says:
This may be obvious, but let's not confuse "compiler and runtime system
behavior" with "floating point hardware behavior." Yes, Fortran on 360/370
systems behaves as you describe. In PL/1 it is possible to completely ignore
exponent overflow if you wish and get meaningless results.
Similarly, IEEE overflow behavior is under programmer control. If you WANT to
ignore overflow and then just check to see if some error occurred at the
end of the calculation you may do so; if you want to know immediately, you can
do that, too.

          I believe most users will find it difficult to do these things.
I, for example, do not know how to do this in fortran on the IBM Risc System
6000.  This is a problem for other users on other machines as the "How to
detect NaN's" thread in comp.lang.fortran shows.
          Also it is desirable that any method for doing this should be
portable across IEEE systems (since IEEE is supposed to promote portabil-
ity) and have no significant performance impact when exceptions do not
occur (since otherwise users are encouraged to run in an unsafe way and
IEEE is supposed to promote safety).
                          James B. Shearer

dik@cwi.nl (Dik T. Winter) (06/04/91)

In article <9106032144.AA15477@ucbvax.Berkeley.EDU> jbs@watson.ibm.com writes:
 >                                             I believe the papers to
 > which you refer basically require a way to obtain the double precis-
 > ion product of two single precision numbers.  I consider this a some-
 > what different issue.
It would be a different issue if it were true.  But it is not true, the
papers I refer to are about the accumulation of an inner product in
double (extra) precision.  See for instance: J.H.Wilkinson, Rounding Errors
in Numeric Processes, Notes on Applied Science No. 32, HMSO, London;
Prentice Hall, New Jersey, 1963.   (Yes fairly old; it was already known
in that time!)

 >           I agree that binary formats are superior to hex in that they
 > provide 2 extra bits of precision.
A nit; they provide 3 more; nearly a digit.
 >                                     I don't agree (particularly for 64-bit
 > formats) that these 2 bits make it significantly easier to avoid numer-
 > ical problems.
Oh, but they do; see the paper I mentioned above.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

dricejb@drilex.UUCP (Craig Jackson drilex1) (06/04/91)

In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>
>          Henry Spencer says:
>My understanding is that the funnier-looking features, in particular the
>infinities, NaNs, and signed zeros, mostly cost essentially nothing.
>
>          I believe this statement is incorrect.  Infs and nans compli-
>cate the design of the floating point unit.  Even if there is ultimately
>no performance impact, this added design effort represents a very real
>cost.  Perhaps some hardware designers could comment.
>          Also as has already been pointed out there is a software cost
>due to such features as 0*x no longer being 0.  In some cases this will
>have a performance impact in that compilers will generate inferior code.

There is a significant software cost to not having NaNs--simulating them.
The reality is that data are often missing, and systems need to deal with
that fact.  One can either store the information as to whether a given
'double' is a valid floating point number as part of its bit representation,
or one can store it elsewhere.  (Perhaps as a bitmap attached to each
array in the application system.)

Many older systems used some valid floating point number, such as '-999',
as their missing-data specifier.  Such a convention is highly portable, but
runs the risk of valid data being interpreted as missing data.  When NaNs
are available, it becomes possible to use them to signify missing data.
Generally, normal NaN-producing operations will only produce a few of
the possible NaN bit patterns, and the remaining ones can be used to
code application-specific information about missing data.
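
A minimal sketch of the missing-data idiom in C99 (NAN gives a quiet NaN;
telling application NaNs apart from computational ones by payload is
possible but implementation-defined, so it is left out here):

    #include <math.h>
    #include <stdio.h>

    /* A quiet NaN as the "missing datum" marker: arithmetic that touches a
       missing value stays missing, and isnan() recovers the flag later. */
    #define MISSING ((double) NAN)

    int main(void)
    {
        double obs[4] = { 1.5, MISSING, 2.5, 4.0 };
        double sum = 0.0;
        int n = 0;

        for (int i = 0; i < 4; i++) {
            if (isnan(obs[i]))
                continue;                  /* skip missing data */
            sum += obs[i];
            n++;
        }
        printf("mean of the present values = %g\n", sum / n);
        return 0;
    }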

On other systems there may be a bit that does not participate in arithmetic.
There is such a bit in the Burroughs Large Systems architecture, a relic
of the B5500 representation.  This has successfully allowed us to
represent missing data in the absence of NaNs.  When the same system
was ported to the IBM 370 architecture, I used non-zero unnormalized
floating point numbers to represent these values, but with much less
confidence.

>          I guess I was a bit obscure here.  On IBM hex systems when
>floating overflow occurs the default behavior is for an error message
>to be printed which includes a traceback pointing to the offending
>instruction.  Execution then continues (with the max floating number as
>a fixup) but will terminate if more than a small number (5?) of over-
>flows occur.  I was under the impression that this was typical behavior
>for pre-IEEE systems.  Is that not the case?

As others have pointed out,
this was the behavior of a particular run-time system on IBM 370s.  By
default, the hardware simply trapped on overflows.  This behavior assumes
that the user cares about the overflows.  If the overflows were insignificant,
then extra (possibly non-portable) coding had to be done to avoid the
messages.

>          I consider this behavior much friendlier than typical IEEE
>behavior which is to run to completion then output infs and nans (or
>worse plausible but wrong numbers) with no indication of where the
>error occurred.

If running to completion, ignoring all creation of Infs and NaNs, was
the IEEE default, then I would agree with you.  I worked on a CDC 6400,
where this was the behavior, and sometimes I really wished I could get
a trap at the time of a divide-by-zero.

However, I believe that the standard specifies that by default all traps
are enabled.

>                       James B. Shearer

Also, we have software that does not care where in a calculation an
overflow or divide-by-zero occurred, just whether it occurred.  Being
able to turn off trapping greatly simplifies the coding of this application.
-- 
Craig Jackson
dricejb@drilex.dri.mgh.com
{bbn,axiom,redsox,atexnet,ka3ovk}!drilex!{dricej,dricejb}

dehnert@baalbek.asd.sgi.com (Jim Dehnert) (06/05/91)

In <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:

>          Also as has already been pointed out there is a software cost
>due to such features as 0*x no longer being 0.  In some cases this will
>have a performance impact in that compilers will generate inferior code.

As a compiler writer who has worked with IEEE systems, I don't know
why, at least not in any area with an important impact on performance.
The reason for the IEEE definition (0*Inf=NaN) is that zero may
represent a result merely close to zero, and Inf really represents a
very large number -- I can pick representatives of these sets which
multiplied together give me a true result in the representable range
(not zero or Inf).  If I see a literal zero while compiling (the only
case I can think of when the compiler does anything that might care
about what 0*x equals), I feel quite free to assume that it is really
zero, in which case 0*x is really zero even for very large x.  So I
will do the same thing I would have for non-IEEE.  (This is the only
case I'm aware of where IEEE defines 0*x != 0.)
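
The difference is easy to demonstrate (a small C sketch; volatile is there
only to keep the compiler from folding the expressions):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        volatile double x = INFINITY;
        volatile double zero = 0.0;

        printf("0 * Inf   = %g\n", zero * x);       /* nan            */
        printf("0 * 1e300 = %g\n", zero * 1e300);   /* 0, as expected */
        return 0;
    }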

>          As far as round to nearest goes, this does provide a real
>benefit.  I am not totally convinced it's worth the cost but a case can
>made.  The case for the round to even (when halfway) seems much more
>dubious.  I believe this represents a substantial portion of the cost
>(comments?) and the additional benefits seem quite tenuous.  I don't
>see any good reason for the other 3 modes.

Round to +/- infinity give you the floor and ceiling functions, which
come up frequently, for example in many transcendental function
algorithms.
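
A sketch of what that means in practice, with the C99 <fenv.h> calls
standing in for the hardware mode bits (rint rounds according to the
current rounding direction):

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    int main(void)
    {
        volatile double x = 2.3;

        fesetround(FE_DOWNWARD);               /* round toward -Inf */
        printf("floor-like: %g\n", rint(x));   /* 2 */

        fesetround(FE_UPWARD);                 /* round toward +Inf */
        printf("ceil-like:  %g\n", rint(x));   /* 3 */

        fesetround(FE_TONEAREST);              /* restore the default */
        return 0;
    }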

>          I said:
>>arithmetic may compute a totally bogus result with no error indication
>>(other than the setting of an overflow bit which nobody looks at).  On
>>other systems the error is usually more obvious.

IEEE also provides for trapping bogus results if the
appropriate flags are set -- then you don't need to look at the
overflow bit.  Of course, if you're using an IBM RS/6000, this goes
many times slower, so perhaps you don't consider this an option...

>          I guess I was a bit obscure here.  On IBM hex systems when
>floating overflow occurs the default behavior is for an error message
>to be printed which includes a traceback pointing to the offending
>instruction.  Execution then continues (with the max floating number as
>a fixup) but will terminate if more than a small number (5?) of over-
>flows occur.  I was under the impression that this was typical behavior
>for pre-IEEE systems.  Is that not the case?

It is the case.  It's also perfectly achievable behavior for IEEE
systems if exceptions are trapped.  This has nothing to do with the
IEEE standard.  Instead, it's probably a result of default Unix
software behavior, which is not to trap in most implementations.
Of course, you don't have much choice on an RS/6000...

>          I consider this behavior much friendlier than typical IEEE
>behavior which is to run to completion then output infs and nans (or
>worse plausible but wrong numbers) with no indication of where the
>error occurred.

Much friendlier than non-trapping behavior on any FP system, that is...

>          It seems to me that the existence of infs in IEEE is in-
>creasing the likelihood of obtaining plausible-looking garbage.

No, not trapping increases that likelihood.  Given that you're not
trapping, the existence of Infs and NaNs makes it somewhat more likely
that you'll get reasonable results (though personally I'd prefer the
traps).

>          79 bits is clearly an absurd length for a floating point
>format (the same is true of the 43 bit single extended format).

An interesting statement.  Are you of the "only powers of two are real
numbers" persuasion?

hrubin@pop.stat.purdue.edu (Herman Rubin) (06/05/91)

In article <1991Jun5.045615.8772@fido.wpd.sgi.com>, dehnert@baalbek.asd.sgi.com (Jim Dehnert) writes:
> In <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:

			......................

> >          79 bits is clearly an absurd length for a floating point
> >format (the same is true of the 43 bit single extended format).
> 
> An interesting statement.  Are you of the "only powers of two are real
> numbers" persuasion?

Why should it even be the case that the exponent and significand are in
the same word?  Or that multiple precision, or complex, numbers are in
contiguous words?  Even these give problems from the standpoint of
vector machines.  If we had the exponent and significand in different
units, the separation of integer arithmetic from floating arithmetic
might never have occurred.  Some machines do not even have separate units;
they must have unnormalized arithmetic.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)

dce@linus.mitre.org (Dale Earl) (06/05/91)

In article <9106010224.AA28532@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >
 >          Henry Spencer says:
 >My understanding is that the funnier-looking features, in particular the
 >infinities, NaNs, and signed zeros, mostly cost essentially nothing.
 >
 >          I believe this statement is incorrect.  Infs and nans compli-
 >cate the design of the floating point unit.  Even if there is ultimately
 >no performance impact, this added design effort represents a very real
 >cost.  Perhaps some hardware designers could comment.

     With my previous employer I helped design several floating point
units and in my experience the most difficult part of the IEEE
specification to handle was the denormalized numbers.  The
infinities, NANs, signed zeros, and rounding modes do not make the
hardware substantially more complicated or significantly decrease
performance.  As has been pointed out in other related postings these
features can be valuable for many numerical algorithms.  The value vs.
cost for denormalized numbers though is not favorable IMHO.  I feel a
representation for an infinitely small non-zero value would be useful
and much easier to implement in hardware.  But if you have to be able
to say that your device is 100% IEEE floating point compatible you get
stuck implementing the entire standard no matter what the hardware
tradeoffs.

     A related method we also employed on another device that did not
have to be 100% compliant may be of interest.  This device performed
both single precision integer and floating point operations.
Therefore we had a full 32 bit adder and multiplier available.  For
this device we made the on board register file 40 bits wide, 8 bits
for exponent and 32 bits for sign and mantissa.  Without going into
all the details, the integer operands used the 32 bit field and were
processed in a normal manner (exponent field was passed unchanged).
Floating point was performed using the full 32 bit mantissa with
rounding only on output from the chip.  This carried an extra 8 bits
of precision from intermediate operations which remained within the
chip.  The chip also could extract the exponent field into an integer
so we could emulate multiple precision in software if desired.  Not
exactly IEEE standard but probably would work well for many real
applications where better precision is more important than rigorous
numerical error bounding.   



--

Dale C. Earl                                                dce@mitre.org
These are my opinions, only my opinions, and nothing but my opinions! DCE 

jbs@WATSON.IBM.COM (06/06/91)

         I said:
 >                                             I believe the papers to
 > which you refer basically require a way to obtain the double precis-
 > ion product of two single precision numbers.  I consider this a some-
 > what different issue.
         Dik Winter said:
It would be a different issue if it were true.  But it is not true, the
papers I refer to are about the accumulation of an inner product in
double (extra) precision.  See for instance: J.H.Wilkinson, Rounding Errors
in Numeric Processes, Notes on Applied Science No. 32, HMSO, London;
Prentice Hall, New Jersey, 1963.   (Yes fairly old; it was already known
in that time!)

         I have checked this reference (actually "Rounding Errors in
Algebraic Processes").  I believe it is consistent with what I said.
For example p. 10 "Nearly all digital computers produce in the first
instance, the 2t-digit product of two t-digit numbers", p. 14 "The
subroutine accepts single-precision factors and gives a double preci-
sion product", p. 32 "If we are working on a computer on which the
inner products can be accumulated, then there may be stages in the
work when we obtain results which are correct to 2t binary figures".
Throughout this section it is clear what is being discussed is a way
to obtain inner products to twice working precision, not slightly
more than working precision as in the IEEE extended formats.
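
In IEEE terms the distinction is easy to state (a small C sketch; the
values are contrived): a float has a 24-bit significand, so the full 48-bit
product of two floats is always exact in double, which is the 2t-digit
product Wilkinson relies on.  The extended formats, being only a few bits
wider than the working format, give nothing comparable.

    #include <stdio.h>

    int main(void)
    {
        float a = 16777213.0f;             /* 2^24 - 3, exact in float */
        float b = 16777211.0f;             /* 2^24 - 5, exact in float */

        double exact  = (double) a * (double) b;   /* full 48-bit product, exact */
        float  single = a * b;                     /* rounded to 24 bits         */

        printf("double-length product: %.0f\n", exact);
        printf("single-length product: %.0f\n", (double) single);  /* low bits lost */
        return 0;
    }
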
         I said:
 >           I agree that binary formats are superior to hex in that they
 > provide 2 extra bits of precision.
         Dik Winter said:
A nit; they provide 3 more; nearly a digit.

         How do you get 3?  Suppose the leading hex digit is 1, then we
lose 3 bits from the leading zeros, 1 bit from the hidden 1, but gain 2
bits from the smaller exponent field for a net loss of 2 bits.
         I said
 >                                     I don't agree (particularly for 64-bit
 > formats) that these 2 bits make it significantly easier to avoid numer-
 > ical problems.
         Dik Winter said:
Oh, but they are, see the paper I mentioned above.

         I found nothing in the above reference to justify this state-
ment.
                       James B. Shearer

jbs@WATSON.IBM.COM (06/06/91)

         I said:
>          Also as has already been pointed out there is a software cost
>due to such features as 0*x no longer being 0.  In some cases this will
>have a performance impact in that compilers will generate inferior code.
         Jim Dehnert said:
As a compiler writer who has worked with IEEE systems, I don't know
why, at least in any area with an important impact on performance.
The reason for the IEEE definition (0*Inf=NaN) is that zero may
represent a result merely close to zero, and Inf really represents a
very large number -- I can pick representatives of these sets which
multiplied together give me a true result in the representable range
(not zero or Inf).  If I see a literal zero while compiling (the only
case I can think of when the compiler does anything that might care
about what 0*x equals), I feel quite free to assume that it is really
zero, in which case 0*x is really zero even for very large x.  So I
will do the same thing I would have for non-IEEE.  (This is the only
case I'm aware of where IEEE defines 0*x != 0.)

         Some people write:      IF(0.D0*X.NE.0.D0)THEN
to detect infs or NaNs (see recent posts to comp.lang.fortran).  This
works on many systems.  If you replace 0*x with 0 then it will not work
on your system and your system is broken.
         Jim Dehnert also said
Round to +/- infinity give you the floor and ceiling functions, which
come up frequently, for example in many transcendental function
algorithms.

         This appears to apply just to convert to integer instructions.
Different modes could be offered for these instructions alone.  In any
case I don't believe they are useful for transcendental function algor-
ithms.  In fact the different rounding modes are a problem because the
functions must work for all modes.
         Jim Dehnert also said (referring to trapping floating overflow):
It is the case.  It's also perfectly achievable behavior for IEEE
systems if exceptions are trapped.  This has nothing to do with the
IEEE standard.  Instead, it's probably a result of default Unix
software behavior, which is not to trap in most implementations.

         Possibly I should be blaming Unix.  I tend to associate the
two.  Note in Goldberg's paper he suggests rewriting x/(1+x*x) in a
way which assumes trapping is not occurring.
         I said:
>          79 bits is clearly an absurd length for a floating point
>format (the same is true of the 43 bit single extended format).
         Jim Dehnert also said:
An interesting statement.  Are you of the "only powers of two are real
numbers" persuasion?

         I believe floating format sizes should be compatible with the
basic organization of the machine.  IE if your machine is based on
8-bit bytes the length best be a multiple of 8.  I believe 32 or 64
bits are typical widths of data paths in current machines.  Therefore
moving 79 bit things around will clearly be wasteful.  Consider a
machine with 64 bit floating registers and a 64 bit data path between
the floating unit and cache.  I see no efficient way to include a 79
bit format.  A 128 bit format on the other hand is easily accommodated.
         Herman Rubin asked:
Why should it even be the case that the exponent and significand are in
the same word?

         Because if they aren't every load or store will require 2 mem-
ory references instead of 1, halving the speed in many cases.
                       James B. Shearer

firth@sei.cmu.edu (Robert Firth) (06/06/91)

In article <9106052113.AA14439@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:

Jim Dehnert said:
> If I see a literal zero while compiling (the only
>case I can think of when the compiler does anything that might care
>about what 0*x equals), I feel quite free to assume that it is really
>zero, in which case 0*x is really zero even for very large x. 

>         Some people write:      IF(0.D0*X.NE.0.D0)THEN
>to detect infs or NaNs (see recent posts to comp.lang.fortran).  This
>works on many systems.  If you replace 0*x with 0 then it will not work
>on your system and your system is broken.

Well, as another compiler writer, I did what I always do when faced
with a conumdrum such as this: refer to the standard.

The relevant section in this case is [ANSI X3.9-1978 sect 6.6.4],
which says, very simply:

	Evaluation of Arithmetic Expressions

	... Once the interpretation has been established in
	accordance with these rules, the processor may
	evaluate any mathematically equivalent expression,
	provided that the integrity of parentheses is not
	violated.

	Two arithmetic expressions are mathematically
	equivalent if, for all possible values of their
	primaries, their mathematical values are equal.
	However, mathematically equivalent arithmetic
	expressions may produce different computational
	results.

In other words: yes, Jim Dehnert is fully entitled to replace 0.0*X
with the mathematically equivalent expression 0.0; he is given
explicit licence to do so by the reference document, and anyone
who writes rubbish like

	IF (0.0*X .NE. 0.0)

would be well advised to take the trouble to learn the language
in which he is attempting to write code.

henry@zoo.toronto.edu (Henry Spencer) (06/06/91)

In article <26594@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In other words: yes, Jim Dehnert is fully entitled to replace 0.0*X
>with the mathematically equivalent expression 0.0...

Uh, sorry, if the number system you are using includes infinities, those
two expressions are *not* mathematically equivalent.  I don't remember
whether X3.9 pins down the number system in sufficient detail to preclude
this, although I doubt it, since infinities have been around in some
floating-point boxes for a long time.
-- 
"We're thinking about upgrading from    | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 to SunOS 3.5."              |  henry@zoo.toronto.edu  utzoo!henry

seiler@vlsisj.uucp (%) (06/07/91)

  I am not a numerical expert, but I was on the mailing list of the IEEE floating
point committee when the standard was created.  At that time, I was a chemistry
grad student learning that Bevington's book did not prepare a physical scientist
for all the perverted things that computer hardware did in the name of floating
point.

In article <9106052113.AA14439@ucbvax.Berkeley.EDU>, jbs@WATSON.IBM.COM writes:
|> 
|>          Jim Dehnert also said
|> Round to +/- infinity give you the floor and ceiling functions, which
|> come up frequently, for example in many transcendental function
|> algorithms.
|> 
|>          This appears to apply just to convert to integer instructions.
|> Different modes could be offered for these instructions alone.  In any
|> case I don't believe they are useful for transcendental function algor-
|> ithms.  In fact the different rounding modes are a problem because the
|> functions must work for all modes.

  The main justification for having all the rounding modes was that the +/- infinity
modes made creating an interval arith. package much easier.  As I recall,
without them an interval package could run 300x's slower than straight floating
point.  With them it was ca. 3x's slower.  Besides, these rounding modes were
cheap.
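
A sketch of the interval-addition kernel in modern C terms (the <fenv.h>
calls stand in for the hardware mode bits; volatile just keeps the compiler
from pre-computing the sums):

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    /* [a,b] + [c,d]: round the lower bound toward -Inf and the upper bound
       toward +Inf, so the true sum is always enclosed. */
    static void interval_add(double a, double b, double c, double d,
                             double *lo, double *hi)
    {
        volatile double av = a, bv = b, cv = c, dv = d;

        fesetround(FE_DOWNWARD);
        *lo = av + cv;
        fesetround(FE_UPWARD);
        *hi = bv + dv;
        fesetround(FE_TONEAREST);          /* restore the default mode */
    }

    int main(void)
    {
        double lo, hi;
        interval_add(1.0, 1.0, 0.1, 0.1, &lo, &hi);    /* [1,1] + [0.1,0.1] */
        printf("enclosure: [%.17g, %.17g]\n", lo, hi); /* brackets the true 1.1 */
        return 0;
    }

Two mode switches around two adds is the whole trick, which is why the
directed modes made intervals cheap.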

|> 
|>          Possibly I should be blaming Unix.  I tend to associate the
|> two.

  What does floating point hardware have to do with an operating system?  Unix
runs on mainframes and VAX's.  IEEE floating point is used on PC's (IBM clones,
Mac's, Amiga's) as well as work stations.



|>          I believe floating format sizes should be compatible with the
|> basic organization of the machine.  IE if your machine is based on
|> 8-bit bytes the length best be a multiple of 8.  I believe 32 or 64
|> bits are typical widths of data paths in current machines.  Therefore
|> moving 79 bit things around will clearly be wasteful.  Consider a
|> machine with 64 bit floating registers and a 64 bit data path between
|> the floating unit and cache.  I see no efficient way to include a 79
|> bit format.  A 128 bit format on the other hand is easily accommodated.

|>                        James B. Shearer

  The floating point chips from Intel and Motorola have 80 bit floating point
registers.  At the time, the standard developers decided that the main formats
should be a power of two in size so array addressing is fast, it is useful to
have a format with a wider mantissa to provide guard bits, and a hardware ALU
finds it convenient for the mantissa to be a power of two in size.  Putting the
last two points together gives a lower limit of 79 bits.  That is a weird number
so Intel and Moto. rounded up to 80.  Actually, Moto. stores an extended to
memory in 96 bits so it is long word aligned.  If your compiler has good
register allocation, there will be few extended loads and stores.
  I expect you know this already, but I am trying to understand what you are saying
above.  Supporting any format greater than the FP ALU is going to be slow.  What
IS so bad about extended?

  My experience in pre-IEEE FP days wasn't as nice as yours.  CDC's that I used
did not trap at all on overflow.  Instead the exponent wrapped around.  Not only
did systems not convert numbers like 12.0 accurately, the error was different
for program literals than for numbers read by I/O. ( For some reason, this
seemed un-natural to me them.  After all, it can be exactly expressed). 
Generally, I found programs than worked on VAX's had problems on IBM 360's and
blew up badly on Cybers unless I was very paranoid.  Nowdays, we have
occasionally found routines that work on IEE workstations (Apollo, Sun, HP) have
had problems on VAX,
which is my favorite of the pre IEEE formats.

Yours in number crunching,

Bruce Seiler

seiler@compass-da.com

jbs@WATSON.IBM.COM (06/07/91)

         Robert Firth quotes the fortran standard:
        Evaluation of Arithmetic Expressions

        ... Once the interpretation has been established in
        accordance with these rules, the processor may
        evaluate any mathematically equivalent expression,
        provided that the integrity of parentheses is not
        violated.

        Two arithmetic expressions are mathematically
        equivalent if, for all possible values of their
        primaries, their mathematical values are equal.
        However, mathematically equivalent arithmetic
        expressions may produce different computational
        results.

        to conclude:
In other words: yes, Jim Dehnert is fully entitled to replace 0.0*X
with the mathematically equivalent expression 0.0;
        Some comments:
    1. The question is not what fortran requires but what IEEE imple-
mented in fortran requires.  One of the problems with IEEE's infs and
NaNs is that they do not fit well with fortran.
    2. Appeals to a standard which predates IEEE are not totally con-
vincing.  What does the Fortran 90 standard say?
    3. In any case my reading of the standard is opposed to yours.
Infinity is a possible value of x, 0*infinity does not have math-
ematical value 0, therefore 0*x is not mathematically equivalent to 0
in IEEE.
        Do you replace 0*x with 0 when compiling without optimization?
I think that if a (legal) program behaves differently when compiled with
and without optimization then the compiler is broken.  Perhaps you
disagree.
        How would you handle 0-x?  Goldberg in his paper (p.33) spec-
ifically forbids replacing it with -x although the fortran standard
would appear to permit it since mathematically +0=-0.
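
(The 0-x case is easy to see with signed zeros; a small C sketch using
C99's signbit, with volatile keeping the expressions from being folded:)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        volatile double x = 0.0;

        /* Under the default rounding, 0 - (+0) is +0 while -x is -0, so the
           two expressions are distinguishable exactly when x is +0. */
        printf("0 - x = %g  signbit = %d\n", 0.0 - x, signbit(0.0 - x) != 0);  /* 0, 0  */
        printf("   -x = %g  signbit = %d\n", -x, signbit(-x) != 0);            /* -0, 1 */
        return 0;
    }
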
        Robert Firth also said:
                                                     ... anyone
who writes rubbish like

        IF (0.0*X .NE. 0.0)

would be well advised to take the trouble to learn the language
in which he is attempting to write code.
        So how would you code this (detecting whether x inf or NaN)?
                        James B. Shearer

firth@sei.cmu.edu (Robert Firth) (06/07/91)

In article <9106070109.AA02137@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:

A couple of quick responses:

>    1. The question is not what fortran requires but what IEEE imple-
>mented in fortran requires.  One of the problems with IEEE's infs and
>NaNs is that they do not fit well with fortran.

Yes, that's the problem.  The Fortran standard describes the implementation
of arithmetic expressions in terms of the "mathematical" rules, that is,
the mathematical properties of real numbers.  It is the business of the
implementation to provide a reasonable approximation that, as far as
possible, obeys those rules.  If it fails to do so, it's not a legitimate
"fortran machine" and shouldn't claim to support ANSI F-77.

But a language implementation must follow the standard.  That is the
whole point of standards, after all, that they circumscribe the
permissible implementations.  You can't correct bogus hardware by
writing a bogus compiler.

>    3. In any case my reading of the standard is opposed to yours.
>Infinity is a possible value of x, 0*infinity does not have math-
>ematical value 0, therefore 0*x is not mathematically equivalent to 0
>in IEEE.

I would rather say that IEEE introduces bit patterns that are not valid
representations of real numbers, and then prescribes largely arbitrary
and improper semantics for the behaviour of those bit patterns. The
Fortran standard refers to the real numbers, not to bit patterns.

>        Do you replace 0*x with 0 when compiling without optimization?
>I think that if a (legal) program behaves differently when compiled with
>and without optimization then the compiler is broken.  Perhaps you
>disagree.

Yes, I disagree.  The issue is the relevance of the difference.  Is a
compiler broken because an optimised program runs faster?  That's a
difference in behaviour, but you presumably think it an acceptable one.
Likewise, if a program transformation is permitted by the standard, a
compiler is entitled to apply that transformation; it is the responsibility
of the programmer to be aware which transformations are legal and not
assume any of them won't happen.

>        IF (0.0*X .NE. 0.0)
>        So how would you code this (detecting whether x inf or NaN)?

At the correct level of abstraction.  At the Fortran level, the
abstraction is that of the real numbers - as is made clear in the
standard.  At that level, NaNs don't exist, so you can't look for
them.

If you are asking about the bit-level representation of numbers, it
is necessary to go to the lower level of abstraction, the machine
level.  I would write an Assembly code subroutine, call it something
like

	LOGICAL FUNCTION ISANAN(X)

and comment it very, very carefully.  Perhaps like this

	C On some machines, the representation of the Fortran
	C real numbers includes bit patterns that have no
	C valid interpretation as real numbers.  The IEEE
	C representation is one such, for example.  These
	C bit patterns are called 'NaN', "not-a-number".
	C
	C Some simple algorithms used in numerical programming
	C will fail, giving bizarre and useless results, if any
	C of the supplied REAL input parameters in fact holds a
	C bit pattern that is NaN.
	C
	C To guard against this, the programmer must test whether
	C a real variable holds a NaN.  The following function
	C implements that test.  Note that the test requires EXACT
	C KNOWLEDGE of the target representation of real numbers
	C and the semantics of target machine operations.  The
	C implementation is therefore not portable.

As ever, that's just my opinion; feel free to disagree or criticise.

sjc@borland.com (Steve Correll) (06/08/91)

In article <9106070109.AA02137@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>    2. Appeals to a standard which predates IEEE are not totally con-
>vincing.  What does the Fortran 90 standard say?

The Fortran 90 standard (as of draft S8.115, at least) says the same things
the Fortran 77 standard does with regard to floating point. Fortran existed
long before the birth of IEEE floating point (and will probably exist long
after its death :-)) and the X3J3 committee, being sensitive to the diversity
of floating point implementations, was reluctant to discriminate among them.

A previous posting quoted the Fortran standard somewhat selectively (ANSI
X3.9-1978 sect 6.6.4):

>        ... Once the interpretation has been established in
>        accordance with these rules, the processor may
>        evaluate any mathematically equivalent expression,
>        provided that the integrity of parentheses is not
>        violated.
>
>        Two arithmetic expressions are mathematically
>        equivalent if, for all possible values of their
>        primaries, their mathematical values are equal.
>        However, mathematically equivalent arithmetic
>        expressions may produce different computational
>        results.

The missing rule is (ANSI X3.9-1978 sect 6.6):

    "Any numeric operation whose result is not mathematically defined is
    prohibited in the execution of an executable program. Examples are
    dividing by zero and raising a zero-valued primary to a zero-valued or
    negative-valued power."

Who knows whether IEEE infinity is a "mathematically defined result"? If you
claim that it is, then a compiler which collapses 0*x => 0 violates section
6.6.4. If you claim that it is not, then a program which even generates IEEE
infinity or NaN (not to mention a program which expects them to propagate
through computations in a well-defined fashion) violates section 6.6. The IEEE
fp committee chose not to deal with existing language standards.

This hasn't prevented lots of people from trying to have their cake and eat
it too, by saying that IEEE floating point should have no impact on compiler
optimization, but the programmer should compute with infinity and NaN anyway,
and it's the programmer's fault if s/he doesn't know that some
IEEE-legal-but-Fortran-illegal features will burn you and others will not,
depending on the implementation in question.

In practice, of course, programmers will attempt to do exactly that, and lots
of them will get burned. You can do useful work with a compiler which doesn't
de-optimize its code to cater to the IEEE standard, but you do need to be
careful about it.

dab@ubitrex.mb.ca (Danny Boulet) (06/08/91)

In article <26665@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In article <9106070109.AA02137@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>

	.
	.
	.

>Yes, I disagree.  The issue is the relevance of the difference.  Is a
>compiler broken because an optimised program runs faster?  That's a
>difference in behaviour, but you presumably think it an acceptable one.
>Likewise, if a program transformation is permitted by the standard, a
>compiler is entitled to apply that transformation; it is the responsibility
>of the programmer to be aware which transformations are legal and not
>assume any of them won't happen.
>
>>        IF (0.0*X .NE. 0.0)
>>        So how would you code this (detecting whether x inf or NaN)?
>
>At the correct level of abstraction.  At the Fortran level, the
>abstraction is that of the real numbers - as is made clear in the
>standard.  At that level, NaNs don't exist, so you can't look for
>them.
>
>If you are asking about the bit-level representation of numbers, it
>is necessary to go to the lower level of abstraction, the machine
>level.  I would write an Assembly code subroutine, call it something
>like
>
>	LOGICAL FUNCTION ISANAN(X)
>
>and comment it very, very carefully.  Perhaps like this
>
>	C On some machines, the representation of the Fortran
>	C real numbers includes bit patterns that have no
>	C valid interpretation as real numbers.  The IEEE
>	C representation is one such, for example.  These
>	C bit patterns are called 'NaN', "not-a-number".
>	C
>	C Some simple algorithms used in numerical programming
>	C will fail, giving bizarre and useless results, if any
>	C of the supplied REAL input parameters in fact holds a
>	C bit pattern that is NaN.
>	C
>	C To guard against this, the programmer must test whether
>	C a real variable holds a NaN.  The following function
>	C implements that test.  Note that the test requires EXACT
>	C KNOWLEDGE of the target representation of real numbers
>	C and the semantics of target machine operations.  The
>	C implementation is therefore not portable.

Assuming that your Fortran compiler isn't too clever, the following
detects a NaN:

	IF ( X .NE. X ) THEN
		PRINT *, 'X is a NaN'
	ENDIF

This takes advantage of the rule that says that NaNs are unordered.
Unfortunately, a clever Fortran compiler is strictly speaking (I think)
free to notice that "X .NE. X" is always false 'mathematically' and
apply the obvious optimization.

If you're willing to live with the risk that such an optimization might
be applied, you could probably come up with an equivalent trick for
Infs.  I'm not totally sure but this would probably work (there is
almost certainly an easier way but I don't have the IEEE spec handy):

	IF ( X .NE. X ) THEN
	    PRINT *, 'X is a NaN'
	ELSE
	    Z = { some expression that always yields a +Inf }
	    Y1 = Z + X	{ yields NaN if X is -Inf }
	    Y2 = Z - X	{ yields NaN if X is +Inf }
	    IF ( Y1 .NE. Y1 .OR. Y2 .NE. Y2 ) THEN
		PRINT *,'X is an Inf'
	    ENDIF
	ENDIF

All bets are off if the target machine isn't IEEE or if the compiler
is too smart for your own good!  Isn't writing portable code fun!!
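
For concreteness, the same two tricks look something like this in C.  This is
only a sketch: it assumes an IEEE machine and a compiler that leaves the
self-comparison alone, and the helper names is_nan and is_inf are invented
for the example.

    #include <stdio.h>

    /* Illustrative helpers only: they assume IEEE arithmetic and a
       compiler that does not fold the x != x test away. */
    static int is_nan(double x)
    {
        return x != x;              /* only a NaN compares unequal to itself */
    }

    static int is_inf(double x)
    {
        double big = 1.0e300;
        double inf = big * big;     /* overflows to +Inf under IEEE rules */
        return !is_nan(x) && (x == inf || x == -inf);
    }

    int main(void)
    {
        double zero = 0.0;
        printf("%d %d\n", is_nan(zero / zero), is_inf(1.0 / zero));
        return 0;
    }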

jbs@WATSON.IBM.COM (06/08/91)

         Bruce Seiler says:
  The main justification for having all the rounding modes was that the +/-
infinity modes made creating an interval arithmetic package much easier.  As I
recall, without them an interval package could run 300x slower than straight
floating point.  With them it was about 3x slower.  Besides, these rounding
modes were cheap.

         I don't believe interval arithmetic is used enough to justify
any hardware support.  Providing quad precision (much more useful in
my opinion) would also provide some support of interval arithmetic.  In
any case I believe your estimate of 3x slower than straight floating
point is unrealistic.  Does anyone have any examples of interval arith-
metic packages?  What is their speed compared to straight floating point?
The rounding modes may appear cheap but they have a large hidden cost.
Every intrinsic function subroutine for example must worry about handling
all rounding modes.
         I said:
|>          I believe floating format sizes should be compatible with the
|> basic organization of the machine.  IE if your machine is based on
|> 8-bit bytes the length best be a multiple of 8.  I believe 32 or 64
|> bits are typical widths of data paths in current machines.  Therefore
|> moving 79 bit things around will clearly be wasteful.  Consider a
|> machine with 64 bit floating registers and a 64 bit data path between
|> the floating unit and cache.  I see no efficient way to include a 79
|> bit format.  A 128 bit format on the other hand is easily accommodated.
         Bruce Seiler said:
  The floating point chips from Intel and Motorola have 80 bit floating point
registers.  At the time, the standard developers decided that the main formats
should be a power of two in size so array addressing is fast, it is useful to
have a format with a wider mantissa to have guard bits, and a hardware ALU finds
it convenient for the mantissa to be a power of two in size.  Putting the
last two points together gives a lower limit of 79 bits.  That is a weird number
so Intel and Moto. rounded up to 80.  Actually, Moto. stores an extended to
memory in 96 bits so it is long word aligned.  If your compiler has good
register allocation, there will be few extended loads and stores.
  I expect you know this already, but I am trying to understand what you are saying
above.  Supporting any format greater than the FP ALU is going to be slow.  What
IS so bad about extended?

         According to "Computer Architecture a Quantitative Approach" (Hennessy
and Patterson, p.A-53, p. E-2) none of the MIPS R3010, Weitek 3364, TI 8847,
Intel i860, Sparc or Motorola M88000 architectures implement IEEE double ex-
tended.  This represents a lot of independent support for my position that
the double extended format is not cost effective.  Apparently these vendors
do not agree that it is convenient for the fraction to be a power of 2 in
length.  If your chip has 80 bit registers I fail to see how you can avoid
performing 80 bit loads and stores while saving and restoring these registers.
My objection to the IEEE double extended format is that it should be 128 bits
wide and called quad precision.  Also I do not agree that it is useful to
have a format with a few extra (guard) bits in the fraction.
                           James B. Shearer

chased@rbbb.Eng.Sun.COM (David Chase) (06/08/91)

jbs@WATSON.IBM.COM writes:
>If your chip has 80 bit registers I fail to see how you can avoid
>performing 80 bit loads and stores while saving and restoring these registers.

Other people have managed in the past.  Perhaps you are insufficiently
imaginative.  The only problem comes when "machine epsilon" computed
in registers does not match the "machine epsilon" stored in memory.
Otherwise, it is a big help when accumulating into a register (as in
inner product, for example) since the extra bits are most helpful in
guarding against lost precision.

I'm not certain I remember this correctly, but I recall that friends
at Rice had this "problem" with a compiler that they wrote for a
machine called the "PC/RT".  Note that the same problem ("excess
precision") occurs on the RS/6000 in certain situations.

David Chase
Sun

dik@cwi.nl (Dik T. Winter) (06/08/91)

In article <9106052113.AA14439@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >          I said:
 > >          Also as has already been pointed out there is a software cost
 > >due to such features as 0*x no longer being 0.  In some cases this will
 > >have a performance impact in that compilers will generate inferior code.
I still have a question here: do you know realistic cases where inferior
code will be generated?  I do not think that 'a = 0.0 * b' is realistic.

 >          Jim Dehnert said:
 >                                          and Inf really represents a
 > very large number
Or a true infinity!
 >                     If I see a literal zero while compiling (the only
 > case I can think of when the compiler does anything that might care
 > about what 0*x equals), I feel quite free to assume that it is really
 > zero, in which case 0*x is really zero even for very large x.  So I
 > will do the same thing I would have for non-IEEE.  (This is the only
 > case I'm aware of where IEEE defines 0*x != 0.)
In that case your compiler is not IEEE conformant.  Setting 0*x to 0.0 even for
a literal zero is wrong because x might hold an infinity that came from a
division by zero (and in that case it is a true infinity).
 > 
(about rounding modes:)
 >          This appears to apply just to convert to integer instructions.
 > Different modes could be offered for these instructions alone.  In any
 > case I don't believe they are useful for transcendental function algor-
 > ithms.  In fact the different rounding modes are a problem because the
 > functions must work for all modes.
I agree that floor and ceiling are not the reasons for the different rounding
modes.  The reason is simply interval arithmetic.  How would you get
reasonable performance with interval arithmetic if only one rounding mode
is available?  (See the papers from Kulisch et al. from Karlsruhe University
where they require not only multiple rounding modes but also an inner
product with only a single rounding to provide algorithms that give results
with proven error bounds that are very small.  Standard numerical analysis
only provides error bounds that are extremely pessimistic estimates in most
cases.)
 > 
(About trapping overflow.)
 >          Possibly I should be blaming Unix.  I tend to associate the
 > two.
(Associating Unix and IEEE arithmetic.)
An interesting remark when we know that the first IEEE conformant
implementation was the 8087 (not completely conformant; it conformed to a draft).
I would not associate the 8087 with Unix; other operating systems come
to mind.  You should however not blame Unix (or any OS), you should
blame the compiler (and its run-time system).  Unix does not know anything
about IEEE because it also runs on quite a few non-IEEE machines (PDP-11).  The
compiler ought to provide (through library functions) access to the quirks
of IEEE.

 > >          79 bits is clearly an absurd length for a floating point
 > >format (the same is true of the 43 bit single extended format).
I am interested in how you arrive at 79 bits.  I count 80: sign (1 bit),
exponent (15 bits) and mantissa (64 bits) for double extended.  Note:
there is no implicit bit in the standard extended formats!

 >          Herman Rubin asked:
 > Why should it even be the case that the exponent and significand are in
 > the same word?
 > 
 >          Because if they aren't every load or store will require 2 mem-
 > ory references instead of 1 halving the speed in many cases.
Oh well, von Neumann advocated the use of scaled fixed point numbers.  He
did not even consider floating point hardware.  But I agree here, when we
have fp hardware it makes sense to have exponent and mantissa in the same
unit of information.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

pmontgom@euphemia.math.ucla.edu (Peter Montgomery) (06/09/91)

In article <3652@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>I still have a question here: do you know realistic cases where inferior
>code will be generated?  I do not think that 'a = 0.0 * b' is realistic.

	The literal zero will probably not appear, but it will not be
unusual to have

	const double coef_friction = 0.0;	/* Ignore friction */

early in a program and coef_friction multiplied by other quantities
later on.  Macro expansion (or inline expansion of procedures)
is another possibility, if one argument is 0.0.  
A third possibility is loop unrolling:

	do i = 0, 2
	    deriv(i) = REAL(i)*poly(i)
	end do

will expand into deriv(0) = 0.0*poly(0), deriv(1) = 1.0*poly(1),
deriv(2) = 2.0*poly(2).  The benefit of this unrolling
really comes when it simplifies this to deriv(0) = 0, 
deriv(1) = poly(1), deriv(2) = poly(2) + poly(2).
--
        Peter L. Montgomery 
        pmontgom@MATH.UCLA.EDU 
        Department of Mathematics, UCLA, Los Angeles, CA 90024-1555
If I spent as much time on my dissertation as I do reading news, I'd graduate.

dik@cwi.nl (Dik T. Winter) (06/09/91)

Aren't we getting a bit confused?  There are (in this case) two standards.
The first one is the language standard, the second is the IEEE standard for
floating point arithmetic.  Now we find:

In article <9106070109.AA02137@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >         Do you replace 0*x with 0 when compiling without optimization?
 > I think that if a (legal) program behaves differently when compiled with
 > and without optimization that then the compiler is broken.  Perhaps you
 > disagree.
It is perfectly allowed for a program to behave differently with optimization
if the program goes beyond what the language standard prescribes.  However,
in this case the compiler can not claim to be IEEE conformant (and should say
so) even if the box it runs on is IEEE conformant.

In article <26665@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
 > >        IF (0.0*X .NE. 0.0)
 > >        So how would you code this (detecting whether x inf or NaN)?
 > At the correct level of abstraction.  At the Fortran level, the
 > abstraction is that of the real numbers - as is made clear in the
 > standard.  At that level, NaNs don't exist, so you can't look for
 > them.
 > 
Unless the compiler claims to fully support the arithmetic of the box it runs
on.  In that case it should be IEEE conformant, and hence not optimize such
expressions.

(write a routine:)
 > 	LOGICAL FUNCTION ISANAN(X)

Certainly if you want to be portable and have your program run with
1.  compilers that are not IEEE conformant, and/or
2.  boxes that are not IEEE conformant.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

dik@cwi.nl (Dik T. Winter) (06/09/91)

In article <9106072346.AA08023@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >          I don't believe interval arithmetic is used enough to justify
 > any hardware support.
Well, IBM thought it important enough to provide support in some models (43xx).
Acrith is the keyword.

 >                        Providing quad precision (much more useful in
 > my opinion) would also provide some support of interval arithmetic.
Nope, it would not provide support at all.  You can not do interval arithmetic
with any extended precision without extensive software support.  On the other
hand, proper rounding modes are sufficient.

 >                                                                      In
 > any case I believe your estimate of 3x slower than straight floating
 > point is unrealistic.
How come unrealistic?  Example, interval add a to b giving c:
	set round to - inf;
	c.low = a.low + b.low;
	set round to + inf;
	c.upp = a.upp + b.upp;
why would this be more than 3x slower than straight floating point?  Now
consider the same in software (considering reasonable arithmetic):
	c.low = a.low + b.low;
	if(a.low < b.low) {
		if(c.low - b.low > a.low) c.low = nextbefore(c.low);
	} else {
		if(c.low - a.low > b.low) c.low = nextbefore(c.low);
	}
	c.upp = a.upp + b.upp;
	if(a.upp < b.upp) {
		if(c.upp - b.upp < a.upp) c.upp = nextafter(c.upp);
	} else {
		if(c.upp - a.upp < b.upp) c.upp = nextafter(c.upp);
	}
and consider the additional effort to implement nextbefore and nextafter.
And do not try this on a hex machine; it will not work.
So how would this benefit from some form of extended precision arithmetic?
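
For what it is worth, the directed-rounding version transcribes almost
literally into the (much later) C99 <fenv.h> interface.  The following is
only a sketch under that assumption; the names ivadd and struct interval
are invented for the example.

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    struct interval { double low, upp; };

    /* Interval add with directed rounding, as sketched above. */
    static struct interval ivadd(struct interval a, struct interval b)
    {
        struct interval c;
        fesetround(FE_DOWNWARD);    /* round toward -Inf for the lower bound */
        c.low = a.low + b.low;
        fesetround(FE_UPWARD);      /* round toward +Inf for the upper bound */
        c.upp = a.upp + b.upp;
        fesetround(FE_TONEAREST);   /* restore the usual default */
        return c;
    }

    int main(void)
    {
        struct interval x = { 0.1, 0.1 }, y = { 0.2, 0.2 };
        struct interval z = ivadd(x, y);
        printf("[%.17g, %.17g]\n", z.low, z.upp);
        return 0;
    }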

 > The rounding modes may appear cheap but they have a large hidden cost.
 > Every intrinsic function subroutine for example must worry about handling
 > all rounding modes.
Only if they are expecting overflow or underflow in some particular situations.
In that case they had better worry anyhow to prevent uncalled-for overflows and
underflows.  The design of intrinsic functions with the correct rounding modes
is of course a different art.  I think the Karlsruhe people have gone a long
way along this path.

 >        According to "Computer Architecture a Quantitative Approach" (Hennessy
 > and Patterson, p.A-53, p. E-2) none of the MIPS R3010, Weitek 3364, TI 8847,
 > Intel i860, Sparc or Motorola M88000 architectures implement IEEE double ex-
 > tended.
...
 > My objection to the IEEE double extended format is that it should be 128 bits
 > wide and called quad precision.
In that case none of the above machines would have implemented quad precision.
However, Motorola and Intel provide double extended, and the HP-PA architecture
provides quad.  But on the current implementations quad on HP-PA is in
software, so what is the advantage?
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

glew@pdx007.intel.com (Andy Glew) (06/10/91)

> > >          Also as has already been pointed out there is a software cost
> > >due to such features as 0*x no longer being 0.  In some cases this will
> > >have a performance impact in that compilers will generate inferior code.
>I still have a question here: do you know realistic cases where inferior
>code will be generated?  I do not think that 'a = 0.0 * b' is realistic.


Urghh.... I used to be of this opinion - until I started making heavy
use of preprocessors so that a single "generic" code can serve
multiple purposes.

Basically, I do poor man's partial evaluation - where I say "these
parameters are fixed" - now give me an optimized program that
evaluates the rest of the numbers I want.

In this programming model expressions like 0.0 * b arise frequently.
And it sure would be nice if they could be optimized away.


For an introduction to partial evaluation for scientific codes, read
"Compiling scientific Code Using Partial Evaluation", Andrew Berlin
and Daniel Weise, IEEE Computer, December 1990.
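
As a toy illustration of how such expressions fall out of "generic" code,
here is a hypothetical C macro whose beta coefficient is fixed to 0.0 at one
point of use, leaving 0.0 * y[i] terms behind; the macro name AXPBY and the
data are invented for the example.

    #include <stdio.h>

    /* Generic linear combination; a "partial evaluation" style use fixes
       one coefficient at the call site. */
    #define AXPBY(alpha, beta, x, y, n, out)                          \
        do {                                                          \
            int i_;                                                   \
            for (i_ = 0; i_ < (n); i_++)                              \
                (out)[i_] = (alpha) * (x)[i_] + (beta) * (y)[i_];     \
        } while (0)

    int main(void)
    {
        double x[3] = { 1.0, 2.0, 3.0 }, y[3] = { 4.0, 5.0, 6.0 }, out[3];

        AXPBY(2.0, 0.0, x, y, 3, out);   /* beta fixed to 0.0 for this use */
        printf("%g %g %g\n", out[0], out[1], out[2]);
        return 0;
    }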
--

Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, 
Hillsboro, Oregon 97124-6497

This is a private posting; it does not indicate opinions or positions
of Intel Corp.

khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (06/10/91)

In article <9106072346.AA08023@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:


>   I don't believe interval arithmetic is used enough to justify
>   any hardware support.

This falls into the "H.R." chicken and egg rathole. If interval
arithmetic (or anything else new and claimed to be good) is 100x
slower than the traditional approach, it will simply never get off the
ground. The members of IEEE 754 chose to include rounding modes; at
this point we should either craft arguments for deleting them from a
future version of the standard or debate something else. Arguing for
their inclusion in a system is now a moot point; they are part of the
standard. 

>.... credentials...

Kahan was given the ACM Turing award for his contributions to floating point
(most notably the IEEE standard). I personally suspect this counts for
more than some heated debates on usenet. It is pointless for us to
post our respective resumes. 

>... misc. fortran vs. ieee interactions ....

The ways various standards interact is usually unpleasant in edge
cases (SVID vs. POSIX vs. ANSI C for example). One hopes that in time,
successful standards will see evolution _towards_ mutual compatibility.
In the meantime it is a series of "quality of implementation issues"
for each vendor to cope with as best they can.

X3J3 and X3J11 both steered away from trying to make their languages
fully IEEE 754 savvy. Unfortunately, there does not yet seem to be a
move afoot to correct this in their next efforts .... but "f90" is
just out the door (standard-wise) so we have ample time to hope for
the best from X3J3|WG5 next time ;> It is unclear to _me_ if X3J11 or
X3J16 is really the best forum for trying to fix C the next time ....
there is a group called NCEG which has taken on the challenge of
crafting proposals and finding an eventual venue.

As has been pointed out by others, there are very few computing
environments which provide handles on the bulk of the IEEE754 knobs.
This is unfortunate, as many of the benefits of IEEE754 are lost
thereby.  It also provides those who wish to propose weaker fp standards
with fodder of the sort "no one uses those features anyway".

--
----------------------------------------------------------------
Keith H. Bierman    keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043

dseal (David Seal) (06/11/91)

In article <1991Jun7.204910.878@borland.com> sjc@borland.com (Steve Correll)
writes:

>...
>
>The missing rule is (ANSI X3.9-1978 sect 6.6):
>
>    "Any numeric operation whose result is not mathematically defined is
>    prohibited in the execution of an executable program. Examples are
>    dividing by zero and raising a zero-valued primary to a zero-valued or
>    negative-valued power."
>
>Who knows whether IEEE infinity is a "mathematically defined result"? If you

I would think it isn't: it doesn't correspond to a single mathematical real
number in the way that ordinary IEEE numbers do.

>claim that it is, then a compiler which collapses 0*x => x violates section
>6.6.4. If you claim that it is not, then a program which even generates IEEE
>infinity or NaN (not to mention a program which expects them to propagate
>through computations in a well-defined fashion) violates section 6.6. The IEEE
>fp committee chose not to deal with existing language standards.
>
>This hasn't prevented lots of people from trying to have their cake and eat
>it too, by saying that IEEE floating point should have no impact on compiler
>optimization, but the programmer should compute with infinity and NaN anyway,
>and it's the programmer's fault if s/he doesn't know that some
>IEEE-legal-but-Fortran-illegal features will burn you and others will not,
>depending on the implementation in question.
>
>In practice, of course, programmers will attempt to do exactly that, and lots
>of them will get burned. You can do useful work with a compiler which doesn't
>de-optimize its code to cater to the IEEE standard, but you do need to be
>careful about it.

I don't see any evidence here that the "IEEE fp committee chose not to deal
with existing language standards". The existence of a facility in IEEE
arithmetic does *not* mean that every language has to supply it.

For instance, in this case the quoted part of the Fortran standard
effectively means that non-numeric values like infinities and NaNs must not
occur when a Fortran program is run, and that a run-time error should occur
in cases where the result of an operation can only be non-numeric. Unless
the IEEE system's trap-handling is seriously deficient (e.g. by ignoring
most of the "should" parts of the IEEE standard), this can be arranged by
having the run-time system trap the IEEE invalid operation, divide-by-zero
and overflow exceptions and produce appropriate run-time error messages.

As long as data is not initialised to non-numeric values, these trap
handlers will ensure that a run-time error will occur as soon as any such
value is produced. (Ignoring such things as non-numeric values produced by
writing to integer variables which are aliased with floating point variables
- such tricks are highly system-dependent anyway.) This means that
non-numeric values are "quarantined", so that the compiler may safely use
such optimisations as 0.0*X = 0.0.
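
One crude way to picture this quarantine, for readers without trap handlers
at hand, is to poll the sticky IEEE exception flags after a computation.
The sketch below uses the much later C99 <fenv.h> interface and only checks
flags; it does not provide the true traps described above.

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    /* Check the sticky exception flags after a computation. */
    int main(void)
    {
        double zero = 0.0, y;

        feclearexcept(FE_ALL_EXCEPT);
        y = 1.0 / zero;             /* signals divide-by-zero, delivers +Inf */
        if (fetestexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW))
            fprintf(stderr, "run-time error: non-numeric result produced\n");
        printf("y = %g\n", y);
        return 0;
    }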

In short, the IEEE fp committee seem to have dealt with existing language
standards by providing a superset of their facilities, together with ways of
trapping attempts to make use of the extra facilities when (as in the case
of the quoted part of the Fortran standard) the language standard forbids
their use.

If you don't want to make use of infinities and NaNs, a Fortran system that
traps them as above will suit your purposes and will strictly conform to the
Fortran standard. If you do want to make use of them, the quoted part of the
Fortran standard means that a strictly-conforming Fortran system is not for
you. You can't have it both ways...

Of course, some IEEE systems may be "broken" from this point of view. E.g.
if trapping the invalid operation, divide-by-zero or overflow exceptions
slows down your IEEE system dramatically, you may be faced with the awkward
choice of having your programs run slowly and strictly in accordance with
the Fortran standard, or fast and in a non-conforming way. I suspect many
language implementors would choose the latter. Then lots of programmers may
get burned...

David Seal
Sorry about the bad "From" line. This is on a list of things to be fixed
here. Correct email address is "dseal@armltd.co.uk".

All opinions are mine only...

jpk@ingres.com (Jon Krueger) (06/11/91)

jbs@WATSON.IBM.COM (James B. Shearer) writes:

> One of the problems with IEEE's infs and
> NaNs is that they do not fit well with fortran.

Is this a problem with IEEE floats or with FORTRAN?

What exceptions for numeric computation would you define?  What
language support would you specify for exception handling?  What
default behaviors would you require at execution time?  What
commercially available environments implement this?  Who has validated
that they do?  For what languages?  On what platforms?

> I think that if a (legal) program behaves differently when compiled with
> and without optimization that then the compiler is broken.  Perhaps you
> disagree.

I take it you are unfamiliar with the concept of ambiguity in language
definition.  It means that a single program can behave in two different
ways without the language definition identifying either one as wrong.
Both ways are perfectly legal according to the standard.  That's why we
call it ambiguous.  The optimizer is free to choose either way without
violating the standard.

I take it you are also unfamiliar with programming language
specifications which identify certain usages as undefined.  The
specification states explicitly that the results are undefined, or not
fully defined.  Programs that use them are not guaranteed to behave the
same way on all compilers conforming to the language specification.
Legal programs can depend on them.  Unwise, perhaps; not very portable,
certainly; but still perfectly legal programs in that language.  Again,
the optimizer is free to generate a different behavior from the
unoptimized one.  That this may in some cases be undesirable is not at
issue here.  Providing consistent behavior for these programs falls
under the category of quality of implementation, not standards
conformance.  If inconsistent behavior is encountered here as a
function of level of optimization, we may agree that the implementation
is of poor quality.  But the compiler isn't broken; the code is.

> So how would you code this (detecting whether x inf or NaN)?
>         IF (0.0*X .NE. 0.0)

In FORTRAN, I would code it

      IF (FLTINF(X) .OR. FLTNAN(X))

where FLTINF and FLTNAN are defined as returning true if X is set to
IEEE infinity or IEEE not a number respectively, and false if X is set
to anything else.  I know of no portable FORTRAN expression that can do
this inline.  Implementation of FLTINF and FLTNAN would of course be
system specific.  For instance on SunOS I might implement FLTINF as:

      LOGICAL FUNCTION FLTINF(X)
C     Returns .TRUE. if X is +Inf or -Inf; r_infinity() is the SunOS
C     library routine that returns IEEE +infinity.
      REAL X
      IF (ABS(X) .EQ. r_infinity()) THEN
          FLTINF = .TRUE.
      ELSE
          FLTINF = .FALSE.
      ENDIF
      RETURN
      END

On lesser operating systems correct implementation of FLTINF would
require dropping into assembler.  This however is an implementation
question for those who maintain FLTINF.  Those who call it may expect
consistent behavior across compilers, operating systems, and levels of
optimization.  I know of no relevant standard which makes the same
guarantees about (0.0*X .NE. 0.0).  Perhaps you could help me by citing
one.  Please explain exactly what it guarantees and where it applies.

-- Jon
--

Jon Krueger, jpk@ingres.com 

henry@zoo.toronto.edu (Henry Spencer) (06/12/91)

In article <192@armltd.uucp> dseal (David Seal) writes:
>>Who knows whether IEEE infinity is a "mathematically defined result"? If you
>
>I would think it isn't: it doesn't correspond to a single mathematical real
>number in the way that ordinary IEEE numbers do.

Why do you insist that the standard real numbers be the mathematical system
used?  Number systems which include infinities are neither novel nor somehow
illegitimate.  The real question about FORTRAN vs IEEE is how precise the
FORTRAN standard is about the nature of the number system it is running on.
Given that FP systems containing infinities are *not* new -- IEEE did not
invent this idea -- I would be surprised if a careful reading of the
standard mandated pure mathematical reals.

(The real answer, of course, is that the FORTRAN standard was too early
to even consider IEEE FP, and attempts to determine the interaction of the
two by reading between the lines are a bit silly.)
-- 
"We're thinking about upgrading from    | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 to SunOS 3.5."              |  henry@zoo.toronto.edu  utzoo!henry

jbs@WATSON.IBM.COM (06/12/91)

          I said:
>If your chip has 80 bit registers I fail to see how you can avoid
>performing 80 bit loads and stores while saving and restoring these registers.
          David Chase said:
Other people have managed in the past.  Perhaps you are insufficiently
imaginative.

          Perhaps not.  Of course one way people have managed in the past
is to only save and restore the first 64 bits of the register.  In my
view this demonstrates excessive imagination.
                          James B. Shearer

jbs@WATSON.IBM.COM (06/12/91)

         I said:
 >          I don't believe interval arithmetic is used enough to justify
 > any hardware support.
         Dik Winter said:
Well, IBM thought it important enough to provide support in some models (43xx).
Acrith is the keyword.

         Ah Acrith, I will let Kahan comment ("Anomalies in the IBM
Acrith Package" by W. Kahan and E. LeBlanc, Version dated March 13,
1985, p. 9):
         "ACRITH is not at all typical of IBM products but seems instead
first to have escaped prematurely from a research project and then to
have evaded quality controls."
         I said:
 >                                                                      In
 > any case I believe your estimate of 3x slower than straight floating
 > point is unrealistic.
         Dik Winter said
How come unrealistic?  Example, interval add a to b giving c:
        set round to - inf;
        c.low = a.low + b.low;
        set round to + inf;
        c.upp = a.upp + b.upp;
why would this be more than 3x slower than straight floating point?

        Because it's 4 instructions vs. 1.  Also let's see your code
for interval multiplication.
                    James B. Shearer

jbs@WATSON.IBM.COM (06/12/91)

         I said:
>   I don't believe interval arithmetic is used enough to justify
>   any hardware support.
         Keith Bierman said:
This falls into the "H.R." chicken and egg rathole. If interval
arithmetic (or anything else new and claimed to be good) is 100x
slower than the traditional approach, it will simply never get off the
ground.
         Some comments:
    1. Interval arithmetic is not "new".  People have been writing
books and papers on it for at least 30 years.
    2. Herman Rubin is still trying to get people to implement his
proposed instructions.  The interval arithmetic people got their
instructions into the IEEE standard and they have been implemented
in hardware in many machines.  Interval arithmetic still hasn't
gotten off the ground.  How long are we expected to wait until
concluding it's not going to fly.
    3. The real problem with interval arithmetic is not that it's
slow but that on real applications its error estimates are gener-
ally so pessimistic as to be useless.  Extended precision generally
gives better estimates with less work.
         There are many ideas which have seemed promising but never
quite made it, bubble memory for example.  While judgement can
never be final, automatic error estimation in general and interval
arithmetic in particular seems unlikely to ever fulfill the hopes
many had for it.
                      James B. Shearer

jbs@WATSON.IBM.COM (06/12/91)

         David Hough writes:
Part of the problem is that the benchmark programs in use measure only
common cases.  Various versions of the Linpack benchmark all have in common
that the data is taken from a uniform random distribution, producing problems
of very good condition.   So the worst possible linear
equation solver algorithm running on the dirtiest possible floating-point
hardware should be able to produce a reasonably small residual, even for
input problems of very large dimension.

         This is untrue.  Consider pivoting on the column element of
smallest magnitude while using IEEE single precision.  I don't believe
n has to be very large before you are in big trouble.
         David Hough writes:
Such problems may actually correspond to some realistic applications, but many
other realistic applications
tend at times to get data that is much closer to the boundaries of
reliable performance.   This means that the same algorithms and input data
will fail on some computer systems and not on others.   In consequence
benchmark suites that are developed by consortia of manufacturers
proceeding largely by consensus, such as SPEC, will tend to exclude
applications, however realistic, for which results of comparable quality
can't be obtained by all the members' systems.

         If you know of a single realistic application where the diff-
erence between 64 bit IEEE rounded to nearest and 64 bit IEEE chopped
to zero makes a significant difference in the maximum performance ob-
tainable I would like to see it.
                       James B. Shearer

msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) (06/12/91)

In <9106112357.AA18157@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>          I said:
>>If your chip has 80 bit registers I fail to see how you can avoid
>>performing 80 bit loads and stores while saving and restoring these registers.
>          David Chase said:
>Other people have managed in the past.  Perhaps you are insufficiently
>imaginative.
>          Perhaps not.  Of course one way people have managed in the past
>is to only save and restore the first 64 bits of the register.  In my
>view this demonstrates excessive imagination.
>                          James B. Shearer

Although 80 bit in-register intermediates can be useful, if all 80
bits aren't saved during context switches, you end up with random
precision, and no one will be very happy.  There have been machines in
the past with this problem, if I remember correctly.

--

< Michael Pereckas  <>  m-pereckas@uiuc.edu  <>  Just another student... >
   "This desoldering braid doesn't work.  What's this cheap stuff made
    of, anyway?"  "I don't know, looks like solder to me."

khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (06/12/91)

In article <9106120054.AA19903@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:

   instructions into the IEEE standard and they have been implemented
   in hardware in many machines.  Interval arithmetic still hasn't
   gotten off the ground.  How long are we expected to wait until
   concluding it's not going to fly.

Until vendors (other than apple and sun) provide hooks that people can
actually use. Had such hooks been part of _any_ language standard, I
suspect folks with such interests would have merrily coded up systems.

       3. The real problem with interval arithmetic is not that it's
   slow but that on real applications its error estimates are gener-
   ally so pessimistic as to be useless.

At last, something we agree on! But I confess that I have not used it
extensively, so I am prepared to believe that it may have utility I am
not personally cognizant of. Interval techniques will have to be
deployed in large quantities before it can be counted out of the
running, IMHO.
--
----------------------------------------------------------------
Keith H. Bierman    keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043

hrubin@pop.stat.purdue.edu (Herman Rubin) (06/12/91)

In article <KHB.91Jun11185538@chiba.Eng.Sun.COM>, khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
 
> In article <9106120054.AA19903@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 
>    instructions into the IEEE standard and they have been implemented
>    in hardware in many machines.  Interval arithmetic still hasn't
>    gotten off the ground.  How long are we expected to wait until
>    concluding it's not going to fly.
 
> Until vendors (other than apple and sun) provide hooks that people can
> actually use. Had such hooks been part of _any_ language standard, I
> suspect folks with such interests would have merrily coded up systems.
 
>        3. The real problem with interval arithmetic is not that it's
>    slow but that on real applications its error estimates are gener-
>    ally so pessimistic as to be useless.
 
> At last, something we agree on! But I confess that I have not used it
> extensively, so I am prepared to believe that it may have utility I am
> not personally cognizant of. Interval techniques will have to be
> deployed in large quantities before it can be counted out of the
> running, IMHO.

Interval techniques will often be needed when extreme accuracy is needed. 
But in these cases, it will probably be necessary to use multiple precision
anyway.  There are cheaper ways to analyze errors in most situations than
interval arithmetic, but they involve using various precisions.  In some
cases, a very good estimate can even be made in advance of any computation.

I suggest that someone who thinks interval arithmetic is that great try it
on some such task as computing a convolution using the FFT.  There are many
reasonable examples where one would even throw out the idea of using the
faster FFT, even with multiprecision, in favor of methods not so prone to
error.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)

dana@hardy.hdw.csd.harris.com (Dan Aksel) (06/12/91)

In article <9106112357.AA18157@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>
>          I said:
>>If your chip has 80 bit registers I fail to see how you can avoid
>>performing 80 bit loads and stores while saving and restoring these registers.
>-----------------------------------------------------------
>          Perhaps not.  Of course one way people have managed in the past
>is to only save and restore the first 64 bits of the register.  In my
>view this demonstrates excessive imagination.
>                          James B. Shearer
Solution is relatively simple.  Hardware provides a status register which
indicates the precision of each register.  A computed GOTO tells which format to
save each individual register in.  This may create too much overhead if a rapid
context switch time is the critical path instead of memory space.

The only compromise I can see immediately is a two bit status field which tells
the precision of the LARGEST format currently stored in the register file.  All
registers are then saved in this format.  Not an optimal solution but it works.
---
Dan AKsel

hrubin@pop.stat.purdue.edu (Herman Rubin) (06/13/91)

In article <3646@travis.csd.harris.com>, dana@hardy.hdw.csd.harris.com (Dan Aksel) writes:
> In article <9106112357.AA18157@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:

| >          I said:
| >>If your chip has 80 bit registers I fail to see how you can avoid
| >>performing 80 bit loads and stores while saving and restoring these registers.
| >-----------------------------------------------------------
| >          Perhaps not.  Of course one way people have managed in the past
| >is to only save and restore the first 64 bits of the register.  In my
| >view this demonstrates excessive imagination.
| >                          James B. Shearer

> Solution is relatively simple.  Hardware provides a status register which
> indicates the precision of each register.  A computed GOTO tells which format to
> save each individual register in.  This may create too much overhead if a rapid
> context switch time is the critical path instead of memory space.
> 
> THe only compromise I can see immediately is a two bit status field which tells
> the precision of the LARGEST format currently stored in the register file.  All
> registers are then saved in this format.  Not an optimal solution but it works.
For saving and restoring individual 80-bit registers, why not use either a
64+16 or a 96 or even 128 bit memory unit?  If a bunch of registers are to be
saved, to be restored later, the hardware could set up the blocks of 16 and
64 bit pieces.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/13/91)

In article <1991Jun10.234313.2157@ingres.Ingres.COM>, jpk@ingres.com (Jon Krueger) writes:
> jbs@WATSON.IBM.COM (James B. Shearer) writes:

> > One of the problems with IEEE's infs and
> > NaNs is that they do not fit well with fortran.

> Is this a problem with IEEE floats or with FORTRAN?

With all this noise about IEEE 754/854 and Fortran and 0.0*X and so on,
I thought it was time to hear a voice from the past.

    We avoided algebraic constructions leading to known numerical
    problems.  For example, when floating-point arithmetic is truncated
    or rounding occurs before renormalization, the expression 2.0*x
    often contains error.  We used the expression x+x instead, because
    it is always error-free on binary machines and never worse than
    2.0*x on other machines.  Similarly, the expression 1.0-x was
    replaced, on machines that lacked guard digits, by the expression
    (0.5-x)+0.5, to protect precision for 0.0 < x < 1.0.
	...	
    Perhaps the most elegant example of error pinpointing occurred in
    initial tests at Wisconsin for the complete elliptic integral K(m).
    For m close to 1.0, function accuracy depends critically upon
    accurate evaluation of the expression (1.0 - m).  As described
    earlier, this expression was replaced by the more stable expression
    (0.5 - m) + 0.5.  Despite this tactic, the tests detected large
    errors for m close to 1.0.  ...  Univac's optimizing compiler had
    ``helpfully'' restored the earlier expression  ... [we] designed a
    simple counter-measure.


	FUNPACK -- a Package of Special Function Routines.
	W. J. Cody.

MORAL:  _any_ compiler which applies "algebraic" identities that apply
to mathematical reals but not to the target machine's floats is going to
make it rather harder for people to write accurate numerical software.

In another chapter of the same book (Sources and Development of
Mathematical Software), Cody observes

	High-quality software is supposed to be fail-safe and transportable;
	it must work properly regardless of quirks in the host arithmetic
	system.  Software production is seriously hampered when computer
	arithmetic violates simple arithmetic properties such as
		1.0 * X = X
		X * Y = Y * X
	and	X + X = 2.0 * X
	There exist mainframes of recent [1980] design in which each of
	these properties fails for appropriate floating-point X and Y.
	[He's talking about *finite* values, folks.  Not inf, nan, or eps.]
	Worse yet, on some machines there exist floating-point X > 0.0 such
	that
		1.0 * X = 0.0
		X + X = 0.0
	or	SQRT(X)**2 = overflow or underflow.
	ALL THESE ANOMALIES ARE TRACEABLE TO ENGINEERING ECONOMIES.
	Computer architects repeatedly ignore complaints about such
	mathematical atrocities, and new anomalies seem to appear
	with each new machine.
	...
	Not only is [IEEE 754] free of anomalies, but it also contains
	new features SPECIFICALLY REQUESTED AND DESIGNED BY NUMERICAL
	ANALYSTS WITH SOFTWARE EXPERIENCE.

Elsewhere in the same book (a collection of papers edited by W.R.Cowell
and published in 1984) someone else complains that on the Cray-1 there
is a _finite_ value X such that 1.0*X overflows.  What price the
optimisation 1.0*X => X on that machine?

I suggest that if anyone wants to partially evaluate something like
	sum(i*c(i+1), i = 0,n)
and have it turn first into 0*c(1) + 1*c(2) + 2*c(3) and then into
c(2) + 2*c(3), the appropriate thing is for the partial evaluator to do
that, not the compiler, and to do it only when licensed to by an
explicit declaration "USE-REAL-AXIOMS(c(*))" (or when it knows that the
target machine obeys the laws 0*x = 0, 1*x = x, 0+x = x). It took me a
couple of years to get over the "optimisation" I was taught to use of
replacing X/Y by X*YINV, with YINV precalculated; the last thing I want
is an ``optimising'' compiler putting the less accurate expression back.

-- 
Q:  What should I know about quicksort?   A:  That it is *slow*.
Q:  When should I use it?  A:  When you have only 256 words of main storage.

dgh@validgh.com (David G. Hough on validgh) (06/13/91)

Responding to my earlier comments,
in article <9106120131.AA20868@ucbvax.Berkeley.EDU>, jbs@WATSON.IBM.COM writes:
> 
>> Various versions of the Linpack benchmark all have in common
>> that the data is taken from a uniform random distribution, producing problems
>> of very good condition.   So the worst possible linear
>> equation solver algorithm running on the dirtiest possible floating-point
>> hardware should be able to produce a reasonably small residual, even for
>> input problems of very large dimension.
> 
>          This is untrue.  Consider pivoting on the column element of
> smallest magnitude while using IEEE single precision.  I don't believe
> n has to be very large before you are in big trouble.

I should have said "the worst PROBABLE algorithm" - no pivoting at all rather
than aggressively choosing the worst possible pivot.  It ought to be possible to
try this out by disabling the pivot selection in the Linpack benchmark.

>> This means that the same algorithms and input data
>> will fail on some computer systems and not on others.
> 
>          If you know of a single realistic application where the diff-
> erence between 64 bit IEEE rounded to nearest and 64 bit IEEE chopped
> to zero makes a significant difference in the maximum performance ob-
> tainable I would like to see it.

I doubt you'll find anything like this in normal scientific computation
- the difference in problem domain between floating-point systems with
identical exponent and significand fields and rounding schemes
comparable in cost and care is rarely likely to be extremely
noticeable.  I would expect to notice the difference in problems like
orbit computations, however - rounding should yield acceptable results
for longer simulated times than chopping, even if everything else is
equal, simply because the statistics of rounding are more favorable.  I
have also heard that the details of rounding are important in financial
calculations of bond yields.
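
The statistical point is easy to see in a toy experiment: accumulate 0.1 ten
million times under round-to-nearest and under round-toward-zero.  This is
only a sketch using the later C99 <fenv.h> interface, with the count chosen
purely for illustration; the chopped sum drifts low because every rounding
error has the same sign, while the round-to-nearest errors largely cancel.

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    /* Accumulate 0.1 repeatedly under the given rounding mode. */
    static double accumulate(int mode)
    {
        double s = 0.0;
        long i;

        fesetround(mode);
        for (i = 0; i < 10000000L; i++)
            s += 0.1;
        fesetround(FE_TONEAREST);
        return s;
    }

    int main(void)
    {
        printf("round to nearest: %.10f\n", accumulate(FE_TONEAREST));
        printf("chopped to zero:  %.10f\n", accumulate(FE_TOWARDZERO));
        return 0;
    }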

More to the point of benchmarking defined by vendor consortia, the
differences in applicable domain between 32-bit IBM 370 format and
32-bit IEEE, or between 64-bit Cray format and 64-bit IEEE, or even
between HP and TI hand calculators with 10 displayed digits, has been a
part of the experience of a number of comp.arch readers.   In the two
former cases you can often get by with twice as much of the dirtier
precision.  In the Cray case that's much slower, however, and may
eliminate most of the supercomputer's advantage over a workstation.
The IBM 370 case may be slower too if the problem involves much more
than +-*.
-- 

David Hough

dgh@validgh.com		uunet!validgh!dgh	na.hough@na-net.ornl.gov

dseal (David Seal) (06/13/91)

In article <1991Jun11.175639.22558@zoo.toronto.edu> henry@zoo.toronto.edu
(Henry Spencer) writes:

>In article <192@armltd.uucp> dseal (David Seal) writes:
>>>Who knows whether IEEE infinity is a "mathematically defined result"? If you
>>
>>I would think it isn't: it doesn't correspond to a single mathematical real
>>number in the way that ordinary IEEE numbers do.
>
>Why do you insist that the standard real numbers be the mathematical system
>used?  Number systems which include infinities are neither novel nor somehow
>illegitimate.  The real question about FORTRAN vs IEEE is how precise the
>FORTRAN standard is about the nature of the number system it is running on.
>Given that FP systems containing infinities are *not* new -- IEEE did not
>invent this idea -- I would be surprised if a careful reading of the
>standard mandated pure mathematical reals.
>
>(The real answer, of course, is that the FORTRAN standard was too early
>to even consider IEEE FP, and attempts to determine the interaction of the
>two by reading between the lines are a bit silly.)

A fair point, and I agree it's open to interpretation. I would favour (*not*
"insist on") the 'subset of the unextended real numbers' interpretation
because (a) the unextended real numbers are a mathematical system which is
*comparatively* easy to work with; (b) in the absence of any clues to the
contrary, I would expect the Fortran standard to mean either 'using whatever
number system is most appropriate on your machine' or 'using a system based
on the unextended real numbers'. There seems to be a reasonable chance of
writing reasonably portable Fortran code for the latter, despite the
problems in establishing e.g. how big rounding errors can be and what the
limits of the exponent range are. There are more major portability problems
with using whatever number system you like - in particular, this whole
thread was started by such problems with NaNs and infinities! As a major
reason for having language standards is to promote portability, allowing a
completely free choice of number system seems a bit awkward.

In any case, the main point of my previous posting is that *if* you
interpret the Fortran standard as insisting on the 'subset of the unextended
real numbers' interpretation, you can achieve the required effect inside an
IEEE system. Furthermore, statements like "0.0 * X = 0.0" are
unconditionally true in such a system, so they can be used safely by
optimisers.

You are of course free to interpret the Fortran standard in other ways. (I
should probably have been a bit less dogmatic in talking about "strict"
compliance with the standard.) But if you use the full IEEE model, you then
have to solve quite a few problems: e.g. how does the programmer detect the
use of an infinity/NaN; how do you achieve a reasonable level of
optimisation without breaking the IEEE standard; how do you make code
written for your system portable to systems without IEEE arithmetic, etc.?

To sum up: I suggested a way to have your Fortran adhere to the Fortran
standard and your arithmetic to the IEEE standard, in a portable fashion. As
a result, I don't see any evidence that the IEEE committee ignored existing
language standards. There may well be other ways of achieving the same
effect which additionally give you more complete access to the facilities of
an IEEE arithmetic system, but IMHO they are all likely to be more complex
and to require more understanding by the programmer.

David Seal
Sorry about the bad "From" line. This is on a list of things to be fixed
here. Correct email address is "dseal@armltd.co.uk".

All opinions are mine only...

dik@cwi.nl (Dik T. Winter) (06/14/91)

In article <9106120018.AA18733@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >  >          I don't believe interval arithmetic is used enough to justify
 >  > any hardware support.
Me:
 > Well, IBM thought it important enough to provide support in some models (43xx).
JBS:
 > Acrith is the keyword.
 >          Ah Acrith, I will let Kahan comment ...
No need, I know Kahan's opinions about Acrith.  Still IBM thought it
important enough to put it in hardware (actually microcode I think).
Perhaps under severe pressure from IBM Germany?  But at least as far as
IBM saw it there was a niche where interval arithmetic is heavily used.

Me:
 > How come unrealistic?  Example, interval add a to b giving c:
 >         set round to - inf;
 >         c.low = a.low + b.low;
 >         set round to + inf;
 >         c.upp = a.upp + b.upp;
 > why would this be more than 3x slower than straight floating point?
JBS:
 >         Because it's 4 instructions vrs 1.
Thus it would be 4 times as slow if each instruction takes exactly the same
time to issue.  This is not true on processors where the FPU is not pipelined
(in that case the setting of the rounding mode is much faster than the
addition).  It might also not be true on some pipelined processors, e.g.
if the add unit is not pipelined.  Etc.  Do you indeed know machines
where the above sequence would be four times as slow as a normal add?

JBS:
 >                                             Also lets see your code
 > for interval multiplication.
Not difficult, it goes something like:
	if a.low >= 0.0 then
		if b.low >= 0.0 then
			4 operations
		else if b.upp <= 0.0 then
			4 operations
		else if b.upp >= - b.low then
			4 operations
		else
			4 operations
the remainder is left as an exercise for the reader.  The code is a bit
hairy, and on modern RISC machines you would not put it inline (a subroutine
jump to a leaf routine is extremely cheap).  We see that also here on
many machines the multiply operations take most of the time.  (The code
assumes that intervals do not contain inf, if you allow that the code only
gets hairier, not much more time consuming.)
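
A brute-force alternative to the case analysis, sketched here against C99
<fenv.h> with the invented names ivmul and struct interval, is to take the
minimum and maximum over all four endpoint products under the two directed
roundings.  It spends eight multiplications instead of four, but it avoids
the branching; it assumes finite endpoints.

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    struct interval { double low, upp; };

    static double min4(double a, double b, double c, double d)
    {
        double m = a;
        if (b < m) m = b;
        if (c < m) m = c;
        if (d < m) m = d;
        return m;
    }

    static double max4(double a, double b, double c, double d)
    {
        double m = a;
        if (b > m) m = b;
        if (c > m) m = c;
        if (d > m) m = d;
        return m;
    }

    /* Lower bound: smallest product rounded down.  Upper bound: largest
       product rounded up. */
    static struct interval ivmul(struct interval a, struct interval b)
    {
        struct interval c;
        fesetround(FE_DOWNWARD);
        c.low = min4(a.low * b.low, a.low * b.upp, a.upp * b.low, a.upp * b.upp);
        fesetround(FE_UPWARD);
        c.upp = max4(a.low * b.low, a.low * b.upp, a.upp * b.low, a.upp * b.upp);
        fesetround(FE_TONEAREST);
        return c;
    }

    int main(void)
    {
        struct interval x = { -2.0, 3.0 }, y = { 0.1, 0.2 };
        struct interval z = ivmul(x, y);
        printf("[%.17g, %.17g]\n", z.low, z.upp);
        return 0;
    }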

Of course division is still hairier, but still reasonably possible.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

dik@cwi.nl (Dik T. Winter) (06/14/91)

In article <KHB.91Jun11185538@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
 > In article <9106120054.AA19903@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >        3. The real problem with interval arithmetic is not that it's
 >    slow but that on real applications its error estimates are gener-
 >    ally so pessimistic as to be useless.
 > At last, something we agree on! But I confess that I have not used it
 > extensively, so I am prepared to believe that it may have utility I am
 > not personally cognizant of. Interval techniques will have to be
 > deployed in large quantities before it can be counted out of the
 > running, IMHO.
If we look at what the Karlsruhe people are doing there is no problem with the
error estimates from interval arithmetic.  To give an example: to solve a
system of linear equations, first a standard method is used to approximate
a solution.  Next, using interval arithmetic (plus dot product with one
rounding), in an iterative process the approximate solution is enclosed in
a solution interval that is made as small as possible.  I have seen results
from the process where the lower and upper bound of each solution component
differed only in a single bit (and those bounds were guaranteed).
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

dik@cwi.nl (Dik T. Winter) (06/14/91)

David Hough writes:
 > Part of the problem is that the benchmark programs in use measure only
 > common cases.  Various versions of the Linpack benchmark all have in common
 > that the data is taken from a uniform random distribution, producing problems
 > of very good condition.   So the worst possible linear
 > equation solver algorithm running on the dirtiest possible floating-point
 > hardware should be able to produce a reasonably small residual, even for
 > input problems of very large dimension.
 > 
In article <9106120131.AA20868@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >          This is untrue.  Consider pivoting on the column element of
 > smallest magnitude while using IEEE single precision.  I don't believe
 > n has to be very large before you are in big trouble.

But you can get reasonable results without any pivoting if the condition
is very good!  Of course, if you do pivot you probably will get better
results; if you do full pivoting you will probably get still better results.

From my experience, once I wrote a solver where due to a coding error no
pivoting was performed any time.  It solved the 1024x1024 systems I initially
tested it on very well.  Of course, those systems were extremely well
conditioned.  The algorithm fell flat on its face when it was fed a not
so random input.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (06/14/91)

In article <13450@mentor.cc.purdue.edu>
	hrubin@pop.stat.purdue.edu (Herman Rubin) writes:

>> At last, something we agree on! But I confess that I have not used it
>> extensively, so I am prepared to believe that it may have utility I am
>> not personally cognizant of. Interval techniques will have to be
>> deployed in large quantities before it can be counted out of the
>> running, IMHO.

>Interval techniques will often be needed when extreme accuracy is needed. 
>But in these case, it will probably be necessary to use multiple precision
>anyway. 

Interval arithmetic is useful when robustness is necessary. For example,
in a paper "On Ray Tracing Parametric Surfaces" (D. L. Toth, SIGGRAPH '85
Conference Proceedings) interval arithmetic is used to reliably find the
nearest intersection between a line and a parametric surface.

>There are cheaper ways to analyze errors in most situations than
>interval arithmetic, but they involve using various precisions.  In some
>cases, a very good estimate can even be made in advance of any computation.

Error estimation is not enough for robust computation, but error bounding
by interval arithmetic is useful.  As in Toth's paper, it is possible to use
non-interval Newton iteration if its unique convergence is assured by
interval arithmetic.

						Masataka Ohta

mshute@cs.man.ac.uk (Malcolm Shute) (06/14/91)

In article <9106070109.AA02137@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
>I think that if a (legal) program behaves differently when compiled with
>and without optimization that then the compiler is broken.  Perhaps you
>disagree.

      B = 5.0
      IF ( ( B .GT. 3.0 ) .OR. MYTEST(A) ) THEN
          PRINT some diagnostic messages
      ENDIF

      LOGICAL FUNCTION MYTEST(X)
      REAL X
      MYTEST = some boolean work
      PRINT a few diagnostics on the screen
      RETURN
      END

The standard does not require MYTEST(A) to be evaluated once B .GT. 3.0 is
known to be true, so the diagnostics printed inside MYTEST may or may not
appear depending on how aggressively this legal program is optimized.

--

Malcolm SHUTE.         (The AM Mollusc:   v_@_ )        Disclaimer: all

peter@ficc.ferranti.com (peter da silva) (06/15/91)

In article <GLEW.91Jun9110817@pdx007.intel.com>, glew@pdx007.intel.com (Andy Glew) writes:
> In this programming model expressions like 0.0 * b arise frequently.
> And it sure would be nice if they could be optimized away.

Couldn't you do something like:

#define foo(...,x,...) ...((x)?((x) * b):(0.0))...

This should give the compiler a clue, no?
-- 
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX  77487-5012;         `-_-' "Have you hugged your wolf, today?"

peter@ficc.ferranti.com (peter da silva) (06/15/91)

In article <9106112357.AA18157@ucbvax.Berkeley.EDU>, jbs@WATSON.IBM.COM writes:
> Of course one way people have managed in the past [to handle 80 bit registers
  on context switches]
> is to only save and restore the first 64 bits of the register.  In my
> view this demonstrates excessive imagination.

Another popular technique is to use a single-tasking system and to crash on
floating point stack overflows.
-- 
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX  77487-5012;         `-_-' "Have you hugged your wolf, today?"

jbs@WATSON.IBM.COM (06/15/91)

           David Seal (quoting someone quoting the fortran standard):
>The missing rule is (ANSI X3.9-1978 sect 6.6):
>    "Any numeric operation whose result is not mathematically defined is
>    prohibited in the execution of an executable program. Examples are
>    dividing by zero and raising a zero-valued primary to a zero-valued or
>    negative-valued power."
>Who knows whether IEEE infinity is a "mathematically defined result"? If you
I would think it isn't: it doesn't correspond to a single mathematical real
number in the way that ordinary IEEE numbers do.

           Some comments:
     1.  The result of multiplying 2 reals is always mathematically de-
fined.  Therefore I don't see how this clause prohibits multiplies which
will overflow or underflow.
     2.  Ordinary IEEE numbers don't really correspond to single mathema-
tical real numbers, they correspond to intervals.  Inf and -inf are just
the names of the intervals on the ends.
                       James B. Shearer

jbs@WATSON.IBM.COM (06/15/91)

         I said:
>Perhaps I should have said it is undesirable for a program to behave
>differently when compiled with and without optimization regardless
>of whether this is technically permitted by the standard.  (Actually
>I think it would desirable for the standard to exactly specify the
>behavior of any legal program but apparently it does not.)
         David Boundy said:
Undesirable, yes.  But also unavoidable in a market where performance
is what sells machines.

         Unavoidable is too strong.  I believe there are compilers
(such as xlf) for which compiling with optimization does not change
the behavior of legal programs.
         David Boundy said:
                                                                Thus,
standards do "exactly specify the behavior of any legal program:"  that's
the definition of a "legal program."

         I believe this is incorrect.  If I write x=a+b+c and the stand-
ard does not specify the order of evaluation, then the compiler is free
to evaluate it as (a+b)+c or as a+(b+c) (or, according to some, in either
way depending on the optimization level).  Thus the behavior of the pro-
gram is not exactly specified.
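
         (A short C illustration of the point, using decimal fractions
that are inexact in binary; the two groupings really do differ in the
last place:)

#include <stdio.h>

int main(void)
{
    /* Floating-point addition is not associative: regrouping a legal
       expression changes the rounded result here. */
    double left  = (0.1 + 0.2) + 0.3;
    double right = 0.1 + (0.2 + 0.3);
    printf("%.17g\n%.17g\nequal: %s\n",
           left, right, left == right ? "yes" : "no");   /* not equal */
    return 0;
}
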
         David Boundy said:
           ( There's the issue that on our 68000 machines, we do arithmetic at
compile time in 64 bits, but the same expression unoptimized will be evaluated
at run time in 80 bits.  Sigh. )

         Well, if the behavior of any legal program is exactly specified,
how can this sort of thing occur?
                          James B. Shearer

jbs@WATSON.IBM.COM (06/15/91)

          Dik Winter said (the issue is whether 3x is a realistic
estimate of how much slower interval arithmetic will be compared to
straight floating point):
 > How come unrealistic?  Example, interval add a to b giving c:
 >         set round to - inf;
 >         c.low = a.low + b.low;
 >         set round to + inf;
 >         c.upp = a.upp + b.upp;
 > why would this be more than 3x slower than straight floating point?
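
          (As an aside, in modern C99 terms the quoted sequence is roughly the
following minimal sketch, assuming <fenv.h> directed rounding is available and
honored by the compiler, e.g. with FENV_ACCESS in effect:)

#include <fenv.h>
#include <stdio.h>

typedef struct { double low, upp; } interval;

/* Interval add exactly as in the quoted four-instruction sequence:
   lower bound rounded toward -inf, upper bound toward +inf. */
interval int_add(interval a, interval b)
{
    interval c;
    fesetround(FE_DOWNWARD);
    c.low = a.low + b.low;
    fesetround(FE_UPWARD);
    c.upp = a.upp + b.upp;
    fesetround(FE_TONEAREST);   /* restore the usual mode */
    return c;
}

int main(void)
{
    interval a = { 1.0, 2.0 }, b = { 0.1, 0.3 };
    interval c = int_add(a, b);
    printf("[%.17g, %.17g]\n", c.low, c.upp);
    return 0;
}
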
          I said:
 >         Because it's 4 instructions vrs 1.
          Dik Winter:
Thus it would be 4 times as slow if each instruction takes exactly the same
time to issue.  This is not true on processors where the FPU is not pipelined
(in that case the setting of the rounding mode is much faster than the
addition).  It might also not be true on some pipelined processors, i.e.
if the multiply unit is not pipelined.  Etc.  Do you indeed know machines
where the above sequence would be four times as slow as a normal multiply?

          It will be at least 4 times slower on the Risc System 6000.
It may be more if the set floating mode instructions screw up the pipe-
line (I don't know if they do or not).
          I said
 >                                             Also lets see your code
 > for interval multiplication.
          Dik Winter:
Not difficult, it goes something like:
        if a.low >= 0.0 then
                if b.low >= 0.0 then
                        4 operations
                else if b.upp <= 0.0 then
                        4 operations
                else if b.upp >= - b.low then
                        4 operations
                else
                        4 operations
the remainder is left as an exercise for the reader.  The code is a bit
hairy, and on modern RISC machines you would not put it inline (a subroutine
jump to a leaf routine is extremely cheap).  We see that here too, on
many machines, the multiply operations take most of the time.  (The code
assumes that intervals do not contain inf, if you allow that the code only
gets hairier, not much more time consuming.)
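
(For comparison, a sketch of the brute-force alternative to the sign-case code
above: take the min and max of the four cross products, computing the lower
bound with all products rounded down and the upper bound with all products
rounded up.  Eight multiplies instead of four, but no compares on the operand
signs.  Assumes finite intervals and the same <fenv.h> support as before.)

#include <fenv.h>

typedef struct { double low, upp; } interval;

static double min4(double a, double b, double c, double d)
{
    double m = a;
    if (b < m) m = b;
    if (c < m) m = c;
    if (d < m) m = d;
    return m;
}

static double max4(double a, double b, double c, double d)
{
    double m = a;
    if (b > m) m = b;
    if (c > m) m = c;
    if (d > m) m = d;
    return m;
}

/* Interval multiply without sign analysis: the true product range is
   bracketed by the extreme cross products, rounded outward. */
interval int_mul(interval a, interval b)
{
    interval c;
    fesetround(FE_DOWNWARD);
    c.low = min4(a.low*b.low, a.low*b.upp, a.upp*b.low, a.upp*b.upp);
    fesetround(FE_UPWARD);
    c.upp = max4(a.low*b.low, a.low*b.upp, a.upp*b.low, a.upp*b.upp);
    fesetround(FE_TONEAREST);
    return c;
}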

          Some comments:
     1.  Handling this by a subroutine call would seem to require 4
floating loads to pass the input arguments and 2 floating loads to re-
trieve the answers.  This is already expensive.
     2.  All the floating compares and branches will severely impact
performance on the Risc System 6000.  The slowdown will be much more
than 3x.
     3.  Do you know of any machine where the above code will average
3x (or less) the time of a single multiply?
          On another topic Dik Winter said:
But you can get reasonable results without any pivoting if the condition
is very good!

          The need for pivoting is unrelated to the condition number.
The matrix
            ( 0  1 )
            ( 1  0 )
has condition number 1 and requires pivoting.  The matrix
            ( 1+e  1-e )
            ( 1-e  1+e )     (e small)
has a bad condition number and is not helped by pivoting.
                      James B. Shearer

dik@cwi.nl (Dik T. Winter) (06/16/91)

In article <9106150258.AA16308@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes
trying to sidetrack me again.

The origin of the discussion was a remark that interval arithmetic in software
had been observed to be 300 times as slow as standard hardware floating point,
while interval arithmetic with hardware rounding modes had been observed to be
3 times as slow.  Shearer questioned the number 3; I thought he questioned the
order of magnitude but apparently he wants to know the exact number.  Of course
that cannot be given as it is entirely machine dependent.  Also the factor
would be a function of the mix of operations performed.

I gave an example of interval addition in four instructions and asked
why it would be more than 3x slower; forgetting heavily pipelined machines
that do not take along the current rounding mode state in the pipeline,
forcing a slow setting of the rounding mode (the pipe must be empty to
modify it).  If the current rounding mode is carried along in the pipe
there would be no problem with the setting of the rounding mode.

JBS:
 >           It will be at least 4 times slower on the Risc System 6000.
Granted.  The particular rounding modes, and changing them on the fly, are
not particularly well supported on the RS6000.  There are also enough
machines where it will be less than 3 times slower.

About the multiplication routine, JBS:
 >      1.  Handling this by a subroutine call would seem to require 4
 > floating loads to pass the input arguments and 2 floating loads to re-
 > trieve the answers.  This is already expensive.
But you have in most cases to load these 4 numbers anyhow, so why would this
be more expensive?  Why loads to retrieve the answers?  Why not just let
them sit in the FP registers?
 >      2.  All the floating compares and branches will severely impact
 > performance on the Risc System 6000.  The slowdown will be much more
 > than 3x.
Oh yes, on the RS6000.  There are machines that will behave differently.
Moreover, it was never said that there is an exact factor of 3 involved;
that factor was simply observed, for a set of programs.
 >      3.  Do you know of any machine where the above code will average
 > 3x (or less) the time of a single multiply?
So this is irrelevant.

What has been forgotten is that the same stuff in software would be
extremely expensive.  JBS, can you show a reasonable interval add in
software?  And an interval multiply?

 >           On another topic Dik Winter said:
 > But you can get reasonable results without any pivoting if the condition
 > is very good!
I should have added the context, where not only the condition of the complete
matrix is very good, but also of all its principal minors.  Of course you
should never take a pivotal element that is too small.
 > 
 > The matrix
 >             ( 1+e  1-e )
 >             ( 1-e  1+e )     (e small)
 > has a bad condition number and is not helped by pivoting.
Yes, the condition is 1/e.  Assume LDL' decomposition.  The relative errors
in L are small compared to the elements; we have to look at diagonal matrix D.
The second element is 4e/(1+e).  Given a machine precision t, the absolute
error in that element is (%) t.|(1-e)^2/(1+e)| for a relative error (%)
t.|(1-e)^2/4e|.  So if e>0 the relative error is smaller than if e<0, showing
that pivoting helps even here!  Slightly, of course, because e is small; but,
as always with pivoting, it does not matter very much whether you pivot on the
largest element or on an element reasonably close to it.

But I am already severely sidetracked again.
--
(%) bounded by.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

dik@cwi.nl (Dik T. Winter) (06/16/91)

In article <9106150206.AA15019@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >          Unavoidable is too strong.  I believe there are compilers
 > (such as xlf) for which compiling with optimization does not change
 > the behavior of legal programs.
Right, it does some surprising things to get at that goal.  Now you have to
explain to your customers the reason that
	DO 10 I = 1, N
	DO 10 J = 1, N
10	X(I,J) = A * B(I) * C(J)
is faster than
	DO 10 I = 1, N
	DO 10 J = 1, N
10	X(I,J) = A * B(J) * C(I)

If you see the expressions some people write...
I thought we had left that distinction behind us some 15 years ago.
More surprising is the code generated (non-optimized) for the statement:
	X(I) = 2.0 * A(I) + 2.0
it goes something like:
	load 2.0 in fr1
	load a(i) in fr2
	multiply fr1 and fr2 to fr0
	load 2.0 in fr1
	load 2.0 in fr1
	multiply fr1 and fr2 and add fr1 to fr0
granted, it is non-optimized, but even then it sees that the maf operation can
be used (otherwise it would not be able to use it in the optimized version).
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (06/16/91)

>>>>> On 15 Jun 91 22:23:08 GMT, dik@cwi.nl (Dik T. Winter) said:

Dik> In article <9106150258.AA16308@ucbvax.Berkeley.EDU>
Dik> jbs@WATSON.IBM.COM writes:

JBS> [....]  3.  Do you know of any machine where the above code will
JBS> average 3x (or less) the time of a single multiply?

Dik> So this is irrelevant.

Winter is claiming that Shearer is trying to obfuscate the issues
about the performance of interval arithmetic (in terms of timing, not
the usefulness of the results).

I see Shearer asking a very simple question: Please provide *specific*
information on machines where Winter's algorithms for interval add and
interval multiply perform less than 3 times slower than the equivalent
simple operation with a single rounding mode.

Then I suppose that the next step is to compare the performance of
interval arithmetic with 128-bit arithmetic on a machine like the
RS/6000.  To do this, we will first need to find someone who knows how
expensive the rounding mode change is in the RS/6000 FPU pipe, and
then someone who knows the timings of quad-precision arithmetic on the
RS/6000.  (I will be able to learn the latter as soon as I get xlf
version 2 installed on my machine).
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

glew@pdx007.intel.com (Andy Glew) (06/16/91)

    Interval techniques will have to be deployed in large quantities
    before it can be counted out of the running, IMHO.
    
    ----------------------------------------------------------------
    Keith H. Bierman    keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM

It will have to be successful before we can conclude that it is
unsuccessful?

--

Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, 
Hillsboro, Oregon 97124-6497

This is a private posting; it does not indicate opinions or positions
of Intel Corp.

jbs@watson.ibm.com (06/17/91)

         David Hough said:
> Various versions of the Linpack benchmark all have in common
> that the data is taken from a uniform random distribution, producing problems
> of very good condition.
         I believe "very good condition" is an exaggeration.  Problems
of size 1000 appear to typically have condition number between 10**5
and 10**6.  I would not consider this very good.  The condition num-
ber appears to be growing considerably faster than n; does anyone
know the expected asymptotic behavior?
         David Hough said:
>The Linpack benchmark is intended to solve an equation A x = b
>where the elements of A are chosen from a uniform random distribution
>and b is computed so that the correct solution x is close to a vector
>of 1's.  So looking at the size of components of x-1's is supposed to
>indicate how accurate the solution is, although
>the residual b - A x is a more reliable measure of quality.
         Why do you consider the residual a more reliable measure of
quality?  If the condition is bad the residual will still be small
when the answer is garbage.
         I asked:
>>         If you know of a single realistic application where the diff-
>>erence between 64 bit IEEE rounded to nearest and 64 bit IEEE chopped
>>to zero makes a significant difference in the maximum performance ob-
>>tainable I would like to see it.
         David Hough said:
>I doubt you'll find anything like this in normal scientific computation
>- the difference in problem domain between floating-point systems with
>identical exponent and significand fields and rounding schemes
>comparable in cost and care is rarely likely to be extremely
>noticable.  I would expect to notice the difference in problems like
>orbit computations, however - rounding should yield acceptable results
>for longer simulated times than chopping, even if everything else is
>equal, simply because the statistics of rounding are more favorable.  I
>have also heard that the details of rounding are important in financial
>calculations of bond yields.
         I believe IEEE chopped will be easier to implement and perform
better than IEEE rounded.  If rounding almost never matters why require
it?  I would expect the accuracy of orbit computations to be limited by
the accuracy to which the initial parameters are known and by the accur-
acy of the physical model (if I am not mistaken the models of tidal eff-
ects for example are not too accurate).  I find it hard to believe the
rounding method matters in bond calculations given 64-bit arithmetic and
sound methods (I believe there are foolish ways to compute internal rates
of return for example).
                       James B. Shearer

dik@cwi.nl (Dik T. Winter) (06/17/91)

In article <MCCALPIN.91Jun16095840@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
 > >>>>> On 15 Jun 91 22:23:08 GMT, dik@cwi.nl (Dik T. Winter) said:
 > 
 > Dik> In article <9106150258.AA16308@ucbvax.Berkeley.EDU>
 > Dik> jbs@WATSON.IBM.COM writes:
 > 
 > JBS> [....]  3.  Do you know of any machine where the above code will
 > JBS> average 3x (or less) the time of a single multiply?
 > 
 > Dik> So this is irrelevant.
 > 
 > Winter is claiming that Shearer is trying to obfuscate the issues
 > about the performance of interval arithmetic (in terms of timing, not
 > the usefulness of the results).
 > 
 > I see Shearer asking a very simple question: Please provide *specific*
 > information on machines where Winter's algorithms for interval add and
 > interval multiply perform less than 3 times slower than the equivalent
 > simple operation with a single rounding mode.
 > 
 > Then I suppose that the next step is to compare the performance of
 > interval arithmetic with 128-bit arithmetic on a machine like the
 > RS/6000.  To do this, we will first need to find someone who knows how
 > expensive the rounding mode change is in the RS/6000 FPU pipe, and
 > then someone who knows the timings of quad-precision arithmetic on the
 > RS/6000.  (I will be able to learn the latter as soon as I get xlf
 > version 2 installed on my machine).
 > --
 > John D. McCalpin			mccalpin@perelandra.cms.udel.edu
 > Assistant Professor			mccalpin@brahms.udel.edu
 > College of Marine Studies, U. Del.	J.MCCALPIN/OMNET


--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

dik@cwi.nl (Dik T. Winter) (06/17/91)

In article <3709@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
Sorry about that one; something went wrong.

In article <MCCALPIN.91Jun16095840@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
 > I see Shearer asking a very simple question: Please provide *specific*
 > information on machines where Winter's algorithms for interval add and
 > interval multiply perform less than 3 times slower than the equivalent
 > simple operation with a single rounding mode.
Oh yes, they exist.  It depends on your programming model.  I take the
80287/80387 and the 88100.  They all have a fp control register that defines
the rounding modes and the exception mask bits.  To change rounding modes
you can load that register with precalculated values, as the exception mask
bits change only rarely.  On those machines an interval add takes two load
control register instructions and two adds, as compared to a single add.
Cycle times:
	load fcr	fp add
80287	10		85
80387	19		24-32
88100	 1		 3
So on the 80287 and the 88100 interval add is clearly better than 3 times a
single add.  On the 80387 it is only slightly slower.  If we look at latency
for the 88100, a single add has a latency of 5 cycles, an interval add a
latency of 9 cycles.

So given proper support for change of rounding modes the factor of 3 can
easily be obtained for add.  Multiply requires more than 3 due to compares
and branches.  Still I would expect a factor of about 3 for especially the
80287 and the 88100.  The main point here is that those machines have a fpu
control register and a fpu status register.  Machines that combine the two
(SPARC, MIPS, RS6000, HPPA, 68k, 32k) will be at a disadvantage because
change of rounding mode implies fetching the current status/control register
(which in turn implies that the fp execute queue must be empty), changing a
single field and restoring the register.  The situation on the RS6000 is a
bit unclear.  There are instructions that modify only a subfield of the
FPSCR.  So it would have been simple to organize the FPSCR such that a single
instruction would modify only the rounding mode bits.  This ought to be
possible even if instructions are still in progress.  I do not know whether
this is done in reality (I do not even know the layout of the FPSCR, as the
stupid Assembler manual does not give it!).
 > 
 > Then I suppose that the next step is to compare the performance of
 > interval arithmetic with 128-bit arithmetic on a machine like the
 > RS/6000.
Note: this is *not* a replacement for interval arithmetic.  Granted, you
get better results in more cases, but the guarantee on your result is lacking.

I am *not* an advocate for interval arithmetic (the people at Karlsruhe are).
I do not use it.  But I object to the way Shearer handles this:
a.  Shearer asks: what is the justification for the different rounding modes.
b.  Many responses come: interval arithmetic.
c.  Shearer asks: would it not be better helped with quad arithmetic?
d.  Response:  observed speed difference a factor 3 with hardware rounding
	modes, a factor 300 in software.
e.  Shearer questions the factor 3.  Apparently he believes the factor 300
	(does he?).
Even if the factor of 3 degraded on other machines to a factor of 5 or even
10, the difference from 300 would still be striking.

I ask Shearer again: come up with an interval add assuming the base arithmetic
is round to nearest only (or even worse, with truncating arithmetic, which
you advocate in another article).
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

herrickd@iccgcc.decnet.ab.com (06/18/91)

In article <196@armltd.uucp>, dseal (David Seal) writes:
> A fair point, and I agree it's open to interpretation. I would favour (*not*
> "insist on") the 'subset of the unextended real numbers' interpretation
> because (a) the unextended real numbers are a mathematical system which is
> *comparatively* easy to work with; 

One of the problems with this conception (and it dates back to Backus'
team) is that any system of floating point numbers (ignoring newfangled
ideas like nan and inf) is just a finite set of rationals.  The REALs
have odd features like
    1)  When you add two of them together, you get another of them
    2)  Addition is associative
while the floating point number systems don't have such properties
(because the underlying set is finite).

dan herrick
herrickd@iccgcc.decnet.ab.com

tim@proton.amd.com (Tim Olson) (06/18/91)

In article <3710@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
| Oh yes, they exist.  It depends on your programming model.  I take the
| 80287/80387 and the 88100.  They all have a fp control register that defines
| the rounding modes and the exception mask bits.  To change rounding modes
| you can load that register with precalculated values, as the exception mask
| bits change only rarely.  On those machines an interval add takes two load
| control register instructions and two adds, as compared to a single add.
| Cycle times:
| 	load fcr	fp add
| 80287	10		85
| 80387	19		24-32
| 88100	 1		 3
| So on the 80287 and the 88100 interval add is clearly better than 3 times a
| single add.  On the 80387 it is only slightly slower.  If we look at latency
| for the 88100, a single add has a latency of 5 cycles, an interval add a
| latency of 9 cycles.

One thing to remember, however, is that writes to control registers,
such as a floating-point control register, are usually "serializing
operations" in pipelined machines.  This means that the FP pipeline will
be flushed before the modification of the floating-point control
register takes place.

This doesn't make a difference in your timing analysis if result
dependencies already existed in the code stream (so that the FP adds
executed at their result rate, rather than their issue rate), but if
there was enough pipelining/parallelism to allow the FP adds to run at
full issue rate, then interval adds will be significantly more
expensive:

		load fcr    FP add latency    FP add issue
Am29050		    1              3               1

Series of 5 dependent FADDs:
	normal:	  15 cycles
	interval: 40 cycles (2.67 X)

Series of 5 independent FADDs:
	normal:	   5 cycles
	interval: 40 cycles (8 X)

I suppose that careful analysis of the code could point out where you
can "batch" a series of independent calculations to reduce the
round-mode switching, and thus the serialization penalty, but the
penalty is still probably going to be much larger than 3X.
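
(A sketch of that batching idea, assuming C99 <fenv.h> directed rounding: all
the downward-rounded bounds in one pass, all the upward-rounded bounds in
another, so a vector of n independent interval adds pays for two mode changes
rather than 2n.)

#include <fenv.h>

typedef struct { double lo, hi; } interval;

/* Elementwise interval add over arrays, switching the rounding mode
   only twice.  On a machine where writing the FP control register
   drains the pipeline, the serialization cost is amortized over n
   independent additions. */
void int_add_batch(const interval *a, const interval *b, interval *c, int n)
{
    int i;
    fesetround(FE_DOWNWARD);
    for (i = 0; i < n; i++)
        c[i].lo = a[i].lo + b[i].lo;
    fesetround(FE_UPWARD);
    for (i = 0; i < n; i++)
        c[i].hi = a[i].hi + b[i].hi;
    fesetround(FE_TONEAREST);
}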

--
	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)

dik@cwi.nl (Dik T. Winter) (06/18/91)

In article <1991Jun17.231640.3426@dvorak.amd.com> tim@amd.com (Tim Olson) writes:
 > One thing to remember, however, is that writes to control registers,
 > such as a floating-point control register, are usually "serializing
 > operations" in pipelined machines.  This means that the FP pipeline will
 > be flushed before the modification of the floating-point control
 > register takes place.
 > 
The key point here is that the 88k and 80x87 do have *separate* control and
status registers.  If they are separated there would (if the unit is well
designed) be no need for a write to the control register to be serializing.
And indeed, I do not think it is on those two (at least I have found it
nowhere in the documentation).  I do not know about the 29050, having no
docs for it.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) (06/18/91)

In article <3736@charon.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
|>
|> The key point here is that the 88k and 80x87 do have *separate* control and
|> status registers.  If they are separated there would (if the unit is well
|> designed) be no need for a write to the control register to be serializing.
|> And indeed, I do not think it is on those two (at least I have found it
|> nowhere in the documentation).  I do not know about the 29050, having no
|> docs for it.
|>

The 88k does not serialize the processor when you write to the FP control
register.  You could add extra code to force serialization before a write.

-----------------------------------------------------------------------------
==	jeff kenton		Consulting at kenton@decvax.dec.com        ==
==	(617) 894-4508			(603) 881-0011			   ==
-----------------------------------------------------------------------------

dgh@validgh.com (David G. Hough on validgh) (06/18/91)

jbs@ibm.com says:

> I believe IEEE chopped will be easier to implement and perform
> better than IEEE rounded. 

This is true, but only marginally.  The most important cost in IEEE (or VAX)
floating-point arithmetic is getting the correct unrounded result,
or at least enough of it to permit correct rounding.  This requires
provision of guard and (for chopping or unbiased rounding) sticky bits, 
postnormalization, 
development of full doubled-precision products,
careful division and sqrt, etc.

Once you've gotten the unrounded answer, rounding (as opposed to chopping) 
costs an extra add time and dealing with the possibility of a carry out
of the most significant bit.  Although these are real design costs, I doubt
they've affected performance of modern pipelined IEEE designs compared to
the cost of getting the unrounded result.

I suspect that the reason IBM 370 arithmetic isn't correctly chopped is 
that the cost of doing it correctly - maintaining a sticky bit for
subtraction - was deemed too high, and the computed result is more accurate
in those cases where rounding is done instead of chopping.  More accurate,
but not in a useful way, however - the size of the
error bounds you're likely to derive
will have to reflect chopping, but any algorithms that might like to depend
on correct chopping principles ( (x-z) < x if z is positive) will have to
do without.

In general, any process that involves repeated inexact conversions,
particularly binary->decimal->binary->decimal, or addition of many small
inexact quantities, will benefit from the better statistics of rounding
over chopping - drifting tendencies will be reduced, and error bounds
will be smaller.
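
(A small C99 experiment along these lines, assuming the compiler honors
<fenv.h> rounding-mode changes, e.g. gcc with -frounding-math: accumulate the
same inexact quantity a million times under round to nearest and under
chopping and compare the drift.)

#include <fenv.h>
#include <stdio.h>

/* Add 0.1 (inexact in binary) to an accumulator n times.  Chopping
   discards the low bits in the same direction on every addition, so
   its drift is one-sided and larger; under round to nearest the
   individual errors tend to cancel. */
static double accumulate(int mode, int n)
{
    double s = 0.0;
    int i;
    fesetround(mode);
    for (i = 0; i < n; i++)
        s += 0.1;
    fesetround(FE_TONEAREST);
    return s;
}

int main(void)
{
    const int n = 1000000;
    printf("round to nearest: %.15f\n", accumulate(FE_TONEAREST, n));
    printf("chopped         : %.15f\n", accumulate(FE_TOWARDZERO, n));
    return 0;
}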

For instance, during the 1970's one of the minor regional stock exchanges
noticed that its market stock average was declining in a bull market in which
most of the component stocks were rising in value.   There were a number
of factors at work in this phenomenon; using chopping arithmetic was
one of them.

Using enough precision can sometimes overcome deficiencies of chopping or
dirty arithmetic, otherwise Seymour Cray would have done things differently.
But it isn't always enough, as most Cray customers find out occasionally. 
-- 

David Hough

dgh@validgh.com		uunet!validgh!dgh	na.hough@na-net.ornl.gov

dseal@armltd.co.uk (David Seal) (06/18/91)

In article <9106150144.AA14442@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM
writes:

>           David Seal (quoting someone quoting the fortran standard):
>>The missing rule is (ANSI X3.9-1978 sect 6.6):
>>    "Any numeric operation whose result is not mathematically defined is
>>    prohibited in the execution of an executable program. Examples are
>>    dividing by zero and raising a zero-valued primary to a zero-valued or
>>    negative-valued power."
>>Who knows whether IEEE infinity is a "mathematically defined result"? If you
>I would think it isn't: it doesn't correspond to a single mathematical real
>number in the way that ordinary IEEE numbers do.
>
>           Some comments:
>     1.  The result of multiplying 2 reals is always mathematically de-
>fined.  Therefore I don't see how this clause prohibits multiplies which
>will overflow or underflow.

It doesn't prohibit the multiply itself. However, the multiply is followed
by an implicit rounding operation. In the case of an overflowing multiply
(and assuming we are restricting our "mathematical" number system to be the
reals), the rounding error in this rounding operation generates a non-real:
see below for my argument on this.

IMHO this entitles an implementation to prohibit such multiplies. (Though I
would of course expect any implementation that did so to produce a run-time
"overflow" error rather than quietly ignoring it and producing rubbish
results...)

>     2.  Ordinary IEEE numbers don't really correspond to single mathema-
>tical real numbers, they correspond to intervals.  Inf and -inf are just
>the names of the intervals on the ends.
>                       James B. Shearer

I don't think it's quite as simple as this. IEEE numbers produced as results
of operations do indeed correspond to intervals. On the other hand, IEEE
numbers used as operands are treated as if they correspond to single real
numbers. The process of using a result as the source to some other operation
is essentially where rounding errors appear.

When one of the intervals at the end is produced, this "rounding error"
causes the IEEE system to treat the result (an unbounded real interval) as
something which is *not* a real number. This is where IEEE departs from the
real number system.

If you want IEEE numbers to genuinely correspond to intervals, you've got to
do interval arithmetic with them :-).

David Seal
I believe the "From" line problem here has been fixed now. In case it isn't,
my correct email address is "dseal@armltd.co.uk".

All opinions are mine only...

jbs@WATSON.IBM.COM (06/19/91)

         I said:
>I think that if a (legal) program behaves differently when compiled with
>and without optimization that then the compiler is broken.  Perhaps you
>disagree.
         Malcolm said:
      B = 5.0
      IF ( ( B .GT. 3.0 ) .OR. MYTEST(A) )
          PRINT some diagnostic messages
      LOGICAL FUNCTION MYTEST(X)
      REAL X
      MYTEST = some boolean work
      PRINT a few diagnostics on the screen
      RETURN
      END
         I modified this to:
      LOGICAL*4 MYTEST
      B = 5.0
      IF ( ( B .GT. 3.0 ) .OR. MYTEST(B) )THEN
       WRITE(6,*)' MAIN'
      ENDIF
      END
      LOGICAL FUNCTION MYTEST(X)
      REAL X
      WRITE(6,*)' MYTEST'
      MYTEST = .TRUE.
      RETURN
      END
         I checked the behavior with the IBM VS Fortran and Xlf compi-
lers.  In both cases the behavior was not changed by optimization.  The
behavior differs between the two compilers but that is a different is-
sue.  So this example will not convince me that I'm wrong.  (Actually
I later softened my original statement to say it is undesirable for
optimization to affect a program's behavior.)
                       James B. Shearer

jbs@WATSON.IBM.COM (06/19/91)

         Dik Winter said:
The origin of the discussion was a remark that interval arithmetic in software
had been observed to be 300 times as slow as standard hardware floating point,
while interval arithmetic with hardware rounding modes had been observed to be
3 times as slow.  Shearer questioned the number 3; I thought he questioned the
order of magnitude but apparently he wants to know the exact number.  Of course
that cannot be given as it is entirely machine dependent.  Also the factor
would be a function of the mix of operations performed.

         I believe the original remark referred to estimates not observa-
tions.  I questioned in passing whether 3x was a realistic estimate.  I
continue to believe it is extremely optimistic and that 10x slower would
be more reasonable.
         Dik Winter:
About the multiplication routine, JBS:
 >      1.  Handling this by a subroutine call would seem to require 4
 > floating loads to pass the input arguments and 2 floating loads to re-
 > trieve the answers.  This is already expensive.
But you have in most cases to load these 4 numbers anyhow, so why would this
be more expensive?  Why loads to retrieve the answers?  Why not just let
them sit in the FP registers?

         If you are doing your operations memory to memory then this is
correct.  However if you are keeping intermediate results in registers
as will often be the case, then you must move your 4 operands from the
registers they are in to the registers the subroutine expects them in
and you must move your 2 answers out of the return registers into the
registers you want them in (if you just let them sit they will be wiped
out by the next call).  In general it will be possible to avoid doing
some of this movement but I don't see how to avoid all of it.
         Dik Winter:
Moreover, it was never said that there is an exact factor of 3 involved;
that factor was simply observed, for a set of programs.
 >      3.  Do you know of any machine where the above code will average
 > 3x (or less) the time of a single multiply?
So this is irrelevant.

         Who observed this?  Under what conditions?
         Dik Winter:
 >           On another topic Dik Winter said:
 > But you can get reasonable results without any pivoting if the condition
 > is very good!
I should have added the context, where not only the condition of the complete
matrix is very good, but also of all its principal minors.

         This still isn't right.  Consider the matrix
                  ( e  1 )
                  ( 1  e )     with e small (but not zero).
         Dik Winter (in a later post):
I am *not* an advocate for interval arithmetic (the people at Karlsruhe are).
I do not use it.  But I object to the way Shearer handles this:
a.  Shearer asks: what is the justification for the different rounding modes.
b.  Many responses come: interval arithmetic.
c.  Shearer asks: would it not be better helped with quad arithmetic?
d.  Response:  observed speed difference a factor 3 with hardware rounding
        modes, a factor 300 in software.
e.  Shearer questions the factor 3.  Apparently he believes the factor 300
        (does he?).
Even if the factor of 3 degraded on other machines to a factor of 5 or even
10, the difference from 300 would still be striking.

I ask Shearer again: come up with an interval add assuming the base arithmetic
is round to nearest only (or even worse, with truncating arithmetic, which
you advocate in another article).

         Some comments:
         Regarding b, if interval arithmetic is the only reason for the
different rounding modes then I think they may safely be junked.
         Regarding c, what I actually said was I thought quad precision
would provide some support for interval arithmetic (not better support).
In any case upon further reflection I will withdraw this statement.
         Regarding d, as I said above I believe these were estimates.
         Regarding e, I don't really believe the 300x either and I part-
icularly don't believe the 100:1 ratio between the two estimates.
         Regarding how I would implement interval arithmetic, it is not
particularly difficult to do without the rounding modes as long as you
don't insist on maintaining the tightest possible intervals.  There is
no great loss in being a little sloppy since an extra ulp only matters
for narrow intervals, and the main problem with interval arithmetic is
that the intervals don't stay narrow even if you are careful.
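
          (One way to do what is described above, as a minimal C sketch using
only round-to-nearest arithmetic plus the standard nextafter function: compute
each bound normally, then step it outward by one unit in the last place.
Since round to nearest is off by at most half an ulp, the widened interval is
a guaranteed, though not the tightest, enclosure.)

#include <math.h>

typedef struct { double lo, hi; } interval;

/* Interval add without directed rounding: widen each computed bound
   outward by one ulp.  Valid for finite bounds; sloppy by an ulp or
   so, as discussed above. */
interval int_add_sloppy(interval a, interval b)
{
    interval c;
    c.lo = nextafter(a.lo + b.lo, -HUGE_VAL);
    c.hi = nextafter(a.hi + b.hi,  HUGE_VAL);
    return c;
}
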
                       James B. Shearer

dce@mitre.org (Dale Earl) (06/20/91)

 >
 >Henry Spencer wrote (a while back now):
 >
 >My understanding is that the funnier-looking features, in particular the
 >infinities, NaNs, and signed zeros, mostly cost essentially nothing.
 >
 >          I believe this statement is incorrect.  Infs and nans compli-
 >cate the design of the floating point unit.  Even if there is ultimately
 >no performance impact, this added design effort represents a very real
 >cost.  Perhaps some hardware designers could comment.

     With my previous employer I helped design several floating point
units, and in my experience the most difficult part of the IEEE
specification to handle was the denormalized numbers.  The
infinities, NaNs, signed zeros, and rounding modes do not make the
hardware substantially more complicated or significantly decrease
performance.  As has been pointed out in other related postings, these
features can be valuable for many numerical algorithms.  The value vs.
cost for denormalized numbers, though, is not favorable IMHO.  I feel a
representation for an infinitely small non-zero value would be useful
and much easier to implement in hardware.  But if you have to be able
to say that your device is 100% IEEE floating point compatible you get
stuck implementing the entire standard no matter what the hardware
tradeoffs.

     A related method we also employed on another device that did not
have to be 100% compliant may be of interest.  This device performed
both single precision integer and floating point operations.
Therefore we had a full 32 bit adder and multiplier available.  For
this device we made the on board register file 40 bits wide, 8 bits
for exponent and 32 bits for sign and mantissa.  Without going into
all the details, the integer operands used the 32 bit field and were
processed in a normal manner (exponent field was passed unchanged).
Floating point was performed using the full 32 bit mantissa with
rounding only on output from the chip.  This carried an extra 8 bits
of precision from intermediate operations which remained within the
chip.  The chip also could extract the exponent field into an integer
so we could emulate multiple precision in software if desired.  Not
exactly IEEE standard but probably would work well for many real
applications where better precision is more important than rigorous
numerical error bounding.   

(Please excuse this resend if you have seen it before, distribution problems.)

--

Dale C. Earl                                                dce@mitre.org
These are my opinions, only my opinions, and nothing but my opinions! DCE 

dik@cwi.nl (Dik T. Winter) (06/20/91)

In article <9106190252.AA29755@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
 >          Regarding how I would implement interval arithemtic it is not
 > particularly difficult to do without the rounding modes as long as you
 > don't insist on maintaining the tightest possible intervals.  There is
 > no great loss in being a little sloppy since an extra ulp only matters
 > for narrow intervals and the main problem with interval arithmetic is
 > that the intervals don't stay narrow even if you are careful.

This is where your opinion differs from the opinion of the Karlsruhe people.
In another posting someone asked how to do pivoting when interval arithmetic
is used.  That too misrepresents the work of the Karlsruhe people.

To reiterate what the Karlsruhe people are doing to solve a set of linear
equations:
1.	Calculate an approximate solution using your favorite solver.  No
	interval arithmetic is done in this stage!
2.	Next find an enclosing interval for the solution.
3.	Narrow the interval using iterative methods with the original matrix.
So we see that *no* interval pivoting is done.
Also, to successfully narrow the interval in stage 3 it is important that the
arithmetic provides the narrowest interval for every operation.
In many cases they succeed in finding solutions where each component of the
solution is pinned between two successive machine numbers.
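
(To make the three steps concrete in miniature, a C99 sketch for the scalar
equation a*x = b with a > 0, assuming <fenv.h> directed rounding.  The real
Karlsruhe machinery does the same thing for matrices, with an accurate dot
product for the residual; this is only the one-dimensional skeleton.)

#include <fenv.h>
#include <stdio.h>

/* Step 1: ordinary floating-point solve, no intervals.
   Steps 2/3: since b - a*x0 = a*(x_true - x0), we have
   x_true = x0 + (b - a*x0)/a, and evaluating that expression in
   interval arithmetic (operation by operation, outward rounding)
   gives an interval guaranteed to contain the true solution. */
int main(void)
{
    double a = 3.0, b = 1.0;
    double x0 = b / a;                    /* step 1: approximation    */
    double p_lo, p_hi, r_lo, r_hi, lo, hi;

    fesetround(FE_DOWNWARD);  p_lo = a * x0;      /* a*x0 rounded down   */
    fesetround(FE_UPWARD);    p_hi = a * x0;      /* a*x0 rounded up     */
    fesetround(FE_DOWNWARD);  r_lo = b - p_hi;    /* residual, low       */
    fesetround(FE_UPWARD);    r_hi = b - p_lo;    /* residual, high      */
    fesetround(FE_DOWNWARD);  lo = x0 + r_lo / a; /* a > 0: all downward */
    fesetround(FE_UPWARD);    hi = x0 + r_hi / a; /* a > 0: all upward   */
    fesetround(FE_TONEAREST);

    printf("x is contained in [%.17g, %.17g]\n", lo, hi);
    return 0;
}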

I have my doubts about it (many people have).  But this is the most sensible
approach to interval arithmetic I know of.  Most work (finding the initial
iteration) is done without any interval arithmetic.  You use it only to squeeze
out the last few bits.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

jpk@Ingres.COM (Jon Krueger) (06/20/91)

jbs@WATSON.IBM.COM writes:
>I checked the behavior [of Malcom's code sample] with the IBM VS
>Fortran and Xlf compilers.  In both cases the behavior was not changed
>by optimization ... So this example will not convince me that I'm wrong.

I see.  Therefore no standard FORTRAN compiler would be permitted
to do any differently, right?  These compilers define what is and
is not a legal optimization, is that your assertion?

-- Jon

Jon Krueger, jpk@ingres.com 

koenig@urz.unibas.ch (06/20/91)

In article <KHB.91Jun11185538@chiba.Eng.Sun.COM>, khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
> 
> In article <9106120054.AA19903@ucbvax.Berkeley.EDU> jbs@WATSON.IBM.COM writes:
> 
>    instructions into the IEEE standard and they have been implemented
>    in hardware in many machines.  Interval arithmetic still hasn't
>    gotten off the ground.  How long are we expected to wait until
>    concluding it's not going to fly.
> 
> Until vendors (other than apple and sun) provide hooks that people can
> actually use. Had such hooks been part of _any_ language standard, I
> suspect folks with such interests would have merrily coded up systems.
> 
>        3. The real problem with interval arithmetic is not that it's
>    slow but that on real applications its error estimates are gener-
>    ally so pessimistic as to be useless.
> 
> At last, something we agree on! But I confess that I have not used it
> extensively, so I am prepared to believe that it may have utility I am
> not personally cognizant of. Interval techniques will have to be
> deployed in large quantities before it can be counted out

Obviously, these discussions are about the *naive* approach to interval
arithmetic: replacing all floating-point numbers by intervals and re-
placing all floating-point operations of an algorithm by interval
operations.  In consequence, the intervals inflate enormously as the number
of operations increases, because dependent quantities are handled like
independent quantities.

A simple example:  with X being an interval, X - X is not equal to
zero (in general), but yields an interval which lies symmetrically
around zero and has a diameter twice as great as that of X.  This is
because the resulting interval must include all differences between
any two values out of X: in interval subtraction the second operand is
chosen independently of the first.
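
(In code, with the naive interval subtraction: for X = [1,2] the result is
[-1,1], an interval of diameter 2 centered on zero, exactly as described.  The
bounds here are exact, so directed rounding is left out of the sketch.)

#include <stdio.h>

typedef struct { double lo, hi; } interval;

/* Naive interval subtraction: the second operand is treated as
   independent of the first, which is what causes the inflation. */
static interval int_sub(interval a, interval b)
{
    interval c = { a.lo - b.hi, a.hi - b.lo };
    return c;
}

int main(void)
{
    interval x = { 1.0, 2.0 };
    interval d = int_sub(x, x);               /* X - X               */
    printf("[%g, %g]\n", d.lo, d.hi);         /* [-1, 1], not [0, 0] */
    return 0;
}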

Hence, interval arithmetic must be applied within special algorithms to
achieve highly accurate results.  For more than a decade, mathe-
maticians at the University of Karlsruhe (Germany) have been designing such
methods, which are based on fixed point theorems (Brouwer) and yield highly
accurate verified solutions - even in ill-conditioned cases.

Example: The Hilbert matrix of dimension 14 has the condition number 
1.85e+19. The inverse of that matrix can be computed with maximum accuracy
(all components are intervals without any floating-point number between 
the lower and the upper bound!) using the IEEE double extended format and
an implementation of the scalar product with maximum accuracy.  Furthermore
the algorithm yields not only the optimal verified enclosure of the solu-
tion, but proves its existence and its uniqueness, too.

Stefan Koenig

--------------------------------------------------------------------------

Here are some references for details.

@ARTICLE{BRUG87,
	AUTHOR = "G. Bohlender and L.B. Rall and Chr. Ullrich and J. Wolff von 
		Gudenberg ",
	TITLE     = "{PASCAL-SC: A Computer Language for Scientific 
		Computation}",
	JOURNAL = "Perspectives in computing",
	VOLUME = "17",
	YEAR       = "1987",
	NOTE	= "Orlando"}

@INPROCEEDINGS{Bohl90,
	AUTHOR = {G. Bohlender},
	TITLE     = "{What do we need beyond IEEE Arithmetic?}",
	BOOKTITLE = "{Computer Arithmetic and Self-Validating Numerical Methods}",
	PUBLISHER = "Academic Press",
	EDITOR = "{Chr. Ullrich}",
	ADDRESS = "Boston",
	YEAR       = "1990"}

@ARTICLE{FSH88,
	AUTHOR = {H.C. Fischer and G. Schuhmacher and R. Haggenm{\"u}ller},
	TITLE     = "{Evaluation of Arithmetic Expression with  Guaranteed High
		Accuracy}",
	JOURNAL = "Computing, Suppl.",
	VOLUME = "6",
	PAGES	= "149--158",
	YEAR       = "1988",
	NOTE	= "Springer, Wien"}

@BOOK{KM84,
	AUTHOR = "E. Kaucher and W.L. Miranker:",
	TITLE     = "{Self-Validating Numerics for Function Space Problems}",
	PUBLISHER = "Academic Press",
	ADDRESS = "New York",
	YEAR           = "1984"}

@BOOK{KM81,
	AUTHOR = "U. Kulisch and W.L. Miranker",
	TITLE     = "{Computer Arithmetic in Theory and Practice}",
	PUBLISHER = "Academic Press",
	ADDRESS = "New York",
	YEAR           = "1981"}

@BOOK{KM83,
	EDITOR = "U. Kulisch and W.L. Miranker",
	TITLE     = "{A New Approach to Scientific Computation}",
	PUBLISHER = "Academic Press",
	ADDRESS = "New York",
	YEAR           = "1983"}

@ARTICLE{KS88,
	AUTHOR = "U. Kulisch and H.J. Stetter",
	TITLE     = "{Scientific Computation with Automatic Result 
		Verification}",
	JOURNAL = "Computing Supplementum",
	VOLUME = "6",
	YEAR       = "1988",
	NOTE   =  "Springer, Wien"}

@INPROCEEDINGS{Kauc85,
	AUTHOR =	"E. Kaucher",
	TITLE =		"Self-Validating Methods for Special Differential-Equation-Problems",
	BOOKTITLE =	"11th IMACS World Congress on System Simulation and Scientific Computation",
	YEAR =		"1985",
	PAGES =		"115",
	ORGANIZATION =	"IMACS",
	ADDRESS =	"Oslo, Norway"	}

@INPROCEEDINGS{Rall85,
	AUTHOR =	"L.B. Rall",
	TITLE =		"Computable Bounds for Solutions of Integral Equations",
	BOOKTITLE =	"11th IMACS World Congress on System Simulation and Scientific Computation",
	YEAR =		"1985",
	PAGES =		"119",
	ORGANIZATION =	"IMACS",
	ADDRESS =	"Oslo, Norway"	}

@INPROCEEDINGS{Rump82,
	AUTHOR =	"S.M. Rump",
	TITLE =		"Solving Algebraic Problems with High Accuracy",
	BOOKTITLE =	"10th IMACS World Congress on System Simulation and Scientific Computation",
	YEAR =		"1982",
	PAGES =		"389",
	ORGANIZATION =	"IMACS",
	ADDRESS =	"Montreal, Canada"	}


@INPROCEEDINGS{Mark89,
	AUTHOR =	"S. Markov",
	TITLE =		"Least-square Approximation under Interval Data",
	BOOKTITLE =	"International Symposium on Computer Arithmetic and Self-Validating Numerical Methods",
	YEAR =		"1989",
	PAGES =		"",
	ORGANIZATION =	"IMACS-GAMM-GI",
	ADDRESS =	"Basel, Switzerland"	}

@INPROCEEDINGS{Lohn89,
	AUTHOR =	"R. Lohner",
	TITLE =		"Computing Enclosures for the Solution of Nonlinear Parameter Dependent Systems 
				and Ordinary Boundary Value Problems",
	BOOKTITLE =	"International Symposium on Computer Arithmetic and Self-Validating Numerical Methods",
	YEAR =		"1989",
	PAGES =		"",
	ORGANIZATION =	"IMACS-GAMM-GI",
	ADDRESS =	"Basel, Switzerland"	}

@INPROCEEDINGS{Corl89,
	AUTHOR =	"G.F. Corliss",
	TITLE =		"How Can You Apply Interval Techniques in an Industrial Setting?",
	BOOKTITLE =	"International Symposium on Computer Arithmetic and Self-Validating Numerical Methods",
	YEAR =		"1989",
	PAGES =		"",
	ORGANIZATION =	"IMACS-GAMM-GI",
	ADDRESS =	"Basel, Switzerland"	}

jbs@WATSON.IBM.COM (06/21/91)

          I said:
>         I checked the behavior with the IBM VS Fortran and Xlf compi-
>lers.  In both cases the behavior was not changed by optimization.  The
>behavior differs between the two compilers but that is a different is-
>sue.  So this example will not convince me that I'm wrong.  (Actually
>I later softened my original statement to say it is undesirable for
>optimization to affect a program's behavior.)
          (My original statement was a compiler which compiles a legal
program so that it behaves differently when compiled with optimization
is "broken".)
          Jon Krueger selectively quoted this as:
>I checked the behavior <of Malcom's code sample> with the IBM VS
>Fortran and Xlf compilers.  In both cases the behavior was not changed
>by optimization ... So this example will not convince me that I'm wrong.
          and commented:
:I see.  Therefore no standard FORTRAN compiler would be permitted
:to do any differently, right?  These compilers define what is and
:is not a legal optimization, is that your assertion?
          I have the following comments:
    1.  This is not my position as I believe the final sentence in my
post (which you tendentiously failed to quote) makes clear.  Do you
disagree that it is undesirable for optimization to affect a program's
behavior?
    2.  I believe a compiler can obey the Fortran standard and still be
properly called broken.  For example it can take 100 hours to compile a
trivial program.
    3.  The person who posted the code example didn't say what the exam-
ple was supposed to show.  If it was supposed to show any compiler would
produce code that behaved differently when compiled with optimization
then it is incorrect.
                       James B. Shearer

prener@watson.ibm.com (Dan Prener) (06/22/91)

In article <9106190124.AA27410@ucbvax.Berkeley.EDU>, jbs@WATSON.IBM.COM writes:
|> 
|>          I said:
|> >I think that if a (legal) program behaves differently when compiled with
|> >and without optimization that then the compiler is broken.  Perhaps you
|> >disagree.
|>          Malcolm said:
|>       B = 5.0
|>       IF ( ( B .GT. 3.0 ) .OR. MYTEST(A) )
|>           PRINT some diagnostic messages
|>       LOGICAL FUNCTION MYTEST(X)
|>       REAL X
|>       MYTEST = some boolean work
|>       PRINT a few diagnostics on the screen
|>       RETURN
|>       END
|>          I modified this to:
|>       LOGICAL*4 MYTEST
|>       B = 5.0
|>       IF ( ( B .GT. 3.0 ) .OR. MYTEST(B) )THEN
|>        WRITE(6,*)' MAIN'
|>       ENDIF
|>       END
|>       LOGICAL FUNCTION MYTEST(X)
|>       REAL X
|>       WRITE(6,*)' MYTEST'
|>       MYTEST = .TRUE.
|>       RETURN
|>       END
|>          I checked the behavior with the IBM VS Fortran and Xlf compi-
|> lers.  In both cases the behavior was not changed by optimization.  The
|> behavior differs between the two compilers but that is a different is-
|> sue.  So this example will not convince me that I'm wrong.  (Actually
|> I later softened my original statement to say it is undesirable for
|> optimization to affect a program's behavior.)
|>                        James B. Shearer

As immodest as it might be for me to agree with my colleague, I must
agree with James Shearer.  Indeed, the spirit of FORTRAN is, overwhelmingly,
that a program whose behavior is changed by optimization is not a legal
program.  In particular, I quote from the FORTRAN '77 standard

   It is not necessary for a processor to evaluate all of the operands
   of an expression if the value of the expression can be determined
   otherwise.  This principle is most often applicable to logical
   expressions, but it applies to all expressions.  For example, in
   evaluating the logical expression    X  .GT.  Y  .OR.  L(Z)
   where X, Y, and Z are real, and L is a logical function, the
   function reference L(Z) need not be evaluated if X is greater than
   Y.

to show that the compiler may do it either way.  And I cite another
passage from the same standard

   For example the statements

          A(I) = F(I)
          Y = G(X) + X

   are prohibited if the reference to F defines I or the reference to G
   defines X.

which suggests that programs that admit more than one correct interpretation
are illegal.  

I would say that, since the program(s) offered above are, according to
the language standard, non-deterministic, they are illegal.

Note that optimization has nothing to do with this.  According to the
standard, a compiler could correctly produce a program which would call
the function MYTEST or not depending upon the time of day at which it
was run.
-- 
                                   Dan Prener (prener @ watson.ibm.com)

preston@ariel.rice.edu (Preston Briggs) (06/23/91)

prener@prener.watson.ibm.com (Dan Prener) writes:

>Indeed, the spirit of FORTRAN is, overwhelmingly, that a program 
>whose behavior is changed by optimization is not a legal program.

A wonderfully concise, quotable statement.
So much nicer than Shearer's assertion that the optimizer must be broken.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/24/91)

In article <1991Jun22.003029.16748@watson.ibm.com>,
prener@watson.ibm.com (Dan Prener) writes:
> As immodest as it might be for me to agree with my colleague, I must
> agree with James Shearer.  Indeed, the spirit of FORTRAN is, overwhelmingly,
> that a program whose behavior is changed by optimization is not a legal
> program.

In that case, a subroutine like	
	SUBROUTINE FOO(X, A, B, C)
	    REAL X, A, B, C
	    X = A*B + A*C
	END
must not be legal.  Darn it, I used to _like_ Fortran.  (Yes, optimisation
_may_ change the behaviour of this program in a significant way.)  Perhaps
more obviously,
	SUBROUTINE FOO(X, A, B, C)
	    REAL X, A, B, C
	    X = A*B
	    X = X + A*C
	END
may _also_ have its behaviour changed by certain optimisations.  (Hint:
Sun's compilers have a command-line flag related to this one.)  The use
of parentheses won't help here.

The moral of the story is that the values and operations provided by
floating-point hardware _approximate_ the values and operations of
real arithmetic, but do not obey the same laws, so that a language which
licences transformations which are valid for the reals but invalid for
the floats is a language which licences a compiler to produce different
numeric results (possibly HUGELY different) depending on optimisation
level.
-- 
I agree with Jim Giles about many of the deficiencies of present UNIX.

jpk@Ingres.COM (Jon Krueger) (06/25/91)

jbs@WATSON.IBM.COM (James B. Shearer) writes:
> Do you disagree that it is undesirable for optimization to affect
> a program's behavior?

Yes.

It is a good and joyful thing, always and everywhere, for
compilers to expose broken programs.

Correct programs can be written in available languages.
The meaning of such programs does not depend on whether
compilers apply permissible optimizations.

-- Jon

Jon Krueger, jpk@ingres.com 

jbs@WATSON.IBM.COM (06/26/91)

         Dan Prener said:
>I would say that, since the program(s) offered above are, according to
>the language standard, non-deterministic, then they are illegal.
         But any program with floating point arithmetic in it is,
according to the language standard, non-deterministic (since the lang-
uage standard does not specify the floating point model).  Surely you
don't believe all such programs are illegal.
         Preston Briggs quotes Dan Prener:
:>Indeed, the spirit of FORTRAN is, overwhelmingly, that a program
:>whose behavior is changed by optimization is not a legal program.
         and adds:
>A wonderfully concise, quotable statement.
>So much nicer than Shearer's assertion that the optimizer must be broken.
         Isn't it however logically equivalent to my statement:
>I think that if a (legal) program behaves differently when compiled with
>and without optimization that then the compiler is broken.
         I asked
> Do you disagree that it is undesirable for optimization to affect
> a program's behavior?
         Jon Krueger replied:
>Yes.
>It is a good and joyful thing, always and everywhere, for
>compilers to expose broken programs.
>Correct programs can be written in available languages.
>The meaning of such programs does not depend on whether
>compilers apply permissible optimizations.
         My comments:
    1.  I would not buy a compiler from someone who enjoys breaking
user code.
    2.  As I pointed out above the behavior of any Fortran program
with floating point operations is not specified by the standard.  If
you believe optimizers may make any transformation permitted by the
standard then the meaning of any program with floating point opera-
tions may depend on the optimization level.  Hence the above quoted
statement would imply all such programs are not correct.
         In general I am disturbed by the attitude that just because
the fortran standard permits some atrocity (such as evaluating a*x+
a*y as a*(x+y)) it is improper to criticize a compiler for doing it.
In my view the standard represents a least common denominator set of
minimal requirements and any quality compiler will obey a stronger
set of requirements.  In this regard, I am not a language lawyer but
it is my understanding that since the standard says nothing about the
accuracy of intrinsic functions a compiler could replace sin(x) with
0 and cos(x) with 1 and be standard compliant.  Is this the case?
                    James B. Shearer

states@artemis.nlm.nih.gov (David States) (06/26/91)

In article <9106252305.AA27299@ucbvax.Berkeley.EDU>, jbs@WATSON.IBM.COM writes:

|>          In general I am disturbed by the attitude that just because
|> the fortran standard permits some atrocity (such as evaluating a*x+
|> a*y as a*(x+y)) it is improper to criticize a compiler for doing it.
|> In my view the standard represents a least common denominator set of
|> minimal requirements and any quality compiler will obey a stronger
|> set of requirements.  

A compiler can optimize for many things (speed, space, I/O, etc.), and
the use of feedback in optimization through profiling is established if 
not necessarily widespread, so why not allow a compiler to optimize for 
precision?  Most readers of this group have undoubtedly seen their share
of sloppy/ill-considered/hasty code.  Wouldn't a little help from the 
compiler be welcome?

Issues:
	What is the appropriate measure?  average error, worst case
	error, worst observed error, some combination of the above?

	Is the simple rearrangement of arithmetic sufficient, or
	would the compiler need to "know" trig identities and the
	properties of transcendental functions to be effective?

	Is it OK to break a program which depends on imprecision/error
	to run as expected (correctly???)?

	How effective could this be without profiling and feedback?
	What would you need to save in profiling if you used it?
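
One concrete example of a precision-oriented rewriting (a sketch of my
own, not something proposed above) is compensated summation, which
carries a running correction term to recover the low-order bits that a
naive running sum discards:

      SUBROUTINE KSUM ( X , N , S )
C Compensated (Kahan) summation of X(1)..X(N) into S.
C C holds the running rounding error of the partial sums.
      INTEGER N, I
      DOUBLE PRECISION X ( N ) , S , C , Y , T
      S = 0.0D0
      C = 0.0D0
      DO 10 I = 1 , N
         Y = X ( I ) - C
         T = S + Y
         C = ( T - S ) - Y
         S = T
   10 CONTINUE
      RETURN
      END

Note the irony: a compiler licensed to apply real-arithmetic identities
would simplify ( T - S ) - Y to zero and destroy the correction, which
is exactly the sort of transformation being argued about in this thread.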

|>                     James B. Shearer

David States

slackey@bbn.com (Stan Lackey) (06/27/91)

In article <1991Jun24.211407.17389@ingres.Ingres.COM> jpk@Ingres.COM (Jon Krueger) writes:
>jbs@WATSON.IBM.COM (James B. Shearer) writes:
>> Do you disagree that it is undesirable for optimization to affect
>> a program's behavior?
>Yes.
>It is a good and joyful thing, always and everywhere, for
>compilers to expose broken programs.

Many vector machines do summations in an order different from program
order; thus optimization on vs. off can and does produce different
answers.  The answers can be very different depending upon the actual numbers.
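
A small sketch of why the order matters (mine, not Stan's; it assumes
IEEE double precision): adding ten million copies of 1.0D-16 one at a
time to a sum that already holds 1.0 loses every one of them to
rounding, while accumulating them first and adding 1.0 last does not.

      PROGRAM SUMORD
C Summation-order example.  S1 adds the small terms one at a
C time to a running sum that starts at 1.0; each addition is
C rounded away.  S2 accumulates the small terms first.
      DOUBLE PRECISION S1, S2, SMALL
      INTEGER I, N
      N = 10000000
      SMALL = 1.0D-16
      S1 = 1.0D0
      DO 10 I = 1, N
         S1 = S1 + SMALL
   10 CONTINUE
      S2 = 0.0D0
      DO 20 I = 1, N
         S2 = S2 + SMALL
   20 CONTINUE
      S2 = S2 + 1.0D0
      WRITE (6,*) S1, S2
      END

Under those assumptions S1 prints as exactly 1.0 and S2 as roughly
1.000000001, which is the kind of difference a vectorizing compiler can
introduce by reordering a reduction.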

-Stan

jgreen@Alliant.COM (John C Green Jr) (06/28/91)

This discussion involves non `standard conforming' programs which get different
answers with different compilers or with different optimization options on the
same compiler.

My favorite program of this type in genuine upper case FORTRAN IV was
constructed by Stanley Rabinowitz in 1969. He got paid $50 by the magazine
"Software Age" which published it in their column "X-Tran's Adventures In
Troubletran".

      LOGICAL FOO
      I = 0
      IF ( I .EQ. 0 .OR. FOO ( I ) ) I = I + 2
      WRITE ( 6 , 99 ) I
      STOP
   99 FORMAT ( 1X , 1I1 )
      END

      LOGICAL FUNCTION FOO ( J )
      J = J + 1
      FOO = .FALSE.
      RETURN
      END

This was run on an IBM System/360 Model 50 under three of the world's most
popular compilers:

    Fortran G   Answer: 1
    Fortran H   Answer: 2
    Watfor      Answer: 3

Watfor checked to see if I was 0 then called the function.
Fortran H checked to see if I was 0 and then did not call the function.
Fortran G called the function and then checked to see if I was 0.

I found it amazing that such a small program can get three different answers
with no compiler diagnostics and no runtime diagnostics on three of the most
popular compilers on the same machine!

The paragraph in the Fortran '66 standard says approximately:

    If a subroutine subprogram or a function subprogram changes any of its
    arguments or any element in COMMON then that argument or element of COMMON
    cannot appear anywhere else in any statement that calls the subroutine or
    invokes the function.

In line 3 of the program the invocation of the function FOO changes I and I
appears no less than 3 other times in this statement.

The paragraph in the Fortran '77 standard section 6.6, `Evaluation of
Expressions' says in part:

    The execution of a function reference in a statement may not alter the
    value of any other entity within the statement in which the function
    reference appears. The execution of a function reference in a statement may
    not alter the value of any entity in common that affects the value of any
    other function reference in that statement. However, execution of a
    function reference in the expression `e' of a logical IF statement is
    permitted to affect entities in the statement `st' that is executed when
    the value of the expression `e' is true.

In line 3 of the program the invocation of the function FOO changes I and I
appears in the expression `e' of the logical IF statement. In Fortran '77 there
is no problem with the I = I + 2.

jpk@Ingres.COM (Jon Krueger) (06/28/91)

>    1.  I would not buy a compiler from someone who enjoys breaking
>user code.

Neither would I.

But a compiler which exposes broken user code is a different animal.

-- Jon

Jon Krueger, jpk@ingres.com 

jbs@WATSON.IBM.COM (06/29/91)

         I said:
:>    1.  I would not buy a compiler from someone who enjoys breaking
:>user code.
         Jon Krueger replied:
>Neither would I.
>But a compiler which exposes broken user code is a different animal.
         User code which complies with a variant or extension of the
Fortran 77 (or any other) standard may not be portable but it is not
"broken".
         The DO, ENDDO construct is not in the Fortran 77 standard.
Nevertheless many Fortran 77 compilers recognize it and many users use
it.  I would not advise omitting it from your compiler in order to
"expose" "broken" user code.
                          James B. Shearer

jpk@Ingres.COM (Jon Krueger) (06/29/91)

I wrote:
>>It is a good thing for compilers to expose broken programs.

Stan wrote:
>Many vector machines do summations in an order different from program
>order; thus optimization on vs. off can and does produce different
>answers.  The answers can be very different depending upon the actual numbers.

Not what I was referring to.  Such compilers are broken.

The meaning of some programs depends on the ordering of some of
their steps.  To change the ordering is to change the meaning --
an impermissible optimization.

A program in a sequential language may reasonably depend on the
sequential ordering of its steps.  Such programs are not broken.

A program in a language where order of expression evaluation is
undefined may not reasonably depend on left-to-right evaluation.
Such programs are broken.

-- Jon

Jon Krueger, jpk@ingres.com 

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (06/29/91)

>>>>> On 28 Jun 91 23:13:12 GMT, jbs@WATSON.IBM.COM said:

jbs>          The DO, ENDDO construct is not in the Fortran 77 standard.
jbs> Nevertheless many Fortran 77 compilers recognize it and many users use
jbs> it.  I would not advise omitting it from your compiler in order to
jbs> "expose" "broken" user code.

The word "broken" is clearly being used in this context to refer to
code whose behavior is undetermined under the standard --- therefore
*logically* "broken".  The examples which have been the topic of
discussion here depend on either subtle issues of floating-point code
re-arrangement or (in the other thread) on a particular
implementation's treatment of functions with side effects.

The use of extensions is an entirely different issue.  The resulting
codes are *syntactically* incorrect FORTRAN-77 --- their *logical*
interpretation under a *superset* of the FORTRAN language is
independent of the fact that extensions are used.

I certainly appreciate any efforts that a compiler makes to tell me my
code is either broken or non-portable.  Most major vendors provide
compiler switches to flag ANSI non-compliance.  Very few provide
robust and comprehensive extensions to detect common *logical* errors,
such as:
	-- use of uninitialized variables
	-- argument mismatch in calling sequences (see the sketch below)
	-- use of undeclared variables (misspellings)
	-- out-of-bounds array references in called routines
		(often a result of an argument mismatch)
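
As a sketch of the argument-mismatch case (my example, not McCalpin's):
the two program units below compile separately with no diagnostic on a
typical FORTRAN-77 system, because the compiler never sees the call and
the subprogram together, yet at run time the REAL bit pattern is
reinterpreted as an INTEGER.

      PROGRAM CALLER
C The actual argument is REAL ...
      REAL X
      X = 1.0
      CALL SHOW ( X )
      END

      SUBROUTINE SHOW ( N )
C ... but the dummy argument is INTEGER, so the bit pattern of
C 1.0 is printed as an integer (1065353216 on an IEEE
C single-precision machine) rather than 1.
      INTEGER N
      WRITE (6,*) N
      END
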
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

chuck@pluto.Harris-ATD.com (Chuck Musciano) (06/30/91)

In article <1991Jun27.215747.5329@ingres.Ingres.COM>, jpk@Ingres.COM (Jon Krueger) writes:
> >    1.  I would not buy a compiler from someone who enjoys breaking
> >user code.
> 
> Neither would I.
> 
> But a compiler which exposes broken user code is a different animal.

     Oh, people who can break user code make the very best compiler
writers.  As every compiler writer knows (and I'm proud to call myself
one), the easy cases are the obvious cases.  The real work in a compiler
is in the borderline stuff, the strange combinations of language features,
the weird usages.  A person who is always thinking of the bizarre
situations that can break user code is just the person to work on a
compiler, because they are thinking of those situations and making sure
the compiler can handle them.

     I can recall cases where we were designing a multiprocessor version
of Pascal, and someone wanted list processing primitives in the language.
I opposed them, favoring instead more atomic primitives that could be used
to construct list primitives.  Only by coming up with strange and unusual,
yet perfectly valid, usages could I convince the others that the atomic
way was the way to go.

-- 

Chuck Musciano				ARPA  : chuck@trantor.harris-atd.com
Harris Corporation 			Usenet: ...!uunet!x102a!trantor!chuck
PO Box 37, MS 3A/1912			AT&T  : (407) 727-6131
Melbourne, FL 32902			FAX   : (407) 729-3363

A good newspaper is never good enough,
	but a lousy newspaper is a joy forever.		-- Garrison Keillor