[comp.arch] Success of IEEE Floating Point Standard

sjl@amdahl.UUCP (Steve Langdon) (01/23/87)

In article <11939@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) provides (as
usual) sensible comments on this topic.  However, he did not cover all of
the reasons for opposition to the IEEE Floating Point Standard.  As an
active participant in ANSI and ISO I clearly believe that standards are
valuable.  However, those opposed to IEEE 754 had some legitimate arguments.

My only source of information is conversations with other members
of the system architecture group I worked for at Control Data.  My responsibility
at Control Data, and now at Amdahl, is network architecture.  However, others
in the group had to worry about *very* fast floating point.  While they saw
the appeal of many features of the IEEE standard, they found it very difficult
to support in a high-end machine.  I am not qualified to provide a detailed
explanation, but the major problems appear to be the various rounding modes
and gradual underflow.

IEEE floating point has certainly captured the low end; it will be interesting
to see what happens at the high end.

-- 
Steve Langdon  ...!{decwrl,sun,hplabs,ihnp4,cbosgd}!amdahl!sjl  +1 408 746 6970

[I speak for myself not others.]

root@ucsfcca.UUCP (01/26/87)

In article <5307@amdahl.UUCP>, sjl@amdahl.UUCP (Steve Langdon) writes:
> 
> However, others
> in the group had to worry about *very* fast floating point.  While they saw
> the appeal of many features of the IEEE standard, they found it very
> difficult to support in a high-end machine.

Scenario: Someone (mis)designs a jetliner using `very fast' floating
point because it does not correctly handle an arithmetic condition
which would be correctly handled by an IEEE conforming system.
The liner crashes because of the defect and 300 people are killed.

Consequences: The computer company is bankrupted by the judgments
against it and the engineers who ignored existing standards of
professional practice never work again (even after they get out
of jail).

Thos Sumner       (thos@cca.ucsf.edu)
(The I.G.)        (...ucbvax!ucsfcgl!cca.ucsf!thos)

When ideas fail, words come in very handy. -- Goethe (via L. Creighton)
When ideas fail, computations come in very handy. -- Sumner (via USENET)

#include <disclaimer.std>

sjl@amdahl.UUCP (02/07/87)

In article <825@cca.ucsf.edu> root@cca.ucsf.edu (Computer Center)
[actually (thos@cca.ucsf.edu) (Thos Sumner)] replies to my posting which
indicated that my former colleagues at Control Data believed IEEE 754
was difficult to implement in a high end machine.

Thos speculated that design flaws caused by incorrect results from a CAD
program resulted in a catastrophic accident.  The CAD program relied on
the properties of IEEE 754, but produced incorrect results because it
was run on a computer with a different floating point format.  The
computer manufacturer and designers were held responsible and punished.

This is a nice piece of melodrama, but it is about as sensible as the
construction company in Florida that sued Lotus because 1-2-3 failed to
guess which numbers to add in a spreadsheet.  I will leave it to readers
to find the holes in the scenario.  The interesting issue behind the
fiction is that in the past, bad numerical techniques made the results
of many important programs suspect at best.  I can remember a number of
horror stories about nuclear reactor design codes that gave
significantly different answers when moved from a 6600 to a 7600.  My
memory tells me that the least significant bit (these machines used a 60
bit word) of a result was sometimes different on the 7600 from the same
operation on a 6600.

I am unqualified to judge the merits of IEEE 754, but the skill of the
designers makes me confident that it is the best format available.  A
good floating point format will not, however, compensate for badly
chosen algorithms.  IEEE 754 cannot prevent bad programs from giving
the wrong answer, although it will help an informed user to detect many
problems.

I received a couple of good replies to my posting by mail which made me
conclude that I had not clearly explained "*very* fast floating point".
This phrase was intended to imply supercomputer performance.  One reply
mentioned that ELXSI offers IEEE 754 compatibility, but the per
processor performance of the ELXSI machine is well below supercomputer
speeds.  I hesitate to attempt to define a supercomputer, but my
comments were based on the design of machines with 80+ MFLOP peak
rates.

I do not need to be convinced that standards are valuable - the time I
spend at ANSI and ISO meetings is based on this belief.  Standards are
not, however, without their limitations.  One of the important clauses
in an ISO standard is the "Scope and Field of Application".  My comments
on IEEE 754 were intended to remind readers that failure to conform
might be based on technical reasons, rather than a desire to deny users
the advantages offered by the standard.
-- 
Steve Langdon  ...!{decwrl,sun,hplabs,ihnp4,cbosgd}!amdahl!sjl  +1 408 746 6970

[I speak for myself not others.]

oster@lapis.berkeley.edu.UUCP (02/07/87)

In article <5592@amdahl.UUCP> sjl@amdahl.UUCP (Steve Langdon) writes:
>[actually (thos@cca.ucsf.edu) (Thos Sumner)] replies to my posting which
>indicated that my former colleagues at Control Data believed IEEE 754
>was difficult to implement in a high end machine.
>
>Thos speculated that design flaws caused by incorrect results from a CAD
>program resulted in a catastrophic accident.  The CAD program relied on
>the properties of IEEE 754, but produced incorrect results because it
>was run on a computer with a different floating point format.  The
>computer manufacturer and designers were held responsible and punished.
>
>This is a nice piece of melodrama, but it is about as sensible as the
>construction company in Florida that sued Lotus because 1-2-3 failed to
>guess which numbers to add in a spreadsheet.  I will leave it to readers
>to find the holes in the scenario.

Whether the engineers responsible go to jail or not, whether building a system
that gives wrong answers is criminal or not, it is WRONG to build a
system that gives wrong answers. It is just plain morally wrong. They are
making a marketing decision that means that innocent people will die as a
direct result of their actions.

A system that gets the wrong answer fast is not what I need. In fact, it
has negative worth, because it lulls me into believing I know something
that in fact I do not know.

The Lotus 1-2-3 case involved an architect who told his spreadsheet to sum
only rows 1-10 and then tried to sue the manufacturer because the
spreadsheet didn't ignore his express orders and sum rows 1-11 (what he
meant, not what he said).  A very different issue.

Now, I have no objection to building a _component_ that produces
inaccurate answers, and I would not force people to throw away their
rulers and use only vernier calipers.  Sometimes you don't need a lot of
accuracy.  But if you are going to use that component in a system, you, as
the builder of the system, have a duty to see that the system doesn't
pretend to more accuracy than it has.  You must be aware of the error
propagation behavior of all your algorithms, both on average and on
extreme data, and you must put in checks and compensate.  Maybe not in a
video game, but certainly in a serious tool, like a flight simulator, on
which people's lives may well depend.  (Pilots expect the real plane to
behave like the simulator, after all.)

The trouble is, many people who make purchasing decisions actually
compare machines based on MFLOPS.  If one machine runs IEEE standard
arithmetic and the other does not, then it is not a fair comparison, if
only in the human cost of writing software to compensate for the
inaccuracy of the non-IEEE flavor.  Like the word "mayonnaise",
"floating point" should be a term reserved for things that meet the
standard.  If your machine does something else, you should be required
to call it "imitation floating point" in your marketing brochures.

--- David Phillip Oster		-- "The goal of Computer Science is to
Arpa: oster@lapis.berkeley.edu  -- build something that will last at
Uucp: ucbvax!ucblapis!oster     -- least until we've finished building it."

rcd@ico.UUCP (02/09/87)

The discussion revolves around the difficulty of supporting IEEE 754 in a
very fast machine...an objection:

> Scenario: Someone (mis)designs a jetliner using `very fast' floating
> point because it does not correctly handle an arithmetic condition
> which would be correctly handled by an IEEE conforming system.
> The liner crashes because of the defect and 300 people are killed.

Possible?  Sure, but the scenario requires several assumptions which aren't
necessarily so in the real world:
	- The `very fast' non-754 FP has to be misdesigned in some
	  significant way.  (This is pretty likely if the "fast" machines
	  I've seen in the past are any indication! :-)
	- The programmer has to be unaware of the fact that he's failing
	  to catch a potential error.
	- The condition has to be one which is caught by a 754-conforming
	  system.  The detection of the condition must either always occur
	  or be pretty difficult to disable.
What I'm really saying is that 754 isn't perfect, let alone foolproof (!),
and non-754 is not inherently flawed.

Slight digression:

I spoke to someone who's taking a numerical analysis course, doing some of
the standard see-how-the-errors-work problems.  One of the exercises
involves determining "machine epsilon".  Running this on a machine with a
processor like a 287, which as far as I know is supposed to meet the 754
standard, the answer you get depends on how the compiler works, since the
FPU has 80-bit internal registers but values are stored in memory in the
64-bit format.  (The extended format carries 11 more bits of significand
precision, not a full 16, since some of the extra bits go to a wider
exponent, but it's certainly enough to notice.)

Mostly I avoid numerical work, but I've been told that there are various
commonly-used programs which determine machine precision and then use it to
control other aspects of the program.  Suppose that these programs are run
on a machine with 754 FP which manages to do the epsilon computation (which
is pretty simple) in registers, but ends up dumping a lot of the real
computation results into memory, losing precision in the process.  I'm
guessing that if this caused the program to fail without notice and it
ended up causing a liability case, it would be about a four-way tossup as
to who was at fault.
-- 
Dick Dunn    {hao,nbires,cbosgd}!ico!rcd	(303)449-2870
   ...If you plant ice, you're gonna harvest wind.

turk@apple.UUCP (02/11/87)

In previous articles:
>> Scenario: Someone (mis)designs a jetliner using `very fast' floating
>> point because it does not correctly handle an arithmetic condition
>> which would be correctly handled by an IEEE conforming system.
>> The liner crashes because of the defect and 300 people are killed.

The blame should not be laid solely on the floating-point implementation.
The engineer who did the analysis should also have done a numerical
analysis to determine the correctness of his solution.

Not even IEEE floating point will correctly compute the solution to
an ill-conditioned problem or algorithm.
-- 
Ken Turkowski @ Apple Computer, Inc., Cupertino, CA
UUCP: {sun,nsc}!apple!turk
CSNET: turk@Apple.CSNET
ARPA: turk%Apple@csnet-relay.ARPA