[comp.arch] Compaq finds problem with chip

rcd@ico.isc.com (Dick Dunn) (11/02/89)

mslater@cup.portal.com (Michael Z Slater) writes:
> The bug 
> Compaq referred to was in the FPU on the 486.  It occurs only under very
> specific and rare conditions...  An indication of
> how obscure the bug is is that the 486 passes all of Intel's test suites
> for the 387.

It might be obscure, but that's not a convincing argument.  How extensive
are the test suites for the 387? <<enter tired-aphorism mode>>  Testing can
only reveal the presence of bugs, not their absence.  <<exit tedium>>

A better measure of the problem is what it takes to avoid triggering it.
What's the cost in software to dodge the problem?  Actually, the 486 is
early enough in its lifetime that hopefully the bug can be banished by
replacing chips instead of hacking around it.

The multiply bug in the 386 is an instructive example here...it was a
relatively rare problem, and went a fair while without being known, yet the
effect was such that you couldn't reasonably use the bugged processors as
32-bit CPUs.
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Worst-case analysis must never begin with "No one would ever want..."

mslater@cup.portal.com (Michael Z Slater) (11/04/89)

Dick Dunn writes:
>It might be obscure, but that's not a convincing argument.  How extensive
>are the test suites for the 387? <<enter tired-aphorism mode>>  Testing can
>only reveal the presence of bugs, not their absence.  <<exit tedium>>
>
>A better measure of the problem is what it takes to avoid triggering it.
>What's the cost in software to dodge the problem?  Actually, the 486 is
>early enough in its lifetime that hopefully the bug can be banished by
>replacing chips instead of hacking around it.

Hacking around it isn't an option, since the 486 is intended to be used
to run all those billions of bytes of 386/387 software.  The bugs wouldn't
be hard to work around if changing the compilers were an option, but it
isn't.

There are two separate problems:

1. If the FPTAN or FSINCOS instruction is executed in particular rounding
modes, and the result causes a carry out from the mantissa (which occurs if
the result is an integer power of two), AND another FP instruction occurs
within a certain number of clock cycles, AND that instruction is one of a
certain group, then the floating-point stack gets corrupted.

2. If an unmasked FP exception occurs, and then a WAIT or FWAIT instruction
is encountered that is followed by one of a certain group of instructions,
then the error is not properly handled.
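
To make the carry-out condition in problem 1 concrete, here is a minimal
Python sketch of rounding a binary significand. It is a generic model of
round-to-nearest, not a description of the 486's internals: the function
name round_significand, the 8-bit width and the 4 guard bits are all
assumptions chosen for readability (the x87 extended format actually keeps
a 64-bit significand), and the post does not say which rounding modes
trigger the bug. The point is only that when the kept bits are all ones and
rounding goes up, the carry leaves the top bit and the renormalized result
is 1000...0, i.e. an exact power of two.

```python
def round_significand(sig, width, extra):
    """Round a normalized (width + extra)-bit integer significand to
    `width` bits using round-to-nearest, ties-to-even.

    Returns (significand, exponent_adjust). Hypothetical helper for
    illustration only; requires extra >= 1.
    """
    kept = sig >> extra                      # bits that survive rounding
    dropped = sig & ((1 << extra) - 1)       # guard bits being discarded
    half = 1 << (extra - 1)                  # exactly half a unit in the last place

    # Round up if the discarded part exceeds half, or equals half and
    # the kept value is odd (ties-to-even).
    if dropped > half or (dropped == half and (kept & 1)):
        kept += 1

    if kept == (1 << width):                 # carry out of the top mantissa bit
        return kept >> 1, 1                  # renormalize: 1000...0, exponent + 1
    return kept, 0


# All ones in the kept bits, guard bits above half: the carry ripples out.
carry_case = round_significand(0b11111111_1001, width=8, extra=4)

# An ordinary value: rounding down, no carry, exponent unchanged.
plain_case = round_significand(0b10110011_0010, width=8, extra=4)

print(carry_case, plain_case)
```

In the first call the result comes back as the significand 10000000 with the
exponent bumped by one, which is the "integer power of two" case described
above; the second call shows the far more common case where no carry occurs.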

The first problem will obviously never occur if the FPTAN and FSINCOS
instructions aren't used, and even if they are, the data patterns that
are required are statistically unlikely, though of course it would be
easy to contrive a program that would generate the error repeatedly.

The second problem apparently isn't an issue for most DOS applications,
because they don't unmask FP exceptions.  I don't know if Unix apps do.

Michael Slater, Microprocessor Report    mslater@cup.portal.com
415/494-2677   fax: 415/494-3718
550 California Ave., Suite 320, Palo Alto, CA 94306