[comp.sys.ibm.pc] 80386 versus 80387

felsenst@uw-entropy.ms.washington.edu (Joe Felsenstein) (05/12/89)

I have heard that there is a sporadic nasty interaction between 80386's
and 80387 numeric co-processors, owing to a design flaw in the 80386,
and that it is not cleared up yet.

On a Zenith Z-386 I may be having this problem; when the 80387 is used
there are sporadic unpredictable crashes or operating system
paralysis (I happen to be using Unix, but if this is the problem that
may be irrelevant).

Question: are there 80287's fast enough to work with a 16 MHz 80386
(I think the answer is yes)?  If so, would they suffer the same
nasty interaction (i.e. can I cure this by replacing the 80387 with an
80287, or is that just jumping from the frying pan into the fire)?
--------
Joe Felsenstein, Dept. of Genetics SK-50, Univ. of Washington, Seattle WA 98195
 BITNET:    FELSENST@UWALOCKE
 INTERNET:  uw-evolution!joe@entropy.ms.washington.edu
 UUCP:      ... uw-beaver!uw-entropy!uw-evolution!joe 

caromero@phoenix.Princeton.EDU (C. Antonio Romero) (05/12/89)

In article <1427@uw-entropy.ms.washington.edu> joe%uw-evolution@entropy.ms.washington.edu writes:
>I have heard that there is a sporadic nasty interaction between 80386's
>and 80387 numeric co-processors, owing to a design flaw in the 80386,
>and that it is not cleared up yet....
> (I happen to be using Unix, but if this is the problem that
>may be irrelevant).

I'm not 100% sure that the problem you're having is the one I'm thinking
of, but as I recall there was a floating-point problem that plagued
early 386 machines, which did show up when using unix.  I believe
operating system paralysis under Unix as you describe was the symptom.

To my knowledge this has been corrected as of a couple of years ago.
If you've had your machine that long, you may just need a new 386
chip.  The new chip has a double-sigma stenciled on it somewhere; the
old chip has, I think, just one sigma, or no markings at all.

(I had to check my father's machine out under Xenix to make sure it had
the newer chip-- he got one of the relatively early Compaq 386's,
before 20MHz etc. was ready.  It did have a new chip, so I never saw the
problem actually occur; but the symptoms as described to me sound like
what you're running into.)

Who knows?  Maybe you can badger Zenith into sending you a new 386 chip,
if you're nasty-- err, insistent enough about it... ;)

Also, a 287 won't just drop in unless the Zenith is designed to handle
one... (having never popped the top on a Zenith I don't know if it is).

>Question: are there 80287's fast enough to work with a 16 MHz 80386
>(I think the answer is yes)?

Many machines early on (and even now, I think) can accommodate a 287.
I believe the faster 287 part exists, but I don't think using it will
solve your problem.

-Antonio Romero      romero@confidence.princeton.edu

walter@cat27.CS.WISC.EDU (Walter Stewart) (05/13/89)

    I've been using an 80287 on a Z-386 and I have never experienced
the hardware troubles that you have described.  I'm using the 80287
because my Pascal compiler doesn't support the 80387.  Will Turbo
Pascal 5.0 support the 80287?

mcdonald@uxe.cso.uiuc.edu (05/14/89)

>I have heard that there is a sporadic nasty interaction between 80386's
>and 80387 numeric co-processors, owing to a design flaw in the 80386,
>and that it is not cleared up yet.

This appears to be the straight poop on this - I got a FAX of a
genuine Intel erratum sheet from the kind folks at Phar Lap or
MicroWay, I forget which.

There is a bug in the silicon of early 80386's that causes a total
machine hang at random times if the following things all are true:

1) You have an 80386 and an 80387.

2) The 80386 is earlier than DX step (DX step 386's say "DX" right
   on top). The 386's with the "double sigma" ARE sick.

3) You run in either 386 native 32-bit mode or virtual 8086 mode.

4) Paging is enabled. Note that Windows 386 and Desqview 386 meet
   conditions 3 and 4, as do most programs especially written for native
   386 mode using a 386 runtime system on top of DOS.

5) Your motherboard is of an afflicted design. Certain boards seem
   to have timings that prevent the problem from occurring. IBM
   Model 80's are afflicted.

6) The phase of the moon is wrong. This is very important. It
   really doesn't happen often.
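
The checklist above can be restated as a small predicate. This is only a
sketch of conditions 1 through 5 as posted here, not of the wording on
Intel's erratum sheet; the Machine class, its field names, and the mode
strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # All field names are hypothetical; they mirror the numbered list.
    has_80387: bool        # condition 1: 80386 paired with an 80387
    dx_step_386: bool      # condition 2: True if the 386 is a DX-step part
    cpu_mode: str          # condition 3: "real", "protected32", or "v86"
    paging_enabled: bool   # condition 4
    board_afflicted: bool  # condition 5: board timing exposes the bug


def hang_possible(m: Machine) -> bool:
    """Return True only when conditions 1-5 from the list all hold."""
    return (
        m.has_80387
        and not m.dx_step_386
        and m.cpu_mode in ("protected32", "v86")
        and m.paging_enabled
        and m.board_afflicted
    )


# An afflicted board with a pre-DX 386, running a paged 32-bit system.
afflicted = Machine(True, False, "protected32", True, True)
# The same setup after swapping in a DX-step 386.
fixed = Machine(True, True, "protected32", True, True)

print(hang_possible(afflicted))
print(hang_possible(fixed))
```

Because every condition must hold, clearing any single one (a DX-step
chip, no paging, no 80387) makes the predicate false.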

My Model 80 was indeed afflicted. A new motherboard with a DX chip
fixed the problem. IBM apparently didn't just want to replace the
chip itself. Microway and Phar Lap claim that a DX chip alone will
fix the problem, but I never tried this.

Note that the symptom of this bug is not wrong results, but rather
a totally hung machine.

Doug McDonald

afg@cbnewsl.ATT.COM (andrew.goldberg) (05/15/89)

In article <45900232@uxe.cso.uiuc.edu>, mcdonald@uxe.cso.uiuc.edu writes:
> 
> My Model 80 was indeed afflicted. A new motherboard with a DX chip
> fixed the problem. IBM apparently didn't just want to replace the
> chip itself. Microway and Phar Lap claim that a DX chip alone will
> fix the problem, but I never tried this.
> 

Who paid for the new motherboard - you or IBM?

Andy Goldberg

toma@tekgvs.LABS.TEK.COM (Tom Almy) (05/15/89)

In article <45900232@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>There is a bug in the silicon of early 80386's that causes a total
>machine hang at random times if the following things all are true:

[ Lots of conditions ]

>6) The phase of the moon is wrong. This is very important. It
>   really doesn't happen often.

Actually, it can happen *very* often.  The other failure requirement which
you did not mention was 7) executing a floating point instruction when an
interrupt occurs.  All it takes is a heavily floating point intensive program.
In my case, I have Spice compiled running under PharLap DOS/EXTENDER and
one particular circuit simulation will crash an afflicted computer *every
time*.  I use it to check computers!

>
>My Model 80 was indeed afflicted. A new motherboard with a DX chip
>fixed the problem. IBM apparently didn't just want to replace the
>chip itself. Microway and Phar Lap claim that a DX chip alone will
>fix the problem, but I never tried this.

Also, ironically, seemingly all Intel motherboards (301, 301ATZ, and 302)
are afflicted.


Tom Almy
toma@tekgvs.labs.tek.com
Standard Disclaimers Apply

dts@cloud9.Stratus.COM (Daniel Senie) (05/18/89)

In article <1427@uw-entropy.ms.washington.edu>, felsenst@uw-entropy.ms.washington.edu (Joe Felsenstein) writes:
> 
> I have heard that there is a sporadic nasty interaction between 80386's
> and 80387 numeric co-processors, owing to a design flaw in the 80386,
> and that it is not cleared up yet.
> 
> On a Zenith Z-386 I may be having this problem; when the 80387 is used
> there are sporadic unpredictable crashes or operating system
> paralysis (I happen to be using Unix, but if this is the problem that
> may be irrelevant).

This is a well known problem which Intel referred to as Errata 21. Go
see your Zenith dealer. He will give you a new PAL for your CPU board
which completely remedies the situation. It is a bug in the 386, but
a very simple PAL change cures the problem. The DX step of the 386
has this problem fixed.

This problem only occurs when running in 32 bit protected mode. Effectively
the CPU and Co-Processor sit there waiting for each other forever.
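
The "waiting for each other forever" behaviour is an ordinary deadlock. A
toy model of that idea, with no claim to represent the real 386/387 bus
handshake, might look like the sketch below; the function name and its
parameters are hypothetical.

```python
def ticks_until_done(cpu_needs_fpu, fpu_needs_cpu, max_ticks=1000):
    """Toy model: each side may finish only once the side it waits on
    has finished. Returns the tick at which both are done, or None if
    they are still stuck after max_ticks."""
    cpu_done = False
    fpu_done = False
    for tick in range(1, max_ticks + 1):
        # Decide from the state at the start of the tick who can proceed.
        cpu_can = fpu_done or not cpu_needs_fpu
        fpu_can = cpu_done or not fpu_needs_cpu
        cpu_done = cpu_done or cpu_can
        fpu_done = fpu_done or fpu_can
        if cpu_done and fpu_done:
            return tick
    return None


print(ticks_until_done(True, False))  # CPU waits, FPU does not: finishes
print(ticks_until_done(True, True))   # each waits on the other: None
```

When only one side waits, the other finishes first and releases it. When
both wait, neither can ever make the first move, which matches a total
hang rather than a wrong result.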

-- 
Daniel Senie               UUCP: harvard!ulowell!cloud9!dts 
Stratus Computer, Inc.     ARPA: anvil!cloud9!dts@harvard.harvard.edu
55 Fairbanks Blvd.         CSRV: 74176,1347
Marlboro, MA 01752         TEL.: 508 - 460 - 2686

waynec@hpnmdla.HP.COM (Wayne Cannon) (05/18/89)

There have been at least two floating point problems.  The
double-sigma fixed one of them (I believe it had a work-around
involving initializing some floating point operations, or
something).  The other is fixed by the DX step 80386 chips as
outlined very nicely in another reply.  It can happen very often
-- enough to make programs run successfully only one out of ten
attempts -- running UNIX with heavy floating point operations.

There are a couple of workarounds.  One involves bypassing your
80387 and using only software emulation [great, huh?!].  The
other is a board from Bell Technologies that goes between your
80386 and its socket.  The best, of course, is to get a chip with
the DX step.  The DX chips have been available for some time (at
least 6 months, maybe a year), but still very few are actually
showing up in packaged units.  I guess vendors are flushing their
inventory of the old chips, or maybe Intel is still shipping the
old chips.

schuster@dasys1.UUCP (Michael Schuster) (05/19/89)

In article <1427@uw-entropy.ms.washington.edu> joe%uw-evolution@entropy.ms.washington.edu writes:
>
>I have heard that there is a sporadic nasty interaction between 80386's
>and 80387 numeric co-processors, owing to a design flaw in the 80386,
>and that it is not cleared up yet.
>
>On a Zenith Z-386 I may be having this problem; when the 80387 is used
>there are sporadic unpredictable crashes or operating system
>paralysis (I happen to be using Unix, but if this is the problem that
>may be irrelevant).

Nope, it is not your imagination. It is a FATAL BUG in the early (pre
July, 1988) steppings of the 80386. The current "D" step (80386DX)
has licked the problem, as well as eliminated 80287 support.

The problem arises when the 80387 coprocessor is active, DMA is active,
and a page fault occurs. I believe this is due to a conflict between DMA
and the co-processor. Unix, QEMM-386 and other page-mode operating systems
for the 386 may bring this out. 

There are patches for the Unix software, but the hardware fix was put out
by Intel long ago. It involves adding a PAL to the motherboard which
settles the DMA/387 conflict by giving one of them priority.

Intel claims this was much publicised, and that any responsible board maker
will have incorporated a hardware fix. Interestingly, the Intel iSBC-386
and Inboard/386 boards seem to have the MOST PROBLEM with this bug :-)

I suggest that you contact Zenith; there may be a hardware upgrade for your
board. Also, I read somewhere that a company is producing a plug-through
daughterboard for 386 that contains the necessary PAL. 

As for whether the 80287 suffers from this bug as well ... dunno. Perhaps
someone on the net can help. There are 12 MHz 287 chips around these days.

-- 
l\  /l'   _  Mike Schuster          ...!dasys1!schuster
l \/ lll/(_  Big Electric Cat       schuster@dasys1.UUCP
l    lll\(_  New York, NY USA       DELPHI,GEnie:MSCHUSTER  CIS:70346,1745 

keithe@tekgvs.LABS.TEK.COM (Keith Ericson) (05/22/89)

In article <9712@dasys1.UUCP> schuster@dasys1.UUCP (Michael Schuster) writes:
<There are patches for the Unix software, but the hardware fix was put out
<by Intel long ago. It involves adding a PAL to the motherboard which
<settles the DMA/387 conflict by giving one of them priority.
<
<Intel claims this was much publicised, and that any responsible board maker
						 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
<will have incorporated a hardware fix. Interestingly, the Intel iSBC-386
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
<and Inboard/386 boards seem to have the MOST PROBLEM with this bug :-)
<

Our recently-received INTEL 302 box (386/25MHz), which had been "returned
for re-grooving" (i.e., it's a replacement for an early, early version) was
received with a non-D-step part in it.  Crashamundo!

The Everex STEP/25's we've received DO have D-step parts.

kEITHe