[comp.arch] unexpected CPU behavior

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (06/09/90)

In article <1028@s6.Morgan.COM> amull@Morgan.COM (Andrew P. Mullhaupt) writes:
>It looks like chip bugs are more economic in origin to me than 
>the result of ignorant engineering. Any comments?

Design bugs - in anything - are simply a function of complexity
versus tooling.  A bigger program - or bigger chip - done in the same
old way, turns into the kind of disaster that National had with the
'16, and HP had with the HP 3000 OS.

The same old chip, done with better tooling, is a joy.  Lately, gate
arrays often are right the first time.  This is partly because the size
of the average gate array comes from the size of the old logic it
replaces, rather than from the size that the chip vendor is able to
make.

Bad practices can magnify the problem.  Read "The Soul of a New
Machine", and note where they started each design stage before the
previous design stage was signed off.  This is known sourly in many
companies as "no time to do it right, but always time to fix it
later".

If one studies the bugs found at various stages, one can make
statistical predictions about how many are left.  One software
product was consistently shipped with 89% of its bugs found and
fixed.  Suppose that testing has found N bugs in the upcoming
release:  what are the chances of a bug-free ship?  If N is 5,
they're good.  If N is 50, they aren't.  What was I to do when it
turned out to be 10?
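To make that concrete: here is a back-of-envelope sketch of the kind of estimate involved. The model is my own assumption, not necessarily the one used on that product: suppose testing historically finds p = 89% of all bugs, so if N were found this release, roughly N*(1-p)/p are still latent; treating the latent count as Poisson with that mean gives a chance of a bug-free ship of exp(-mean).

```python
import math

def p_bug_free(n_found, p_find=0.89):
    # Expected latent bugs, given n_found bugs caught at an
    # assumed historical find-rate of p_find.
    latent = n_found * (1 - p_find) / p_find
    # Chance that zero latent bugs remain, Poisson model.
    return math.exp(-latent)

for n in (5, 10, 50):
    print(n, round(p_bug_free(n), 2))
# N=5 gives a bit better than even odds; N=50 is essentially hopeless;
# N=10 lands uncomfortably in between - which was exactly the dilemma.
```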

I have data from 1986 for the Nautilus CPU found in the VAX 8700/8800: 

bugs found by:
simulation		594	(peak of 50 per month)
reviews			233
timing verifier		 83
on the hardware		 46

Number of manufacturing revisions, per Nautilus gate array design:

48	never changed
 6	1 change
 3	2 changes
 2	3 changes

The 46 fixes done to hardware break down as:
PLA	10
wire	18
arrays	18 (see above)
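For what it's worth, the two tables are mutually consistent, which a quick tally confirms (the dictionary names below are just labels for this sketch):

```python
# Gate-array revision table: {changes per design: number of designs}.
revisions = {1: 6, 2: 3, 3: 2}

# Total array respins = sum of changes over all revised designs.
array_fixes = sum(changes * designs for changes, designs in revisions.items())

# Hardware-fix breakdown quoted above.
hardware_fixes = {"PLA": 10, "wire": 18, "arrays": array_fixes}

assert array_fixes == 18            # matches the "arrays" line
assert sum(hardware_fixes.values()) == 46   # matches "46 fixes"
```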

So, what are the chances that it was shipped bug free?  Does anyone
out there know the subsequent history?
-- 
Don		D.C.Lindsay 	leaving CMU .. make me an offer!