[comp.arch] Testability Features

lethin@athena.mit.edu (Richard A Lethin) (12/15/88)

I'm curious about how people designing RISC chips and microprocessors
test them.  We're designing a small VLSI processor here and would like
to do it right.

There seem to be lots of great algorithms for strictly combinational
circuits, but when state gets added, the answer seems to be "make
every register shiftable" to allow the circuit to be analyzed 
as a combinational circuit.

A RISC chip, with a bunch of on-chip registers, a TLB, pipeline
registers, a limited number of IO pins, and irregular logic would seem
to be a testability nightmare.  But gunking up a pipeline register to
make it shiftable seems drastic.  The state's not regular enough, like
a DRAM's, to just run patterns, so how is it done?

How about testing the small, on-chip data cache?  That certainly
isn't going to be made shiftable...

What specific testability features are added to the chip? LSSD?
OCMS? Special opcodes?  RESET? Special test pins or pads?

Are test vectors generated by hand?  If so, how long does that take?
And who does it?  If not, how are they generated?

How much coverage do you get?  Is the coverage satisfactory?

How long does it take to run the vectors on the chip?  What hardware
do you use?

-- Rich

mark@mips.COM (Mark G. Johnson) (12/16/88)

In article <8453@bloom-beacon.MIT.EDU>,
    lethin@wheaties.ai.mit.edu (Richard A Lethin) writes:
$ A RISC chip, with a bunch of on-chip registers, a TLB, pipeline
$ registers, a limited number of IO pins, and irregular logic would seem
$ to be a testability nightmare.  But gunking up a pipeline register to
$ make it shiftable seems drastic.  The state's not regular enough, like
$ a DRAM's, to just run patterns, so how is it done?
$ 
$ What specific testability features are added to the chip? LSSD?
$ OCMS? Special opcodes?  RESET? Special test pins or pads?

Several of the RISC chips under construction in bipolar ECL technology
are using LSSD-like scan paths.

And people have this belief, founded or unfounded, that however difficult
it is to test a RISC chip, it's *more* difficult to test a CISC chip
having an equal number of circuits.  You know, simplicity breeds
observability. :-)
-- 
 -- Mark Johnson	
 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
	...!decwrl!mips!mark	(408) 991-0208

aglew@mcdurb.Urbana.Gould.COM (12/17/88)

..> Richard A. Lethin of MIT asks about testing of VLSI RISCs
..> and Mark Johnson of MIPS makes some comments.

Talking about testing, I finally think that I have figured out
one of the things that has been bothering me about hardware
testability methodology.
    Most VLSI test methodology seems oriented towards detecting
*implementation* or *fabrication* errors, not *design* errors.
I.e., the tests look for bad transistors or mis-wirings; they
don't look for adherence to higher level specs.
    When people talk about test coverage, they mean test coverage
over a limited space of implementation and fabrication errors,
not over the much larger space of design errors.

seeger@beach.cis.ufl.edu (F. L. Charles Seeger III) (12/17/88)

In article <28200252@mcdurb> aglew@mcdurb.Urbana.Gould.COM writes:
|
|..> Richard A. Lethin of MIT asks about testing of VLSI RISCs
|..> and Mark Johnson of MIPS makes some comments.
|
|Talking about testing, I finally think that I have figured out
|one of the things that has been bothering me about hardware
|testability methodology.
|    Most VLSI test methodology seems oriented towards detecting
|*implementation* or *fabrication* errors, not *design* errors.
|I.e., the tests look for bad transistors or mis-wirings; they
|don't look for adherence to higher level specs.
|    When people talk about test coverage, they mean test coverage
|over a limited space of implementation and fabrication errors,
|not over the much larger space of design errors.

Why does that bother you?  Design errors are supposed to be caught
during the design phase, where the design can be simulated with all
nodes equally observable.  A physical test is only concerned with
testing that individual part, not the design of the part.  Very
few nodes of the physical part are directly observable, which makes
this kind of test most unattractive for testing/debugging the design.

Test coverage is usually computed as the percentage of nodes at which
stuck-at faults can be detected.  Of course, these are only the grossest
faults, and there is a much larger space of more subtle physical faults.
However, more extensive testing of parts is extremely expensive.  This
sort of thing is done when new processing technology is developed and
is to some extent done during development of new parts, but production
testing and characterization of parts must be affordable and fast.
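
As a toy illustration of what that coverage number means (the circuit
and node names are invented; production fault simulators do this same
bookkeeping over millions of nodes), here is a single-stuck-at
coverage calculation in Python for the two-gate circuit
y = (a AND b) OR c:

from itertools import product

NODES = ["a", "b", "c", "n1", "y"]      # n1 = a AND b (internal node)

def evaluate(vec, fault=None):
    # Evaluate y = (a AND b) OR c for inputs vec = (a, b, c).
    # fault = (node, value) forces that node to 0 or 1 (stuck-at).
    def force(name, val):
        return fault[1] if fault and fault[0] == name else val
    a = force("a", vec[0])
    b = force("b", vec[1])
    c = force("c", vec[2])
    n1 = force("n1", a & b)
    return force("y", n1 | c)

def coverage(vectors):
    # fraction of single stuck-at faults the vector set detects
    faults = [(n, v) for n in NODES for v in (0, 1)]
    detected = [f for f in faults
                if any(evaluate(v) != evaluate(v, f) for v in vectors)]
    return len(detected) / len(faults)

print(coverage([(1, 1, 0), (0, 1, 0), (1, 0, 0)]))  # 0.9: c s-a-0 missed
print(coverage(list(product((0, 1), repeat=3))))    # 1.0: exhaustive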

Apologies in advance, if I have misunderstood your posting.


--
  Charles Seeger            216 Larsen Hall
  Electrical Engineering    University of Florida
  seeger@iec.ufl.edu        Gainesville, FL 32611

lethin@athena.mit.edu (Richard A Lethin) (12/17/88)

In article <28200252@mcdurb> aglew@mcdurb.Urbana.Gould.COM writes:
>
>..> Richard A. Lethin of MIT asks about testing of VLSI RISCs
>..> and Mark Johnson of MIPS makes some comments.
>
>Talking about testing, I finally think that I have figured out
>one of the things that has been bothering me about hardware
>testability methodology.
>    Most VLSI test methodology seems oriented towards detecting
>*implementation* or *fabrication* errors, not *design* errors.
>I.e., the tests look for bad transistors or mis-wirings; they
>don't look for adherence to higher level specs.
>    When people talk about test coverage, they mean test coverage
>over a limited space of implementation and fabrication errors,
>not over the much larger space of design errors.


Doesn't that depend on the context?  If you're designing a new computer,
you've got at least three sets of tests to develop:

 "Manufacturing Tests"
 "Validation Tests"
 "Diagnostic Tests"

with different time and coverage constraints.

Most VLSI test methodologies seem geared toward getting the most
coverage with the smallest set of test vectors, primarily because VLSI
test equipment is very expensive -- they're manufacturing tests.

Diagnostics and validation suites are run in a context where time is
free, but coverage has to be TOTAL.  Diagnostics have the additional
constraint that they ought to be able to identify the faulty
component.  Are there some principles people follow when designing
diagnostics and validation tests, or is everything bound to be ad hoc?

And how do people go about finding design errors anyway?

Would any real-life diagnostic engineers care to comment?

aglew@mcdurb.Urbana.Gould.COM (12/18/88)

>/* Written  6:39 pm  Dec 16, 1988 by seeger@beach.cis.ufl.edu in mcdurb:comp.arch */
>In article <28200252@mcdurb> aglew@mcdurb.Urbana.Gould.COM writes:
>|
>|..> Richard A. Lethin of MIT asks about testing of VLSI RISCs
>|..> and Mark Johnson of MIPS makes some comments.
>|
>|Talking about testing, I finally think that I have figured out
>|one of the things that has been bothering me about hardware
>|testability methodology.
>|    Most VLSI test methodology seems oriented towards detecting
>|*implementation* or *fabrication* errors, not *design* errors.
>|I.e., the tests look for bad transistors or mis-wirings; they
>|don't look for adherence to higher level specs.
>|    When people talk about test coverage, they mean test coverage
>|over a limited space of implementation and fabrication errors,
>|not over the much larger space of design errors.
>
>Why does that bother you?  Design errors are supposed to be caught
>during the design phase, where the design can be simulated with all
>nodes equally observable.  A physical test is only concerned with
>testing that individual part, not the design of the part.  Very
>few nodes of the physical part are directly observable, which makes
>this kind of test most unattractive for testing/debugging the design.
>
> ...
>
>Apologies in advance, if I have misunderstood your posting.
>
>  Charles Seeger            216 Larsen Hall
>  Electrical Engineering    University of Florida
>  seeger@iec.ufl.edu        Gainesville, FL 32611

No need to apologize - you have understood the essence of my posting.

Why does a testing methodology that tests for implementation errors rather
than design errors bother me?

First, because when I am designing a circuit I would like guidance as to what
test vectors to simulate to test design correctness. Usually, of course, I
have a separate software model to compare the simulation results to, but I
would still like a methodology to help me choose the conditions that I run
through both simulations.
    Testability tools that I know of take my circuit implementation, and give
me vectors that cover it. Usually, running these through the circuit simulator
and the higher level simulator catches trivial design errors well enough, but
I know of at least one situation where they did not.
    So I am left with the standard techniques: test boundary conditions, a few
random patterns, etc., much as in software testing (see next point), and with
no idea of "sufficiency".  A sketch of what I mean follows.
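    In Python, with a made-up 8-bit adder as the "design": run
boundary cases plus random patterns through both the high-level spec
and a gate-level ripple-carry model and compare.  It catches the easy
mistakes, but gives no measure of sufficiency.

import random

WIDTH = 8

def spec_add(a, b):
    # high-level spec: modular addition
    return (a + b) % (1 << WIDTH)

def gate_add(a, b):
    # low-level model: ripple-carry adder built from AND/OR/XOR
    result, carry = 0, 0
    for i in range(WIDTH):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return result

# boundary conditions plus a pile of random patterns
corners = [0, 1, (1 << WIDTH) - 1, 1 << (WIDTH - 1)]
vectors = [(a, b) for a in corners for b in corners]
vectors += [(random.randrange(1 << WIDTH), random.randrange(1 << WIDTH))
            for _ in range(1000)]

for a, b in vectors:
    assert spec_add(a, b) == gate_add(a, b), (a, b)
print("models agree on all", len(vectors), "vectors")
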
    I suppose what I really want is a test generator that can take a high-level
spec and a low-level spec, and generate tests that make you reasonably
confident that the low-level design actually implements the high-level design.
The test tools that I know of, by contrast, take a low-level schematic and
generate tests that make you reasonably certain that a physical implementation
actually implements that schematic.
    You could get picky, and say that the low level design is actually
an "implementation" of the high level design, but how does this help me gain
any more confidence in the correctness of my design?
    
Second, I get tired of hearing people say that VLSI hardware test methodology
is far more advanced than software test methodology. Software "implementation"
is what would be called "design" in a VLSI design system, so software testing
is mainly design testing, not implementation testing. Implementation testing 
would correspond to testing for correct generation of code by the compiler;
this needs to be done, but software testing goes much further than that.
VLSI design correctness testing seems to be at about the same level as
(I would almost say a little bit behind) software design testing methodology.
    I hope that I am terribly mistaken. Can anyone give me references for
VLSI testing that cover design correctness, not fabrication?


Andy "Krazy" Glew   aglew@urbana.mcd.mot.com   uunet!uiucdcs!mcdurb!aglew
   Motorola Microcomputer Division, Champaign-Urbana Design Center
	   1101 E. University, Urbana, Illinois 61801, USA.
   
My opinions are my own, and are not the opinions of my employer, or
any other organisation. I indicate my company only so that the reader
may account for any possible bias I may have towards our products.

friedl@vsi.COM (Stephen J. Friedl) (12/18/88)

In article <28200252@mcdurb>, aglew@mcdurb.Urbana.Gould.COM writes:
> Most VLSI test methodology seems oriented towards detecting
> *implementation* or *fabrication* errors, not *design* errors.

Design errors are meant to be caught by customers :-)

     Steve

-- 
Stephen J. Friedl        3B2-kind-of-guy            friedl@vsi.com
V-Systems, Inc.                                 attmail!vsi!friedl
Santa Ana, CA  USA       +1 714 545 6442    {backbones}!vsi!friedl
Nancy Reagan on my new '89 Mustang GT Convertible: "Just say WOW!"

kenm@sci.UUCP (Ken McElvain) (12/18/88)

In article <8453@bloom-beacon.MIT.EDU>, lethin@athena.mit.edu (Richard A Lethin) writes:
> 
> I'm curious about how people designing RISC chips and microprocessors
> test them.  We're designing a small VLSI processor here and would like
> to do it right.
> 
> There seem to be lots of great algorithms for strictly combinational
> circuits, but when state gets added, the answer seems to be "make
> every register shiftable" to allow the circuit to be analyzed 
> as a combinational circuit.

There are a few available programs that can cope with sequential
circuit test generation.  We supply one, and to be fair, Zycad and
HHB(->Cadnetix->Daisy???) also have products for sequential test
generation.  Research in this area is much lighter than in
combinational test generation, but the approaches are much more
varied.  Available programs differ in the size of the circuits they
handle, their effectiveness in dealing with sequential behaviour, and
their modeling primitives (usually gates and latches; we also support
tristate and precharge bus drivers, plus RAM and ROM primitives).  I
like to believe that mine is the best.

> 
> A RISC chip, with a bunch of on-chip registers, a TLB, pipeline
> registers, a limited number of IO pins, and irregular logic would seem
> to be a testability nightmare.  But gunking up a pipeline register to
> make it shiftable seems drastic.  The state's not regular enough, like
> a DRAM's, to just run patterns, so how is it done?

The way we do it is to turn the test generator loose on the chip
overnight with no design for test.  In the morning we look at what
parts of the chip weren't tested.  We then determine, via a
combination of coverage percentages, testability analysis, and brain
power, what the problem was, and then we set up a control file that
tells the test generation program which internal nodes to treat
as external inputs and outputs.  This is far less overhead than
making every latch scannable.  You can usually keep the intrusion
off of the critical path as well.
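
To make that concrete, here is a rough Python sketch of the
control-file step.  The block names, coverage numbers, and directive
syntax are all invented for illustration, not our tool's actual
format:

# per-block fault coverage from the overnight run, plus a candidate
# internal node that would open the block up if made pseudo-I/O
coverage = {
    "tlb.cam":     (0.58, "tlb.match_bus"),
    "pipe.mem":    (0.81, "pipe.mem_addr_reg"),
    "regfile.dec": (0.97, "regfile.waddr"),
    "alu.core":    (0.995, None),
}

GOAL = 0.95
with open("testgen.ctl", "w") as ctl:
    for block, (cov, node) in sorted(coverage.items(),
                                     key=lambda kv: kv[1][0]):
        if cov < GOAL and node is not None:
            # expose the node to the test generator as an extra
            # input and output for the next run
            ctl.write("treat_as_input  %s\n" % node)
            ctl.write("treat_as_output %s\n" % node)
            print("%s: %.1f%% covered -> exposing %s"
                  % (block, 100 * cov, node))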

> 
> How about testing the small, on-chip data cache?  That certainly
> isn't going to be made shiftable...

You should be able to isolate it from the rest of the chip with
a boundary scan path.

> 
> What specific testability features are added to the chip? LSSD?
> OCMS? Special opcodes?  RESET? Special test pins or pads?
> 
> Are test vectors generated by hand?  If so, how long does that take?
> And who does it?  If not, how are they generated?

Generation of test vectors by hand has been known to take longer
than the design itself, especially if you want high test coverage.
Fault simulation time can eat you alive.

> 
> How much coverage do you get?  Is the coverage satisfactory?

One thing to remember is that 99% fault coverage (ignoring the question
of fault models) does not mean that 99% of the chips that pass
your test are working parts.  Chip size and defect density also affect
the answer.  To see this, think of the remaining 1% of the faults
as an area A.  If the defect density D is such that A * D = 0.1,
then roughly 10% of the passing chips will have a fault in the untested
area.  Realistically, then, large chips need to have better fault
coverage than small ones to achieve the same defect rate in
tested chips.
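
In Python, with the usual Poisson spot-defect model (the numbers are
invented; only the product A * D matters):

import math

def escape_rate(area, density, fault_coverage):
    # fraction of chips with at least one defect in the untested
    # area: P = 1 - exp(-untested_area * defect_density)
    untested = area * (1.0 - fault_coverage)
    return 1.0 - math.exp(-untested * density)

# A * D = 0.1 as above: about 10% of passing chips are bad
print(escape_rate(area=1.0, density=10.0, fault_coverage=0.99))
# double the chip at the same 99% coverage: escapes nearly double
print(escape_rate(area=2.0, density=10.0, fault_coverage=0.99))
# the bigger chip needs 99.5% coverage to get back to ~10%
print(escape_rate(area=2.0, density=10.0, fault_coverage=0.995))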

> 
> How long does it take to run the vectors on the chip?  What hardware
> do you use?
> 
> -- Rich

A couple of references if you are curious.

The basic idea in [1] is to analyze the circuit and derive a partial
state transition table.  [2] is the basis for both Zycad's and HHB's
products.  The idea is to start at the outputs and work back towards
the fault, maintaining a current and a previous instance of the circuit.
EBT stands for Extended Back Trace.

[1] Hi-Keung Tony Ma, Srinivas Devadas, A. Richard Newton, and
Alberto Sangiovanni-Vincentelli, "Test Generation for Sequential
Circuits," IEEE Trans. Computer-Aided Design, Oct. 1988.

[2] R. Marlett, "EBT: A Comprehensive Test Generation Technique for
Highly Sequential Circuits," in Proc. 15th Design Automation
Conference, June 1978, pp. 332-338.

wad@houxv.UUCP (R.WADSACK) (12/20/88)

> you've got at least three sets of tests to develop:
> 
>  "Manufacturing Tests"
>  "Validation Tests"
>  "Diagnostic Tests"
> 
> with different time and coverage constraints.
> 
> And how do people go about finding design errors anyway?
> 
> Would any real-life diagnostic engineers care to comment?
> 
 
I wrote up my experiences with the AT&T WE 32100 CPUs in an
article in the August 1984 issue of "IEEE Design and Test of Computers"
(pp. 66 - 75).
 
It's titled "Design Verification and Testing of the WE 32100 CPUs".
It treats the differences between DV tests, silicon tests, and
diagnostic tests.
 
I have also done equivalent work on the Bell Labs CRISP "RISC"
CPU.  The concepts and approaches are much the same regardless
of whether the chip at hand is CISC or RISC.
 
		Ronald L. Wadsack
		AT&T Bell Labs
		Holmdel, NJ

fwb@demon.siemens.com (Frederic W. Brehm) (12/20/88)

In article <28200252@mcdurb> aglew@mcdurb.Urbana.Gould.COM writes:
>Talking about testing, I finally think that I have figured out
>one of the things that has been bothering me about hardware
>testability methodology.
>...
>    When people talk about test coverage, they mean test coverage
>over a limited space of implementation and fabrication errors,
>not over the much larger space of design errors.

Hmmmm.  Are you suggesting that each and every part out of the fab be
tested for DESIGN errors?  I hope not.  Tests for design errors should be
done before the design is committed to high-volume production.

Design tests are usually done by running simulation software (e.g., SPICE)
and by exercising the first samples off the production line on high-end
testers and in prototype circuits.

Fred
----------------------------------------------------------------------------
Frederic W. Brehm		phone:	    (609)-734-3336
Siemens Corporate Research	FAX:	    (609)-734-6565
755 College Road East		uucp:	    princeton!siemens!demon!fwb
Princeton, NJ  08540		internet:   fwb@demon.siemens.com
    "From there to here, from here to there, funny things are everywhere."
						- Dr. Seuss