[comp.binaries.ibm.pc.d] memory test algorithms

knop@dutesta.UUCP (Peter Knoppers, Delft Univ. of Technology) (11/22/88)

>From: bill@bilver.UUCP (Bill Vermillion)
>Can anyone recommend a reliable memory test.  (Talking '286 and '386 
>machines here).
Yes, below I will give a summary of part of a tutorial on memory
testing presented by Prof. A.J. van de Goor at the Internation Test
Conference 1988. It is not very likely that you can obtain copies of
the documents used in that tutorial. We are in the process of 
converting the information to a book that will be obtainable trough
the normal channels. This will become during in 1989.

Faults in memory arrays come in several flavors:
SA: Stuck-At faults in the memory array. This means that one or more
    cells contain a constant value.
AF: Address decoder faults. This means that several addresses access
    the same memory cell, or some addresses access no cell at all.
TF: Transition faults. This means that a cell changes value when 
    another cell is written to.
CF: Coupling faults. This means that a cell shows a stuck-at fault 
    or transition fault only when a certain pattern is present in 
    neighboring cells.

Traditionally, memory test algorithms were ad hoc designs, no one knew
what fault coverage was obtained by them. 
To describe some memory tests I will present a language that was
invented when the so-called march tests were introduced.
A march test consists of a number of march elements. Each march
element tells whether the full address range is accessed from lowest
to highest address, of vice-versa. Furthermore the march element
tells what read and write actions must be performed at each address.

The simplest (ad-hoc) memory test is the Zero-One test. The march
notation for Zero-One is { up(w0); up(r0); up(w1); up(r1) }.

"up" is usually depicted by an up-arrow, but I don't have that on my
keyboard/display. The opposite of up is down, my keyboard has no key
for that either.
"w0" means to write 0 in the memory location.
"r0" means to read the memory location and assert that it contains 0.

This march can be translated to an equivalent piece of a C program:
	for(address = lowest; address <= highestl; address++)
		*address = 0;
	for(address = lowest; address <= highestl; address++)
		if(*address != 0)
			error(...);
	for(address = lowest; address <= highestl; address++)
		*address = 1;
	for(address = lowest; address <= highestl; address++)
		if(*address != 1)
			error(...);
The Zero-one test is extremely simple and it only detects SA faults
in the memory array. Every memory location is accessed 4 times,
therefore the time it takes to execute Zero-one is proportional to
4n (n is the number of addresses).
The Zero-one test is include here for its simplicity. It has very
bad fault coverage, for instance, it will not detect that a 64 kb
chip was inserted in a socket designed for a 256 kb chip. (Such faults 
are detected by memory tests that detect address decoder faults.)

A slightly better test is called MATS. The march description for
MATS is { up(w0); up(r0,w1); up(r1) }.

In addition to all SA faults, MATS detects all address decoder faults 
if an additional condition is met. Address decoder faults come in 
several types:
1: no cell is accessed
2: a different cell is accessed
3: several cells are accessed, the output value is the logical OR, or
   the logical AND of the contents of all selected cells. Which logical
   function is applicable depends on the technology of the memory 
   chips.
Cases 1 and 2 are always covered by MATS. Case 3 is only covered if
the resulting value is the logical OR of the contents of the memory
cells is output. If the logical AND is output, the march can be
replaced by { up(w1; up(r1,w0); up(r0) } to obtain complete coverage
of all address decoder faults. 
The example of the 64 kb chip in the 256 kb chip's socket would be
detected, as this is equivalent to a type 2 address decoder fault.

MATS+ is an improved version, that detects all address decoder faults,
independent of the way case 3 faults operate. It is described by the
march { up(w0); up(r0,w1); up(r1,w0) }.

MATS+ takes 5n accesses. The equivalent C code is:
	for(address = lowest; address <= highestl; address++)
		*address = 0;
	for(address = lowest; address <= highestl; address++)
	{
		if(*address != 0)
			error(...);
		*address = 1;
	}
	for(address = lowest; address <= highestl; address++)
	{
		if(*address != 1)
			error(...);
		*address = 0;
	}

There exists an even more improved version of MATS, called MATS++.
It is described as the march { up(w0); up(r0,w1,r1); down(r1,w0,r0) }.

MATS++ takes 7n accesses. In addition to detecting all SA faults and
all AF faults it detects all transition faults.

An algorithm that detects the simple coupling faults where only two
cells are involved is MARCH-B: { up(w0); up(r0,w1,r1,w0,r0,w1); 
up(r1,w0,w1); down(r1,w0,w1,w0); down(r0,w1,w0) }

This test takes 17n accesses. As you can see, several writes to the
same memory location are performed, without any read operation, or
accesses to other memory locations.

There are tests that can detect even more complex coupling faults, 
but these cannot be described as a march. These tests are of order
n*log(n) (or worse). I won't describe them here.

Although MARCH-B does not detect all complex coupling faults, it will
detect the vast majority of them. 

The algorithms described above apply to 1-bit wide memory. Computers
usually access memory in bytes, words, or longwords. For these cases
the same algorithms can be used, if you replace 1 in the C code by
-1 (which expands to a byte/word/longword with all bits set).

To detect shorts between the bits in a byte/word/longword you can
precede the test with a few writes and reads to one memory location
with a few different values. A good set of values to use for byte
oriented memories is:
binary    hex
11110000  F0
11001100  CC
10101010  AA
These patterns detect shorts between any two data lines.
For word or longword oriented memories you must double or quadruple
the patterns and add 1 or 2 patterns at the top of the list.
To detect all shorts to ground or vcc you must add a pattern
consisting of all 1's and a pattern consisting of all 0's.

This should be enough to get you started writing your own memory test
programs. Only this time you should KNOW the faults it detects. You
can even write a program that interprets marches in the notation used
in this article.

Although I tried to be very careful, it is possible that this article
contains errors. Use the algorithms in this article at your own risk.
-- 
 _____      __  __  __  __   ____   _____   _____   ______  _____    _____ 
|  _  \    |  |/  ||  \|  | /    \ |  _  \ |  _  \ |   ___||  _  \  /  ___|
|   __/ _  |     < |      ||  ||  ||   __/ |   __/ |   >__ |     <  \__  \
|__|   |_| |__|\__||__|\__| \____/ |__|    |__|    |______||__|\__||_____/
P. Knoppers, Delft Univ. of Technology, The Netherlands - knop@dutesta.UUCP