[comp.binaries.ibm.pc.d] Brown Bag Memory Test--worthless?

tes@whutt.ATT.COM (STERKEL) (11/16/88)

<*>
I see that Brown Bag's Shareware Ramtest 3.0 is being
distributed.  In response to an urgent request to the
network last month, I got two copies.  (BTW, thanks!)

I threw away one after comp-ing and diff-ing to verify
that they were the same, and ran the test on memory
for 1 iteration.  No error.  This is impossible, as I
frequently get memory parity errors.  I then ran the
test 100 iterations--no error.  Finally, I ran the
test for *10,000* iterations.  (My wife thought I was
crazy--testing for over 12 hours for an error I can
force in less than 2 hours of normal use).  Again,
RAMTEST FLUNKED; finding NO-ERROR.

Does this program *really* test memory?

terry

dhesi@bsu-cs.UUCP (Rahul Dhesi) (11/17/88)

In article <4022@whutt.ATT.COM> tes@whutt.ATT.COM (STERKEL) writes:
>I ran the
>test for *10,000* iterations....
>Again,
>RAMTEST FLUNKED; finding NO-ERROR.

Dynamic RAMs of marginal quality can have defects that show up only
when a certain combination of bits is stored in adjacent cells.  To
test memory thoroughly one would have to store all possible bit
patterns.  This could take forever.  A second-best solution is to store
enough random patterns that there is a good (but not 100%) chance of
finding defects.  Perhaps in your case the program didn't test with
enough random patterns.  It might be repeating the same limited set of
patterns again and again instead of generating new ones.
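
A minimal sketch of why repeating a small fixed pattern set can miss a
pattern-sensitive defect no matter how many iterations are run.  The RAM
model, the defect (victim cell, trigger value) and all names here are
hypothetical, chosen only for illustration; nothing is known about what
patterns Ramtest actually uses.

```python
import random


class PatternSensitiveRAM:
    """Toy RAM with one hypothetical defect: cell `victim` (must be >= 1)
    reads back with bit 0 flipped, but only while its left neighbor
    holds the value `trigger`."""

    def __init__(self, size, victim, trigger):
        self.cells = [0] * size
        self.victim = victim
        self.trigger = trigger

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.victim and self.cells[addr - 1] == self.trigger:
            value ^= 0x01
        return value


def run_pass(ram, pattern):
    """Write pattern[i] to cell i, read everything back, return bad addresses."""
    for addr, value in enumerate(pattern):
        ram.write(addr, value)
    return [addr for addr, value in enumerate(pattern) if ram.read(addr) != value]


def fixed_pattern_test(ram, size, iterations):
    """Repeat the same four classic patterns on every iteration."""
    patterns = [[0x00] * size, [0xFF] * size, [0x55] * size, [0xAA] * size]
    for _ in range(iterations):
        for pattern in patterns:
            if run_pass(ram, pattern):
                return True   # defect found
    return False


def random_pattern_test(ram, size, iterations, seed=1988):
    """Generate a fresh random pattern on every iteration."""
    rng = random.Random(seed)
    for _ in range(iterations):
        pattern = [rng.randrange(256) for _ in range(size)]
        if run_pass(ram, pattern):
            return True       # defect found
    return False
```

With a trigger value such as 0x3C, the fixed set never stores the
triggering neighbor, so extra iterations add nothing.  Each random pass
has roughly a 1-in-256 chance of storing it, so the chance of a miss
shrinks with every new pass but never reaches zero.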

I personally would be suspicious of a memory testing program that does
not document what bit patterns it uses.  Unfortunately, none of them
do.

Another possibility is that you had to disable parity checking to run
the test successfully, and so did not detect an error in the parity
bit.  (Or the program somehow disabled parity checking.  It seems to
know how to do so, since it lets you do it from the screen without having
to flip a switch on the memory board.)
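
A rough sketch of that second possibility: a fault confined to the parity
bit is invisible to a test that only compares data bits once parity
checking is off.  The 9-bit cell model, the stuck-at-0 parity bit and the
use of even parity are assumptions made for the sketch, not a description
of any particular board.

```python
class ParityError(Exception):
    """Stands in for the parity-error trap on real hardware."""


def even_parity_bit(byte):
    """Parity bit that makes the total number of 1 bits (data + parity) even."""
    return bin(byte & 0xFF).count("1") & 1


class ParityRAM:
    """Toy RAM with 8 data bits plus 1 parity bit per cell.
    The parity bit of cell `stuck_addr` is (hypothetically) stuck at 0."""

    def __init__(self, size, stuck_addr, parity_check=True):
        self.data = [0] * size
        self.parity = [0] * size
        self.stuck_addr = stuck_addr
        self.parity_check = parity_check

    def write(self, addr, value):
        value &= 0xFF
        self.data[addr] = value
        self.parity[addr] = 0 if addr == self.stuck_addr else even_parity_bit(value)

    def read(self, addr):
        value = self.data[addr]
        if self.parity_check and self.parity[addr] != even_parity_bit(value):
            raise ParityError(f"parity error at address {addr}")
        return value


def data_compare_test(ram, size):
    """Write each pattern everywhere, then compare the data bits read back."""
    for pattern in (0x00, 0xFF, 0x55, 0xAA, 0x01):
        for addr in range(size):
            ram.write(addr, pattern)
        for addr in range(size):
            if ram.read(addr) != pattern:
                return False
    return True


def run(ram, size):
    """Report the outcome of the data test as a short string."""
    try:
        return "pass" if data_compare_test(ram, size) else "data mismatch"
    except ParityError as exc:
        return str(exc)
```

In this model the data bits are always good, so with `parity_check=False`
the test passes; with it enabled, the first pattern holding an odd number
of 1 bits trips the bad cell.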

Aside:  There is a rumor that the reason IBM created a BCD character set
with holes was because not all combinations of on and off states in the
old transistor-based logic circuits were equally error-free.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi

tneff@dasys1.UUCP (Tom Neff) (11/18/88)

In article <4022@whutt.ATT.COM> tes@whutt.ATT.COM (STERKEL) 
wonders whether Brown Bag's Memory Test 3.0 isn't worthless,
since it found his RAMs error free after 10,000 iterations
even though he "frequently" gets parity errors in "2 hours"
of use or less.

I wonder if he is aware that there's a difference between
"Parity Error 1" and "Parity Error 2".  Error 1 pertains to
motherboard RAM and virtually always means a bad chip.  Error
2 signals bad parity over the XT/AT bus, and can emanate from
a faulty peripheral card (and often does) rather than from
expansion memory itself.  When I get "Parity Error 2" I usually
start pulling interface cards until I isolate the problem.
Memory tests like the aforementioned BagWare :-) won't detect
these conditions.
-- 
Tom Neff			UUCP: ...!cmcl2!phri!dasys1!tneff
	"None of your toys	CIS: 76556,2536	       MCI: TNEFF
	 will function..."	GEnie: TOMNEFF	       BIX: t.neff (no kidding)

bill@bilver.UUCP (Bill Vermillion) (11/19/88)

In article <4022@whutt.ATT.COM> tes@whutt.ATT.COM (STERKEL) writes:
><*>
>I see that Brown Bag's Shareware Ramtest 3.0 is being
>distributed.  In response to an urgent request to the
>network last month, I got two copies.  (BTW, thanks!)
..... deleted stuff ....
>
>Does this program *really* test memory?
 
I am looking for a good memory test routine also. In one of the groups I am in
one member has volunteered to write one.  If it gets done, it will be posted
here.

Re: memory testing. I got a call about a memory problem. Program dumped core. 
('286 running Xenix). Dump showed text portions jumbled.  Close look showed that
only every other character was bad, and each was off by one bit.   L was M, etc.
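
The reasoning from the dump can be sketched as an XOR of expected against
observed bytes: 'L' is 0x4C and 'M' is 0x4D, so the two differ only in
bit 0.  The helper names and the sample data below are made up for
illustration, and reading "every other character" as one byte lane
assumes the 16-bit data bus of a '286.

```python
def bit_errors(expected, observed):
    """Return (offset, xor_mask) for every byte position that differs."""
    return [(i, e ^ o) for i, (e, o) in enumerate(zip(expected, observed)) if e != o]


def summarize(errors):
    """Collapse the error list into the set of bad bit masks and the set of
    byte lanes (offset modulo 2 on a 16-bit bus) in which they occur."""
    masks = {mask for _, mask in errors}
    lanes = {offset % 2 for offset, _ in errors}
    return masks, lanes


def stuck_low_bit(data):
    """Simulate the fault: bit 0 stuck at 1 in every odd-offset byte."""
    return bytes(b | 0x01 if i % 2 else b for i, b in enumerate(data))


expected = b"LLLLLLLL"
observed = stuck_low_bit(expected)          # every other 'L' becomes 'M'
masks, lanes = summarize(bit_errors(expected, observed))
```

A single mask and a single lane point at one data bit of one bank, which
is what narrows the search to a handful of chips.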

Board in question had no schematics, but since it was a recent problem, I 
checked the last bank of memory, and the 4 corner chips in that bank, since it
appeared that a low bit was stuck.  I found ONE 64k chip where a 256k should be.
The system memory test and the board mfr's memory test both passed.

A local repair tech I talked with mentioned that the memory test supplied for
his machines by the machine mfr would pass memory when one of a memory chip's
legs was bent out from the socket.

It appears no one has a good test program.  Or at least the people I have
checked with, some of them very knowledgeable, don't know of one.

Years ago no one minded if it took 10 minutes to check memory in a 64k machine.
So if it takes 24 hours to thoroughly check a multi-meg machine, why not do it,
and do it right.

Can anyone recommend a reliable memory test?  (Talking '286 and '386 machines
here.)

-- 
Bill Vermillion - UUCP: {uiucuxc,hoptoad,petsd}!peora!rtmvax!bilver!bill
                      : bill@bilver.UUCP

dold@mitisft.Convergent.COM (Clarence Dold) (11/22/88)

in article <301@bilver.UUCP>, bill@bilver.UUCP (Bill Vermillion) says:
> appeared that a low bit was stuck. I found ONE 64k chip where a 256 should be.
> System memory test and the board mfrs memory test both passed
> 
> Talking with a local repair tech he mentioned the memory test supplied for his
> machines by the machine mfr would pass memory when one of the memory chips
> legs was bent out from the socket.
> 
A memory test that allows a 64k chip to pass, in lieu of a 256k chip, is 
missing the same test that the 'bent leg' diagnostic is missing. 
The 64k chip looks like a 256k with one pin missing.
If a chip has a floating address bit, it will respond to both the proper
address, and an address at the inverse of the floating bit.  
The important point is that it succeeds in read/write at both addresses.  It 
doesn't fail either of them.  Some other chip probably is also reading and 
writing the same data, but it works.
A proper memory test will write a unique pattern to each location in RAM,
and then read those locations.  If two locations have the same data, then 
somebody isn't addressing properly!  Of course if you have >64k words, a 16-bit
word can't hold a unique value for every location, so patterns that expose
each possible floating address bit are tested instead.
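
A small sketch of the aliasing described above, using a simulated RAM in
which one address bit is ignored.  The class, the function names and the
choice of floating bit are hypothetical; the model covers only an ignored
address bit, not shorted or cross-coupled address lines.

```python
class AliasedRAM:
    """Toy RAM in which address bit `floating_bit` is ignored, so two
    addresses differing only in that bit land on the same cell."""

    def __init__(self, size, floating_bit=None):
        self.cells = [0] * size
        # Mask that clears the ignored bit; ~0 keeps every address bit.
        self.mask = ~(1 << floating_bit) if floating_bit is not None else ~0

    def write(self, addr, value):
        self.cells[addr & self.mask] = value

    def read(self, addr):
        return self.cells[addr & self.mask]


def write_then_read_each(ram, size):
    """Naive test: write a location and read it straight back."""
    for addr in range(size):
        for pattern in (0x55, 0xAA):
            ram.write(addr, pattern)
            if ram.read(addr) != pattern:
                return False
    return True


def unique_pattern_test(ram, size):
    """Write a distinct value to every location first, then verify them all."""
    for addr in range(size):
        ram.write(addr, addr)
    return all(ram.read(addr) == addr for addr in range(size))


def address_line_test(ram, size):
    """Cheaper variant for large memories: probe only address 0 and each
    power-of-two address.  Returns the address bits that look ignored."""
    bits = [b for b in range(size.bit_length()) if (1 << b) < size]
    for b in bits:
        ram.write(1 << b, b + 1)
    # Written last, so it clobbers any probe address that aliases to 0.
    ram.write(0, 0)
    return [b for b in bits if ram.read(1 << b) != b + 1]
```

The naive test passes on the faulty model because every read/write pair
succeeds, which is the point made above; only a test that compares
locations against each other sees the fault.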

On a separate thought:
A Memory diagnostic that checks refresh will write a pattern to all of 
memory, and then wait some arbitrary length of time (14 seconds), before
reading.  It then waits another 14 seconds, and reads again.
If refresh isn't working, or if a read causes a bit toggle due to bad
parity circuits, this will catch it.
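
The write/wait/read/wait/read sequence can be sketched against a toy DRAM
model with simulated time.  The model and its numbers are assumptions for
illustration only: real cells lose charge in milliseconds rather than
seconds, and on real hardware reading a row also recharges it, which this
sketch ignores.

```python
class LeakyRAM:
    """Toy DRAM: without working refresh, a cell left alone for longer than
    `retention` simulated seconds loses its contents (reads back 0)."""

    def __init__(self, size, retention, refresh_working):
        self.cells = [0] * size
        self.age = [0.0] * size          # seconds since each cell was written
        self.retention = retention
        self.refresh_working = refresh_working

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF
        self.age[addr] = 0.0

    def read(self, addr):
        return self.cells[addr]

    def wait(self, seconds):
        """Advance simulated time; a real test would simply idle here."""
        if self.refresh_working:
            return
        for addr in range(len(self.cells)):
            self.age[addr] += seconds
            if self.age[addr] > self.retention:
                self.cells[addr] = 0


def quick_test(ram, size, pattern=0xFF):
    """Write everything and read it straight back, with no hold time."""
    for addr in range(size):
        ram.write(addr, pattern)
    return all(ram.read(addr) == pattern for addr in range(size))


def refresh_test(ram, size, hold=14.0, pattern=0xFF):
    """Write all of memory, then twice over: wait `hold` seconds and verify."""
    for addr in range(size):
        ram.write(addr, pattern)
    for _ in range(2):
        ram.wait(hold)
        if any(ram.read(addr) != pattern for addr in range(size)):
            return False
    return True
```

In this model the quick test passes even with refresh dead, and a cell
that survives the first hold can still fail after the second.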

A memory test that only takes 5 seconds to run isn't testing a whole lot.

-- 
---
Clarence A Dold - cdold@starfish.Convergent.COM		(408) 435-5274
		...pyramid!ctnews!mitisft!professo!dold
		P.O.Box 6685, San Jose, CA 95150-6685

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (11/24/88)

In article <512@mitisft.Convergent.COM> dold@mitisft.Convergent.COM (Clarence Dold) writes:

| On a separate thought:
| A Memory diagnostic that checks refresh will write a pattern to all of 
| memory, and then wait some arbitrary length of time (14 seconds), before
| reading.  It then waits another 14 seconds, and reads again.
| If refresh isn't working, or if a read causes a bit toggle due to bad
| parity circuits, this will catch it.
| 
| A memory test that only takes 5 seconds to run isn't testing a whole lot.

  The refresh interval on a PC or AT is settable from software. There are
programs which allow you to play with the value. Making it longer gives
fewer wait states (up to 3% CPU speedup!) and making it shorter may
allow you to run bad memory chips without getting parity errors.

  If I were writing another memory test program, I'd definitely check
the margins on the refresh, but I'm not sure that my program would stay
valid while running, so it would report refresh errors by crashing the
system.

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me