[comp.unix.microport] NMI in System Mode on System V/386

dfriedlander@cdp.UUCP (06/29/89)

It's true, diagnostics don't seem to pick up all memory problems.  
My AST386C ran fine under DOS (5MB memory.)  Indeed, not only did
diagnostics show no problem, but I was able to fill up all the memory
using Lotus and EMM, and nothing showed.  (Incidentally, the Lotus
and EMS test is generally better than the provided diagnostics programs).

Nonetheless, UNIX hung the system periodically.  When the memory was
swapped for AST memory, the problem went away.

I wonder if anyone understands why this sort of weirdness happens.

David Friedlander
Io Consulting Inc.

steveb@cs.utexas.edu (Steve Benz) (06/30/89)

In article <143800004@cdp> dfriedlander@cdp.UUCP writes:
> [ NMI in System Mode == memory problem ]
>It's true, diagnostics don't seem to pick up all memory problems.  
...
>
>I wonder if anyone understands why this sort of weirdness happens.

I have some experience with this one on the 286.  It seems that
some memory expansion cards -- particularly those which haven't
got dip-switches or circuitry to determine the upper-bound of
memory -- don't work, and display exactly the symptoms that have
been described.

My theory as to why this problem occurs in Unix, but not in DOS
or in the diagnostics, is that those two never try to do DMA to the
upper reaches of memory.

				- Steve

marc@noe.UUCP (Marc de Groot) (07/03/89)

In article <143800004@cdp> dfriedlander@cdp.UUCP writes:
>It's true, diagnostics don't seem to pick up all memory problems.  
>My AST386C ran fine under DOS (5MB memory.)  Indeed, not only did
>diagnostics show no problem, but I was able to fill up all the memory
> [ ... ]
>I wonder if anyone understands why this sort of weirdness happens.

Memory tests which REALLY test memory are not as easy to write as they
might appear.  There are quite a variety of problems which cause memory
to fail.

Any memory error where a bit is just STUCK was detected when your machine
was burned in.  In other words, the manufacturer got all the EASY problems.

Your memory probably works 99.9% of the time -- it's a very occasional
bad write of a bit that is causing the failure.

The memory test that the PC does at power-on is hopelessly ineffective
at catching marginal memory errors simply because it does not test the
memory long enough, and because it does not vary the patterns that it
writes.

Back in prehistoric times, when I had an S-100 bus machine with a Z80,
we had the "Rasmussen memory test" which had a good-sized repertoire
of different memory tests that it would run.  I would leave it running
for HOURS to catch marginal memory.  I haven't seen anything like this
for PCs.
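
To make the two complaints above concrete (too few patterns, too little
time), here is a minimal sketch in C of a test that cycles several
patterns over a region for a caller-chosen number of passes.  It is NOT
the Rasmussen test, whose contents are not described here.  The names
(pattern_fn, run_pattern, soak_test) and the three patterns chosen are
assumptions made for illustration only.  Run as an ordinary program under
an operating system, it exercises whatever buffer it is handed, through
the cache and the memory manager, so it would not necessarily reach the
failing chips the way a standalone test would.

```c
#include <stddef.h>
#include <stdint.h>

/* A pattern maps (word index, pass number) to the value written there. */
typedef uint32_t (*pattern_fn)(size_t index, unsigned pass);

/* Alternating bits; the phase flips on every pass. */
static uint32_t pat_checkerboard(size_t i, unsigned pass) {
    (void)i;
    return (pass & 1u) ? 0xAAAAAAAAu : 0x55555555u;
}

/* A single 1 bit whose position depends on the index and the pass. */
static uint32_t pat_walking_one(size_t i, unsigned pass) {
    return (uint32_t)1u << ((i + pass) % 32u);
}

/* The word's own index (inverted on odd passes), which tends to
   expose addressing faults rather than stuck cells. */
static uint32_t pat_address(size_t i, unsigned pass) {
    uint32_t v = (uint32_t)i;
    return (pass & 1u) ? ~v : v;
}

/* Write the whole region, then read it all back.
   Returns the number of words that did not read back as written. */
size_t run_pattern(volatile uint32_t *mem, size_t words,
                   pattern_fn pat, unsigned pass) {
    size_t errors = 0;
    for (size_t i = 0; i < words; i++)
        mem[i] = pat(i, pass);
    for (size_t i = 0; i < words; i++)
        if (mem[i] != pat(i, pass))
            errors++;
    return errors;
}

/* Run every pattern once per pass; more passes means a longer soak.
   Returns the total mismatch count across all patterns and passes. */
size_t soak_test(volatile uint32_t *mem, size_t words, unsigned passes) {
    static const pattern_fn patterns[] = {
        pat_checkerboard, pat_walking_one, pat_address
    };
    size_t errors = 0;
    for (unsigned pass = 0; pass < passes; pass++)
        for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++)
            errors += run_pattern(mem, words, patterns[p], pass);
    return errors;
}
```

A caller would hand soak_test a buffer and a pass count, and treat any
nonzero return as a failure.  Writing the whole region before reading any
of it back, rather than checking each word right after writing it, gives
a marginal cell time to lose its value before it is checked.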

-- 
Marc de Groot (KG6KF)                   These ARE my employer's opinions!
Noe Systems, San Francisco
UUCP: uunet!hoptoad!noe!marc
Internet: mdg@postgres.berkeley.edu -or- marc@kg6kf.AMPR.ORG