[unix-pc.general] Frames 80..92 missing

david@ms.uky.edu (David Herron -- One of the vertebrae) (04/08/88)

I'm getting this message sometimes when I boot my machine.  Probably
the machine is deciding that part of my memory is bad and not letting
it be used.

The message happens after the kernel has been read into memory from
the boot device when it's printing out the memory size, BETWEEN the
two memory sizes.  (i.e. after it says 3.5 megs of memory but before
it says 3.3 megs available).

The message doesn't always print -- I haven't verified it but I think
it won't print if I've done a cold boot and does print on warm boots.

What isn't clear is WHERE do frames 80..92 live?  I was reading through
the fancy hardware manual this morning to see if I could figure out
where the problem is and came across the description of the memory mapping
scheme.  Among other things it tells me that the pages are 2K bytes
wide.  If a "frame" is == to "page" then frame 80 is at 0x50000 which
would place it somewhere on my motherboard.  Also, 10-15 missing pages
would be 20-30K of memory missing -- no big deal.  Especially if the
fix involves taking chips off the motherboard.
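
For what it's worth, here is that arithmetic as a quick sketch.  It
assumes a frame is a page-sized slice of physical memory numbered from
zero and that the numbers in the message are decimal; both are guesses,
and the function names are made up for illustration.  It also shows
that frame 80 only lands on 0x50000 if a frame is 4K; with 2K pages it
would start at 0x28000.

```python
def frame_base(frame, frame_size):
    """Start address of a frame, assuming frames are numbered from 0."""
    return frame * frame_size

def missing_bytes(first, last, frame_size):
    """Total bytes covered by frames first..last, inclusive."""
    return (last - first + 1) * frame_size

# Try both candidate frame sizes: 2K (per the manual's page size)
# and 4K (the size that puts frame 80 at 0x50000).
for size in (2 * 1024, 4 * 1024):
    print("frame size %dK: frame 80 starts at %#x, frames 80..92 cover %dK"
          % (size // 1024,
             frame_base(80, size),
             missing_bytes(80, 92, size) // 1024))
```

Either way the amount of memory mapped out is small: 13 frames is 26K
at 2K per frame, or 52K at 4K per frame.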

Am I correct on everything so far?  The tech manual doesn't describe
this message nor does it use "frame" in its terminology, so I'm
guessing here.

I've run memory diagnostics at various times over the last 4 months
trying to catch the error.  Finally this morning when I ran them it
caught an error but it's not real clear exactly where this error is
at.  I was down in the "s4test" mode and the error was at address
"0x5Axxx" (I forget the exact address, it's written down at home, the
important thing is that it's within the 80-92 range above).  BUT the
error was caught in the first section of the Random Write test to my
expansion memory.  Expansion memory is addressed from 0x200000 on up,
not in the 0x50000 range.  Yet the test showed an error at 0x5Axxx!

So what's going on?  Again the manual isn't explicit on the details of
every test and what would be going on.  I suppose whoever wrote that
software messed up a little (or was lazy) and didn't take into account
where the test was being made and merely printed the offset into the
area being tested.
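
To make the two readings of 0x5Axxx concrete, here is a small sketch.
It assumes the reported value is either an absolute address or an
offset from the start of expansion memory at 0x200000, and it assumes
4K frames (the size that puts frame 80 at 0x50000).  The names are
made up for illustration, and the low three hex digits are unknown, so
it works with the whole 0x5A000..0x5AFFF range.

```python
EXPANSION_BASE = 0x200000   # start of expansion memory, as stated above
FRAME_SIZE = 4 * 1024       # assumption: 4K frames

def frame_of(addr, frame_size=FRAME_SIZE):
    """Frame number containing a physical address."""
    return addr // frame_size

def as_expansion_offset(reported):
    """Absolute address if the test printed an offset into expansion memory."""
    return EXPANSION_BASE + reported

lo, hi = 0x5A000, 0x5AFFF   # every address matching 0x5Axxx

# Reading 1: the diagnostic printed an absolute address.
print("absolute: frames %d..%d" % (frame_of(lo), frame_of(hi)))

# Reading 2: the diagnostic printed an offset into expansion memory.
print("offset:   addresses %#x..%#x, frames %d..%d"
      % (as_expansion_offset(lo), as_expansion_offset(hi),
         frame_of(as_expansion_offset(lo)), frame_of(as_expansion_offset(hi))))
```

Under the first reading the error falls in frame 90, inside the 80..92
range.  Under the second it is at 0x25Axxx in expansion memory, which
is nowhere near those frames.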

The last feature of this is that my machine crashes somewhat frequently
with "panic: NMI (kernal parity error)".  I have a bunch of numbers
written down on various pieces of paper if anybody is interested.  I'll
be doing some testing tonight to see if the problem is repeatable.  One
common feature of these panics is that they happen in conjunction with
some activity on the modem.  Before upgrading to v3.51a it would panic
at the end of a uucico or ATE call.  That is, if it would panic at
all.  Now, for the last two panics at least, it does so at the
beginning...  Oh, and my uudemon.hr script does the equivalent of:

	phtoggle
	uucico -r1
	phtoggle

Which means that there WILL be some activity with the modem at least
once an hour.  The only problem is that if I'm using the ATE, every
hour I get two windows pop up saying that there was a problem with
opening the modem and would I please hit ENTER :-).

My current theory is that some memory is bad and usually when the
machine boots the kernel detects the bad memory, but sometimes it
doesn't.  (First test is to see if ALL cold boots don't map out
those memory frames).  When it doesn't detect the bad memory the
kernel will eventually assign the memory to some process which
will eventually cause a problem and blooie.  The next test is
a usealot ... :-)

-- 
<---- David Herron -- The E-Mail guy            <david@ms.uky.edu>
<---- or:                {rutgers,uunet,cbosgd}!ukma!david, david@UKMA.BITNET
<----
<---- I don't have a Blue bone in my body!