[comp.sys.ibm.pc] Why am I getting this crazy parity interrupt???

lane@dalcs.UUCP (John Wright/Dr. Pat Lane) (08/07/88)

Perhaps some tech expert out there can tell me what's wrong with my system,
a locally built AT clone (6/10MHz 286, 1Meg RAM, herc clone video, Western
Digital disk controller, Panasonic 1.2Meg floppy, ST-225 hard disk).  I'm
using PCDOS 3.20.  Here's the problem:

I get an occasional but frequent memory parity interrupt at location F000:ADF1.
It always occurs while doing a floppy disk access.  It's never caused a problem
other than the interruption and I've never detected any corruption of the file
being read or written. I have one of those little TSR parity interrupt handlers
installed so it's just a persistent annoyance.  What really strikes me as strange
is that the address given (by DOS's parity interrupt handler) is in ROM, which
I thought was not parity checked?!?

Another wrinkle: I have a ram-disk (VDISK) as drive E: and on the next access
to it after getting the floppy-disk parity error I always get a Data Error.
I always tell it to re-try and everything is just dandy.  Weird, eh?


There are some related problems that only occur on the faster CPU speed:

Certain pop-up TSR programs cause DOS to report a memory parity error in low
memory when popped-up.  Again, it doesn't seem to cause any problem beyond the
interruption.

Certain graphics programs cause the video to go bananas (a pattern appears
that looks like a flock of seagulls and slowly expands on the screen, at which
point I nervously re-boot).  The programs work fine at the slower speed and
other graphics programs work at all speeds.

Any tips/hints/clues much appreciated.

-- 
John Wright      /////////////////     Phone:  902-424-3805  or  902-424-6527
Post: c/o Dr Pat Lane, Biology Dept, Dalhousie U, Halifax N.S., CANADA B3H-4H8 
Cdn/Bitnet: lane@cs.dal.cdn    Arpa: lane%dalcs.uucp@uunet.uu.net
Uucp: lane@dalcs.uucp or {uunet,watmath,utai,garfield}!dalcs!lane  

jim@belltec.UUCP (Mr. Jim's Own Logon) (08/10/88)

In article <2968@dalcs.UUCP>, lane@dalcs.UUCP (John Wright/Dr. Pat Lane) writes:
> Perhaps some tech expert out there can tell me what's wrong with my system,
.
.
> 
> I get an occasional but frequent memory parity interrupt at location F000:ADF1.
> It always occurs while doing a floppy disk access.  It's never caused a problem
> other than the interruption and I've never detected any corruption of the file
> being read or written. I have one of those little TSR parity interrupt handlers
> installed so it's just a persistent annoyance.  What really strikes me as strange
> is that the address given (by DOS's parity interrupt handler) is in ROM, which
> I thought was not parity checked?!?
> 
> -- 
> John Wright      /////////////////     Phone:  902-424-3805  or  902-424-6527


   Expert?  Well, maybe. The reason that the address reported is in ROM is
that without special hardware support (which isn't in PCs) the reported
error address is always that of the fetch immediately AFTER the offending
read. This reported address is often nowhere near the actual address that
failed. Only memory tests will report actual failing addresses (that or a
logic analyzer).
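
   To see that the reported address really is in ROM, here is a small
sketch of the real-mode segment:offset arithmetic. The function names and
the ROM window (F0000h-FFFFFh, the usual place for an AT BIOS) are
assumptions for illustration, not something read off this particular clone.

```python
# Sketch: turn a real-mode segment:offset pair into a physical address.
# The ROM window below is the conventional AT BIOS location and is an
# assumption about this machine, not a measured fact.

BIOS_ROM_START = 0xF0000  # assumed first byte of the system BIOS ROM
BIOS_ROM_END = 0xFFFFF    # last byte below the 1 MB boundary


def physical_address(segment: int, offset: int) -> int:
    """Real-mode address arithmetic: segment * 16 + offset."""
    return (segment << 4) + offset


def in_bios_rom(addr: int) -> bool:
    """True if addr falls inside the assumed BIOS ROM window."""
    return BIOS_ROM_START <= addr <= BIOS_ROM_END


addr = physical_address(0xF000, 0xADF1)
print(hex(addr), in_bios_rom(addr))
```

   So F000:ADF1 is physical address FADF1h, inside the BIOS, which fits
the floppy access: the CPU was most likely executing BIOS disk code when
the error was flagged somewhere else.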

   Why is it happening? Firstly, understand that it is a hardware problem.
Software cannot cause a parity error once the memory has been initialized
(which happens during the initial BIOS memory count). If you have tried
to upgrade the speed of your system by changing the crystal and maybe the
RAMs, this is the problem. There is a lot more to the system timing than
just the clock speed and the RAM access time. You could try faster RAMs,
but it may not make any difference. Could be the machine is just getting
old; parts age like everything else. When they age, the timing changes, and
if the original design was on the edge (as far too many of the clone
machines are) then it breaks. Could be one weak RAM chip, could be noise
on the power supply, could be slightly conductive dust on the motherboard,
could be the Star Wars site next to your house.
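
   A toy model may help show why software cannot cause a parity error
once memory has been written. This is only a sketch: the class and
function names are made up, and odd parity (data bits plus parity bit
always contain an odd number of 1s) is assumed.

```python
# Toy model of one parity-checked RAM byte. Odd parity is assumed, and
# all names here are hypothetical, chosen only for illustration.

def parity_bit(value: int) -> int:
    """Return the bit that makes the count of 1s (data + parity) odd."""
    ones = bin(value & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


class ParityByte:
    def __init__(self, data: int, parity: int):
        # At power-on both fields hold whatever the chips came up with.
        self.data = data & 0xFF
        self.parity = parity & 1

    def write(self, value: int) -> None:
        # The hardware generates the parity bit on every write.
        self.data = value & 0xFF
        self.parity = parity_bit(self.data)

    def read_ok(self) -> bool:
        # The hardware recomputes parity on every read and compares.
        return parity_bit(self.data) == self.parity


cell = ParityByte(data=0x00, parity=0)  # one possible power-on state
uninitialized_ok = cell.read_ok()       # data and parity disagree here

cell.write(0x5A)                        # any software write fixes parity
after_write_ok = cell.read_ok()

cell.data ^= 0x08                       # simulated hardware fault: one bit flips
after_fault_ok = cell.read_ok()

print(uninitialized_ok, after_write_ok, after_fault_ok)
```

   Whatever value a program writes, the parity bit is generated to match
it, so only something changing a stored or transferred bit behind the
program's back (timing, a weak chip, noise) can make the check fail.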

   What can you do? Run an AT memory diagnostic for a day. If it fails at
the same address or the same bit every time, then maybe replacing a single
RAM chip will solve the problem. Run an extended floppy test and watch for
parity errors; this will indicate whether it is DMA related (DMA cycles have
different memory timing than CPU cycles). If it is DMA related, uh, well,
uh, well, at least it is good to know that it is DMA related. Clean everything
in the system. Might make a difference, and it will make your mother proud of
you.
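
   If the diagnostic logs each failure, a quick tally shows whether the
failures cluster on one address or one bit. The sketch below assumes a
hypothetical log of (address, expected byte, actual byte) entries; the
sample values are invented, not output from any real diagnostic.

```python
from collections import Counter


def summarize_failures(failures):
    """Tally failures by address and by failing bit position.

    failures: iterable of (address, expected_byte, actual_byte) tuples,
    a hypothetical log format assumed for this sketch.
    """
    by_address = Counter()
    by_bit = Counter()
    for address, expected, actual in failures:
        by_address[address] += 1
        diff = (expected ^ actual) & 0xFF  # 1s mark the bits that differ
        for bit in range(8):
            if diff & (1 << bit):
                by_bit[bit] += 1
    return by_address, by_bit


# Invented sample data: three failures, all with bit 4 dropped.
sample_log = [
    (0x2A310, 0x55, 0x45),
    (0x1F004, 0xFF, 0xEF),
    (0x2A310, 0xAA, 0xBA),
]

by_address, by_bit = summarize_failures(sample_log)
print(by_address.most_common(1), by_bit.most_common(1))
```

   One bit dominating across many addresses points at a single chip,
assuming the usual one-bit-wide RAM chips; failures scattered over all
bits point back at timing, power, or noise.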

   One final word of advice: unless you have a weak cell in a RAM, parity
errors are rarely single events. If you get a parity error, the remainder
of what you are doing is suspect. Some versions of DOS and UNIX do not
reset the parity logic after the first error is seen, so any subsequent
errors go unnoticed. If you are running something important, and want whatever
results you get to be good, give up after the first parity error and start
again. A fair number of computers just shut down on a parity error, giving
you no chance to proceed.
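
   The "not reset" behavior can be pictured as a latch: the first error
sets it and raises the interrupt, and nothing more is signalled until
someone clears it. The model below is a simplified sketch with made-up
names, not the actual AT register interface.

```python
# Simplified model of a latched parity-error signal. The class and method
# names are hypothetical; real hardware is driven through I/O ports.

class ParityLatch:
    def __init__(self):
        self.latched = False
        self.interrupts = 0  # how many errors the software actually saw

    def parity_error(self) -> None:
        # Only an error arriving while the latch is clear is signalled.
        if not self.latched:
            self.latched = True
            self.interrupts += 1

    def clear(self) -> None:
        # What a careful interrupt handler would do before returning.
        self.latched = False


careless = ParityLatch()
for _ in range(3):
    careless.parity_error()      # handler never clears the latch

careful = ParityLatch()
for _ in range(3):
    careful.parity_error()
    careful.clear()              # handler re-arms the latch each time

print(careless.interrupts, careful.interrupts)
```

   With the careless handler, three errors look like one, which is why
a single reported error is no guarantee that only one occurred.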


						-Jim Wall
						Bell Technologies Inc.