[net.unix-wizards] cp read fault

kjs%tufts.csnet@csnet-relay.arpa (Kevin Sullivan) (03/15/86)

we have had about 50 crashes in the last ten months.  the diagnostic
says cp read fault.  the machine finally dies with panic: mchk.
a bunch of register values get printed out.  DEC field service informs
us that the timeout address printed is the only good clue,  but that it
simply indicates the interrupt level of the problem device, not the
device itself.  since almost everything on our unibus is at priority level
5,  they are having problems isolating the malfunction.  in fact,  we have
replaced deuna, dmr, 8 dz's,  the dw780,  cables,  etc. - i.e. almost
everything,  but the problem persists.  has anyone else had this kind of
problem?  if so,  do you have any advice or info that might help us?

kevin sullivan
tufts university
kjs%tufts@csnet-relay

chris@umcp-cs.UUCP (Chris Torek) (03/15/86)

In article <1824@brl-smoke.ARPA> kjs%tufts.csnet@csnet-relay.arpa
asks for help in tracking down a `cp read fault' machine check.

Look at the program counter (`pc') value and see in which routine
you were running.  That can be a useful clue; e.g., if you are in
dmf.c, it is probably a DMF (or, of course, anything along the access
path out to the device).  The `va/viba' register gives the virtual
address the machine was working on when it got the read fault.
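
As a rough illustration of mapping a pc value back to a routine, here
is a minimal sketch.  It assumes you already have a list of (address,
name) pairs sorted by address, such as you might get from something
like `nm -n' on the kernel image.  The addresses and symbol names
below are invented for the example, and routine_for_pc is a
hypothetical helper, not part of any existing tool.

```python
import bisect

# Hypothetical symbol table: (address, name) pairs sorted by address.
# These values are made up for illustration only.
SYMBOLS = [
    (0x80001000, "_dzrint"),
    (0x80001400, "_dzxint"),
    (0x80002000, "_dmfrint"),
    (0x80002800, "_dmfxint"),
]


def routine_for_pc(pc, symbols=SYMBOLS):
    """Return (name, offset) for the last symbol at or below pc.

    Returns None if pc lies below the first symbol.  Note that a pc
    beyond the last symbol is still attributed to that last symbol,
    since the table carries no end addresses.
    """
    addrs = [addr for addr, _ in symbols]
    # Index of the last symbol whose address is <= pc.
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, pc - addr


if __name__ == "__main__":
    hit = routine_for_pc(0x80002010)
    if hit is not None:
        print("%s+0x%x" % hit)
```

A pc that lands in a driver's interrupt routine would point toward
that device, or something on the path to it, as the likely suspect.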
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

stanonik@nprdc.arpa (Ron Stanonik) (03/22/86)

Our newer vax 780 (two years newer) suffers from cp cache par faults,
a couple times a week.  Both vaxs run essentially the same kernel (differing
only in drivers for different equipment).

I'd like to hear more about the "out of band recvfrom" error.

Our dec field rep said there were two cache related boards in the cpu.  He
replaced one.  Soon we'll have him replace the other.

Ron Stanonik
stanonik@nprdc.arpa