kjs%tufts.csnet@csnet-relay.arpa (Kevin Sullivan) (03/15/86)
We have had about 50 crashes in the last ten months. The diagnostic says cp read fault. The machine finally dies with panic: mchk. A bunch of register values get printed out. DEC field service informs us that the timeout address printed is the only good clue, but that it simply indicates the interrupt level of the problem device, not the device itself. Since almost everything on our UNIBUS is at priority level 5, they are having trouble isolating the malfunction. In fact, we have replaced the DEUNA, the DMR, 8 DZs, the DW780, cables, etc. (i.e. almost everything), but the problem persists. Has anyone else had this kind of problem? If so, do you have any advice or info that might help us?

Kevin Sullivan
Tufts University
kjs%tufts@csnet-relay
chris@umcp-cs.UUCP (Chris Torek) (03/15/86)
In article <1824@brl-smoke.ARPA> kjs%tufts.csnet@csnet-relay.arpa asks for help in tracking down a `cp read fault' machine check.

Look at the program counter (`pc') value and see which routine you were running in. That can be a useful clue; e.g., if you are in dmf.c, the culprit is probably a DMF (or, of course, anything along the access path all the way out to the device). The `va/viba' register gives the virtual address the machine was working on when it got the read fault.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
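
[A minimal sketch of the pc lookup described above, assuming you have a list of kernel routine start addresses sorted by address (for instance from a symbol listing of the running kernel image). The addresses in the table and the helper name routine_for_pc are hypothetical and only illustrate the lookup; substitute the values from your own kernel and machine check printout.]

```python
import bisect

# Hypothetical symbol table: (start address, routine name).
# The addresses are invented for illustration; take real ones from a
# symbol listing of the kernel that was running when the machine crashed.
SYMBOLS = [
    (0x80001000, "_dzrint"),
    (0x80001400, "_dmfrint"),
    (0x80001A00, "_deintr"),
]


def routine_for_pc(pc, symbols=SYMBOLS):
    """Return (name, offset) for the last routine starting at or below pc.

    Returns None if pc lies below every symbol in the table.
    """
    ordered = sorted(symbols)
    starts = [addr for addr, _ in ordered]
    # bisect_right gives the index of the first start address greater than
    # pc, so the entry just before it is the routine that contains pc.
    i = bisect.bisect_right(starts, pc) - 1
    if i < 0:
        return None
    addr, name = ordered[i]
    return name, pc - addr


if __name__ == "__main__":
    hit = routine_for_pc(0x80001432)
    if hit is not None:
        name, offset = hit
        print("%s+0x%x" % (name, offset))
```

[With the made-up table above, a pc of 0x80001432 falls inside the second entry, which would point at the DMF driver. The lookup only names the routine that was executing; as noted above, the fault may lie anywhere along the path out to the device.]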
stanonik@nprdc.arpa (Ron Stanonik) (03/22/86)
Our newer VAX 780 (two years newer) suffers from cp cache par faults a couple of times a week. Both VAXes run essentially the same kernel (differing only in drivers for different equipment). I'd like to hear more about the "out of band recvfrom" error. Our DEC field rep said there were two cache-related boards in the CPU. He replaced one. Soon we'll have him replace the other.

Ron Stanonik
stanonik@nprdc.arpa