kevin@uunet.uu.net (Kevin Kelleher) (03/01/89)
I have a 3/280 running 4.0.1 and have had two system crashes in the last two days during periods of heavy NFS activity (there are seven 3/60s connected to it). I am hoping that someone can make sense of the panic messages below. I have a call in to Sun about it, but we all know how long that can take.

First crash:

diff:
trap address 0x8, pid 29905, pc = f027448, sr = 2004, stkfmt b, context 1
Bus Error Reg 80<INVALID>
data fault address c faultc 0 faultb 0 dfault 1 rw 1 size 0 fcode 5
KERNEL MODE
page map 0 pmgrp aa
D0-D7  0 f 4 0 0 0 100 20
A0-A7  f117ce8 f117ce8 f117ce8 0 f0b2cc0 f2922cc ffff96b0 ffff9684
Begin traceback...fp = ffff96b0, sp = ffff9684
Called from f045d72, fp=ffff96d8, args=ffff9738 1 0 1
Called from f03da06, fp=ffff973c, args=ffff9738 1 f117560 0
Called from f03c85c, fp=ffff9768, args=21cdc 0 1 0
Called from f03c7da, fp=ffff9780, args=21cdc 1 0 ffff97b4
Called from f067cec, fp=ffff97a8, args=ffff9a18 28c54 281ac 0
Called from f004768, fp=efffb90, args=5 220f1 fffffffb 4
End traceback...
panic: Bus error
syncing file systems...

The system hung at this point.

Second crash (about 26 hours later):

syslogd:
trap address 0x8, pid 96, pc = f031e4a, sr = 2004, stkfmt b, context 1
Bus Error Reg 80<INVALID>
data fault address 0 faultc 0 faultb 0 dfault 1 rw 1 size 0 fcode 5
KERNEL MODE
page map 0 pmgrp 28
D0-D7  0 f19ba37 1 0 0 0 2 f112004
A0-A7  f0a34e8 f19ba38 0 0 f0b2020 f17ad0c ffff95f8 ffff95e8
Begin traceback...fp = ffff95f8, sp = ffff95e8
Called from f028a38, fp=ffff9628, args=f17ad0c f0b2020 38 ffff976c
Called from f05c9ca, fp=ffff9638, args=f117ce8 ffff976c ffff9684 f0461da
Called from f0461da, fp=ffff9684, args=0 ffff976c 0 2
Called from f03b7e4, fp=ffff96b0, args=f1e11c4 ffff976c 1 2
Called from f02de10, fp=ffff96d8, args=f0e3574 1 ffff976c effffdc
Called from f02dd20, fp=ffff9780, args=ffff976c 1 ffff97b4 effe29f
Called from f067cec, fp=ffff97a8, args=ffff9a18 effe250 222f0 0
Called from f004768, fp=effe268, args=79 d8 ffffffdd 1
End traceback...
panic: Bus error
syncing file systems... [32] 6 [32] [28] [21] [10] done
dumping to vp f117c74, offset 50648

I kept copies of the core dumps, in case anyone can tell me how to glean useful information from them.

Much thanks,

Kevin Kelleher          {uunet,pyramid,altos}!xilinx!kevin
Xilinx Inc.             (408) 559-7778 x269
2069 Hamilton Ave
San Jose CA 95125

[[ Don't rule out a hardware problem. This could, possibly, be caused by bad memory (but that's just a guess). --wnl ]]
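As a starting point for comparing the two tracebacks above, here is a minimal sketch in Python that pulls out the "Called from" return addresses and lists the ones both crashes share. The function names (`return_addresses`, `common_callers`) are hypothetical and not part of any Sun tool; the sample data is copied from the panic messages in this post. Turning the addresses into routine names would still need the symbol table of the kernel that crashed, which is not shown here.

```python
import re

# Matches the return address on one "Called from" line of the panic traceback.
FRAME_RE = re.compile(r"Called from ([0-9a-f]+),")

def return_addresses(traceback_text):
    """Return the 'Called from' addresses in the order they were printed."""
    return [int(m.group(1), 16) for m in FRAME_RE.finditer(traceback_text)]

def common_callers(first, second):
    """Addresses present in both tracebacks, in the order of the first one."""
    seen = set(return_addresses(second))
    return [addr for addr in return_addresses(first) if addr in seen]

# Traceback lines copied from the first crash (process "diff").
FIRST_CRASH = """
Called from f045d72, fp=ffff96d8, args=ffff9738 1 0 1
Called from f03da06, fp=ffff973c, args=ffff9738 1 f117560 0
Called from f03c85c, fp=ffff9768, args=21cdc 0 1 0
Called from f03c7da, fp=ffff9780, args=21cdc 1 0 ffff97b4
Called from f067cec, fp=ffff97a8, args=ffff9a18 28c54 281ac 0
Called from f004768, fp=efffb90, args=5 220f1 fffffffb 4
"""

# Traceback lines copied from the second crash (process "syslogd").
SECOND_CRASH = """
Called from f028a38, fp=ffff9628, args=f17ad0c f0b2020 38 ffff976c
Called from f05c9ca, fp=ffff9638, args=f117ce8 ffff976c ffff9684 f0461da
Called from f0461da, fp=ffff9684, args=0 ffff976c 0 2
Called from f03b7e4, fp=ffff96b0, args=f1e11c4 ffff976c 1 2
Called from f02de10, fp=ffff96d8, args=f0e3574 1 ffff976c effffdc
Called from f02dd20, fp=ffff9780, args=ffff976c 1 ffff97b4 effe29f
Called from f067cec, fp=ffff97a8, args=ffff9a18 effe250 222f0 0
Called from f004768, fp=effe268, args=79 d8 ffffffdd 1
"""

if __name__ == "__main__":
    for addr in common_callers(FIRST_CRASH, SECOND_CRASH):
        print(f"{addr:x}")
```

On the data above, only the last two addresses printed in each traceback (f067cec and f004768) appear in both crashes; the addresses printed before them differ.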
chris@uunet.uu.net (Chris Brown) (03/16/89)
Reference: Kevin Kelleher's query in v7n173

> diff:
> trap address 0x8, pid 29905, pc = f027448, sr = 2004, stkfmt b, context 1

> syslogd:
> trap address 0x8, pid 96, pc = f031e4a, sr = 2004, stkfmt b, context 1

We saw similar messages whilst we were debugging a device driver. The problem was a bus error occurring during an interrupt service routine, caused by a device interrupting when it wasn't expected to, so that when it tried to use some pointers into what had been (once upon a time) a user's data area, it found garbage addresses.

I think the program name (`diff' and `syslogd' in the report given) refers to the process which was executing at the time of the error. If the error is really in an interrupt routine, the name of the current process is a complete red herring. The fact that it's different in these two cases is what makes me think your error is an interrupt problem. Have you installed a new device or device driver recently?

Sorry this isn't very explicit, but it's the best I can do! Hope it helps.

Chris Brown, A.I. Vision Research Unit, Sheffield University
(chris@aivru.sheffield.ac.uk)
guy@uunet.uu.net (Guy Harris) (03/31/89)
>I think the program name (`diff' and `syslogd' in the report given) refers
>to the process which was executing at the time of the error.

This is entirely correct.

>If the error is really in an interrupt routine, the name of the current
>process is a complete red herring.

Precisely.