smaug@eng.umd.edu (Kurt J. Lidl) (06/01/90)
We have a uVax-II with a third-party disk controller, with two 320 MB Maxtor disk drives attached to it. Additionally, a third-party 8 MB memory board is in the system. After some downtime (for rearrangement of the area where the machine lives), I cannot bring the machine back up. I have two different versions of Ultrix on the machine: the first drive contains a copy of Ultrix v3.0, and the second drive contains Ultrix 3.1. Both have run fine in the past.

What follows is a copy of the "typical" crash when I try to reboot it. Not knowing that much about Digital's hardware, I am in need of help. What confuses me is why the machine gets so far along in the startup before it crashes and burns. Any ideas out there?

Thanks in advance.

----
Ultrixboot (using VMB version 13)
Loading (a)vmunix ...
Sizes:
text = 457336
data = 88704
bss = 296112
Starting at 0x29eb
Ultrix-32 V3.0 (Rev 64) System #1: Sat Jun 3 00:24:34 EDT 1989
real mem = 9433088
avail mem = 7112704
using 460 buffers containing 943104 bytes of memory
MicroVAX-II with an FPU
Q22 bus
uda0 at uba0
uq0 at uda0 csr 172150 vec 774, ipl 17
uda1 at uba0
uq17 at uda1 csr 160334 vec 770, ipl 17
qe0 at uba0 csr 174440 vec 764, ipl 17
ra1 at uq0 slave 1 (RA81)
ra0 at uq0 slave 0 (RA81)
ra3 at uq17 slave 1 (RX50)
ra2 at uq17 slave 0 (RX50)
Automatic reboot in progress...
Fri May 4 17:36:50 EDT 1990
/dev/ra1a: 492 files, 5040 used, 4007 free (119 frags, 486 blocks)
/dev/rra1g: umounted cleanly
/dev/rra1d: 59 files, 256 used, 8791 free (47 frags, 1093 blocks)
/dev/rra1f: 263 files, 5404 used, 111235 free (203 frags, 13879 blocks)
/dev/rra1e: 2 files, 9 used, 48934 free (14 frags, 6115 blocks)
Fri May 4 17:38:11 EDT 1990
System supports 2 users.
check quotas: done.
savecore: checking for dump...dump exists
System went down at Fri May 4 17:29:11 1990
saving elbuf
Saving 26624 bytes of image in /usr/adm/syserr/elbuffer
local daemons: portmap biod routed syslog sendmail.
machine check 80: read bus error, VAP is virtual
	sumpar = 80
	most recent virtual addr =8798
	internal state =0
	pc = 8794
	psl = 3c00000
	mser = 241
	cear = 1e30
	dear = 1e30
panic: mchk
sp = 80001b34 ap = 80001bac fp = 80001b90 pc = 80050dff
ksp = 80000000 usp = 7fffe020 isp = 80001b04
p0pr = 80dc6e00 p0lr = 000000d4 p1br = 805c7200 p1lr = 001fffe6
sbr = 00083c54 slr = 000096a0 pcbb = 003e2000 scbb = 00000600
ipl = 0000001f astlvl = 00000004 sisr = 00000000 iccs = 00000040
interrupt stack:
80001b04: 80079a6f 00000003 00000000 00000000
80001b14: 800bd03c 800bd008 800bd0c8 80079a75
80001b24: 000000fc 00000000 00000020 00000000
80001b34: 00000000 * 2fff0000 80001bac ap 80001b90 fp
80001b44: 80050dff pc 00000000 r0 0000001f r1 00000001 r2
80001b54: 7fffe1e3 r3 00000026 r4 0000000f r5 00003295 r6
80001b64: 00000000 r7 7fffe364 r8 0000b3fc r9 80001be8 r10
80001b74: 0000000a r11 00000001 80079a70 00000000
80001b84: 0000008c 00000000 00000000 00000000 *
80001b94: 2c000000 80001bc8 ap 80001bb4 fp 80053cef pc
80001ba4: 00000012 r10 0000bdff r11 00000001 # 80001be8
80001bb4: 00000000 * 20000000 7fffe03c ap 7fffe020 fp
80001bc4: 80001f94 pc 00000001 # 80001be8 0000bdec
80001bd4: 0000bcab 00000001 7fffe1e3 00000004
80001be4: 00000000 0000000c 00000080 00008798
80001bf4: 00000000 00008794 03c00000
kernel stack:
80000000:
syncing disks... done
dumping to dev 909, offset 42603
Dump of 18418 pages successful
--
/* Kurt J. Lidl (smaug@eng.umd.edu) | Unix is the answer, but only if you */
/* UUCP: uunet!eng.umd.edu!smaug | phrase the question very carefully. */
alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) (06/02/90)
} [ Crash dump deleted for space savings... ]
The machine check 80 is (as it says) a failure to read
from memory. Generally this is due to a parity error,
but it could be something worse. The VAP listed
(whatever that is) is a virtual memory address. If you can
turn it into a physical address you should be able to
figure out if the error is on the 8 MB board or the 1 MB
on the CPU board.
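Turning the virtual address into a physical one follows the standard VAX translation: 512-byte pages, the top two bits selecting the region, and a page table entry (PTE) whose low 21 bits are the page frame number. A minimal sketch of that arithmetic, assuming the KA630's on-board 1 MB occupies the bottom of physical memory with the expansion board above it (this layout is not confirmed in the thread; check the KA630 manual). The function names are made up for illustration, and actually fetching the PTE still means digging it out of the crash dump.

```sh
#!/bin/sh
# Sketch: split a VAX virtual address into region, page number and
# byte offset, and turn a page table entry (PTE) into a physical address.
# VAX facts used: pages are 512 bytes; VA bits 31:30 select the region
# (00 = P0, 01 = P1, 10 = S0); PTE bit 31 is the valid bit and bits 20:0
# hold the page frame number (PFN).
# ASSUMPTION: the KA630's 1 MB of on-board memory is physical 0-0xFFFFF,
# with the expansion board above it. Check this in the KA630 manual.

va_region() {
    case $(( ($1 >> 30) & 3 )) in
        0) echo P0 ;;
        1) echo P1 ;;
        2) echo S0 ;;
        *) echo reserved ;;
    esac
}

# Virtual page number (VA bits 29:9) and byte offset within the page.
va_vpn()    { echo $(( ($1 >> 9) & 0x1FFFFF )); }
va_offset() { echo $(( $1 & 0x1FF )); }

# Address of a P0 PTE: P0BR + 4 * VPN. This is itself a system (S0)
# virtual address, so reading the PTE means finding it in the crash dump.
p0_pte_addr() { printf '0x%X\n' $(( $1 + 4 * $2 )); }

# pte_to_pa PTE VA: physical address, or an error if the PTE is invalid.
pte_to_pa() {
    if [ $(( ($1 >> 31) & 1 )) -eq 0 ]; then
        echo "invalid PTE" >&2
        return 1
    fi
    printf '0x%X\n' $(( (($1 & 0x1FFFFF) << 9) | ($2 & 0x1FF) ))
}

# which_board PA: classify a physical address under the assumed layout.
which_board() {
    if [ $(( $1 )) -lt $(( 0x100000 )) ]; then
        echo "CPU board"
    else
        echo "expansion board"
    fi
}
```

For the crash above, `va_region 0x8798` prints `P0`, `va_vpn 0x8798` prints `67` and `va_offset 0x8798` prints `408` (0x198). Using the value labelled p0pr in the dump, which appears to be P0BR, `p0_pte_addr 0x80dc6e00 67` prints `0x80DC6F0C`. The PTE value in `pte_to_pa 0x80000A00 0x8798` (giving `0x140198`) is invented purely to show the arithmetic.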
Since the system was getting fairly far along in booting,
you could probably boot the standalone system and get it
running. That way at least you can check the file systems
and run a simple memory test:
dd if=/dev/mem of=/dev/null
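To narrow the fault to one board, the same dd test can be restricted to a byte range of /dev/mem with dd's standard `skip=` and `count=` operands. A sketch, assuming (as above, unconfirmed) that the CPU board's 1 MB sits at the bottom of physical memory; `dd_range` is a hypothetical helper that only prints the commands, so they can be reviewed and then piped to sh. On a machine with a real fault, expect the read of the bad range to panic it with the same machine check.

```sh
#!/bin/sh
# Sketch: print a dd command that reads one byte range of physical
# memory through /dev/mem. Commands are printed, not run.
# ASSUMPTION: CPU-board memory is the first 1 MB of physical memory,
# with the third-party expansion board above it.

BS=512   # VAX page size; START and LEN should be multiples of it

# dd_range START [LEN]   (bytes; omit LEN to read to the end of memory)
dd_range() {
    start=$1 len=$2
    cmd="dd if=/dev/mem of=/dev/null bs=$BS skip=$(( start / BS ))"
    if [ -n "$len" ]; then
        cmd="$cmd count=$(( len / BS ))"
    fi
    echo "$cmd"
}

ONE_MB=1048576
dd_range 0 "$ONE_MB"    # CPU board only (assumed layout)
dd_range "$ONE_MB"      # everything above 1 MB: the expansion board
```

The two calls at the end print `dd if=/dev/mem of=/dev/null bs=512 skip=0 count=2048` and `dd if=/dev/mem of=/dev/null bs=512 skip=2048`.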
You might also be able to boot the customer diagnostic
tape, which might have something on it to test memory.
--
Alan Rollow alan@nabeth.enet.dec.com
johnd@physiol.su.oz.au (John Dodson) (06/02/90)
In <1990Jun1.151251.24062@eng.umd.edu> smaug@eng.umd.edu (Kurt J. Lidl) writes:
>We have a uVax-II with a third party disk controller with two 320meg
>Maxtor disk drives attached to this controller. Additionally,
>a third party 8 meg memory board is in the system.
>After some downtime (for re-arrangment of the area where the
>machine lives), I cannot bring the machine back up.
>What confuses me is why the machine gets so far along in the startup
>before it crashes and burns. Any ideas out there?
>machine check 80: read bus error, VAP is virtual
> sumpar = 80
> most recent virtual addr =8798
> internal state =0
> pc = 8794
> psl = 3c00000
> mser = 241

You have a memory problem (mchk 80). The problem is in additional memory board 1 (mser = 241); if it were mser = 2C1, it would be board 2.

It could be the board itself (i.e. a bad chip) or the cable at the rear of the boards. If this cable is not very, very short ;-) you will have problems, and likewise if it is not pushed all the way in. Try the cable first, then replace the board (or remove it, but then I don't know if your kernel will manage with only the 1 MB on the CPU board).

Most of these third-party boards come with a 5-year or lifetime(?) ;-) warranty. For info on MicroVAX II hardware you need the KA630 CPU manual (I can't recall the DEC part number offhand).

johnd@physiol.su.oz.au
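The two MSER values quoted in the reply above (241 for board 1, 2C1 for board 2) differ in exactly one bit, which an XOR makes visible. A minimal sketch; the rule that bit 0x80 selects board 2 is inferred only from those two example values, not from KA630 documentation, and the helper names are hypothetical.

```sh
#!/bin/sh
# Sketch: compare MSER values. MSER is printed in hex in the dump,
# so pass it with a 0x prefix (e.g. 0x241).
# ASSUMPTION: board 2 is indicated by bit 0x80, the only bit that
# differs between the two example values 0x241 and 0x2C1.

# diff_bits A B: bits that differ between two register values.
diff_bits() { printf '0x%X\n' $(( $1 ^ $2 )); }

# mser_board MSER: "1" or "2" under the assumption above.
mser_board() {
    if [ $(( $1 & 0x80 )) -ne 0 ]; then echo 2; else echo 1; fi
}
```

`diff_bits 0x241 0x2C1` prints `0x80`; `mser_board 0x241` prints `1`, matching the reply's reading of this crash.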
smaug@eng.umd.edu (Kurt J. Lidl) (06/04/90)
In article <1990Jun1.151251.24062@eng.umd.edu> I wrote:
>We have a uVax-II [... with a problem ...]
>[crash dump copy deleted]

Thanks to:
Alan Rollow	alan@nabeth.enet.dec.com
John Dodson	johnd@physiol.su.oz.au

Both pointed out that it was a memory problem (machine check 80).

Solution: I pulled all the boards, inspected the contacts, and reseated them firmly. Everything started magically working again. I guess that moving things around during the rearrangement jogged the computer... Oh well. Live and learn.

Thanks again.