[comp.unix.ultrix] uVax II problems

smaug@eng.umd.edu (Kurt J. Lidl) (06/01/90)

We have a uVax-II with a third party disk controller with two 320meg
Maxtor disk drives attached to this controller.  Additionally,
a third party 8 meg memory board is in the system.
After some downtime (for re-arrangement of the area where the
machine lives), I cannot bring the machine back up.

I have two different versions of Ultrix running on the machine -- the
first drive contains a copy of Ultrix v3.0, the second disk drive
contains Ultrix 3.1.  Both have run fine in the past.  What follows
is a copy of the "typical" crash when I try to re-boot it.  Not
knowing that much about Digital's hardware, I am in need of help.

What confuses me is why the machine gets so far along in the startup
before it crashes and burns.  Any ideas out there?

Thanks in advance.
----
Ultrixboot (using VMB version 13)

Loading (a)vmunix ...

Sizes:
text = 457336
data = 88704
bss  = 296112
Starting at 0x29eb

Ultrix-32 V3.0 (Rev 64) System #1: Sat Jun  3 00:24:34 EDT 1989
real mem  = 9433088
avail mem = 7112704
using 460 buffers containing 943104 bytes of memory
MicroVAX-II with an FPU
Q22 bus
uda0 at uba0
uq0 at uda0 csr 172150 vec 774, ipl 17
uda1 at uba0
uq17 at uda1 csr 160334 vec 770, ipl 17
qe0 at uba0 csr 174440 vec 764, ipl 17
ra1 at uq0 slave 1 (RA81)
ra0 at uq0 slave 0 (RA81)
ra3 at uq17 slave 1 (RX50)
ra2 at uq17 slave 0 (RX50)
Automatic reboot in progress...
Fri May  4 17:36:50 EDT 1990
/dev/ra1a: 492 files, 5040 used, 4007 free (119 frags, 486 blocks)
/dev/rra1g: umounted cleanly
/dev/rra1d: 59 files, 256 used, 8791 free (47 frags, 1093 blocks)
/dev/rra1f: 263 files, 5404 used, 111235 free (203 frags, 13879 blocks)
/dev/rra1e: 2 files, 9 used, 48934 free (14 frags, 6115 blocks)
Fri May  4 17:38:11 EDT 1990
System supports 2 users.
check quotas: done.
savecore: checking for dump...dump exists
System went down at Fri May  4 17:29:11 1990
saving elbuf
Saving 26624 bytes of image in /usr/adm/syserr/elbuffer
local daemons: portmap biod routed syslog sendmail.

machine check 80: read bus error, VAP is virtual
        sumpar  = 80
        most recent virtual addr        =8798
        internal state  =0
        pc      = 8794
        psl     = 3c00000

        mser    = 241
        cear    = 1e30
        dear    = 1e30

panic: mchk
sp      = 80001b34      ap      = 80001bac      fp      = 80001b90
pc      = 80050dff      ksp     = 80000000      usp     = 7fffe020
isp     = 80001b04      p0pr    = 80dc6e00      p0lr    = 000000d4
p1br    = 805c7200      p1lr    = 001fffe6      sbr     = 00083c54
slr     = 000096a0      pcbb    = 003e2000      scbb    = 00000600
ipl     = 0000001f      astlvl  = 00000004      sisr    = 00000000
iccs    = 00000040

interrupt stack:
80001b04: 80079a6f      00000003        00000000        00000000
80001b14: 800bd03c      800bd008        800bd0c8        80079a75
80001b24: 000000fc      00000000        00000020        00000000
80001b34: 00000000 *    2fff0000        80001bac ap     80001b90 fp
80001b44: 80050dff pc   00000000 r0     0000001f r1     00000001 r2
80001b54: 7fffe1e3 r3   00000026 r4     0000000f r5     00003295 r6
80001b64: 00000000 r7   7fffe364 r8     0000b3fc r9     80001be8 r10
80001b74: 0000000a r11  00000001        80079a70        00000000
80001b84: 0000008c      00000000        00000000        00000000 *
80001b94: 2c000000      80001bc8 ap     80001bb4 fp     80053cef pc
80001ba4: 00000012 r10  0000bdff r11    00000001 #      80001be8
80001bb4: 00000000 *    20000000        7fffe03c ap     7fffe020 fp
80001bc4: 80001f94 pc   00000001 #      80001be8        0000bdec
80001bd4: 0000bcab      00000001        7fffe1e3        00000004
80001be4: 00000000      0000000c        00000080        00008798
80001bf4: 00000000      00008794        03c00000
kernel stack:
80000000:
syncing disks... done

dumping to dev 909, offset 42603
Dump of 18418 pages successful

--
/* Kurt J. Lidl (smaug@eng.umd.edu) | Unix is the answer, but only if you */
/* UUCP: uunet!eng.umd.edu!smaug    | phrase the question very carefully. */

alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) (06/02/90)

} [ Crash dump deleted for space savings... ]

	The machine check 80 is (as it says) a failure to read
	from memory.  Generally this is due to a parity error,
	but it could be something worse.  The VAP listed (what-
	ever that is) is a virtual memory address.  If you can
	turn it into a physical address you should be able to
	figure out if the error is on the 8 MB board or the 1 MB
	on the CPU board.
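
	As a rough illustration of the first step of that translation, the
	sketch below splits the reported virtual address (8798 hex) into
	its VAX region, virtual page number, and byte offset, assuming the
	standard VAX layout of 512-byte pages with address bits 31:30
	selecting the region.  The function names are made up for this
	example.  The last step needs the page frame number from the page
	table entry, which is not in the console output, so it is shown
	only as a helper that takes a hypothetical frame number.

```python
# Split a 32-bit VAX virtual address into region, page number and offset.
# Assumes the standard VAX layout: 512-byte pages, bits 31:30 = region.
PAGE_SHIFT = 9                       # 2**9 = 512-byte pages
REGIONS = {0: "P0", 1: "P1", 2: "S0", 3: "reserved"}


def split_vax_va(va):
    """Return (region name, virtual page number, byte offset in page)."""
    region = REGIONS[(va >> 30) & 0x3]      # bits 31:30
    vpn = (va >> PAGE_SHIFT) & 0x1FFFFF     # bits 29:9
    offset = va & 0x1FF                     # bits 8:0
    return region, vpn, offset


def physical_address(pfn, offset):
    """Combine a page frame number with the byte offset.

    The pfn must come from the page table entry for the page; it is
    not printed in the crash output, so any value passed here is
    hypothetical.
    """
    return (pfn << PAGE_SHIFT) | offset


if __name__ == "__main__":
    region, vpn, offset = split_vax_va(0x8798)
    print(region, hex(vpn), hex(offset))
```

	For 8798 this gives region P0 (a user process address), page 43 hex,
	offset 198 hex.  Page 43 hex is below the p0lr value of d4 hex shown
	in the register dump, so the address lies inside the process's
	mapped P0 region.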

	Since the system was getting fairly far along in booting
	you could probably boot the standalone system and get it
	running.  That way at least you can check the file systems
	and run a simple memory test:

		dd if=/dev/mem of=/dev/null

	You might also be able to boot the customer diagnostic
	tape, which might have something on it to test memory.
-- 
Alan Rollow				alan@nabeth.enet.dec.com

johnd@physiol.su.oz.au (John Dodson) (06/02/90)

In <1990Jun1.151251.24062@eng.umd.edu> smaug@eng.umd.edu (Kurt J. Lidl) writes:

>We have a uVax-II with a third party disk controller with two 320meg
>Maxtor disk drives attached to this controller.  Additionally,
>a third party 8 meg memory board is in the system.
>After some downtime (for re-arrangement of the area where the
>machine lives), I cannot bring the machine back up.

>What confuses me is why the machine gets so far along in the startup
>before it crashes and burns.  Any ideas out there?

>machine check 80: read bus error, VAP is virtual
>        sumpar  = 80
>        most recent virtual addr        =8798
>        internal state  =0
>        pc      = 8794
>        psl     = 3c00000

>        mser    = 241

You have a memory problem (machine check 80).  The problem is in
additional memory board 1 (mser = 241); if it were mser = 2C1
it would be board 2.
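
To see how the two mser values differ, here is a small sketch that lists
which bits are set in each.  It only does the arithmetic; it does not
claim what each bit means (that is in the KA630 manual), and the helper
name is made up for this example.

```python
def set_bits(value):
    """Return the positions of the 1 bits in value, lowest bit first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


board1 = 0x241   # mser value from the crash output quoted above
board2 = 0x2C1   # value said above to indicate board 2

if __name__ == "__main__":
    print(set_bits(board1))            # bits set in 241 hex
    print(set_bits(board2))            # bits set in 2C1 hex
    print(set_bits(board1 ^ board2))   # bits that differ between the two
```

The two values differ only in bit 7 (80 hex); bits 0, 6 and 9 are set in
both.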

It could be the board itself (i.e. a bad chip) or the cable at the rear of
the boards.  If this cable is not very, very short ;-) you will have
problems (likewise if it is not pushed all the way in).

Try the cable first, then replace the board (or remove it, but then I don't
know if your kernel will manage with only the 1 MB on the CPU board).
Most of these 3rd party boards come with a 5-year or lifetime ?;-) warranty.
For info on uVAX II hardware you need the KA630 CPU manual (I can't recall
the DEC part no. offhand).

johnd@physiol.su.oz.au

smaug@eng.umd.edu (Kurt J. Lidl) (06/04/90)

In article <1990Jun1.151251.24062@eng.umd.edu> I wrote:
>We have a uVax-II [... with a problem ...]
>[crash dump copy deleted]

Thanks to:
Alan Rollow	alan@nabeth.enet.dec.com
John Dodson	johnd@physiol.su.oz.au

Both pointed out that the machine check was a memory problem.

Solution:
	I pulled all the boards, inspected the contacts and re-seated
them firmly.  Everything started magically working again.  I guess
that the movement of items while re-arranging things jogged the
computer...  Oh well.  Live and learn.

Thanks again.
--
/* Kurt J. Lidl (smaug@eng.umd.edu) | Unix is the answer, but only if you */
/* UUCP: uunet!eng.umd.edu!smaug    | phrase the question very carefully. */