[comp.os.vms] Help? Dead VAX 11/750 ...

dent@unocss.unomaha.edu (dent) (07/24/90)

Hello...

The Student Chapter of ACM here at the Univ. of NE at Omaha owns a VAX
11/750, w/ 8 Megs RAM, Floating Point Accelerator, 2 RM05's, an RM03,
a DMF32, and a DELUA. (also a TS11, plus other misc. parts)

We've been running VMS on this system for about 4 weeks, until "suddenly"
we started getting errors about a corrupted memory cache.  The 750 would
then start restarting itself randomly, dropping to the monitor '>>>'
prompt with the PC register displayed, as well as error code 04 which
is "Interrupt stack not valid or unable to read SCB".  Sometimes the machine
was able to re-run VMS for a little while, but eventually did it all again.
Then, if that wasn't enough, the 750 refused to even give the monitor
prompt when turned on.  As it stands now, the machine prints one '%' on
the console when you flip it on, and then hangs.  As far as I know, it's
supposed to print that 1st '%' when it starts the microcode check, and then
either an error message (meaning the microcode is bad), or a 2nd '%' if
it is good.  Then it [normally] puts you in the monitor.

All of these problems seemed to align chronologically with a board swap we
had just performed: we took out a DZ11 board and replaced it with the DMF32
mentioned above (which does DMA, so it doesn't seem likely that it would
contribute to an interrupt problem...)  Because of the timing, however,
we yanked the DMF32 back out, but the problem was still there.

I was flipping through the "VAX Hardware Handbook" (published 1982) and
noticed that:

	"Interrupting devices on the UNIBUS are directly
	vectored through the System Control Block (SCB)."

so I wound up moving the UNIBUS terminator directly after the memory cards,
to in effect remove all of the UNIBUS activity.  No change.  I put the
terminator back in the last UNIBUS slot, and then took out all but 1 memory
card.  No change.  I swapped the remaining memory card with each of the other
7 in turn; no change in any case.

We're kind of at the end of our rope now; it really doesn't seem like UNIBUS
has anything to do with the problem.  Let me also add that we took out all of
the CPU boards and reseated the socketed chips, also with no effect.  (The
750 did sit in a warehouse for a while before UNO-ACM acquired it.)

Does the 1-but-not-2 '%' indicate that the microcode test itself is faulty?
Has anyone else run into this kind of "random reset" problem?  Why should
the problems all start after a few weeks of flawless performance?  Any help
that anyone would be able to offer would be /greatly/ appreciated by the
members of UNO-ACM here! :-)

-/ Dave Caplinger /---------------------------------------------------------
  President, Student Chapter of ACM at the University of Nebraska at Omaha 
  acmpres@zeus.unomaha.edu       ..!uunet!unocss!dent       ACMPRES@UNOMA1

grr@cbmvax.commodore.com (George Robbins) (07/29/90)

In article <3035@unocss.unomaha.edu> dent@unocss.unomaha.edu (dent) writes:
> Hello...
> 
> The Student Chapter of ACM here at the Univ. of NE at Omaha owns a VAX
> 11/750, w/ 8 Megs RAM, Floating Point Accelerator, 2 RM05's, an RM03,
> a DMF32, and a DELUA. (also a TS11, plus other misc. parts)
> 
> We've been running VMS on this system for about 4 weeks, until "suddenly"
> we started getting errors about a corrupted memory cache.  The 750 would
> then start restarting itself randomly, dropping to the monitor '>>>'
> 
> All of these problems seemed to align chronologically with a board swap we
> had just performed: we took out a DZ11 board and replaced it with the DMF32
> mentioned above (which does DMA, so it doesn't seem likely that it would
> contribute to an interrupt problem...)  Because of the timing, however,
> we yanked the DMF32 back out, but the problem was still there.

Well, DMF32 boards are notorious power suckers, so it may have put your
power supply around the bend.  General advice is to 1-by-1 pull out the
CPU boards, flex a little, shove on chips, replace boards, blow dust out
of system, power supplies and fan assembly/filter, check voltages, kick
twice, cycle power and so on.

If this doesn't work, call around to used equipment vendors and see if anyone
wants to donate another 750 or a CPU boardset; the value of bare CPUs has
declined awfully close to zero...

-- 
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)