dent@unocss.unomaha.edu (dent) (07/24/90)
Hello...

The Student Chapter of ACM here at the Univ. of NE at Omaha owns a VAX 11/750 w/ 8 Megs RAM, Floating Point Accelerator, 2 RM05's, an RM03, a DMF32, and a DELUA (also a TS11, plus other misc. parts).

We had been running VMS on this system for about 4 weeks when "suddenly" we started getting errors about a corrupted memory cache. The 750 then began restarting itself randomly. Each time it dropped to the monitor '>>>' prompt with the PC register displayed, along with error code 04, which is "Interrupt stack not valid or unable to read SCB". Sometimes the machine was able to re-run VMS for a little while, but eventually it did it all again.

Then, as if that weren't enough, the 750 refused even to give the monitor prompt when turned on. As it stands now, the machine prints one '%' on the console when you flip it on, and then hangs. As far as I know, it's supposed to print that 1st '%' when it starts the microcode check. It should then print either an error message (meaning the microcode is bad) or a 2nd '%' if it is good. Then it [normally] puts you in the monitor.

All of these problems seemed to line up chronologically with a board swap we had just performed: we took out a DZ11 board and replaced it with the DMF32 mentioned above (which does DMA, so it doesn't seem likely that it would contribute to an interrupt problem...). Because of the timing, however, we yanked the DMF32 back out, but the problem was still there.

I was flipping through the "VAX Hardware Handbook" (published 1982) and noticed that:

    "Interrupting devices on the UNIBUS are directly vectored through the System Control Block (SCB)."

So I moved the UNIBUS terminator to the slot directly after the memory cards, in effect removing all UNIBUS activity. No change. I put the terminator back in the last UNIBUS slot and then took out all but 1 memory card. No change. I swapped the remaining memory card with each of the other 7 in turn; no change in any case.
We're kind of at the end of our rope now; it really doesn't seem like the UNIBUS has anything to do with the problem. Let me also add that we took out all of the CPU boards and reseated the socketed chips, also with no effect. (The 750 did sit in a warehouse for a while before UNO-ACM acquired it.)

Does the 1-but-not-2 '%' indicate that the microcode test itself is faulty? Has anyone else run into this kind of "random reset" problem? Why should the problems all start after a few weeks of flawless performance?

Any help that anyone would be able to offer would be /greatly/ appreciated by the members of UNO-ACM here! :-)

-/ Dave Caplinger /---------------------------------------------------------
President, Student Chapter of ACM at the University of Nebraska at Omaha
acmpres@zeus.unomaha.edu    ..!uunet!unocss!dent    ACMPRES@UNOMA1
grr@cbmvax.commodore.com (George Robbins) (07/29/90)
In article <3035@unocss.unomaha.edu> dent@unocss.unomaha.edu (dent) writes:
> Hello...
>
> The Student Chapter of ACM here at the Univ. of NE at Omaha owns a VAX
> 11/750, w/ 8 Megs RAM, Floating Point Accelerator, 2 RM05's, an RM03,
> a DMF32, and a DELUA. (also a TS11, plus other misc. parts)
>
> We've been running VMS on this system for about 4 weeks, until "suddenly"
> we started getting errors about a corrupted memory cache. The 750 would
> then start restarting itself randomly, dropping to the monitor '>>>'
>
> All of these problems seemed to align chronologically with a board swap we
> had just performed: we took out a DZ11 board and replaced it with the DMF32
> mentioned above (which does DMA, so it doesn't seem likely that it would
> contribute to an interrupt problem...) Because of the timing, however,
> we yanked the DMF32 back out, but the problem was still there.

Well, DMF32 boards are notorious power suckers, so it may have put your power supply around the bend.

General advice is to pull the CPU boards out one by one, flex them a little, press down on the chips, and reinsert the boards. Then blow the dust out of the system, the power supplies, and the fan assembly/filter. Check voltages, kick twice, cycle power, and so on.

If this doesn't work, call around to used-equipment vendors and see if anyone wants to donate another 750 or a CPU board set to you; the value of bare CPUs has declined awfully close to zero...

--
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)