marmen@is2.bnr.ca (Rob Marmen 1532773) (10/20/89)
We are experiencing an intermittent problem on some of our DN4500s. These machines are equipped with 16 meg of memory, a 348 meg disk drive and an F series monitor.

The machines will mysteriously reboot themselves. They display the rainbow, run diagnostics and bring up the o/s. The error log will sometimes show a memory parity error, but sometimes no error is logged, just the startup. We have trapped a crash in progress, and the error was "memory parity error". However, that workstation had been tried with several different, verified sets of memory boards, so we suspect the message is spurious.

When a machine crashes with users logged in, they have been running a large application with large data files. The most common operation before the crash was saving the data to disk. Needless to say, the users are not happy. The problem is not network or application related: some machines are extremely reliable, while the machine right next to one of them will crash once a day.

Local Apollo support has been doing a fantastic job trying to debug the problem. However, I would like to know whether anyone else has seen it. We're rapidly running out of ideas, and anything at this point will be welcome.

Thanking everyone in advance...

rob...

--
Robert Marmen                   marmen@bnr.ca OR
Bell Northern Research          marmen%bnr.ca@cunyvm.cuny.edu
(613) 763-8244                  My opinions are my own, not BNR's
krowitz%richter@UMIX.CC.UMICH.EDU (David Krowitz) (10/20/89)
Hmm... this may not be the same problem, but here's something to check. We bought a brand new DN3500 with 8MB and a 690MB disk, with no floppy or tape. It, too, would crash mysteriously and would kill users' disk-I/O-intensive jobs with disk errors, yet there were no messages in the error logs.

Our field service personnel found that the disk controller board shipped with the machine (the Western Digital multifunction controller) had been configured with the floppy and/or the tape interface enabled; I don't remember clearly which, or whether it was both. He reconfigured the board, and all has been fine since then.

--
David Krowitz

krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)
danny@idacom.UUCP (Danny Wilson) (10/22/89)
In article <119@bnrgate.bnr.ca>, marmen@is2.bnr.ca (Rob Marmen 1532773) writes:
>
> We are experiencing an intermittent problem on some of our
> dn4500s. These machines are equipped with 16 meg of memory, a 348
> meg disk drive and an f series monitor.
>
> The machines will mysteriously reboot themselves.

We have a similar, but perhaps unrelated, problem on a DSP90 with 3 MB. This node, connected to a 9-track tape drive, does routine backups via a Bourne shell script (using wbak). Once in a while during the backup, the machine will shut itself down in an orderly fashion. The console simply shows "Beginning shutdown sequence"; the system then kills off all the processes and reports "Shutdown is successful".

Mentor has not been able to pinpoint exactly why this happens. Has anyone else seen this?

--
Danny Wilson
IDACOM Electronics            danny@idacom.uucp
Edmonton, Alberta             alberta!idacom!danny
C A N A D A                   X.400 danny@idacom.cs.ubc.cdn
lampi@pnet02.gryphon.com (Michael Lampi) (10/26/89)
Do you have a power line filter, e.g., a surge suppressor or an uninterruptible power supply (UPS)? Some machines filter electrical noise better than others.

Also, have you tried reseating the PALs on the motherboard?

Finally, have you run /systest/ssr_util/lsyserr to dump the system error message file?

Michael Lampi
MDL Corporation
(213) 782-7888
UUCP: {ames!elroy, <routing site>}!gryphon!pnet02!lampi
INET: lampi@pnet02.gryphon.com