[comp.sys.apollo] Intermittent DN4500 Problem

marmen@is2.bnr.ca (Rob Marmen 1532773) (10/20/89)

	We are experiencing an intermittent problem on some of our
dn4500s. These machines are equipped with 16 meg of memory, a 348
meg disk drive and an f series monitor.

	The machines will mysteriously reboot themselves. They will
display the rainbow, run diagnostics and bring up the o/s. The 
error log will sometimes show a memory parity error, but sometimes
no error is logged, just the startup. We have trapped a crash in
progress and the error was "memory parity error". However, the
workstation has been fitted with several different, verified sets of
memory boards, so we suspect that the message is spurious.

	When a machine has crashed with users logged in, the users had
been running a large application with large data files. The most
common operation before the crash was a save of the data to disk.
Needless to say, the users are not happy.

	The problem is not network or application related. Some
machines are extremely reliable, while the machine next to one of
them will crash once a day.

	Local Apollo support has been doing a fantastic job trying to
debug the problem. However, I would like to know if anyone else
has seen the problem. We're rapidly running out of ideas.
Anything at this point will be welcome.

Thanking everyone in advance...    rob...
  
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
| Robert Marmen             marmen@bnr.ca  OR             |
| Bell Northern Research    marmen%bnr.ca@cunyvm.cuny.edu |
| (613) 763-8244         My opinions are my own, not BNRs |

krowitz%richter@UMIX.CC.UMICH.EDU (David Krowitz) (10/20/89)

Hmm ... this may not be the same problem, but here's something to
check. We bought a brand new DN3500 with 8MB and a 690MB disk. No
floppy or tape. It, too, would crash mysteriously and would kill users'
disk-I/O-intensive jobs with disk errors. Yet, there were no messages
in the error logs. Our field service person found that the disk
controller board that had been shipped with the machine (the Western
Digital multifunction controller) had been configured with the floppy,
the tape, or maybe both enabled (I don't remember clearly which). He
re-configured the board, and all has been fine since then.


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

danny@idacom.UUCP (Danny Wilson) (10/22/89)

In article <119@bnrgate.bnr.ca>, marmen@is2.bnr.ca (Rob Marmen 1532773) writes:
> 
> 	We are experiencing an intermittent problem on some of our
> dn4500s. These machines are equipped with 16 meg of memory, a 348
> meg disk drive and an f series monitor.
> 
> 	The machines will mysteriously reboot themselves. 

We have a similar, but perhaps unrelated, problem on a DSP90 with 3 MB.
This node, connected to a 9-track tape, does routine backups via
a Bourne shell script (using wbak).

Once in a while during the backup, the machine will shut itself down
in an orderly fashion.  The console simply says "Beginning shutdown
sequence", kills off all the processes, and proclaims "Shutdown is
successful".

Mentor has not been able to pinpoint exactly why this happens.
Anyone else seen this?

-- 
Danny Wilson
IDACOM Electronics		danny@idacom.uucp
Edmonton, Alberta		alberta!idacom!danny
C A N A D A		X.400	danny@idacom.cs.ubc.cdn

lampi@pnet02.gryphon.com (Michael Lampi) (10/26/89)

Do you have a power line filter, e.g., a surge suppressor or an
uninterruptible power supply (UPS)?  Some machines do a better job of
filtering electrical noise than others.  Also, have you tried reseating
the PALs on the motherboard? Finally, have you run
/systest/ssr_util/lsyserr to dump the system error message file?

Michael Lampi
MDL Corporation
(213) 782-7888

UUCP: {ames!elroy, <routing site>}!gryphon!pnet02!lampi
INET: lampi@pnet02.gryphon.com