ehrlich@shire.cs.psu.edu (Daniel R. Ehrlich) (08/24/89)
A short history of our problem:

July 13, 1989     System installed by Sun FE. SunOS 4.0.3 installed.
July 14-present   System crashes frequently (2-3 times per day) with one of
                  the following errors:
                      "BAD TRAP. Kernel write data fault."
                      "Watchdog reset."

A Sun UNIX Technical Support Engineer has been logging in to our system on occasion to look at the crash dumps generated by the "BAD TRAP" crashes. I know of no way to force a dump after a "Watchdog reset", so there are no dumps for that error.

It has been pointed out to Sun that in the module machdep.c (at least in SunOS 4.0) there are #ifdef's in the fault handling code that depend on the CPU type (4_260 vs. 4_110) the module is compiled for. Unfortunately, Sun does not supply the source for this module in the binary distributions, so it is not possible to determine which type of CPU machdep.o was compiled for. The gut feeling around here is that this is a possible cause of the "BAD TRAP" errors.

The "Watchdog reset" errors seem to occur when both 7053 disk controllers are busy. One can usually generate a "Watchdog reset" in single-user mode by running fsck(8) in parallel on disks attached to the two controllers. One might conclude that the 7053 controller has a timing problem and is not being a good VME bus neighbor. The other, more ominous, possibility is that a configuration with more than one 7053 has never been fully tested in a machine with a 501-1491 CPU board, which has a faster clock than the older 4/260 CPUs.

For reference, here is our current configuration. Please note that, as shipped from Sun, the ALM-II and the SCSI adapter were not installed. Also, please note that both types of crashes were occurring BEFORE these boards were installed and have continued unabated since.
Slot  Part No.      Board Description
1     501-1491-05   Sun 4/200 CPU w/FPU2
2
3     501-1203-04   ALM-II Sixteen Port Async
4     501-1550-03   Xylogics 472 Tape Controller (Fujitsu 1600/6250 tape drive attached)
5     501-1249-04   7053 SMD Disk Controller (Two NEC D2363 disks attached)
6     501-1254-03   32Mb Memory Array
7     501-1249-03   7053 SMD Disk Controller (One NEC D2363 disk attached)
8     501-1217-03   SUN 3 VME SCSI Controller
      501-1220-01   VME 3x2 Adapter (ExaByte 8mm tape drive attached)

Any ideas, thoughts, or comments would be appreciated.
-- 
Dan Ehrlich
Computer Science Department
The Pennsylvania State University
333 Whitmore Laboratory
University Park, PA 16802