[comp.unix.xenix] Help! Mysterious System Lock-Up

noel@ubbs-nh.MV.COM (N. Del More) (09/24/89)

I am experiencing a mysterious problem which causes the system to lock up,
requiring a power-down in order to reboot the system. There are no error
messages displayed or logged.  The configuration of the system is as
follows:

	80386 25 MHz Motherboard with an AMI BIOS
	4 MB 80 ns DRAM
	2 Serial Ports
	1 Parallel Port
	DTC 5280 Hard Disk/Floppy Controller
	Adaptec AHA-1540a SCSI Host Interface
	Video 7 EGA Deluxe Video Card
	Digiboard COM 8/i Intelligent Serial Board
	Archive SC402 QIC-02 Tape Drive Interface
	Archive VP 150 Tape Drive
	(1) Newbury NDR-4380 SCSI Hard Disk
	(1) Newbury NDR-3380 SCSI Hard Disk Controller
	SCO Xenix 2.3.3GT


Now while I'm not about to swear an oath on a stack of bibles, I do not
suspect a compatibility problem with anything except for the Adaptec
Host Adapter and/or a problem with the SCSI hard disk drives, as I have
experienced few if any problems with any of the hardware previously and
the only really new items are the Adaptec and the SCO Xenix 2.3.3GT
Operating System.

I have the Adaptec configured as per the defaults as detailed in the
manual, and the hard disks are set up with the 4380 configured as Target 0
and the 3380 as Target 1.  The 4380 is divided up into 3 ~100 MB
filesystems, and the 3380 is currently divided into 2 ~150 MB filesystems
(I had previously set it up the same as the 4380 and experienced the same
problem).

The problem that I have encountered is a consistent lock-up of the system
whenever I write "X" quantity of files to the last filesystem on the 3380
(/usr5 and previously /usr6).  While the crashing is consistent, the
amount of data written to the filesystem is not; it has varied from
around 23% of capacity to 70% (as obtained using df -v -i).

The symptoms are a sudden lock-up of the system with the LED on the
Adaptec and Drive remaining on.  I've reformatted and verified the drive
three or four times using the Adaptec's ROM based format routine and no
errors have been reported.  After a reboot, fsck reports no errors except
those which would be normal after a system crash.

I've checked all devices and ensured that there are no address, interrupt,
or DMA channel conflicts.  I've triple-checked the controller, cables,
termination (including the stupid little fuse), etc., and according to the
drive manuals, the drive does conform to the SCSI standards required by
the Adaptec controller.

The only other thing I can report is that the adfmt command will NOT
format the drive as claimed by SCO; attempts to format the drive using
this command fail with an "I/O failure".  Additionally, the badtrk
command does not support SCSI drives.

I really hate bad-mouthing SCO, but I find it strange how they can claim
to support a specific product (the Adaptec) and yet not support it in any
meaningful way while running to the bank with my $$$'s.

Anyway, does anyone have any ideas what the problem might be?  

Roy Neese... are you out there?!!!

Needless to say, any help is greatly appreciated.

Noel
---
Noel B. Del More             |                             decvax!ubbs-nh!noel
17 Meredith Drive            |                             noel@ubbs-nh.mv.com 
Nashua, New Hampshire  03063 | It's unix me son!  `taint spozed tah make cents 

caf@omen.UUCP (Chuck Forsberg) (09/27/89)

In article <131@ubbs-nh.MV.COM> noel@ubbs-nh.MV.COM (N. Del More) writes:
:The symptoms are a sudden lock-up of the system with the LED on the
:Adaptec and Drive remaining on.  I've reformatted and verified the drive
:three or four times using the Adaptec's ROM based format routine and no
:errors have been reported.  After a reboot, fsck reports no errors except
:those which would be normal after a system crash.

I've seen something vaguely similar.  I'm running 3.2, also
with an Adaptec 1540 running the secondary disk.  About twice
in the last month or so the Adaptec became wedged, with the
activity light on the Wren V on steady.  The rest of the
system appeared to run normally (activity on the main (ESDI)
disk, etc.).

I did not see this problem until I made the SCSI drive secondary.
The first 6 months I used the Wren as the only drive, and shall
revert to that configuration as soon as the 3.x dev sys can compile
all my products.

I have a suspicion the problem is related to heat buildup on the
controller board(s).  I haven't had a wedge since I stopped laying
papers/manuals on top of the i/o boards.