[comp.unix.xenix] Disk Gobble-d-gook

kessler%cons.utah.edu@wasatch.utah.edu (Robert R. Kessler) (05/16/89)

One of our customers had a strange occurrence, and I was wondering if
anyone else has seen it.  They are running an IBM PS/2 Model 80 with an
8-port Hostess board and a Mountain tape drive.  They have 4 MB of
memory and the 115 MB disk drive.  We haven't upgraded them to 2.3
yet, so they are running the last 2.2 version of SCO Xenix.

One evening, they were quitting for the day and doing their nightly
backup.  This is accomplished by getting everyone out of the system,
doing a shutdown, then logging in as backup and putting in the tape.
When the tar finished, she pulled the tape out and suddenly the system
just shut itself down.  Odd.  Anyway, she then rebooted (doing an
fsck, which she said found lots of problems -- although we don't have
any specific details) and it seemed to come up fine.  She then started
"NIGHTRUN" which does various nightly tasks.  It printed a few pages
and then the system crashed again.

We then did some investigating and discovered that the disk had become
nearly totally trashed.  Some directories and their files were totally
missing.  Other files had sizes of 4 gigabytes, entries that should have
been directories showed type c (character special) instead of d, etc.
It was really a mess.

Luckily, the backup that she had done was successful and we were able
to recover the various parts of the system that were trashed.
However, I am worried that there is something lurking there that could
bite us again.  Could it be a hardware problem?  Power problem?  Does
anyone have any ideas?  We will be calling SCO today for their advice.

Thanks.
B.

root@consult.UUCP (Super user) (05/17/89)

In article <1839@wasatch.utah.edu> kessler%cons.utah.edu@wasatch.utah.edu (Robert R. Kessler) writes:
>One of our customers had a strange occurrence that I was wondering if
>anyone else has seen.  They are running an IBM PS/2 Model 80, with an
>yet, so they are running the last 2.2 version of SCO Xenix.
>We then did some investigating and discovered that the disk had become
>nearly totally trashed.  Some directories and their files were totally
>missing.  Other files had sizes of 4 Gigabytes, their status was
>bite us again.  Could it be a hardware problem?  Power problem?  Does
>anyone have any ideas?  We will be calling SCO today for their advice.


Here is one idea.  Release 2.2.2 and earlier had a problem with the
dskinit program run when the system was installed.  It caused a nasty
little problem: it formatted the drive using 112 cylinders instead of
110!!  This caused the swap area (at the end of the drive) to be
partially located over non-existent space.
This could cause some rather weird problems and various panic conditions.
It is fixed in 2.2.3 (xnx116 or xnx117).
This is just one idea,
but this one took a bite out of us, so I thought I would share it with you
in the hope that it may spare another user the catastrophe we had.

clewis@eci386.uucp (Chris Lewis) (05/18/89)

In article <1839@wasatch.utah.edu> kessler%cons.utah.edu@wasatch.utah.edu (Robert R. Kessler) writes:

>One of our customers had a strange occurrence that I was wondering if
>anyone else has seen....

>One evening, they were quitting for the day and doing their nightly
>backup.  This is accomplished by getting everyone out of the system,
>doing a shutdown, then logging in as backup and putting in the tape.
>When the tar finished, she pulled the tape out and suddenly the system
>just shut itself down.  Odd.  Anyway, she then rebooted (doing an
>fsck, which she said found lots of problems ...

etc.

We have seen the occasional machine go down when you physically touch the
tape and/or tape drive (or in some cases simply stand up or sit down within
5 feet of the system).  We call it the "stand up, sit down, crash, crash, 
crash" crash ;-{

In our case, the system behaves *EXACTLY* as if you had hit the reset button:
memory tests, Wangtek tape drive resets, the works.  (This is an AT-style 386,
by the way.)

At least in our case, this appears to be related to static discharges.  If
you took a static hit while the disk was writing or something like that,
your disk could be severely scrambled, e.g. zaps in the superblock or inode
table.  Depending on how bad it is, fsck may not be able to recover anything
sane.

We've been able to (fingers crossed) completely eliminate this problem
by:

	- make sure that every internal peripheral has a ground strap.
	  (yeah, I know, the manufacturer doesn't think these are necessary
	  any more, but their machines are crashing)
	- make sure that every external connector that has a metal shell
	  is firmly grounded.  (In our case, the COMM port studs didn't
	  make electrical contact to the chassis, so we put star washers
	  under 'em).

I assume that your PS/2 Model 80 is probably okay w.r.t. the above, but you may
want to check into the grounding of other things like the multi-port boards.
-- 
Chris Lewis, R.H. Lathwell & Associates: Elegant Communications Inc.
UUCP: {uunet!mnetor, utcsri!utzoo}!lsuc!gate!eci386!clewis
Phone: (416)-595-5425