[comp.sys.sun] SS1+ SCSI disk woes

jch@devvax.tn.cornell.edu (Jeffrey C Honig) (10/08/90)

I'm seeing strange behaviour on my SS1+ with external SCSI drives.  The
system is configured with two internal Sun drives, three external HP 660M
(formatted) drives, a Sun 150M QIC tape and an Exabyte with a bus terminator.
All the external cables are 2' long, keeping us under the maximum length.
I am currently not using the internal drives; this SS1+ is playing server
until our real SS1+ server arrives.

What happens is that some file gets modified.  This usually shows up as an
executable that starts dumping core.  Comparing it to an identical copy
from backups or another system shows definite differences.  But the
strange problem is that eventually the problem goes away, although it may
require a reboot, and the file is back to normal.
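
For anyone wanting to narrow down where such a file differs from its good
copy, here is a minimal sketch of a block-by-block comparison.  The 8 KB
block size and the name differing_blocks are assumptions made for
illustration, not anything from the system described above; adjust the block
size to match the filesystem in question.

```python
BLOCK_SIZE = 8192  # assumption: 8 KB filesystem block; adjust as needed


def differing_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the zero-based indices of the blocks that differ.

    A block that exists in only one file (because the files have
    different lengths) is reported as differing.
    """
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                diffs.append(index)
            index += 1
    return diffs
```

Called with the suspect executable and the copy restored from backup, it
returns the list of block numbers that do not match, which shows whether the
damage is scattered bytes or whole blocks.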

No messages are printed.  I would suspect a parity error, but that should
be caught by error checking, shouldn't it?  That leads me to suspect the
embedded controller on one of the HP disks, but the problem is not limited
to one disk.  I guess it could also be software; does SunOS do much
caching of disk data?

Has anyone seen this or a similar problem?  Can anyone suggest any
solutions?

Thanks.
Jeff

hedrick@athos.rutgers.edu (Charles Hedrick) (11/02/90)

>What happens is that some file gets modified.  This usually shows up as an
>executable that starts dumping core.  Comparing it to an identical copy
>from backups or another system shows definite differences.  But the
>strange problem is that eventually the problem goes away, although it may
>require a reboot, and the file is back to normal.

Sounds to me like the dreaded "confused file problem".  One block of the
file, typically the first one, turns into the corresponding block of a
different file.  It usually happens to files accessed via NFS, but we've
seen it in local files too.  There is supposedly a fix available from Sun.
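
One way to check whether a damaged file fits this pattern is to look for
another file whose block at the same position is identical to the bad one.
The following is only a rough sketch: the 8 KB block size and the names
read_block and find_block_source are assumptions for illustration, and the
list of candidate files has to be supplied by whoever runs it.

```python
import os

BLOCK_SIZE = 8192  # assumption: 8 KB filesystem block; adjust as needed


def read_block(path, index, block_size=BLOCK_SIZE):
    """Read block number `index` (zero-based) from the file at `path`."""
    with open(path, "rb") as f:
        f.seek(index * block_size)
        return f.read(block_size)


def find_block_source(suspect, index, candidates, block_size=BLOCK_SIZE):
    """Return the candidates whose block `index` equals the suspect's.

    The suspect itself is skipped if it appears among the candidates.
    """
    target = read_block(suspect, index, block_size)
    if not target:
        return []  # the suspect has no such block
    suspect_abs = os.path.abspath(suspect)
    matches = []
    for path in candidates:
        if os.path.abspath(path) == suspect_abs:
            continue
        if read_block(path, index, block_size) == target:
            matches.append(path)
    return matches
```

A match on block 0 against an unrelated file would be consistent with the
description above, though it does not by itself say where the fault lies.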