[unix-pc.general] Disk error decoding, and causes

kevin@kosman.UUCP (Kevin O'Gorman) (01/11/90)
I've got a spare machine I haven't used since July.  Sure enough, when
I cranked it up, I heard lots of retries happening.

There are lots of disk errors in the unix.log, and they keep coming
because smgr is stepping on one of the bad spots.  The system does come
up, and it sort of limps along, so I've got something to work with.

I would like to map the bad sectors out, if that's the right thing to
do, but there are several things to notice:

This machine has both a WD2010 and the P5.1 upgrade.  The disk is an
ST4096, and is formatted with 9 heads.  The procedures I last saw
posted showed how to decode the error messages from unix.log, but
showed only 3 head bits.  Is the fourth bit in the messages anywhere?
Otherwise, how am I going to know the difference between head 0 and
head 8???

A typical error message, which is followed by a date, is:

    HDERR ST:51 EF:10 CL:FFEA CH:FF03 SN:FF05 SC:FF01 SDH:FF22 DMACNT:FFFF
    DCRREG:92 MCRREG:C900
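
For reference, here is a minimal sketch of pulling the fields out of a
message like the one above.  It rests on assumptions that are not stated
in the post: that the fields mirror a WD1010-family task file (cylinder
low/high, sector number, sector count, SDH), that only the low byte of
each four-digit value is meaningful and the leading FF is filler, and
that the low three bits of SDH are the head and bits 3-4 the drive.  The
function names are made up for illustration.  It does not recover a
fourth head bit, so the open question remains open.

```python
import re

# Sample line copied from the post; the trailing date is omitted.
LINE = ("HDERR ST:51 EF:10 CL:FFEA CH:FF03 SN:FF05 SC:FF01 "
        "SDH:FF22 DMACNT:FFFF DCRREG:92 MCRREG:C900")


def parse_hderr(line):
    """Return a dict mapping each NAME:HEX field to its integer value."""
    pairs = re.findall(r"([A-Z]+):([0-9A-Fa-f]+)", line)
    return {name: int(value, 16) for name, value in pairs}


def decode_location(fields):
    """Decode the disk address, assuming only the low byte of each field counts."""
    def low(name):
        return fields[name] & 0xFF

    sdh = low("SDH")
    return {
        "cylinder": (low("CH") << 8) | low("CL"),
        "sector": low("SN"),
        "sector_count": low("SC"),
        "drive": (sdh >> 3) & 0x03,   # assumed: SDH bits 3-4
        "head_low3": sdh & 0x07,      # assumed: SDH bits 0-2, three head bits only
    }


if __name__ == "__main__":
    print(decode_location(parse_hderr(LINE)))
```

Under those assumptions the sample decodes to cylinder 1002, sector 5,
and head bits 010, which could be head 2 or head 10 as far as these
three bits can tell.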

I worry about that EF:10, which I'm not used to seeing.  If I interpret it
right, it means the machine is having trouble finding sector headers.  All
the messages show the same value.  I'm used to EF:40, which is a data CRC
error and much more normal.
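
The EF values can be read as a bit mask.  The sketch below uses the
error-register layout as commonly documented for WD1010-family
controllers; treat the table as an assumption to check against the
controller data sheet, not as something confirmed for this machine.  The
names ERROR_BITS and decode_error are hypothetical.

```python
# Assumed WD1010-family error register bits (verify against the data sheet).
ERROR_BITS = {
    0x80: "bad block detected",
    0x40: "data field CRC/ECC error",
    0x10: "ID not found",
    0x04: "aborted command",
    0x02: "track 0 not found",
    0x01: "data address mark not found",
}


def decode_error(ef):
    """Return the names of all error bits set in the EF value."""
    return [name for bit, name in ERROR_BITS.items() if ef & bit]


if __name__ == "__main__":
    for value in (0x10, 0x40):
        print("EF:%02X" % value, decode_error(value))
```

With this table, EF:10 comes out as "ID not found" (the sector header
could not be located) and EF:40 as a data field CRC/ECC error, which
matches the reading given above.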

Anyway, does anyone know where to find the fourth head bit in the messages,
and does anyone have a suggestion about what happened to my disk???

-- 
Kevin O'Gorman ( kevin@kosman.UUCP, kevin%kosman.uucp@nrc.com )
voice: 805-984-8042 Vital Computer Systems, 5115 Beachcomber, Oxnard, CA  93035
Non-Disclaimer: my boss is me, and he stands behind everything I say.