[comp.unix.xenix.sco] Summary: Adaptec + Wren V SCSI disk problem resolved

ccfj@hippo.ru.ac.za (F.F. Jacot Guillarmod) (03/21/91)

Many thanks to all those who responded about my SCSI disk drive woes.
I finally resorted to doing a low level format of the disk in question,
which seems to have cured the problem.  Repeated wishful thinking and
fiddling with the cable + connectors didn't seem to help.

An attempt at running the Xenix 'badtrk' utility didn't seem to do
anything useful, but one respondent points out that 'badtrk' and SCSI are
mutually exclusive.  The manual certainly didn't say so.

However, at least one of the follow ups indicates this reformat may not have
been necessary (of course Murphy made this message arrive _after_ all
the blood sweat and tears :-)

My own observations:  This is the first time I have had to go through
such an exercise, and while the software installation manuals have a
few words to say about the possibility of such an operation, they seem
less than crystal clear about when to re-install the backed up root file
system - specifically as to how far to go with re-installing the system
from the original release disks before cutting your losses and 
overwriting things with the backed up versions.  The whole thing was
complicated by the fact that we are running TCP/IP (i.e. special
entries in /etc/dev) and my belated realisation that 'tar' does not
back up block special 'files'.  Re-installing the basic operating
system + TCP/IP and then reloading the root file system from backup did
the trick, but it took a few panic stricken iterations to work this out.
Name of the game, I suppose.  Next time this sort of thing happens it
will take a fraction of the time to resolve.

On the hardware side, the reformatting went without a hitch.  There was
a bit of uncertainty when the low level format was taking place, as it
was unaccompanied by the usual flashing lights and loud clicking, but
most of this took place over lunch, so I may have missed the exciting
bits.

Appended are the responses - I certainly learnt a lot from them and
from the exercise, not least how useful network news can be.

======================================================================
>From: neese@adaptx1.UUCP
>Date: 14 Mar 91 19:27:23 GMT
>References: <667889673@hippo>

>/* ---------- "Problem with CDC Wren V scsi disk -" ---------- */
>We are using a 300 megabyte CDC Wren V + AHA1542A controller on a 386
>clone running SCO Xenix 2.3.2.  All of a sudden, the following errors
>are getting logged:
>
>Fri Mar 1 8:25:34
>
>scsi: ERROR: on disk dev=1/40 ha=0 id=0 lun=0 block=37364
>	sector=76776, cylinder/head = 37/31
>	hst 00 ust 02
>	AHA-1540  cmd  : 0A 01 2B E8 02 00
>	AHA-1540 sense : F0 00 03 00 01 2B E9 0A 00 00 00 00 10 00

This is a hard media error on a write command.  More specifically, it is
a CRC error.  You can get rid of this, on this particular drive, by enabling
the automatic write reallocation bit in the mode page of this drive.  Next
time a write to this block occurs, the drive will reallocate it and replace
it with a good one.  By default, this bit is turned off.
You will need SCSICNTL to do this, unless you want to write the software
yourself.
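
A minimal Python sketch of the bit manipulation involved, for illustration
only.  It assumes the drive follows the SCSI-2 Read-Write Error Recovery page
layout (page code 01h, with the automatic write reallocation bit, AWRE, as
bit 7 of byte 2); check this against the CDC product spec.  The function name
enable_awre and the sample page bytes are hypothetical.  The sketch only edits
a buffer; fetching the page and sending it back (MODE SENSE / MODE SELECT) is
left to a utility such as SCSICNTL.

```python
AWRE = 0x80            # assumed: bit 7 of byte 2 in mode page 01h (SCSI-2 layout)
PAGE_CODE_MASK = 0x3F  # low six bits of byte 0 hold the page code


def enable_awre(page: bytes) -> bytes:
    """Return a copy of an error recovery mode page with the AWRE bit set."""
    if len(page) < 3 or (page[0] & PAGE_CODE_MASK) != 0x01:
        raise ValueError("not a Read-Write Error Recovery page (code 01h)")
    out = bytearray(page)
    out[0] &= PAGE_CODE_MASK  # clear the top two bits before sending the page back
    out[2] |= AWRE            # turn on automatic write reallocation
    return bytes(out)


# Hypothetical page as a drive might report it: PS bit set, length 0Ah, AWRE off.
sample = bytes([0x81, 0x0A, 0x00] + [0x00] * 9)
updated = enable_awre(sample)
print(updated.hex())
```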

			Roy Neese
			Adaptec Senior SCSI Applications Engineer
			UUCP @  neese@adaptex
				uunet!cs.utexas.edu!utacfd!merch!adaptex!neese

=========================================================================

In article <ccfj.667889673@hippo> you write:
>We are using a 300 megabyte CDC Wren V + AHA1542A controller on a 386

We had the same problem on an ALR with a bunch of SCSI disks. It has
to do with badtracking. Xenix doesn't badtrack a SCSI disk (The SCSI
is supposed to take care of errors), yet every once in a while, Xenix
hits one before the controller can cover up. 

Two options: Check the cabling; our problem was caused by a corrupt
SCSI cable that made it *look* like there were errors. Second option
is to make a full backup (two if you believe in Murphy's Law), then
attack the disk with DOS debug by getting into the controller's setup
program (I think -g=dc00:6 will do, but that depends on your
controller and don't quote me on it) and doing a low-level format of
the drive. Check for disk errors while you're there. This should
create some form of a bad track table and take care of your problems.

Good luck.

Oh, btw, if you don't do something soon, you *WILL* lose data.


-- 
Sean Fulton					sean@utoday.com
UNIX Today!					(516) 562-5430
 /* The opinions expressed above are not those of my employer */

=========================================================================

In article <ccfj.667889673@hippo> you write:
>We are using a 300 megabyte CDC Wren V + AHA1542A controller on a 386

F.F.

   I am by no means a SCSI expert, but I do have a couple of SCSI drive specs.
I also have a copy of the SCSI-2 Working Draft Proposal Revision 10b.  The info
on the front says that you can buy a copy of the document from:

Global Engineering Documents
2805 McGaw
Irvine, CA 92714

It goes on further to say that you should refer to document X3.131-198X.  It
is a proposed ANSI standard.  It should give us enough information to answer
your question.

The 'cmd' line is interpreted as follows:

The '0A' is a Write (6 byte) command.  The 6 byte field says (I think) that 
you are using non-extended Command Descriptor Blocks.  The '0' in the byte
'01' is the Logical Unit Number.  The '1' in the byte '01' and the '2BE8'
in the following bytes is the logical block address that you were writing
to.  The '02' is the number of contiguous blocks of data to be transferred.
So, you can see that it is a write command to logical block 12BE8.
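
The decoding above can be sketched in a few lines of Python.  This is only an
illustration, assuming the standard 6-byte CDB layout (LUN in the top three
bits of byte 1, a 21-bit block address in the rest of byte 1 plus bytes 2 and
3, transfer length in byte 4).  The function name decode_write6 is made up for
the example.

```python
def decode_write6(cdb: bytes) -> dict:
    """Split a 6-byte SCSI command descriptor block into its fields."""
    if len(cdb) != 6:
        raise ValueError("a 6-byte CDB is required")
    return {
        "opcode": cdb[0],                      # 0x0A is WRITE(6)
        "lun": cdb[1] >> 5,                    # top three bits of byte 1
        "lba": ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3],
        "blocks": cdb[4] or 256,               # a length of 0 means 256 blocks
        "control": cdb[5],
    }


# The 'cmd' bytes from the logged error.
cdb = bytes.fromhex("0A 01 2B E8 02 00")
fields = decode_write6(cdb)
print(fields)
print(hex(fields["lba"]), fields["lba"])
```

The decoded block address, 12BE8 hex, is 76776 decimal, which matches the
"sector=76776" value in the logged error.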

>	AHA-1540 sense : F0 00 03 00 01 2B E9 0A 00 00 00 00 10 00
>

The above line is the result of issuing a Request Sense command.  Assume the
bits of each byte are numbered 0 -> 7 with bit 7 on the left.  Bit 7 in 'F0' is
the Valid bit. It says that the information field (bytes 3 -> 6) conforms to
the SCSI-2 standard.  By the way, the bytes are numbered from 0 with byte zero
on the left.  The remaining '70' says that this is a "current error" as opposed
to a "deferred error".  Byte 1 is the Segment Number.  For the Write command,
this is an unused field.  The '0' in byte 2 says that the Filemark bit is zero
(not used for direct access devices), no End Of Medium (for sequential access
devices), no Incorrect Length Indicator, and bit 4 is reserved.  The '3' in
byte 2 says that we received Sense Key 3.  Sense Key 3 says "MEDIUM ERROR.
Indicates that the command terminated with a non-recovered error condition
that was probably caused by a flaw in the medium or an error in the recorded
data.  This sense key may also be returned if the target is unable to
distinguish between a flaw in the medium and a specific hardware failure (sense
key 4)."  Bytes 3 -> 6 are the Information field.  In this case it is the
logical block address associated with the sense key.  Byte 7 is supposed to
be the additional sense length.  It says 0A (ten more bytes), but only six
follow, so the data looks a little short :-)  Anyway, bytes 12 & 13 say that
you received an ID, CRC, or ECC error when writing to the disk.
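
The same walk through the sense bytes can be written as a short Python
sketch.  It assumes the SCSI-2 fixed-format sense data layout described above;
the function name decode_sense is made up, and the two lookup tables hold only
the codes discussed in this thread, not the full lists from the standard.

```python
SENSE_KEYS = {0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}  # only the keys discussed here
ASC_TEXT = {(0x10, 0x00): "ID CRC OR ECC ERROR"}           # only the code seen here


def decode_sense(sense: bytes) -> dict:
    """Decode fixed-format Request Sense data into named fields."""
    if len(sense) < 8:
        raise ValueError("need at least 8 bytes of sense data")
    key = sense[2] & 0x0F
    asc = sense[12] if len(sense) > 12 else None
    ascq = sense[13] if len(sense) > 13 else None
    return {
        "valid": bool(sense[0] & 0x80),          # bit 7 of byte 0
        "error_code": sense[0] & 0x7F,           # 0x70 current, 0x71 deferred
        "segment": sense[1],
        "filemark": bool(sense[2] & 0x80),
        "eom": bool(sense[2] & 0x40),
        "ili": bool(sense[2] & 0x20),
        "sense_key": key,
        "sense_key_text": SENSE_KEYS.get(key, "not in this table"),
        "information": int.from_bytes(sense[3:7], "big"),
        "additional_length": sense[7],
        "truncated": len(sense) < 8 + sense[7],  # fewer bytes than byte 7 promises
        "asc": asc,
        "ascq": ascq,
        "asc_text": ASC_TEXT.get((asc, ascq), "not in this table"),
    }


# The 'sense' bytes from the logged error.
sense = bytes.fromhex("F0 00 03 00 01 2B E9 0A 00 00 00 00 10 00")
info = decode_sense(sense)
for name, value in info.items():
    print(f"{name:18} {value}")
```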

Sorry to be so lengthy, but thought this the best way to explain it.  You're
getting a write error.  Now, you need to determine the pattern.  If it's
occurring at the same logical block, then you need to flaw the track.  If
it's happening at more or less random spots, you MAY have a controller
problem.  Hope this helps.
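
One way to look for the pattern is to count how often each block number shows
up in the logged errors.  The Python sketch below assumes the log lines look
like the "scsi: ERROR: ... block=NNNNN" line quoted earlier.  The function name
count_error_blocks is made up, and only the first sample line comes from the
original report; the other two are invented to show the counting.

```python
import re
from collections import Counter

# Matches the first line of each logged error and captures the block number.
BLOCK_RE = re.compile(r"scsi: ERROR: .*\bblock=(\d+)")


def count_error_blocks(log_text: str) -> Counter:
    """Count logged SCSI errors per block number."""
    return Counter(int(m.group(1)) for m in BLOCK_RE.finditer(log_text))


# First line is from the original report; the rest are hypothetical.
log = """\
scsi: ERROR: on disk dev=1/40 ha=0 id=0 lun=0 block=37364
scsi: ERROR: on disk dev=1/40 ha=0 id=0 lun=0 block=37364
scsi: ERROR: on disk dev=1/40 ha=0 id=0 lun=0 block=51200
"""
counts = count_error_blocks(log)
for block, hits in counts.most_common():
    print(f"block {block}: {hits} error(s)")
```

Errors piling up on one or two blocks point at the media; errors scattered
across many blocks are more in line with a cabling or controller problem.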

-- 

Doug Marshall   <Doug.Marshall@SanDiego.NCR.COM>  * My humble opinions and
+1 619 485 3494 <...!ncr-sd!palomar!dougm>        * ideas are just that.
"All of us is smarter than each of us!" 

==========================================================================

>1 - what is going on?

It is a media error, 3rd sense byte = 03, and ID CRC error, 13th byte = 10, on
sector x012be9.  The command was a write.  Hence, you didn't write that sector.
You will have problems reading the data back from that sector.

>2 - what needs to be done to fix the problem?

Use the SCSI reassign sector command (REASSIGN BLOCKS) to get rid of the bad
sector x012be9.  You should have a utility program which sends the SCSI command.
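
For reference, a rough Python sketch of the bytes such a utility would send.
It assumes the SCSI-2 layout of REASSIGN BLOCKS: opcode 07h in a 6-byte CDB,
plus a defect list made of a 4-byte header (two reserved bytes, then a 2-byte
list length) followed by one 4-byte block address per defect.  The function
name reassign_blocks_request is made up, and the sketch only builds the
buffers; it does not talk to any drive.

```python
import struct

REASSIGN_BLOCKS = 0x07  # opcode, assuming the SCSI-2 command set


def reassign_blocks_request(lbas, lun=0):
    """Return (cdb, defect_list) for reassigning the given logical blocks."""
    lbas = list(lbas)
    cdb = bytes([REASSIGN_BLOCKS, (lun & 0x07) << 5, 0, 0, 0, 0])
    header = struct.pack(">HH", 0, 4 * len(lbas))  # reserved, defect list length
    body = b"".join(struct.pack(">I", lba) for lba in lbas)
    return cdb, header + body


# The bad block reported in the sense data.
cdb, defect_list = reassign_blocks_request([0x012BE9])
print(cdb.hex(), defect_list.hex())
```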

>3 - which FM's should I be reading to find out more?

Read the CDC product spec.

Good luck.

Yu-Ping Cheng, Auspex Systems Inc., ycheng@auspex.com

==========================================================================

> 
> scsi: ERROR: on disk dev=1/40 ha=0 id=0 lun=0 block=37364
> 	sector=76776, cylinder/head = 37/31
> 	hst 00 ust 02
> 	AHA-1540  cmd  : 0A 01 2B E8 02 00
                         |  '------'  |
                         |     |      |
                         |     |      |
                         |     |      `--- number of blocks
                         |     `---------- block number
                         `---------------- write command

> 	AHA-1540 sense : F0 00 03 00 01 2B E9 0A 00 00 00 00 10 00
sorry - I have no SCSI-Spec here.
	Matthias

==========================================================================

From: Alessandro.Forin@SPICE.CS.CMU.EDU

You have a bad block (blockno 12BE9).  Try reformatting.  sandro-

==========================================================================

My thanks again to those who took the time and trouble to respond.  Your
analysis and advice are appreciated.

--
     F.F.  Jacot Guillarmod - Computing  Centre - Rhodes  University
     Artillery Road - P.O Box 94 - Grahamstown - 6140 - South Africa
     Internet: ccfj@hippo.ru.ac.za    Phone: +27 [0]461 22023 xt 284
     uucp: ..!uunet!m2xenix!quagga!hippo!ccfj  Fax: +27 [0]461 25049