ccfj@hippo.ru.ac.za (F.F. Jacot Guillarmod) (03/21/91)
Many thanks to all those who responded about my SCSI disk drive woes.

I finally resorted to doing a low level format of the disk in question, which seems to have cured the problem. Repeated wishful thinking and fiddling with the cable + connectors didn't seem to help. An attempt at running the Xenix 'badtrk' utility didn't seem to do anything useful, but somebody mentions that 'badtrk' and SCSI are mutually exclusive. The manual certainly didn't. However, at least one of the follow-ups indicates this reformat may not have been necessary (of course Murphy made this message arrive _after_ all the blood, sweat and tears :-)

My own observations:

This is the first time I have had to go through such an exercise, and while the software installation manuals have a few words to say about the possibility of such an operation, they seem less than crystal clear about when to re-install the backed up root file system - specifically as to how far to go with re-installing the system from the original release disks before cutting your losses and overwriting things with the backed up versions.

The whole thing was complicated by the fact that we are running TCP/IP (i.e. special entries in /etc/dev) and my belated realisation that 'tar' does not back up block special 'files'. Re-installing the basic operating system + TCP/IP and then reloading the root file system from backup did the trick, but it took a few panic-stricken iterations to work this out. Name of the game, I suppose. Next time this sort of thing happens it will take a fraction of the time to resolve.

On the hardware side, the reformatting went without a hitch. There was a bit of uncertainty when the low level format was taking place, as it was unaccompanied by the usual flashing lights and loud clicking, but most of this took place over lunch, so I may have missed the exciting bits.

Appended are the responses - I certainly learnt a lot from them and from the exercise, not least how useful network news can be.
======================================================================
>From: neese@adaptx1.UUCP
>Date: 14 Mar 91 19:27:23 GMT
>References: <667889673@hippo>

>/* ---------- "Problem with CDC Wren V scsi disk -" ---------- */
>We are using a 300 megabyte CDC Wren V + AHA1542A controller on a 386
>clone running SCO Xenix 2.3.2. All of a sudden, the following errors
>are getting logged:
>
>Fri Mar 1 8:25:34
>
>scsi: ERROR: on disk dev=1/40 ha=0 id=0 lun=0 block=37364
> sector=76776, cylinder/head = 37/31
> hst 00 ust 02
> AHA-1540 cmd : 0A 01 2B E8 02 00
> AHA-1540 sense : F0 00 03 00 01 2B E9 0A 00 00 00 00 10 00

This is a hard media error on a write command. More specifically, it is a CRC error. You can get rid of this, on this particular drive, by enabling the automatic write reallocation bit in the mode page of this drive. Next time a write to this block occurs, the drive will reallocate it and replace it with a good one. By default, this bit is turned off. You will need SCSICNTL to do this, unless you want to write the software yourself.

Roy Neese
Adaptec Senior SCSI Applications Engineer
UUCP @ neese@adaptex
       uunet!cs.utexas.edu!utacfd!merch!adaptex!neese

=========================================================================
In article <ccfj.667889673@hippo> you write:
>We are using a 300 megabyte CDC Wren V + AHA1542A controller on a 386

We had the same problem on an ALR with a bunch of SCSI disks. It has to do with badtracking. Xenix doesn't badtrack a SCSI disk (The SCSI is supposed to take care of errors), yet every once in a while, Xenix hits one before the controller can cover up.

Two options:

Check the cabling; our problem was caused by a corrupt SCSI cable that made it *look* like there were errors.
Second option is to make a full backup (two if you believe in Murphy's Law), then attack the disk with DOS debug by getting into the controller's setup program (I think -g=dc00:6 will do, but that depends on your controller and don't quote me on it) and doing a low-level format of the drive. Check for disk errors while you're there. This should create some form of a bad track table and take care of your problems.

Good luck. Oh, btw, if you don't do something soon, you *WILL* lose data.

--
Sean Fulton                          sean@utoday.com
UNIX Today!                          (516) 562-5430
 /* The opinions expressed above are not those of my employer */

=========================================================================
In article <ccfj.667889673@hippo> you write:
>We are using a 300 megabyte CDC Wren V + AHA1542A controller on a 386

F.F.

I am by no means a SCSI expert, but I do have a couple of SCSI drive specs. I also have a copy of the SCSI-2 Working Draft Proposal Revision 10b. The info on the front says that you can buy a copy of the document from:

        Global Engineering Documents
        2805 McGaw
        Irvine, CA 92714

It goes on further to say that you should refer to document X3.131-198X. It is a proposed ANSI standard. It should give us enough information to answer your question.

The 'cmd' line is interpreted as follows:

The '0A' is a Write (6 byte) command. The 6 byte field says (I think) that you are using non-extended Command Descriptor Blocks. The '0' in the byte '01' is the Logical Unit Number. The '1' in the byte '01' and the '2BE8' in the following bytes is the logical block address that you were writing to. The '02' is the number of contiguous blocks of data to be transferred. So, you can see that it is a write command to logical block 12BE8.

> AHA-1540 sense : F0 00 03 00 01 2B E9 0A 00 00 00 00 10 00
>

The above line is the result of issuing a Request Sense command. Assume the bits of each byte are numbered 0 -> 7 with bit 7 on the left. The 7 bit in 'F0' is the Valid bit.
It says that the information field (bytes 3 -> 6) conforms to the SCSI-2 standard. By the way, the bytes are numbered 0 -> 7 with byte zero on the left. The '70' says that this is a "current error" as opposed to a "deferred error".

Byte 1 is the Segment Number. For the Write command, this is an unused field.

The '0' in byte 2 says that the Filemark bit is zero (not used for direct access devices), no End Of Medium (for sequential access devices), no Incorrect Length Indicator, and bit 4 is reserved. The '3' in byte 2 says that we received Sense Key 3. Sense Key 3 says "MEDIUM ERROR. Indicates that the command terminated with a non-recovered error condition that was probably caused by a flaw in the medium or an error in the recorded data. This sense key may also be returned if the target is unable to distinguish between a flaw in the medium and a specific hardware failure (sense key 4)."

Bytes 3 -> 6 are the Information field. In this case it is the logical block address associated with the sense key.

Byte 7 is supposed to be the additional sense length. It looks as though it's a little short :-) the data that is.

Anyway, bytes 12 & 13 say that you received an ID, CRC, or ECC error when writing to the disk.

Sorry to be so lengthy, but thought this the best way to explain it. You're getting a write error. Now, you need to determine the pattern. If it's occurring at the same logical block, then you need to flaw the track. If it's happening at more or less random spots, you MAY have a controller problem.

Hope this helps.

--
Doug Marshall <Doug.Marshall@SanDiego.NCR.COM>  * My humble opinions and
+1 619 485 3494 <...!ncr-sd!palomar!dougm>      * ideas are just that.
        "All of us is smarter than each of us!"

==========================================================================
>1 - what is going on?

It is a media error, 3rd sense byte = 03, and ID CRC error, 13th byte = 10, on sector x012be9. The command was a write. Hence, you didn't write that sector.
You will have a problem reading the data back from that sector.

>2 - what needs to be done to fix the problem?

Use the SCSI reassign sector command to get rid of the bad sector x012be9. You should have a utility program which sends the SCSI command.

>3 - which FM's should I be reading to find out more?

Read the CDC product spec.

Good luck.

Yu-Ping Cheng, Auspex Systems Inc., ycheng@auspex.com

==========================================================================
>
> scsi: ERROR: on disk dev=1/40 ha=0 id=0 lun=0 block=37364
> sector=76776, cylinder/head = 37/31
> hst 00 ust 02
> AHA-1540 cmd : 0A 01 2B E8 02 00
                 |  '------' |
                 |      |    |
                 |      |    |
                 |      |    `--- number of blocks
                 |      `---------- block number
                 `---------------- write command
> AHA-1540 sense : F0 00 03 00 01 2B E9 0A 00 00 00 00 10 00

sorry - I have no SCSI-Spec here.

Matthias

==========================================================================
From: Alessandro.Forin@SPICE.CS.CMU.EDU

You have a bad block (blockno 12BE9). Try reformatting.

sandro-

==========================================================================

My thanks again to those who took the time and trouble to respond. Your analysis and advice are appreciated.

--
F.F. Jacot Guillarmod - Computing Centre - Rhodes University
Artillery Road - P.O Box 94 - Grahamstown - 6140 - South Africa
Internet: ccfj@hippo.ru.ac.za             Phone: +27 [0]461 22023 xt 284
uucp: ..!uunet!m2xenix!quagga!hippo!ccfj  Fax:   +27 [0]461 25049
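
The byte-by-byte readings given in the responses above can be reproduced with a short Python sketch. It assumes the 6-byte Write command layout (LUN in the top three bits of byte 1, a 21-bit logical block address after it) and the fixed-format Request Sense layout that the responses describe from the SCSI-2 draft. The function names and dictionary keys are made up for this illustration; they do not come from SCSICNTL, the Adaptec driver, or any other tool mentioned in the thread.

```python
# Illustrative decoder for the two hex lines in the Xenix error log.
# All names here (parse_hex, decode_write6, decode_sense) are hypothetical.


def parse_hex(text):
    """Turn a string such as '0A 01 2B E8 02 00' into bytes."""
    return bytes(int(token, 16) for token in text.split())


def decode_write6(cdb):
    """Split a 6-byte Write command descriptor block into its fields."""
    if len(cdb) != 6:
        raise ValueError("a 6-byte command block is expected")
    return {
        "opcode": cdb[0],                      # 0x0A is Write (6 byte)
        "lun": cdb[1] >> 5,                    # top 3 bits of byte 1
        # low 5 bits of byte 1, then bytes 2 and 3: 21-bit block address
        "lba": ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3],
        "blocks": cdb[4] or 256,               # a length of 0 means 256 blocks
        "control": cdb[5],
    }


def decode_sense(sense):
    """Pick out the sense fields discussed in the thread."""
    if len(sense) < 14:
        raise ValueError("at least 14 sense bytes are expected")
    return {
        "valid": bool(sense[0] & 0x80),        # bit 7 of byte 0
        "error_code": sense[0] & 0x7F,         # 0x70 is a current error
        "sense_key": sense[2] & 0x0F,          # 3 is MEDIUM ERROR
        "information": int.from_bytes(sense[3:7], "big"),  # bytes 3 -> 6
        "additional_length": sense[7],         # bytes that follow byte 7
        "asc": sense[12],                      # additional sense code
        "ascq": sense[13],                     # additional sense code qualifier
    }


cdb = decode_write6(parse_hex("0A 01 2B E8 02 00"))
sense = decode_sense(parse_hex("F0 00 03 00 01 2B E9 0A 00 00 00 00 10 00"))
print("write of %d blocks starting at block 0x%06X" % (cdb["blocks"], cdb["lba"]))
print("sense key %d at block 0x%06X" % (sense["sense_key"], sense["information"]))
```

For the logged bytes this gives a two-block write starting at logical block 0x012BE8 (76776 decimal, the same number as the sector= value in the log) and a failing block of 0x012BE9, i.e. the second of the two blocks being written. The additional sense length of 0x0A also shows why the sense data looks "a little short": ten more bytes should follow byte 7, but the log prints only six of them.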