marc@dumbcat.sf.ca.us (Marco S Hyman) (06/06/91)
I haven't found this in TFM yet -- perhaps the net can help. Given an error
message that says something like "SCSI absolute sector 1234 on drive 1 is bad"
how can I map this sector number to a file/directory/(inode!). I've looked at
/etc/partitions and can figure out what partition the error is in (I think)
but the intricacies of fsdb escape me. Perhaps there is more doc than the man
page available? Some other hidden gem? Something so obvious I'll be forever
embarrassed that I missed it? Anything!

Any and all help gladly accepted.

// marc
--
// home: marc@dumbcat.sf.ca.us         pacbell!dumbcat!marc
// work: marc@ascend.com               uunet!aria!marc
cpcahil@virtech.uucp (Conor P. Cahill) (06/06/91)
marc@dumbcat.sf.ca.us (Marco S Hyman) writes:

>I haven't found this in TFM yet -- perhaps the net can help. Given an error
>message that says something like "SCSI absolute sector 1234 on drive 1 is bad"
>how can I map this sector number to a file/directory/(inode!). I've looked at

Read mkpart(1M), specifically the -A flag. Note that if you do add a bad
sector, you should place an entry into /etc/partitions marking that sector
as bad ("badsec =" - also documented on mkpart(1M)).

--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
gary@sci34hub.sci.com (Gary Heston) (06/07/91)
In article <767@dumbcat.sf.ca.us> marc@dumbcat.sf.ca.us (Marco S Hyman) writes:
>I haven't found this in TFM yet -- perhaps the net can help. Given an error
>message that says something like "SCSI absolute sector 1234 on drive 1 is bad"
>how can I map this sector number to a file/directory/(inode!). I've looked at
>/etc/partitions and can figure out what partition the error is in (I think)
>but the intricacies of fsdb escape me. Perhaps there is more doc than the man
>page available? Some other hidden gem? Something so obvious I'll be forever
>embarrassed that I missed it? Anything!

I have run into this problem in the past, and came up with a fairly simple
work-around that narrows it down to which file contains the bad sector:

    tar -cvf /dev/null /

and watch for the error to hit. This works with tar because tar displays
the filename when it starts trying to read it; whereas cpio deals with the
file and then displays the name.

You can, of course, redirect the output.

--
Gary Heston   System Mismanager and technoflunky   uunet!sci34hub!gary or
My opinions, not theirs.    SCI Systems, Inc.      gary@sci34hub.sci.com
I support drug testing. I believe every public official should be given a
shot of sodium pentathol and ask "Which laws have you broken this week?".
del@fnx.UUCP (Dag Erik Lindberg) (06/08/91)
In article <1991Jun06.123852.29851@virtech.uucp> cpcahil@virtech.uucp (Conor P. Cahill) writes:
>marc@dumbcat.sf.ca.us (Marco S Hyman) writes:
>
>>I haven't found this in TFM yet -- perhaps the net can help. Given an error
>>message that says something like "SCSI absolute sector 1234 on drive 1 is bad"
>>how can I map this sector number to a file/directory/(inode!). I've looked at
>
>Read mkpart(1M), specifically the -A flag. Note that if you do add a bad
>sector, you should place an entry into /etc/partitions marking that sector
>as bad ("badsec =" - also documented on mkpart(1M)).

I don't think this is what was asked. Marco wants to find out which *file*
is corrupt because of the bad sector. And I can relate to his problem,
having had a similar problem on a customer machine. Given the bad sector
error, it is trivial to 'mkpart' the sector into the bad sector list,
but how do you ensure the file system is ok without restoring from a tape?

In the case I had to deal with, the customer did not have a current backup
of the system. While I could make a backup of the system, there were
errors during the backup, and it was not clear from the output of cpio
which files were trashed, as the output of cpio is buffered independently
of stderr. Someone advised me to mark the sectors as bad, then fsck the
drive, and fsck would report the truncated files. Well, it didn't, and
then I was left with only a list of bad sectors. I spent a great deal of
time getting that system fixed up.

Unless someone knows another method, the only thing I can think of for
this situation is:

    find / -print -exec cp {} /dev/null \;

which I have not tried. I suspect it would only work if run from the
system console, and if the system console were a hardcopy device (or you
are extremely patient). Note that trapping stderr from the find command
would not necessarily tell you anything, as the console error messages
are not going through stderr!

-- del AKA Erik Lindberg uunet!pilchuck!fnx!del Who is John Galt?
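
[Editor's sketch] The "read every file and see which one fails" idea from the tar and find suggestions above can be written so that the failing path is recorded by the program itself instead of being matched by eye against console messages. What follows is only a minimal sketch in Python, not anything the posters used. The function name `find_unreadable` and its parameters are hypothetical, and it assumes that a bad sector surfaces to the reading process as an I/O error on read, which, as noted above for console messages, is not guaranteed on every system. It also assumes the data is actually fetched from the disk and not from a cache.

```python
import os
import stat
import sys


def _on_device(path, dev):
    """Return True if path can be examined and lives on device number dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def find_unreadable(root, chunk_size=65536, log=sys.stdout):
    """Read every regular file under root; return [(path, errno), ...] for failures.

    Each path is printed *before* it is read (as tar -v does), so the last
    name in the log is the file being read if the machine hangs or panics.
    """
    root_dev = os.lstat(root).st_dev
    failures = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Stay on one filesystem: prune directories that live on another device.
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # vanished between listing and lstat
            if not stat.S_ISREG(mode):
                continue  # skip symlinks, devices, FIFOs, sockets
            print(path, file=log, flush=True)
            try:
                with open(path, "rb") as f:
                    while f.read(chunk_size):
                        pass  # discard the data; only the read itself matters
            except OSError as exc:
                failures.append((path, exc.errno))
    return failures


if __name__ == "__main__":
    for bad_path, err in find_unreadable(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("READ ERROR errno=%s: %s" % (err, bad_path), file=sys.stderr)
```

Permission errors are reported alongside genuine I/O errors, so the result should be read with the errno in mind (EIO is the usual sign of a media problem).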
ed@mtxinu.COM (Ed Gould) (06/08/91)
> I haven't found this in TFM yet -- perhaps the net can help. Given
> an error message that says something like "SCSI absolute sector
> 1234 on drive 1 is bad" how can I map this sector number to a
> file/directory/(inode!).

The tool to do this is icheck, if it exists in your version of Unix.
However, it's a three-step process.

First, you need to determine the relative sector number in the
filesystem affected. If the driver is well written, it will report
both absolute and relative sector numbers. If not, you'll have to
subtract the starting sector number of the filesystem partition from
the absolute sector number. (Be careful - partition offsets are often
specified in cylinders, not sectors.)

Second, the sector number must be converted into a filesystem block
number; some drivers will report it as well. This is a simple matter
of division, dividing the sector number by the blocking factor, being
careful to round up properly. The "blocking factor" is the number of
sectors per filesystem block: If you have a 1024-byte-block filesystem
and 512-byte sectors, the factor is 2.

This filesystem block number can be fed to icheck, which will report
the inode number of the file containing the block. If you want the
name(s) of that file, then feed the inumber to ncheck.

--
Ed Gould                    No longer formally affiliated with,
ed@mtxinu.COM               and certainly not speaking for, mt Xinu.

"I'll fight them as a woman, not a lady. I'll fight them as an engineer."
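
[Editor's sketch] The first two steps above are plain arithmetic, and a small Python sketch may make them concrete. Everything named here is hypothetical: `sector_to_fs_block`, `cylinders_to_sectors`, the partition start of sector 1000 and the 9-head, 17-sector geometry are invented for illustration only. The post warns about rounding; this sketch assumes filesystem blocks are numbered from zero at the start of the partition, in which case integer division that discards the remainder picks the block containing the sector. If your icheck or filesystem counts blocks differently, the result needs adjusting.

```python
def cylinders_to_sectors(cylinders, heads, sectors_per_track):
    """Convert a partition offset given in cylinders to a sector count."""
    return cylinders * heads * sectors_per_track


def sector_to_fs_block(abs_sector, partition_start_sector,
                       sector_size=512, block_size=1024):
    """Map an absolute disk sector to a zero-based filesystem block number.

    Assumes block 0 begins at the first sector of the partition.
    """
    if block_size % sector_size != 0:
        raise ValueError("block size must be a multiple of the sector size")
    blocking_factor = block_size // sector_size  # sectors per filesystem block
    relative_sector = abs_sector - partition_start_sector
    if relative_sector < 0:
        raise ValueError("sector lies before the start of this partition")
    return relative_sector // blocking_factor


if __name__ == "__main__":
    # Hypothetical numbers: the error names sector 1234, partition starts at 1000.
    print(sector_to_fs_block(1234, 1000))
    # Hypothetical geometry: offset of 10 cylinders, 9 heads, 17 sectors/track.
    print(cylinders_to_sectors(10, 9, 17))
```

With those made-up numbers the relative sector is 234 and the blocking factor is 2, so the block handed to icheck would be 117.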
cpcahil@virtech.uucp (Conor P. Cahill) (06/10/91)
del@fnx.UUCP (Dag Erik Lindberg) writes:

>I don't think this is what was asked. Marco wants to find out which *file*
>is corrupt because of the bad sector. And I can relate to his problem,

You're right, I misread the question.

>having had a similar problem on a customer machine. Given the bad sector
>error, it is trivial to 'mkpart' the sector into the bad sector list,
>but how do you ensure the file system is ok without restoring from a tape?

If a backup is available, I would low level format the drive and reload the
system. My reasoning for this is that if one sector goes bad, it is likely
that more will follow. A low level format (along with correct entry of
the manufacturer's bad sector list) usually goes a long way towards ensuring
that you don't have the same problem again (although, given time, it will
probably happen again).

--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
marc@dumbcat.sf.ca.us (Marco S Hyman) (06/14/91)
In article <1991Jun10.134714.28189@virtech.uucp> cpcahil@virtech.uucp (Conor P. Cahill) writes:
> If a backup is available, I would low level format the drive and reload the
> system. My reasoning for this is that if one sector goes bad, it is likely
> that more will follow. A low level format (along with correct entry of
> the manufacturer's bad sector list) usually goes a long way towards ensuring
> that you don't have the same problem again (although, given time, it will
> probably happen again).

That is exactly what I did. The surprising part is that the manufacturer
defect list is empty on both disks and that the 386/ix format/scan (or does it
use the AHA 1452 format/scan?) has never found an error. I entered the ones
that I noted on a manual log, though. We'll see how long this lasts. It
seems I have to do this every 6 months or so.

In the meantime I think it's time to start on a utility that maps sectors
to files.

// marc
--
// home: marc@dumbcat.sf.ca.us         pacbell!dumbcat!marc
// work: marc@ascend.com               uunet!aria!marc
cpcahil@virtech.uucp (Conor P. Cahill) (06/14/91)
marc@dumbcat.sf.ca.us (Marco S Hyman) writes:

>That is exactly what I did. The surprising part is that the manufacturer
>defect list is empty on both disks and that the 386/ix format/scan (or does it
>use the AHA 1452 format/scan?) has never found an error. I entered the ones

I strongly recommend against using the OS low level format utility. Most
controllers have a BIOS formatting utility that will be much better than
the OS utility.

>that I noted on a manual log, though. We'll see how long this lasts. It
>seems I have to do this every 6 months or so.

This could mean that you have a weak drive, or that the OS format utility
just isn't good enough. If I had bad sectors popping up repeatedly every
once in a while, I would make sure I had a good backup procedure.

--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
cmf851@anu.oz.au (Albert Langer) (06/15/91)
In article <1055@dumbcat.sf.ca.us> marc@dumbcat.sf.ca.us (Marco S Hyman) writes:
>That is exactly what I did. The surprising part is that the manufacturer
>defect list is empty on both disks and that the 386/ix format/scan (or does it
>use the AHA 1452 format/scan?) has never found an error. I entered the ones
>that I noted on a manual log, though. We'll see how long this lasts. It
>seems I have to do this every 6 months or so.

Sorry if I have misunderstood this thread. My understanding is that
SCSI drives normally map out bad sectors themselves and neither report
defects to the operating system nor make use of a manufacturer's defect
list. If that is wrong, somebody please tell me. If it is right then
the discussion seems pointless unless I have misunderstood it.

(I am assuming that AHA 1452 is a typo for AHA 1542 SCSI host adaptor).

--
Opinions disclaimed (Authoritative answer from opinion server)
Header reply address wrong. Use cmf851@csc2.anu.edu.au
marc@dumbcat.sf.ca.us (Marco S Hyman) (06/16/91)
In article <1991Jun14.181849.3725@newshost.anu.edu.au> cmf851@anu.oz.au (Albert Langer) writes:
> Sorry if I have misunderstood this thread. My understanding is that
> SCSI drives normally map out bad sectors themselves and neither report
> defects to the operating system nor make use of a manufacturer's defect
> list. If that is wrong, somebody please tell me. If it is right then
> the discussion seems pointless unless I have misunderstood it.
>
> (I am assuming that AHA 1452 is a typo for AHA 1542 SCSI host adaptor).

Yep. The SCSI controller is a 1542A. Using 386/ix 2.0.2 and a pair of
Seagate 80 MByte drives (I forget the number) I get hard errors reported to
the console. Automatic bad sector mapping is NOT performed. This is a GOOD
thing as the hard errors are usually (more than 98% of the time) not hard
errors. That is, I can copy files, get errors on the original file, look at
the copy, and find nothing wrong. I suspect the cheap Seagate drives -- or
the fact that I'm running two of them.

The last time I mapped out a bad sector by hand I lost a chunk of the
/usr/lib/news directory. (I always wait until after doing a full backup
before mapping anything out). Think of the problems that would occur if this
happened automatically.

// marc
--
// home: marc@dumbcat.sf.ca.us         pacbell!dumbcat!marc
// work: marc@ascend.com               uunet!aria!marc
rmk@rmkhome.UUCP (Rick Kelly) (06/18/91)
In article <1058@dumbcat.sf.ca.us> marc@dumbcat.sf.ca.us (Marco S Hyman) writes:
>In article <1991Jun14.181849.3725@newshost.anu.edu.au> cmf851@anu.oz.au (Albert Langer) writes:
> > Sorry if I have misunderstood this thread. My understanding is that
> > SCSI drives normally map out bad sectors themselves and neither report
> > defects to the operating system nor make use of a manufacturer's defect
> > list. If that is wrong, somebody please tell me. If it is right then
> > the discussion seems pointless unless I have misunderstood it.
> >
> > (I am assuming that AHA 1452 is a typo for AHA 1542 SCSI host adaptor).
>
>Yep. The SCSI controller is a 1542A. Using 386/ix 2.0.2 and a pair of
>Seagate 80 MByte drives (I forget the number) I get hard errors reported to
>the console. Automatic bad sector mapping is NOT performed. This is a GOOD
>thing as the hard errors are usually (more than 98% of the time) not hard
>errors. That is, I can copy files, get errors on the original file, look at
>the copy, and find nothing wrong. I suspect the cheap Seagate drives -- or
>the fact that I'm running two of them.
>
>The last time I mapped out a bad sector by hand I lost a chunk of the
>/usr/lib/news directory. (I always wait until after doing a full backup
>before mapping anything out). Think of the problems that would occur if this
>happened automatically.

However, most SCSI drives can be modeselected to do auto bad sector mapping.
But, as you say, this isn't the most desirable option. The Bernoulli box
does this by default.

Rick Kelly	rmk@rmkhome.UUCP	frog!rmkhome!rmk	rmk@frog.UUCP
cmf851@anu.oz.au (Albert Langer) (06/21/91)
In article <9106171448.32@rmkhome.UUCP> rmk@rmkhome.UUCP (Rick Kelly) quotes and writes:

>>The last time I mapped out a bad sector by hand I lost a chunk of the
>>/usr/lib/news directory. (I always wait until after doing a full backup
>>before mapping anything out). Think of the problems that would occur if this
>>happened automatically.

>However, most SCSI drives can be modeselected to do auto bad sector mapping.
>But, as you say, this isn't the most desirable option. The Bernoulli box
>does this by default.

Why do you agree that auto bad sector mapping is undesirable?

The argument quoted assumed that it would "automatically" lose chunks
of needed files, when in fact that was clearly a result of NOT
implementing automatic SCSI re-mapping and instead waiting until an
unrecoverable hard error had actually lost data.

My understanding is that the automatic remapping would occur when
"too many" soft errors were happening for a particular sector (as
defined by those with the best knowledge of drive failure
characteristics - the drive manufacturer). This would result in the
data being preserved by the remapping, so no missing chunks. Waiting
for a "hard" failure on the other hand would result in manual
re-mapping and lost data.

--
Opinions disclaimed (Authoritative answer from opinion server)
Header reply address wrong. Use cmf851@csc2.anu.edu.au
marc@dumbcat.sf.ca.us (Marco S Hyman) (06/22/91)
In article <1991Jun20.172754.13086@newshost.anu.edu.au> cmf851@anu.oz.au (Albert Langer) writes:
> Why do you agree that auto bad sector mapping is undesirable?

The problem is that many have never seen automatic re-mapping (no matter what
the book says :-). If remapping only occurs after a recoverable error, with
the new sector written with the old data (as the manuals imply), this would be
a good thing. However, given my hardware/software, I'm seeing a bogus hard
error that, if it occurs at just the wrong time, causes a panic. Note: The
data is just fine.

A previous poster mentioned that most SCSI drives could be mode selected to do
automatic bad sector mapping. Excuse my ignorance: How do I do this?

// marc
--
// home: marc@dumbcat.sf.ca.us         pacbell!dumbcat!marc
// work: marc@ascend.com               uunet!aria!marc