[comp.unix.sysv386] Mapping abs sector numbers to files

marc@dumbcat.sf.ca.us (Marco S Hyman) (06/06/91)

I haven't found this in TFM yet -- perhaps the net can help.  Given an error
message that says something like "SCSI absolute sector 1234 on drive 1 is bad"
how can I map this sector number to a file/directory/(inode!).  I've looked at
/etc/partitions and can figure out what partition the error is in (I think)
but the intricacies of fsdb escape me.  Perhaps there is more doc than the man
page available?  Some other hidden gem?  Something so obvious I'll be forever
embarrassed that I missed it?  Anything!

Any and all help gladly accepted.

// marc
-- 
// home: marc@dumbcat.sf.ca.us		pacbell!dumbcat!marc
// work: marc@ascend.com		uunet!aria!marc

cpcahil@virtech.uucp (Conor P. Cahill) (06/06/91)

marc@dumbcat.sf.ca.us (Marco S Hyman) writes:

>I haven't found this in TFM yet -- perhaps the net can help.  Given an error
>message that says something like "SCSI absolute sector 1234 on drive 1 is bad"
>how can I map this sector number to a file/directory/(inode!).  I've looked at

Read mkpart(1M), specifically the -A flag.  Note that if you do add a bad
sector, you should place an entry into /etc/partitions marking that sector
as bad ("badsec =" - also documented on mkpart(1M)).

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

gary@sci34hub.sci.com (Gary Heston) (06/07/91)

In article <767@dumbcat.sf.ca.us> marc@dumbcat.sf.ca.us (Marco S Hyman) writes:
>I haven't found this in TFM yet -- perhaps the net can help.  Given an error
>message that says something like "SCSI absolute sector 1234 on drive 1 is bad"
>how can I map this sector number to a file/directory/(inode!).  I've looked at
>/etc/partitions and can figure out what partition the error is in (I think)
>but the intricacies of fsdb escape me.  Perhaps there is more doc than the man
>page available?  Some other hidden gem?  Something so obvious I'll be forever
>embarrassed that I missed it?  Anything!

I have run into this problem in the past, and came up with a fairly simple
work-around that narrows it down to which file contains the bad sector:

	tar -cvf /dev/null /

and watch for the error to hit.

This works with tar because tar displays the filename when it starts trying
to read it; whereas cpio deals with the file and then displays the name.

You can, of course, redirect the output.

-- 
Gary Heston   System Mismanager and technoflunky   uunet!sci34hub!gary or
My opinions, not theirs.    SCI Systems, Inc.       gary@sci34hub.sci.com
I support drug testing. I believe every public official should be given a
shot of sodium pentothal and asked "Which laws have you broken this week?".

del@fnx.UUCP (Dag Erik Lindberg) (06/08/91)

In article <1991Jun06.123852.29851@virtech.uucp> cpcahil@virtech.uucp (Conor P. Cahill) writes:
>marc@dumbcat.sf.ca.us (Marco S Hyman) writes:
>
>>I haven't found this in TFM yet -- perhaps the net can help.  Given an error
>>message that says something like "SCSI absolute sector 1234 on drive 1 is bad"
>>how can I map this sector number to a file/directory/(inode!).  I've looked at
>
>Read mkpart(1M), specifically the -A flag.  Note that if you do add a bad
>sector, you should place an entry into /etc/partitions marking that sector
>as bad ("badsec =" - also documented on mkpart(1M)).

I don't think this is what was asked.  Marco wants to find out which *file*
is corrupt because of the bad sector.  And I can relate to his problem,
having had a similar problem on a customer machine.  Given the bad sector
error, it is trivial to 'mkpart' the sector into the bad sector list,
but how do you ensure the file system is ok without restoring from a tape?

In the case I had to deal with, the customer did not have a current
backup of the system.  While I could make a backup of the system, there
were errors during the backup, and it was not clear from the output of
cpio which files were trashed, as the output of cpio is buffered
independently of stderr.  Someone advised me to mark the sectors as bad,
then fsck the drive, and fsck would report the truncated files.  Well,
it didn't, and then I was left with only a list of bad sectors.  I spent
a great deal of time getting that system fixed up.

Unless someone knows another method, the only thing I can think of for
this situation is:
	find / -type f -print -exec cp {} /dev/null \;

which I have not tried.  I suspect it would only work if run from the
system console, and if the system console were a hardcopy device (or you
are extremely patient).  Note that trapping stderr from the find command
would not necessarily tell you anything, as the console error messages
are not going through stderr!
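
One way to sidestep the console problem, sketched here in POSIX sh and
equally untested on the affected hardware: check the exit status of each
read, so that a failed read is reported on stdout next to the file name.
This assumes the driver returns an I/O error to the reading process (and
not only a console message) and that the error is repeatable.  The name
scan_tree is made up for illustration; it is not a standard utility.

```shell
# scan_tree DIR: read every regular file under DIR and print the names
# of the files whose read fails.  Hypothetical helper, not a system tool.
scan_tree() {
    # -type f skips directories and device nodes; reading a device
    # such as a tty could block forever.
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            # cat exits non-zero if any read on the file fails.
            cat "$f" > /dev/null 2>&1 || echo "READ ERROR: $f"
        done
    ' sh {} +
}
```

Running "scan_tree /usr" (or "scan_tree /") as root prints one line per
file that could not be read.  An error that goes away on a retry will
not show up, so an empty result does not prove the disk is healthy.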

-- 
del AKA Erik Lindberg                             uunet!pilchuck!fnx!del
                          Who is John Galt?

ed@mtxinu.COM (Ed Gould) (06/08/91)

> I haven't found this in TFM yet -- perhaps the net can help.  Given
> an error message that says something like "SCSI absolute sector
> 1234 on drive 1 is bad" how can I map this sector number to a
> file/directory/(inode!).

The tool to do this is icheck, if it exists in your version of
Unix.  However, it's a three-step process.  First, you need to
determine the relative sector number in the filesystem affected.
If the driver is well written, it will report both absolute and
relative sector numbers.  If not, you'll have to subtract the
starting sector number of the filesystem partition from the absolute
sector number.  (Be careful - partition offsets are often specified
in cylinders, not sectors.)  Second, the sector number must be
converted into a filesystem block number; some drivers will report
it as well.  This is a simple matter of division, dividing the
sector number by the blocking factor, being careful to round up
properly.  The "blocking factor" is the number of sectors per
filesystem block:  If you have a 1024-byte-block filesystem and
512-byte sectors, the factor is 2.  This filesystem block number
can be fed to icheck, which will report the inode number of the
file containing the block.

If you want the name(s) of that file, then feed the inumber to ncheck.
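
The arithmetic in the first two steps can be sketched in POSIX sh as
follows.  The function names, the drive geometry (5 heads, 20 sectors
per track) and the partition start (cylinder 10) are invented for the
example; take the real values from your own partition table.  The sketch
assumes sectors and blocks are both counted from zero, in which case the
shell's integer division already picks the right block; if your tools
number things differently, that is where the rounding caveat applies.

```shell
# cyl_to_sector CYL HEADS SECTORS_PER_TRACK
# Convert a partition offset given in cylinders to a sector number.
cyl_to_sector() {
    echo $(( $1 * $2 * $3 ))
}

# abs_to_block ABS_SECTOR PART_START_SECTOR SECTORS_PER_BLOCK
# Convert an absolute sector number to a block number relative to the
# start of the filesystem partition (zero-based numbering assumed).
abs_to_block() {
    if [ "$1" -lt "$2" ]; then
        echo "sector $1 lies before partition start $2" >&2
        return 1
    fi
    echo $(( ($1 - $2) / $3 ))
}

# Hypothetical geometry: partition starts at cylinder 10.
start=$(cyl_to_sector 10 5 20)      # 10 * 5 * 20 = 1000
# 1024-byte blocks on 512-byte sectors: blocking factor 2.
abs_to_block 1234 "$start" 2        # (1234 - 1000) / 2 = 117

# The resulting block number is what gets passed to icheck, and the
# inode number it reports to ncheck.  Options differ between systems,
# so check icheck(1M) and ncheck(1M) locally.
```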

-- 
Ed Gould			No longer formally affiliated with,
ed@mtxinu.COM			and certainly not speaking for, mt Xinu.

"I'll fight them as a woman, not a lady.  I'll fight them as an engineer."

cpcahil@virtech.uucp (Conor P. Cahill) (06/10/91)

del@fnx.UUCP (Dag Erik Lindberg) writes:

>I don't think this is what was asked.  Marco wants to find out which *file*
>is corrupt because of the bad sector.  And I can relate to his problem,

You're right, I misread the question.

>having had a similar problem on a customer machine.  Given the bad sector
>error, it is trivial to 'mkpart' the sector into the bad sector list,
>but how do you ensure the file system is ok without restoring from a tape?

If a backup is available, I would low level format the drive and reload the
system.  My reasoning for this is that if one sector goes bad, it is likely
that more will follow.  A low level format (along with correct entry of
the manufacturer's bad sector list) usually goes a long way towards ensuring
that you don't have the same problem again (although, given time it will
probably happen again).

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

marc@dumbcat.sf.ca.us (Marco S Hyman) (06/14/91)

In article <1991Jun10.134714.28189@virtech.uucp> cpcahil@virtech.uucp (Conor P. Cahill) writes:
 > If a backup is available, I would low level format the drive and reload the
 > system.  My reasoning for this is that if one sector goes bad, it is likely
 > that more will follow.  A low level format (along with correct entry of
 > the manufacturer's bad sector list) usually goes a long way towards ensuring
 > that you don't have the same problem again (although, given time it will
 > probably happen again).

That is exactly what I did.  The surprising part is that the manufacturer
defect list is empty on both disks and that the 386/ix format/scan (or does it
use the AHA 1452 format/scan?) has never found an error.  I entered the ones
that I noted on a manual log, though.  We'll see how long this lasts.  It
seems I have to do this every 6 months or so.

In the meantime, I think it's time to start on a utility that maps sectors to
files.

// marc
-- 
// home: marc@dumbcat.sf.ca.us		pacbell!dumbcat!marc
// work: marc@ascend.com		uunet!aria!marc

cpcahil@virtech.uucp (Conor P. Cahill) (06/14/91)

marc@dumbcat.sf.ca.us (Marco S Hyman) writes:

>That is exactly what I did.  The surprising part is that the manufacturer
>defect list is empty on both disks and that the 386/ix format/scan (or does it
>use the AHA 1452 format/scan?) has never found an error.  I entered the ones

I strongly recommend against using the OS low level format utility.  Most
controllers have a BIOS formatting utility that will be much better
than the OS utility.

>that I noted on a manual log, though.  We'll see how long this lasts.  It
>seems I have to do this every 6 months or so.

This could be that you have a weak drive, or that the OS format utility just
isn't good enough.  If I had repeating bad sectors popping up every once in
a while, I would make sure I have a good backup procedure.

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

cmf851@anu.oz.au (Albert Langer) (06/15/91)

In article <1055@dumbcat.sf.ca.us> marc@dumbcat.sf.ca.us (Marco S Hyman) writes:

>That is exactly what I did.  The surprising part is that the manufacturer
>defect list is empty on both disks and that the 386/ix format/scan (or does it
>use the AHA 1452 format/scan?) has never found an error.  I entered the ones
>that I noted on a manual log, though.  We'll see how long this lasts.  It
>seems I have to do this every 6 months or so.

Sorry if I have misunderstood this thread. My understanding is that
SCSI drives normally map out bad sectors themselves and neither report
defects to the operating system nor make use of a manufacturer's defect
list. If that is wrong, somebody please tell me. If it is right then
the discussion seems pointless unless I have misunderstood it.

(I am assuming that AHA 1452 is a typo for AHA 1542 SCSI host adaptor).

--
Opinions disclaimed (Authoritative answer from opinion server)
Header reply address wrong. Use cmf851@csc2.anu.edu.au

marc@dumbcat.sf.ca.us (Marco S Hyman) (06/16/91)

In article <1991Jun14.181849.3725@newshost.anu.edu.au> cmf851@anu.oz.au (Albert Langer) writes:
 > Sorry if I have misunderstood this thread. My understanding is that
 > SCSI drives normally map out bad sectors themselves and neither report
 > defects to the operating system nor make use of a manufacturer's defect
 > list. If that is wrong, somebody please tell me. If it is right then
 > the discussion seems pointless unless I have misunderstood it.
 > 
 > (I am assuming that AHA 1452 is a typo for AHA 1542 SCSI host adaptor).

Yep.  The SCSI controller is a 1542A.  Using 386/ix 2.0.2 and a pair of
Seagate 80 MByte drives (I forget the number) I get hard errors reported to
the console.  Automatic bad sector mapping is NOT performed.  This is a GOOD
thing as the hard errors are usually (more than 98% of the time) not hard
errors. That is, I can copy files, get errors on the original file, look at
the copy, and find nothing wrong.  I suspect the cheap Seagate drives -- or
the fact that I'm running two of them.

The last time I mapped out a bad sector by hand I lost a chunk of the
/usr/lib/news directory.  (I always wait until after doing a full backup
before mapping anything out).  Think of the problems that would occur if this
happened automatically.

// marc
-- 
// home: marc@dumbcat.sf.ca.us		pacbell!dumbcat!marc
// work: marc@ascend.com		uunet!aria!marc

rmk@rmkhome.UUCP (Rick Kelly) (06/18/91)

In article <1058@dumbcat.sf.ca.us> marc@dumbcat.sf.ca.us (Marco S Hyman) writes:
>In article <1991Jun14.181849.3725@newshost.anu.edu.au> cmf851@anu.oz.au (Albert Langer) writes:
> > Sorry if I have misunderstood this thread. My understanding is that
> > SCSI drives normally map out bad sectors themselves and neither report
> > defects to the operating system nor make use of a manufacturer's defect
> > list. If that is wrong, somebody please tell me. If it is right then
> > the discussion seems pointless unless I have misunderstood it.
> > 
> > (I am assuming that AHA 1452 is a typo for AHA 1542 SCSI host adaptor).
>
>Yep.  The SCSI controller is a 1542A.  Using 386/ix 2.0.2 and a pair of
>Seagate 80 MByte drives (I forget the number) I get hard errors reported to
>the console.  Automatic bad sector mapping is NOT performed.  This is a GOOD
>thing as the hard errors are usually (more than 98% of the time) not hard
>errors. That is, I can copy files, get errors on the original file, look at
>the copy, and find nothing wrong.  I suspect the cheap Seagate drives -- or
>the fact that I'm running two of them.
>
>The last time I mapped out a bad sector by hand I lost a chunk of the
>/usr/lib/news directory.  (I always wait until after doing a full backup
>before mapping anything out).  Think of the problems that would occur if this
>happened automatically.


However, most SCSI drives can be modeselected to do auto bad sector mapping.
But, as you say, this isn't the most desirable option.  The Bernoulli box
does this by default.

Rick Kelly	rmk@rmkhome.UUCP	frog!rmkhome!rmk	rmk@frog.UUCP

cmf851@anu.oz.au (Albert Langer) (06/21/91)

In article <9106171448.32@rmkhome.UUCP> rmk@rmkhome.UUCP 
(Rick Kelly) quotes and writes:

>>The last time I mapped out a bad sector by hand I lost a chunk of the
>>/usr/lib/news directory.  (I always wait until after doing a full backup
>>before mapping anything out).  Think of the problems that would occur if this
>>happened automatically.

>However, most SCSI drives can be modeselected to do auto bad sector mapping.
>But, as you say, this isn't the most desirable option.  The Bernoulli box
>does this by default.

Why do you agree that auto bad sector mapping is undesirable? The argument
quoted assumed that it would "automatically" lose chunks of needed files,
when in fact that was clearly a result of NOT implementing automatic SCSI
re-mapping and instead waiting until an unrecoverable hard error had
actually lost data.

My understanding is that the automatic remapping would occur when "too
many" soft errors were happening for a particular sector (as defined
by those with the best knowledge of drive failure characteristics - the
drive manufacturer). This would result in the data being preserved by
the remapping, so no missing chunks.

Waiting for a "hard" failure on the other hand would result in manual
re-mapping and lost data.

--
Opinions disclaimed (Authoritative answer from opinion server)
Header reply address wrong. Use cmf851@csc2.anu.edu.au

marc@dumbcat.sf.ca.us (Marco S Hyman) (06/22/91)

In article <1991Jun20.172754.13086@newshost.anu.edu.au> cmf851@anu.oz.au (Albert Langer) writes:
 > Why do you agree that auto bad sector mapping is undesirable?

The problem is that many have never seen automatic re-mapping (no matter what
the book says :-).  If remapping only occurs after a recoverable error with the
new sector written with the old data (as the manuals imply) this would be a
good thing.

However, given my hardware/software, I'm seeing a bogus hard error that, if it
occurs at just the wrong time, causes a panic.  Note: The data is just fine.

A previous poster mentioned that most SCSI drives could be mode selected to do
automatic bad sector mapping.  Excuse my ignorance: How do I do this?

// marc
-- 
// home: marc@dumbcat.sf.ca.us		pacbell!dumbcat!marc
// work: marc@ascend.com		uunet!aria!marc