[comp.sys.att] From blocks to files

brant@manta.pha.pa.us (Brant Cheikes) (02/09/89)

Given a block number, how can I find out (a) if it's part of a file,
and (b) what file it's part of?

Why might this be useful?  Say you wake up one morning to discover a
bad block error in your unix.log.  If you knew whether the block was
allocated, and what file it was part of, you might be able to avoid
having to reformat the disk.  Though I'm not sure what effect adding a
bad block entry has on the freelist.
-- 
Brant Cheikes
University of Pennsylvania, Department of Computer and Information Science
brant@manta.pha.pa.us, brant@linc.cis.upenn.edu, bpa!manta!brant

jcm@mtunb.ATT.COM (was-John McMillan) (02/09/89)

In article <462@manta.pha.pa.us> brant@manta.pha.pa.us (Brant Cheikes) writes:
>Given a block number, how can I find out (a) if it's part of a file,
>and (b) what file it's part of?


A)  There are so many uses of BLOCK NUMBER (and representations thereof) I
    will simply PRESUME you are referring to:
    	A LOGICAL BLOCK # on an identified FILE-SYSTEM.
    
    For this case:
    	As root, run:
    		/etc/ncheck  -i  ####  -a  /dev/rfp###
    	(per instructions in Section 1M).
    
    The above will give you the mount-point-relative path-names of
    all files which contain the block[s].  (Don't bug me: I know
    that for most of you this is the FULL path name, but NOT for me!)

    (I've also seen at least three other representations of block #s
    based on physical drive offsets (using PHYSICAL BLOCK #s, I presume)).

B)  The problem MAY be addressable WITHOUT EITHER bad-blocking or
    re-formatting.

    1)	Blocks contain META-information, and data.

    2)	META-stuff includes sector id's and synchronization fields.
	If META-merde is blown, only reformatting will fix.
	(In a kinder, gentler world, SINGLE-TRACK reformatting --
		with NO loss of other sectors -- would be available.)

    3)  The only sensed errors are data READ errors.  
    	These errors reflect either transient read (noise/vibration)
		problems, or unrecoverable read problems:
	Unrecoverable read problems arise from either transient write
		(signal/vibration) problems or from permanent (surface
		defect) problems.
	In general, the system silently re-tries enough times you
		aren't aware of transient READ errors.
	In my experience, a LARGE percent of "un-recoverable read errors"
		are of the TRANSIENT write-error type.
	Transient write problems may be corrected by re-writing the
		data block.
		
    4)	Therefore, I generally try to fix a disk by:
	a)  Identifying the file (or just using DD(1) to examine the
		entire disk, and then addressing the specific BLOCK).
	b)  Repeatedly trying to copy the bad file (or individual disk
		block) -- in the hope that the problem is an intermittent
		READ failure whose data may be salvaged.  (This
		usually fails, as the system has re-tried many times
		before you are aware of a problem.  But SOMETIMES!)
	c)  If the data was salvaged, I re-write the file/block and
		re-read several times to identify if the problem is
		repaired.
	d)  If the data was NOT salvaged, I write ZEROES into the
		file/block and re-read several times to identify if
		the block is readable.  The file is then scrapped.
		(If the block was in the INODE area, this produces
		anxiety & depression ;^)  (Hmmmm... I've never thought
		to try it, but I wonder if using RAW I/O, I could save
		HALF the bad LOGICAL [1K] block by doing this ZEROING
		on a PHYSICAL [512] block basis?  This could reduce
		INODE loss from 16- to 8-inodes.)
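
Steps 4b) through 4d) can be sketched in Python.  This is a minimal
illustration, not a tested repair tool: the function names are made up,
the 1K block size is taken from the post, and it is meant to be tried on
an ordinary file (or a disk image) before anyone points it at a raw
device.  It assumes a Unix system where os.pread and os.pwrite exist.

```python
import os

BLOCK_SIZE = 1024  # logical block size mentioned in the post


def try_read_block(fd, block_no, attempts=5, block_size=BLOCK_SIZE):
    """Try to read one block several times; return its bytes, or None."""
    for _ in range(attempts):
        try:
            data = os.pread(fd, block_size, block_no * block_size)
        except OSError:
            continue  # possibly a transient read error; try again
        if len(data) == block_size:
            return data
    return None


def rewrite_block(fd, block_no, data=None, block_size=BLOCK_SIZE):
    """Write salvaged data back, or zeroes if nothing was salvaged."""
    if data is None:
        data = bytes(block_size)
    os.pwrite(fd, data, block_no * block_size)
    os.fsync(fd)


def repair_block(fd, block_no, attempts=5, block_size=BLOCK_SIZE):
    """Return True if the data was salvaged, False if the block was zeroed."""
    data = try_read_block(fd, block_no, attempts, block_size)
    rewrite_block(fd, block_no, data, block_size)
    # Re-read several times to see whether the block is readable now.
    if try_read_block(fd, block_no, attempts, block_size) is None:
        raise OSError("block %d still unreadable after rewrite" % block_no)
    return data is not None
```

A False return corresponds to case d): the block is readable again but
its old contents are gone, so the owning file should be scrapped.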
	
C)  Absurdly, I've never run any programs to augment the BAD-BLOCK
	list.  When I've lost sectors permanently, there has only been
	smoke & ashes left!  This, in part, reflects the higher
	reliability of the AT&T-accepted disks -- no joke here!

{ Tedious opinions of disk selection criteria deleted ;-) }

Anyway, FREE-LISTS are NOT the issue, since running "FSCK -s" will
rebuild them from scratch.  

jc mcmillan	-- att!mtunb!jcm	-- speaking for himself, if that

pfales@ttrde.UUCP (Peter Fales) (02/10/89)

In article <462@manta.pha.pa.us>, brant@manta.pha.pa.us (Brant Cheikes) writes:
> Given a block number, how can I find out (a) if it's part of a file,
> and (b) what file it's part of?
> 
> Why might this be useful?  Say you wake up one morning to discover a
> bad block error in your unix.log.  If you knew whether the block was
> allocated, and what file it was part of, you might be able to avoid
> having to reformat the disk.  Though I'm not sure what effect adding a
> bad block entry has on the freelist.

As it turns out, I am working on a program that does exactly this.  The
documentation I have is sketchy, but between knowing a little about UNIX
file systems, the information in /usr/include/sys/gdisk.h, and a little
experimenting, I was able to puzzle it out.  My program is a few weeks
(months?) away from being a useable product, but I can post it if there
is enough interest.

The way I am doing it - the only way so far as I know - is to search
through the inode list, and look at the list of blocks that belong to
each inode.  Then you can do a find -inum to find the file with that
inode.
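
The search itself is simple once the block lists are in hand.  The
Python sketch below shows only that step, under loud assumptions: the
table mapping inode numbers to block numbers is hypothetical sample
data, and reading the real inode list off the disk (including indirect
blocks) is left out entirely.

```python
def owners_of_block(inode_blocks, block_no):
    """Return the sorted inode numbers whose block lists contain block_no."""
    return sorted(ino for ino, blocks in inode_blocks.items()
                  if block_no in blocks)


# Hypothetical table: inode number -> blocks that inode owns.
inode_blocks = {
    2: [120, 121],
    57: [300, 301, 302],
    93: [4711, 4712],
}

print(owners_of_block(inode_blocks, 4711))  # [93]
print(owners_of_block(inode_blocks, 9999))  # [] (not owned by any inode here)
```

An empty result means no inode in the table owns the block; it says
nothing about whether the block is on the free list, in the swap area,
or in the inode list itself.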

There are a few other things to consider.  For example, the bad block
may be in the swap area, or (shudder) the inode list.  Actually, on
the unix-pc adding a bad block has no effect on disk space or on the free
list.  The file system normally uses only 16 sectors out of the 17
available on each track.  The 17th is used for sparing out other
sectors.   So, when you map out a bad block, it will be replaced transparently
by one of the spare sectors, with no change to the file system, but the 
data will be lost.  Hope you have good backups.

-- 
Peter Fales			AT&T, Room 2F-217
				200 Park Plaza
UUCP:	...att!ttrde!pfales	Naperville, IL 60566
Domain: pfales@ttrde.att.com	work:	(312) 416-5357

brant@manta.pha.pa.us (Brant Cheikes) (02/10/89)

In article <462@manta.pha.pa.us> I asked:
>Given a block number, how can I find out (a) if it's part of a file,
>and (b) what file it's part of?

In article <1392@mtunb.ATT.COM> jcm@mtunb.UUCP (was-John McMillan) replied:
>    [...] I
>    will simply PRESUME you are referring to:
>    	A LOGICAL BLOCK # on an identified FILE-SYSTEM.

This is nearly correct.  I meant a 512-byte block #, numbered from
zero, with block 0 referring to the boot block.

I'm starting from a HDERR message like this:

HDERR ST:51 EF:10 CL:FF45 CH:FF01 SN:FF00 SC:FF02 SDH:FF24
DMACNT:FFFF DCRREG:94 MCRREG:9D00 Wed Feb  8 10:00:58 1989

Given CH, CL, SN, and SDH, and knowing my disk stats, I can compute
the logical block number.  In the above case, for a disk with 8 heads
and 16 blocks (sectors) per track, the computation is:

	cyl # = 0x145 = 325 (decimal), sector 0, head 4.
	there are 8 heads * 16 blocks/track = 128 blocks/cyl
	logical block of error =
	 (cylinder# * blocks/cyl) + (head * blocks/track) + sector =
	 (325*128)+(4*16)+0 = 41664.

(NB: cylinder, head, and sector are all numbered from zero)
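
The computation above can be sketched in Python.  This is only an
illustration: the function name hderr_to_block is made up, and taking
the cylinder from the low bytes of CH and CL, the sector from the low
byte of SN, and the head from the low three bits of SDH is an
assumption chosen to match the values read off in this post (0x145, 0,
and 4).

```python
def hderr_to_block(ch, cl, sdh, sn, heads=8, sectors_per_track=16):
    """Turn HDERR register values into a 512-byte block number.

    Assumption: only the low byte of each field is meaningful, and the
    head number is the low three bits of SDH.
    """
    cylinder = ((ch & 0xFF) << 8) | (cl & 0xFF)
    head = sdh & 0x07
    sector = sn & 0xFF
    blocks_per_cyl = heads * sectors_per_track
    return cylinder * blocks_per_cyl + head * sectors_per_track + sector


# The HDERR line quoted above: CH:FF01 CL:FF45 SDH:FF24 SN:FF00
print(hderr_to_block(0xFF01, 0xFF45, 0xFF24, 0xFF00))  # 41664
```

The heads and sectors_per_track defaults are the disk stats given
above; other drives need their own values.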

Now, knowing that the error occurred in the 41664'th 512-byte block on
the disk, I want to determine if that block is in the free list or if
it's part of a file.  If the latter, I want to know which file it's
allocated to.

(BTW, I can verify the block is not an inode block as follows:
  My disk has a 64 LOGICAL (1024-byte) block partition 0,
  an 8000 LOGICAL block partition 1, and
  a 114944 512-byte block partition 2.
  df -t shows a total of 14368 inodes.
  There are 8 inodes/block (see <sys/param.h> INOPB for 512-byte FS),
    so the inodes take up 14368/8 = 1796 512-byte blocks.
  So data blocks begin at block # (64*2)+(8000*2)+1796=17924.
  Since 41664 > 17924, the error isn't in an inode block.)
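
The same check, as a small Python sketch.  The helper names are made up
for illustration, and the arithmetic simply follows the parenthetical
above: all sizes are in 512-byte blocks, and the inode blocks are
assumed to sit at the very start of partition 2.  Any boot block or
superblock ahead of the inode list is not counted, so the true boundary
may be slightly later.

```python
INODES_PER_BLOCK = 8  # INOPB for a 512-byte filesystem, as quoted above


def first_data_block(part0, part1, total_inodes):
    """Disk-wide number of the first data block; sizes in 512-byte blocks."""
    inode_blocks = -(-total_inodes // INODES_PER_BLOCK)  # divide, rounding up
    return part0 + part1 + inode_blocks


def is_past_inode_area(block, part0, part1, total_inodes):
    """True if block lies at or beyond the first data block."""
    return block >= first_data_block(part0, part1, total_inodes)


# Partition sizes from above: 64 and 8000 logical (1024-byte) blocks.
print(first_data_block(64 * 2, 8000 * 2, 14368))           # 17924
print(is_past_inode_area(41664, 64 * 2, 8000 * 2, 14368))  # True
```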

John suggested the following approach, given a LOGICAL block #:
>    	As root, run:
>    		/etc/ncheck  -i  ####  -a  /dev/rfp###

This is not the right answer.  The argument to -i is supposed to be an
inode number, not a block number (logical or otherwise).

So my question remains.  But thanks for trying!

[NB: if I have said anything incorrect here, I trust that someone will
 swiftly correct me.]

-- 
Brant Cheikes
University of Pennsylvania, Department of Computer and Information Science
brant@manta.pha.pa.us, brant@linc.cis.upenn.edu, bpa!manta!brant

brant@manta.pha.pa.us (Brant Cheikes) (02/10/89)

You all know the question.

In article <848@ttrde.UUCP> pfales@ttrde.UUCP (Peter Fales) writes:
>As it turns out, I am working on a program that does exactly this.
[...]
>My program is a few weeks
>(months?) away from being a useable product, but I can post it if there
>is enough interest.

I'm interested, so either post it or mail it to me, thanks.

>The way I am doing it - the only way so far as I know - is to search
>through the inode list, and look at the list of blocks that belong to
>each inode.  Then you can do a find -inum to find the file with that
>inode.

This is correct, though you should not overlook the freelist.

>There are a few other things to consider.  For example, the bad block
>may be in the swap area, or (shudder) the inode list.

I believe these are easy computations given knowledge of the sizes of
partitions 0, 1, and 2, and the total number of inodes.  My
understanding of the Unix filesystem is that the inode blocks are the
first blocks of partition 2.  So given 800 total inodes, and 8 inodes
per block, the first 100 blocks of partition 2 are reserved for the
inodes.
-- 
Brant Cheikes
University of Pennsylvania, Department of Computer and Information Science
brant@manta.pha.pa.us, brant@linc.cis.upenn.edu, bpa!manta!brant

jr@amanue.UUCP (Jim Rosenberg) (02/10/89)

In article <462@manta.pha.pa.us> brant@manta.pha.pa.us (Brant Cheikes) writes:
>Given a block number, how can I find out (a) if it's part of a file,
>and (b) what file it's part of?
>
>Why might this be useful?  Say you wake up one morning to discover a
>bad block error in your unix.log.  If you knew whether the block was
>allocated, and what file it was part of, you might be able to avoid
>having to reformat the disk.

This is something that people wanna do so often it amazes me there's not a
utility for this.  An fsdb wizard might be able to tell you how -- a script
redirecting fsdb's input???

At any rate, here's a method I've used on occasion when I knew I had a bad
block but had no idea what the file was.  You tar the entire file system to
/dev/null and capture the output.  There's a catch.  I'm not sure how it works
on the UNIX-PC's tar, but on some versions of tar the error message can VERY
EASILY escape notice.  E.g. you do something like

tar cvf /dev/null / >tar.list 2>&1

On some systems tar is too brain damaged to differentiate between EOF and a
read error, so a file with a bad block will show up as one *whose file size
changed*.  (tar reads fewer bytes than stat told it were there.)  So check
your capture for files whose size has changed.

One of the public domain tars may report file read errors unambiguously or
else if someone wants to do some quick hacking perhaps this could be hacked
into a PD tar without much work.  It would be much easier than doing it right
by writing a real utility that walks the file system reporting what blocks
belong to what files.  Quick & dirty, no warranty express or implied ... :-)
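
For what it's worth, the "read everything and see what fails" idea does
not need tar at all.  Below is a minimal Python sketch, with a made-up
function name, that reads every regular file under a directory and
reports both outright read errors and short reads (the "size changed"
symptom described above).  Whether a bad block surfaces as an error or
as a short read depends on the system, so both are checked.

```python
import os


def find_unreadable(root, chunk=64 * 1024):
    """Read every regular file under root; return [(path, reason), ...]."""
    problems = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                expected = os.stat(path).st_size
                got = 0
                with open(path, "rb") as f:
                    while True:
                        buf = f.read(chunk)
                        if not buf:
                            break
                        got += len(buf)
            except OSError as e:
                problems.append((path, "read error: %s" % e.strerror))
                continue
            if got != expected:
                problems.append(
                    (path, "size changed: %d of %d bytes" % (got, expected)))
    return problems
```

Files being written while the scan runs will also show up as "size
changed", so a hit is a lead to follow up, not proof of a bad block.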
-- 
 Jim Rosenberg
     CIS: 71515,124                         decvax!idis! \
     WELL: jer                                   allegra! ---- pitt!amanue!jr
     BIX: jrosenberg                  uunet!cmcl2!cadre! /

pfales@ttrde.UUCP (Peter Fales) (02/10/89)

In article <1392@mtunb.ATT.COM>, jcm@mtunb.ATT.COM (was-John McMillan) writes:
> In article <462@manta.pha.pa.us> brant@manta.pha.pa.us (Brant Cheikes) writes:
> >Given a block number, how can I find out (a) if it's part of a file,
> >and (b) what file it's part of?
> 
> 
> A)  There are so many uses of BLOCK NUMBER (and representations thereof) I
>     will simply PRESUME you are referring to:
>     	A LOGICAL BLOCK # on an identified FILE-SYSTEM.
>     
>     For this case:
>     	As root, run:
>     		/etc/ncheck  -i  ####  -a  /dev/rfp###
>     	(per instructions in Section 1M).

Thanks for your posting John,  you had some good tips on file system 
repair to add to my bag of tricks, but I must disagree with the statement
above.  According to my manual, as well as empirical evidence, the numbers
following "-i" are a list of inodes, not a list of logical blocks.

Consider that a large file will contain many blocks, but a file will never
have more than one inode.

I am not aware of any standard tools that will go from logical block
numbers to files, though I would love to be proved wrong.

-- 
Peter Fales			AT&T, Room 2F-217
				200 Park Plaza
UUCP:	...att!ttrde!pfales	Naperville, IL 60566
Domain: pfales@ttrde.att.com	work:	(312) 416-5357

jcm@mtunb.ATT.COM (was-John McMillan) (02/14/89)

Mea culpa:  too many balls in the air, too few brains in the head.
I indeed posted erroneous advice.  Read on....

In article <446@amanue.UUCP> jr@amanue.UUCP (Jim Rosenberg) writes:
>In article <462@manta.pha.pa.us> brant@manta.pha.pa.us (Brant Cheikes) writes:
>>Given a block number, how can I find out (a) if it's part of a file,
>>and (b) what file it's part of?
>>...
>This is something that people wanna do so often it amazes me there's not a
>utility for this.  An fsdb wizard might be able to tell you how -- a script
>redirecting fsdb's input???

There USED to be a program that did this:

	icheck  -b  #B#  ...  #B#  FileSystem
		-- produced a list of INODES which "Owned" those blocks.

(Unfortunately, I forgot to post the above in my previous, brain-damaged note.)
The next step is to use ICHECK's output:

	ncheck  -i  #I#  ...  #I#  FileSystem
		-- then turned those INODE numbers into FileNames.

When FSCK came along, AT&T seems to have dropped ICHECK.  I can't legitimately
hand out any hack I have for icheck... but others are apparently busily
at work on it.

Written properly, an "icheck" clone could be run as:
	ncheck -i `eyecheck -b #B# ... FS 2> Aye2` FS
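
The second half of that pipeline, turning inode numbers into file
names, can be approximated on a mounted filesystem without ncheck.  The
Python sketch below is an illustration only: the function name is made
up, it walks the mounted tree rather than reading the raw device the
way ncheck does, and the inode numbers are assumed to come from some
icheck-like tool.

```python
import os


def names_for_inodes(mount_point, inodes):
    """Map each wanted inode number to the paths under mount_point using it."""
    wanted = set(inodes)
    dev = os.lstat(mount_point).st_dev
    found = {}
    for dirpath, dirs, files in os.walk(mount_point):
        # Do not descend into other filesystems mounted below this one;
        # inode numbers are only unique within a single filesystem.
        dirs[:] = [d for d in dirs
                   if os.lstat(os.path.join(dirpath, d)).st_dev == dev]
        for name in dirs + files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_dev == dev and st.st_ino in wanted:
                found.setdefault(st.st_ino, []).append(path)
    return found
```

A file with several hard links shows up once per link, and an inode
that is wanted but never found simply has no entry in the result.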

Berkeley still provides ICHECK, I believe -- probably DCHECK as well.
Ahhhh, those beautiful red-eyed nights spent with *check, piecing
together blithered FS before FSCK was born.

jc mcmillan	-- att!mtunb!jcm	-- speaking for self, only
				(Those WEREN'T the "good ol' days", were they?)

brant@manta.pha.pa.us (Brant Cheikes) (02/14/89)

In article <1398@mtunb.ATT.COM> jcm@mtunb.UUCP (was-John McMillan)
writes [re a program to find inodes from blocks]:
>There USED to be a program that did this:
>	icheck  -b  #B#  ...  #B#  FileSystem
>		-- produced a list of INODES which "Owned" those blocks.

The extended features of icheck (superblock repair and miscellaneous
consistency checks) are no longer necessary, since they were
incorporated into fsck.  However, I have written a utility called "bf"
that will perform the above-described icheck function---find the
inodes that "own" specified blocks.  Bf has been tested on a 3b1
(SVR2) only, and probably makes assumptions about the filesystem
structure.  Bf was just posted to unix-pc.sources, and e-mail copies
(it's short) are available upon request.
-- 
Brant Cheikes
University of Pennsylvania, Department of Computer and Information Science
brant@manta.pha.pa.us, brant@linc.cis.upenn.edu, bpa!manta!brant