brant@manta.pha.pa.us (Brant Cheikes) (02/09/89)
Given a block number, how can I find out (a) if it's part of a file,
and (b) what file it's part of?

Why might this be useful?  Say you wake up one morning to discover a
bad block error in your unix.log.  If you knew whether the block was
allocated, and what file it was part of, you might be able to avoid
having to reformat the disk.  Though I'm not sure what effect adding a
bad block entry has on the freelist.
--
Brant Cheikes
University of Pennsylvania, Department of Computer and Information Science
brant@manta.pha.pa.us, brant@linc.cis.upenn.edu, bpa!manta!brant
jcm@mtunb.ATT.COM (was-John McMillan) (02/09/89)
In article <462@manta.pha.pa.us> brant@manta.pha.pa.us (Brant Cheikes) writes:
>Given a block number, how can I find out (a) if it's part of a file,
>and (b) what file it's part of?

A) There are so many uses of BLOCK NUMBER (and representations thereof) I
   will simply PRESUME you are referring to:
        A LOGICAL BLOCK # on an identified FILE-SYSTEM.

   For this case:
        As root, run:
                /etc/ncheck -i #### -a /dev/rfp###
        (per instructions in Section 1M).

   The above will give you the mount-point-relative path-names of all
   files which contain the block[s].  (Don't bug me: I know that for
   most of you this is the FULL path name, but NOT for me!)

   (I've also seen at least three other representations of block #s
   based on physical drive offsets (using PHYSICAL BLOCK #s, I presume)).

B) The problem MAY be addressable WITHOUT EITHER bad-blocking or
   re-formatting.

   1) Blocks contain META-information, and data.

   2) META-stuff includes sector id's and synchronization fields.
      If META-merde is blown, only reformatting will fix.
      (In a kinder, gentler world, SINGLE-TRACK reformatting -- with
      NO loss of other sectors -- would be available.)

   3) The only sensed errors are data READ errors.
      These errors reflect either transient read (noise/vibration)
      problems, or unrecoverable read problems:
      Unrecoverable read problems arise from either transient write
      (signal/vibration) problems or from permanent (surface defect)
      problems.
      In general, the system silently re-tries enough times you aren't
      aware of transient READ errors.
      In my experience, a LARGE percent of "un-recoverable read errors"
      are of the TRANSIENT write-error type.
      Transient write problems may be corrected by re-writing the data
      block.

   4) Therefore, I generally try to fix a disk by:

      a) Identifying the file (or just using DD(1) to examine the
         entire disk, and then addressing the specific BLOCK).

      b) Repeatedly trying to copy the bad file (or individual disk
         block) -- in the hope that the problem is an intermittent
         READ failure whose data may be salvaged.
         (This usually fails, as the system has re-tried many times
         before you are aware of a problem.  But SOMETIMES!)

      c) If the data was salvaged, I re-write the file/block and
         re-read several times to identify if the problem is repaired.

      d) If the data was NOT salvaged, I write ZEROES into the
         file/block and re-read several times to identify if the
         block is readable.  The file is then scrapped.
         (If the file was in the INODE area, this produces anxiety &
         depression ;^)

         (Hmmmm... I've never thought to try it, but I wonder if using
         RAW I/O, I could save HALF the bad LOGICAL [1K] block by
         doing this ZEROING on a PHYSICAL [512] block basis?  This
         could reduce INODE loss from 16- to 8-inodes.)

C) Absurdly, I've never run any programs to augment the BAD-BLOCK list.
   When I've lost sectors permanently, there has only been smoke & ashes
   left!  This, in part, reflects the higher reliability of the
   AT&T-accepted disks -- no joke here!
        { Tedious opinions of disk selection criteria deleted ;-) }

   Anyway, FREE-LISTS are NOT the issue, since running "FSCK -s" will
   rebuild them from scratch.

jc mcmillan     -- att!mtunb!jcm        -- speaking for himself, if that
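Steps 4b through 4d above can be sketched as follows. This is only an illustration under stated assumptions, not the poster's procedure: the function names are hypothetical, the device is treated as an ordinary seekable file, and 512-byte blocks numbered from zero are assumed. On a real system the same steps would more likely be done with dd(1) against the raw device.

```python
import os


def read_block_with_retries(device, block, block_size=512, attempts=5):
    """Try several times to read one block; return its bytes, or None.

    `device` is any seekable file path (assumption: block N starts at
    byte offset N * block_size).
    """
    for _ in range(attempts):
        try:
            with open(device, "rb", buffering=0) as fh:
                fh.seek(block * block_size)
                data = fh.read(block_size)
        except OSError:
            continue  # read error: try again
        if len(data) == block_size:
            return data
    return None


def rewrite_block(device, block, data=None, block_size=512):
    """Write salvaged data back to the block, or zeroes if data is None.

    This overwrites the block in place, so it is destructive.
    """
    payload = bytes(block_size) if data is None else data
    if len(payload) != block_size:
        raise ValueError("payload must be exactly one block long")
    with open(device, "r+b", buffering=0) as fh:
        fh.seek(block * block_size)
        fh.write(payload)
        os.fsync(fh.fileno())
```

A caller would pass the salvaged bytes to rewrite_block() when the read succeeds, and call it with no data (zeroes) when it does not, then re-read a few times to see whether the block has become readable.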
pfales@ttrde.UUCP (Peter Fales) (02/10/89)
In article <462@manta.pha.pa.us>, brant@manta.pha.pa.us (Brant Cheikes) writes:
> Given a block number, how can I find out (a) if it's part of a file,
> and (b) what file it's part of?
>
> Why might this be useful?  Say you wake up one morning to discover a
> bad block error in your unix.log.  If you knew whether the block was
> allocated, and what file it was part of, you might be able to avoid
> having to reformat the disk.  Though I'm not sure what effect adding a
> bad block entry has on the freelist.

As it turns out, I am working on a program that does exactly this.  The
documentation I have is sketchy, but between knowing a little about UNIX
file systems, the information in /usr/include/sys/gdisk.h, and a little
experimenting, I was able to puzzle it out.  My program is a few weeks
(months?) away from being a useable product, but I can post it if there
is enough interest.

The way I am doing it - the only way so far as I know - is to search
through the inode list, and look at the list of blocks that belong to
each inode.  Then you can do a find -inum to find the file with that
inode.

There are a few other things to consider.  For example, the bad block
may be in the swap area, or (shudder) the inode list.

Actually, on the unix-pc adding a bad block has no effect on disk space
or on the free list.  The file system normally uses only 16 sectors out
of the 17 available on each track.  The 17th is used for sparing out
other sectors.  So, when you map out a bad block, it will be replaced
transparently by one of the spare sectors, with no change to the file
system, but the data will be lost.  Hope you have good backups.
--
Peter Fales                     AT&T, Room 2F-217
                                200 Park Plaza
UUCP:   ...att!ttrde!pfales     Naperville, IL 60566
Domain: pfales@ttrde.att.com    work:   (312) 416-5357
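The inode-list search described in the post above might look like the following toy sketch. It is not the poster's program: the name inodes_owning and the in-memory table are hypothetical, and the block numbers are made up. A real tool would have to read the inodes from the raw device and follow indirect blocks, which this sketch skips entirely.

```python
def inodes_owning(block, inode_table):
    """Return the sorted inode numbers whose block lists contain `block`.

    `inode_table` maps inode number -> collection of block numbers
    (a stand-in for what a real tool would read off the disk).
    """
    return sorted(inum for inum, blocks in inode_table.items()
                  if block in blocks)


# Made-up example table: inode number -> blocks belonging to that inode.
toy_table = {
    2: [17924, 17925],
    37: [41660, 41661, 41664],
    52: [50000],
}

print(inodes_owning(41664, toy_table))  # [37]
print(inodes_owning(60000, toy_table))  # []
```

With an inode number in hand, the find -inum step mentioned in the post turns it into a path name. An empty result only says that no inode in the table claims the block.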
brant@manta.pha.pa.us (Brant Cheikes) (02/10/89)
In article <462@manta.pha.pa.us> I asked:
>Given a block number, how can I find out (a) if it's part of a file,
>and (b) what file it's part of?

In article <1392@mtunb.ATT.COM> jcm@mtunb.UUCP (was-John McMillan) replied:
> [...] I
> will simply PRESUME you are referring to:
>       A LOGICAL BLOCK # on an identified FILE-SYSTEM.

This is nearly correct.  I meant a 512-byte block #, numbered from
zero, with block 0 referring to the boot block.  I'm starting from a
HDERR message like this:

HDERR ST:51 EF:10 CL:FF45 CH:FF01 SN:FF00 SC:FF02 SDH:FF24 DMACNT:FFFF
DCRREG:94 MCRREG:9D00 Wed Feb  8 10:00:58 1989

Given CH, CL, SN, and SDH, and knowing my disk stats, I can compute
the logical block number.  In the above case, for a disk with 8 heads
and 16 blocks (sectors) per track, the computation is:

        cyl # = 0x145 = 325 (decimal), sector 0, head 4.
        there are 8 heads * 16 blocks/track = 128 blocks/cyl
        logical block of error
                = (cylinder# * blocks/cyl) + (head * blocks/track) + sector
                = (325*128)+(16*4)+0 = 41664.

(NB: cylinder, head, and sector are all numbered from zero)

Now, knowing that the error occurred in the 41664'th 512-byte block on
the disk, I want to determine if that block is in the free list or if
it's part of a file.  If the latter, I want to know which file it's
allocated to.

(BTW, I can verify the block is not an inode block as follows: My disk
has a 64 LOGICAL (1024-byte) block partition 0, an 8000 LOGICAL block
partition 1, and a 114944 512-byte block partition 2.  df -t shows a
total of 14368 inodes.  There are 8 inodes/block (see <sys/param.h>
INOPB for 512-byte FS), so the inodes take up 14368/8 = 1796 512-byte
blocks.  So data blocks begin at block # (64*2)+(8000*2)+1796=17924.
Since 41664 > 17924, the error isn't in an inode block.)

John suggested the following approach, given a LOGICAL block #:
>       As root, run:
>               /etc/ncheck -i #### -a /dev/rfp###

This is not the right answer.  The argument to -i is supposed to be an
inode number, not a block number (logical or otherwise).  So my
question remains.  But thanks for trying!

[NB: if I have said anything incorrect here, I trust that someone will
swiftly correct me.]
--
Brant Cheikes
University of Pennsylvania, Department of Computer and Information Science
brant@manta.pha.pa.us, brant@linc.cis.upenn.edu, bpa!manta!brant
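The cylinder/head/sector arithmetic in the post above can be checked with a short sketch. The function names are hypothetical. The way decode_hderr pulls fields out of the HDERR values (low byte of CH and CL for the cylinder, low three bits of SDH for the head, low byte of SN for the sector) is an assumption inferred only from the poster's own example, not from controller documentation.

```python
def decode_hderr(ch, cl, sdh, sn):
    """Guess (cylinder, head, sector) from HDERR register values.

    Assumption: only the low byte of each value matters, and the head
    number sits in the low three bits of SDH (enough for 8 heads).
    """
    cylinder = ((ch & 0xFF) << 8) | (cl & 0xFF)
    head = sdh & 0x07
    sector = sn & 0xFF
    return cylinder, head, sector


def chs_to_block(cylinder, head, sector, heads=8, sectors_per_track=16):
    """Zero-based 512-byte block number for a zero-based C/H/S triple."""
    if not (0 <= head < heads and 0 <= sector < sectors_per_track):
        raise ValueError("head or sector out of range for this geometry")
    blocks_per_cylinder = heads * sectors_per_track
    return cylinder * blocks_per_cylinder + head * sectors_per_track + sector


# Values taken from the HDERR line quoted in the post.
cyl, head, sector = decode_hderr(ch=0xFF01, cl=0xFF45, sdh=0xFF24, sn=0xFF00)
print(cyl, head, sector)                  # 325 4 0
print(chs_to_block(cyl, head, sector))    # 41664
```

The defaults of 8 heads and 16 sectors per track are the poster's disk; other drives need their own geometry passed in.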
brant@manta.pha.pa.us (Brant Cheikes) (02/10/89)
You all know the question.

In article <848@ttrde.UUCP> pfales@ttrde.UUCP (Peter Fales) writes:
>As it turns out, I am working on a program that does exactly this. [...]
>My program is a few weeks
>(months?) away from being a useable product, but I can post it if there
>is enough interest.

I'm interested, so either post it or mail it to me, thanks.

>The way I am doing it - the only way so far as I know - is to search
>through the inode list, and look at the list of blocks that belong to
>each inode.  Then you can do a find -inum to find the file with that
>inode.

This is correct, though you should not overlook the freelist.

>There are a few other things to consider.  For example, the bad block
>may be in the swap area, or (shudder) the inode list.

I believe these are easy computations given knowledge of the sizes of
partitions 0, 1, and 2, and the total number of inodes.  My
understanding of the Unix filesystem is that the inode blocks are the
first blocks of partition 2.  So given 800 total inodes, and 8 inodes
per block, the first 100 blocks of partition 2 are reserved for the
inodes.
--
Brant Cheikes
University of Pennsylvania, Department of Computer and Information Science
brant@manta.pha.pa.us, brant@linc.cis.upenn.edu, bpa!manta!brant
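The partition and inode-area computation described above can be sketched like this. The function names are hypothetical, the default sizes are the figures given earlier in the thread for the poster's disk (in 512-byte blocks), and the layout follows the poster's stated understanding that the inode blocks are the first blocks of partition 2. A System V filesystem also keeps a boot block and superblock ahead of the inode list, which this sketch ignores.

```python
def inode_area_blocks(total_inodes, inodes_per_block=8):
    """Number of blocks needed to hold the inode list (rounded up)."""
    return -(-total_inodes // inodes_per_block)


def classify_block(block, part_sizes=(128, 16000, 114944),
                   total_inodes=14368, inodes_per_block=8):
    """Name the disk region holding a zero-based 512-byte block."""
    p0, p1, p2 = part_sizes
    start2 = p0 + p1                      # first block of partition 2
    if not 0 <= block < start2 + p2:
        raise ValueError("block number is off the end of the disk")
    if block < p0:
        return "partition 0"
    if block < start2:
        return "partition 1"
    if block < start2 + inode_area_blocks(total_inodes, inodes_per_block):
        return "partition 2: inode area"
    return "partition 2: data area"


print(inode_area_blocks(800))     # 100
print(classify_block(41664))      # partition 2: data area
print(classify_block(17923))      # partition 2: inode area
```

With the thread's figures, the data area starts at block 17924, so the failing block 41664 lands in the data area.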
jr@amanue.UUCP (Jim Rosenberg) (02/10/89)
In article <462@manta.pha.pa.us> brant@manta.pha.pa.us (Brant Cheikes) writes:
>Given a block number, how can I find out (a) if it's part of a file,
>and (b) what file it's part of?
>
>Why might this be useful?  Say you wake up one morning to discover a
>bad block error in your unix.log.  If you knew whether the block was
>allocated, and what file it was part of, you might be able to avoid
>having to reformat the disk.

This is something that people wanna do so often it amazes me there's not a
utility for this.  An fsdb wizard might be able to tell you how -- a script
redirecting fsdb's input???  At any rate, here's a method I've used on
occasion when I knew I had a bad block but had no idea what the file was.
You tar the entire file system to /dev/null and capture the output.  There's
a catch.  I'm not sure how it works on the UNIX-PC's tar, but on some
versions of tar the error message can VERY EASILY escape notice.  E.g. you
do something like

        tar cvf /dev/null / >tar.list 2>&1

On some systems tar is too brain damaged to differentiate between EOF and a
read error, so a file with a bad block will show up as one *whose file size
changed*.  (tar reads fewer bytes than stat told it were there.)  So check
your capture for files whose size has changed.

One of the public domain tars may report file read errors unambiguously or
else if someone wants to do some quick hacking perhaps this could be hacked
into a PD tar without much work.  It would be much easier than doing it
right by writing a real utility that walks the file system reporting what
blocks belong to what files.

Quick & dirty, no warranty express or implied ...  :-)
--
Jim Rosenberg
CIS: 71515,124                         decvax!idis! \
WELL: jer                                   allegra! ---- pitt!amanue!jr
BIX: jrosenberg                  uunet!cmcl2!cadre! /
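The same read-everything idea can be sketched without tar. This is a hedged illustration, not a replacement for the method in the post: find_unreadable_files is a hypothetical name, and it reports a read error and a short read (fewer bytes than stat reported) as separate cases, which is the distinction the post says some tars fail to make. Note that os.walk does not stop at mount points, so it may wander into other filesystems.

```python
import os


def find_unreadable_files(root, chunk_size=64 * 1024):
    """Read every regular file under `root`; return (path, problem) pairs."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, devices, FIFOs and the like
            try:
                expected = os.stat(path).st_size
                seen = 0
                with open(path, "rb") as fh:
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        seen += len(chunk)
            except OSError as exc:
                problems.append((path, "read error: %s" % exc))
                continue
            if seen != expected:
                problems.append(
                    (path, "size mismatch: stat %d, read %d" % (expected, seen)))
    return problems
```

A file that is being written while the scan runs can also show a size mismatch, so any hit is worth re-checking before blaming the disk.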
pfales@ttrde.UUCP (Peter Fales) (02/10/89)
In article <1392@mtunb.ATT.COM>, jcm@mtunb.ATT.COM (was-John McMillan) writes:
> In article <462@manta.pha.pa.us> brant@manta.pha.pa.us (Brant Cheikes) writes:
> >Given a block number, how can I find out (a) if it's part of a file,
> >and (b) what file it's part of?
> >
> A) There are so many uses of BLOCK NUMBER (and representations thereof) I
>    will simply PRESUME you are referring to:
>       A LOGICAL BLOCK # on an identified FILE-SYSTEM.
>
>    For this case:
>       As root, run:
>               /etc/ncheck -i #### -a /dev/rfp###
>       (per instructions in Section 1M).

Thanks for your posting John, you had some good tips on file system repair
to add to my bag of tricks, but I must disagree with the statement above.

According to my manual, as well as empirical evidence the numbers following
"-i" are a list of inodes, not a list of logical blocks.  Consider that
a large file will contain many blocks, but a file will never have more than
one inode.

I am not aware of any standard tools that will go from logical block numbers
to files, though I would love to be proved wrong.
--
Peter Fales                     AT&T, Room 2F-217
                                200 Park Plaza
UUCP:   ...att!ttrde!pfales     Naperville, IL 60566
Domain: pfales@ttrde.att.com    work:   (312) 416-5357
jcm@mtunb.ATT.COM (was-John McMillan) (02/14/89)
Mea culpa: too many balls in the air, too few brains in the head.
I indeed posted erroneous advice.  Read on....

In article <446@amanue.UUCP> jr@amanue.UUCP (Jim Rosenberg) writes:
>In article <462@manta.pha.pa.us> brant@manta.pha.pa.us (Brant Cheikes) writes:
>>Given a block number, how can I find out (a) if it's part of a file,
>>and (b) what file it's part of?
>>...
>This is something that people wanna do so often it amazes me there's not a
>utility for this.  An fsdb wizard might be able to tell you how -- a script
>redirecting fsdb's input???

There USED to be a program that did this:
        icheck -b #B# ... #B# FileSystem
        -- produced a list of INODES which "Owned" those blocks.

(Unfortunately, I forgot to post the above in my previous, brain-damaged
note.)

The next step is to use ICHECK's output:
        ncheck -i #I# ... #I# FileSystem
        -- then turned those INODE numbers into FileNames.

When FSCK came along, AT&T seems to have dropped ICHECK.

I can't legitimately hand out any hack I have for icheck... but others
are apparently busily at work on it.  Written properly, an "icheck"
clone could be run as:

        ncheck -i `eyecheck -b #B# ... FS 2> Aye2` FS

Berkeley still provides ICHECK, I believe -- probably DCHECK as well.

Ahhhh, those beautiful red-eyed nights spent with *check, piecing
together blithered FS before FSCK was born.

jc mcmillan     -- att!mtunb!jcm        -- speaking for self, only

(Those WEREN'T the "good ol' days", were they?)
brant@manta.pha.pa.us (Brant Cheikes) (02/14/89)
In article <1398@mtunb.ATT.COM> jcm@mtunb.UUCP (was-John McMillan) writes
[re a program to find inodes from blocks]:
>There USED to be a program that did this:
>       icheck -b #B# ... #B# FileSystem
>       -- produced a list of INODES which "Owned" those blocks.

The extended features of icheck (superblock repair and miscellaneous
consistency checks) are no longer necessary, since they were
incorporated into fsck.  However, I have written a utility called "bf"
that will perform the above-described icheck function---find the
inodes that "own" specified blocks.  Bf has been tested on a 3b1
(SVR2) only, and probably makes assumptions about the filesystem
structure.  Bf was just posted to unix-pc.sources, and e-mail copies
(it's short) are available upon request.
--
Brant Cheikes
University of Pennsylvania, Department of Computer and Information Science
brant@manta.pha.pa.us, brant@linc.cis.upenn.edu, bpa!manta!brant