[comp.unix.ultrix] ncheck

chris@mimsy.umd.edu (Chris Torek) (02/14/90)

In article <709@shodha.dec.com> alan@shodha.dec.com writes:
>Once you've cleared a Forced Error on a replaced block you
>need to determine if the block was important.

I like to do this before clearing the forced error, myself.  This gives
me a chance to poke around with the (known bad) data before wiping it
out.  Be aware, though, that ncheck (and perhaps icheck as well) works
better once the forced error has been cleared, since it reads large
chunks of data from the disk and does not bother to retry individual
sectors after an error.  (I fixed this in ncheck recently when we
acquired a bad sector on an RZ55.)
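
To illustrate the retry-by-sector idea, here is a minimal C sketch.  It
is not the actual ncheck change (that source is not shown in this post);
read_fn, read_chunk, fake_disk and fake_read are hypothetical names
invented for the example.  The reader is passed in as a callback so the
bad-sector path can be exercised without a real failing disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical reader callback: returns bytes read, or -1 on error. */
typedef ssize_t (*read_fn)(void *ctx, void *buf, size_t len, off_t off);

/*
 * Read len bytes starting at off.  Try one large read first; if that
 * fails, fall back to one read per sector, so a single bad sector
 * costs only itself.  Unreadable sectors are zero-filled.
 * Returns the number of sectors that could not be read.
 */
static size_t
read_chunk(read_fn rd, void *ctx, char *buf, size_t len, off_t off,
    size_t secsize)
{
	size_t bad = 0, done;

	assert(secsize > 0 && len % secsize == 0);

	if (rd(ctx, buf, len, off) == (ssize_t)len)
		return 0;		/* fast path: whole chunk was fine */

	for (done = 0; done < len; done += secsize) {
		if (rd(ctx, buf + done, secsize, off + (off_t)done)
		    != (ssize_t)secsize) {
			memset(buf + done, 0, secsize);
			bad++;
		}
	}
	return bad;
}

/* Hypothetical in-memory "disk" with exactly one unreadable sector. */
struct fake_disk {
	const char *data;
	size_t size;
	off_t bad;		/* byte offset of the bad sector */
	size_t secsize;
};

/* Fails any read whose byte range overlaps the bad sector. */
static ssize_t
fake_read(void *ctx, void *buf, size_t len, off_t off)
{
	struct fake_disk *d = ctx;

	if ((size_t)off + len > d->size)
		return -1;
	if (off < d->bad + (off_t)d->secsize && d->bad < off + (off_t)len)
		return -1;
	memcpy(buf, d->data + off, len);
	return (ssize_t)len;
}
```

Against a real device the callback would presumably be a thin wrapper
around pread() on the raw special file; the fake reader above only
stands in for that.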

>[icheck -b ...] If the block belongs to a file

... or a set of files (by holding the inode data itself: there are 4 inodes
per sector) or is an indirect block of some file ...

>you can track down the file name by the inode number with:
>
>	ncheck -i inode-number special
>
>This can be slow,

This can be slow because ncheck is buggy!  I got this bug fixed in
4.3BSD-tahoe years ago, but the fix had not got into Ultrix as of the
second-to-last time I needed ncheck on the DECstations (the last time
was due to the bad block mentioned above; at that point we had either
fixed ncheck or switched versions of Ultrix, because that bug was no
longer present).  The bug is obvious if you have the source, and hard
to spot without it except via the fact that ncheck is slow.  The
problem is, in essence, that a loop that should be of the form:

	for (offset = 0; offset < directory_size; offset += entry_size) {
		blknum = offset / blksize;
		blkoff = offset % blksize;
		if (blkoff == 0)	/* at block boundary */
			read block blknum;
		examine entry at blkoff;
	}

is instead of the form:

	for (offset = 0; offset < directory_size; offset += entry_size) {
		blknum = offset / blksize;
		blkoff = offset % blksize;
		if (blknum == 0)	/* oops */
			read block blknum;
		examine entry at blkoff;
	}

In an 8K/1K file system, this causes some entries in a directory whose
size is > 8192 bytes to be ignored; in a 4K/512 file system, it causes
some to be ignored in those directories > 4096 bytes long.  (Only the
first block is ever read, so entries past it are looked up in a buffer
that still holds the first block.)  In all cases it causes one
file-system block read (translating to one physical read of the raw
device) per directory *entry*, instead of one per directory *block*.
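
The effect of the two loop forms can be counted with a small C
simulation.  This is only a sketch: it assumes fixed-size entries (real
directory entries vary in length), and scan_stats and scan_dir are
hypothetical names, not anything from the ncheck source.

```c
#include <assert.h>
#include <stddef.h>

/* Result of one simulated directory scan. */
struct scan_stats {
	size_t reads;	/* block reads issued */
	size_t stale;	/* entries examined while the wrong block was loaded */
};

/*
 * Walk a directory of dir_size bytes in entry_size steps.
 * buggy == 0: read when blkoff == 0 (the intended loop).
 * buggy != 0: read when blknum == 0 (the broken loop).
 */
static struct scan_stats
scan_dir(size_t dir_size, size_t blksize, size_t entry_size, int buggy)
{
	struct scan_stats st = { 0, 0 };
	size_t loaded = (size_t)-1;	/* block in the buffer; none yet */
	size_t offset;

	assert(blksize > 0 && entry_size > 0 && blksize % entry_size == 0);

	for (offset = 0; offset < dir_size; offset += entry_size) {
		size_t blknum = offset / blksize;
		size_t blkoff = offset % blksize;
		int do_read = buggy ? (blknum == 0) : (blkoff == 0);

		if (do_read) {
			loaded = blknum;
			st.reads++;
		}
		if (loaded != blknum)	/* entry comes from stale data */
			st.stale++;
	}
	return st;
}
```

For a 16384-byte directory with 8192-byte blocks and 16-byte entries,
the intended loop does 2 reads and sees no stale entries; the broken
form does 512 reads (one per entry in the first block) and examines
the 512 entries of the second block against stale data.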

If you do not have Ultrix source, you cannot fix this bug.  (If you do
have source, take a look at readdir().  You will need to add a line or
two and use the lblkno macro to calculate the block number.  The block
offset calculation is correct, but needs to be done outside the if().)
(Actually, you could patch this via adb, if you have sufficient skill
and near-omniscient knowledge :-) .)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris