[comp.unix.ultrix] Hard errors on device

kjb@calmasd.Prime.COM (Ken Brucker) (06/21/91)

I've got a disk that is failing on one of my Ultrix nodes.  What I'd like to
know is if there's a way to determine which files have been corrupted by
several bad block errors.  I've got the sn number, LBN and block number as
reported by dump but don't know how to get to the file names from that info.

One thing I'm going to try next is doing a tar of the file systems that are
failing, to see whether I can get the file names from the errors that tar
reports.  Any other ideas?
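
The "read everything and see what fails" idea can be sketched without tar.
What follows is a minimal sketch, not a tested recovery tool: the names
read_fully and find_unreadable are made up for illustration, and it assumes
the file system is still mounted and readable enough to walk.  It does not
stop at mount points, so point it at one file system at a time.

```python
import os
import stat


def read_fully(path, chunk_size=1 << 20):
    """Read a file from start to finish, discarding the data."""
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass


def find_unreadable(root, reader=read_fully):
    """Return (path, error) pairs for everything under root that fails to read.

    `reader` is a hypothetical hook so that the read step can be swapped out,
    for example to exercise the function without a failing disk.
    """
    failures = []

    def note_walk_error(exc):
        # os.walk calls this when a directory itself cannot be listed.
        failures.append((exc.filename, exc))

    for dirpath, dirnames, filenames in os.walk(root, onerror=note_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not stat.S_ISREG(os.lstat(path).st_mode):
                    continue  # skip symlinks, devices, FIFOs and the like
                reader(path)
            except OSError as exc:
                # A bad block under a file normally surfaces here as an I/O error.
                failures.append((path, exc))
    return failures
```

Calling find_unreadable("/some/mount/point") returns a list to print or
save.  A file whose damaged block is never read (a hole in a sparse file,
for instance) will not show up, so this complements the block-to-inode
approach rather than replacing it.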

Thanks in advance!  - and please post answers!  It's one of my main mail
servers that has crapped out.

Ken
-- 
** Ken Brucker -- VMS Systems Programmer/Mangler -- ComputerVision
** kjb@calmasd.Prime.COM

alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) (06/21/91)

In article <2841@calmasd.Prime.COM>, kjb@calmasd.Prime.COM (Ken Brucker) writes:
> I've got a disk that is failing on one of my Ultrix nodes.  What I'd like to
> know is if there's a way to determine which files have been corrupted by
> several bad block errors.  I've got the sn number, LBN and block number as
> reported by dump but don't know how to get to the file names from that info.

	First unmount the file system.  That way only failing hardware
	will change it.

	If the disk is failing as you watch, what you do next may not
	matter, except to look for the most recent backups.
	Catastrophic failures are an ugly sight.  If the disk is
	stable, you may at this point want to replace any bad blocks
	that didn't get replaced, scan for more, and, on DSA disks,
	clear the forced error flag.  See the manual pages for rzdisk
	and radisk.

	Using the "sn" numbers, use the -b option of icheck(8) to
	determine what part of the file system has been corrupted.
	In the case of files, the -b option will tell you which inode
	number the block belongs to.  You can give that inode number
	to ncheck(8) to see which files were corrupted.

	There's a fair chance that some of the corrupted blocks
	are in the inode space of the file system, which makes
	recovery much harder.  You may be able to use fsck to
	repair the damage, but it's a long process.

	On some versions of ULTRIX ncheck(8) may be very slow.  I
	don't know if the performance problems were ever fixed.
	Most other methods require mounting the file system, though
	dump records the inode number along with the file name, so
	searching through a verbose restore listing may be quick
	enough.  Other options are the -i option of ls(1), which
	prints inode numbers, and the -inum option of find(1), which
	searches for one.
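
The find -inum route on a mounted file system can be sketched in a few
lines.  This is only an illustration of the idea, assuming the inode
numbers have already been obtained (from icheck -b, say); the function
name names_for_inodes is made up.  It stays on the file system that
holds the starting directory, because inode numbers are only unique
within one file system.

```python
import os


def names_for_inodes(root, inode_numbers):
    """Map each wanted inode number to the paths under root that refer to it."""
    wanted = set(inode_numbers)
    root_stat = os.lstat(root)
    root_dev = root_stat.st_dev
    found = {ino: [] for ino in wanted}

    if root_stat.st_ino in wanted:
        found[root_stat.st_ino].append(root)

    for dirpath, dirnames, filenames in os.walk(root):
        dir_names = set(dirnames)
        same_fs_dirs = []
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or cannot be examined
            if st.st_dev != root_dev:
                continue  # belongs to another mounted file system
            if st.st_ino in wanted:
                found[st.st_ino].append(path)
            if name in dir_names:
                same_fs_dirs.append(name)
        # Prune the walk so it never descends into other file systems.
        dirnames[:] = same_fs_dirs
    return found
```

A file with several hard links shows up once per name, which is why
each inode maps to a list.  All the wanted inodes are looked up in a
single pass over the tree, instead of one find run per inode.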


-- 
Alan Rollow				alan@nabeth.cxn.dec.com

jarrell@vtserf.cc.vt.edu (Ron Jarrell) (06/21/91)

In article <2841@calmasd.Prime.COM> kjb@calmasd.Prime.COM (Ken Brucker) writes:
>I've got a disk that is failing on one of my Ultrix nodes.  What I'd like to
>know is if there's a way to determine which files have been corrupted by
>several bad block errors.  I've got the sn number, LBN and block number as

icheck and ncheck are useful utilities in times such as these.  That is
probably why they weren't abandoned when fsck subsumed most of their functions.

Do an icheck -b blockno1 blockno2 etc.

Any files it finds with those block numbers in them will be reported.

ncheck is useful if you have inode numbers and need the names.  Do an
ncheck -i inode-no.


-- 
Ron Jarrell
Virginia Tech Computing Center
jarrell@vtserf.cc.vt.edu