[comp.sys.dec] Dead disk on microVAX?

SJCOLE@cc.utah.edu (Samuel J. Cole) (07/12/90)

I have a microVAX II running ULTRIX V2.0-1 (a little old, I know) that will
not boot because of hard errors from the disk drive.  On power-up, the
critter passes self-test, and boots ULTRIX to the point where it checks the
disk.  At this point I get

Force Error Modifier set LBN 5733
ra0a: hard error sn5728
Force Error Modifier set LBN 5580
ra0a: hard error sn5568
Force Error Modifier set LBN 5580
ra0a: hard error sn5568

and the boot fails telling me 

ufs_mount: fs /dev/ra0g not cleaned -- please fsck

fsck does no good, because I get the same Force Error Modifier set business.

Is this disk drive dead, or is there something I can do to get it back (the
people that use the machine don't back it up very often, of course; sigh)?

Thanks in advance for any help!

Sam Cole
Chemistry Computer Center
University of Utah
Internet: cole@chemistry.chem.utah.edu
Bitnet: SJCOLE@UTAHCCA

jtkohl@MIT.EDU (John T Kohl) (07/12/90)

There are two ways to remedy this:

1) If you can find the stand-alone program rabads (or rqbads for Q-bus
machines), boot it and select 'init' on the affected drives; it will
scan the disk and do some bad block replacement.  Write down the block
numbers, convert them to filesystem block numbers, and after rebooting use
ncheck, icheck, and dcheck to find which files or directories contain
those blocks; those files and directories should be considered corrupt.

2) If you can't find rabads/rqbads, write a quick program to write
zeroes to the affected blocks; rewriting the damaged sectors will clear
the forced-error bit.  Then figure out which files and directories are
affected by the problem and recreate them, as they are certainly corrupt.
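
A minimal sketch of such a "quick program", in Python for brevity (on a
machine of this vintage you would write the same few calls in C).  It
assumes 512-byte sectors and that the path you pass is the raw device
covering the whole disk, so that the LBN from the error message can be
used directly as an offset.  The function name zero_block and its
parameters are made up for illustration.  It destroys whatever is in the
sectors it touches, so try it on a scratch file first.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def zero_block(path, lbn, count=1, sector_size=SECTOR_SIZE):
    """Overwrite `count` sectors starting at logical block `lbn` with zeroes.

    `path` is assumed to be a raw whole-disk device (or a plain file when
    testing).  Nothing outside the requested sectors is written.
    """
    if lbn < 0 or count < 1:
        raise ValueError("lbn must be >= 0 and count must be >= 1")
    nbytes = count * sector_size
    fd = os.open(path, os.O_WRONLY)
    try:
        # Seek to the byte offset of the first sector, then write zeroes.
        os.lseek(fd, lbn * sector_size, os.SEEK_SET)
        written = os.write(fd, bytes(nbytes))
        if written != nbytes:
            raise OSError("short write: %d of %d bytes" % (written, nbytes))
    finally:
        os.close(fd)
```

For the errors in the original post this would be called once per bad
block, e.g. zero_block(device, 5733) and zero_block(device, 5580), where
device is whatever whole-disk raw device applies on your system.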

BEWARE!  I am not speaking in any official capacity here; proceed at
your own risk.  You take full responsibility for any damages incurred.
--
John Kohl <jtkohl@ATHENA.MIT.EDU> or <jtkohl@MIT.EDU>
Digital Equipment Corporation/Project Athena
(The above opinions are MINE.  Don't put my words in somebody else's mouth!)

alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) (07/13/90)

In article <76742@cc.utah.edu>, SJCOLE@cc.utah.edu (Samuel J. Cole) writes:
> I have a microVAX II running ULTRIX V2.0-1 (a little old, I know) that will
> not boot because of hard errors from the disk drive.  On power-up, the
> critter passes self-test, and boots ULTRIX to the point where it checks the
> disk.  At this point I get
> 
> Force Error Modifier set LBN 5733
> ra0a: hard error sn5728
> Force Error Modifier set LBN 5580
> ra0a: hard error sn5568
> Force Error Modifier set LBN 5580
> ra0a: hard error sn5568
> 
> and the boot fails telling me 
> 
> ufs_mount: fs /dev/ra0g not cleaned -- please fsck
>
	First, a short discussion of forced errors.  The blocks mentioned
	(LBN XXXX) were found to be bad and replaced with good blocks.
	Unfortunately the data couldn't be corrected, so the corrupt
	data was written to the replacement block and marked with the
	forced error flag.  To fix the disk you need to check for more
	errors, clear the forced errors, and identify the files that
	are broken.

	1.  Checking the disk for more errors.  If you can get the
	    system up to single-user mode, use radisk to scan the
	    disk for more bad blocks.  If you can't, a copy of radisk
	    should be on the standalone system that was used for
	    installation.

		radisk -s -1 -1 /dev/rra0c

	    The -s is "scan".  The first -1 is the starting LBN (you
	    could use 0).  The second -1 is the length of the disk.
	    Using -1 lets radisk figure out how big the disk is.

	2.  Clearing forced errors.  Again use radisk.  The -c option
	    will clear the forced errors.

		radisk -c LBN length /dev/rra0c

	    If you have a string of consecutive LBNs, you can use the
	    length argument to clear them all at once.

	3.  Now you can use icheck and ncheck to track down the files
	    that are corrupted.  You can hand icheck a list of block
	    numbers and it will tell you what part of the file system
	    they belong to.  When the blocks belong to files it will
	    give you the inode numbers.  You can hand these to ncheck
	    and it will eventually print the name.  Once you have the
	    names you can replace those files from backups or the
	    original distribution.

		icheck -b block-1 block-2 ... block-n special-file

		ncheck -i inode-1 inode-2 ... inode-n special-file

	    Please note that radisk only knows about logical block
	    numbers of the whole disk and not blocks within ULTRIX
	    partitions.  You'll need to convert the LBNs to block
	    numbers within the partition before handing them to icheck.
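
The bookkeeping in steps 2 and 3 can be sketched as below.  This is only
an illustration under stated assumptions: the partition start and length
have to come from your own disk's partition table, and the number of
sectors per block unit depends on what icheck expects on your system, so
check its manual page.  The function names are made up for this example.

```python
def group_runs(lbns):
    """Collapse a list of LBNs into (start, length) runs of consecutive
    blocks, one run per "LBN length" pair to clear."""
    runs = []
    for lbn in sorted(set(lbns)):
        if runs and lbn == runs[-1][0] + runs[-1][1]:
            # Extends the current run by one block.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((lbn, 1))
    return runs


def lbn_to_partition_block(lbn, part_start, part_length, sectors_per_block=1):
    """Convert a whole-disk LBN to a block number relative to a partition.

    part_start and part_length are in sectors and are assumptions supplied
    by the caller.  sectors_per_block is the number of sectors in whatever
    block unit the file system tools expect (also an assumption).
    """
    rel = lbn - part_start
    if rel < 0 or rel >= part_length:
        raise ValueError("LBN %d is not inside this partition" % lbn)
    return rel // sectors_per_block


# LBNs taken from the console messages in the original post.
bad = [5733, 5580, 5580]
print(group_runs(bad))
# Hypothetical partition: starts at sector 0 and is 15884 sectors long.
print([lbn_to_partition_block(b, 0, 15884) for b in sorted(set(bad))])
```

With a partition that starts at sector 0 and a one-sector block unit the
numbers come through unchanged; for any other partition the start offset
is subtracted first.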
 
> fsck does no good, because I get the same Force Error Modifier set business.
> 
> Is this disk drive dead, or is there something I can do to get it back (the
> people that use the machine don't back it up very often, of course; sigh)?

	If those are the only errors then the disk is in pretty
	good shape.  If the scan turns up more, but it doesn't
	get worse as you go, you can repair it, back it up,
	and then see if it breaks.
> 
> Thanks in advance for any help!
> 
> Sam Cole
> Chemistry Computer Center
> University of Utah
> Internet: cole@chemistry.chem.utah.edu
> Bitnet: SJCOLE@UTAHCCA


-- 
Alan Rollow				alan@nabeth.enet.dec.com