[comp.unix.wizards] fsck fails...help!

david@pyr.gatech.EDU (David Brown) (09/15/89)

Help!!  Fsck is failing on my machine.  I get the following error:

CANNOT READ: BLK 16 (CONTINUE)

In my trusty 4.3BSD System Managers Manual, I find that "...This
should never happen.  See a guru."  Well, where better to find them
than comp.unix.wizards?

Is the disk history?  Any help would be greatly appreciated.

Oh yeah.  I'm running SunOS 4.0.3 on a Sun 3/60 with a 141 MB Micropolis 1355.

Thanks,
  David Brown

-- 
----------------------------------------------------------------------------
David Brown                       Armstrong State College, Savannah, Georgia
ARPA: david@pyr.gatech.edu        uucp: ...!gatech!gitpyr!david

duane@anasaz.UUCP (Duane Morse) (09/16/89)

In article <9171@pyr.gatech.EDU>, david@pyr.gatech.EDU (David Brown) writes:
> 
> Help!!  Fsck is failing on my machine.  I get the following error:
> 
> CANNOT READ: BLK 16 (CONTINUE)
> 
> In my trusty 4.3BSD System Managers Manual, I find that "...This
> should never happen.  See a guru."  Well, where better to find them
> than comp.unix.wizards?
> 
> Is the disk history?  Any help would be greatly appreciated.

The message means that it cannot physically read the block. I don't
know how fsck varies from one system to the next, but the version on our
NCR Tower 32/600 also stops when this happens.

The only time this occurred on our system, I made a lucky guess about
something and 'fixed' the problem. In particular, I determined which
physical blocks the message referred to, verified the problem by trying
to dd the block, and then, as root, wrote zeroes to the bad block. My guess
was that either the smart SMD controller would allocate an alternate
block when I tried to write to the bad one, or the sector/block
checksum was bad and writing to the block would put out a good checksum.
After this I could run fsck.

Even if this works in your case, you'll probably lose some important
files, since block 16 is near the beginning of the inode list.
(They'll hopefully show up in lost+found.) The alternative may be to
reformat the disk and reinstall everything, though, because I'm
somewhat doubtful about Unix's alternate block mapping (through
software) when it comes to inode blocks.
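
The read check described above (the dd step, not the write) could be
sketched roughly as follows. This is only an illustration: the function
name is made up, and the 512-byte block size is an assumption about the
units fsck reports in, so check your own system before trusting the
offset. It only reads; it never writes to the device.

```python
import os

def try_read_block(path, block, block_size=512):
    """Attempt to read one block; return (ok, detail).

    block_size=512 is an assumption about the units fsck uses.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seek to the byte offset of the requested block.
        os.lseek(fd, block * block_size, os.SEEK_SET)
        data = os.read(fd, block_size)
    except OSError as exc:
        # A hard media error normally surfaces here (e.g. EIO).
        return False, f"read failed: {exc}"
    finally:
        os.close(fd)
    if len(data) < block_size:
        return False, f"short read: {len(data)} bytes"
    return True, "ok"
```

Under that 512-byte assumption, try_read_block(device, 16) would probe
byte offset 8192 of whatever raw device path you pass in.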
-- 

Duane Morse	...{asuvax or mcdphx}!anasaz!duane
(602) 861-7609

duane@anasaz.UUCP (Duane Morse) (09/17/89)

In article <726@anasaz.UUCP>, duane@anasaz.UUCP (Duane Morse) writes:
) In article <9171@pyr.gatech.EDU>, david@pyr.gatech.EDU (David Brown) writes:
) ) 
) ) Help!!  Fsck is failing on my machine.  I get the following error:
) ) 
) ) CANNOT READ: BLK 16 (CONTINUE)
) ) 
) ) In my trusty 4.3BSD System Managers Manual, I find that "...This
) ) should never happen.  See a guru." 
) 
) The message means that it cannot physically read the block. I don't
) know how fsck varies from one system to the next, but the version on our
) NCR Tower 32/600 also stops when this happens.
) 
) The only time this occurred on our system, I made a lucky guess about
) something and 'fixed' the problem. In particular, I determined which
) physical blocks the message referred to, verified the problem by trying
) to dd the block, and then, as root, wrote zeroes to the bad block. My guess
) was that either the smart SMD controller would allocate an alternate
) block when I tried to write to the bad one, or the sector/block
) checksum was bad and writing to the block would put out a good checksum.
) After this I could run fsck.....

I received mail informing me that block 16 for 4.3BSD is in the superblock,
so clearing the block would probably have dire consequences. I've never
been near 4.3BSD; my comment is based on a trick that worked the only
time a similar problem arose on our system, which runs SysV. If block 16
is part of the superblock, it'd be worth a quick check to see if rewriting
the block with "reasonable" data is feasible.
I keep a printout of the superblock on our system for that reason. 
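
Keeping a copy of the superblock, as mentioned above, could be done with
a small dump routine. A minimal sketch follows; the function name is
hypothetical, and the offset and length are left as parameters because
they differ between filesystems (8192 bytes starting at byte offset 8192
is my assumption for a 4.3BSD-style filesystem, not something stated in
this thread).

```python
def hexdump_region(path, offset, length, width=16):
    """Return hex dump lines for `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        # Prefix each line with its absolute byte offset in the file.
        lines.append(f"{offset + i:08x}  {hexpart}")
    return lines
```

Printing the returned lines to paper (or saving them elsewhere) while
the disk is healthy gives you something to compare against later.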
-- 

Duane Morse	...{asuvax or mcdphx}!anasaz!duane
(602) 861-7609

brian@ucsd.Edu (Brian Kantor) (09/18/89)

A 4.3BSD filesystem will have alternate superblocks; you may be able to
use one of them to get your filesystem back into shape enough to dump it
off to tape.  The fsck option for this is -b.  On most of the systems I
work with (Vaxen, Suns, etc.) the first alternate superblock is at 32; I have seen
it elsewhere (on a Pyramid).
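
As a rough illustration of the fsck -b suggestion above, the sketch
below only builds the command line and prints it unless told otherwise.
Assumptions, none of which come from this thread: the helper names, the
device path /dev/rsd0g, and the use of -n (answer "no" to every
question) for a first, non-destructive pass on a 4.3BSD-style fsck.

```python
import subprocess

def fsck_alt_superblock_cmd(device, sb_block=32, read_only=True):
    """Build an fsck argument list that uses an alternate superblock."""
    cmd = ["fsck"]
    if read_only:
        cmd.append("-n")  # assumed: answer "no" to all repair prompts
    cmd += ["-b", str(sb_block), device]
    return cmd

def run_fsck(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return None
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    # /dev/rsd0g is a hypothetical device name.
    run_fsck(fsck_alt_superblock_cmd("/dev/rsd0g"))
```

If 32 is not where the alternate lives on your system, pass a different
sb_block; newfs normally prints the list when the filesystem is created.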

Really heroic measures to recover something you should have been backing
up include using 'dd' to copy the damaged filesystem off to one of your swap
partitions (use conv=noerror), then fsck'ing THAT and copying it back,
or dd'ing it off to tape and back in to rewrite the bad blocks and then
fsck'ing it, etc.  The concept is to get it to the point where you can
get the data off to somewhere else and re-mkfs the file system, perhaps
after reformatting the bad spots on the disk.
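
The copy-past-the-errors idea above can be sketched as below. It is not
dd itself, just an illustration under assumptions: the function name and
the 512-byte block size are mine, and the source size is taken from
lseek, which may not work on every raw device. Unreadable blocks are
zero-filled so that later blocks keep their offsets; with dd that
corresponds to conv=noerror,sync, since noerror by itself omits the
failed block from the output.

```python
import os

def rescue_copy(src_path, dst_path, block_size=512):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns the list of block numbers that could not be read in full.
    """
    bad = []
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        size = os.lseek(src, 0, os.SEEK_END)
        nblocks = (size + block_size - 1) // block_size
        for block in range(nblocks):
            offset = block * block_size
            want = min(block_size, size - offset)
            try:
                data = os.pread(src, want, offset)
            except OSError:
                data = b""  # treat a read error as an empty block
            if len(data) < want:
                bad.append(block)
                data = data.ljust(want, b"\0")  # pad to keep alignment
            os.pwrite(dst, data, offset)
    finally:
        os.close(src)
        os.close(dst)
    return bad
```

The returned list tells you which blocks came back as zeros, which is
worth knowing before you believe anything fsck says about the copy.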

Then go have a beer and plan a more robust backup scheme.
	- Brian