[comp.unix.questions] How to recover from fsck "Cannot read block"?

bob@acornrc.UUCP (Bob Weissman) (07/14/87)

Argh!  fsck tells us "CANNOT READ: BLK 291344".  According to the fsck
documentation, this is a "shouldn't happen" kind of error.  Great.  It
also says to call for a guru.  None of our local gurus knows what to do
about this.  We could run badsect and just give up on the block, but we
don't know how to translate a block number to a sector number.

Thanks for any assistance.

-- 
Bob Weissman
Internet:	bob@acornrc.UUCP
UUCP:		...!{ ames | decwrl | oliveb | apple }!acornrc!bob
Arpanet:	bob%acornrc.UUCP@AMES.ARPA

mangler@cit-vax.Caltech.Edu (System Mangler) (07/14/87)

In article <412@acornrc.UUCP> bob@acornrc.UUCP (Bob Weissman) writes:
>Argh!	fsck tells us "CANNOT READ: BLK 291344".  According to the fsck

On a 4.[23] BSD VAX, that number is the last block-device block on a
standard "h" partition of 291346 sectors.  There aren't BLKDEV_IOSIZE
bytes left in the partition, so you get an error.  Make sure that you
fsck the raw device, not the block device.
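
The arithmetic behind this can be sketched as follows.  This is only an
illustration in Python, assuming 512-byte sectors and a BLKDEV_IOSIZE of
2048 bytes (typical 4.2/4.3BSD VAX values); the function names are made
up for the example and are not part of any BSD source.

```python
SECTOR_SIZE = 512      # bytes per sector (assumed)
BLKDEV_IOSIZE = 2048   # bytes per block-device transfer (assumed)


def bytes_left(partition_sectors, block):
    """Bytes between the start of `block` (counted in sectors) and the
    end of a partition that is `partition_sectors` sectors long."""
    return (partition_sectors - block) * SECTOR_SIZE


def full_read_fits(partition_sectors, block):
    """True if a whole BLKDEV_IOSIZE transfer starting at `block`
    stays inside the partition."""
    return bytes_left(partition_sectors, block) >= BLKDEV_IOSIZE


print(bytes_left(291346, 291344))      # 1024
print(full_read_fits(291346, 291344))  # False
```

Only two sectors (1024 bytes) remain after block 291344, so a 2048-byte
transfer through the block device runs off the end of the partition.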

In article <3224@cit-vax.Caltech.Edu>, sns@tybalt.caltech.edu (Samuel N. Southard) writes:
> I'm not really a guru,

Agreed.  Please accept my apologies for the misinformation from our site.

Don Speck   speck@vlsi.caltech.edu  {seismo,rutgers}!cit-vax!speck

generous@dgis.UUCP (Curtis Generous) (07/14/87)

In article <412@acornrc.UUCP> bob@acornrc.UUCP (Bob Weissman) writes:
>Argh!  fsck tells us "CANNOT READ: BLK 291344".  According to the fsck
>documentation, this is a "shouldn't happen" kind of error.  Great.  It
>also says to call for a guru.  None of our local gurus knows what to do
>about this.  We could run badsect and just give up on the block, but we
>don't know how to translate a block number to a sector number.

How about providing a bit more background information on this problem, such as:
 
	[] Host type (e.g. VAX 11/780, PDP, etc...)
	[] OS type (e.g. UNIX 4.2BSD, Ultrix, etc...)
	[] Disk drive type (e.g. 9766, eagle, etc...)
	[] Controller type.
	[] Driver used.
	[] Partition in which problem occurs (slice a, b, c, etc..)

		   etc...

We have experienced a similar 'symptom' here, but without further info, 
cannot say if these problems are related.

--curtis
Curtis C. Generous
Lawrence Livermore National Labs
ARPA: generous@lll-tis.ARPA
UUCP: {seismo,vrdxhq}!dgis!generous

bob@acornrc.UUCP (Bob Weissman) (07/14/87)

In article <254@dgis.UUCP>, generous@dgis.UUCP (Curtis Generous) writes:
> In article <412@acornrc.UUCP> bob@acornrc.UUCP (Bob Weissman) writes:
> >Argh!  fsck tells us "CANNOT READ: BLK 291344".  According to the fsck
> >documentation, this is a "shouldn't happen" kind of error.  Great.  It
> >also says to call for a guru.  None of our local gurus knows what to do
> >about this.  We could run badsect and just give up on the block, but we
> >don't know how to translate a block number to a sector number.
> 
> How about providing a bit more background information on this problem, such as:
> 	[] Host type (e.g. VAX 11/780, PDP, etc...)
> 	[] OS type (e.g. UNIX 4.2BSD, Ultrix, etc...)
> 	[] Disk drive type (e.g. 9766, eagle, etc...)
> 	[] Controller type.
> 	[] Driver used.
> 	[] Partition in which problem occurs (slice a, b, c, etc..)

Yes, I should have been more specific.
	Host: 		VAX 11/750
	OS:		Unix 4.2bsd
	Disk:		Eagle
	Controller: 	?  I dunno about these.
	Driver:		?
	Partition:	/dev/hp1h

df output:
Filesystem    kbytes    used   avail capacity  Mounted on
/dev/hp1h     140564  117733    8774    93%    /util


-- 
Bob Weissman
Internet:	bob@acornrc.UUCP
UUCP:		...!{ ames | decwrl | oliveb | apple }!acornrc!bob
Arpanet:	bob%acornrc.UUCP@AMES.ARPA

mjb%hoosier.uucp@utah-gr.UUCP (Mark J. Bradakis) (07/14/87)

In article <412@acornrc.UUCP> bob@acornrc.UUCP (Bob Weissman) writes:
>Argh!  fsck tells us "CANNOT READ: BLK 291344".  According to the fsck
>...

Why, just last week I had the same problem.  This was on an HP 9000 model
350, with a 7945 disk drive.  Under 5.22 HP-UX, I got:

CANNOT READ: BLK 123456 (or some such)
CONTINUE?

I answered yes, then got

CANNOT SEEK: BLK 123456
CONTINUE?

I answered yes again, and continued the fsck.  It turned out that some
of the disk was saved, but a few dirs (unreferenced dir, name=/bin remove?)
disappeared.

I had originally thought it was a hardware error, but now the disk is fine.
Of course, after I finally got what I could from the disk I did a reinit
and built a new filesystem on it just in case.  The disk works fine now.

mjb.
---------------
mjb%hoosier@cs.utah.edu

"I take this medicine as prescribed, I'll sleep when I'm dead.
 It don't matter if I get a little tired, I'll sleep when I'm dead."

                                            Warren "Excitable" Zevon

chris@gargoyle.UChicago.EDU (Chris Johnston) (07/15/87)

In article <3225@cit-vax.Caltech.Edu> mangler@cit-vax.UUCP writes:
>In article <412@acornrc.UUCP> bob@acornrc.UUCP (Bob Weissman) writes:
>>Argh!	fsck tells us "CANNOT READ: BLK 291344".

>On a 4.[23] BSD VAX, that number is the last block-device block on a
>standard "h" partition of 291346 sectors.  There aren't BLKDEV_IOSIZE
>bytes left in the partition, so you get an error.  Make sure that you
>fsck the raw device, not the block device.

This is a bug pure and simple.

I have had this happen to me on a vax 730 and 750 on an r80, ra81,
and an eagle running Berkeley 4.2 and 4.3.  The block in question is
always a directory data block located near the end of a partition.

Note: One cannot run fsck on the raw root device!

cj

chris@mimsy.UUCP (Chris Torek) (07/15/87)

>In article <3225@cit-vax.Caltech.Edu> mangler@cit-vax.UUCP writes:
>>... There aren't BLKDEV_IOSIZE bytes left in the partition, so you
>>get an error.  Make sure that you fsck the raw device, not the block
>>device.

In article <693@gargoyle.UChicago.EDU> chris@gargoyle.UChicago.EDU
(Chris Johnston) writes:
>This is a bug pure and simple.

Pure perhaps: but not so simple.

>Note: One cannot run fsck on the raw root device!

True; but then, Berkeley's distributions always have 15884 sectors
in the root file system, or some other multiple of 4, since
BLKDEV_IOSIZE is 2048 and sectors are 512 bytes.  If you are willing
to make block I/O more expensive, you can make arbitrary block file
system sizes work by changing BLKDEV_IOSIZE to 512.  Alternatively,
you can make sure all your file systems are multiples of four
sectors.
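
The "multiple of four" rule can be checked with a few lines of Python.
This is a rough sketch using the BLKDEV_IOSIZE and sector size quoted
above; the helper names are hypothetical and exist only for this example.

```python
SECTOR_SIZE = 512                      # bytes per sector
BLKDEV_IOSIZE = 2048                   # bytes per block-device transfer
SECTORS_PER_TRANSFER = BLKDEV_IOSIZE // SECTOR_SIZE   # 4 with these values


def is_safe_size(sectors):
    """True if the partition ends on a BLKDEV_IOSIZE boundary."""
    return sectors % SECTORS_PER_TRANSFER == 0


def round_down(sectors):
    """Largest size not exceeding `sectors` that ends on a boundary."""
    return sectors - sectors % SECTORS_PER_TRANSFER


print(is_safe_size(15884))    # True  (the distributed root size)
print(is_safe_size(291346))   # False (the "h" partition in question)
print(round_down(291346))     # 291344
```

With BLKDEV_IOSIZE set to 512, SECTORS_PER_TRANSFER would be 1 and every
size would pass the check, which is the other option described above.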
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

erik@retix.retix.COM (Erik Forsberg) (07/17/87)

In article <415@acornrc.UUCP> bob@acornrc.UUCP (Bob Weissman) writes:
>
>Yes, I should have been more specific.
>	Host: 		VAX 11/750
>	OS:		Unix 4.2bsd
>	Disk:		Eagle
>	Controller: 	?  I dunno about these. (Emulex SC750)
>	Driver:		? (standard BSD 4.2 hp driver)
>	Partition:	/dev/hp1h or hp0h or ..
>
I have seen this many times too. It seems to happen consistently when
doing a full restore: newfs /dev/hp0h eagle followed by a restore -r
of a full level 0 dump. Seems to me it must be a bug in the file system.

The only method of recovering I could figure out was using ncheck followed
by icheck to figure out first which inode contains the bad block number
reference. Then we can figure out which file it was. In my case it was
always a directory (/sys/mdec). After that, clri the bad inode, run fsck
to fix the damage, restore the lost directory and everything is fine.
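
Roughly, the sequence looks like the one printed by the Python sketch
below.  It only builds the command lines and runs nothing.  The raw
device name /dev/rhp0h, the INODE placeholder and the function name are
assumptions for illustration; icheck -b maps a block number to the inode
that claims it and ncheck -i maps an inode number to a path name.  The
block number icheck expects may not be in the same units as the one fsck
prints, so check the manual pages before trusting the result.

```python
def recovery_plan(raw_device, bad_block, inode="INODE"):
    """Return the command lines, in order, for the recovery described
    above.  `inode` is a placeholder: the real number comes from the
    icheck output and has to be filled in by hand."""
    return [
        f"icheck -b {bad_block} {raw_device}",   # which inode owns the block?
        f"ncheck -i {inode} {raw_device}",       # which path name is that inode?
        f"clri {raw_device} {inode}",            # clear the offending inode
        f"fsck {raw_device}",                    # let fsck repair the damage
    ]


for step in recovery_plan("/dev/rhp0h", 291344):
    print(step)
```

The lost directory still has to be restored from the dump afterwards.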

-- 
----------------------------------------------------------------------------
Erik Forsberg, Retix, 2644 30th Street, Santa Monica CA 90405 (213) 399-2200
UUCP: {bradley,hao,litvax,trwrb,sdcrdcf,ucla-cs,ucsbcsl}!cepu!retix!erik