update is hanging

greg@suntan.viewlogic.com (Gregory Larkin) (03/13/91)

Hi all,

I am trying to build a new libc.a.  My 30 meg partition is
almost full, and I happen to know that there are bad blocks
out in the far reaches of the partition.

When I first made the partition, I ran "readall /dev/hd6" to
find the bad blocks.  I then ran "badblocks" to get rid of 
them (I think!).

I am now building libc.a and part way through the addition to
the archive, it stops.  I hit "F1" and /etc/update has a "C"
flag.  No other activity.  The disk is making sounds like it
does when it's hitting a bad block.  No messages about read
errors and the like have been printed.  After about 15 
minutes, /etc/update unhangs and the archive continues to be
made.

(Assuming this is a disk read/write error),
doesn't badblocks stop other programs, like "ar" from writing
to bad areas by allocating the bad blocks to files?

Is it a problem to mv the .BadXXXX files to a subdirectory
called "BAD"?  Does the mv kill the bad blocks information?

Also, when I run fsck after badblocks, I get weird messages
about duplicate zones and missing bitmaps.  Is this normal?

Thanks for any help,


-- 
Greg Larkin (ASIC Engineer)|"This is a fragile ball we are living on; 
Viewlogic Systems, Inc.    |it's a miracle and we are destroying it.."
293 Boston Post Road West  |Peter Garrett, Midnight Oil               
Marlboro, MA 01752  (greg@Viewlogic.COM)

cechew@sol0.cs.monash.edu.au (Earl Chew) (03/14/91)

greg@suntan.viewlogic.com (Gregory Larkin) writes:

>When I first made the partition, I ran "readall /dev/hd6" to
>find the bad blocks.  I then ran "badblocks" to get rid of 
>them (I think!).

The problem may be that you used an `old' version of badblocks which had
problems with large file systems.

>I am now building libc.a and part way through the addition to
>the archive, it stops.  I hit "F1" and /etc/update has a "C"
>flag.  No other activity.  The disk is making sounds like it
>does when it's hitting a bad block.  No messages about read
>errors and the like have been printed.  After about 15 
>minutes, /etc/update unhangs and the archive continues to be
>made.

It could be that you've hit a large swarm of bad blocks. Bad block recovery can
be very slow. From memory, I think that the standard driver doesn't report bad
blocks unless the n-th retry fails (I can't remember what n is precisely). Thus
if the blocks are just weak, after a couple of retries the operation succeeds.
If you have a swarm of them, fs could spend a long time trying to sync().
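
To make the retry behaviour described above concrete, here is a minimal sketch in C. It is not the Minix driver code: MAX_RETRIES, xfer_fn, transfer_with_retries() and weak_block() are all hypothetical names made up for illustration, and the retry limit of 4 is an assumption, since the real value of n is not given in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical retry limit; the real driver's value is not stated here. */
#define MAX_RETRIES 4

/* Hypothetical single-attempt block transfer: returns true on success. */
typedef bool (*xfer_fn)(unsigned long block, void *ctx);

struct xfer_result {
    bool ok;        /* did any attempt succeed? */
    int  attempts;  /* how many attempts were made */
};

/* Try the transfer up to MAX_RETRIES times. An error is only visible to
 * the caller when every attempt fails, so a weak block that succeeds on
 * a later attempt costs time but produces no message. */
struct xfer_result transfer_with_retries(xfer_fn attempt,
                                         unsigned long block, void *ctx)
{
    struct xfer_result r = { false, 0 };
    while (r.attempts < MAX_RETRIES) {
        r.attempts++;
        if (attempt(block, ctx)) {
            r.ok = true;
            break;
        }
    }
    return r;
}

/* Simulated weak block: fails a fixed number of times, then succeeds. */
bool weak_block(unsigned long block, void *ctx)
{
    int *failures_left = ctx;
    (void)block;
    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

With a failure count of 2, transfer_with_retries() succeeds on the third attempt and reports nothing. If each failed attempt is slow, a long run of such blocks would add up to the kind of silent delay described above.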

>(Assuming this is a disk read/write error),
>doesn't badblocks stop other programs, like "ar" from writing
>to bad areas by allocating the bad blocks to files?

Yes. See my comment above. Try to get hold of a new version of badblocks (i.e.
one with the patch applied to fix the bug).

>Is it a problem to mv the .BadXXXX files to a subdirectory
>called "BAD"?  Does the mv kill the bad blocks information?

No. mv simply makes a new directory link to the same inode. The bad block
allocation is in the inode and is thus preserved.
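
A quick way to check this on a POSIX system is to compare inode numbers before and after the move. The sketch below uses rename() and stat(); rename_keeps_inode() is a hypothetical helper name, and using rename() (rather than whatever the Minix 1.5 mv does internally) is an assumption made for illustration.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if renaming old_path to new_path kept the same inode,
 * 0 if it changed, and -1 on any error. Both paths must be on the
 * same file system, since rename() does not cross file systems. */
int rename_keeps_inode(const char *old_path, const char *new_path)
{
    struct stat before, after;

    if (stat(old_path, &before) != 0)
        return -1;
    if (rename(old_path, new_path) != 0)
        return -1;
    if (stat(new_path, &after) != 0)
        return -1;

    /* Same device and same inode number means the same file object,
     * and therefore the same zone list. */
    return before.st_dev == after.st_dev && before.st_ino == after.st_ino;
}
```

Only the directory entry changes; the inode, and with it the list of zones the file owns, stays put. Moving a file across file systems is a different matter, because that is a copy and the copy gets new zones.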

>Also, when I run fsck after badblocks, I get weird messages
>about duplicate zones and missing bitmaps.  Is this normal?

This is almost certainly due to the bug in badblocks. You may also want to add
to the bad block list those which are weak (i.e. cause several retries before
succeeding).

I hope this helps.

Earl
--
Earl Chew, Dept of Computer Science, Monash University, Australia 3168
EMAIL: cechew@bruce.cs.monash.edu.au PHONE: 03 5655778 FAX: 03 5655146
----------------------------------------------------------------------

ghelmer@dsuvax.uucp (Guy Helmer) (03/16/91)

In <1991Mar12.225454.19910@viewlogic.com> greg@suntan.viewlogic.com (Gregory Larkin) writes:

>Hi all,

>I am trying to build a new libc.a.  My 30 meg partition is
>almost full, and I happen to know that there are bad blocks
>out in the far reaches of the partition.

>When I first made the partition, I ran "readall /dev/hd6" to
>find the bad blocks.  I then ran "badblocks" to get rid of 
>them (I think!).

You must have run the original 1.5 badblocks.  It removes the
wrong blocks from the free zone bitmap when the block numbers are
greater than 8191.
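
The 8191 limit looks suspiciously like a bitmap block boundary: assuming 1 KB blocks, one bitmap block holds 1024 * 8 = 8192 bits, so bits 0 through 8191 live in the first bitmap block and everything above spills into the next. The sketch below only shows that arithmetic. It is not the badblocks source, locate_bit() and struct bit_pos are hypothetical names, and it does not cover how a zone number is turned into a bit index in the first place.

```c
#include <stdint.h>

#define BLOCK_SIZE      1024u              /* assumed bitmap block size in bytes */
#define BITS_PER_BLOCK  (BLOCK_SIZE * 8u)  /* 8192 bits per bitmap block */

struct bit_pos {
    uint32_t map_block;  /* which bitmap block holds the bit */
    uint32_t byte;       /* byte offset within that block */
    uint8_t  mask;       /* bit mask within that byte */
};

/* Split a bitmap bit index into (bitmap block, byte, mask). */
struct bit_pos locate_bit(uint32_t bit)
{
    struct bit_pos p;
    p.map_block = bit / BITS_PER_BLOCK;
    p.byte      = (bit % BITS_PER_BLOCK) / 8u;
    p.mask      = (uint8_t)(1u << (bit % 8u));
    return p;
}
```

Bit 8191 is the last bit of bitmap block 0, and bit 8192 is the first bit of bitmap block 1. Code that always works on the first bitmap block, or that drops the high part of the index, would be correct up to 8191 and touch the wrong bits beyond it, which would fit the symptom described here.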

>Also, when I run fsck after badblocks, I get weird messages
>about duplicate zones and missing bitmaps.  Is this normal?

fsck -r /dev/hd? has always fixed badblocks' mistakes for me.  If you
have files that ended up on the bad part of the disk, you should rm them
or fsck won't be able to help you out.  de(8 or 1?) can help you
find out if a file has been placed in a bad zone.  It's not normal
for fsck to complain about anything, and if it does, your file system
contains an inconsistency that should be fixed, usually by running
fsck -r.

>Thanks for any help,

>Greg Larkin

-- 
Guy Helmer                       | helmer@sdnet.bitnet
Dakota State University          | dsuvax!ghelmer@wunoc.wustl.edu
(605) 256-5264, (605) 256-2788   | uunet!dsuvax!ghelmer
Ahh, if weddings were as easy to design as software...

dgraham@bmerh451.bnr.ca (Douglas Graham) (03/18/91)

In article <1991Mar15.170118.3983@dsuvax.uucp> ghelmer@dsuvax.uucp (Guy Helmer) writes:
>In <1991Mar12.225454.19910@viewlogic.com> greg@suntan.viewlogic.com (Gregory Larkin) writes:
>
>>Also, when I run fsck after badblocks, I get weird messages
>>about duplicate zones and missing bitmaps.  Is this normal?
>
>fsck -r /dev/hd? has always fixed badblocks' mistakes for me.

I haven't seen this new badblocks that everybody is talking about,
but the scariest thing about the one included with 1.5.10 is that
not only does it make a mess of the filesystem on which it is mapping
out the bad blocks, it screws up the root filesystem as well.
The latter problem occurs because badblocks creates a directory on the
root filesystem as a mount point, and then removes the directory
using unlink() instead of rmdir().  There is code in FS which explicitly
allows root to unlink() a directory, but it doesn't do the complete
job.  I don't know if this is desired behaviour on the part of FS,
but it is certainly a bug in badblocks.
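
For comparison, here is a minimal sketch of the create-and-remove sequence done with rmdir(). It is not taken from badblocks; make_and_remove_mount_point() is a hypothetical helper, and the mount and unmount steps are left out. The point is that rmdir() is the call meant for directories: it takes care of the "." and ".." entries and the parent's link count, which is presumably the part of the job a bare unlink() leaves undone.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a temporary mount-point directory and remove it again.
 * Returns 0 if both steps succeeded, -1 otherwise. */
int make_and_remove_mount_point(const char *path)
{
    if (mkdir(path, 0700) != 0)
        return -1;

    /* ... mount the file system, do the work, unmount ... */

    /* rmdir() fails if the directory is not empty or is still a
     * mount point, so a failure here is worth reporting. */
    if (rmdir(path) != 0)
        return -1;

    return 0;
}
```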

Anyway, like the man says, do an "fsck -r" on both the root filesystem
and the new filesystem immediately after running badblocks, and everything
should be hunky dory.
---
Doug Graham dgraham@bnr.ca	Bell-Northern Research, Ottawa Ontario Canada