[comp.unix.microport] fsck-enhanced disk corruption

mem@zinn.MV.COM (Mark E. Mallett) (07/12/89)

If you have a computer in Southern NH, or at least the parts that I am
familiar with, you have to deal with some pretty abysmal power
situations.

My /usr/spool filesystem has been in rough shape since a power event
caught it by surprise a few weeks ago.  At the time, I observed that
running fsck simply made it worse.  After a lot of hours with fsdb, I
had it patched up, and off it went.

Until the next time, the other day.  This time, fsck made it
consistently worse, and once again I went at it with fsdb.  I never
did get it properly patched -- the number of problems was enormous,
and fsck simply added to them each time I ran it.  I gave up and
reformatted the partition.  But from both fsdb sessions, it was clear
that fsck was doing a very bad thing.  It was writing the (er, "a")
free block list into blocks that were already allocated.  Many of
these were directory blocks.

Since each pass of fsck only increased the problem, I kept getting more
and more references to the file with inode 50.  (Each full free list block
starts out with the number 50, the count of free blocks identified
therein.  As it happens, that count reads as the inode number of a
valid-looking directory entry if the block is also part of a directory,
which it often was.)
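
A minimal sketch of that coincidence follows.  It assumes the classic
System V layouts: a free list block that begins with a 16-bit count,
and 16-byte directory entries that begin with a 16-bit inode number,
both little-endian.  The function names and the 16-bit count width are
illustrative assumptions, not taken from the Microport sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed constants: 50 free-block numbers per free list block (the
 * figure from the post) and 16-byte directory entries. */
enum { NICFREE = 50, DIRENT_SIZE = 16 };

/* Store the count field at the start of a free list block.
 * Assumption: the count is 16 bits wide and little-endian. */
void write_free_count(unsigned char *block, uint16_t nfree)
{
    block[0] = (unsigned char)(nfree & 0xff);
    block[1] = (unsigned char)(nfree >> 8);
}

/* Interpret the same bytes as a directory block and return the inode
 * number held in directory slot `slot`. */
uint16_t dirent_inode(const unsigned char *block, size_t slot)
{
    const unsigned char *p = block + slot * DIRENT_SIZE;
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Under these assumptions, writing a count of NICFREE into a block and
then reading slot 0 of that block as a directory entry gives inode 50,
which would account for the growing number of references to that inode.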

More inspection revealed that TWO parallel free lists seemed to be
being created.  One was written one block behind the other.  It was
this trailing free list that was bad; I found it consistently being
written into valid blocks.  It was not a case of blocks being
accidentally written to two places at once, because the lists were
distinctly linked to their respective next blocks (again, in parallel
one behind the other).  I also do not know whether they were created
at the same time, or whether one was left behind each time.  But as I
say, it was only the trailing one that overlaid existing, allocated
blocks.
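
The kind of check I was doing by hand with fsdb can be sketched roughly
as below.  This is a hypothetical in-memory model, not how fsck or fsdb
work internally: next_free[b] stands for the link stored in free list
block b (0 ends the chain), and allocated[b] is nonzero if some inode
already claims block b.

```c
#include <stddef.h>

/* Walk one free list chain starting at `head` and return the first
 * block on it that is also marked allocated, or -1 if there is none.
 * The step counter guards against a chain that loops back on itself. */
long first_overlap(const unsigned long *next_free,
                   const unsigned char *allocated,
                   size_t nblocks, unsigned long head)
{
    size_t steps = 0;
    unsigned long b = head;

    while (b != 0 && b < nblocks && steps++ < nblocks) {
        if (allocated[b])
            return (long)b;     /* free list block overlays live data */
        b = next_free[b];
    }
    return -1;
}
```

With two chains running in parallel, say 10 -> 8 -> 6 and a trailing
9 -> 7 -> 5, and block 7 in use, a walk from 10 finds no overlap while
a walk from 9 reports block 7.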

There is nothing unusual about this filesystem, except perhaps that
it is on the second drive, and has many more inodes than others.
(It is, after all, the news disk.)

The disturbing part is that fsck was the culprit, and I don't know
when it will strike again.

Any insight will be appreciated.

-mm-
-- 
Mark E. Mallett  Zinn Computer Co/ PO Box 4188/ Manchester NH/ 03103 
Bus. Phone: 603 645 5069    Home: 603 424 8129     BIX: mmallett
uucp: mem@zinn.MV.COM  (  ...{decvax|elrond|harvard}!zinn!mem   )
Northern MA and Southern NH consultants:  Ask (in mail!) about MV.COM