[comp.sys.att] Sys V inode bug

tfb@unf7.UUCP (t blakely) (08/10/89)

We've been bitten several times in the last few months by the infamous
System V disappearing inode bug.  To make things worse, we have found
that fsck, instead of recovering the "lost" inodes, trashed the file
system.  The first time we run out of inodes, fsck appears to fix the
problem.  The next time, fsck starts deleting files and directories.
It never completes without dup table overflows and running out of space
in lost+found.  Subsequent runs of fsck destroy more and more of the
file system until eventually nothing is left.  fsck never completes
without errors after the first time.  This is our news partition, so
it gets lots of activity.  It's a full 147 MB partition on a SCSI
drive on a 3B2/500 running V.3.1.  I made the file system originally
with 60000 inodes because we _really_ ran out using the default
number (full news feed, two week retention).
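
For a sense of scale, the inode density implied by those figures can be
worked out with a little arithmetic.  This is only a rough sketch: it
assumes "147 MB" means 147 * 2^20 bytes, it ignores the space taken by
the superblock and the inode table itself, and the function name is made
up for illustration.

```python
def bytes_per_inode(partition_bytes, inode_count):
    """Average number of partition bytes available per inode."""
    if inode_count <= 0:
        raise ValueError("inode_count must be positive")
    return partition_bytes / inode_count


# Assumption: "147 MB" is taken as 147 * 2**20 bytes.
PARTITION_BYTES = 147 * 1024 * 1024
INODES = 60000

if __name__ == "__main__":
    density = bytes_per_inode(PARTITION_BYTES, INODES)
    print("about %.0f bytes of partition per inode" % density)
```

Each article file in the spool consumes one inode regardless of its
size, so a spool of many small articles can exhaust inodes well before
it exhausts blocks.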

If anyone has _any_ solution, workaround, whatever for this problem,
I'd love to hear it.  Is fsck buggy or am I causing the problem with
too many inodes (how does it ever work on _really_ big file systems?)
Maybe I should split the disk into 2 file systems, but that would
complicate news a bit.  I've only been working with SysV for about a
year now, but in several years on Berkeley systems I've never had any
problems like this.  I understand that the "disappearing inode" bug
has been in Sys V since before the beginning.  If so, what's wrong
with AT&T (aside from the obvious) that they can't fix something
like this?
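
One stopgap, short of a real fix, is simply to watch the free inode
count so the partition never hits zero unnoticed.  The sketch below is
written in Python for a present-day POSIX system, not for V.3.1; the
path checked and the 10% threshold are assumptions chosen for
illustration.  It relies on os.statvfs, which reports total and free
inode counts in f_files and f_ffree.  It only detects the condition and
does not repair anything.

```python
import os


def inode_usage(path):
    """Return (total_inodes, free_inodes) for the file system holding path."""
    st = os.statvfs(path)
    return st.f_files, st.f_ffree


def low_on_inodes(total, free, min_free_fraction=0.10):
    """True when the free fraction of inodes is below the threshold.

    The 10% default is an arbitrary illustrative value.
    """
    if total == 0:
        # Some file systems do not report inode counts at all.
        return False
    return free / total < min_free_fraction


if __name__ == "__main__":
    # Assumption: check the file system holding the current directory.
    total, free = inode_usage(".")
    print("inodes: %d total, %d free" % (total, free))
    if low_on_inodes(total, free):
        print("warning: running low on inodes")
```

Run from cron against the news partition, something like this would at
least give warning before the spool fills up.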

Tom Blakely
Univ. of North Fla.
(904)646-2820
uflorida!unf7!tfb
tfb@unfvm.BITNET

len@netsys.Netsys.COM (Len Rose) (08/11/89)

AT&T has released patches for this problem; they're available from
the Hotline. You shouldn't just bash on them without doing the
obvious (calling to see if fixes exist). I believe their fixes
were released in 10/88. This takes care of the problem in 3.0,
3.1, and 3.2.

 "cat /usr/options/inode_fix.name"

Fix for out of inodes problem in 3.2 -- 10/10/88

Len

botton@laidbak.UUCP (Brian D. Botton) (08/11/89)

In article <213@unf7.UUCP> tfb@unf7.UUCP (t blakely) writes:
>. . . .  Is fsck buggy or am I causing the problem with . . . . . . .
          ^^^^^^^^^^^^^

  I lead a team of people that manages a network of 4 VAXen, an Alliant,
and ~300 Suns with 3 of the VAXen running Sys V.2.0.2, and I must state
that I don't trust the Sys V fsck.  More than once I've run fsck after a
crash and had it claim that the file systems are okay.  Then I'll cd into
a directory, do an ls, and get the following (or at least close to this):

	ls: . not found

  That's right, a directory without . and .. :-(.  Now maybe I'm expecting
too much, but I really think fsck should be able to do better than this.
To be honest, I haven't had the time to dig into why fsck is so brain
damaged.  Those VAXen have been around a long time and some are going away.
The rest are getting upgraded to V.3.1.1 and Mt. Xinu 4.3 (we are running
a beta 4.2 BSD now, current aren't we, :-).

>. . . . . . .  I've only been working with SysV for about a
>year now, but in several years on Berkeley systems I've never had any
						    ^^^^^^^^^^^^^^^^^^
>problems like this. . . . . . .
 ^^^^^^^^^^^^^^

  Neither have I; in fact, I trust the Berkeley fsck as much as I distrust
the Sys V fsck.

>. . . . . . . . . .If so, what's wrong
				  ^^^^^
>with AT&T (aside from the obvious) that they can't fix something
 ^^^^^^^^^
>like this?

  Careful, the phone police may show up at your door, ;-).

  I know that this hasn't helped your problem, but I thought you'd like
to know that you're not alone with fsck problems.

-- 
     ...     ___
   _][_n_n___i_i ________		Brian D. Botton
  (____________I I______I		laidbak!botton
  /ooOOOO OOOOoo  oo oooo

friedl@vsi.COM (Stephen J. Friedl) (08/12/89)

In article <2592@laidbak.UUCP>, botton@laidbak.UUCP (Brian D. Botton) writes:
>  More than once I've run fsck after a
> crash and had it claim that the file systems are okay.  Then I'll cd into
> a directory, do an ls, and get the following (or at least close to this):
> 
> 	ls: . not found
> 
>   That's right, a directory without . and .. :-(.  Now maybe I'm expecting
> too much, but I really think fsck should be able to do better than this.

For what it's worth, Sys V fsck advertises that it doesn't look for
this (although IMHO they certainly should).   "It's not a bug, it's
a feature".

     Steve

-- 
Stephen J. Friedl / V-Systems, Inc.  /  Santa Ana, CA  / +1 714 545 6442 
3B2-kind-of-guy   / {attmail uunet}!vsi!{bang!}friedl  /  friedl@vsi.com

"My new bestseller, _Teach_Yourself_to_Read_, is now available everywhere" -me

tfb@unf7.UUCP (t blakely) (08/18/89)

Thanks to all who responded to my recent query about the disappearing inodes.
I've obtained a diskette from AT&T (free of charge, even) containing the fix.
I'll install it soon and hope for the best.  It is apparently available via
the hotline to anyone experiencing such a problem.

Tom Blakely
Univ. of North Fla.
gatech!uflorida!unf7!tfb