[comp.sys.sgi] Dangling inode problem

dhinds@portia.Stanford.EDU (David Hinds) (12/02/90)

I'm not sure if this is strictly an Irix problem, but I certainly seem to
have found a file system bug.  After having to power down our machine a
few days ago to restart it, the resulting fsck activity produced a directory
entry in /lost+found that does not point to anything.  A file called '000258'
shows up if I do 'ls', but 'ls -l' complains that the file does not exist.
I have done everything I could think of to get rid of this dangling entry -
'rm', 'unlink', etc. all fail.  I can't create another file of the same name
on top of it.  I am completely at a loss - I suspect that since 'fsck' made
it, it won't be able to undo the damage.  This is a real pain because any
recursive search of the directory tree returns errors - 'find', 'bru', etc.
all return failure statuses - which is screwing up all sorts of shell scripts.
How do I get rid of this dang thing?

 -David Hinds
  dhinds@cb-iris.stanford.edu

srp@babar.mmwb.ucsf.edu (Scott R. Presnell) (12/02/90)

dhinds@portia.Stanford.EDU (David Hinds) writes:

>few days ago to restart it, the resulting fsck activity produced a directory
>entry in /lost+found that does not point to anything.  A file called '000258'
>shows up if I do 'ls', but 'ls -l' complains that the file does not exist.
>I have done everything I could think of to get rid of this dangling entry -
>'rm', 'unlink', etc. all fail.  I can't create another file of the same name

[...]

>How do I get rid of this dang thing?

I had the exact same thing happen to me.  While not an optimal fix, another
reboot (and implicit fsck) cleared away the reference.

	- Scott
--
Scott Presnell				        +1 (415) 476-9890
Pharm. Chem., S-926				Internet: srp@cgl.ucsf.edu
University of California			UUCP: ...ucbvax!ucsfcgl!srp
San Francisco, CA. 94143-0446			Bitnet: srp@ucsfcgl.bitnet

bh@sgi.com (Bent Hagemark) (12/05/90)

In article <srp.660086900@babar.mmwb.ucsf.edu> srp@babar.mmwb.ucsf.edu (Scott R. Presnell) writes:
>dhinds@portia.Stanford.EDU (David Hinds) writes:
>
>>few days ago to restart it, the resulting fsck activity produced a directory
>>entry in /lost+found that does not point to anything.  A file called '000258'
>>shows up if I do 'ls', but 'ls -l' complains that the file does not exist.
>>I have done everything I could think of to get rid of this dangling entry -
>>'rm', 'unlink', etc. all fail.  I can't create another file of the same name
>
>[...]
>
>>How do I get rid of this dang thing?
>
>I had the exact same thing happen to me.  While not an optimal fix, another
>reboot (and implicit fsck) cleared away the reference.
>
>	- Scott


Yes, fsck properly clears the reference, and yes nothing else can.
The problem is that the directory entry refers to an inode which
is deallocated.  The kernel EFS code can't even namei() this name much
less unlink(2), open(2)...

The bug which creates such errant directory entries has been fixed and
is available in The Next Release.

Bent
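
Bent's diagnosis also explains the original symptom. 'ls' only reads the names stored in the directory, but 'ls -l' has to look up each name's inode, and that lookup fails when the inode has been deallocated. The Python sketch below is a modern illustration, not something from the thread; the function name find_dangling and the default /lost+found path are assumptions. It flags entries that the directory lists but lstat(2) cannot resolve. For each one it prints the i-number recorded in the directory entry itself, which is the number a tool like clri would need. It only reports and does not modify anything.

```python
import os
import sys


def find_dangling(path="."):
    """Return (name, i-number, error) for each entry that readdir() lists
    but lstat() cannot resolve (hypothetical helper, illustration only)."""
    dangling = []
    with os.scandir(path) as entries:
        for entry in entries:
            # On POSIX, inode() returns d_ino from the directory entry itself,
            # so it is available even if the inode cannot be looked up.
            ino = entry.inode()
            try:
                # follow_symlinks=False means lstat(2): examine the entry itself,
                # so a symlink with a missing target is NOT reported.
                entry.stat(follow_symlinks=False)
            except OSError as exc:
                dangling.append((entry.name, ino, exc.strerror))
    return dangling


if __name__ == "__main__":
    # Assumed default: scan /lost+found unless a directory is given.
    target = sys.argv[1] if len(sys.argv) > 1 else "/lost+found"
    for name, ino, why in find_dangling(target):
        print(f"{name}\ti-number {ino}\t{why}")
```

An entry can also be reported if another process removes it between the directory read and the lstat call. Rerun the scan before concluding that an entry is truly dangling.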

sgf@cs.brown.edu (Sam Fulcomer) (12/05/90)

In article <1990Dec5.032329.5531@odin.corp.sgi.com> bh@sgi.com (Bent Hagemark) writes:
>
>Yes, fsck properly clears the reference, and yes nothing else can.

Well, let's be fair now... It wouldn't be hard to edit the directory (by 
some means other than fsck) directly through the disk driver. It's just that,
since most OS implementations somewhat prudishly disallow write() to directories,
no other approach can work. (except perhaps a good, swift kick) 


_/**/Sam

vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (12/07/90)

In article <1990Dec5.032329.5531@odin.corp.sgi.com>, bh@sgi.com (Bent Hagemark) writes:
> 
> Yes, fsck properly clears the reference, and yes nothing else can.
> The problem is that the directory entry refers to an inode which
> is deallocated.  The kernel EFS code can't even namei() this name much
> less unlink(2), open(2)...
> 
> The bug which creates such errant directory entries has been fixed and
> is available in The Next Release.


In 3.3.1 and 3.3.2 you can sometimes get bogus files in lost+found that you
cannot get rid of, and that fsck refuses to destroy.  Just this morning, a
couple of previously vital inodes in / were turned into such zombies on my
personal workstation by a probable hardware failure in a VME network
board.  Similar problems have happened to sgi.sgi.com.

My solution is to use explosives.  Unlink the node (with unlink not rm),
clri the inode (having correctly determined the i-number and special device
name), and then reboot.  This morning, that did not work because the
mini-root kernel would hang trying to update the completely bogus inode, so
on the zillionth reboot, I clri'ed them and then pushed the reset button.

Please note that this sort of deletion is effective and NOT recommended.
A typo can leave you cursing while you look for backup tapes.

Fsck for The Next Release continues to be improved, so it might be able to
kill more such zombies.


Vernon Schryver,   vjs@sgi.com