[net.bugs.v7] deleting a bad directory

guy@rlgvax.UUCP (Guy Harris) (05/14/84)

    >I took over this host in March and have been trying to get rid of a bad
    >directory which was created in November. The directory cat/file string
    >is '/u17/lost+found/#005628'.  I position myself at the directory
    >'lost+found' and then enter "rm -rf *562*" just in case there are any
    >unprintable characters.  The response is 'rmdir: Cannot remove "#005628".
    >No such file or directory.'  Do you have any suggestions on how to clean
    >this out?  Thanks:
    >Herb Overstreet   (DCA-EUR Administrator)

> Try the following:

> (1)  od -c . >foo.c

> (2)  Edit foo.c and find the line with the correct entry.
>      Conveniently (under v7 anyway) each entry in the directory
>      takes exactly one line in the od -c output. Delete all other lines,
>      delete the first 3 fields (the offset and the two bytes of the inode
>      number), and delete the spaces between characters (there are 3 before
>      a printable character; 1 before a character that is < ' ' or > 0176
>      and not in the standard [\f, \r, \b, \n, \0] set of escapable
>      characters, which od prints as three octal digits; and 2 before an
>      escapable character),
>      much like below:

> {typical od -c of a V7 directory}
> offset |inode #|  file name--------------------------------------------
> 0000000 002  \0   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
> 0000020 002  \0   .   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
> 0000040 253  \0   u   u   r   u   n  \0  \0  \0  \0  \0  \0  \0  \0  \0
> 			    .
> 			    .
> 			    .
> 0001020 377 004   #   0   0   5   6   2   8 177  \0  \0  \0  \0  \0  \0
> 			    .
> 			    .
> 			    .

>     Turn the line of interest into a quoted string.
> 	"#005628\177\0\0\0\0\0\0"
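The hand-editing in step (2) amounts to turning each od -c field back into one byte. Here is a minimal sketch of that decoding in C. It assumes the line has already been split into its fields (a blank field for a space character would need separate handling), and that od -c prints only a single printable character, one of the escapes listed above, or three octal digits. The function names `od_field_to_byte` and `od_fields_to_bytes` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode one field of "od -c" output into a byte.
 * Assumed field forms: one printable character ("#"), one of the
 * escapes \0 \b \f \n \r ("\0"), or three octal digits ("177").
 * Returns the byte value, or -1 if the field is not recognized. */
int od_field_to_byte(const char *f)
{
    size_t len = strlen(f);

    if (len == 1)                       /* printable character */
        return (unsigned char)f[0];
    if (len == 2 && f[0] == '\\') {     /* escaped character */
        switch (f[1]) {
        case '0': return '\0';
        case 'b': return '\b';
        case 'f': return '\f';
        case 'n': return '\n';
        case 'r': return '\r';
        default:  return -1;
        }
    }
    if (len == 3 && strspn(f, "01234567") == 3)   /* octal byte */
        return (int)strtol(f, NULL, 8);
    return -1;
}

/* Decode n fields into buf; returns n, or -1 on an unrecognized field. */
int od_fields_to_bytes(const char *const *fields, int n, unsigned char *buf)
{
    assert(fields != NULL && buf != NULL);
    for (int i = 0; i < n; i++) {
        int b = od_field_to_byte(fields[i]);
        if (b < 0)
            return -1;
        buf[i] = (unsigned char)b;
    }
    return n;
}
```

Fed the 14 name fields of the 0001020 line, this yields the same bytes as the quoted string "#005628\177\0\0\0\0\0\0".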

It is possible that the actual line in question may look something like:

0001020 377 004   #   0   0   5   6   2   8 177  \0   e   a   t   m   e

which is a directory entry that cannot be referenced.  The UNIX kernel
can't create such entries, but entries in "lost+found" usually aren't
created by the kernel; they are created by "fsck", which in V7, 4.1BSD,
and System III (possibly System V also) has a bug in it.  Basically, when
it writes the entry it does *not* clear out what was there before, and if
the entry used to be for a file whose name was longer than a typical
reconnected file's name, the junk from the old name will be left
around.  (By the way, the
directory format in all versions of UNIX since V5(!), except for 4.2BSD,
is identical; this includes V5, V6, V7, System X, and 2.xBSD/4.1BSD.)
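A directory entry in this format is 16 bytes: a 2-byte i-number followed by a DIRSIZ (14) byte name that is null-padded but has no terminator when all 14 bytes are used. The sketch below decodes one raw entry as it appears in an od -c dump. It assumes the PDP-11's little-endian byte order for the i-number. The struct `v7_entry` and the function `decode_entry` are hypothetical; they only mirror the layout of the real directory entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DIRSIZ 14   /* name length in a V7-style directory entry */

/* Hypothetical decoded form of one 16-byte directory slot. */
struct v7_entry {
    uint16_t ino;               /* i-number; 0 means the slot is free */
    char name[DIRSIZ + 1];      /* raw name bytes plus a terminator we add */
    int junk_after_null;        /* nonzero if a non-null byte follows a null */
};

/* Decode a raw slot (raw points at 16 bytes): a little-endian
 * i-number, then DIRSIZ name bytes. */
void decode_entry(const unsigned char *raw, struct v7_entry *e)
{
    int seen_null = 0;

    assert(raw != NULL && e != NULL);
    e->ino = (uint16_t)(raw[0] | (raw[1] << 8));
    memcpy(e->name, raw + 2, DIRSIZ);
    e->name[DIRSIZ] = '\0';
    e->junk_after_null = 0;
    for (int i = 0; i < DIRSIZ; i++) {
        if (raw[2 + i] == '\0')
            seen_null = 1;
        else if (seen_null)
            e->junk_after_null = 1;   /* leftover bytes from an older name */
    }
}
```

For the "eatme" line above this gives i-number 1279 (255 + 4*256), the visible name "#005628\177", and sets `junk_after_null`.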

Since this bug appears in several versions of UNIX, I'm also posting this to
just about every net.bugs.all group that moves.  This *is* fixed in System
V.

The gory details of the bug, for those of you who care:

   "mkentry" fixed; formerly, it would assume that any slot in the lost+found
   directory that had an inumber of zero (i.e., a free slot) had nulls in
   the entire name slot.  If this was not true (which is quite possible, as
   UNIX only clears out the inumber, not the name, when doing an unlink),
   it would make a directory entry with the inumber in decimal, a null, and
   then whatever junk happened to be there before.  This would produce an entry
   which would only compare to a string with a null in the middle; however,
   all name strings may only have a null at the end.  Therefore, this file
   would exist but there would be no way of accessing it with any UNIX
   system call.
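Why such an entry is unreachable can be seen in a simplified model of the name comparison. The sketch assumes, as the description above implies, that the name being looked up is padded with nulls to DIRSIZ bytes and then compared against all DIRSIZ bytes of each slot. Because a C string can supply nulls only at its end, a slot with junk after its null matches no name at all. The function `name_matches` is hypothetical and only handles components of at most DIRSIZ bytes.

```c
#include <assert.h>
#include <string.h>

#define DIRSIZ 14

/* Simplified lookup model: pad the component with nulls to DIRSIZ
 * bytes, then compare all DIRSIZ bytes with the directory slot. */
int name_matches(const char *component, const char *slot)
{
    char buf[DIRSIZ];
    size_t len;

    assert(component != NULL && slot != NULL);
    len = strlen(component);
    assert(len <= DIRSIZ);        /* longer names are outside this model */
    memset(buf, 0, DIRSIZ);       /* null padding */
    memcpy(buf, component, len);
    return memcmp(buf, slot, DIRSIZ) == 0;
}
```

Neither "#005628" nor "#005628\177" matches a slot holding "#005628\177", a null, and "eatme", because the padded buffer is all nulls where the slot still has the old bytes.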

And the fix: There's a piece of code in the "mkentry" routine in "fsck.c"
that looks something like:

	p = &dirp->d_name[7];	/* the exact number may be different */
	*--p = 0;

right before a line reading

	while(p > dirp->d_name) {

Change those two lines to:

	p = &dirp->d_name[DIRSIZ];
	while(p > &dirp->d_name[6])	/* one less than the index in the old "p =" line */
		*--p = 0;
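The effect of the change can be shown with a small model of the naming step. The function `name_orphan` below is hypothetical and is not the fsck.c code. It assumes the reconnected name is '#' plus six decimal digits (as in "#005628"), so its null goes at index 7 rather than at the index 6 implied by the snippet above. It also leaves out the rest of mkentry, such as setting the i-number.

```c
#include <assert.h>
#include <string.h>

#define DIRSIZ 14
#define ORPHAN_LEN 7    /* assumed: '#' plus six decimal digits */

/* Hypothetical model of mkentry's naming step.  With fixed == 0 it
 * behaves like the buggy code (one null after the name, old bytes
 * left behind it); otherwise it clears the whole tail of the slot. */
void name_orphan(char *name, unsigned long ino, int fixed)
{
    char *p;

    assert(ino < 1000000UL);            /* must fit in six digits */
    if (fixed) {
        p = &name[DIRSIZ];
        while (p > &name[ORPHAN_LEN])   /* the fix: null out the tail */
            *--p = 0;
    } else {
        p = &name[ORPHAN_LEN + 1];
        *--p = 0;                       /* the bug: a single null */
    }
    /* p == &name[ORPHAN_LEN] either way; fill digits right to left */
    while (p > &name[1]) {
        *--p = (char)('0' + ino % 10);
        ino /= 10;
    }
    name[0] = '#';
}
```

If the slot previously held "averylongname1", the buggy path leaves "#005628", a null, and "gname1", which no lookup can match. The fixed path leaves "#005628" followed by seven nulls.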

Now, if you *do* have this problem (unremovable file in "lost+found"), the
solution is to:

	1) Clean all the stuff out of "lost+found" on that file system that
	   you can (i.e., delete it or move it back to its original home).

	2) Get the i-number of "lost+found" by doing an "ls -lid" on it.

	3) Unlink "lost+found", either by using the USG UNIX "/etc/unlink"
	   command or by writing a program to run the "unlink" system call
	   on that "lost+found".

	4) Make a new "lost+found" directory.

	5) Unmount the file system and run the fixed "fsck" on it.  Do *not*
	   reconnect the old "lost+found", but *do* reconnect the unremovable
	   file.

	6) Repeat the "fsck" until the file system is clean.
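Steps 2 and 3 can be combined in a small program when "/etc/unlink" isn't available. This is a minimal sketch: `report_and_unlink` is a hypothetical name, and it prints the i-number that "ls -lid" would show before calling the unlink system call. On modern systems unlink on a directory is normally refused, because POSIX allows it only with appropriate privileges and only where the implementation supports it. The sketch therefore reflects the older behaviour described here and should be treated with care.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: report the i-number of path (what "ls -lid"
 * shows), then remove that single directory entry with unlink().
 * Returns 0 on success, -1 on failure. */
int report_and_unlink(const char *path, ino_t *ino_out)
{
    struct stat st;

    assert(path != NULL);
    if (lstat(path, &st) < 0) {
        perror(path);
        return -1;
    }
    if (ino_out != NULL)
        *ino_out = st.st_ino;   /* note this for the later fsck run */
    printf("%s: i-number %lu\n", path, (unsigned long)st.st_ino);
    if (unlink(path) < 0) {     /* may be refused for directories */
        perror(path);
        return -1;
    }
    return 0;
}
```

Keep the printed i-number; it is the inode to have "fsck" clear rather than reconnect in step 5.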

I believe this problem doesn't exist on 4.2BSD.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

amg@pyuxn.UUCP (Alan M. Gross) (05/15/84)

I managed to fix a lost+found directory entry with a \0 in the middle
of it using fsdb to rename the file.  It was a Sunday evening, and I
even did it without unmounting the file system (possibly bad policy).

		Alan M. Gross
		{ariel,burl,clyde,floyd,
		gamma,harpo,ihnp4,mhuxl}!pyuxn!amg

guy@rlgvax.UUCP (Guy Harris) (05/17/84)

Unfortunately, not all UNIX sites have "fsdb" (I suspect the site that was
having the problem was running 4.1BSD, which doesn't come with "fsdb") and
I haven't used "fsdb" enough to feel comfortable recommending it to novices.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

guy@rlgvax.UUCP (Guy Harris) (05/19/84)

> Guy said that you could just '/etc/unlink' lost+found. It is
> my experience (Unix v7m) that you should FIRST do an unlink on
> lost+found/. Otherwise fsck finds lost+found before it finds
> lost+found/* , re-links lost+found (into the new lost+found),
> and you still have a problem.

That's why the directions said (step 2) to do an "ls -lid" on "lost+found"
to get its inumber and (step 5) to tell "fsck" not to reconnect that
inode when it asks, but to clear it instead.  Unlinking "lost+found/."
and then "lost+found" will work (that removes both links to "lost+found",
and UNIX blows away the inode) unless some of the entries in "lost+found"
are directories - especially the unremovable one; in that case, there will
still be links to "lost+found" (as "lost+found/#blahblah/..") and the inode
will hang around.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

jack@vu44.UUCP (05/24/84)

Guy said that you could just '/etc/unlink' lost+found. It is
my experience (Unix v7m) that you should FIRST do an unlink on
lost+found/. Otherwise fsck finds lost+found before it finds
lost+found/* , re-links lost+found (into the new lost+found),
and you still have a problem.
	Jack Jansen, {philabs|decvax}!mcvax!vu44!jack.