[comp.bugs.4bsd] 4.[23]bsd dump misses last inode

das@eplunix.UUCP (Dave Steffens) (06/23/87)

The following bug fix to src/etc/dump/dumptraverse.c
is hereby RETRACTED for 4.2bsd and 4.3bsd.
Do NOT apply this fix -- it is NOT correct!

The bug *does* exist in Ultrix 1.1 dump.
I'm not sure about later versions of Ultrix.
See follow-up postings for additional details.

In article <350@eplunix.UUCP> to comp.bugs.4bsd I wrote:

>	src/etc/dump/dumptraverse.c	4.2bsd 4.3bsd Ultrix	FIX
>
>	This bug was found in dump(8) on Ultrix 1.1 but is known
>	to be present in dump(8) on 4.2bsd and 4.3bsd as well.
>
>	The highest-numbered inode (maxino) on a filsys is _never_
>	written to tape by dump(8) because someone had a problem
>	with the range of a for-loop (see Fix, below).  Maxino is
>	calculated correctly, but the for-loop test clause is written
>	incorrectly such that dump(8) stops processing inodes at maxino-1.

This fix was also reported in article <8706181839.AA14748@okeeffe.Berkeley.EDU>
to comp.bugs.4bsd.ucb-fixes by bostic@OKEEFFE.BERKELEY.EDU (Keith Bostic) as:
> Subject: V1.28 (dump fails to write highest numbered inode to tape)

Keith's posting was presumably based on my earlier posting.
I have notified Berkeley that my original posting was in error
and I hope a retraction will soon be forthcoming from them as well.

Apologies to everyone who has been inconvenienced by my mistake
and most especially to Kirk McKusic and Keith Bostic.

-- 
{harvard,mit-eddie,think}!eplunix!das		David Allan Steffens
243 Charles St., Boston, MA 02114		Eaton-Peabody Laboratory
(617) 573-3748					Mass. Eye & Ear Infirmary

das@eplunix.UUCP (Dave Steffens) (06/23/87)

In article <358@eplunix.UUCP>, I (sheepishly) wrote:
> The following bug fix to src/etc/dump/dumptraverse.c
> is hereby RETRACTED for 4.2bsd and 4.3bsd.
> Do NOT apply this fix -- it is NOT correct!
> 
> The bug *does* exist in Ultrix 1.1 dump.
> I'm not sure about later versions of Ultrix.
> 
> This fix was reported in article <8706181839.AA14748@okeeffe.Berkeley.EDU>
> to comp.bugs.4bsd.ucb-fixes by bostic@OKEEFFE.BERKELEY.EDU (Keith Bostic):
> > Subject: V1.28 (dump fails to write highest numbered inode to tape)
> 
> Keith's posting was presumably based on my earlier posting.
> I have notified Berkeley that my original posting was in error
> and I hope a retraction will soon be forthcoming from them as well.
> 
> Apologies to everyone who has been inconvenienced by my mistake
> and most especially to Kirk McKusic and Keith Bostic.

Now I'm going to tell you why the fix is *wrong* and how it happened.
The blame is all mine.  Credit goes to Ken Lalonde at U of Waterloo
for putting me on the right track (see below).  I've posted a separate
article since I presume not everyone will be interested in the details.

Fortunately, tapes made with the modified dump will not be corrupted,
at least according to my tests.  In some cases there will be a garbage
inode written on the end of the tape which will have a bogus ino = maxino+1.
Since this inode never appears in a directory, it will never be extracted.
However, restore will report "expected next file xxx, got yyy", where xxx
is the value of maxino computed by restore from spcl.c_count in the clrmap
header and yyy is the bogus inode number.  This error is non-fatal.
There is more trouble if this garbage inode appears to be a directory.
Restore reports "cannot find directory inode yyy", where yyy is again
the bogus inode number.  This error is fatal.  In both cases, restore seems
to be otherwise successful because it finishes processing all the real inodes
before it hits the bogus one (N.B. directory case only lightly tested).

What happened was that I found (and fixed) a *real* bug in the Ultrix
1.1 version of dump.  All of my testing was done on this version of dump.
The code in the 4.[23]bsd versions is similar (but not identical).
Therefore, I *assumed* that the fix to the Ultrix version was directly
applicable to the 4.[23]bsd versions -- but unfortunately I didn't *test*
my assumptions.  Had I tested, I would have immediately seen the restore
errors and taken a closer look at my supposed fix.  So much for any claims
I might have made for following good programming practice.

My posting was for 4.3bsd because I assumed Ultrix 1.1 had long-since
been consigned to the dustbin of history.  Another bad mistake.

Then on Saturday I received this note:
> From harvard!seismo!watmath!orchid!kwlalonde Sat Jun 20 03:41:16 1987
> Date: Sat, 20 Jun 87 00:21:51 EDT
> From: Ken Lalonde <harvard!seismo!watmath!orchid!kwlalonde>
>
> The for loop in the stock dumptraverse.c looks like
> 	
> 	for (ino = 0; ino < maxino; ) {
> 		...
> 		ino++
> 		...
> 			(*fn)(getino(ino));
> 		...
> 	}
>
> The loop runs from 0..maxino-1, so getino is
> called with arguments 1..maxino.  That sounds correct to me.
> If the loop is changed to run from 0..maxino, the last
> time around we call getino(maxino+1).  This will eventually
> call bread, and bread will notice that it is attempting
> to read past the end of the fs.  There is a kludge for that case in bread,
> so nothing is really read, no error message is produced, and everything
> looks okay.  So it seems to me that the original code is correct.

Since then I have been engaged in extensive testing of all versions
of dump, both with and without my bug fix.  My strategy was to create
a filsys with only 128 inodes on a spare root partition and install
one hundred files on it.  This is enough to cause the allocation
of the higest-numbered inode.  Then I ran various combinations of dump
and restore, as well as looking at the output of dump with od.
Ken is indeed correct, and so is the 4.[23]bsd dump code.  The Ultrix
version of dump is not (1.1 at least -- I have no more recent version).

If my bug fix is installed in the 4.3bsd version of dump, the kludge
in bread() Ken mentions will potentially cause a bogus inode = maxino+1
to be written to tape.  This occurs because bread() will keep trying
to read past the end of the filsys in smaller and smaller pieces
until the count goes to zero.  Then it will return success and getino()
will return a pointer to the first inode in the last inode block read in.
If said inode is allocated, a bogus copy of it will be written on the end
of the tape with ino = maxino+1.  This is most troublesome when the bogus
inode appears to be a directory, less so if it appears to be a file.
Restore gets confused but apparently still manages to do the right thing
since the bogus inode appears *after* all the real ones have been processed.

For what it's worth, the Ultrix 1.1 pass() looks like: 
	if (map == NULL)
		for (ino = 0; ino < maxino; ino++)
			(*fn)(getino(ino));
	else
	for (ino = 0; ino < maxino; ) {
		...
		ino++;
		if(bits & 1) {
			dp = getino(ino);
			(*fn)(dp);
		}
	}

The bug is in the first for loop -- "map" is NULL on the first call
to pass() which in turn calls mark() with each inode via "fn".
Someone at DEC tried to be clever but introduced a bug instead.  As written,
the first for loop fails to mark the higest-numbered inode on the filsys.
Then later calls to pass() don't see said inode because it wasn't marked.
This is the bug that caused the original problem I saw and reported.
The fix is clearly the one I posted, namely to make the 1st for loop read:
	for (ino = 0; ino <= maxino; ino++)

The second for loop (which is similar in 4.[23]bsd) is correct as is.
However, the code is a bit obscure -- the loop appears to count from 0
to maxino-1 but because of the odd placement of "ino++", it actually runs
over inodes 1 thru maxino.  I guess I didn't really look at the inards
of this loop very carefully since I was so confident that I understood
the problem and had a *working* fix.  I know this bit of obscurantism
was what misled me; it seems that it misled someone at DEC also.

Kudos to Ken Lalonde for catching it!  Sackcloth and ashes for me.
I will repeat a thousand times... test it! test it!! test it!!!
I really am sorry about all this.  Don't quite know what else to say.
Guess I shouldn't be playing unix guru if I can't really cut it.

-- 
{harvard,mit-eddie,think}!eplunix!das		David Allan Steffens
243 Charles St., Boston, MA 02114		Eaton-Peabody Laboratory
(617) 573-3748					Mass. Eye & Ear Infirmary