das@eplunix.UUCP (Dave Steffens) (06/23/87)
The following bug fix to src/etc/dump/dumptraverse.c is hereby RETRACTED for 4.2bsd and 4.3bsd. Do NOT apply this fix -- it is NOT correct! The bug *does* exist in Ultrix 1.1 dump. I'm not sure about later versions of Ultrix. See follow-up postings for additional details. In article <350@eplunix.UUCP> to comp.bugs.4bsd I wrote: > src/etc/dump/dumptraverse.c 4.2bsd 4.3bsd Ultrix FIX > > This bug was found in dump(8) on Ultrix 1.1 but is known > to be present in dump(8) on 4.2bsd and 4.3bsd as well. > > The highest-numbered inode (maxino) on a filsys is _never_ > written to tape by dump(8) because someone had a problem > with the range of a for-loop (see Fix, below). Maxino is > calculated correctly, but the for-loop test clause is written > incorrectly such that dump(8) stops processing inodes at maxino-1. This fix was also reported in article <8706181839.AA14748@okeeffe.Berkeley.EDU> to comp.bugs.4bsd.ucb-fixes by bostic@OKEEFFE.BERKELEY.EDU (Keith Bostic) as: > Subject: V1.28 (dump fails to write highest numbered inode to tape) Keith's posting was presumably based on my earlier posting. I have notified Berkeley that my original posting was in error and I hope a retraction will soon be forthcoming from them as well. Apologies to everyone who has been inconvenienced by my mistake and most especially to Kirk McKusic and Keith Bostic. -- {harvard,mit-eddie,think}!eplunix!das David Allan Steffens 243 Charles St., Boston, MA 02114 Eaton-Peabody Laboratory (617) 573-3748 Mass. Eye & Ear Infirmary
das@eplunix.UUCP (Dave Steffens) (06/23/87)
In article <358@eplunix.UUCP>, I (sheepishly) wrote: > The following bug fix to src/etc/dump/dumptraverse.c > is hereby RETRACTED for 4.2bsd and 4.3bsd. > Do NOT apply this fix -- it is NOT correct! > > The bug *does* exist in Ultrix 1.1 dump. > I'm not sure about later versions of Ultrix. > > This fix was reported in article <8706181839.AA14748@okeeffe.Berkeley.EDU> > to comp.bugs.4bsd.ucb-fixes by bostic@OKEEFFE.BERKELEY.EDU (Keith Bostic): > > Subject: V1.28 (dump fails to write highest numbered inode to tape) > > Keith's posting was presumably based on my earlier posting. > I have notified Berkeley that my original posting was in error > and I hope a retraction will soon be forthcoming from them as well. > > Apologies to everyone who has been inconvenienced by my mistake > and most especially to Kirk McKusic and Keith Bostic. Now I'm going to tell you why the fix is *wrong* and how it happened. The blame is all mine. Credit goes to Ken Lalonde at U of Waterloo for putting me on the right track (see below). I've posted a separate article since I presume not everyone will be interested in the details. Fortunately, tapes made with the modified dump will not be corrupted, at least according to my tests. In some cases there will be a garbage inode written on the end of the tape which will have a bogus ino = maxino+1. Since this inode never appears in a directory, it will never be extracted. However, restore will report "expected next file xxx, got yyy", where xxx is the value of maxino computed by restore from spcl.c_count in the clrmap header and yyy is the bogus inode number. This error is non-fatal. There is more trouble if this garbage inode appears to be a directory. Restore reports "cannot find directory inode yyy", where yyy is again the bogus inode number. This error is fatal. In both cases, restore seems to be otherwise successful because it finishes processing all the real inodes before it hits the bogus one (N.B. directory case only lightly tested). What happened was that I found (and fixed) a *real* bug in the Ultrix 1.1 version of dump. All of my testing was done on this version of dump. The code in the 4.[23]bsd versions is similar (but not identical). Therefore, I *assumed* that the fix to the Ultrix version was directly applicable to the 4.[23]bsd versions -- but unfortunately I didn't *test* my assumptions. Had I tested, I would have immediately seen the restore errors and taken a closer look at my supposed fix. So much for any claims I might have made for following good programming practice. My posting was for 4.3bsd because I assumed Ultrix 1.1 had long-since been consigned to the dustbin of history. Another bad mistake. Then on Saturday I received this note: > From harvard!seismo!watmath!orchid!kwlalonde Sat Jun 20 03:41:16 1987 > Date: Sat, 20 Jun 87 00:21:51 EDT > From: Ken Lalonde <harvard!seismo!watmath!orchid!kwlalonde> > > The for loop in the stock dumptraverse.c looks like > > for (ino = 0; ino < maxino; ) { > ... > ino++ > ... > (*fn)(getino(ino)); > ... > } > > The loop runs from 0..maxino-1, so getino is > called with arguments 1..maxino. That sounds correct to me. > If the loop is changed to run from 0..maxino, the last > time around we call getino(maxino+1). This will eventually > call bread, and bread will notice that it is attempting > to read past the end of the fs. There is a kludge for that case in bread, > so nothing is really read, no error message is produced, and everything > looks okay. So it seems to me that the original code is correct. Since then I have been engaged in extensive testing of all versions of dump, both with and without my bug fix. My strategy was to create a filsys with only 128 inodes on a spare root partition and install one hundred files on it. This is enough to cause the allocation of the higest-numbered inode. Then I ran various combinations of dump and restore, as well as looking at the output of dump with od. Ken is indeed correct, and so is the 4.[23]bsd dump code. The Ultrix version of dump is not (1.1 at least -- I have no more recent version). If my bug fix is installed in the 4.3bsd version of dump, the kludge in bread() Ken mentions will potentially cause a bogus inode = maxino+1 to be written to tape. This occurs because bread() will keep trying to read past the end of the filsys in smaller and smaller pieces until the count goes to zero. Then it will return success and getino() will return a pointer to the first inode in the last inode block read in. If said inode is allocated, a bogus copy of it will be written on the end of the tape with ino = maxino+1. This is most troublesome when the bogus inode appears to be a directory, less so if it appears to be a file. Restore gets confused but apparently still manages to do the right thing since the bogus inode appears *after* all the real ones have been processed. For what it's worth, the Ultrix 1.1 pass() looks like: if (map == NULL) for (ino = 0; ino < maxino; ino++) (*fn)(getino(ino)); else for (ino = 0; ino < maxino; ) { ... ino++; if(bits & 1) { dp = getino(ino); (*fn)(dp); } } The bug is in the first for loop -- "map" is NULL on the first call to pass() which in turn calls mark() with each inode via "fn". Someone at DEC tried to be clever but introduced a bug instead. As written, the first for loop fails to mark the higest-numbered inode on the filsys. Then later calls to pass() don't see said inode because it wasn't marked. This is the bug that caused the original problem I saw and reported. The fix is clearly the one I posted, namely to make the 1st for loop read: for (ino = 0; ino <= maxino; ino++) The second for loop (which is similar in 4.[23]bsd) is correct as is. However, the code is a bit obscure -- the loop appears to count from 0 to maxino-1 but because of the odd placement of "ino++", it actually runs over inodes 1 thru maxino. I guess I didn't really look at the inards of this loop very carefully since I was so confident that I understood the problem and had a *working* fix. I know this bit of obscurantism was what misled me; it seems that it misled someone at DEC also. Kudos to Ken Lalonde for catching it! Sackcloth and ashes for me. I will repeat a thousand times... test it! test it!! test it!!! I really am sorry about all this. Don't quite know what else to say. Guess I shouldn't be playing unix guru if I can't really cut it. -- {harvard,mit-eddie,think}!eplunix!das David Allan Steffens 243 Charles St., Boston, MA 02114 Eaton-Peabody Laboratory (617) 573-3748 Mass. Eye & Ear Infirmary