[comp.bugs.4bsd] 4.3 Tahoe dump bug

cyrus@pprg.unm.edu (Tait Cyrus) (12/19/88)

In the process of trying to get the 4.3 Tahoe dump running on a Sun 3
running SunOS 3.X, I, along with others, have run into the following
bug (feature) (shown below).

>Writing dump file 0 (/research)
>  DUMP: Date of this level 1 dump: Sat Dec 17 12:59:10 1988
>  DUMP: Date of last level 0 dump: Wed Dec 14 19:08:42 1988
>  DUMP: Dumping /dev/rxy1g (/research) to /dev/rmt1h on host houdini
>  DUMP: mapping (Pass I) [regular files]
>  DUMP: mapping (Pass II) [directories]
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 58766]: count=24, got=512
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 60802]: count=536, got=1024
>				.
>				.
>				.
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 372316]: count=1040, got=1536
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 378344]: count=24, got=512
>  DUMP: More than 32 block read errors from 152660
>  DUMP: This is an unrecoverable error.
>  DUMP: NEEDS ATTENTION: Do you want to attempt to continue?: ("yes" or "no") no
>  DUMP: The ENTIRE dump is aborted.

This error is produced by the bread routine in dumptraverse.c.  I am
having a difficult time trying to figure out what the heck this routine
is "supposed" to be doing.  I think there are several bugs in this
routine and that it should look something like the following:

bread(da, ba, cnt)
        daddr_t da;
        char *ba;
        int     cnt;
{
        int n;
	if (lseek(fi, (long)(da * dev_bsize), 0) < 0){
		msg("bread: lseek fails\n");
	}
	while( cnt ) {
	   n = read(fi, ba, cnt);
	   if( n == 0 ) {
	      msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n",
		disk, da, cnt, n);
	      broadcast("DUMP IS AILING!\n");
	      msg("This is an unrecoverable error.\n");
	      if (!query("Do you want to attempt to continue?")){
		 dumpabort();
		 /*NOTREACHED*/
	         }
	      }
	   cnt -= n;
	   ba += n;
	   }
}

It currently looks like:

bread(da, ba, cnt)
        daddr_t da;
        char *ba;
        int     cnt;
{
        int n;

loop:
        if (lseek(fi, (long)(da * dev_bsize), 0) < 0){
                msg("bread: lseek fails\n");
        }
        n = read(fi, ba, cnt);
        if (n == cnt)
                return;
        if (da + (cnt / dev_bsize) > fsbtodb(sblock, sblock->fs_size)) {
                /*
                 * Trying to read the final fragment.
                 *
                 * NB - dump only works in TP_BSIZE blocks, hence
                 * rounds `dev_bsize' fragments up to TP_BSIZE pieces.
                 * It should be smarter about not actually trying to
                 * read more than it can get, but for the time being
                 * we punt and scale back the read only when it gets
                 * us into trouble. (mkm 9/25/83)
                 */
                cnt -= dev_bsize;
                goto loop;
        }
	msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n",
		disk, da, cnt, n);
	if (++breaderrors > BREADEMAX){
		msg("More than %d block read errors from %d\n",
			BREADEMAX, disk);
		broadcast("DUMP IS AILING!\n");
		msg("This is an unrecoverable error.\n");
		if (!query("Do you want to attempt to continue?")){
			dumpabort();
			/*NOTREACHED*/
		} else
			breaderrors = 0;
	}
}

Am I misinterpreting what this routine is supposed to be doing?
Will my code work?  If not, why?

Thanks

---
W. Tait Cyrus   (505) 277-0806		e-mail: cyrus@pprg.unm.edu
University of New Mexico			
Dept of ECE - Parallel Processing Research Group
Albuquerque, New Mexico 87131

parmelee@wayback.cs.cornell.edu (Larry Parmelee) (12/19/88)

In article <23685@pprg.unm.edu> cyrus@pprg.unm.edu (Tait Cyrus) writes:
> 
> In the process of trying to get the 4.3 Tahoe dump running on a Sun 3
> running SunOS 3.X, I, along with others, have run into the following
> bug (feature) (shown below).

> >  DUMP: mapping (Pass II) [directories]
> >  DUMP: (This should not happen)bread from /dev/rxy1g [block 58766]: count=24, got=512
> >  DUMP: (This should not happen)bread from /dev/rxy1g [block 60802]: count=536, got=1024

[ Lots of "bread" errors on pass 2. ]

Under 4.2 (on which Sun 3.x is based) directories were allowed to
be any old size, whatever was convenient.  Under 4.3, as an efficiency
hack, directories were forced to be multiples of 512 bytes.  The 4.3
dump expects this, and will generate errors like you're seeing when
confronted with an old 4.2 filesystem containing random length directories.
(4.3 fsck will extend any short directories it finds, and thereafter the
kernel will maintain the convention.)

Below is my fix, in file dumptraverse.c, routine dsrch().  I think
this was all I had to change to get the 4.3 dump to work under sun3.5.
But there is one other slight problem:  if you do your dumps on
9-track 6250 bpi tape, the 4.3 dump likes to block the tape records
at 32K/block.  The Sun restore program is only expecting 10K blocks.
(Other than that, the Sun restore seems to work fine with a 4.3
dump tape.)  The solution is to change the definition of HIGHDENSITYTREC
in /usr/include/dumprestore.h to be 10, OR resign yourself to doing
restores with "dd if=/dev/rmt8 ibs=32k obs=10k | restore f -", OR
fix restore.

NOTE:  I don't have 4.3-Tahoe, so there's a little uncertainty here.
Also, I don't know if this is needed with SunOS 4.X, or if maybe
sun 4.X is now following the same directory size conventions.

-Larry Parmelee
parmelee@cs.cornell.edu

*** /tmp/,RCSt1a02738   Mon Dec 19 07:46:46 1988
--- dumptraverse.c      Tue Sep  1 09:38:18 1987
***************
*** 261,267 ****
--- 261,279 ----
                return;
        if (filesize > size)
                filesize = size;
+ #ifndef sun
        bread(fsbtodb(sblock, d), dblk, filesize);
+ #else /* sun */
+       /* Round up filesize to be a multiple of DEV_BSIZE. */
+ #if (DEV_BSIZE != 0) && ((DEV_BSIZE & (DEV_BSIZE - 1)) == 0)
+       /* If DEV_BSIZE is a power of two: */
+       bread(fsbtodb(sblock, d), dblk,
+                       ((filesize+DEV_BSIZE-1) & ~(DEV_BSIZE-1)));
+ #else
+       bread(fsbtodb(sblock, d), dblk,
+                       ((filesize+DEV_BSIZE-1)/DEV_BSIZE)*DEV_BSIZE);
+ #endif
+ #endif        /* sun */
        for (loc = 0; loc < filesize; ) {
                dp = (struct direct *)(dblk + loc);
                if (dp->d_reclen == 0) {

kent@tfd.UUCP (Kent Hauser) (12/19/88)

In article <23685@pprg.unm.edu>, cyrus@pprg.unm.edu (Tait Cyrus) writes:
> 
> In the process of trying to get the 4.3 Tahoe dump running on a Sun 3
> running SunOS 3.X, I, along with others, have run into the following
> bug (feature) (shown below).
> 
> >Writing dump file 0 (/research)
> >  DUMP: Date of this level 1 dump: Sat Dec 17 12:59:10 1988
> >  DUMP: Date of last level 0 dump: Wed Dec 14 19:08:42 1988
> >  DUMP: Dumping /dev/rxy1g (/research) to /dev/rmt1h on host houdini
> >  DUMP: mapping (Pass I) [regular files]
> >  DUMP: mapping (Pass II) [directories]
> >  DUMP: (This should not happen)bread from /dev/rxy1g [block 58766]: count=24, got=512
> >  DUMP: (This should not happen)bread from /dev/rxy1g [block 60802]: count=536, got=1024
> >				.
> >				.
> >				.
> ---

The problem is that some raw device drivers cannot read a partial block.
See for instance the SunOS sd(4) man page.  I suspect that this problem
is inherent in SCSI devices.

The 'fix' is to make dump always read the complete frag when
reading the directories.  Attached is my fix:

*** dumptraverse.c~	Fri Nov 18 12:22:18 1988
--- dumptraverse.c	Sat Dec 10 12:21:49 1988
***************
*** 265,271 ****
  		return;
  	if (filesize > size)
  		filesize = size;
! 	bread(fsbtodb(sblock, d), dblk, filesize);
  	for (loc = 0; loc < filesize; ) {
  		dp = (struct direct *)(dblk + loc);
  		if (dp->d_reclen == 0) {
--- 265,272 ----
  		return;
  	if (filesize > size)
  		filesize = size;
! /* change the third bread arg from filesize to size to read whole frag */
! 	bread(fsbtodb(sblock, d), dblk, size);
  	for (loc = 0; loc < filesize; ) {
  		dp = (struct direct *)(dblk + loc);
  		if (dp->d_reclen == 0) {

==============================

> W. Tait Cyrus   (505) 277-0806		e-mail: cyrus@pprg.unm.edu


-- 
Kent Hauser			UUCP: sun!sundc!tfd!kent
Twenty-First Designs		INET: kent%tfd.uucp@sundc.sun.com

tadguy@cs.odu.edu (Tad Guy) (12/19/88)

In article <23642@cornell.UUCP>, parmelee@wayback (Larry Parmelee) writes:
>[At 6250bpi] the 4.3 dump likes to block the tape records at 32k/block.
>The sun restore program is only expecting 10K blocks.
>...
>OR resign yourself to doing restores with 
>	"dd if=/dev/rmt8 ibs=32k obs=10k | restore f -", OR
>fix restore.

Or use the ``b'' key to restore to specify a different blocksize.  No
need to fix restore (well, at least not for block sizes), and this is
probably faster than using dd:  restore bf 32 -

	...tad

-- 
Tad Guy         <tadguy@cs.odu.edu>     Old Dominion University, Norfolk, VA

cudcv@warwick.ac.uk (Rob McMahon) (12/20/88)

In article <23642@cornell.UUCP> parmelee@wayback.cs.cornell.edu (Larry Parmelee) writes:
>In article <23685@pprg.unm.edu> cyrus@pprg.unm.edu (Tait Cyrus) writes:

>> trying to get the 4.3 Tahoe dump running on a Sun 3 running SunOS 3.X, I
>> ... have run into the following bug (feature) (shown below).
>
>> >  DUMP: mapping (Pass II) [directories]
>> >  DUMP: (This should not happen)bread from /dev/rxy1g [block 58766]: count=24, got=512
>> >  DUMP: (This should not happen)bread from /dev/rxy1g [block 60802]: count=536, got=1024
>
>Under 4.2 (on which Sun 3.x is based) directories were allowed to be any old
>size, whatever was convenient.  Under 4.3, as an efficiency hack,
>directories were forced to be multiples of 512 bytes.

I believe this was actually a bug in 4.2.  When directories first got created
they were created short (24 bytes).  After adding the first file they were
extended to 512 bytes, and thereafter stayed a multiple of 512.  (Maybe
someone will correct me on this, my 4.2 memory is beginning to fade, and it
doesn't quite explain 536 bytes, which is suspiciously equal to 512 + 24 ...)

>The 4.3 dump expects this, and will generate errors like you're seeing when
>confronted with an old 4.2 filesystem containing random length directories.

I believe the real problem is that SunOS rounds requests for reads from a raw
device *up* to a multiple of 512 bytes.  Note that dump has asked for 24
bytes but got 512.  I don't know whether it actually puts all this data in
your buffer.  If it does it's horrible because it could overrun a buffer,
whose size you had accurately given.  If it doesn't it's horrible because it's
lying about how many bytes were read into your buffer.  IMHO it should behave
like a tape drive - return the amount asked for and junk the rest of the
block.  Or maybe it would be better to round down for requests greater than
the blocksize?

>Below is my fix, in file dumptraverse.c, routine dsrch().

I believe your fix, but not your explanation.
-- 
UUCP:   ...!mcvax!ukc!warwick!cudcv	PHONE:  +44 203 523037
JANET:  cudcv@uk.ac.warwick             ARPA:   cudcv@warwick.ac.uk
Rob McMahon, Computing Services, Warwick University, Coventry CV4 7AL, England

dave@acorn.co.uk (Dave Lamkin) (12/21/88)

BSD 4.3 file systems ensured that a directory length was a multiple of
DEV_BSIZE; this is not the case on NFS 3.2 versions of BSD.  This causes
problems with fsck and dump.

fsck will find (and hence fix) "errors" with directory lengths from time to
time.

When dump encounters a directory which is not a multiple of 512 bytes long,
it will fail to read it, with the errors as described.  The fix is to round
up the request length passed to the bread routine to a multiple of
DEV_BSIZE.  This is done in the routine dsrch in dumptraverse.c, as below.
A workaround is to fsck the disc prior to dumping.

------------------------------------------------------------------------------
dsrch(d, size, filesize)
	daddr_t d;
	int size, filesize;
{
	register struct direct *dp;
	long loc;
	char dblk[MAXBSIZE];

	if(dadded)
		return;
	if (filesize > size)
		filesize = size;
/* +++++++++++++ Fix start ++++++++++
 * Extend the length of the directory.
 * NFS 3.2 no longer ensures multiple of DEV_BSIZE
 */
	filesize = (filesize + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
/* ------------- Fix end ----------- */

	bread(fsbtodb(sblock, d), dblk, filesize);
	for (loc = 0; loc < filesize; ) {
		dp = (struct direct *)(dblk + loc);
		if (dp->d_reclen == 0) {
			msg("corrupted directory, inumber %d\n", ino);
			break;
		}
		loc += dp->d_reclen;
		if(dp->d_ino == 0)
			continue;
		if(dp->d_name[0] == '.') {
			if(dp->d_name[1] == '\0')
				continue;
			if(dp->d_name[1] == '.' && dp->d_name[2] == '\0')
				continue;
		}
		if(BIT(dp->d_ino, nodmap)) {
			dadded++;
			return;
		}
		if(BIT(dp->d_ino, dirmap))
			nsubdir++;
	}
}