cyrus@pprg.unm.edu (Tait Cyrus) (12/19/88)
In the process of trying to get the 4.3 Tahoe dump running on a Sun 3
running SunOS 3.X, I, along with others, have run into the following
bug (feature) (shown below).

>Writing dump file 0 (/research)
>  DUMP: Date of this level 1 dump: Sat Dec 17 12:59:10 1988
>  DUMP: Date of last level 0 dump: Wed Dec 14 19:08:42 1988
>  DUMP: Dumping /dev/rxy1g (/research) to /dev/rmt1h on host houdini
>  DUMP: mapping (Pass I) [regular files]
>  DUMP: mapping (Pass II) [directories]
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 58766]: count=24, got=512
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 60802]: count=536, got=1024
>	.
>	.
>	.
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 372316]: count=1040, got=1536
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 378344]: count=24, got=512
>  DUMP: More than 32 block read errors from 152660
>  DUMP: This is an unrecoverable error.
>  DUMP: NEEDS ATTENTION: Do you want to attempt to continue?: ("yes" or "no") no
>  DUMP: The ENTIRE dump is aborted.

This error is produced in dumptraverse.c routine bread.  I am having a
difficult time trying to figure out what the heck this routine is
"supposed" to be doing.  I say there are several bugs in this routine
and that it should look something like the following:

bread(da, ba, cnt)
	daddr_t da;
	char *ba;
	int cnt;
{
	int n;

	if (lseek(fi, (long)(da * dev_bsize), 0) < 0) {
		msg("bread: lseek fails\n");
	}
	while (cnt) {
		n = read(fi, ba, cnt);
		if (n == 0) {
			msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n",
			    disk, da, cnt, n);
			broadcast("DUMP IS AILING!\n");
			msg("This is an unrecoverable error.\n");
			if (!query("Do you want to attempt to continue?")) {
				dumpabort();
				/*NOTREACHED*/
			}
		}
		cnt -= n;
		ba += n;
	}
}

It currently looks like:

bread(da, ba, cnt)
	daddr_t da;
	char *ba;
	int cnt;
{
	int n;

loop:
	if (lseek(fi, (long)(da * dev_bsize), 0) < 0) {
		msg("bread: lseek fails\n");
	}
	n = read(fi, ba, cnt);
	if (n == cnt)
		return;
	if (da + (cnt / dev_bsize) > fsbtodb(sblock, sblock->fs_size)) {
		/*
		 * Trying to read the final fragment.
		 *
		 * NB - dump only works in TP_BSIZE blocks, hence
		 * rounds `dev_bsize' fragments up to TP_BSIZE pieces.
		 * It should be smarter about not actually trying to
		 * read more than it can get, but for the time being
		 * we punt and scale back the read only when it gets
		 * us into trouble. (mkm 9/25/83)
		 */
		cnt -= dev_bsize;
		goto loop;
	}
	msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n",
	    disk, da, cnt, n);
	if (++breaderrors > BREADEMAX) {
		msg("More than %d block read errors from %d\n",
		    BREADEMAX, disk);
		broadcast("DUMP IS AILING!\n");
		msg("This is an unrecoverable error.\n");
		if (!query("Do you want to attempt to continue?")) {
			dumpabort();
			/*NOTREACHED*/
		} else
			breaderrors = 0;
	}
}

Am I misinterpreting what this routine is supposed to be doing?  Will
my code work?  If not, why?

Thanks
---
W. Tait Cyrus				(505) 277-0806
e-mail: cyrus@pprg.unm.edu
University of New Mexico
Dept of ECE - Parallel Processing Research Group
Albuquerque, New Mexico  87131
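[ Aside: below is a minimal standalone sketch of the retry loop the
  rewrite above seems to be aiming for.  It is not from the dump
  sources, and the name "readfully" is made up for illustration.  One
  thing worth noting about the posted rewrite: it tests n == 0 but
  never n < 0, so a genuine I/O error (read() returning -1) would make
  "cnt -= n" *grow* cnt and the loop would never terminate. ]

int
readfully(fd, buf, cnt)
	int fd;
	char *buf;
	int cnt;
{
	int n;

	while (cnt > 0) {
		n = read(fd, buf, cnt);
		if (n < 0)
			return (-1);	/* real I/O error */
		if (n == 0)
			break;		/* premature EOF; don't spin */
		buf += n;		/* consume the partial read */
		cnt -= n;
	}
	return (cnt);	/* 0 on success, bytes still missing otherwise */
}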
parmelee@wayback.cs.cornell.edu (Larry Parmelee) (12/19/88)
In article <23685@pprg.unm.edu> cyrus@pprg.unm.edu (Tait Cyrus) writes:
>
> In the process of trying to get the 4.3 Tahoe dump running on a Sun 3
> running SunOS 3.X, I, along with others, have run into the following
> bug (feature) (shown below).
> > DUMP: mapping (Pass II) [directories]
> > DUMP: (This should not happen)bread from /dev/rxy1g [block 58766]: count=24, got=512
> > DUMP: (This should not happen)bread from /dev/rxy1g [block 60802]: count=536, got=1024

[ Lots of "bread" errors on pass 2. ]

Under 4.2 (on which Sun 3.x is based) directories were allowed to be
any old size, whatever was convenient.  Under 4.3, as an efficiency
hack, directories were forced to be multiples of 512 bytes.  The 4.3
dump expects this, and will generate errors like you're seeing when
confronted with an old 4.2 filesystem containing random-length
directories.  (4.3 fsck will extend any short directories it finds,
and thereafter the kernel will maintain the convention.)

Below is my fix, in file dumptraverse.c, routine dsrch().  I think this
was all I had to change to get the 4.3 dump to work under Sun 3.5.

But there is one other slight problem: if you do your dumps on 9-track
6250 bpi tape, the 4.3 dump likes to block the tape records at
32k/block.  The Sun restore program is only expecting 10K blocks.
(Other than that, the Sun restore seems to work fine with a 4.3 dump
tape.)  The solution is to change the definition of HIGHDENSITYTREC in
/usr/include/dumprestore.h to be 10, OR resign yourself to doing
restores with "dd if=/dev/rmt8 ibs=32k obs=10k | restore f -", OR fix
restore.

NOTE: I don't have 4.3-Tahoe, so there's a little uncertainty here.
Also, I don't know if this is needed with SunOS 4.X, or if maybe Sun
4.X is now following the same directory size conventions.

	-Larry Parmelee
	parmelee@cs.cornell.edu

*** /tmp/,RCSt1a02738	Mon Dec 19 07:46:46 1988
--- dumptraverse.c	Tue Sep  1 09:38:18 1987
***************
*** 261,267 ****
--- 261,279 ----
  		return;
  	if (filesize > size)
  		filesize = size;
+ #ifndef sun
  	bread(fsbtodb(sblock, d), dblk, filesize);
+ #else /* sun */
+ 	/* Round up filesize to be a multiple of DEV_BSIZE. */
+ #if (DEV_BSIZE != 0) && ((DEV_BSIZE & (DEV_BSIZE - 1)) == 0)
+ 	/* If DEV_BSIZE is a power of two: */
+ 	bread(fsbtodb(sblock, d), dblk,
+ 		((filesize+DEV_BSIZE-1) & ~(DEV_BSIZE-1)));
+ #else
+ 	bread(fsbtodb(sblock, d), dblk,
+ 		((filesize+DEV_BSIZE-1)/DEV_BSIZE)*DEV_BSIZE);
+ #endif
+ #endif /* sun */
  	for (loc = 0; loc < filesize; ) {
  		dp = (struct direct *)(dblk + loc);
  		if (dp->d_reclen == 0) {
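[ Aside: a throwaway check of the two round-up forms in the patch
  above, fed the very counts from the error messages.  Illustrative
  only; for DEV_BSIZE == 512 the two forms must agree:
  24 -> 512, 536 -> 1024, 1040 -> 1536. ]

#include <stdio.h>

#define DEV_BSIZE	512

main()
{
	static int sizes[] = { 24, 512, 536, 1040 };
	int i, n, mask, divide;

	for (i = 0; i < 4; i++) {
		n = sizes[i];
		/* power-of-two form: mask off the low-order bits */
		mask = (n + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
		/* general form: integer divide, then multiply back */
		divide = ((n + DEV_BSIZE - 1) / DEV_BSIZE) * DEV_BSIZE;
		printf("%4d -> mask %4d, divide %4d\n", n, mask, divide);
	}
	exit(0);
}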
kent@tfd.UUCP (Kent Hauser) (12/19/88)
In article <23685@pprg.unm.edu>, cyrus@pprg.unm.edu (Tait Cyrus) writes:
>
> In the process of trying to get the 4.3 Tahoe dump running on a Sun 3
> running SunOS 3.X, I, along with others, have run into the following
> bug (feature) (shown below).
>
> >Writing dump file 0 (/research)
> > DUMP: Date of this level 1 dump: Sat Dec 17 12:59:10 1988
> > DUMP: Date of last level 0 dump: Wed Dec 14 19:08:42 1988
> > DUMP: Dumping /dev/rxy1g (/research) to /dev/rmt1h on host houdini
> > DUMP: mapping (Pass I) [regular files]
> > DUMP: mapping (Pass II) [directories]
> > DUMP: (This should not happen)bread from /dev/rxy1g [block 58766]: count=24, got=512
> > DUMP: (This should not happen)bread from /dev/rxy1g [block 60802]: count=536, got=1024
> > .
> > .
> > .
> ---

The problem is that some raw device drivers cannot read a partial
block.  See, for instance, the SunOS sd(4) man page.  I suspect that
this problem is inherent with SCSI devices.  The 'fix' is to make dump
always read the complete frag when reading the directories.  Attached
is my fix:

*** dumptraverse.c~	Fri Nov 18 12:22:18 1988
--- dumptraverse.c	Sat Dec 10 12:21:49 1988
***************
*** 265,271 ****
  		return;
  	if (filesize > size)
  		filesize = size;
! 	bread(fsbtodb(sblock, d), dblk, filesize);
  	for (loc = 0; loc < filesize; ) {
  		dp = (struct direct *)(dblk + loc);
  		if (dp->d_reclen == 0) {
--- 265,272 ----
  		return;
  	if (filesize > size)
  		filesize = size;
! 	/* change the third bread arg from filesize to size to read whole frag */
! 	bread(fsbtodb(sblock, d), dblk, size);
  	for (loc = 0; loc < filesize; ) {
  		dp = (struct direct *)(dblk + loc);
  		if (dp->d_reclen == 0) {

==============================
> W. Tait Cyrus	(505) 277-0806	e-mail: cyrus@pprg.unm.edu
-- 
Kent Hauser			UUCP: sun!sundc!tfd!kent
Twenty-First Designs		INET: kent%tfd.uucp@sundc.sun.com
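[ Aside: the partial-block claim is easy to test for yourself.  A
  rough sketch, not part of the fix -- the device name is only an
  example, and the buffer is deliberately a full block in case the
  driver transfers DEV_BSIZE bytes no matter how few you ask for. ]

#include <stdio.h>

main()
{
	char buf[512];		/* a whole block's worth of room, in case
				 * the driver rounds the count up on us */
	int fd, n;

	fd = open("/dev/rxy1g", 0);	/* 0 == read only; example device */
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	n = read(fd, buf, 24);		/* ask for a partial block */
	printf("asked for 24, got %d\n", n);
	exit(0);
}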
tadguy@cs.odu.edu (Tad Guy) (12/19/88)
In article <23642@cornell.UUCP>, parmelee@wayback (Larry Parmelee) writes:
> [At 6250bpi] the 4.3 dump likes to block the tape records at 32k/block.
> The sun restore program is only expecting 10K blocks.
> ...
> OR resign yourself to doing restores with
>	"dd if=/dev/rmt8 ibs=32k obs=10k | restore f -", OR
> fix restore.

Or use the ``b'' key to restore to specify a different blocksize.  No
need to fix restore (well, at least not for block sizes), and this is
probably faster than using dd:

	restore bf 32 -

	...tad
-- 
Tad Guy <tadguy@cs.odu.edu>	Old Dominion University, Norfolk, VA
cudcv@warwick.ac.uk (Rob McMahon) (12/20/88)
In article <23642@cornell.UUCP> parmelee@wayback.cs.cornell.edu (Larry Parmelee) writes:
>In article <23685@pprg.unm.edu> cyrus@pprg.unm.edu (Tait Cyrus) writes:
>> trying to get the 4.3 Tahoe dump running on a Sun 3 running SunOS 3.X, I
>> ... have run into the following bug (feature) (shown below).
>
>> > DUMP: mapping (Pass II) [directories]
>> > DUMP: (This should not happen)bread from /dev/rxy1g [block 58766]: count=24, got=512
>> > DUMP: (This should not happen)bread from /dev/rxy1g [block 60802]: count=536, got=1024
>
>Under 4.2 (on which Sun 3.x is based) directories were allowed to be any old
>size, whatever was convenient.  Under 4.3, as an efficiency hack,
>directories were forced to be multiples of 512 bytes.

I believe this was actually a bug in 4.2.  When directories first got
created they were created short (24 bytes).  After the first file was
added they were extended to 512 bytes, and thereafter stayed a multiple
of 512.  (Maybe someone will correct me on this; my 4.2 memory is
beginning to fade, and it doesn't quite explain 536 bytes, which is
suspiciously equal to 512 + 24 ...)

>The 4.3 dump expects this, and will generate errors like you're seeing when
>confronted with an old 4.2 filesystem containing random length directories.

I believe the real problem is that SunOS rounds requests for reads from
a raw device *up* to a multiple of 512 bytes.  Note that dump has asked
for 24 bytes but got 512.  I don't know whether it actually puts all
this data in your buffer.  If it does, it's horrible because it could
overrun a buffer whose size you had accurately given.  If it doesn't,
it's horrible because it's lying about how many bytes were read into
your buffer.  IMHO it should behave like a tape drive: return the
amount asked for and junk the rest of the block.  Or maybe it would be
better to round down for requests greater than the blocksize?

>Below is my fix, in file dumptraverse.c, routine dsrch().

I believe your fix, but not your explanation.
-- 
UUCP:   ...!mcvax!ukc!warwick!cudcv	PHONE:  +44 203 523037
JANET:  cudcv@uk.ac.warwick		ARPA:   cudcv@warwick.ac.uk
Rob McMahon, Computing Services, Warwick University, Coventry CV4 7AL, England
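[ Aside: a sketch of the "behave like a tape drive" semantics done in
  user code instead of the driver -- a hypothetical helper, not from
  dump: read a whole block into a private buffer and copy out only the
  bytes asked for, junking the rest.  This avoids both the overrun and
  the lying-count problems for short requests. ]

#define DEV_BSIZE	512

int
bread_short(fd, ba, cnt)
	int fd;
	char *ba;
	int cnt;			/* assumes cnt <= DEV_BSIZE */
{
	char blk[DEV_BSIZE];
	int n;

	n = read(fd, blk, DEV_BSIZE);	/* driver-sized request */
	if (n < cnt)
		return (n);		/* error or short device */
	bcopy(blk, ba, cnt);		/* hand back only what was asked */
	return (cnt);
}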
dave@acorn.co.uk (Dave Lamkin) (12/21/88)
BSD 4.3 file systems ensured that a directory length was a multiple of
DEV_BSIZE; this is not the case on NFS 3.2 versions of BSD.  This
causes problems with fsck and dump.  fsck will find (and hence fix)
"errors" with directory lengths from time to time.  dump, when it
encounters a directory which is not a multiple of 512 bytes long, will
fail to read it, with the errors as described.

The fix is to round the request length passed to the bread routine up
to a multiple of DEV_BSIZE.  This is done in the routine dsrch in
dumptraverse.c, as below.  A workaround is to fsck the disc prior to
dumping.

------------------------------------------------------------------------------

dsrch(d, size, filesize)
	daddr_t d;
	int size, filesize;
{
	register struct direct *dp;
	long loc;
	char dblk[MAXBSIZE];

	if (dadded)
		return;
	if (filesize > size)
		filesize = size;
	/* +++++++++++++ Fix start ++++++++++
	 * Extend the length of the directory.
	 * NFS 3.2 no longer ensures multiple of DEV_BSIZE
	 */
	filesize = (filesize + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
	/* ------------- Fix end ----------- */
	bread(fsbtodb(sblock, d), dblk, filesize);
	for (loc = 0; loc < filesize; ) {
		dp = (struct direct *)(dblk + loc);
		if (dp->d_reclen == 0) {
			msg("corrupted directory, inumber %d\n", ino);
			break;
		}
		loc += dp->d_reclen;
		if (dp->d_ino == 0)
			continue;
		if (dp->d_name[0] == '.') {
			if (dp->d_name[1] == '\0')
				continue;
			if (dp->d_name[1] == '.' && dp->d_name[2] == '\0')
				continue;
		}
		if (BIT(dp->d_ino, nodmap)) {
			dadded++;
			return;
		}
		if (BIT(dp->d_ino, dirmap))
			nsubdir++;
	}
}
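[ Aside: one small caution about the fix above.  The masking form of
  the round-up quietly assumes DEV_BSIZE is a power of two (true on
  the systems discussed here); Parmelee's patch earlier in the thread
  guards that assumption with a preprocessor test.  The division form
  is the portable equivalent: ]

	/* portable round-up; no power-of-two assumption */
	filesize = ((filesize + DEV_BSIZE - 1) / DEV_BSIZE) * DEV_BSIZE;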