davet@RAND-UNIX.ARPA (Dave Truesdell) (09/27/85)
Index: dump/dumptraverse.c--etc Description: Dump would hang the system while doing incremental dumps (levels > 0), during pass II. When doing dumps from a raw device, dump does not force all reads to be done in multiples of DEV_BSIZE byte chunks. In most cases drivers seem to handle this correctly, but one obscure case has caused one of our systems (a VAX 11/785 running 4.2BSD) to hang. Repeat-By: Arrange for a raw read (size != n*512) of a directory to fail. In our case, a empty directory (containing ".", and "..") occupied a block which was forwarded by the HP driver. When dump attempted to read the directory entry ( 24 bytes long ), the system hung. Fix: Force bread to do all reads in multiples of DEV_BSIZE byte blocks. However, for efficiency, I have added a seperate version of bread called raw_bread, that is used in pass II. RCS file: RCS/dumptraverse.c,v retrieving revision 1.3 retrieving revision 1.4 diff -c3 -r1.3 -r1.4 *** /tmp/,RCSt1018693 Tue Sep 24 12:40:03 1985 --- /tmp/,RCSt2018693 Tue Sep 24 12:40:05 1985 *************** *** 265,271 return; if (filesize > size) filesize = size; ! bread(fsbtodb(sblock, d), dblk, filesize); for (loc = 0; loc < filesize; ) { dp = (struct direct *)(dblk + loc); if (dp->d_reclen == 0) { --- 269,275 ----- return; if (filesize > size) filesize = size; ! raw_bread(fsbtodb(sblock, d), dblk, filesize); for (loc = 0; loc < filesize; ) { dp = (struct direct *)(dblk + loc); if (dp->d_reclen == 0) { *************** *** 338,343 goto loop; } msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n", disk, da, cnt, n); if (++breaderrors > BREADEMAX){ msg("More than %d block read errors from %d\n", --- 342,400 ----- goto loop; } msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n", + disk, da, cnt, n); + if (++breaderrors > BREADEMAX){ + msg("More than %d block read errors from %d\n", + BREADEMAX, disk); + broadcast("DUMP IS AILING!\n"); + msg("This is an unrecoverable error.\n"); + if (!query("Do you want to attempt to continue?")){ + dumpabort(); + /*NOTREACHED*/ + } else + breaderrors = 0; + } + } + + /* + * This version of bread forces reads to be done in multiples of DEV_BSIZE + * (normally 512) bytes. This is required for proper operation on raw I/O + * devices. The lack of enforcement, has caused one of our systems to hang + * during incremental dumps. + */ + raw_bread(da, ba, cnt) + daddr_t da; + char *ba; + int cnt; + { + int n; + int size; + + size = cnt; + /* + * Note: This implicitly assumes that DEV_BSIZE is a power of 2. + * If, for whatever reason, this is not true, an alternate + * function is incuded, but ifdef'd out. + */ + if ((size & (DEV_BSIZE - 1)) != 0) + size = (size + (DEV_BSIZE - 1)) &~ (DEV_BSIZE - 1); + #ifdef not_def + if ((size % DEV_BSIZE) != 0) + size += DEV_BSIZE - (size % DEV_BSIZE); + #endif + loop: + if (lseek(fi, (long)(da * DEV_BSIZE), 0) < 0){ + msg("raw_bread: lseek fails\n"); + } + n = read(fi, ba, size); + if (n == size || n >= cnt) + return; + if (da + (size / DEV_BSIZE) > fsbtodb(sblock, sblock->fs_size)) { + size -= DEV_BSIZE; + cnt -= DEV_BSIZE; + goto loop; + } + msg("(This should not happen)raw_bread from %s [block %d]: count=%d, got=%d\n", disk, da, cnt, n); if (++breaderrors > BREADEMAX){ msg("More than %d block read errors from %d\n", -- David A. Truesdell Programmer/Analyst The RAND Corporation 1700 Main Street P.O. Box 2138 CSD/1 Santa Monica, CA 90406-2138 Phone: (213) 393-0411 ext. 7173 ARPAnet: davet@rand-unix UUCP/usenet: cbosg ---- decvax ---| fortune --| harpo ----|- randvax!davet trw-unix -| ucla-cs --| vortex ---
dave@onfcanim.UUCP (Dave Martindale) (09/30/85)
In article <1763@brl-tgr.ARPA> davet@RAND-UNIX.ARPA (Dave Truesdell) writes: >Index: dump/dumptraverse.c--etc > >Description: > Dump would hang the system while doing incremental dumps (levels > 0), > during pass II. > > When doing dumps from a raw device, dump does not force all reads > to be done in multiples of DEV_BSIZE byte chunks. In most cases > drivers seem to handle this correctly, but one obscure case has > caused one of our systems (a VAX 11/785 running 4.2BSD) to hang. > > >Repeat-By: > Arrange for a raw read (size != n*512) of a directory to fail. > > In our case, a empty directory (containing ".", and "..") occupied > a block which was forwarded by the HP driver. When dump attempted to > read the directory entry ( 24 bytes long ), the system hung. > >Fix: > > Force bread to do all reads in multiples of DEV_BSIZE byte blocks. > However, for efficiency, I have added a seperate version of bread > called raw_bread, that is used in pass II. But why fix dump when the problem is almost certainly in the disk driver? Looking at the code in hp.c, we see that when a bad block is forwarded, the replacement block is always read in its entirety, even if you asked for less data, and thus the Massbus adapter has been set up to correctly map only the lesser amount of data. The fix would seem to be easy, just make the following change to the BSE case of the switch in hpecc(): *** /tmp/hp.c Mon Sep 30 14:57:02 1985 --- hp.c Mon Sep 30 15:05:32 1985 *************** *** 1024,1030 sn = bn%st->nspc; tn = sn/st->nsect; sn %= st->nsect; ! mbp->mba_bcr = -512; rp->hpof &= ~HPOF_SSEI; #ifdef HPBDEBUG if (hpbdebug) --- 1024,1030 ----- sn = bn%st->nspc; tn = sn/st->nsect; sn %= st->nsect; ! mbp->mba_bcr = -MIN(512, bp->b_bcount-(int)ptob(npf)); rp->hpof &= ~HPOF_SSEI; #ifdef HPBDEBUG if (hpbdebug) Now, I don't have a filesystem that has a directory in a bad block (as far as I know), so I can't test this under the same conditions. But the old code is clearly wrong. Why not fix the hp driver and then try the original version of dump and see if everything works as it should? By the way, exactly the same bug appears in the "up" Unibus disk driver too.
chris@umcp-cs.UUCP (Chris Torek) (10/02/85)
> But why fix dump when the problem is almost certainly in the disk driver? Exactly. Your fix is correct, but possibly incomplete. Here is the other change from Berkeley for the ecc code: *** 911,916 #endif bp->b_flags &= ~B_BAD; ! mbp->mba_bcr = -(bp->b_bcount - (int)ptob(npf)); ! if (MASKREG(mbp->mba_bcr) == 0) return (0); break; --- 927,931 ----- #endif bp->b_flags &= ~B_BAD; ! if ((int)ptob(npf) >= bp->b_bcount) return (0); mbp->mba_bcr = -(bp->b_bcount - (int)ptob(npf)); *************** *** 914,917 if (MASKREG(mbp->mba_bcr) == 0) return (0); break; } --- 929,933 ----- if ((int)ptob(npf) >= bp->b_bcount) return (0); + mbp->mba_bcr = -(bp->b_bcount - (int)ptob(npf)); break; } -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251) UUCP: seismo!umcp-cs!chris CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
shannon@sun.uucp (Bill Shannon) (10/06/85)
> In article <1763@brl-tgr.ARPA> davet@RAND-UNIX.ARPA (Dave Truesdell) writes: > > When doing dumps from a raw device, dump does not force all reads > > to be done in multiples of DEV_BSIZE byte chunks. In most cases > > drivers seem to handle this correctly, but one obscure case has > > caused one of our systems (a VAX 11/785 running 4.2BSD) to hang. > > But why fix dump when the problem is almost certainly in the disk driver? Because: 1. If you read the man pages for the disk devices you will see that they strongly suggest that you do reads in multiples of 512 (== DEV_BSIZE) when reading the raw device. 2. Not all disk controllers on all machines are capable of reading partial sectors. The bug is in dump, and should be fixed there. Bill Shannon
chris@umcp-cs.UUCP (Chris Torek) (10/08/85)
In article <2856@sun.uucp> shannon@sun.uucp (Bill Shannon) writes: >> In article <1763@brl-tgr.ARPA> davet@RAND-UNIX.ARPA (Dave Truesdell) writes: >> >> But why fix dump when the problem is almost certainly in the disk driver? > > Because: [... the manuals] strongly suggest that you do reads in multiples > of 512 (== DEV_BSIZE) when reading the raw device [; and] Not all disk > controllers on all machines are capable of reading partial sectors. This is, I must admit, a good point. However. . . . > The bug is in dump, and should be fixed there. Actually, the bug is once again in the kernel. Dump should perhaps be more careful in its assumption that directory sizes are always correct; but directories are always supposed to be an exact multiple of DEV_BSIZE. The 4.2BSD mkdir system call wrote out `dirtemplate' without first padding it out, giving 24 byte directories, which on occasion would remain 24+k*512 bytes (usually the 24+ disappears after the first file creation). The 4.3BSD kernel has this fixed (and the 4.3 `fsck' adjusts directory sizes upward as necessary to fix the 24+ problem). -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251) UUCP: seismo!umcp-cs!chris CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
ado@elsie.UUCP (Arthur David Olson) (10/08/85)
> > But why fix dump when the problem is almost certainly in the disk driver? > > . . .If you read the man pages for the disk devices you will see > that they strongly suggest that you do reads in multiples > of 512 (== DEV_BSIZE) when reading the raw device. . . > The bug is in dump, and should be fixed there. If the claim is that dump "ought to" read in multiples of DEV_BSIZE, then the driver is buggy in its failure to reject read requests of other sizes. -- UNIX is an AT&T Bell Laboratories trademark. Sun is a Sun Microsystems trademark. -- UUCP: ..decvax!seismo!elsie!ado ARPA: elsie!ado@seismo.ARPA DEC, VAX and Elsie are Digital Equipment and Borden trademarks
shannon@sun.uucp (Bill Shannon) (10/10/85)
> > > But why fix dump when the problem is almost certainly in the disk driver? > > > > . . .If you read the man pages for the disk devices you will see > > that they strongly suggest that you do reads in multiples > > of 512 (== DEV_BSIZE) when reading the raw device. . . > > The bug is in dump, and should be fixed there. > > If the claim is that dump "ought to" read in multiples of DEV_BSIZE, > then the driver is buggy in its failure to reject read requests of other sizes. The manual doesn't go so far as to say that this will not work, or may not work, only that it is suggested that you not do it. The driver is free to provide this capability as an extension (in which case the previously mentioned bugs should be fixed so that it works properly) but no portable program should depend on that extension. Bill Shannon
ignatz@aicchi.UUCP (Ihnat) (10/15/85)
Just catching up on this group, so I'm a day or two late...but I find one thing interesting in the whole thing... No one has mentioned the obvious: If, as it sounds to me, a user program, with no particularly exotic permissions, may use a system driver with permitted calls and cause the system to have a prefrontal, then it's a BUG in the driver. I don't care WHAT is documented in the manual, or suggested as being what a right-thinking programmer would do. If A User Can Crash The System With It, It's A Bug. Cheerfully, -- Dave Ihnat Analysts International Corporation (312) 882-4673 ihnp4!aicchi!ignatz