[net.unix-wizards] minor bug in dump, can cause system to hang.

davet@RAND-UNIX.ARPA (Dave Truesdell) (09/27/85)

Index:  dump/dumptraverse.c--etc

Description:
	Dump would hang the system while doing incremental dumps (levels > 0),
	during pass II.

	When doing dumps from a raw device, dump does not force all reads
	to be done in multiples of DEV_BSIZE byte chunks.  In most cases
	drivers seem to handle this correctly, but one obscure case has
	caused one of our systems (a VAX 11/785 running 4.2BSD) to hang.


Repeat-By:
	Arrange for a raw read (size != n*512) of a directory to fail.

	In our case, a empty directory (containing ".", and "..") occupied
	a block which was forwarded by the HP driver.   When dump attempted to
	read the directory entry ( 24 bytes long ), the system hung.

Fix:

	Force bread to do all reads in multiples of DEV_BSIZE byte blocks.
	However, for efficiency, I have added a seperate version of bread
	called raw_bread, that is used in pass II.

RCS file: RCS/dumptraverse.c,v
retrieving revision 1.3
retrieving revision 1.4
diff -c3 -r1.3 -r1.4
*** /tmp/,RCSt1018693	Tue Sep 24 12:40:03 1985
--- /tmp/,RCSt2018693	Tue Sep 24 12:40:05 1985
***************
*** 265,271
  		return;
  	if (filesize > size)
  		filesize = size;
! 	bread(fsbtodb(sblock, d), dblk, filesize);
  	for (loc = 0; loc < filesize; ) {
  		dp = (struct direct *)(dblk + loc);
  		if (dp->d_reclen == 0) {

--- 269,275 -----
  		return;
  	if (filesize > size)
  		filesize = size;
! 	raw_bread(fsbtodb(sblock, d), dblk, filesize);
  	for (loc = 0; loc < filesize; ) {
  		dp = (struct direct *)(dblk + loc);
  		if (dp->d_reclen == 0) {
***************
*** 338,343
  		goto loop;
  	}
  	msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n",
  		disk, da, cnt, n);
  	if (++breaderrors > BREADEMAX){
  		msg("More than %d block read errors from %d\n",

--- 342,400 -----
  		goto loop;
  	}
  	msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n",
+ 		disk, da, cnt, n);
+ 	if (++breaderrors > BREADEMAX){
+ 		msg("More than %d block read errors from %d\n",
+ 			BREADEMAX, disk);
+ 		broadcast("DUMP IS AILING!\n");
+ 		msg("This is an unrecoverable error.\n");
+ 		if (!query("Do you want to attempt to continue?")){
+ 			dumpabort();
+ 			/*NOTREACHED*/
+ 		} else
+ 			breaderrors = 0;
+ 	}
+ }
+ 
+ /*
+  *     This version of bread forces reads to be done in multiples of DEV_BSIZE
+  * (normally 512) bytes.  This is required for proper operation on raw I/O
+  * devices.  The lack of enforcement, has caused one of our systems to hang
+  * during incremental dumps.
+  */
+ raw_bread(da, ba, cnt)
+ 	daddr_t da;
+ 	char *ba;
+ 	int	cnt;	
+ {
+ 	int n;
+ 	int size;
+ 
+ 	size = cnt;
+ 	/*
+ 	 * Note:  This implicitly assumes that DEV_BSIZE is a power of 2.
+ 	 *	If, for whatever reason, this is not true, an alternate
+ 	 *	function is incuded, but ifdef'd out.
+ 	 */
+ 	if ((size & (DEV_BSIZE - 1)) != 0)
+ 		size = (size + (DEV_BSIZE - 1)) &~ (DEV_BSIZE - 1);
+ #ifdef	not_def
+ 	if ((size % DEV_BSIZE) != 0)
+ 		size += DEV_BSIZE - (size % DEV_BSIZE);
+ #endif
+ loop:
+ 	if (lseek(fi, (long)(da * DEV_BSIZE), 0) < 0){
+ 		msg("raw_bread: lseek fails\n");
+ 	}
+ 	n = read(fi, ba, size);
+ 	if (n == size || n >= cnt)
+ 		return;
+ 	if (da + (size / DEV_BSIZE) > fsbtodb(sblock, sblock->fs_size)) {
+ 		size -= DEV_BSIZE;
+ 		cnt  -= DEV_BSIZE;
+ 		goto loop;
+ 	}
+ 	msg("(This should not happen)raw_bread from %s [block %d]: count=%d, got=%d\n",
  		disk, da, cnt, n);
  	if (++breaderrors > BREADEMAX){
  		msg("More than %d block read errors from %d\n",
--
    David A. Truesdell
    Programmer/Analyst

    The RAND Corporation
    1700 Main Street
    P.O. Box 2138
    CSD/1
    Santa Monica, CA 90406-2138

Phone:
    (213) 393-0411 ext. 7173
ARPAnet:
    davet@rand-unix
UUCP/usenet:
    cbosg ----
    decvax ---|
    fortune --|
    harpo ----|- randvax!davet
    trw-unix -|
    ucla-cs --|
    vortex ---

dave@onfcanim.UUCP (Dave Martindale) (09/30/85)

In article <1763@brl-tgr.ARPA> davet@RAND-UNIX.ARPA (Dave Truesdell) writes:
>Index:  dump/dumptraverse.c--etc
>
>Description:
>	Dump would hang the system while doing incremental dumps (levels > 0),
>	during pass II.
>
>	When doing dumps from a raw device, dump does not force all reads
>	to be done in multiples of DEV_BSIZE byte chunks.  In most cases
>	drivers seem to handle this correctly, but one obscure case has
>	caused one of our systems (a VAX 11/785 running 4.2BSD) to hang.
>
>
>Repeat-By:
>	Arrange for a raw read (size != n*512) of a directory to fail.
>
>	In our case, a empty directory (containing ".", and "..") occupied
>	a block which was forwarded by the HP driver.   When dump attempted to
>	read the directory entry ( 24 bytes long ), the system hung.
>
>Fix:
>
>	Force bread to do all reads in multiples of DEV_BSIZE byte blocks.
>	However, for efficiency, I have added a seperate version of bread
>	called raw_bread, that is used in pass II.

But why fix dump when the problem is almost certainly in the disk driver?
Looking at the code in hp.c, we see that when a bad block is forwarded,
the replacement block is always read in its entirety, even if you asked
for less data, and thus the Massbus adapter has been set up to correctly
map only the lesser amount of data.  The fix would seem to be easy, just
make the following change to the BSE case of the switch in hpecc():

*** /tmp/hp.c	Mon Sep 30 14:57:02 1985
--- hp.c	Mon Sep 30 15:05:32 1985
***************
*** 1024,1030
  		sn = bn%st->nspc;
  		tn = sn/st->nsect;
  		sn %= st->nsect;
! 		mbp->mba_bcr = -512;
  		rp->hpof &= ~HPOF_SSEI;
  #ifdef HPBDEBUG
  		if (hpbdebug)

--- 1024,1030 -----
  		sn = bn%st->nspc;
  		tn = sn/st->nsect;
  		sn %= st->nsect;
! 		mbp->mba_bcr = -MIN(512, bp->b_bcount-(int)ptob(npf));
  		rp->hpof &= ~HPOF_SSEI;
  #ifdef HPBDEBUG
  		if (hpbdebug)

Now, I don't have a filesystem that has a directory in a bad block (as far
as I know), so I can't test this under the same conditions.  But the old code 
is clearly wrong.  Why not fix the hp driver and then try the original version
of dump and see if everything works as it should?

By the way, exactly the same bug appears in the "up" Unibus disk driver too.

chris@umcp-cs.UUCP (Chris Torek) (10/02/85)

> But why fix dump when the problem is almost certainly in the disk driver?

Exactly.  Your fix is correct, but possibly incomplete.  Here is the
other change from Berkeley for the ecc code:

*** 911,916
  #endif
                bp->b_flags &= ~B_BAD;
!               mbp->mba_bcr = -(bp->b_bcount - (int)ptob(npf));
!               if (MASKREG(mbp->mba_bcr) == 0)
                        return (0);
                break;
--- 927,931 -----
  #endif
                bp->b_flags &= ~B_BAD;
!               if ((int)ptob(npf) >= bp->b_bcount)
                        return (0);
                mbp->mba_bcr = -(bp->b_bcount - (int)ptob(npf));
***************
*** 914,917
                if (MASKREG(mbp->mba_bcr) == 0)
                        return (0);
                break;
        }

--- 929,933 -----
                if ((int)ptob(npf) >= bp->b_bcount)
                        return (0);
+               mbp->mba_bcr = -(bp->b_bcount - (int)ptob(npf));
                break;
        }
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

shannon@sun.uucp (Bill Shannon) (10/06/85)

> In article <1763@brl-tgr.ARPA> davet@RAND-UNIX.ARPA (Dave Truesdell) writes:
> >	When doing dumps from a raw device, dump does not force all reads
> >	to be done in multiples of DEV_BSIZE byte chunks.  In most cases
> >	drivers seem to handle this correctly, but one obscure case has
> >	caused one of our systems (a VAX 11/785 running 4.2BSD) to hang.
>
> But why fix dump when the problem is almost certainly in the disk driver?

Because:
	1.  If you read the man pages for the disk devices you will see
	    that they strongly suggest that you do reads in multiples
	    of 512 (== DEV_BSIZE) when reading the raw device.

	2.  Not all disk controllers on all machines are capable of
	    reading partial sectors.

The bug is in dump, and should be fixed there.

					Bill Shannon

chris@umcp-cs.UUCP (Chris Torek) (10/08/85)

In article <2856@sun.uucp> shannon@sun.uucp (Bill Shannon) writes:
>> In article <1763@brl-tgr.ARPA> davet@RAND-UNIX.ARPA (Dave Truesdell) writes:
>>
>> But why fix dump when the problem is almost certainly in the disk driver?
>
> Because: [... the manuals] strongly suggest that you do reads in multiples
> of 512 (== DEV_BSIZE) when reading the raw device [; and]  Not all disk
> controllers on all machines are capable of reading partial sectors.

This is, I must admit, a good point.  However. . . .

> The bug is in dump, and should be fixed there.

Actually, the bug is once again in the kernel.  Dump should perhaps
be more careful in its assumption that directory sizes are always
correct; but directories are always supposed to be an exact multiple
of DEV_BSIZE.  The 4.2BSD mkdir system call wrote out `dirtemplate'
without first padding it out, giving 24 byte directories, which on
occasion would remain 24+k*512 bytes (usually the 24+ disappears
after the first file creation).

The 4.3BSD kernel has this fixed (and the 4.3 `fsck' adjusts
directory sizes upward as necessary to fix the 24+ problem).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

ado@elsie.UUCP (Arthur David Olson) (10/08/85)

> > But why fix dump when the problem is almost certainly in the disk driver?
>
> . . .If you read the man pages for the disk devices you will see
> that they strongly suggest that you do reads in multiples
> of 512 (== DEV_BSIZE) when reading the raw device. . .
> The bug is in dump, and should be fixed there.

If the claim is that dump "ought to" read in multiples of DEV_BSIZE,
then the driver is buggy in its failure to reject read requests of other sizes.
--
UNIX is an AT&T Bell Laboratories trademark.
Sun is a Sun Microsystems trademark.
--
	UUCP: ..decvax!seismo!elsie!ado    ARPA: elsie!ado@seismo.ARPA
	DEC, VAX and Elsie are Digital Equipment and Borden trademarks

shannon@sun.uucp (Bill Shannon) (10/10/85)

> > > But why fix dump when the problem is almost certainly in the disk driver?
> >
> > . . .If you read the man pages for the disk devices you will see
> > that they strongly suggest that you do reads in multiples
> > of 512 (== DEV_BSIZE) when reading the raw device. . .
> > The bug is in dump, and should be fixed there.
> 
> If the claim is that dump "ought to" read in multiples of DEV_BSIZE,
> then the driver is buggy in its failure to reject read requests of other sizes.

The manual doesn't go so far as to say that this will not work, or may not
work, only that it is suggested that you not do it.  The driver is free to
provide this capability as an extension (in which case the previously
mentioned bugs should be fixed so that it works properly) but no portable
program should depend on that extension.

					Bill Shannon

ignatz@aicchi.UUCP (Ihnat) (10/15/85)

Just catching up on this group, so I'm a day or two late...but I find one
thing interesting in the whole thing...

No one has mentioned the obvious:  If, as it sounds to me, a user program,
with no particularly exotic permissions, may use a system driver with permitted
calls and cause the system to have a prefrontal, then it's a BUG in the
driver.  I don't care WHAT is documented in the manual, or suggested as
being what a right-thinking programmer would do.  If A User Can Crash The
System With It, It's A Bug.

	Cheerfully,


-- 
	Dave Ihnat
	Analysts International Corporation
	(312) 882-4673
	ihnp4!aicchi!ignatz