[net.bugs.2bsd] Serious mapping problem with 2.9

ps@uok.UUCP (10/24/84)

Subject: panic: trap type 11
Index:	/usr/include/sys/seg.h 2.9bsd

Description:
	The system crashes with a 'panic: trap type 11'.  This is
	quite possible on a system compiled with NOKA5 undefined
	and data extending up into the fifth segment. (> 40K data+bss)
	The machine may also just stop without saying very much (anything?)
	if you get two segmentation violations in a short period of time.
	In this last case, the machine will not even reboot itself because
	it is spinning in trap().

Repeat-By:
	Hard to repeat on demand.  This problem seems to be timing
	dependent.  A possible scenario is:

		buffer gets mapped in:
			*KDSD5 = (BSIZE << 2) | RW
			*KDSA5 = click address in buffer pool
			... (copying buffer)
		clock interrupts: 
			map[0].se_addr = *KDSA5;
			map[0].se_desc = *KDSD5;
			*KDSD5 = seg5.se_desc;
			*KDSA5 = seg5.se_addr;
			... (handle 1 second clock processing)
			...	((++lbolt >= hz) && BASEPRI(ps)) is true
			...	drop the priority to spl1().
			... finish processing, start the restormap stuff
			*KDSD5 = map[0].se_desc;
		disk interrupts:
			calls wakeup to notify process of completion of I/O
			no Savemap() done because *KDSA5 == seg5.se_addr
			and KDSD5 is not checked.
				(notice, KDSD5 wrong, KDSA5 correct)
			wakeup tries to reference something beyond the size
			of a buffer in the fifth segment,
				segmentation violation!
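
	The scenario above can be replayed as a small C sketch, with
	ordinary variables standing in for the hardware registers.  The
	register values, the `buf` mapping and the function names are
	hypothetical and chosen only for illustration; only the order of
	the stores is taken from the scenario.

```c
#include <assert.h>

/* Simulated stand-ins: on a real PDP-11, KDSA5 and KDSD5 point at
 * fixed hardware registers.  All values below are hypothetical. */
struct segm { unsigned se_addr; unsigned se_desc; };

static unsigned kdsa5, kdsd5;                        /* simulated register contents */
static const struct segm seg5 = { 01000, 077406 };   /* normal fifth-segment mapping */
static const struct segm buf  = { 02000, 001406 };   /* short buffer-pool mapping */

/* The test made by the original savemap(): address register only. */
static int old_needs_save(void)
{
	return kdsa5 != seg5.se_addr;
}

/* Replays the scenario up to the disk interrupt.  Returns 1 if the
 * interrupt would skip Savemap() while KDSD5 still holds the buffer
 * descriptor, which is the state the report describes. */
static int replay_scenario(void)
{
	struct segm map0;

	/* buffer gets mapped in */
	kdsd5 = buf.se_desc;
	kdsa5 = buf.se_addr;

	/* clock interrupt: save the buffer mapping, map in seg5 */
	map0.se_addr = kdsa5;
	map0.se_desc = kdsd5;
	kdsd5 = seg5.se_desc;
	kdsa5 = seg5.se_addr;

	/* restore begins: descriptor written back, address not yet */
	kdsd5 = map0.se_desc;

	/* the disk interrupt arrives at this point */
	return !old_needs_save() && kdsd5 != seg5.se_desc;
}
```

	Calling replay_scenario() returns 1: the address register matches
	seg5, so the old test skips the save, yet the descriptor is still
	the buffer's.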

Fix:
	The solution appears to be to change the savemap() macro
	in /usr/include/sys/seg.h.  Both the address and descriptor
	registers need to be checked when deciding whether to call
	Savemap().

	It was:
	#define	savemap(map)	{if (*KDSA5 != seg5.se_addr) Savemap(map); \
			 	else map[0].se_desc = NOMAP; }

	It should be changed to:
	#define	savemap(map) \
		{\
		if (*KDSA5 != seg5.se_addr || *KDSD5 != seg5.se_desc)\
			Savemap(map);\
		else\
			map[0].se_desc = NOMAP;\
		}
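
	To see what the change buys, the two tests can be written as plain
	C predicates and tried on a few register states.  This is only a
	sketch: the function names and the idea of passing the register
	contents in a struct are assumptions for illustration, not part of
	seg.h.

```c
#include <assert.h>

struct segm { unsigned se_addr; unsigned se_desc; };

/* Condition used by the original macro: address register only. */
static int old_needs_save(struct segm cur, struct segm seg5)
{
	return cur.se_addr != seg5.se_addr;
}

/* Condition used by the changed macro: address or descriptor differs. */
static int new_needs_save(struct segm cur, struct segm seg5)
{
	return cur.se_addr != seg5.se_addr || cur.se_desc != seg5.se_desc;
}
```

	With the address equal to seg5's and the descriptor different,
	old_needs_save() returns 0 and new_needs_save() returns 1.  When
	both registers match seg5, or when the address differs, the two
	agree.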

	There is also a small problem in sys/machdep.c in the Restormap()
	routine.  The values of KDS?5 are restored in the wrong order; the
	two assignments should be swapped.

	It was:
		*KDSA5 = map[0].se_addr;
		*KDSD5 = map[0].se_desc;

	It should be:
		*KDSD5 = map[0].se_desc;
		*KDSA5 = map[0].se_addr;


	We have been running on a kernel with these changes for about two
	weeks with no apparent problems.  Prior to making these changes,
	we would crash at random times, varying from about 5 minutes to 48
	hours.

borman@decvax.UUCP (Dave Borman) (11/03/84)

One interesting thing to note about the fix for the mapping problem is
that you wind up saving the map when you don't need to, but that's better
than not saving it when you need to.  The KA5 descriptor will have the
modified bit clear when it is saved at boot time.  Then, anytime you
touch anything in KA5 the modified savemap macro will force the save,
even though it isn't needed.
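
The effect can be sketched in C with a simulated descriptor.  The bit
value used for the modified ("written into") bit, the sample register
values and the function names are assumptions for illustration; the
hardware sets the bit itself, which touch_for_write() only imitates.

```c
#include <assert.h>

/* Assumption: position of the "written into" bit in a PDP-11 page
 * descriptor register. */
#define PDR_W 0100

struct segm { unsigned se_addr; unsigned se_desc; };

/* Imitates the hardware: a store through the page sets the W bit. */
static unsigned touch_for_write(unsigned desc)
{
	return desc | PDR_W;
}

/* Condition used by the changed savemap() macro. */
static int new_needs_save(unsigned kdsa5, unsigned kdsd5, struct segm seg5)
{
	return kdsa5 != seg5.se_addr || kdsd5 != seg5.se_desc;
}
```

With seg5 saved as { 01000, 077406 } (bit clear), the test is false
while the registers are untouched, and becomes true once the live
descriptor has been through touch_for_write(), although the mapping
itself has not changed.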

Depending on how far into KA5 your data extends, I've been considering
just getting rid of the macro, since you'll wind up saving it most
of the time anyway.  Any thoughts on this?

Speaking of 2.9, has anyone out there had trouble with mount/unmount?
When running with RX50 floppies and several hundred buffers, I would
have problems with the kernel cancelling write requests before all the
buffers got written out.  This caused me a few headaches and several
re-writes of the unmount system call to make sure that 1) all the
delayed writes get queued up to be written, and 2) the call does not
return until they are all done.

I suppose most people aren't mounting and unmounting file systems a lot.

		-Dave Borman, Digital UNIX Engineering Group
		decvax!borman