[comp.bugs.4bsd] exec

shannon@sun.uucp (Bill Shannon) (07/28/87)

Index:	sys/kern_exec.c 4.3BSD [FIX]


Description:
	vinifod (in vm_subr.c) is called with a pointer to pte's.  vinifod
	calls bmap, which can sleep.  While the process is sleeping in bmap,
	it can be swapped out.  When it is swapped back in, the pte's can
	be allocated at a different kernel virtual address.  vinifod will
	then use the old pte address and scribble on random kernel data.
Repeat-By:
	Lotsa luck.  While debugging new hardware we had just the right
	combination of events to make this occur repeatably.  I wouldn't
	expect it to occur very often in general.
Fix:
	The quick fix is to prevent the process from being swapped during the
	time the pte's are being initialized.  (Sorry I don't have better
	diff's or something.)

	In kern_exec.c, change line 453 to

		u.u_procp->p_flag |= pagi | SKEEP;

	After line 469 (after the call to vinifod) add:

		u.u_procp->p_flag &= ~SKEEP;

goshen@ccicpg.UUCP (Shmuel Goshen) (07/28/87)

In article <24281@sun.uucp> shannon@sun.uucp (Bill Shannon) writes:
>Description:
>	vinifod (in vm_subr.c) is called with a pointer to pte's.  vinifod
>	calls bmap, which can sleep.  While the process is sleeping in bmap,
>	it can be swapped out.  When it is swapped back in, the pte's can
>	be allocated at a different kernel virtual address.  vinifod will
>	then use the old pte address and scribble on random kernel data.

>Fix:
>	In kern_exec.c, change line 453 to
>
>		u.u_procp->p_flag |= pagi | SKEEP;
>
>	After line 469 (after the call to vinifod) add:
>
>		u.u_procp->p_flag &= ~SKEEP;


The same fix should be made in vm_pt.c around line 120.
Set SKEEP before the call to vinifod and reset it after the
call. The modified text will look like:

	if (xp->x_flag & XLOAD) {
		p->p_flag |= SKEEP;
		vinifod((struct fpte *)tptopte(p, 0), PG_FTEXT, xp->x_iptr,
		    (daddr_t)1, xp->x_size);
		p->p_flag &= ~SKEEP;

-- 

Shmuel Goshen				(714) 951-8053	
Computer Consoles Inc.			(714) 458-7282
Irvine, CA.		  {allegra!hplabs!felix,seismo!rlgvax}!ccicpg!goshen

goshen@ccicpg.UUCP (Shmuel Goshen) (07/28/87)

Ooops, 
Please disregard my previous posting (additional change to vm_pt.c).
That fix is not needed with the fix posted by Bill Shannon. (It was needed 
only for the way WE, at CCI fixed the same problem in kern_exec.c.
-- 

Shmuel Goshen				(714) 951-8053	
Computer Consoles Inc.			(714) 458-7282
Irvine, CA.		  {allegra!hplabs!felix,seismo!rlgvax}!ccicpg!goshen

mojo@sun.uucp (Joseph Moran) (07/29/87)

In article <1497@ccicpg.UUCP> goshen@ccicpg.UUCP (Shmuel Goshen) writes:
>In article <24281@sun.uucp> shannon@sun.uucp (Bill Shannon) writes:
>>Description:
>>	vinifod (in vm_subr.c) is called with a pointer to pte's.  vinifod
>>	calls bmap, which can sleep.  While the process is sleeping in bmap,
>>	it can be swapped out.  When it is swapped back in, the pte's can
>>	be allocated at a different kernel virtual address.  vinifod will
>>	then use the old pte address and scribble on random kernel data.
>
>>Fix:
>>	In kern_exec.c, change line 453 to
>>
>>		u.u_procp->p_flag |= pagi | SKEEP;
>>
>>	After line 469 (after the call to vinifod) add:
>>
>>		u.u_procp->p_flag &= ~SKEEP;
>
>
>The same fix should be made in vm_pt.c around line 120.
>Set SKEEP before the call to vinifod and reset it after the
>call. The modified text will look like:
>
>	if (xp->x_flag & XLOAD) {
>		p->p_flag |= SKEEP;
>		vinifod((struct fpte *)tptopte(p, 0), PG_FTEXT, xp->x_iptr,
>		    (daddr_t)1, xp->x_size);
>		p->p_flag &= ~SKEEP;
>

Be careful, this stuff can be very tricky.  I believe that the original
fix suggested by shannon was correct and the 2nd fix is well intended
but might be incorrect depending on other changes made to the system.

It was not obvious from the original message, but in the first fix the
SKEEP flag is on for the process being exec'ed all during the calls to
vgetvm(), xalloc(), and vinifod() from getxfile().  It is my belief
that the only time vinitpt() will end up calling vinifod() is when it
is called as a result of getxfile() calling xalloc() calling xlink().
If my supposition is correct, then the change to getxfile() for setting
the SKEEP flag will prevent the swap operation from occurring for both
initialization of the data pte's (vinifod() being called directly from
getxfile()) and for the text pte's (vinifod() being called from vinitpt()
as a result of the xalloc() call).

And as a matter of fact, using the second "fix" listed above will break
the first fix since it is assuming the that SKEEP flag is turned off
when vinitpt() is was called.  Thus if you believe that the vinitpt()
code might be called with the SKEEP flag off once the first fix is
applied the correct change to vinitpt() would be something along the
line of:

	if (xp->x_flag & XLOAD) {
		int fixflag;

		if ((p->p_flag & SKEEP) == 0) {
printf("vinitpt: SKEEP not set");
			p->p_flag |= SKEEP;
			fixflag = 1;
		} else
			fixflag = 0;
		vinifod((struct fpte *)tptopte(p, 0), PG_FTEXT, xp->x_iptr,
		    (daddr_t)1, xp->x_size);
		if (fixflag)
			p->p_flag &= ~SKEEP;
	}

My belief is that if the original suggested fix is made, the debugging
printf should never occur.  If it does, then my statement that says the
the original fix should be sufficient would be incorrect, and then the
code suggest above (minus the printf) could be used.  So far in the testing
we have done we have not seen vinifod() being called with the SKEEP flag
off when using just the original proposed fix to getxfile().

Another way to attack this problem is to have vinifod() itself set and
clear the SKEEP flag similar to how it's done above so that it doesn't
assume anything about the process flags on entry.  But then this
assumes that there is no path from the place where the addresses passed
in to vinifod() were calculated to the place where vinifod() turns on
the SKEEP flag were the system could sleep and the process might get
swapped out.  It turns out that this is currently valid, as the only 2
calls to vinifod() use a tptopte() or a dptopte() macro inline with the
vinifod() call, but this might change in some implementation of the BSD
vm system.

BTW, the failure mode we saw for this problem was that a user process
would once and awhile dump core an a *very* heavily loaded system
during some stress testing.  Looking at the core file, all the data in
the core file which was mapped to a place in the executable file that
was beyond the cross-over to indirect blocks in the file (12 blocks)
was all zeroes.  We eventually tracked it down to the process getting
swapped trying to read the indirect block in vinifod().  This caused
the pte's for the part of the data segment that was mapped by the
indirect block to end up being left marked as PG_FZERO (zero fill on
demand) which was set up by vgetvm().  We never did see a kernel
problem that resulted from vinifod() filling in pte's at the wrong
address in usrpt space.

chris@mimsy.UUCP (Chris Torek) (07/29/87)

>>The same fix should be made in vm_pt.c around line 120.

I had wondered about this too.

In article <24372@sun.uucp> mojo@sun.uucp (Joseph Moran) writes:
-It was not obvious from the original message, but in the first fix the
-SKEEP flag is on for the process being exec'ed all during the calls to
-vgetvm(), xalloc(), and vinifod() from getxfile().  It is my belief
-that the only time vinitpt() will end up calling vinifod() is when it
-is called as a result of getxfile() calling xalloc() calling xlink().

This seems to be true.  Note that xalloc() can twiddle SKEEP itself,
but only if not paging from an inode, in which case XPAGI is not
set and hence xlink will not call vinitpt() anyway.  In other words,
the original change may unnecessarily turn off SKEEP, but so what?

-Another way to attack this problem is to have vinifod() itself set and
-clear the SKEEP flag similar to how it's done above so that it doesn't
-assume anything about the process flags on entry.

Or have vinifod() find the ptes itself.  It could simply redo the
?ptopte() after each call to bmap().  But is it safe to swap the
process before all its ptes are set?
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris