[net.bugs.v7] v7 shared text bug & fix.

bux@dual.UUCP (Dave Buxbaum) (11/29/84)

I recently found an interesting bug in the Version 7 shared text code
which, under the right circumstances, causes deadlock in the kernel.

The initial symptom was that our V7 system would "hang" occasionally while
under heavy disk load. (This would happen in anywhere from  2 to 48
hours, while running a test which was really disk bound).

The deadlock results from the following situation:

Consider N number of processes executing out of a shared text segment.
At some point one of the processes executes the exit system call, while
the others remain in various states of completion.  From within the
exit(), a call is made to xfree() to relinquish use of the shared text
segment:

/*
 */
xfree()
{
	register struct text *xp;
	register struct inode *ip;

	if((xp=u.u_procp->p_textp) == NULL)
		return;
	dprelse(u.u_procp);
	xlock(xp);
	xp->x_flag &= ~XLOCK;
	u.u_procp->p_textp = NULL;       	/* TROUBLE BREWING */
	u.u_ptsize = 0;
	ip = xp->x_iptr;
	if(--xp->x_count==0 && (ip->i_mode&ISVTX)==0) {
		xp->x_iptr = NULL;
		mfree(swapmap, ctod(xp->x_size), xp->x_daddr);
		mfree(coremap, xp->x_size, xp->x_caddr);
		ip->i_flag &= ~ITEXT;
		if (ip->i_flag&ILOCK)
			ip->i_count--;
		else
			iput(ip);
	} else
		xccdec(xp);
}

Note that the text pointer in the user structure is set NULL, thus
severing the connection between the text segment and the exiting
process. The interesting case occurs when the else clause is taken; a
call to xccdec().  Here is xccdec():

xccdec(xp)
register struct text *xp;
{

	if (xp==NULL || xp->x_ccount==0)
		return;
	xlock(xp);
	if (--xp->x_ccount==0) {
		if (xp->x_flag&XWRIT) {
			xp->x_flag &= ~XWRIT;
			swap(xp->x_daddr,xp->x_caddr,xp->x_size,B_WRITE);
		}
		mfree(coremap, xp->x_size, xp->x_caddr);
	}
	xunlock(xp);
}

The lock is granted after the text pointer has been set NULL. If the
call to swap causes a context switch, which it can under certain
circumstances, we find the dying process out on the swap device while
holding the text lock granted in xccdec().

The scheduler does check that the given process is not holding a lock
on a text segment, but in this case the connection between the process
table and text table has already been severed. This renders the check
useless.

The next time the scheduler selects a candidate who wants to run from
the same segment we have big trouble:

sched()
{
	.
	.
	.
	/* Look for a candidate */
	.
	.
	.
	/*
	 * If there is no one there, wait.
	 */
	if (outage == -20000) {
		runout++;
		sleep((caddr_t)&runout, PSWP);
		goto loop;
	}

	/*
	 * Found one, see if there is core for that process;
	 * if so, swap it in.
	 */

	if (swapin(p))
	.
	.
	/* If not, swap others out until we have room */
	.
	.
	for (rp = &proc[0]; rp < &proc[NPROC]; rp++) {
		if (rp->p_stat==SZOMB
		 || (rp->p_flag&(SSYS|SLOCK|SULOCK|SLOAD))!=SLOAD)
			continue;

		/*** HERE IS THE CHECK ***/

		if (rp->p_textp && rp->p_textp->x_flag&XLOCK)  
			continue;

The first thing swapin() does if the new process is running a shared
text segment is increment the incore count. Before doing this, a call to
xlock() is made and since the swapped out dying process is still
holding the lock, the kernel sleeps in sched() waiting for a lock it
will never get.

The fix I applied is rather simple and, there is more than one solution.
I chose to lock the dying process in core while in this "critical section".

xccdec(xp)
register struct text *xp;
{
	register int	prevlock;

	if (xp==NULL || xp->x_ccount==0)
		return;
	xlock(xp);
	if (!(prevlock = u.u_procp->p_flag & SLOCK))	/* Bug fix:	*/
		u.u_procp->p_flag |= SLOCK;	/*    Don't let this process  */
	if (--xp->x_ccount==0) {		/*    swap out while holding  */
		if (xp->x_flag&XWRIT) {		/*    the text lock.	      */
			xp->x_flag &= ~XWRIT;
			swap(xp->x_daddr,xp->x_caddr,xp->x_size,B_WRITE);
		}
		mfree(coremap, xp->x_size, xp->x_caddr);
	}
	xunlock(xp);
	if (!prevlock)
		u.u_procp->p_flag &= ~SLOCK;	/* 	All done.	    */
}

Any comments on this bug fix?

	David Buxbaum

	dual!bux@BERKELEY.ARPA
	{ihnp4,ucbvax,hplabs,decwrl,cbosgd,sun,nsc,apple,pyramid}!dual!bux
	Dual Systems Corporation, Berkeley, California

jwg@lanierrnd.UUCP (Joe Guthridge) (11/30/84)

Recently David Buxbaum wrote about a V7 scheduling bug:

>I recently found an interesting bug in the Version 7 shared text code
>which, under the right circumstances, causes deadlock in the kernel.

To summarize, he was worried that a process exiting and writing out a
shared text segment had that text segment locked but could then be
swapped out in the middle of writing out that text segment.  When the
scheduler wants to schedule one of the other processes using the segment,
he hangs because he needs to lock that segment temporarily.

The scheduler code is:

>		if (rp->p_textp && rp->p_textp->x_flag&XLOCK)  
>			continue;

I cannot find a problem.  The flaws in the argument seem to me to be:

First, when you are busy writing out a shared text segment (with a call
to swap()) you are considered to have I/O in progress, and therefore your
process is *locked in core*, preventing you from being swapped!  You can be
descheduled, but you cannot be swapped out.

Second, the above code should prevent the other processes from being
scheduled, because their proc table entries still have a valid p_textp
that points to an XLOCK'ed text segment.

What have I missed?  We have never observed any problems like this.  Maybe
he has done some special work to his device drivers that lost the process
lock during I/O.

-- 
					Joe Guthridge
					..!akgua!lanierrnd!jwg

bux@dual.UUCP (Dave Buxbaum) (12/08/84)

In reply to Joe Gurthridge's misunderstanding of the bug:

>> I cannot find a problem. The flaws in the agrument seem to me to be:

>> Fisrt,when you are busy writing out a shared text segment you are considered
>> to have I/O in progress ........ you cannot be swapped out.

This is not the case.  It is quite possible to be swapped out in this instance.
My first article followed the thread rather closely but I will summarize here
for Joe's benefit.

From xccdec(), a call to swap() is made to write out the text segment. If
no swap buffers are availible, the process is put to sleep.  There is nothing
preventing the process at this point from being swapped out.

>> Second, the above code prevents other processes from being scheduled,
>> because their proc table entries still have a valid text pointer.

Well this is indeed true. However, the case of interest is when one or more
of these processes are already on the swapdev and  then try to get back in.

Again, this was explained in detail in my first article.
I am not surprized that many machines may not experience the problem since
it depends on some rather unlikely timing/loading.

Several other fixes have been proposed and I have been made aware of the
fact that this bug was posted to the net about 1 year ago(before my time!)


	David Buxbaum

	dual!bux@BERKELEY.ARPA
	{ihnp4,ucbvax,hplabs,decwrl,cbosgd,sun,nsc,apple,pyramid}!dual!bux
	Dual Systems Corporation, Berkeley, California