[net.bugs.4bsd] panic: dup page unlock

norman@decvax.UUCP (Norman Wilson) (08/28/83)

a bug in the paging subsystem in 4.1bsd, which can cause
somewhat inexplicable crashes (`panic: dup page unlock').

the problem (some understanding of berkeley's paging code assumed):
due to a logic error in vmpage.c/cleanup,
it's possible that in cleaning up after a dirty page push, the pageout
daemon will end up looking at the page tables of the process with the
dirty page, rather than at its own as it intended.  most of the time
this doesn't matter; but if, while the dirty page (and its dirty
neighbors) are being written out, the offending process happens to
do an sbrk and toss away some of the pages, the pt's won't be the
same anymore.  in particular, if the process then does another sbrk
to allocate more memory, it might end up with zero-fill pte's; this
will cause cleanup to call munlock with an argument of zero, which
is illegal; munlock doesn't check, but on my system at least, it happens
to make munlock think that the middle of the proc table is a struct cmap;
it happens that c_lock in the alleged cmap isn't set, hence a `dup page unlock'
panic results.

the bourne shell, which does frequent sbrks, seems to be particularly
likely to trigger the problem.

the exact problem: in the inner loop in cleanup, the variable `pte'
is meant to be pointing to the daemon's page table entry for each
page in the cleaned kluster in succession.  unfortunately, the code
at the bottom of the loop (which sets pte to the users' pte --
inside the second switch) leaves it pointing to the user's page table
instead ... which may have changed since the push was started ...
a quick and dirty fix is to invent a new struct pte * variable, and
to use that in the bottom of the loop, instead of pte.  a diff between
a fixed vmpage.c and the original 4.1bsd one follows:

810d809
< 	struct pte *upte;		/* ntw - quick hack */
861c860
< 					upte = tptopte(xp->x_caddr, c->c_page);
---
> 					pte = tptopte(xp->x_caddr, c->c_page);
865c864
< 					upte = dptopte(rp, c->c_page);
---
> 					pte = dptopte(rp, c->c_page);
869c868
< 					upte = sptopte(rp, c->c_page);
---
> 					pte = sptopte(rp, c->c_page);
872c871
< 				if (upte->pg_v) 
---
> 				if (pte->pg_v) 

i would claim that the real cure for this is to rewrite cleanup
to separate the code which frees `gone' pages from that which,
if possible, frees the centre page; since the criteria for these
are different, it would seem clearer (and less error-prone) to
take the latter out of the loop.  after proving the above fix, i
performed such a rewrite; if anyone's interested, send me mail.

i've managed (with a bit of screwing around) to reproduce the problem
at will, and am therefore reasonably confident of the fix; i've also
been running with the rewritten cleanup for several months here, with
no problems.

norman wilson
caltech high energy physics group
ucbvax!cithep!norman  decvax!cithep!norman