[net.unix-wizards] V7 "proc on q" diagnostic message

pag@hao.UUCP (Peter Gross) (11/19/84)

In V7 Unix (and perhaps others) there is a diagnostic printf "proc on q"
in the routine setrq() which puts its argument on the run queue.  The
message is the result of finding the given process already on the run
queue.  We were getting this message occasionally in early morning hours.
I changed the printf() to a panic() to get a better handle on what
was happening.  It turned out to be a program which does a double-
buffered copy of the root file system to a spare file system (uses the
program "dbuf" which was posted to USENET long ago).  This program forks
off 4 processes, 2 each for reading and writing, and does lots of signalling
back and forth.  The panic comes when one process sends another an EMT
signal to indicate that an I/O operation has completed.  The kernel traps
the kill() system call, calls psignal(), which in turn calls setrq().
setrq() finds that the proc is already on the runq (yet its p_stat was
SSLEEP!).  Sounds like a race condition to me.  Our kernel is highly
hacked for performance (load-based scheduling, auto-nicing, forced swapping
of cpu-bound procs, etc.).  I am curious if anyone else has seen the
"proc on q" message, and what caused it.

--peter gross
hao!pag			UUCP
hao!pag@seismo.ARPA	ARPA
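
A minimal user-space sketch of the check described in the post above,
for readers without the source at hand.  It is not the V7 code: the
p_link field, the singly linked runq, the no-op spl6()/splx() stubs and
the return value are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stddef.h>

struct proc {
    int p_stat;            /* process state, e.g. SSLEEP */
    struct proc *p_link;   /* next proc on the run queue (assumed name) */
};

static struct proc *runq;  /* head of the run queue */

/* Stand-ins for the kernel's interrupt priority routines. */
static int spl6(void) { return 0; }
static void splx(int s) { (void)s; }

/*
 * Put p on the run queue.  Returns 0 on success, or -1 (after
 * printing the diagnostic) if p was already queued.
 */
static int setrq(struct proc *p)
{
    struct proc *q;
    int rc = 0;
    int s = spl6();

    for (q = runq; q != NULL; q = q->p_link) {
        if (q == p) {
            printf("proc on q\n");
            rc = -1;
            goto out;
        }
    }
    p->p_link = runq;      /* push on the front of the queue */
    runq = p;
out:
    splx(s);
    return rc;
}
```

Calling setrq() twice on the same proc prints the diagnostic on the
second call and returns -1, leaving the queue unchanged.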

edwards@felix.UUCP (Dave Edwards) (12/01/84)

I've always been amazed at how a bug can exist for years with nobody
noticing it, then within days of each other, several people do.

Peter Gross has noted some "proc on q" printfs from his V7 system.  He
does not mention what incarnation of V7 or what type of system he has,
but I have seen something similar on our V7 derived from the MIT 68000
port.  Incidentally, our news feed has apparently been a bit flaky
lately, so if this has already been covered, could somebody please
mail me the resolution?

> In V7 Unix (and perhaps others) there is a diagnostic printf "proc on q"
> in the routine setrq() which puts its argument on the run queue.  The
> message is the result of finding the given process already on the run
> queue.  We were getting this message occasionally in early morning hours.

We were also getting this message, while running a complex application
under development.

>                  The panic comes when one process sends another an EMT
> signal to indicate that an I/O operation has completed.  The kernel traps
> the kill() system call, calls psignal(), which in turn calls setrq().
> setrq() finds that the proc is already on the runq (yet its p_stat was
> SSLEEP!).  Sounds like a race condition to me.

This is almost exactly the same condition that caused it for us.  In
looking at things, I discovered that MIT had significantly changed the
way setrun() and wakeup() interact.  To be precise, in our version,
psignal() calls setrun(), which calls setrq().

Now, the way psignal() is written, there is indeed a race condition,
having to do with an interrupt causing the process to become runnable
between the check for the process being in SSLEEP state and the spl6()
in setrq().  In my copy of the standard PDP-11 V7 code, there is a
bizarre way of avoiding this condition which relies on wakeup() being
idempotent.  This way of doing things is admittedly inefficient and
could also cause other processes to wake up when they shouldn't, so
I don't blame the MIT people for changing it.
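
The window described above can be reproduced in a user-space sketch.
Nothing here is kernel source: the pending_intr hook stands in for a
device interrupt, wake_victim() for the wakeup it triggers, and the
SRUN name, the numeric state values and the p_link field are
assumptions made for illustration.

```c
#include <stdio.h>
#include <stddef.h>

enum { SSLEEP = 1, SRUN = 3 };       /* illustrative values */

struct proc {
    int p_stat;
    struct proc *p_link;             /* next on run queue (assumed) */
};

static struct proc *runq;
static int dup_count;                /* times "proc on q" was hit */
static void (*pending_intr)(void);   /* simulated pending interrupt */

/* Deliver the simulated interrupt, if one is pending. */
static void take_interrupt(void)
{
    if (pending_intr != NULL) {
        void (*handler)(void) = pending_intr;
        pending_intr = NULL;
        handler();
    }
}

static void setrq(struct proc *p)
{
    struct proc *q;

    /* The real routine raises priority with spl6() at this point. */
    for (q = runq; q != NULL; q = q->p_link) {
        if (q == p) {
            dup_count++;
            printf("proc on q\n");
            return;
        }
    }
    p->p_link = runq;
    runq = p;
}

static void setrun(struct proc *p)
{
    take_interrupt();                /* the window: not yet at spl6 */
    p->p_stat = SRUN;
    setrq(p);
}

static void psignal(struct proc *p)
{
    if (p->p_stat == SSLEEP)         /* check made at low priority */
        setrun(p);
}

static struct proc victim;

/* Interrupt-side wakeup: makes the sleeper runnable and queues it. */
static void wake_victim(void)
{
    victim.p_stat = SRUN;
    setrq(&victim);
}
```

With victim.p_stat set to SSLEEP and pending_intr set to wake_victim,
a call to psignal(&victim) queues the process twice: once from the
interrupt and once from setrun(), which is where the diagnostic fires.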

However, their change re-opens the race condition.  My fix was to
change setrun(), since I don't know what other nasty things might
be done in other procedures which call it.  It involves putting an
spl6() around slightly more code and checking for the presence of
the race.  It causes the priority to be high for slightly longer,
but I believe it is better than the original V7 code.
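
A sketch of a fix along those lines, under stated assumptions.  It is
not the actual change (the code is not shown in this post): treating
p_stat == SRUN as the sign that the race was lost, the ipl variable
modelling interrupt priority, and the pending_intr hook standing in
for a device interrupt are all hypothetical.

```c
#include <stddef.h>

enum { SSLEEP = 1, SRUN = 3 };       /* illustrative values */

struct proc {
    int p_stat;
    struct proc *p_link;             /* next on run queue (assumed) */
};

static struct proc *runq;
static int dup_count;                /* duplicate insertions seen */
static int ipl;                      /* simulated priority level */
static void (*pending_intr)(void);   /* simulated pending interrupt */

static int spl6(void) { int old = ipl; ipl = 6; return old; }
static void splx(int s) { ipl = s; }

/* Deliver the pending interrupt unless masked at level 6. */
static void take_interrupt(void)
{
    if (ipl < 6 && pending_intr != NULL) {
        void (*handler)(void) = pending_intr;
        pending_intr = NULL;
        handler();
    }
}

static void setrq(struct proc *p)
{
    struct proc *q;
    int s = spl6();

    for (q = runq; q != NULL; q = q->p_link) {
        if (q == p) {
            dup_count++;
            splx(s);
            return;
        }
    }
    p->p_link = runq;
    runq = p;
    splx(s);
}

static void setrun(struct proc *p)
{
    int s;

    take_interrupt();                /* may still fire before spl6 */
    s = spl6();                      /* raise priority earlier ... */
    if (p->p_stat == SRUN) {         /* ... and check for the race */
        splx(s);
        return;                      /* someone else already queued p */
    }
    take_interrupt();                /* masked now: does nothing */
    p->p_stat = SRUN;
    setrq(p);
    splx(s);
}

static void psignal(struct proc *p)
{
    if (p->p_stat == SSLEEP)
        setrun(p);
}

static struct proc victim;

/* Interrupt-side wakeup: makes the sleeper runnable and queues it. */
static void wake_victim(void)
{
    victim.p_stat = SRUN;
    setrq(&victim);
}
```

The state is re-examined only after priority has been raised, so an
interrupt that slips in after psignal()'s check is detected rather
than leading to a second setrq() on the same process.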

I have installed this fix just this week, and we haven't seen the
problem again, although it is pretty rare and heavily dependent
on timing conditions, so I can't vouch for the perfection of my
fix.  But if anyone wants the code, let me know.

				Dave Edwards
				FileNet Corp.
				Costa Mesa, Calif.
		{ucbvax,decvax,ihnp4,sdcrdcf}!trwrb!felix!edwards