[net.bugs.4bsd] setrq panic

root@ascvax.UUCP (vax) (10/03/85)

I am running 4.2 BSD on a VAX 750.  About once a day, our
system will crash with a setrq panic.  Sometimes the update()
from panic succeeds; sometimes it fails with a panic from
sleep.  The dump always indicates that the kernel has called
setrq from syscall after finding runrun set.  The panic is
supposed to indicate that p_rlink is nonzero for the proc structure
passed to setrq.  In fact, the proc referenced by u.u_procp
does have p_rlink set, and this is the proc passed by syscall to setrq.
Since the parameter to setrq is not placed on the stack, I have
not been able to find it in the dump (yet).  One bizarre aspect
to the whole thing is that syscall is always just finishing up
with a getpagesize() SVC!
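
For readers unfamiliar with the check behind this panic, here is a
minimal user-space sketch of it.  It is not the kernel's code: the
cut-down struct proc, the queue header qhead, and the functions
model_setrq and model_remrq are hypothetical names for illustration.
The sketch assumes only what is stated above, namely that a nonzero
p_rlink marks a proc that is already on a run queue, and it returns
-1 at the point where the kernel would panic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's struct proc. */
struct proc {
	struct proc *p_link;	/* next entry on the run queue */
	struct proc *p_rlink;	/* previous entry; NULL when not queued */
	int p_pid;
};

/* Circular queue header, initially linked to itself. */
static struct proc qhead = { &qhead, &qhead, -1 };

/*
 * Model of the insert-time check.  Returns 0 on success and -1 in the
 * case where the kernel would call panic("setrq"): the proc claims to
 * be on a run queue already.
 */
int model_setrq(struct proc *p)
{
	if (p->p_rlink != NULL)
		return -1;
	/* Append at the tail of the circular list. */
	p->p_link = &qhead;
	p->p_rlink = qhead.p_rlink;
	qhead.p_rlink->p_link = p;
	qhead.p_rlink = p;
	return 0;
}

/* Unlink a queued proc and clear its links so it can be queued again. */
void model_remrq(struct proc *p)
{
	assert(p->p_rlink != NULL);	/* must currently be queued */
	p->p_rlink->p_link = p->p_link;
	p->p_link->p_rlink = p->p_rlink;
	p->p_link = NULL;
	p->p_rlink = NULL;
}
```

In this model, calling model_setrq twice on the same proc without an
intervening model_remrq produces the failure case, which is one way
(not necessarily the only way) the state seen in the dump could arise.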

Other miscellaneous data:  The system has 2 Ampex Capricorn Disk Drives,
an SI disk controller, an Ampex tape drive, a line printer, and 4
DZ's.  The disk controller is attached to the MASSBUS and the rest
of the peripherals are attached to the UNIBUS.  We have an out-of-rev
backplane.  There is no evidence implicating it directly, but until
the problem is solved we can't rule it out.

I would greatly appreciate any advice as to what the problem
might be or what to look at or what to look for.

If there is interest, I will post results to the net.

			Thanks in advance,
			Pat Keziah

			hao!ascvax!pjk

			Ampex Switcher Company
			10604 West 48th Avenue
			Wheatridge,  CO  80033
			USA

			(303)423-1300 x226

chris@umcp-cs.UUCP (Chris Torek) (10/04/85)

I cannot guess the cause of your setrq panic, but you can prevent
the secondary panic and subsequent update failure by altering the
sleep code.  Add the "if (panicstr) {" ... "}" part:

sleep(chan, pri)
	...
	s = spl6();
	if (panicstr) {
		/*
		 * Let interrupts in for a moment, then just return.
		 * The splnet() really ought to be spl0(), but I'm
		 * too timid to do that.
		 */
		(void) splnet();
		splx(s);
		return;
	}
	if (chan == 0 || rp->p_stat != SRUN || rp->p_rlink)
		panic("sleep");
	...
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

pjk@ascvax.UUCP (Pat Keziah) (10/18/85)

In article <102@ascvax.UUCP>, I described a setrq panic I am getting
regularly from a VAX 750 running 4.2 BSD UNIX.  In that article,
I claimed that the system was always in the midst of servicing
a getpagesize() SVC.  Well ... (blush) ... while the system
is indeed consistently servicing the same SVC, that SVC is vfork(),
not getpagesize().  The SVC number is 66 (decimal) in all my notes
and all my dumps.  Somehow, I got confused in tracing through syscall
and sysent.
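
Decoding an SVC number by hand is easy to get wrong, so a small
lookup helper can make the mapping explicit.  The following is only
a sketch: the table sc_names and the function sc_lookup are
hypothetical, not part of the kernel.  Entry 66 comes from the dumps
described here; entry 64 for getpagesize is an assumption and should
be checked against the sysent table on the system in question.

```c
#include <stddef.h>

/* Hypothetical name table; only the entries relevant here are listed. */
struct scname {
	int code;
	const char *name;
};

static const struct scname sc_names[] = {
	{ 64, "getpagesize" },	/* assumption: verify against your sysent */
	{ 66, "vfork" },	/* per the dumps described in this article */
};

/* Return the name recorded for an SVC number, or "unknown". */
const char *sc_lookup(int code)
{
	size_t i;

	for (i = 0; i < sizeof(sc_names) / sizeof(sc_names[0]); i++)
		if (sc_names[i].code == code)
			return sc_names[i].name;
	return "unknown";
}
```

With this table, sc_lookup(66) yields "vfork", matching the corrected
reading of the dumps.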

Anyway, does a problem with vfork ring any new bells, or suggest
anything to anyone?  I am continuing my own investigation and
look forward to any suggestions.

Thanks in advance,
Pat Keziah
-- 

Pat Keziah				Ampex Switcher Company
{hao,boulder,avsdS}!ascvax!pjk		10604 West 48th Avenue
					Wheatridge,  CO  80033
					USA
					(303)423-1300 x226