root@ascvax.UUCP (vax) (10/03/85)
I am running 4.2 BSD on a VAX 750. About once a day, our system
crashes with a setrq panic. Sometimes the update() from panic
succeeds; sometimes it fails with a panic from sleep.

The dump always indicates that the kernel has called setrq from
syscall after finding runrun set. The panic is supposed to indicate
that p_rlink is nonzero for the proc structure passed to setrq. In
fact, the proc referenced by u.u_procp does have p_rlink set, and
this is the proc passed by syscall to setrq. Since the parameter to
setrq is not placed on the stack, I have not been able to find it
in the dump (yet).

One bizarre aspect of the whole thing is that syscall is always
just finishing up with a getpagesize() SVC!

Other miscellaneous data: The system has 2 Ampex Capricorn disk
drives, an SI disk controller, an Ampex tape drive, a line printer,
and 4 DZ's. The disk controller is attached to the MASSBUS and the
rest of the peripherals are attached to the UNIBUS. We have an
out-of-rev backplane. We have no evidence implicating it directly,
but of course, until the problem is solved, we can't be sure.

I would greatly appreciate any advice as to what the problem might
be, what to look at, or what to look for. If there is interest, I
will post results to the net.

Thanks in advance,
Pat Keziah
hao!ascvax!pjk
Ampex Switcher Company
10604 West 48th Avenue
Wheatridge, CO 80033
USA
(303)423-1300 x226
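For readers following the p_rlink discussion, here is a minimal user-space sketch of the invariant the post describes: a proc may only be placed on the run queue if its p_rlink is zero. This is not the 4.2 BSD kernel source. The names struct qproc, runq_init, model_setrq, and model_remrq are hypothetical, and the single circular queue is an assumption made to keep the model small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two queue links of a proc entry. */
struct qproc {
    struct qproc *p_link;   /* forward link on the run queue  */
    struct qproc *p_rlink;  /* backward link on the run queue */
};

/* An empty circular queue: the header points at itself both ways. */
void runq_init(struct qproc *head)
{
    assert(head != NULL);
    head->p_link = head;
    head->p_rlink = head;
}

/*
 * Model of the check described in the post: refuse to queue a proc
 * whose p_rlink is already nonzero. Returns 0 on success and -1
 * where, per the post, the kernel would panic with "setrq".
 */
int model_setrq(struct qproc *head, struct qproc *p)
{
    assert(head != NULL && p != NULL);
    if (p->p_rlink != NULL)
        return -1;              /* already looks queued */
    /* Insert at the tail of the circular list. */
    p->p_link = head;
    p->p_rlink = head->p_rlink;
    head->p_rlink->p_link = p;
    head->p_rlink = p;
    return 0;
}

/* Unlink a queued proc and clear its links so it can be queued again. */
void model_remrq(struct qproc *p)
{
    assert(p != NULL && p->p_rlink != NULL);
    p->p_rlink->p_link = p->p_link;
    p->p_link->p_rlink = p->p_rlink;
    p->p_link = NULL;
    p->p_rlink = NULL;
}
```

In this model, calling model_setrq twice on the same entry without an intervening model_remrq returns -1 on the second call, which corresponds to the situation reported in the dump: the proc handed to setrq already has p_rlink set.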
chris@umcp-cs.UUCP (Chris Torek) (10/04/85)
I cannot guess the cause of your setrq panic; but you can prevent
the secondary panic and subsequent update failure by altering the
sleep code. Add the "if (panicstr) {" ... "}" part:
	sleep(chan, pri)
		...
		s = spl6();
		if (panicstr) {
			/*
			 * Let interrupts in for a moment, then just return.
			 * The splnet() really ought to be spl0(), but I'm
			 * too timid to do that.
			 */
			(void) splnet();
			splx(s);
			return;
		}
		if (chan == 0 || rp->p_stat != SRUN || rp->p_rlink)
			panic("sleep");
		...
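The effect of the added guard can be sketched in user space as follows. This is only a model under stated assumptions, not kernel code: model_sleep and model_panic are hypothetical names, the value chosen for SRUN is arbitrary, the interrupt priority calls (spl6, splnet, splx) are omitted entirely, and the return codes exist only so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

#define SRUN 3  /* arbitrary value for this model only */

/* Hypothetical stand-in for the proc fields the sleep check reads. */
struct proc {
    int p_stat;
    struct proc *p_rlink;
};

const char *panicstr = NULL;  /* set once a panic is in progress */
int panic_count = 0;          /* how many times the model panicked */

/* Record the first panic message and count every panic. */
void model_panic(const char *msg)
{
    if (panicstr == NULL)
        panicstr = msg;
    panic_count++;
}

/*
 * Returns  0 if a panic is already in progress (guard taken),
 *          1 if the caller would go on to sleep normally,
 *         -1 if the sanity check failed and the model panicked.
 */
int model_sleep(struct proc *rp, const void *chan)
{
    assert(rp != NULL);
    if (panicstr != NULL)
        return 0;   /* the added guard: just return during a panic */
    if (chan == NULL || rp->p_stat != SRUN || rp->p_rlink != NULL) {
        model_panic("sleep");
        return -1;
    }
    return 1;
}
```

With the guard in place, a proc whose p_rlink is still set triggers at most one panic in this model; later calls made while panicstr is set return immediately instead of panicking again, which is the secondary panic the change is meant to avoid.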
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
pjk@ascvax.UUCP (Pat Keziah) (10/18/85)
In article <102@ascvax.UUCP>, I described a setrq panic I am getting
regularly from a VAX 750 running 4.2 BSD UNIX. In that article,
I claimed that the system was always in the midst of servicing
a getpagesize() SVC. Well .. (blush) ... while the system
is indeed consistently servicing the same SVC, that SVC is vfork(),
not getpagesize(). The SVC number is 66 (decimal) in all my notes
and all my dumps. Somehow, I got confused in tracing through syscall
and sysent.
Anyway, does a problem with vfork ring any new bells, or suggest
anything to anyone? I am continuing my own investigation and
look forward to any suggestions.
Thanks in advance,
Pat Keziah
--
Pat Keziah Ampex Switcher Company
{hao,boulder,avsdS}!ascvax!pjk 10604 West 48th Avenue
Wheatridge, CO 80033
USA
(303)423-1300 x226