[net.bugs.2bsd] Inexplicable 2.9BSD crashes

jwp@uwmacc.UUCP (jeffrey w percival) (05/27/85)

In my previous posting I neglected to mention console messages...

For the large majority of crashes, there is no console message.
Nothing at all.  In those cases I type a <cntl/P> H and then reboot.

For one of the crashes, though, there was this:

	ka6 = 7545
	aps = 141374
	pc = 343 ps = 110
	cpuerr = 160
	trap type 11
	panic: trap
	syncing disks...

This problem is really severe for us.  It *does* seem to be related
to CPU load; today I was doing a MAKEALL in /usr/src and crashed twice
(the mysterious no-message type).  If you have any thoughts about this,
even shots in the dark, please mail them to me.

-- 
	Jeff Percival ...!uwvax!uwmacc!jwp

jwp@uwmacc.UUCP (jeffrey w percival) (05/29/85)

Our crashes haven't abated, but many responses have increased my
awareness of the nature of our problem.  Let me summarize:
We have an 11/70, and are a brand new 2.9 site.  That is, we got
the 2.9BSD tape, followed the installation guide, recompiled a
local kernel, and have applied no patches of any kind to the
distributed source files.  Our changes to localopts.h were trivial;
2 DH's, etc.  No changes in subtle (to me) things like stack size.

It appears that there has been a lot of 2.9 analysis and debugging,
but because I'm new on the net (through this vax account), I've not
had the benefit of any previous interactions.  Some people have
sent me mail mentioning "segmentation code", xp and ht driver bugs
(we use both drivers), and so on, and I've asked a few respondents for more
info, but has anyone maintained a list of the key fixes I have to make
to keep from crashing two or three times a day?

Thanks for helping me on this!


-- 
	Jeff Percival ...!uwvax!uwmacc!jwp

johanw@ttds.UUCP (Johan Wide'n) (05/31/85)

One thing to watch out for is a bad /etc/init. /etc/init as delivered
to us did not reset several signals before execing other programs.

This bug was pointed out by Greg (ihnp4!inuxc!isrnix!greg):

>Received: by isrnix.UUCP; Wed, 13 Jun 84 18:03:38 EST
>To: BERKELEY!2bsd-people
>Subject: 2.9 problem
>
>
>   I'm having a problem with 2.9 - I am getting spurious signals sent
>to all the processes - the signal is signal 9 (I'm almost sure) and
>init (proc 1) does not get it (although everything else does).  Has
>anyone else seen this?  I have VFORK and MENLO_JCL turned on f.y.i.
>Any ideas?
---------------
>Received: by isrnix.UUCP; Thu, 14 Jun 84 18:11:15 EST
>To: BERKELEY!2bsd-people
>Subject: My earlier message about interrupts.
>
>
>  Found the culprit -
>
> There is a bug in our distributed version of init.c.  In dofork and
>runcom you should add the signal calls
>
>                signal(SIGINT, SIG_DFL);
>                        and
>                signal(SIGTERM, SIG_DFL);
>
>Greg

So: check out your /usr/src/cmd/init.c.
Here is a context diff:
*** init.c.org	Wed May 18 20:54:04 1983
--- init.c	Fri Apr 26 16:13:28 1985
***************
*** 201,206
  
  	pid = fork();
  	if(pid == 0) {
  		open("/", 0);
  		dup(0);
  		dup(0);

--- 201,208 -----
  
  	pid = fork();
  	if(pid == 0) {
+ 		signal(SIGTERM, SIG_DFL);
+ 		signal(SIGINT, SIG_DFL);
  		open("/", 0);
  		dup(0);
  		dup(0);
***************
*** 205,210
  		dup(0);
  		dup(0);
  #ifdef	UCB_AUTOBOOT
  		if ((howto & RB_SINGLE) || (howto & RB_NOFSCK))
  			arg1 = "fastboot";
  		else

--- 207,213 -----
  		dup(0);
  		dup(0);
  #ifdef	UCB_AUTOBOOT
+ 		signal(SIGQUIT, SIG_DFL);
  		if ((howto & RB_SINGLE) || (howto & RB_NOFSCK))
  			arg1 = "fastboot";
  		else
***************
*** 413,418
  	pid = fork();
  	if(pid == 0) {
  		signal(SIGTERM, SIG_DFL);
  		signal(SIGHUP, SIG_IGN);
  		strcpy(tty, dev);
  		strncat(tty, p->line, LINSIZ);

--- 416,425 -----
  	pid = fork();
  	if(pid == 0) {
  		signal(SIGTERM, SIG_DFL);
+ 		signal(SIGINT, SIG_DFL);
+ #ifdef	UCB_AUTOBOOT
+ 		signal(SIGQUIT, SIG_DFL);
+ #endif
  		signal(SIGHUP, SIG_IGN);
  		strcpy(tty, dev);
  		strncat(tty, p->line, LINSIZ);
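
What the diff enforces comes down to one rule: in the child, between fork() and exec, put the signals the patch touches back to their default actions. The following is a minimal sketch of that rule on a present-day POSIX system. It is not the 2.9BSD code. The helper name run_with_default_signals is hypothetical, and the sketch assumes, as the patch does, that SIGINT, SIGTERM and SIGQUIT are the signals that need resetting. The underlying POSIX behavior is that a signal set to SIG_IGN stays ignored across exec, while a caught signal reverts to SIG_DFL.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from init.c): fork, reset SIGINT, SIGTERM and
 * SIGQUIT in the child to their default actions, then exec `path`.
 * Returns the child's wait status, or -1 if fork() or waitpid() fails.
 * A failed exec shows up as a normal exit with status 127.
 */
int run_with_default_signals(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: undo any SIG_IGN inherited from the parent, since
           ignored signals would otherwise stay ignored across exec. */
        signal(SIGINT, SIG_DFL);
        signal(SIGTERM, SIG_DFL);
        signal(SIGQUIT, SIG_DFL);
        execv(path, argv);
        _exit(127);             /* only reached if execv failed */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

Doing the reset in the child rather than in the parent leaves the parent's own signal handling untouched. This matches where the diff places its signal() calls, inside the if(pid == 0) branch. A caller that itself ignores SIGTERM can still start a program with this helper, and that program will remain killable by SIGTERM.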

	johanw@ttds             Johan Widen