[net.unix-wizards] 4.2 crashes

root%cit-vax@sri-unix.UUCP (10/25/83)

From:  Root of All Evil <root@cit-vax>

    We just installed 4.2bsd and have since been getting crashes every
few hours. These are usually trap type 9 or 8, i.e. illegal memory
references, and are for the most part in the clist manipulation routines
(as evidenced by the "pc =" part of the trap message). Has anyone else
met with these problems in putting up 4.2bsd?
    Another, possibly related, problem is that every once in a while a
terminal (on a dz11) will just hang. "pstat -t" says that it is
"awaiting output". Killing the process on the terminal leaves it
<exiting>, presumably waiting for the terminal in a close. It isn't the
dz because it happens on several of them.
    Are these related? Might it be hardware? Any ideas would be
appreciated. I have a hard time believing that 4.2 would be distributed
with such glaring bugs, but my hardware worked fine under 4.1.

		    * Eric Holstege  Caltech, Pasadena, CA.
		    * eric@cit-vax   eric@cit-20

sam@ucbvax.UUCP (10/30/83)

I have spoken to a number of people running 4.2 and none
have experienced the problem mentioned, nor have any bug
reports been mailed to Berkeley regarding such a problem
(I'm no longer there to get phone calls, so I wouldn't know
if you called).  Since you presumably have a crash dump,
you should be able to give more information than
"it crashes with a trap 8/9" and/or "lots of dz's hang".  Even
differentiating between trap type 8 and 9 would be useful.
It is almost always a simple matter to get a stack trace
from a crash dump.  Try the following after your system
reboots and savecore has recorded the dump:
	% cd /usr/crash		(or similar)
	% adb -k vmunix.XX vmcore.XX
	....
	*(scb-4)$c
This should give you a stack trace.  If it doesn't, then
you have to search for the stack frame on the interrupt or
system stack (there should always be a recognizable frame
on the interrupt stack if a dump was performed); once again
not too difficult, but more than I care to discuss here.
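
To illustrate why telling trap type 8 from 9 matters, here is a minimal
sketch in C of a lookup from the trap number in the panic message to a
descriptive name. The table is an assumption: it reflects the VAX trap
numbering as commonly documented for 4.2bsd (8 = segmentation fault, a
reference beyond the mapped length; 9 = protection fault, an access the
page protection forbids) and should be checked against the trap.h on
your own system. The names trap_names and trap_name are hypothetical and
appear nowhere in the kernel.

```c
#include <stddef.h>

/*
 * ASSUMPTION: VAX trap type numbering as recalled from the 4.2bsd
 * sources; verify against your system's trap.h before relying on it.
 */
static const char *const trap_names[] = {
    "Reserved addressing fault",    /* 0 */
    "Privileged instruction fault", /* 1 */
    "Reserved operand fault",       /* 2 */
    "BPT instruction fault",        /* 3 */
    "XFC instruction fault",        /* 4 */
    "Syscall trap",                 /* 5 */
    "Arithmetic trap",              /* 6 */
    "AST trap",                     /* 7 */
    "Segmentation fault",           /* 8 */
    "Protection fault",             /* 9 */
    "Trace trap",                   /* 10 */
    "Compatibility mode fault",     /* 11 */
};

/* Hypothetical helper: map a trap type number to a readable name. */
const char *trap_name(unsigned type)
{
    size_t n = sizeof trap_names / sizeof trap_names[0];

    return type < n ? trap_names[type] : "unknown trap type";
}
```

Under these assumptions, trap_name(8) and trap_name(9) give two different
fault names, which is the distinction worth including in a bug report
along with the "pc =" value.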

kolstad@parsec.UUCP (11/02/83)

#R:sri-arpa:-1296500:parsec:44200014:000:147
parsec!kolstad    Oct 31 20:59:00 1983

The guys at SMU have had these identical problems.  I assumed it was
their brand new (and hence unproven) hardware.  Can anyone help?

						Rob

kolstad@parsec.UUCP (11/17/83)

#R:sri-arpa:-1296500:parsec:44200015:000:515
parsec!kolstad    Nov 16 11:58:00 1983

I should have posted a reply earlier.

As I understand it, a half dozen or so "erroneous tapes" were mailed out
with a cleverer-but-defective routine to save some time calling a clock
routine.  I received almost instant help by calling Berkeley and had
the SMU machine fixed posthaste by changing less than a dozen lines of
code.

Since those fixes (which almost everyone else received in their standard
distribution), the SMU machine has neither crashed nor hung a DZ.

Good stuff, that BSD distribution.

					Rob