[net.unix-wizards] System V is Crashing

mlip@nadc.ARPA (08/13/85)

A colleague of mine is trying to bring up AT&T's UNIX System V on a VAX
11/780.  Periodically, the system will crash and halt the CPU with the
following message:

Machine software error: Protection violation on interrupt stack.

AT&T insists that the crash is due to a hardware error.  However, this
same VAX can run UNIX 4.1BSD and VMS with no problems.  Any clues to the
cause of this problem would be immensely appreciated.

Thank You,
mlip@nadc (Michael Lipczynski)

phil@qfdts.oz (Phil Chadwick) (08/23/85)

In article <666@brl-tgr.ARPA> mlip@nadc.ARPA (Michael Lipczynski) says:

    >A colleague of mine is trying to bring up AT&T's UNIX System V on a VAX
    >11/780.  Periodically, the system will crash and halt the CPU with the
    >following message:
    >
    >Machine software error: Protection violation on interrupt stack.

Our VAX 750 runs System V release 2 version 2.  I installed a new serial
printer and the system crashed (with a protection violation as above)
every time something was spooled to it.  I was experimenting with the
spooler interface program for this printer and changing the tty line
parameters with ioctl to some unusual settings.  From memory, they were:

	tio.c_iflag = IXON;
	tio.c_oflag = OPOST|ONLCR|NL1|CR3|TAB3|BS1|VT1|FF1;
	tio.c_cflag = B9600|CS8|CREAD;
	tio.c_lflag = 0;

By using crash(1m) I discovered zillions of entries in the callout array
which were to restart the printer's tty driver.  I can't remember the
exact circumstances, but I know that the callouts looked suspect and the
use of fill characters (add OFILL to tio.c_oflag above) for delays rather
than callouts cured the problem immediately.
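
For illustration, the settings above with the OFILL workaround might be
written as follows.  This is only a sketch under stated assumptions: it
uses the POSIX termios interface rather than the System V termio
structure the original code would have used, and the helper name
printer_modes is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* assumption: needed on some systems for the delay flags */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <termios.h>

/*
 * Hypothetical helper: fill in the line modes quoted in the post.
 * If use_fill is nonzero, OFILL is added so that delays are requested
 * as fill characters instead of timed delays.
 */
void printer_modes(struct termios *tio, int use_fill)
{
	assert(tio != NULL);
	memset(tio, 0, sizeof *tio);

	tio->c_iflag = IXON;
	tio->c_oflag = OPOST|ONLCR|NL1|CR3|TAB3|BS1|VT1|FF1;
	if (use_fill)
		tio->c_oflag |= OFILL;
	tio->c_cflag = CS8|CREAD;
	tio->c_lflag = 0;

	/* Portable way to say B9600; the original OR'ed it into c_cflag. */
	cfsetispeed(tio, B9600);
	cfsetospeed(tio, B9600);
}
```

On an open tty the result would be applied with tcsetattr(fd, TCSANOW,
&tio); on System V of that era the equivalent would have been an ioctl
with TCSETA on a struct termio.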

----
Phil Chadwick		  Australia: (07) 2296500
Department of Forestry	  International: +61 7 2296500
PO Box 5		  ACSnet: phil@qfdts.oz
Brisbane, Roma Street	  ARPA: decvax!mulga!qfdts.oz!phil@UCB-VAX.ARPA
AUSTRALIA	4001	  UUCP: {decvax,vax135,eagle,pesnta}!mulga!qfdts.oz!phil

alan@drivax.UUCP (Alan Fargusson) (08/27/85)

> By using crash(1m) I discovered zillions of entries in the callout array
> which were to restart the printer's tty driver.  I can't remember the
> exact circumstances, but I know that the callouts looked suspect and the
> use of fill characters (add OFILL to tio.c_oflag above) for delays rather
> than callouts cured the problem immediately.

I had forgotten about that one.  There is a bug in the DZ driver (dz.c),
in dzproc.  Under case T_OUTPUT there is a line like:
		if (tp->t_state & (BUSY|TTSTOP))

It should be:
		if (tp->t_state & (TIMEOUT|BUSY|TTSTOP))

I told someone at AT&T about it over a year ago, but I don't think they
wanted to know.
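
A toy model may show why the missing TIMEOUT bit could fill the callout
table.  It is not the dz.c source: the flag values, the table size NCALL
and every toy_ name are hypothetical, and it assumes that starting
output on a delay character sets TIMEOUT and queues one restart callout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the real ones are in the System V tty headers. */
#define TIMEOUT	01
#define BUSY	02
#define TTSTOP	04

#define NCALL	8	/* hypothetical, deliberately tiny callout table */

struct toy_tty {
	int t_state;
	int pending_delays;	/* delay markers still queued for output */
};

int ncallout;			/* callout slots currently in use */

/* Stand-in for timeout(): returns -1 where the table would overflow. */
int toy_timeout(void)
{
	if (ncallout >= NCALL)
		return -1;
	ncallout++;
	return 0;
}

/* Model of the T_OUTPUT case; mask is the set of bits that hold off output. */
int toy_output(struct toy_tty *tp, int mask)
{
	assert(tp != NULL);
	if (tp->t_state & mask)
		return 0;		/* output held off, nothing queued */
	if (tp->pending_delays > 0) {
		tp->pending_delays--;
		tp->t_state |= TIMEOUT;	/* a delay is now in progress */
		return toy_timeout();	/* queue the restart callout */
	}
	return 0;
}

/* Call the output routine 'calls' times before any callout has fired. */
int toy_run(int mask, int calls)
{
	struct toy_tty tp = { 0, 0 };
	int i;

	tp.pending_delays = calls;
	ncallout = 0;
	for (i = 0; i < calls; i++)
		if (toy_output(&tp, mask) < 0)
			break;
	return ncallout;
}
```

In this model toy_run(BUSY|TTSTOP, 100) uses every slot in the table,
while toy_run(TIMEOUT|BUSY|TTSTOP, 100) queues a single callout.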
-- 

Alan Fargusson.

{ ihnp4, amdahl, mot }!drivax!alan