[net.unix-wizards] 4.3 hangs

stevens@hsi.UUCP (Richard Stevens) (07/31/86)

We just installed 4.3 BSD on an 11/785 (that has been running 4.2
for over 2 years without any problems).  Occasionally the entire
system will "hang" - it will complete jobs that are running,
the disk is being sync'ed, and it will echo characters on the
terminals.  However, the characters that are echo'ed are just
ignored and you aren't able to run any new jobs on the system.
We are then forced to reboot, and since it didn't panic, we don't
get a crash dump to look at, to try and find out what was going on.

Has anyone seen this before ??  We have heard from one 4.3 beta
test site that encountered this on an 8600, but they would then
reset their Systems Industries Massbus controller (for Eagles),
and haven't had the problem since then.  We do have 2 Eagles on an
Emulex SC780 and a lot of DMF terminal lines.  We did have to go back
to our 4.2 DMF driver (the one from Chris Maloney at MDDC), since the
4.3 DMF driver didn't work with our Emulex CS21/F and CS32/F.
Does anyone know how to force the kernel to generate a crash dump
from the console ??
Any ideas or suggestions would be appreciated.

	Richard Stevens
	Health Systems International, New Haven, CT
           ihnp4 ! hsi ! stevens

muller@sdcc7.ucsd.EDU (Keith Muller) (08/02/86)

I had that problem also. It is bogus microcode in the cs21/f2 version. The
board goes into an infinite loop while it has the unibus doing dma output.
The reason it never shows up on 4.2 is that driver did not do dma on output.
Next time it happens flip reset switch (position 1) on the cs21 from normal
to reset and back. When the offending dmf is found the system will come
back to life. Of course the dmfs are now in a weird state, but at least you
can shut down cleanly. Bug emulex for a prom upgrade.

	keith Muller
	university of california
	muller@sdcsvax.ucsd.edu

terryl@tekcrl.UUCP (08/03/86)

In article <460@sdcc7.ucsd.EDU> muller@sdcc7.ucsd.EDU (Keith Muller) writes:
>
>I had that problem also. It is bogus microcode in the cs21/f2 version. The
>board goes into an infinite loop while it has the unibus doing dma output.
>The reason it never shows up on 4.2 is that driver did not do dma on output.
>Next time it happens flip reset switch (position 1) on the cs21 from normal
>to reset and back. When the offending dmf is found the system will come
>back to life. Of course the dmfs are now in a weird state, but at least you
>can shut down cleanly. Bug emulex for a prom upgrade.

     Another UNIBUS board that can do the same thing as described above is the
Interlan NM10 ethernet card. In many places the driver issues a command to the
board and, instead of letting the board interrupt the CPU when the command
completes, sits in a null loop waiting for the ready bit to come true. If the
ready bit never comes true, it sits in that loop forever.
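
The failure mode described above, a driver spinning on a ready bit that never
comes true, can be bounded with a spin limit. What follows is a minimal C
sketch, not the Interlan driver's code: the status bit FAKE_READY, the limit
MAX_SPINS, and the helper wait_ready() are all hypothetical names, and a real
driver would be reading a memory-mapped UNIBUS register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status bit; the real board's register layout is not
 * given in this thread. */
#define FAKE_READY 0x0080u

/* Hypothetical spin limit; a real driver would tune this to the
 * board's worst-case command time. */
#define MAX_SPINS 100000L

/*
 * Poll *csr until the ready bit is set or the spin budget runs out.
 * Returns 0 on success, -1 on timeout, so the caller can report the
 * failure and reset the board instead of hanging the whole system.
 */
int wait_ready(volatile const uint16_t *csr, long max_spins)
{
    long i;

    assert(max_spins > 0);
    for (i = 0; i < max_spins; i++) {
        if (*csr & FAKE_READY)
            return 0;
    }
    return -1;
}
```

A caller would test the result, for example
`if (wait_ready(csr, MAX_SPINS) < 0)`, and then print a message and give up on
the command rather than spin forever.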

     A good thing to do when the system just "seems to hang" is to try to
halt it. If you can halt the system, take a look at the PC
when it halted. This is how we found out about the Interlan card hanging the
system. The PC was always in the routine ilrint().
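
Matching a halted PC to a routine amounts to finding the kernel symbol with
the largest address that does not exceed the PC, given a symbol list sorted by
address (for example, sorted nm output for /vmunix). A minimal C sketch of
that lookup follows. The helper pc_to_symbol() is hypothetical, and every
address in the example table, including the one shown for _ilrint, is invented
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ksym {
    uint32_t    addr;   /* symbol value as printed by nm */
    const char *name;
};

/*
 * Return the name of the symbol with the largest address <= pc,
 * or NULL if pc lies below the first symbol.  The table must be
 * sorted by ascending address.
 */
const char *pc_to_symbol(const struct ksym *tab, size_t n, uint32_t pc)
{
    const char *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Precondition: ascending order. */
        assert(i == 0 || tab[i - 1].addr <= tab[i].addr);
        if (tab[i].addr > pc)
            break;
        best = tab[i].name;
    }
    return best;
}

/* Invented addresses, for illustration only. */
static const struct ksym example_syms[] = {
    { 0x80001000u, "_example_a" },
    { 0x80002400u, "_ilrint"    },
    { 0x80002600u, "_example_b" },
};
```

With this table, a halted PC of 80002410 falls between the second and third
entries, so the lookup returns "_ilrint".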

rogers@dadla.UUCP (Roger Southwick) (08/06/86)

In article <395@hsi.UUCP> stevens@hsi.UUCP (Richard Stevens) writes:
> ...
>
> Does anyone know how to force the kernel to generate a crash dump
> from the console ??

Sure...
    1) Sometime in multi-user, run the following command:

	$ nm /vmunix | grep doadump
	80000e00 T _doadump

	0E00 is the address of the start of the dump routine
	(on my 780 running 4.3BSD.  Your address may be different)

    2) When you need to get a crash dump (assumes VAX 11/780 or 11/785):

	    a) Halt the system (don't do any other console commands, except:)
	    b) S 0E00
	       (system does dump, and halts)
	    c) Boot as normal

	
    If you want to look at the crash in single user rather than
    multi, you can boot the system into single user, and then do:

	$ /etc/savecore /usr/crash
    
    This is what happens in /etc/rc, and will move the core file from
    the swap disk (where doadump puts it) into /usr/crash.
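
In the example above, nm prints 80000e00 while the console command uses 0E00.
One reading, which is an assumption here and not something the post states, is
that the console address is the symbol value with the high system-space bit
(0x80000000) cleared. A minimal C sketch of that arithmetic, using a
hypothetical helper console_addr():

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed base of kernel (system) space in nm's symbol values. */
#define SYSTEM_SPACE_BASE 0x80000000u

/*
 * Parse the leading hex value of an nm output line such as
 * "80000e00 T _doadump" and clear the system-space bit.
 * Returns 0 on success, -1 if the line has no leading hex number.
 */
int console_addr(const char *nm_line, uint32_t *out)
{
    char *end;
    unsigned long v;

    assert(nm_line != NULL && out != NULL);
    v = strtoul(nm_line, &end, 16);
    if (end == nm_line)
        return -1;
    *out = (uint32_t)v & ~SYSTEM_SPACE_BASE;
    return 0;
}
```

For the line quoted in step 1 this yields 0xE00, matching the address given to
the console in step 2b. Check the result against your own nm output before
relying on it.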


Good luck finding your problem.

	-Roger Southwick

	Logic Analyzers Computer Resource Group
	Tektronix, Inc.

	rogers@dadla.tek.com

dlu%tektools.tek.csnet@CSNET-RELAY.arpa (08/06/86)

The problem you describe has not shown up, to my recollection, on the
785 here.  But I do know how to get a dump from the console.  Proceed
thusly:

	^P
	HALT
	START <address of doadump()>

doadump() is the routine in the kernel that produces dumps on panics
and you can restart the kernel there when the system hangs.  To find
the address of doadump() use nm(1).  We have configured all of the
systems at Tektronix so that they have doadump() at the same address
(0xe00) which makes life a bit simpler.

Doug Urner, UNIX Systems Support Group, Tektronix

chris@umcp-cs.UUCP (08/10/86)

In article <2836@brl-smoke.ARPA> dlu%tektools.tek.csnet@CSNET-RELAY.arpa writes:
>The problem you describe has not shown up, to my recollection, on the
>785 here.  But I do know how to get a dump from the console [...] :
>
>	^P
>	HALT
>	START <address of doadump()>

This does indeed provide a crash dump.  However, it is almost always
preferable to force an actual crash:

	^P
	H
	D PC -1		! N.B.: `@crash' does these three commands for you
	D PSL -1	! (and perhaps the HALT as well).  But this works
	CO		! even if you have no file `crash' on your floppy.

or, on a 750,

	D/G F FFFFFFFF
	D P FFFFFFFF
	C

This invalidates the PC, sets the PSL to kernel mode, IPL 31, and
continues, causing an immediate trap and a `panic: Segmentation
fault'.  The advantage this has over simply starting the CPU at
doadump() is that disk buffers are written, leaving the file systems
in a much more recoverable state.

(On the other hand, sometimes the disk syncing leads to another
hang.  A second trap invokes a crash dump without attempting to
sync, so you still need not know the address of doadump().)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu