stevens@hsi.UUCP (Richard Stevens) (07/31/86)
We just installed 4.3 BSD on an 11/785 (which has been running 4.2 for over 2 years without any problems). Occasionally the entire system will "hang": it will complete jobs that are running, the disk is being sync'ed, and it will echo characters on the terminals. However, the echoed characters are otherwise ignored, and you aren't able to run any new jobs on the system. We are then forced to reboot, and since it didn't panic, we don't get a crash dump to look at to find out what was going on.

Has anyone seen this before? We have heard from one 4.3 beta test site that encountered this on an 8600, but they would then reset their Systems Industries Massbus controller (for Eagles), and haven't had the problem since then.

We do have 2 Eagles on an Emulex SC780 and a lot of DMF terminal lines. We did have to go back to our 4.2 DMF driver (the one from Chris Maloney at MDDC), since the 4.3 DMF driver didn't work with our Emulex CS21/F and CS32/F.

Does anyone know how to force the kernel to generate a crash dump from the console? Any ideas or suggestions would be appreciated.

    Richard Stevens
    Health Systems International, New Haven, CT
    ihnp4 ! hsi ! stevens
muller@sdcc7.ucsd.EDU (Keith Muller) (08/02/86)
I had that problem also. It is bogus microcode in the cs21/f2 version. The board goes into an infinite loop while it has the unibus doing dma output. The reason it never shows up on 4.2 is that the 4.2 driver did not do dma on output.

Next time it happens, flip the reset switch (position 1) on the cs21 from normal to reset and back. When the offending dmf is found the system will come back to life. Of course the dmfs are now in a weird state, but at least you can shut down cleanly. Bug emulex for a prom upgrade.

    keith Muller
    university of california
    muller@sdcsvax.ucsd.edu
terryl@tekcrl.UUCP (08/03/86)
In article <460@sdcc7.ucsd.EDU> muller@sdcc7.ucsd.EDU (Keith Muller) writes:
>
>I had that problem also. It is bogus microcode in the cs21/f2 version. The
>board goes into an infinite loop while it has the unibus doing dma output.
>The reason it never shows up on 4.2 is that driver did not do dma on output.
>Next time it happens flip reset switch (position 1) on the cs21 from normal
>to reset and back. When the offending dmf is found the system will come
>back to life. Of course the dmfs are now in a weird state, but at least you
>can shut down cleanly. Bug emulex for a prom upgrade.

Another UNIBUS board that can do the same thing as described above is the Interlan NM10 ethernet card. In many places the driver issues a command to the board and, instead of letting the board interrupt the cpu when it's done with the command, sits in a null loop waiting for the ready bit to come true. If the ready bit never comes true, it just sits in this null loop forever.

A good thing to do when the system just "seems to hang" is to try to halt the system if you can. If you can halt the system, take a look at the PC when it halted. This is how we found out about the Interlan card hanging the system. It was always happening in the routine ilrint().
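The null-loop hang described in the post above can be avoided by bounding the wait. What follows is only a minimal sketch in C of that general idea, not code from the Interlan driver: the name wait_ready(), the DEV_READY bit value, and the spin limit are all hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEV_READY 0x80u   /* hypothetical "ready" bit in the status register */

/*
 * Poll a device status register until DEV_READY is set, but give up
 * after max_spins reads instead of spinning forever.
 * Returns 0 if the device became ready, -1 on timeout.
 */
int wait_ready(volatile const uint16_t *csr, unsigned long max_spins)
{
    assert(csr != NULL);

    while (max_spins-- > 0) {
        if (*csr & DEV_READY)
            return 0;          /* device reported ready */
    }
    return -1;                 /* ready bit never came true */
}
```

A caller that gets -1 back can log the failure and reset the board rather than leaving the whole system stuck in the loop.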
rogers@dadla.UUCP (Roger Southwick) (08/06/86)
In article <395@hsi.UUCP> stevens@hsi.UUCP (Richard Stevens) writes:
> ...
>
> Does anyone know how to force the kernel to generate a crash dump
> from the console ??

Sure...

1) Sometime in multi-user, run the following command:

       $ nm /vmunix | grep doadump
       80000e00 T _doadump

   0E00 is the address of the start of the dump routine (on my 780 running 4.3BSD; your address may be different).

2) When you need to get a crash dump (assumes VAX 11/780 or 11/785):

   a) Halt the system (don't do any other console commands, except:)
   b) S 0E00 (system does dump, and halts)
   c) Boot as normal

If you want to look at the crash in single user rather than multi, you can boot the system into single user, and then do:

    $ /etc/savecore /usr/crash

This is what happens in /etc/rc, and it will move the core file from the swap disk (where doadump puts it) into /usr/crash.

Good luck finding your problem.

    -Roger Southwick
    Logic Analyzers Computer Resource Group
    Tektronix, Inc.
    rogers@dadla.tek.com
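The step from the nm output (80000e00) to the console address (0E00) in the post above can be sketched in C as follows. This is an illustration only: the function name doadump_console_addr() is hypothetical, and it assumes the console address is obtained simply by dropping the high (system space) bit, which matches the 80000e00 / 0E00 pair quoted above but should be checked on your own system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SYSTEM_SPACE_BASE 0x80000000UL  /* assumed base of kernel addresses */

/*
 * Parse one line of nm output such as "80000e00 T _doadump".
 * On success, store the address with the system-space base removed
 * in *console_addr and return 0. Return -1 if the line does not
 * parse or does not name _doadump.
 */
int doadump_console_addr(const char *nm_line, unsigned long *console_addr)
{
    unsigned long addr;
    char type;
    char name[64];

    assert(nm_line != NULL && console_addr != NULL);

    if (sscanf(nm_line, "%lx %c %63s", &addr, &type, name) != 3)
        return -1;
    if (strcmp(name, "_doadump") != 0)
        return -1;

    *console_addr = addr & ~SYSTEM_SPACE_BASE;
    return 0;
}
```

Printing the result with a format such as "S %lX" gives the START command to type at the console after halting.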
dlu%tektools.tek.csnet@CSNET-RELAY.arpa (08/06/86)
The problem you describe has not shown up, to my recollection, on the 785 here. But I do know how to get a dump from the console. Proceed thusly:

    ^P
    HALT
    START <address of doadump()>

doadump() is the routine in the kernel that produces dumps on panics, and you can restart the kernel there when the system hangs. To find the address of doadump() use nm(1). We have configured all of the systems at Tektronix so that they have doadump() at the same address (0xe00), which makes life a bit simpler.

    Doug Urner, UNIX Systems Support Group, Tektronix
chris@umcp-cs.UUCP (08/10/86)
In article <2836@brl-smoke.ARPA> dlu%tektools.tek.csnet@CSNET-RELAY.arpa writes:
>The problem you describe has not shown up, to my recollection, on the
>785 here. But I do know how to get a dump from the console [...] :
>
> ^P
> HALT
> START <address of doadump()>

This does indeed provide a crash dump. However, it is almost always preferable to force an actual crash:

    ^P
    H
    D PC -1     ! N.B.: `@crash' does these three commands for you
    D PSL -1    ! (and perhaps the HALT as well). But this works
    CO          ! even if you have no file `crash' on your floppy.

or, on a 750,

    D/G F FFFFFFFF
    D P FFFFFFFF
    C

This invalidates the PC, sets the PSL to kernel mode, IPL 31, and continues, causing an immediate trap and a `panic: Segmentation fault'. The advantage this has over simply starting the CPU at doadump() is that disk buffers are written, leaving the file systems in a much more recoverable state. (On the other hand, sometimes the disk syncing leads to another hang. A second trap invokes a crash dump without attempting to sync, so you still need not know the address of doadump().)

--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs
ARPA: chris@mimsy.umd.edu