era@NIWOT.UCAR.EDU (04/28/89)
We have an occasional situation where our 90x running 4.4 freezes up. The system activity light indicates something is going on, but there is no response to attempts to get into the machine through the network or the console.

Question: is there some way, from one of the COS panels, to coerce OSx into dropping a core file?

Thanks in advance for your assistance...

Ed Arnold
era@ncar.ucar.edu
steve@polyslo.CalPoly.EDU (Steve DeJarnett) (04/28/89)
In article <8904272154.AA00226@era.ucar.edu.UCAR.EDU> era@NIWOT.UCAR.EDU writes:
>Question: is there some way, from one of the COS panels, to coerce
>OSx into dropping a core file?

Build a kernel with the kernel debugger installed. To do this, edit your local kernel configuration file (not conf.c, but UCAR or whatever you call it) and add the line

    debugger

Then, when the hang happens, type CTRL-@ at the console. This will drop you into the kernel debugger, where you can check which process is running, look at its call stack, and do just about anything else you'd probably ever want to do. One notable thing you can do is panic the kernel: from the <dbg> prompt, type pa (for panic, obviously).

Hope that helps.

-------------------------------------------------------------------------------
| Steve DeJarnett            | Smart Mailers -> steve@polyslo.CalPoly.EDU     |
| Computer Systems Lab       | Dumb Mailers -> ..!ucbvax!voder!polyslo!steve  |
| Cal Poly State Univ.       |------------------------------------------------|
| San Luis Obispo, CA 93407  | BITNET = Because Idiots Type NETwork           |
-------------------------------------------------------------------------------
cml@brachiosaur.cis.ohio-state.edu (Christopher Lott) (04/28/89)
In article <8904272154.AA00226@era.ucar.edu.UCAR.EDU> era@NIWOT.UCAR.EDU writes:
>Question: is there some way, from one of the COS panels, to coerce
>OSx into dropping a core file?

Take this for what it's worth. These are some very, _very_ old instructions I once got from RTOC that were supposed to cause a Pyramid machine (hung or otherwise) to panic. It takes some diddling, and be warned that it *never* worked for me. If this is hopelessly obsolete and wrong, would someone at Pyramid (hi, carl g?) please correct me?

How to Force a Panic

1. If the system is hung, or you have some other reason to cause a crash, go to COS frame B and halt the CPU by pressing the Z key.

2. The system will stop. Note the Program Counter in the system status line at the bottom of the screen. It will look like FFxxxxxx.

3. Alter the memory word following that address: if the PC is FF150808, change location FF15080C (i.e., add 0x4). Store 31000001 there; this will be the next instruction executed when the machine is restarted. Type 'M' to modify memory. You will see a display something like this:

      FF150800: 00000000 00000000 00000000 00000000
      address   ^800     ^804     ^808     ^80C

   Pyramid words are 32 bits long and fall on 4-byte boundaries.

4. Next, alter GR0 (General Register 0, displayed in frame B) and store the hex value 00000001 there. Use command 'A' to modify registers.

5. Restart the CPU with the Z key in frame B. These two changes force the computer to attempt execution of a word instruction on a byte boundary. This forces a trap, and if savecore is enabled (in /etc/rc), core will dump: the entire contents of memory will be written to the swap device.

6. Hit <esc> 0 and watch to see that this happens. If the panic was caused by a disk error, the core write may fail as well.

7. Reboot. If savecore is enabled, the contents of the swap device will be copied once again into the specified directory; most usually this is /usr/crash, but customers sometimes move it. There must be enough free space in that file system to hold the dump. It will be as large as the memory copied, and after repeated failures there will already be other dumps stored there.

My note: attempted 870817, to no avail.

chris...
-=-
cml@cis.ohio-state.edu    Computer Science Dept, OSU    614-292-1826
or: ...!{att,pyramid,killer}!osu-cis!cml    <standard disclaimers>
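Steps 3 through 5 of the quoted procedure rest on simple word-address arithmetic. The Python sketch below works it through for the PC value given in step 3. It is an illustration only, assuming 32-bit words on 4-byte boundaries and a memory display that starts each line on a 16-byte boundary, as the FF150800 example suggests. The helper names (patch_address, display_row, is_word_aligned) are hypothetical, and nothing here talks to COS or the hardware.

```python
# Hypothetical helpers illustrating the arithmetic in "How to Force a Panic".
# Assumption: 4-byte words, as the ^800/^804/^808/^80C display suggests.
WORD_SIZE = 4  # bytes per word (display columns step by 0x4)


def patch_address(pc: int) -> int:
    """Step 3: address of the word following the halted PC."""
    return (pc + WORD_SIZE) & 0xFFFFFFFF  # wrap within 32 bits


def display_row(addr: int) -> list[int]:
    """Word addresses on the 'M' display line containing addr.

    Assumption: each line starts on a 16-byte boundary and shows
    four words, as in the FF150800 example.
    """
    base = addr & ~0xF  # clear the low 4 bits
    return [base + i * WORD_SIZE for i in range(4)]


def is_word_aligned(addr: int) -> bool:
    """True if addr lies on a word (4-byte) boundary."""
    return addr % WORD_SIZE == 0


if __name__ == "__main__":
    pc = 0xFF150808
    target = patch_address(pc)
    print(f"store 31000001 at {target:08X}")
    print("display row:", " ".join(f"{a:08X}" for a in display_row(target)))
    # Assumption: per step 5, the value placed in GR0 (1) is the byte-boundary
    # address the patched word instruction ends up using, hence the trap.
    print("GR0 value word-aligned?", is_word_aligned(0x00000001))
```

Run as a script, it reports FF15080C as the word to patch, matching the example in step 3, and shows that 00000001 does not fall on a word boundary.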
csg@pyramid.pyramid.com (Carl S. Gutekunst) (04/28/89)
>Take this for what it's worth. These are some very, _very_ old instructions
>I once got from RTOC which were supposed to cause a pyramid machine (hung or
>otherwise) to panic.

I remember going through this nonsense. I can't vouch for the exact procedure any more, but it did work. It was quite enough of a hassle that some kind soul added the `pa' command to the kernel debugger in OSx 4.0, as Steve described. True, you do need to build OSx with the kernel debugger in, and then wait for the *next* time the problem happens. I always build my kernels that way anyway: partly because I'm usually booting weird and fragile things in my kernels, and partly because I'm nosy. :-)

The 'flags' field from COS Frame 1 can be set to control automatic entry into the debugger; see your System Admin Guide for details. Or call RTOC, if what you want isn't documented to your satisfaction.

As for the NCAR problem: by all means, if you're getting quiet lockups, build with the kernel debugger and take a core sample next time it happens.

<csg>
generous@daitc.daitc.mil (Curtis Generous) (04/28/89)
In article <8904272154.AA00226@era.ucar.edu.UCAR.EDU> era@NIWOT.UCAR.EDU writes:
>We have an occasional situation where our 90x running 4.4 freezes
>up. ....
>Question: is there some way, from one of the COS panels, to coerce
>OSx into dropping a core file?
>
>Ed Arnold
>era@ncar.ucar.edu

If you have the kernel debugger compiled in (run 'strings /vmunix' to check, or look at the config file in /sys/kernel{_m}/HOSTNAME), you can type ^@ (as in <CTRL>@) at the console. This will throw you into the debugger. Just type 'panic' and then 'exit'.

--curtis
--
Curtis C. Generous
DTIC Special Projects Office (DTIC-SPO)
ARPA: generous@daitc.mil
UUCP: {uunet,vrdxhq,lll-tis}!daitc!generous
karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) (04/28/89)
The suggestions for building a kernel with the debugger, hitting ^@ to get control of it, and then issuing a `pa' command are fine...if the system is listening to the console. We had problems with an early release of 4.4 where heavy network traffic caused the system to lock up so tight that it ignored the console: typing anything was rewarded with CPU BUSY in SSL #3, which usually says nothing more exciting than CPU[0]. During these events we had to use the sort of procedure Chris Lott described. It worked, fortunately, and we did get usable core dumps out of it.

The bugs causing the lockup were fixed months ago, of course. We're going to be upgrading several of our other 4.0 machines to 4.4 Real Soon Now.

--Karl