[comp.unix.wizards] Making sense of a system coredump

martin@slsiris.harvard.edu (Mr. Science) (06/14/91)

I'm in a bind, and perhaps some kind soul out there can help me (at
least conceptually).  Our system (SGI Iris 4D) crashed this A.M.;
there was no trouble in restarting it, and I have strong suspicions on
what precipitated the crash.  What got my little pea-sized brain
working was the notice of /usr/adm/crash/vmcore.1 and unix.1.
According to the man page on savecore, this coredump is created in
response to a system crash.  I would like to examine these files to
track down the problem, but HOW DO I DO THIS????   The savecore man
page gives no hints.  I have tried looking at this file with dbx/edge
to no avail.  My question is: am I looking in the wrong place?  What
do I do with a system core dump (besides delete it...)?

None of this system admin stuff makes sense to a former VAX/VMS system
manager... perhaps I just need to be pointed in the right direction.

Advice appreciated,
...pkm
--
....................................................................
Patrick Martin
Martin@SLSVAX.Harvard.Edu
Martin@SLSIRIS.Harvard.Edu

Disclaimer: I get taxed too high to afford an opinion.
....................................................................

bout@convex.com (David Boutilier) (06/14/91)

I'm afraid you're pretty much stuck with using adb.  To the best of my
knowledge, no other debugger will work effectively with the kernel.

amirosh@shakespyr.Pyramid.COM (Alex Miroshnichenko) (06/14/91)

Normal (BSD-like) Unix systems usually have either adb or crash (or both).
On SGI you can use dbx in kernel mode:
	cd to the directory with your dumps and
	dbx -k vmunix.0 vmcore.0
(substitute the names savecore gave your files, e.g. unix.1 and vmcore.1).
After that, try the "where" command in dbx to see your stack trace.
Printing kernel structures is a bit of a hassle (I usually just dump
memory and have system header listings ready :-).

NOTE: the first argument to a function in the stack trace is often garbage.
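
A minimal sketch of the "find your dumps, then start dbx" step, assuming
savecore's naming as seen in the original post (unix.N next to vmcore.N in
/usr/adm/crash).  The function name dump_pairs is made up for illustration,
and it only prints the command line rather than running dbx:

```shell
#!/bin/sh
# Print a "dbx -k" command line for every numbered dump pair in a directory.
# ASSUMPTION: files are named unix.N and vmcore.N, as in the original post.
dump_pairs() {
    dir=${1:-/usr/adm/crash}
    for core in "$dir"/vmcore.*; do
        [ -f "$core" ] || continue        # glob matched nothing
        n=${core##*.}                     # numeric suffix, e.g. "1"
        if [ -f "$dir/unix.$n" ]; then
            echo "dbx -k unix.$n vmcore.$n"
        else
            # A core without its kernel image is of little use to dbx.
            echo "vmcore.$n: no matching unix.$n" >&2
        fi
    done
}
```

Called as "dump_pairs /usr/adm/crash", it lists one candidate command per
complete pair; cd to that directory before running the command it prints.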

jfh@rpp386.cactus.org (John F Haugh II) (06/14/91)

In article <1991Jun13.220751.26994@convex.com> bout@convex.com (David Boutilier) writes:
>I'm afraid you're pretty much stuck with using adb.  To the best of my
>knowledge, no other debugger will work effectively with the kernel.

Not that his system actually has the "crash" command, but that is my
preferred tool.  You can typically find out which module was executing
when the panic occurred, as well as the location (if you are
handy with ADB).
-- 
John F. Haugh II        | Distribution to  | UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 255-8251 | GEnie PROHIBITED :-) |  Domain: jfh@rpp386.cactus.org
"UNIX signals are not interrupts.  Worse, SIGCHLD/SIGCLD is not even a UNIX
 signal, it's an abomination."  -- Doug Gwyn

lance@motcsd.csd.mot.com (lance.norskog) (06/19/91)

This does not answer his problem, but does explain how to use "crash" on 386 UNIX.
(I cannot afford SGI gear, alas.)

When Unix does a panic and saves all its RAM to disk, it dumps it into
the swap area.  

After it reboots, it asks you if you want to save the core dump.
If you save it to tape, you get a raw disk dump of the swap area.
You can then 'dd' the tape into a Unix file, and run
	/etc/crash -d /tmp/bigcore -n /unix
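
Spelled out, the tape route is two steps.  This is only a sketch: the tape
device /dev/rmt0 and the 64k block size are placeholders, not something the
post specifies, and the function name is made up.  It prints the commands
instead of running them so you can check them first:

```shell
#!/bin/sh
# Print (do not run) the two steps: copy the dump off tape, then open it.
# ASSUMPTIONS: /dev/rmt0 and bs=64k are hypothetical defaults; substitute
# your own tape device and a block size that suits it.
crash_from_tape() {
    tape=${1:-/dev/rmt0}
    core=${2:-/tmp/bigcore}
    unix=${3:-/unix}
    echo "dd if=$tape of=$core bs=64k"
    echo "/etc/crash -d $core -n $unix"
}
```

The -d and -n options are the ones shown in the command above; make sure
the filesystem holding the output file has room for a copy of all of swap.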

I took this question out of the system start-up scripts, and instead just run
	/etc/crash -d /dev/swap
after the system startup script does its endless disk preening.

There's a trick here: the system startup scripts run one program
at a time.  With even a 4 megabyte system, it never really needs to
page any data out before that first console login prompt.  So, your
panic dump is still intact in /dev/swap.

You can verify this by running '/etc/swap -l'.
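
What to look for is that no swap blocks are in use yet.  A rough sketch of
that check, assuming the listing has a header line and ends each row with
"blocks" and "free" columns (path dev swaplo blocks free); compare with what
your own system prints before trusting it.  The function name is made up:

```shell
#!/bin/sh
# Read "swap -l"-style output on stdin; succeed only if every swap area
# still has all of its blocks free, i.e. nothing has been paged out yet.
# ASSUMED columns: path dev swaplo blocks free, with one header line first.
swap_untouched() {
    awk 'NR > 1 && NF >= 5 {
             seen = 1
             if ($(NF-1) != $NF) used = 1   # blocks != free: swap in use
         }
         END { if (seen && !used) exit 0; exit 1 }'
}
```

For example: /etc/swap -l | swap_untouched && /etc/crash -d /dev/swap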

Lance Norskog