[comp.unix.sysv386] Kernel core dumps

chap@art-sy.detroit.mi.us (j chapman flack) (05/04/91)

In article <450@bartal.BARTAL.COM> phillip@BARTAL.COM (Phillip M. Vogel) writes:
>When the kernel dumps core, it puts the core dump into the swap
>area ON THE PRIMARY DISK. Well, 8 megs of core dump into 5 megs

This reminded me of questions I've been meaning to ask.  I never knew where
the kernel core dump goes in a panic (and so far I've had no opportunity to
find out....).  This posting suggests it goes in the swap area, but that
brings up an immediate question:

  At what point does the kernel begin using the swap area on the next boot??
  How am I able to use `crash' to examine the core dump before the evidence
  is overwritten?  Or does something check for the presence of a core dump
  in the swap area at boot time and copy it to a file for later examination?
  In that case, my first question is back: Where does it go?

Here's another question: when I first installed this system, it would
constantly overflow the kernel file and inode tables, causing all sorts
of programs to fail unpredictably.  At the time, I didn't know if the
default table sizes were preposterous or if some runaway bug was filling the
tables.  It would have been handy to be able to run something as root that
forces a panic, then reboot and analyze the dump while the system is still
reasonably reliable.  Sort of like running OPCCRASH from the console on
a VAX.  Does anybody have a panic-forcing program?  This is SCO SysV 3.2.

(Btw, it turned out the table sizes were preposterous.  They came out of the
box ready to accommodate about as many files and inodes as the daemons have
open before I log in....)

Thanks!
-- 
Chap Flack                         Their tanks will rust.  Our songs will last.
chap@art-sy.detroit.mi.us                                   -Mikis Theodorakis

Nothing I say represents Appropriate Roles for Technology unless I say it does.

jackv@turnkey.tcc.com (Jack F. Vogel) (05/04/91)

In article <9105031411.aa04050@art-sy.detroit.mi.us> chap@art-sy.detroit.mi.us (j chapman flack) writes:

>This reminded me of questions I've been meaning to ask.  I never knew where
>the kernel core dump goes in a panic (and so far I've had no opportunity to
>find out....).  This posting suggests it goes in the swap area, but that
>brings up an immediate question:

Initial point: you say you are running SCO; since I run ISC, what follows
may or may not apply. I am not sure how close the two systems are in handling
panic dumps...

>  At what point does the kernel begin using the swap area on the next boot??

Sometime before coming up fully multiuser, a script, /etc/dumpsave, is run;
it decides whether there is a dump in the swap area and asks whether you
want to save it. You can save either to floppy or tape. If you choose
floppy you must have enough formatted floppies to hold it (the size will 
be approximately your real memory size). If you choose not to save, the
kernel will begin using the swap device and the data will be overwritten.
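
To put a number on "enough formatted floppies", here is a minimal sketch in
Python. It is not anything dumpsave itself does; it assumes the dump is about
the size of real memory and is written raw, with no per-volume headers. The
capacities are the raw sizes of standard 1.2 MB and 1.44 MB floppies, and the
function name is made up for illustration.

```python
MEG = 1024 * 1024

# Raw capacities: cylinders * heads * sectors per track * 512-byte sectors.
FLOPPY_1_2M = 80 * 2 * 15 * 512    # 5.25" high density
FLOPPY_1_44M = 80 * 2 * 18 * 512   # 3.5" high density


def floppies_needed(dump_bytes, floppy_bytes):
    """Return how many floppies a raw dump of dump_bytes would span.

    Assumption: the dump is written back to back with no headers, so the
    answer is simply the size divided by the capacity, rounded up.
    """
    if dump_bytes < 0 or floppy_bytes <= 0:
        raise ValueError("sizes must be non-negative and capacity positive")
    return (dump_bytes + floppy_bytes - 1) // floppy_bytes


if __name__ == "__main__":
    for name, capacity in (("1.2M", FLOPPY_1_2M), ("1.44M", FLOPPY_1_44M)):
        print(name, floppies_needed(8 * MEG, capacity))
```

If the real script adds any bookkeeping per volume, the count can only go up,
so treat the result as a lower bound.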

>  How am I able to use `crash' to examine the core dump before the evidence
>  is overwritten? 

Since the swap device is used, there is no way to try to examine the dump
prior to saving and reloading it.

> It would have been handy to be able to run something as root that
>forces a panic, then reboot and analyze the dump while the system is still
>reasonably reliable.

Again, I can't speak for SCO's implementation, but one way to do this
given the AT&T standard is to have the kernel debugger linked into your
kernel (see debugger(8)) and then if you want to force a dump, enter
the debugger and give a 'sysdump' command. I must emphasize that I have
never done this, and given that it dumps all over your swap space I 
would presume this is fatal to the running system :-} :-}. Or one could
just use the debugger to branch the kernel to panic(). The ISC docs have
a cryptic 'BUGS' statement about 'sysdump' sometimes not working, without
any details; perhaps someone at Interactive could comment? But, in theory
at least, this should do what you want.

Just for comparison (no commercial plug intended; this is just an
implementation example), AIX on the PS/2 allows you to create a dedicated
dump partition where multiple dumps can be stored. This allows you to take
"running dumps" of the system which can then be examined with crash at
your leisure. Alternatively, you can configure the kernel so that the dump
device is the floppy; then if you ever panic, the system will stop and ask
whether you want to save the dump, prompt you to insert the floppy, etc.
From a serious service point of view this functionality is essential. Of
course, this consumes disk space, but it's an option if you so desire. Oh
yes, there is also a key sequence on the console that lets you force either
a running dump or a panic at any particular time.

Disclaimer: I don't speak for the company!

-- 
Jack F. Vogel			jackv@locus.com
AIX370 Technical Support	       - or -
Locus Computing Corp.		jackv@turnkey.TCC.COM

vandys@sequent.com (Andrew Valencia) (05/04/91)

chap@art-sy.detroit.mi.us (j chapman flack) writes:
>This reminded me of questions I've been meaning to ask.  I never knew where
>the kernel core dump goes in a panic (and so far I've had no opportunity to
>find out....).

	This is the heart of the matter.  IMHO, SCO has done a pretty good job
of hammering their product into a state where it just runs and runs and runs
with little ado.  If you had a choice between a system that had a very nice and
powerful crash dumping and analysis system, and one that simply didn't crash in
the first place, which would you pick?

>  At what point does the kernel begin using the swap area on the next boot??
>  How am I able to use `crash' to examine the core dump before the evidence
>  is overwritten?

	The rest of these comments are from my ESIX system.

	As you boot up the script /etc/dumpsave is called and
goes about copying the crash dump to another place.  It is invoked by
/etc/bcheckrc if fsstat on the root device indicates that the root filesystem
needs cleaning (which indicates some sort of crash in the first place).
I usually see this message after a powerfail, so it's offering to save a dump
that doesn't even exist.  Oh, well.

	/etc/dumpsave is kind of a crock.  It's hard-coded to dump to
some sort of floppy/tape device.  I guess they didn't want to deal with
getting the other filesystems mounted first.  There'd be a definite danger
there, as fsck could well scribble on the swap area.

	Finally, they give you /etc/ldsysdump to copy these same floppies
back into the filesystem.  You run this after you get your system back up and
have clean, mounted filesystems to put the crash in.
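
	The save-then-reload flow above can be sketched as follows.  This is
only an illustration in Python of cutting an image into volume-sized pieces and
gluing them back together; it is not how /etc/dumpsave or /etc/ldsysdump are
actually written, and the function names, the volNNN file names and the chunk
size are all hypothetical.

```python
import os
import tempfile


def split_dump(image_path, out_dir, chunk_bytes):
    """Cut a dump image into numbered chunk files, one per 'floppy'."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    paths = []
    with open(image_path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            path = os.path.join(out_dir, "vol%03d" % index)  # hypothetical name
            with open(path, "wb") as dst:
                dst.write(chunk)
            paths.append(path)
            index += 1
    return paths


def reload_dump(chunk_paths, dest_path):
    """Concatenate the chunks, in the order given, into one dump file."""
    with open(dest_path, "wb") as dst:
        for path in chunk_paths:
            with open(path, "rb") as src:
                dst.write(src.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        image = os.path.join(work, "swap.img")
        with open(image, "wb") as f:
            f.write(b"\xAA" * 10000)          # stand-in for the swap area
        volumes = split_dump(image, work, 4096)
        reload_dump(volumes, os.path.join(work, "dump"))
        print(len(volumes), "volumes written and reloaded")
```

	The point of the two-step design is the one made above: the copy out
happens before any filesystem is trusted, and the copy back in happens only
once there is a clean filesystem to hold the result.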

>It would have been handy to be able to run something as root that
>forces a panic, then reboot and analyze the dump while the system is still
>reasonably reliable.

	Another strategy would be to run /etc/crash in one window and then
switch to another and run your programs.  When things get bad, switch back
and look around on your running kernel.  By having /etc/crash already running,
your inode, etc. shortages shouldn't keep you from looking.  Just an idea.
(In case you hadn't tried this, running crash without arguments
makes it run on /unix and /dev/mem, which means you're looking at the state of
your running system.)
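
	Roughly speaking, a tool like this resolves a name through the /unix
symbol table and then reads the corresponding bytes out of the memory device.
Here is a small Python sketch of that lookup, run against a fake in-memory
image rather than a real /dev/mem.  The symbol name, its offset and the 32-bit
little-endian layout are assumptions for illustration, and a real tool also has
to map kernel virtual addresses onto offsets in the device.

```python
import io
import struct


def read_u32(mem, offset):
    """Read one 32-bit little-endian word at offset from a seekable image."""
    mem.seek(offset)
    raw = mem.read(4)
    if len(raw) != 4:
        raise EOFError("offset %#x is past the end of the image" % offset)
    return struct.unpack("<I", raw)[0]


def read_symbol(mem, symbols, name):
    """Look a name up in a name -> offset table and read the word there."""
    return read_u32(mem, symbols[name])


if __name__ == "__main__":
    # Fake 64-byte "memory" with a counter planted at offset 16.
    image = bytearray(64)
    struct.pack_into("<I", image, 16, 137)
    symbols = {"files_in_use": 16}            # hypothetical symbol and offset
    print(read_symbol(io.BytesIO(bytes(image)), symbols, "files_in_use"))
```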

						Regards,
						Andy Valencia
						vandys@sequent.com

Disclaimer: these are just my opinions, one and all.

mike@bria.UUCP (mike.stefanik) (05/05/91)

In an article, vandys@sequent.com (Andrew Valencia) writes:
|This is the heart of the matter.  IMHO, SCO has done a pretty good job
|of hammering their product into a state where it just runs and runs and runs
|with little ado.  If you had a choice between a system that had a very nice and
|powerful crash dumping and analysis system, and one that simply didn't crash in
|the first place, which would you pick?

Where is this mythical beast that SCO has given birth to?  Yes, the one
that "simply doesn't crash in the first place"?  Could you please enlighten
me as to which unique version of the operating system you are using?

There is never any good excuse for any operating system to be without
the ability to dump itself when it crashes.  It is laziness, pure and simple.

-- 
Michael Stefanik, MGI Inc, Los Angeles | Opinions stated are never realistic
Title of the week: Systems Engineer    | UUCP: ...!uunet!bria!mike
-------------------------------------------------------------------------------
If MS-DOS didn't exist, who would UNIX programmers have to make fun of?

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR/AA) (05/05/91)

As quoted from <1991May04.132158.17121@turnkey.tcc.com> by jackv@turnkey.tcc.com (Jack F. Vogel):
+---------------
| >  At what point does the kernel begin using the swap area on the next boot??
| 
| Sometime before coming up fully multiuser a script is run, /etc/dumpsave,
| which decides if there is a dump in the swap area and prompts you if you
| want to save it. You can save either to floppy or tape. If you choose
| floppy you must have enough formatted floppies to hold it (the size will 
| be approximately your real memory size). If you choose not to save, the
| kernel will begin using the swap device and the data will be overwritten.
+---------------

On SCO, it's /etc/sysdump and it tells you to run /etc/ldsysdump to read the
dump into a file for analysis.  Otherwise, it's the same.

+---------------
| > It would have been handy to be able to run something as root that
| >forces a panic, then reboot and analyze the dump while the system is still
| >reasonably reliable.
|  
| Again, I can't speak for SCO's implementation, but one way to do this
| given the AT&T standard is to have the kernel debugger linked into your
| kernel (see debugger(8)) and then if you want to force a dump, enter
+---------------

Alternatively, if all you want to do is look at running kernel information
without playing any games with it, try "/etc/crash /dev/kmem".  You don't even
have to panic the system (but you *will* if you try to change things).

+---------------
| Just for comparison, no commercial plug really intended this is just for
| an implementation example, AIX on the PS/2 allows you to create a dedicated
| dump partition where multiple dumps can be stored. This allows you to take
+---------------

Look at the value of DUMPDEV in the kernel configuration of any V7, SIII, or
SV.  This is nothing at all new....  The default, of course, is the same as
the swap device; you can, if you wish, set up a separate device and use that.
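
As a sketch of that default, plus the size check that would have caught the
"8 megs into 5 megs" case quoted at the top of this thread, something like the
following Python would do.  DUMPDEV is the real tunable name; the SWAPDEV key,
the device paths and the function names are made up for illustration, and a
real kernel configuration is not a dictionary.

```python
MEG = 1024 * 1024


def pick_dump_device(config):
    """Return the dump device, falling back to swap when DUMPDEV is unset."""
    # "SWAPDEV" is a hypothetical key standing in for the configured swap device.
    return config.get("DUMPDEV") or config["SWAPDEV"]


def dump_fits(memory_bytes, device_bytes):
    """A full memory dump only fits if the device is at least as big as RAM."""
    return memory_bytes <= device_bytes


if __name__ == "__main__":
    config = {"SWAPDEV": "/dev/swap"}         # hypothetical device path
    print(pick_dump_device(config))
    print(dump_fits(8 * MEG, 5 * MEG))
```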

++Brandon
-- 
Me: Brandon S. Allbery			  Ham: KB8JRR/AA  10m,6m,2m,220,440,1.2
Internet: allbery@NCoast.ORG		       (restricted HF at present)
Delphi: ALLBERY				 AMPR: kb8jrr.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery       KB8JRR @ WA8BXN.OH

vandys@sequent.com (Andrew Valencia) (05/05/91)

mike@bria.UUCP (mike.stefanik) writes:
>Where is this mythical beast that SCO has given birth to?

	Hmmm, interesting.  If you've had bad experiences, then I'm
very sorry to hear about it.  My experiences with their products over the
years have been very good.  When you go try to put a system together, my
experience is that the Microport's, ESIX's, or vanilla AT&T releases just
can't be used as a problem-solving tool the way SCO's product can.  Lower
quality (yes, more crashes), less support, inferior documentation.

>There is never any good excuse for any operating system to be without
>the ability to dump itself when it crashes.  It is laziness, pure and simple.

	In the particular case of my comments, I noted that ESIX DOES
have crash dumping, though it isn't very elegant.  Could someone with the
latest SCO release let us know if they took it out?

	Speaking as a kernel developer, I can give you a better guess than
"laziness" for why certain things get left out (or even taken out).  It
usually ends up being time and quality trade-offs.  Would you rather have
crash dumping than, say, multi-screen with graphics?  Would you rather have
a really good crash dumping system with crashes once a week?  Or a mediocre
one with one crash a month?  Or one crash a year but you get your release three
months later?  Would you give it up entirely if your own "pet peeve" bug
could be fixed instead?  Now imagine one developer with several thousand
people pulling him in several thousand mutually exclusive directions.  That's
what it's like on the "other side."  Nobody is sitting around deciding not
to do anything for the next release out of laziness--not in my experience.

						Regards,
						Andy Valencia
						vandys@sequent.com

Disclaimer: I speak only for myself.

chap@art-sy.detroit.mi.us (j chapman flack) (05/13/91)

>| > [this was my original question]

In article <1991May4.232044.3487@NCoast.ORG> allbery@ncoast.ORG (Brandon S. Allbery KB8JRR/AA) writes:
>As quoted from <1991May04.132158.17121@turnkey.tcc.com> by jackv@turnkey.tcc.com (Jack F. Vogel):
>+---------------
>| > It would have been handy to be able to run something as root that
>| >forces a panic, then reboot and analyze the dump while the system is still
>| >reasonably reliable.
>|  
>| Again, I can't speak for SCO's implementation, but one way to do this
>| given the AT&T standard is to have the kernel debugger linked into your
>| kernel (see debugger(8)) and then if you want to force a dump, enter
>+---------------
>
>Alternatively, if all you want to do is look at running kernel information
>without playing any games with it, try "/etc/crash /dev/kmem".  

One person suggested using the kernel debugger; several suggested using
`crash' to look at the running system.  I don't have a development system
and I haven't found anything lying around that looks like a kernel debugger,
so I doubt that linking that in is an option for me.

I have, on occasion, used `crash' to look at the running system.  However,
I expect that the time I'll *really* want to look at things will be when
response time has suddenly gone to six minutes and rising, or the console
is being flooded with messages, or something else obnoxious is happening.
(I've had such experiences before, with other systems....)

What I want in a situation like that, if there's still any chance I can log
in as root and get one command executed, is one command that will simply
force a panic, like OPCCRASH did on the VAX 11/780 console.  After the
system is rebooted and stable (or I've taken the dump to a stable system)
...THEN I'll try to make sense of the dump.

>You don't even
>have to panic the system (but you *will* if you try to change things).

Hmm.  My man page for `crash' doesn't mention any way to change anything.
If it did, that would be just the ticket.  As I remember, OPCCRASH just
set the stack level indicator to the interrupt stack, put -1 into IP,
and resumed.  The deed was done....  Is there some undocumented way to
modify things with `crash'?
-- 
Chap Flack                         Their tanks will rust.  Our songs will last.
chap@art-sy.detroit.mi.us                                    -MIKHS 0EODWPAKHS

Nothing I say represents Appropriate Roles for Technology unless I say it does.

jackv@turnkey.tcc.com (Jack F. Vogel) (05/13/91)

In article <9105122137.aa00923@art-sy.detroit.mi.us> chap@art-sy.detroit.mi.us (j chapman flack) writes:
[ wants a way to force a system panic...]

>One person suggested using the kernel debugger; several suggested using
>`crash' to look at the running system.  I don't have a development system
>and I haven't found anything lying around that looks like a kernel debugger,
>so I doubt that linking that in is an option for me.

You don't need the development system, and the "debugger" is not some binary
that you would find "lying around". It should be an option in linking your
kernel. I don't know how far SCO varies from the AT&T standard, but if
you run 'kconfig' (or whatever SCO calls the kernel configurer program)
there should be an option to add facilities to the kernel; when you enter
that submenu, one of the facilities you can add is the debugger. Then rebuild
a kernel and presto, you have the debugger. You can drop into it at any
particular point by hitting <CTRL> <ALT> d, then enter the command:
sysdump. If SCO doesn't include this facility you should scream loudly :-}.

>Is there some undocumented way to
>modify things with `crash'?

NO. You could use adb on the running system but then 3.2 doesn't have
adb, oh well...

Disclaimer: I'm paid to fix bugs, not to speak for the company!

-- 
Jack F. Vogel			jackv@locus.com
AIX370 Technical Support	       - or -
Locus Computing Corp.		jackv@turnkey.TCC.COM

ni@hal.com (Nathaniel Ingersol) (05/14/91)

In article <1991May13.162909.20686@turnkey.tcc.com> jackv@turnkey.TCC.COM (Jack F. Vogel) writes:
:In article <9105122137.aa00923@art-sy.detroit.mi.us> chap@art-sy.detroit.mi.us (j chapman flack) writes:
:>[ wants a way to force a system panic...]

[...]

:kernel. I don't know how far SCO varies from the AT&T standard, but if
:you run 'kconfig' (or whatever SCO calls the kernel configurer program)
:there should be an option to add facilities to the kernel, when you enter
:that submenu one of the facilities you can add is the debugger. Then rebuild
:a kernel and presto you have the debugger, you can drop into it at any
:particular point by hitting <CTRL> <ALT> d, then enter the command:
:sysdump. If SCO doesn't include this facility you should scream loudly :-}.
:

Start screaming.
SCO will provide a kernel debugger to developers and so on who have an
"Engineering Support" contract or something like that, but otherwise
a kernel debugger is not part of the standard release.

:NO. You could use adb on the running system but then 3.2 doesn't have
:adb, oh well...

Then again, you can use /etc/_fst, which is a copy of adb that's used
for patching kernels...

dcon@cbnewsc.att.com (david.r.connet) (05/14/91)

In article <1991May13.162909.20686@turnkey.tcc.com> jackv@turnkey.TCC.COM (Jack F. Vogel) writes:
>In article <9105122137.aa00923@art-sy.detroit.mi.us> chap@art-sy.detroit.mi.us (j chapman flack) writes:
>[ wants a way to force a system panic...]
> 
>>One person suggested using the kernel debugger; several suggested using
>>`crash' to look at the running system.  I don't have a development system
>>and I haven't found anything lying around that looks like a kernel debugger,
>>so I doubt that linking that in is an option for me.
> 
>You don't need the development system, and the "debugger" is not some binary
>that you would find "lying around". It should be an option in linking your
>kernel. I don't know how far SCO varies from the AT&T standard, but if
>you run 'kconfig' (or whatever SCO calls the kernel configurer program)
>there should be an option to add facilities to the kernel, when you enter
>that submenu one of the facilities you can add is the debugger. Then rebuild
>a kernel and presto you have the debugger, you can drop into it at any
>particular point by hitting <CTRL> <ALT> d, then enter the command:
>sysdump. If SCO doesn't include this facility you should scream loudly :-}.
>
>>Is there some undocumented way to
>>modify things with `crash'?
>
>NO. You could use adb on the running system but then 3.2 doesn't have
>adb, oh well...

The kernel debugger for AT&T (as far as I know) is not available
for the general public.  (I would assume a source license gets it,
but that type of stuff is out of my realm.)  If you did have it
though, you would get into it the same way as above.  (There is also a
call you can put into a program.  This is convenient when you are
running on an alternate console and don't have the keyboard to do
a ctrl-alt-d.)

The debugger gives you basically the same abilities as crash, though
in a very different syntax (I don't know crash's syntax).

Dave Connet
dcon@iwtng.att.com

marc@ekhomeni.austin.ibm.com (Marc Wiz) (05/14/91)

The following is not a flame.

I have had the chance to work with the AT&T kernel debugger for 3.2.
I found that most of the other developers where I used to work
thought it was rather user hostile.

It also lacked some features which (IMHO) I thought were badly
needed.  It is possible for someone to write their own kernel debugger
and link it in with the kernel.

Marc Wiz 				MaBell (512)823-4780

Yes that really is my last name.
The views expressed are my own.

marc@aixwiz.austin.ibm.com 
or
uunet!cs.utexas.edu!ibmchs!auschs!ekhomeni.austin.ibm.com!marc

john@jwt.UUCP (John Temples) (05/14/91)

In article <1991May13.204435.3138@cbnewsc.att.com> dcon@cbnewsc.att.com (david.r.connet) writes:
>The kernel debugger for AT&T (as far as I know) is not available
>for the general public.

It's there on ISC 2.0.2; it doesn't seem to be there on ESIX.

>The debugger gives you basically the same abilities as crash, though
>in a very different syntax (I don't know crash's syntax).

What useful things can be done with the debugger?  If I've got a
program that crashes the system, can the debugger help me find the
problem?

I only played with it briefly, but it looked like the debugger could be
a security hole.  You could bring up a debugger session without being
logged on, and probably poke a 0 into the appropriate place in your
uarea...
-- 
John W. Temples -- john@jwt.UUCP (uunet!jwt!john)

pjh@mccc.edu (Pete Holsberg) (05/14/91)

In article <1991May13.202745.10925@hal.com> ni@hal.com (Nathaniel Ingersol) writes:
=In article <1991May13.162909.20686@turnkey.tcc.com> jackv@turnkey.TCC.COM (Jack F. Vogel) writes:
=:NO. You could use adb on the running system but then 3.2 doesn't have
=:adb, oh well...
=
=Then again, you can use /etc/_fst, which is a copy of adb that's used
=for patching kernels...

???  Not present in AT&T SV R3.2.2.

Pete
-- 
Prof. Peter J. Holsberg      Mercer County Community College
Voice: 609-586-4800          Engineering Technology, Computers and Math
UUCP:...!princeton!mccc!pjh  1200 Old Trenton Road, Trenton, NJ 08690
Internet: pjh@mccc.edu	     Trenton Computer Festival -- 4/??-??/92

marc@ekhomeni.austin.ibm.com (Marc Wiz) (05/15/91)

 
> I only played with it briefly, but it looked like the debugger could be
> a security hole.  You could bring up a debugger session without being
> logged on, and probably poke a 0 into the appropriate place in your
> uarea...
>

Yes, it is true that you can poke a 0 into the appropriate place.
However IMHO one of the biggest problems with the debugger was that you
only had addressability to the current process.

If you wanted to look around in another process' address space you were
in for an interesting time.  The debugger didn't have that capability.



dcon@cbnewsc.att.com (david.r.connet) (05/16/91)

In article <3909@d75.UUCP> marc@ekhomeni.austin.ibm.com (Marc Wiz) writes:
>> I only played with it briefly, but it looked like the debugger could be
>> a security hole.  You could bring up a debugger session without being
>> logged on, and probably poke a 0 into the appropriate place in your
>> uarea...
>Yes that is true that you can poke a 0 into the appropriate place.
>However IMHO one of the biggest problems with the debugger was that you
>only had addressability to the current process.
>
>If you wanted to look around in another process' address space you were
>in for an interesting time.  The debugger didn't have that capability.
>

With AT&T's debugger, you can basically do anything you want to the system.
The only security you have is physical; I/O is done with the console.

marc@ekhomeni.austin.ibm.com (Marc Wiz) (05/17/91)

>
> >If you wanted to look around in another process' address space you were
> >in for an interesting time.  The debugger didn't have that capability.
> >
> 
> With AT&T's debugger, you can basically do anything you want to the system.
> The only security you have is physical, i/o is done with the console.

Yes that is true.  But the 386 kernel debugger did not allow references
to another process.  You had to determine where in physical memory the
data was and then give that physical address to the debugger.

It would have been nice to be able to supply an extra parameter to the
debugger to specify which process you wanted to look at.
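
For anyone who has to do that translation by hand, here is a sketch in Python
of the standard 386 two-level page walk: 10 bits of directory index, 10 bits
of table index, 12 bits of offset.  It works on a byte array standing in for
physical memory.  It assumes you already know the physical base of the target
process's page directory (how you find that is system specific and not shown),
and it ignores segmentation; the function names are mine, not the debugger's.

```python
import struct

PAGE_PRESENT = 0x1          # bit 0 of a directory or table entry
FRAME_MASK = 0xFFFFF000     # top 20 bits hold the 4K-aligned frame address


def read_u32(phys_mem, addr):
    """Read one 32-bit little-endian entry from simulated physical memory."""
    return struct.unpack_from("<I", phys_mem, addr)[0]


def linear_to_physical(phys_mem, page_dir_base, linear):
    """Walk page directory and page table to translate a linear address."""
    dir_index = (linear >> 22) & 0x3FF
    table_index = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF

    pde = read_u32(phys_mem, page_dir_base + 4 * dir_index)
    if not pde & PAGE_PRESENT:
        raise LookupError("page table not present")

    pte = read_u32(phys_mem, (pde & FRAME_MASK) + 4 * table_index)
    if not pte & PAGE_PRESENT:
        raise LookupError("page not present")

    return (pte & FRAME_MASK) | offset


if __name__ == "__main__":
    # Toy layout: directory at 0x0000, one page table at 0x1000, data at 0x3000.
    mem = bytearray(4 * 4096)
    struct.pack_into("<I", mem, 4 * 1, 0x1000 | PAGE_PRESENT)
    struct.pack_into("<I", mem, 0x1000 + 4 * 1, 0x3000 | PAGE_PRESENT)
    print(hex(linear_to_physical(mem, 0, 0x00401234)))
```

With a per-process parameter of the kind wished for above, the debugger could
do this walk itself, starting from that process's page directory.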

IMHO the kernel debugger should have had a few more commands to make it
useful.

Before I forget: the debugger that I am discussing is the postfix debugger.



chap@art-sy.detroit.mi.us (j chapman flack) (05/22/91)

In article <1991May14.155746.19084@mccc.edu> pjh@mccc.edu (Pete Holsberg) writes:
>=Then again, you can use /etc/_fst, which is a copy of adb that's used
>
>???  Not present in AT&T SV R3.2.2.

I'm fairly certain it's a SCOism.  It is, quite simply, adb.  They provide it
so they have the option of announcing patches to binaries and you can apply
them, but they put it in the /etc directory under a weird name so you won't
find it and do anything useful with it without paying for the development
system.  There is no documentation for it.  I just happened to see a patch
described in the printed release notes that involved using some program named
/etc/_fst that took a command syntax that reminded me very much of adb...
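
For the curious, the general idea of such a binary patch can be sketched in a
few lines of Python.  This is not adb or _fst syntax, and the offsets and byte
values are invented; it only shows the usual pattern of checking the old bytes
before writing the new ones, so a patch meant for a different release of the
binary is refused.

```python
import os
import tempfile


def patch_bytes(path, offset, expected, replacement):
    """Overwrite bytes at offset, but only if the old contents match."""
    if len(expected) != len(replacement):
        raise ValueError("patch must not change the length of the file")
    with open(path, "r+b") as f:
        f.seek(offset)
        current = f.read(len(expected))
        if current != expected:
            raise ValueError("unexpected contents at offset; not patching")
        f.seek(offset)
        f.write(replacement)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        target = os.path.join(work, "binary")      # stand-in for a real file
        with open(target, "wb") as f:
            f.write(b"\x00\x01\x02\x03\x04\x05")
        patch_bytes(target, 2, b"\x02\x03", b"\xaa\xbb")
        with open(target, "rb") as f:
            print(f.read().hex())
```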

The non-development system also includes as, ld, and cpp, all tucked away in
strange places that wouldn't be in your path.  And that's just what I've
found so far.

I'd really like to know *how* they arrived at the name _fst though.  I
suppose this_is_not_adb wouldn't fit in 14 characters.  ;-)
-- 
Chap Flack                         Their tanks will rust.  Our songs will last.
chap@art-sy.detroit.mi.us                                    -MIKHS 0EODWPAKHS

Nothing I say represents Appropriate Roles for Technology unless I say it does.