chap@art-sy.detroit.mi.us (j chapman flack) (05/04/91)
In article <450@bartal.BARTAL.COM> phillip@BARTAL.COM (Phillip M. Vogel) writes:
>When the kernel dumps core, it puts the core dump into the swap
>area ON THE PRIMARY DISK. Well, 8 megs of core dump into 5 megs

This reminded me of questions I've been meaning to ask. I never knew where the kernel core dump goes in a panic (and so far I've had no opportunity to find out....). This posting suggests it goes in the swap area, but that brings up an immediate question:

 At what point does the kernel begin using the swap area on the next boot??
 How am I able to use `crash' to examine the core dump before the evidence
 is overwritten?

Or does something check for the presence of a core dump in the swap area at boot time and copy it to a file for later examination? In that case, my first question is back: Where does it go?

Here's another question: when I first installed this system, it would constantly overflow the kernel file and inode tables, causing all sorts of programs to fail unpredictably. At the time, I didn't know if the default table sizes were preposterous or if some runaway bug was filling the tables. It would have been handy to be able to run something as root that forces a panic, then reboot and analyze the dump while the system is still reasonably reliable. Sort of like running OPCCRASH from the console on a VAX. Does anybody have a panic-forcing program? This is SCO SysV 3.2.

(Btw, it turned out the table sizes were preposterous. They came out of the box ready to accommodate about as many files and inodes as the daemons have open before I log in....)

Thanks!
--
Chap Flack                       Their tanks will rust. Our songs will last.
chap@art-sy.detroit.mi.us                              -Mikos Theodorakis
Nothing I say represents Appropriate Roles for Technology unless I say it does.
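An aside on the table-overflow symptom above: in System V the kernel's open-file and inode tables were fixed-size arrays whose sizes (the NFILE and NINODE tunables) were set when the kernel was configured, so once every slot was taken, further opens failed (typically with ENFILE) no matter which program asked. The C sketch below models that allocation pattern with a deliberately tiny table; the names `NFILE_DEMO`, `file_slot`, `falloc_slot`, and `ffree_slot` are hypothetical and only illustrate the idea, not SCO's actual kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, deliberately tiny table size; a real kernel sets the
 * equivalent (e.g. NFILE) at configuration time. */
#define NFILE_DEMO 4

struct file_slot {
    int in_use;   /* nonzero once the slot is allocated */
    long offset;  /* stand-in for per-open state a kernel would keep */
};

static struct file_slot file_table[NFILE_DEMO];

/* Return the index of a free slot, or -1 with errno set to ENFILE when
 * every slot in the fixed-size table is already taken. */
int falloc_slot(void)
{
    for (int i = 0; i < NFILE_DEMO; i++) {
        if (!file_table[i].in_use) {
            file_table[i].in_use = 1;
            file_table[i].offset = 0;
            return i;
        }
    }
    errno = ENFILE;
    return -1;
}

/* Release a slot so a later allocation can reuse it. */
void ffree_slot(int i)
{
    assert(i >= 0 && i < NFILE_DEMO);
    file_table[i].in_use = 0;
}
```

Freeing a slot makes it available again, which is why such failures can look unpredictable: whether a given open succeeds depends on what else happens to be holding slots at that moment.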
jackv@turnkey.tcc.com (Jack F. Vogel) (05/04/91)
In article <9105031411.aa04050@art-sy.detroit.mi.us> chap@art-sy.detroit.mi.us (j chapman flack) writes:
>This reminded me of questions I've been meaning to ask. I never knew where
>the kernel core dump goes in a panic (and so far I've had no opportunity to
>find out....). This posting suggests it goes in the swap area, but that
>brings up an immediate question:

Initial point: you say you are running SCO; since I run ISC, what follows may or may not apply. I am not sure how close the two systems are in handling panic dumps...

> At what point does the kernel begin using the swap area on the next boot??

Sometime before coming up fully multiuser, a script (/etc/dumpsave) is run which decides if there is a dump in the swap area and asks whether you want to save it. You can save either to floppy or tape. If you choose floppy you must have enough formatted floppies to hold it (the size will be approximately your real memory size). If you choose not to save, the kernel will begin using the swap device and the data will be overwritten.

> How am I able to use `crash' to examine the core dump before the evidence
> is overwritten?

Since the swap device is used, there is no way to examine the dump before saving and reloading it.

> It would have been handy to be able to run something as root that
>forces a panic, then reboot and analyze the dump while the system is still
>reasonably reliable.

Again, I can't speak for SCO's implementation, but one way to do this given the AT&T standard is to have the kernel debugger linked into your kernel (see debugger(8)) and then, if you want to force a dump, enter the debugger and give a 'sysdump' command. I must emphasize that I have never done this, and given that it dumps all over your swap space I would presume this is fatal to the running system :-} :-}. Or one could just use the debugger to branch the kernel to panic().
The ISC docs have a cryptic 'BUGS' statement about 'sysdump' sometimes not working, without any details; perhaps someone at Interactive could comment?? But, in theory at least, this should do what you want.

Just for comparison (no commercial plug really intended, this is just an implementation example), AIX on the PS/2 allows you to create a dedicated dump partition where multiple dumps can be stored. This allows you to take "running dumps" of the system which can then be examined with crash at your leisure. Alternately, you can configure the kernel for the dump device to be the floppy; then if you ever panic, the system will stop and, if you want to save the dump, prompt you to insert the floppy, etc... From a serious service point of view this functionality is essential. Of course, this consumes disk space, but it's an option if you so desire. Oh yes, there is also a key sequence on the console that lets you force either a running dump or a panic at any particular time.

Disclaimer: I don't speak for the company!
--
Jack F. Vogel                    jackv@locus.com
AIX370 Technical Support               - or -
Locus Computing Corp.            jackv@turnkey.TCC.COM
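Jack's rule of thumb (the dump is roughly the size of real memory) makes it easy to estimate how many floppies /etc/dumpsave will want. A small C sketch of that arithmetic follows, assuming the standard high-density PC formats (80 cylinders, 2 heads, 512-byte sectors, with 15 sectors per track for 5.25-inch 1.2 MB disks and 18 for 3.5-inch 1.44 MB disks) and ignoring any per-volume overhead dumpsave itself may add. The name `floppies_needed` is hypothetical.

```c
#include <assert.h>

/* Formatted capacity of the common high-density PC floppy formats:
 * 80 cylinders x 2 heads x sectors per track x 512 bytes per sector. */
#define FLOPPY_525_HD (80L * 2 * 15 * 512)  /* 1,228,800 bytes, "1.2 MB"  */
#define FLOPPY_35_HD  (80L * 2 * 18 * 512)  /* 1,474,560 bytes, "1.44 MB" */

/* Floppies needed for a dump of dump_bytes, rounded up; assumes the dump
 * is written raw, with no per-volume header overhead. */
long floppies_needed(long dump_bytes, long floppy_bytes)
{
    assert(dump_bytes >= 0 && floppy_bytes > 0);
    return (dump_bytes + floppy_bytes - 1) / floppy_bytes;
}
```

For the 8 MB machine mentioned at the top of the thread, `floppies_needed(8L * 1024 * 1024, FLOPPY_525_HD)` comes to 7 disks, and the 3.5-inch format needs 6.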
vandys@sequent.com (Andrew Valencia) (05/04/91)
chap@art-sy.detroit.mi.us (j chapman flack) writes:
>This reminded me of questions I've been meaning to ask. I never knew where
>the kernel core dump goes in a panic (and so far I've had no opportunity to
>find out....).

This is the heart of the matter. IMHO, SCO has done a pretty good job of hammering their product into a state where it just runs and runs and runs with little ado. If you had a choice between a system that had a very nice and powerful crash dumping and analysis system, and one that simply didn't crash in the first place, which would you pick?

> At what point does the kernel begin using the swap area on the next boot??
> How am I able to use `crash' to examine the core dump before the evidence
> is overwritten?

The rest of these comments are from my ESIX system. As you boot up, the script /etc/dumpsave is called and goes about copying the crash dump to another place. It is invoked by /etc/bcheckrc if fsstat on the root device indicates that the root filesystem needs cleaning (which indicates some sort of crash in the first place). I usually see this message after a powerfail, so it's offering to save a dump that doesn't even exist. Oh, well.

/etc/dumpsave is kind of a crock. It's hard-coded to dump to some sort of floppy/tape device. I guess they didn't want to deal with getting the other filesystems mounted first. There'd be a definite danger there, as fsck could well scribble on the swap area. Finally, they give you /etc/ldsysdump to copy these same floppies back into the filesystem. You run this after you get your system back up and have clean, mounted filesystems to put the crash dump in.

>It would have been handy to be able to run something as root that
>forces a panic, then reboot and analyze the dump while the system is still
>reasonably reliable.

Another strategy would be to run /etc/crash in one window and then switch to another and run your programs. When things get bad, switch back and look around on your running kernel.
With /etc/crash already running, shortages in the inode table (and the like) shouldn't keep you from looking. Just an idea. (In case you hadn't tried this, running crash without arguments makes it run on /unix and /dev/mem, which means you're looking at the state of your running system.)

Regards,
Andy Valencia
vandys@sequent.com
Disclaimer: these are just my opinions, one and all.
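To make the "look at the running kernel" suggestion a little more concrete: roughly speaking, a tool like crash looks up a kernel variable's address in the symbol table of /unix and then reads the corresponding bytes from a memory device such as /dev/kmem (kernel virtual addresses) or /dev/mem (physical addresses). The C sketch below shows only the read step, assuming you already know the address; `read_at` is a hypothetical name, opening the memory devices requires root, and many modern systems no longer provide /dev/kmem at all. Because it simply seeks and reads, the same function also works on an ordinary file such as a saved dump image.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read len bytes at byte offset addr of a memory image (a memory device
 * or a dump file) into buf.  Returns 0 on success, or -1 on an error or
 * if the image ends before len bytes have been read. */
int read_at(const char *path, off_t addr, void *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *p = buf;
    size_t done = 0;
    if (lseek(fd, addr, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;          /* interrupted by a signal: retry */
        if (n <= 0) {          /* error, or EOF before len bytes */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);
    return 0;
}
```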
mike@bria.UUCP (mike.stefanik) (05/05/91)
In an article, vandys@sequent.com (Andrew Valencia) writes:
|This is the heart of the matter. IMHO, SCO has done a pretty good job
|of hammering their product into a state where it just runs and runs and runs
|with little ado. If you had a choice between a system that had a very nice and
|powerful crash dumping and analysis system, and one that simply didn't crash in
|the first place, which would you pick?

Where is this mythical beast that SCO has given birth to? Yes, the one that "simply doesn't crash in the first place"? Could you please enlighten me as to what unique version of the operating system you are using?

There is never any good excuse for any operating system to be without the ability to dump itself when it crashes. It is laziness, pure and simple.
--
Michael Stefanik, MGI Inc, Los Angeles | Opinions stated are never realistic
Title of the week: Systems Engineer    | UUCP: ...!uunet!bria!mike
-------------------------------------------------------------------------------
If MS-DOS didn't exist, who would UNIX programmers have to make fun of?
allbery@NCoast.ORG (Brandon S. Allbery KB8JRR/AA) (05/05/91)
As quoted from <1991May04.132158.17121@turnkey.tcc.com> by jackv@turnkey.tcc.com (Jack F. Vogel):
+---------------
| > At what point does the kernel begin using the swap area on the next boot??
|
| Sometime before coming up fully multiuser a script is run, /etc/dumpsave,
| which decides if there is a dump in the swap area and prompts you if you
| want to save it. You can save either to floppy or tape. If you choose
| floppy you must have enough formatted floppies to hold it (the size will
| be approximately your real memory size). If you choose not to save, the
| kernel will begin using the swap device and the data will be overwritten.
+---------------

On SCO, it's /etc/sysdump and it tells you to run /etc/ldsysdump to read the dump into a file for analysis. Otherwise, it's the same.

+---------------
| > It would have been handy to be able to run something as root that
| >forces a panic, then reboot and analyze the dump while the system is still
| >reasonably reliable.
|
| Again, I can't speak for SCO's implementation, but one way to do this
| given the AT&T standard is to have the kernel debugger linked into your
| kernel (see debugger(8)) and then if you want to force a dump, enter
+---------------

Alternatively, if all you want to do is look at running kernel information without playing any games with it, try "/etc/crash /dev/kmem". You don't even have to panic the system (but you *will* if you try to change things).

+---------------
| Just for comparison, no commercial plug really intended this is just for
| an implementation example, AIX on the PS/2 allows you to create a dedicated
| dump partition where multiple dumps can be stored. This allows you to take
+---------------

Look at the value of DUMPDEV in the kernel configuration of any V7, SIII, or SV. This is nothing at all new.... The default, of course, is the same as the swap device; you can, if you wish, set up a separate device and use that.

++Brandon
--
Me: Brandon S. Allbery                 Ham: KB8JRR/AA 10m,6m,2m,220,440,1.2
Internet: allbery@NCoast.ORG           (restricted HF at present)
Delphi: ALLBERY                        AMPR: kb8jrr.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery   KB8JRR @ WA8BXN.OH
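Brandon's point about DUMPDEV amounts to a simple rule: the panic dump goes to a dedicated dump device if one is configured, and otherwise to the swap device, which is exactly why the next boot has to save it before swapping starts. A hypothetical C model of that rule follows; the struct, its field names, and the `DEV_NONE` marker are illustrative assumptions, not the layout of any real kernel configuration file.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical "not configured" marker for a device number. */
#define DEV_NONE ((dev_t)-1)

/* Hypothetical subset of a kernel's device configuration. */
struct kconf {
    dev_t rootdev;   /* root filesystem */
    dev_t swapdev;   /* swap area */
    dev_t dumpdev;   /* where panic dumps go, or DEV_NONE for the default */
};

/* The device a panic dump would be written to: an explicitly configured
 * dump device if there is one, otherwise the swap device. */
dev_t effective_dumpdev(const struct kconf *c)
{
    assert(c != NULL);
    return c->dumpdev != DEV_NONE ? c->dumpdev : c->swapdev;
}

/* Nonzero when the dump shares the swap device, i.e. when it must be
 * saved at boot before the kernel starts swapping over it. */
int dump_at_risk_from_swap(const struct kconf *c)
{
    return effective_dumpdev(c) == c->swapdev;
}
```

A separate dump device (like the AIX dump partition Jack describes) takes the dump out of the swap area's path, at the cost of the disk space it occupies.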
vandys@sequent.com (Andrew Valencia) (05/05/91)
mike@bria.UUCP (mike.stefanik) writes:
>Where is this mythical beast that SCO has given birth to?

Hmmm, interesting. If you've had bad experiences, then I'm very sorry to hear about it. My experiences with their products over the years have been very good. When you go to put a system together, my experience is that the Microport, ESIX, or vanilla AT&T releases just can't be used as a problem-solving tool the way SCO's product can. Lower quality (yes, more crashes), less support, inferior documentation.

>There is never any good excuse for any operating system to be without
>the ability to dump itself when it crashes. It is laziness, pure and simple.

In the particular case of my comments, I noted that ESIX DOES have crash dumping, though it isn't very elegant. Could someone with the latest SCO release let us know if they took it out?

Speaking as a kernel developer, I can give you a better guess than "laziness" for why certain things get left out (or even taken out). It usually ends up being time and quality trade-offs. Would you rather have crash dumping than, say, multi-screen with graphics? Would you rather have a really good crash dumping system with a crash once a week? Or a mediocre one with one crash a month? Or one crash a year, but you get your release three months later? Would you give it up entirely if your own "pet peeve" bug could be fixed instead?

Now imagine one developer with several thousand people pulling him in several thousand mutually exclusive directions. That's what it's like on the "other side." Nobody sits around deciding not to do anything for the next release out of laziness--not in my experience.

Regards,
Andy Valencia
vandys@sequent.com
Disclaimer: I speak only for myself.
chap@art-sy.detroit.mi.us (j chapman flack) (05/13/91)
>| > [this was my original question]
In article <1991May4.232044.3487@NCoast.ORG> allbery@ncoast.ORG (Brandon S. Allbery KB8JRR/AA) writes:
>As quoted from <1991May04.132158.17121@turnkey.tcc.com> by jackv@turnkey.tcc.com (Jack F. Vogel):
>+---------------
>| > It would have been handy to be able to run something as root that
>| >forces a panic, then reboot and analyze the dump while the system is still
>| >reasonably reliable.
>|
>| Again, I can't speak for SCO's implementation, but one way to do this
>| given the AT&T standard is to have the kernel debugger linked into your
>| kernel (see debugger(8)) and then if you want to force a dump, enter
>+---------------
>
>Alternatively, if all you want to do is look at running kernel information
>without playing any games with it, try "/etc/crash /dev/kmem".

One person suggested using the kernel debugger; several suggested using `crash' to look at the running system. I don't have a development system and I haven't found anything lying around that looks like a kernel debugger, so I doubt that linking that in is an option for me.

I have, on occasion, used `crash' to look at the running system. However, I expect that the time I'll *really* want to look at things will be when response time has suddenly gone to six minutes and rising, or the console is being flooded with messages, or something else obnoxious is happening. (I've had such experiences before, with other systems....) What I want in a situation like that, if there's still any chance I can log in as root and get one command executed, is one command that will simply force a panic, like OPCCRASH did on the VAX 11/780 console. After the system is rebooted and stable (or I've taken the dump to a stable system) ...THEN I'll try to make sense of the dump.

>You don't even
>have to panic the system (but you *will* if you try to change things).

Hmm. My man page for `crash' doesn't mention any way to change anything. If it did, that would be just the ticket.
As I remember, OPCCRASH just set the stack level indicator to the interrupt stack, put -1 into IP, and resumed. The deed was done.... Is there some undocumented way to modify things with `crash'?
--
Chap Flack                       Their tanks will rust. Our songs will last.
chap@art-sy.detroit.mi.us                               -MIKHS 0EODWPAKHS
Nothing I say represents Appropriate Roles for Technology unless I say it does.
jackv@turnkey.tcc.com (Jack F. Vogel) (05/13/91)
In article <9105122137.aa00923@art-sy.detroit.mi.us> chap@art-sy.detroit.mi.us (j chapman flack) writes:
[ wants a way to force a system panic...]
>One person suggested using the kernel debugger; several suggested using
>`crash' to look at the running system. I don't have a development system
>and I haven't found anything lying around that looks like a kernel debugger,
>so I doubt that linking that in is an option for me.

You don't need the development system, and the "debugger" is not some binary that you would find "lying around". It should be an option in linking your kernel. I don't know how far SCO varies from the AT&T standard, but if you run 'kconfig' (or whatever SCO calls the kernel configurer program) there should be an option to add facilities to the kernel; when you enter that submenu, one of the facilities you can add is the debugger. Then rebuild a kernel and presto, you have the debugger. You can drop into it at any particular point by hitting <CTRL> <ALT> d, then enter the command: sysdump. If SCO doesn't include this facility you should scream loudly :-}.

>Is there some undocumented way to
>modify things with `crash'?

NO. You could use adb on the running system, but then 3.2 doesn't have adb, oh well...

Disclaimer: I'm paid to fix bugs, not to speak for the company!
--
Jack F. Vogel                    jackv@locus.com
AIX370 Technical Support               - or -
Locus Computing Corp.            jackv@turnkey.TCC.COM
ni@hal.com (Nathaniel Ingersol) (05/14/91)
In article <1991May13.162909.20686@turnkey.tcc.com> jackv@turnkey.TCC.COM (Jack F. Vogel) writes:
:In article <9105122137.aa00923@art-sy.detroit.mi.us> chap@art-sy.detroit.mi.us (j chapman flack) writes:
:>[ wants a way to force a system panic...]
[...]
:kernel. I don't know how far SCO varies from the AT&T standard, but if
:you run 'kconfig' (or whatever SCO calls the kernel configurer program)
:there should be an option to add facilities to the kernel, when you enter
:that submenu one of the facilites you can add is the debugger. Then rebuild
:a kernel and presto you have the debugger, you can drop into it at any
:particular point by hitting <CTRL> <ALT> d, then enter the command:
:sysdump. If SCO doesn't include this facility you should scream loudly :-}.
:

Start screaming. SCO will provide a kernel debugger to developers and so on who have an "Engineering Support" contract or something like that, but otherwise a kernel debugger is not part of the standard release.

:NO. You could use adb on the running system but then 3.2 doesn't have
:adb, oh well...

Then again, you can use /etc/_fst, which is a copy of adb that's used for patching kernels...
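Applying a vendor patch with adb (or SCO's /etc/_fst) boils down to overwriting a few bytes at a known location in a binary. A careful procedure checks that the old contents are what the patch expects before writing, so that applying it to the wrong release, or twice, changes nothing. The C sketch below shows that check-then-write step for a single 32-bit word. It is an illustration under stated assumptions, not how _fst works internally: `patch_word` is a hypothetical name, the offset would have to come from the patch instructions or the symbol table (not shown), it uses POSIX `pread`/`pwrite` (which a 1991 system may not have offered), and it assumes the file's byte order matches the host's (little-endian on the 386).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Replace the 32-bit word at file offset off with new_val, but only if it
 * currently holds expected_old.  Returns 0 on success, 1 if the old value
 * did not match (file left untouched), or -1 on an I/O error. */
int patch_word(const char *path, off_t off, uint32_t expected_old,
               uint32_t new_val)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    uint32_t cur;
    int rc = -1;
    if (pread(fd, &cur, sizeof cur, off) != (ssize_t)sizeof cur)
        goto out;              /* short file or read error */
    if (cur != expected_old) {
        rc = 1;                /* not the binary this patch was made for */
        goto out;
    }
    if (pwrite(fd, &new_val, sizeof new_val, off) == (ssize_t)sizeof new_val)
        rc = 0;
out:
    close(fd);
    return rc;
}
```

Running the patch a second time returns 1 instead of rewriting the word, which is the point of checking the old value first.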
dcon@cbnewsc.att.com (david.r.connet) (05/14/91)
In article <1991May13.162909.20686@turnkey.tcc.com> jackv@turnkey.TCC.COM (Jack F. Vogel) writes:
>In article <9105122137.aa00923@art-sy.detroit.mi.us> chap@art-sy.detroit.mi.us (j chapman flack) writes:
>[ wants a way to force a system panic...]
>
>>One person suggested using the kernel debugger; several suggested using
>>`crash' to look at the running system. I don't have a development system
>>and I haven't found anything lying around that looks like a kernel debugger,
>>so I doubt that linking that in is an option for me.
>
>You don't need the development system, and the "debugger" is not some binary
>that you would find "lying around". It should be an option in linking your
>kernel. I don't know how far SCO varies from the AT&T standard, but if
>you run 'kconfig' (or whatever SCO calls the kernel configurer program)
>there should be an option to add facilities to the kernel, when you enter
>that submenu one of the facilites you can add is the debugger. Then rebuild
>a kernel and presto you have the debugger, you can drop into it at any
>particular point by hitting <CTRL> <ALT> d, then enter the command:
>sysdump. If SCO doesn't include this facility you should scream loudly :-}.
>
>>Is there some undocumented way to
>>modify things with `crash'?
>
>NO. You could use adb on the running system but then 3.2 doesn't have
>adb, oh well...
>
>Disclaimer: I'm paid to fix bugs, not to speak for the company!
>
>
>--
>Jack F. Vogel                    jackv@locus.com
>AIX370 Technical Support               - or -
>Locus Computing Corp.            jackv@turnkey.TCC.COM

The kernel debugger for AT&T (as far as I know) is not available to the general public. (I would assume a source license gets it, but that type of stuff is out of my realm.) If you did have it though, you get into it the same way as above. (There is also a call you can put into a program. This is convenient when you are running on an alternate console and don't have the keyboard to do a ctrl-alt-d.)
The debugger gives you basically the same abilities as crash, though in a very different syntax (I don't know crash's syntax).

Dave Connet
dcon@iwtng.att.com
marc@ekhomeni.austin.ibm.com (Marc Wiz) (05/14/91)
The following is not a flame.

I had the experience of working with the AT&T kernel debugger for 3.2. I found that most of the other developers where I used to work thought it was rather user hostile. It also lacked some features which (IMHO) I thought were badly needed.

It is possible for someone to write their own kernel debugger and link it in with the kernel.

Marc Wiz                             MaBell (512)823-4780
Yes that really is my last name.
The views expressed are my own.
marc@aixwiz.austin.ibm.com
or uunet!cs.utexas.edu!ibmchs!auschs!ekhomeni.austin.ibm.com!marc
john@jwt.UUCP (John Temples) (05/14/91)
In article <1991May13.204435.3138@cbnewsc.att.com> dcon@cbnewsc.att.com (david.r.connet) writes:
>The kernel debugger for AT&T (as far as I know) is not available
>for the general public.

It's there on ISC 2.0.2; it doesn't seem to be there on ESIX.

>The debugger gives you basically the same abilities as crash, though
>in a very different syntax (I don't know crash's syntax).

What useful things can be done with the debugger? If I've got a program that crashes the system, can the debugger help me find the problem?

I only played with it briefly, but it looked like the debugger could be a security hole. You could bring up a debugger session without being logged on, and probably poke a 0 into the appropriate place in your uarea...
--
John W. Temples -- john@jwt.UUCP (uunet!jwt!john)
pjh@mccc.edu (Pete Holsberg) (05/14/91)
In article <1991May13.202745.10925@hal.com> ni@hal.com (Nathaniel Ingersol) writes:
=In article <1991May13.162909.20686@turnkey.tcc.com> jackv@turnkey.TCC.COM (Jack F. Vogel) writes:
=:NO. You could use adb on the running system but then 3.2 doesn't have
=:adb, oh well...
=
=Then again, you can use /etc/_fst, which is a copy of adb that's used
=for patching kernels...

??? Not present in AT&T SV R3.2.2.

Pete
--
Prof. Peter J. Holsberg         Mercer County Community College
Voice: 609-586-4800             Engineering Technology, Computers and Math
UUCP:...!princeton!mccc!pjh     1200 Old Trenton Road, Trenton, NJ 08690
Internet: pjh@mccc.edu          Trenton Computer Festival -- 4/??-??/92
marc@ekhomeni.austin.ibm.com (Marc Wiz) (05/15/91)
> I only played with it briefly, but it looked like the debugger could be
> a security hole. You could bring up a debugger session without being
> logged on, and probably poke a 0 into the appropriate place in your
> uarea...
>

Yes, it is true that you can poke a 0 into the appropriate place. However, IMHO one of the biggest problems with the debugger was that you only had addressability to the current process.

If you wanted to look around in another process' address space you were in for an interesting time. The debugger didn't have that capability.

Marc Wiz                             MaBell (512)823-4780
Yes that really is my last name.
The views expressed are my own.
marc@aixwiz.austin.ibm.com
or uunet!cs.utexas.edu!ibmchs!auschs!ekhomeni.austin.ibm.com!marc
dcon@cbnewsc.att.com (david.r.connet) (05/16/91)
In article <3909@d75.UUCP> marc@ekhomeni.austin.ibm.com (Marc Wiz) writes:
>> I only played with it briefly, but it looked like the debugger could be
>> a security hole. You could bring up a debugger session without being
>> logged on, and probably poke a 0 into the appropriate place in your
>> uarea...
>Yes that is true that you can poke a 0 into the appropriate place.
>However IMHO one of the biggest problems with the debugger was that you
>only had addressability to the current process.
>
>If you wanted to look around in another process' address space you were
>in for an interesting time. The debugger didn't have that capability.
>

With AT&T's debugger, you can basically do anything you want to the system. The only security you have is physical; I/O is done with the console.
marc@ekhomeni.austin.ibm.com (Marc Wiz) (05/17/91)
>
> >If you wanted to look around in another process' address space you were
> >in for an interesting time. The debugger didn't have that capability.
> >
>
> With AT&Ts debugger, you can basically do anything you want to the system.
> The only security you have is physical, i/o is done with the console.

Yes, that is true. But the 386 kernel debugger did not allow references to another process. You had to determine where in physical memory the data was and then give that physical address to the debugger. It would have been nice to be able to supply an extra parameter to the debugger to specify which process you wanted to look at. IMHO the kernel debugger should have had a few more commands to make it useful.

Before I forget: the debugger that I am discussing is the postfix debugger.

Marc Wiz                             MaBell (512)823-4780
Yes that really is my last name.
The views expressed are my own.
marc@aixwiz.austin.ibm.com
or uunet!cs.utexas.edu!ibmchs!auschs!ekhomeni.austin.ibm.com!marc
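Marc's complaint is that the 386 debugger left the address translation to you. On the 386 with 4 KB pages, turning a linear address (the same as the virtual address, assuming flat segments) into a physical one is a two-level table walk: bits 31-22 index the page directory, bits 21-12 index a page table, and bits 11-0 are the offset within the page; each directory or table entry holds a 4 KB-aligned frame address in its top 20 bits and a present flag in bit 0. The C sketch below performs that walk against a physical-memory image held in a byte buffer (for example, one read from a dump), assuming you already know the physical address of the page directory that maps the process you care about; how SVR3/386 kept track of that per process is not shown. The names `linear_to_phys` and `get32` are hypothetical, and the accessed, dirty, and permission bits are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG_PRESENT 0x001u        /* bit 0 of a directory/table entry */
#define PG_FRAME   0xFFFFF000u   /* bits 31..12: 4 KB frame address */

/* Fetch a little-endian 32-bit word at physical address pa from the
 * memory image; returns -1 if the word lies outside the image. */
static int get32(const uint8_t *mem, size_t memlen, uint32_t pa,
                 uint32_t *out)
{
    if ((size_t)pa > memlen || memlen - (size_t)pa < 4)
        return -1;
    *out = (uint32_t)mem[pa] | (uint32_t)mem[pa + 1] << 8 |
           (uint32_t)mem[pa + 2] << 16 | (uint32_t)mem[pa + 3] << 24;
    return 0;
}

/* Translate linear address lin using the page directory at physical
 * address pdir_pa.  Stores the physical address in *pa and returns 0,
 * or returns -1 if an entry is not present or falls outside the image. */
int linear_to_phys(const uint8_t *mem, size_t memlen, uint32_t pdir_pa,
                   uint32_t lin, uint32_t *pa)
{
    assert(mem != NULL && pa != NULL);
    uint32_t dir = (lin >> 22) & 0x3FFu;   /* page directory index */
    uint32_t tbl = (lin >> 12) & 0x3FFu;   /* page table index */
    uint32_t off = lin & 0xFFFu;           /* offset within the page */
    uint32_t pde, pte;

    if (get32(mem, memlen, (pdir_pa & PG_FRAME) + dir * 4u, &pde) != 0 ||
        !(pde & PG_PRESENT))
        return -1;
    if (get32(mem, memlen, (pde & PG_FRAME) + tbl * 4u, &pte) != 0 ||
        !(pte & PG_PRESENT))
        return -1;
    *pa = (pte & PG_FRAME) | off;
    return 0;
}
```

With the directory at physical 0x0000, directory entry 1 pointing to a table at 0x1000, and that table's entry 2 pointing to frame 0x2000, the linear address 0x00402034 translates to physical 0x2034; that physical address is what you would then hand to the debugger.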
chap@art-sy.detroit.mi.us (j chapman flack) (05/22/91)
In article <1991May14.155746.19084@mccc.edu> pjh@mccc.edu (Pete Holsberg) writes:
>=Then again, you can use /etc/_fst, which is a copy of adb that's used
>
>??? Not present in AT&T SV R3.2.2.

I'm fairly certain it's a SCOism. It is, quite simply, adb. They provide it so they have the option of announcing patches to binaries that you can apply, but they put it in the /etc directory under a weird name so you won't find it and do anything useful with it without paying for the development system. There is no documentation for it. I just happened to see a patch described in the printed release notes that involved using some program named /etc/_fst that took a command syntax that reminded me very much of adb...

The non-development system also includes as, ld, and cpp, all tucked away in strange places that wouldn't be in your path. And that's just what I've found so far.

I'd really like to know *how* they arrived at the name _fst, though. I suppose this_is_not_adb wouldn't fit in 14 characters. ;-)
--
Chap Flack                       Their tanks will rust. Our songs will last.
chap@art-sy.detroit.mi.us                               -MIKHS 0EODWPAKHS
Nothing I say represents Appropriate Roles for Technology unless I say it does.