det@hawkmoon.MN.ORG (Derek E. Terveer) (06/15/91)
My machine has been crashing at least once per day with the following panic, which is preceded by a number of NOTICEs. I am not sure what the getcpages message indicates other than the obvious: that somewhere the kernel required at least one page of contiguous "memory". And because it was in swap (and swapchunk) when it died, I presume that the swapper panicked when it couldn't swap in(?) a page of memory from the swap device. I say "in" because I have 16MB of swap space allocated and I have never seen that dip below about 10MB available, even with X-windows running and lots of stuff going on. So I find it hard to believe that Esix had trouble swapping out to the swap area on disk. That doesn't make much sense.

But I didn't have a lot of processes running: X-windows with some windows, vi, a compressed tape backup, news batching. I realize that 8MB of primary memory isn't a lot (and I intend to obtain more asap), but I kind of hoped that the system would simply get slower and page more of those 4k chunks in and out to accomplish its tasks instead of crashing. Am I deluded in this expectation?

Also, on a related subject: if I am in X-windows, these NOTICEs don't appear in any window that is visible from within X-windows, so I am unaware of the problems until after the panic, when I have run crash after rebooting. Is there any way of getting an xterm to record the messages that go to the console? "xterm -help" says that "xterm -C" is supposed to put the xterm into console mode, but "xterm -C" returns "bad command line option." I have tried various other possibilities like -c, -console, etc., with no luck. Sigh...
NOTICE: getcpages - waiting for 1 contiguous pages

total real mem  = 7995392
total avail mem = 5586944

ESIX System 5.3.2 Rev.D

NOTICE: getcpages - waiting for 1 contiguous pages
NOTICE: getcpages - waiting for 1 contiguous pages
NOTICE: getcpages - waiting for 1 contiguous pages
NOTICE: getcpages - waiting for 1 contiguous pages
NOTICE: getcpages - waiting for 1 contiguous pages
NOTICE: getcpages - waiting for 1 contiguous pages

PANIC:
cr0 0xFFFFFFED   cr2  0x0000000C   cr3 0x00002000   tlb  0xFFFFF120
ss  0x00000001   uesp 0x00000001   efl 0x00010206   ipl  0x00000006
cs  0x00000158   eip  0xD007ED54   err 0x00000000   trap 0x0000000E
eax 0x00000000   ecx  0xFFFFFFFF   edx 0x00000021   ebx  0x00000001
esp 0xE0000F00   ebp  0xE0000F30   esi 0x00000000   edi  0xD023AA00
ds  0x00000160   es   0x00000160   fs  0x00000000   gs   0x00000000

PANIC: Kernel mode trap. Type 0x0000000E
Trying to dump 1952 Pages
..........

Panic String: Kernel mode trap. Type 0x%x
Kernel Trap. Kernel Registers saved at e0000ed0
ERR=0, TRAPNO=14
cs:eip=0158:d007ed54 Flags=10206
ds = 0160 es = 0160 fs = 0000 gs = 0000
esi= 00000000 edi= d023aa00 ebp= e0000f30 esp= e0000f00
eax= 00000000 ebx= 00000001 ecx= ffffffff edx= 00000021

Kernel Stack before Trap:
STKADDR   FRAMEPTR  FUNCTION  POSSIBLE ARGUMENTS
e0000f00  e0000f30  swap      (d023aa00,1,0,0)
e0000f38  e0000f6c  swapchun  (d01fbad8,0,c03fe020,d01fbad8)
e0000f74  e0000f84  addspg    (d01fbad8,c03fe020,0,0)
e0000f8c  e0000fe0  getpages  (d01fbad8,0,0,0)
e0000fe8  e0000ff8  vhand     (43b95c,0,0,0)

--
Derek "Tigger" Terveer   det@hawkmoon.MN.ORG -- U of MN Women's Lax
I am the way and the truth and the light, I know all the answers; don't need
your advice. -- "I am the way and the truth and the light" -- The Legendary Pink Dots
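
As a sanity check on the numbers in the dump above, the byte counts from the boot banner can be converted to pages. The sketch below assumes the 4k page size mentioned in the post; the function and variable names are illustrative only.

```python
PAGE_SIZE = 4096  # assumption: the "4k chunks" mentioned in the post


def bytes_to_pages(nbytes, page_size=PAGE_SIZE):
    """Return (whole_pages, leftover_bytes) for a byte count."""
    return divmod(nbytes, page_size)


total_real = 7995392   # "total real mem" from the boot banner
total_avail = 5586944  # "total avail mem" from the boot banner

real_pages, real_rest = bytes_to_pages(total_real)
avail_pages, avail_rest = bytes_to_pages(total_avail)

print(real_pages, real_rest)     # pages of real memory, leftover bytes
print(avail_pages, avail_rest)   # pages available after the kernel's share
print(real_pages - avail_pages)  # pages not available to processes
```

Under that assumption, total real memory comes to exactly 1952 pages, which matches the "Trying to dump 1952 Pages" line: the panic dump covers all of real memory.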
jackv@turnkey.tcc.com (Jack F. Vogel) (06/16/91)
In article <1991Jun14.202957.1408@hawkmoon.MN.ORG> det@hawkmoon.MN.ORG (Derek E. Terveer) writes:
|My machine has been crashing at least once per day with the following panic
|which is preceded by a number of NOTICEs. I am not sure what the getcpages
|message indicates other than the obvious; that somewhere the kernel required
|at least one page of contiguous "memory". And because it was in swap (and
|swapchunk) when it died, i presume that the swapper panicked when it couldn't
|swap in? a page of memory from the swap device? I say "in" because I have 16MB
|of swap space allocated and i have never seen that dip below about 10MB
|available, even with X-windows running and lots of stuff going on. So, i find
|it hard to believe that Esix had trouble swapping out to the swap area on disk.

|Kernel Stack before Trap:
|STKADDR   FRAMEPTR  FUNCTION  POSSIBLE ARGUMENTS
|e0000f00  e0000f30  swap      (d023aa00,1,0,0)
|e0000f38  e0000f6c  swapchun  (d01fbad8,0,c03fe020,d01fbad8)
|e0000f74  e0000f84  addspg    (d01fbad8,c03fe020,0,0)
|e0000f8c  e0000fe0  getpages  (d01fbad8,0,0,0)
|e0000fe8  e0000ff8  vhand     (43b95c,0,0,0)

Sorry, you are wrong. This stack trace shows that it is the pager "vhand" that is running, and what it is doing is stealing unreferenced pages and sending them out to the swap device. The pager and swapper share certain routines like getpages(), swapchunk(), and swap(). I am not familiar with the SVR3.2 source, but AIX is SVR2 based and it has these routines as well. I have never heard of the routines getcpages() or addspg(), so I am not sure what these do.

The way these routines work within the AIX kernel is that getpages() scans page tables for stealable pages and accumulates a "chunk" of contiguous pte's. Then swapchunk() is called to allocate space on the swap device; the blocks on the swap device MUST be contiguous due to the way that swap() works. Finally, swap() is the routine where the actual I/O is done, by setting up the physio buffers and then calling the strategy routine for the device.

What is odd is that the NOTICE seems to indicate that the pager can't get even a single page on the swap device, and generally if you were really out of space I would think the kernel would kill certain processes. Also, the panic you actually got was a kernel page fault; it looks kind of like swap() had a bogus buffer pointer or something.

All in all, I would say you need to talk to the Esix support folks, where somebody can check the source and tell you exactly what's going on. I don't believe running out of swap is your problem here, and if it were I don't think it should cause a panic.

Good Luck!

Disclaimer: I'm paid to hack the kernel, not to speak for the company.

--
Jack F. Vogel                  jackv@locus.com
AIX370 Technical Support           - or -
Locus Computing Corp.          jackv@turnkey.TCC.COM
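
The reading of the panic as a kernel page fault can be checked against the register dump in the original post. On the i386, trap 14 (0x0E) is the page-fault exception, cr2 holds the faulting linear address, and the low three bits of the error code say what kind of access failed. The sketch below decodes those bits for the posted values; the function name and the dictionary layout are illustrative, and only the bit meanings come from the processor documentation.

```python
TRAP_PAGE_FAULT = 14  # 0x0E, the trap type reported in the panic


def decode_page_fault(err, cr2):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "address": cr2,  # cr2 holds the faulting linear address
        "cause": "protection violation" if err & 0x1 else "page not present",
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "kernel",
    }


# Values copied from the posted dump: err 0x00000000, cr2 0x0000000C.
info = decode_page_fault(err=0x00000000, cr2=0x0000000C)
print(hex(info["address"]), info["cause"], info["access"], info["mode"])
```

With err = 0 and cr2 = 0xC, this is a kernel-mode read of a not-present page at address 0xC. That is consistent with, though not proof of, a structure field being read through a null or otherwise bogus pointer inside swap().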
bill@unixland.natick.ma.us (Bill Heiser) (06/16/91)
In article <1991Jun15.205327.10904@turnkey.tcc.com> jackv@turnkey.TCC.COM (Jack F. Vogel) writes:
>In article <1991Jun14.202957.1408@hawkmoon.MN.ORG> det@hawkmoon.MN.ORG (Derek E. Terveer) writes:
>|My machine has been crashing at least once per day with the following panic
>|which is preceded by a number of NOTICEs. I am not sure what the getcpages
>|message indicates other than the obvious; that somewhere the kernel required
>|at least one page of contiguous "memory". And because it was in swap (and

>Sorry, you are wrong. This stack trace shows that it is the pager "vhand"
>that is running, and what it is doing is stealing unreferenced pages and

I'm seeing these "can't get x contiguous pages" too. The machine hasn't been panicking as described above ... but I've been seeing some strange hangs lately -- like only the past couple of weeks!

One symptom I've seen a couple of times is that one of the modems stops showing DSR (I think that's the one, forget offhand); in any case, the getty can't be killed and the port is INOP. If I try a second time to kill the getty, the machine hangs rock solid right after hitting return on the kill -9 command. It requires a cold system reset.

This is what "crash" shows right now:

    # crash
    dumpfile = /dev/mem, namelist = /unix, outfile = stdout
    > panic
    System Messages:

    total real mem  = 7995392
    total avail mem = 5361664

    ESIX System 5.3.2 Rev.D
    Copyright (c) 1984, 1986, 1987, 1988 AT&T
    Copyright (c) 1987, 1988 Microsoft Corp.
    Copyright (c) 1988, 1989, 1990 Everex Systems Inc.
    All Rights Reserved

    FAS 2.08.0 async driver: Unit 0-5 init state is [******]

    WARNING: Excessive modem status interrupts on FAS unit 3 (check the cabling).

    NOTICE: getcpages - waiting for 1 contiguous pages

    No Panic
    > quit

    script done on Sat Jun 15 22:12:35 1991

The "modem status interrupts" happened after one of the serial ports hung and I cycled power on the modem (this was a different scenario than that described above) -- the modem reset cleared this particular problem. I have also seen the modem status messages when powering on a Microterm 5510 terminal I have configured on the system with a null modem adapter (the terminal is acting up; I have to power cycle it several times to get it to come up).

(Sigh, so many problems :-(

p.s. I've not seen the getcpages on the console. Only crash shows them.

This is bad. Having to cold reset the machine usually requires booting from floppy to get the fs usable again.

--
bill@unixland.natick.ma.us ...!uunet!think!unixland!bill
OR ..!uunet!world!unixland!bill
heiser@world.std.com
Public Access Unix 508-655-3848(2400) 508-651-8723(HST) 508-651-8733(PEP-V32)
pim@cti-software.nl (Pim Zandbergen) (06/16/91)
bill@unixland.natick.ma.us (Bill Heiser) writes:

>p.s. I've not seen the getcpages on the console. Only crash
>shows them.

The getcpages notice does not show on the console. It does in /dev/osm after you have configured Operating System Messages into the kernel. Same thing on the AT&T 3B2, so I suspect this is normal behaviour for all true System V Release 3 ports.

I often get this notice when using huge amounts of memory, like while running gcc and X11 at the same time. Nothing seems to go wrong, though.

Here's an explanation of getcpages I got from someone at AT&T:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
getcpages is the name of a function in the operating system. This function gets physically contiguous pages for a process to run. The paging daemon, vhand, is responsible for freeing up the memory as the need arises. In a busy system with limited memory and swap space and with several large processes, when the system is unable to get a contiguous page for a process the NOTICE is printed on the console.

In the chapter on Performance Management of the 3B2 System Administrator's Guide there is a section on Tunable Parameters. The paging parameters in that section describe the operation of the paging daemon vhand.

In the system that exhibits the problem, increase swap space and modify GPGSLO and GPGSHI as necessary.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

--
Pim Zandbergen                           domain : pim@cti-software.nl
CTI Software BV                          uucp   : uunet!mcsun!hp4nl!ctisbv!pim
Laan Copes van Cattenburch 70            phone  : +31 70 3542302
2585 GD The Hague, The Netherlands       fax    : +31 70 3512837
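
The AT&T note above names GPGSLO and GPGSHI without saying what they do. As these tunables are generally described for System V Release 3, they act as low and high free-memory watermarks for vhand: it starts stealing pages when free memory drops below GPGSLO and stops once free memory is back above GPGSHI. The sketch below models only that hysteresis. The function name, the exact comparisons, and the sample values are illustrative assumptions, not taken from the ESIX or AT&T source.

```python
def vhand_active(free_pages, was_active, gpgslo, gpgshi):
    """Hypothetical model of the GPGSLO/GPGSHI hysteresis (not kernel source).

    Below the low watermark the pager starts stealing pages; above the
    high watermark it stops; in between it keeps doing what it was doing.
    """
    if free_pages < gpgslo:
        return True
    if free_pages > gpgshi:
        return False
    return was_active


# Sample values are made up for illustration only.
GPGSLO, GPGSHI = 25, 40

active = False
history = []
for free in [100, 30, 20, 30, 45, 30]:
    active = vhand_active(free, active, GPGSLO, GPGSHI)
    history.append(active)

print(history)
```

In this model, raising GPGSLO makes the pager start earlier and raising GPGSHI makes it free more memory before stopping, which is presumably why the note suggests adjusting both on a system that logs the NOTICE.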
bill@unixland.natick.ma.us (Bill Heiser) (06/18/91)
In article <1991Jun16.123322.10406@cti-software.nl> pim@cti-software.nl (Pim Zandbergen) writes:
>
>The getcpages notice does not show on the console. It does
>in /dev/osm after you have configured Operating System Messages
>into the kernel.

What's /dev/osm and "Operating System Messages?"

I don't think ESIX has such a thing to be configured into the kernel. Anyone out there know about this?

--
bill@unixland.natick.ma.us ...!uunet!think!unixland!bill
OR ..!uunet!world!unixland!bill
heiser@world.std.com
Public Access Unix 508-655-3848(2400) 508-651-8723(HST) 508-651-8733(PEP-V32)
jackv@turnkey.tcc.com (Jack F. Vogel) (06/18/91)
In article <1991Jun18.002018.1899@unixland.natick.ma.us> bill@unixland.natick.ma.us (Bill Heiser) writes:
>
>What's /dev/osm and "Operating System Messages?"
>
>I don't think ESIX has such a thing to be configured into the kernel.
>Anyone out there know about this?

"Operating System Messages" is a facility that can be installed in your kernel. It is basically a special set of kernel printf()'s. I am not sure about its implementation in SvR3, but in the AIX kernel these are called 'ncprintf()'s. The idea is that these message strings are not displayed on the console; rather, they are written to a kernel internal circular data buffer (called osmbuf in AIX anyway). These are messages that are generally more technical than the average administrator would probably care about, but at times, particularly when debugging a problem, they might be useful.

/dev/osm is the entry point that allows you to access that kernel buffer; you would typically 'cat /dev/osm' to see its current contents. At least with AIX, if you have syslogd running it also writes that content into /usr/adm/messages, I believe.

I can't say for Esix, but ISC does have the OSM option; it is one of the facilities that can be installed when running kconfig. And there is a /dev/osm on my system, although I don't have the facility installed.

Disclaimer: I'm a kernel hacker, not a company spokesweenie :-}.

--
Jack F. Vogel                  jackv@locus.com
AIX370 Technical Support           - or -
Locus Computing Corp.          jackv@turnkey.TCC.COM
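
The "kernel internal circular data buffer" described above can be illustrated with a small user-space model. This is only a sketch of the general technique: the class name OsmBuffer, its methods, and the buffer sizes are hypothetical and are not taken from the AIX, ISC, or ESIX kernels. New messages overwrite the oldest bytes once the buffer is full, and a read returns whatever is currently retained, oldest first, much as 'cat /dev/osm' is described as showing the current contents.

```python
class OsmBuffer:
    """Hypothetical fixed-size circular byte buffer for log messages."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0   # index where the next byte will be written
        self.count = 0  # number of valid bytes currently retained

    def write(self, msg):
        """Append bytes, overwriting the oldest data when full."""
        for byte in msg:
            self.buf[self.head] = byte
            self.head = (self.head + 1) % self.size
            if self.count < self.size:
                self.count += 1

    def read(self):
        """Return the retained bytes in order, oldest first."""
        start = (self.head - self.count) % self.size
        return bytes(self.buf[(start + i) % self.size]
                     for i in range(self.count))


log = OsmBuffer(64)
log.write(b"NOTICE: getcpages - waiting for 1 contiguous pages\n")
print(log.read().decode(), end="")
```

One consequence of this design is that a reader only ever sees the most recent messages: anything older than the buffer size has already been overwritten, so the buffer has to be read (or logged by something like syslogd) before it wraps.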