[comp.unix.sysv386] NOTICE: getcpages - waiting for 1 contiguous pages

det@hawkmoon.MN.ORG (Derek E. Terveer) (06/15/91)

My machine has been crashing at least once per day with the following panic,
which is preceded by a number of NOTICEs.  I am not sure what the getcpages
message indicates other than the obvious: that somewhere the kernel required
at least one page of contiguous "memory".  Because it was in swap (and
swapchunk) when it died, I presume that the swapper panicked when it couldn't
swap in a page of memory from the swap device.  I say "in" because I have 16MB
of swap space allocated and I have never seen that dip below about 10MB
available, even with X-windows running and lots of stuff going on.  So I find
it hard to believe that Esix had trouble swapping out to the swap area on disk.
That doesn't make much sense.  I didn't have a lot of processes running, either:
X-windows with some windows, vi, a compressed tape backup, news batching.  I
realize that 8MB of primary memory isn't a lot (and I intend to obtain more
asap), but I kind of hoped that the system would simply get slower and page
more of those 4k chunks in and out to accomplish its tasks instead of crashing.
Am I deluded in this expectation?

Also, on a related subject: if I am in X-windows, these NOTICEs don't appear in
any window that is visible from within X-windows, so I am unaware of the
problems until after the panic, when I have run crash after rebooting.  Is there
any way of getting an xterm to record the messages that go to the console?
"xterm -help" says that "xterm -C" is supposed to put the xterm into console
mode, but "xterm -C" returns "bad command line option."  I have tried various
other possibilities like -c, -console, etc., with no luck.  Sigh...


NOTICE: getcpages - waiting for 1 contiguous pages

total real mem  = 7995392
total avail mem = 5586944

ESIX System 5.3.2 Rev.D

NOTICE: getcpages - waiting for 1 contiguous pages

NOTICE: getcpages - waiting for 1 contiguous pages

NOTICE: getcpages - waiting for 1 contiguous pages

NOTICE: getcpages - waiting for 1 contiguous pages

NOTICE: getcpages - waiting for 1 contiguous pages

NOTICE: getcpages - waiting for 1 contiguous pages
PANIC:
cr0 0xFFFFFFED     cr2  0x0000000C     cr3 0x00002000     tlb  0xFFFFF120
ss  0x00000001     uesp 0x00000001     efl 0x00010206     ipl  0x00000006
cs  0x00000158     eip  0xD007ED54     err 0x00000000     trap 0x0000000E
eax 0x00000000     ecx  0xFFFFFFFF     edx 0x00000021     ebx  0x00000001
esp 0xE0000F00     ebp  0xE0000F30     esi 0x00000000     edi  0xD023AA00
ds  0x00000160     es   0x00000160     fs  0x00000000     gs   0x00000000

PANIC: Kernel mode trap. Type 0x0000000E
Trying to dump 1952 Pages
..........

Panic String: Kernel mode trap. Type 0x%x

Kernel Trap. Kernel Registers saved at e0000ed0
ERR=0, TRAPNO=14
cs:eip=0158:d007ed54 Flags=10206
ds = 0160   es = 0160   fs = 0000   gs = 0000
esi= 00000000   edi= d023aa00   ebp= e0000f30   esp= e0000f00
eax= 00000000   ebx= 00000001   ecx= ffffffff   edx= 00000021

Kernel Stack before Trap:
STKADDR   FRAMEPTR  FUNCTION  POSSIBLE ARGUMENTS
e0000f00  e0000f30  swap     (d023aa00,1,0,0)
e0000f38  e0000f6c  swapchun (d01fbad8,0,c03fe020,d01fbad8)
e0000f74  e0000f84  addspg   (d01fbad8,c03fe020,0,0)
e0000f8c  e0000fe0  getpages (d01fbad8,0,0,0)
e0000fe8  e0000ff8  vhand    (43b95c,0,0,0)
-- 
Derek "Tigger" Terveer	det@hawkmoon.MN.ORG -- U of MN Women's Lax
I am the way and the truth and the light, I know all the answers; don't need
your advice.  -- "I am the way and the truth and the light" -- The Legendary Pink Dots

jackv@turnkey.tcc.com (Jack F. Vogel) (06/16/91)

In article <1991Jun14.202957.1408@hawkmoon.MN.ORG> det@hawkmoon.MN.ORG (Derek E. Terveer) writes:
|My machine has been crashing at least once per day with the following panic
|which is preceded by a number of NOTICEs.  I am not sure what the getcpages
|message indicates other than the obvious; that somewhere the kernel required
|at least one page of contiguous "memory".  And because it was in swap (and
|swapchunk) when it died, i presume that the swapper panicked when it couldn't
|swap in? a page of memory from the swap device?  I say "in" because I have 16MB
|of swap space allocated and i have never seen that dip below about 10MB
|available, even with X-windows running and lots of stuff going on.  So, i find
|it hard to believe that Esix had trouble swapping out to the swap area on disk.
 
|Kernel Stack before Trap:
|STKADDR   FRAMEPTR  FUNCTION  POSSIBLE ARGUMENTS
|e0000f00  e0000f30  swap     (d023aa00,1,0,0)
|e0000f38  e0000f6c  swapchun (d01fbad8,0,c03fe020,d01fbad8)
|e0000f74  e0000f84  addspg   (d01fbad8,c03fe020,0,0)
|e0000f8c  e0000fe0  getpages (d01fbad8,0,0,0)
|e0000fe8  e0000ff8  vhand    (43b95c,0,0,0)


Sorry, you are wrong. This stack trace shows that it is the pager "vhand"
that is running, and what it is doing is stealing unreferenced pages and
sending them out to the swap device. The pager and swapper share certain
routines like getpages(), swapchunk(), and swap(). I am not familiar with
the SVR3.2 source but AIX is SVR2 based and it has these routines as well.
I have never heard of the routine getcpages() nor addspg(), so I am not
sure what these do. In the AIX kernel, these routines work as follows.
getpages() scans the page tables for stealable pages and accumulates a
"chunk" of contiguous pte's. swapchunk() is then called to allocate space
on the swap device; the blocks on the swap device MUST be contiguous
because of the way that swap() works. Finally, swap() is the routine where
the actual I/O is done, by setting up the physio buffers and then calling
the strategy routine for the device.
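
As a rough illustration of the flow described above, here is a toy model in
Python. It is not AIX or SVR3 source: the function names stealable_chunks()
and alloc_contiguous(), the dictionary-based pte, and the free-block list are
all invented for the sketch. It only shows the two ideas in the paragraph:
grouping stealable pte's into contiguous chunks, and needing a contiguous run
of swap blocks for each chunk.

```python
def stealable_chunks(ptes, max_chunk):
    """Group indices of valid, unreferenced ptes into contiguous runs."""
    chunks, run = [], []
    for i, pte in enumerate(ptes):
        if pte["valid"] and not pte["referenced"]:
            run.append(i)
            if len(run) == max_chunk:   # chunk is full, start a new one
                chunks.append(run)
                run = []
        elif run:                       # a referenced page breaks the run
            chunks.append(run)
            run = []
    if run:
        chunks.append(run)
    return chunks


def alloc_contiguous(free_map, n):
    """Claim the first run of n adjacent free blocks; return its start or None."""
    run = 0
    for i, free in enumerate(free_map):
        run = run + 1 if free else 0
        if run == n:
            start = i - n + 1
            for j in range(start, i + 1):
                free_map[j] = False     # mark the blocks as in use
            return start
    return None


# Hypothetical page table: page 2 was recently referenced, the rest were not.
ptes = [{"valid": True, "referenced": r}
        for r in (False, False, True, False, False, False)]
chunks = stealable_chunks(ptes, max_chunk=2)

# Hypothetical swap map: True means the block is free.
swap_free = [True, False, True, True, False, True]
first = alloc_contiguous(swap_free, 2)
second = alloc_contiguous(swap_free, 2)
```

With this free map the second request fails even though two blocks are still
free, because they are not adjacent. That shows only why a contiguity
requirement can fail while free space remains; it does not establish what is
behind the NOTICE on the system in question.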

What is odd is that the NOTICE seems to indicate that the pager can't get
even a single page on the swap device, and generally if you were really out
of space I would think the kernel would kill certain processes. Also, the
panic you actually got was a kernel page fault; it looks as though swap()
had a bogus buffer pointer or something.
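
The "kernel page fault" reading can be checked against the register dump in
the original post. The sketch below, in Python, decodes the three error-code
bits the 80386 defines for trap 0x0E (present, write, user) and does the page
arithmetic on the boot messages. The helper name decode_page_fault() is made
up for the illustration; the input values are copied from the dump, and the
4096-byte page size is taken from the "4k chunks" mentioned in the post.

```python
PAGE_SIZE = 4096  # bytes per page, per the "4k chunks" in the original post


def decode_page_fault(err, cr2):
    """Decode an i386 page-fault error code and faulting address."""
    return {
        "cause": "protection violation" if err & 0x1 else "page not present",
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "kernel",
        "fault_address": cr2,
    }


# Values from the panic output: trap 0x0E, err 0x00000000, cr2 0x0000000C.
fault = decode_page_fault(err=0x00000000, cr2=0x0000000C)

# "total real mem" and "total avail mem" from the boot messages, in pages.
real_pages = 7995392 // PAGE_SIZE
avail_pages = 5586944 // PAGE_SIZE
```

err=0 decodes as a kernel-mode read of a page that is not present, and
cr2=0x0000000C is a very low address, which would be consistent with
following a null pointer to a field at offset 12, though the dump alone does
not prove that. 7995392 bytes at 4096 bytes per page is 1952 pages, matching
"Trying to dump 1952 Pages".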

All in all, I would say you need to talk to the Esix support folks, where
somebody can check the source and tell you exactly what's going on. I
don't believe running out of swap is your problem here, and if it were
I don't think it should cause a panic.

Good Luck!

Disclaimer: I'm paid to hack the kernel, not to speak for the company.

-- 
Jack F. Vogel			jackv@locus.com
AIX370 Technical Support	       - or -
Locus Computing Corp.		jackv@turnkey.TCC.COM

bill@unixland.natick.ma.us (Bill Heiser) (06/16/91)

In article <1991Jun15.205327.10904@turnkey.tcc.com> jackv@turnkey.TCC.COM (Jack F. Vogel) writes:
>In article <1991Jun14.202957.1408@hawkmoon.MN.ORG> det@hawkmoon.MN.ORG (Derek E. Terveer) writes:
>|My machine has been crashing at least once per day with the following panic
>|which is preceded by a number of NOTICEs.  I am not sure what the getcpages
>|message indicates other than the obvious; that somewhere the kernel required
>|at least one page of contiguous "memory".  And because it was in swap (and

>Sorry, you are wrong. This stack trace shows that it is the pager "vhand"
>that is running, and what it is doing is stealing unreferenced pages and

I'm seeing these "can't get x contiguous pages" messages too.  The machine
hasn't been panicking as described above, but I've been seeing some strange
hangs lately -- only in the past couple of weeks!  One symptom I've
seen a couple of times is that one of the modems stops showing
DSR (I think that's the one, I forget offhand); in any case, the getty
can't be killed and the port is INOP.  If I try a second time to kill
the getty, the machine hangs rock solid right after I hit return on
the kill -9 command.  It requires a cold system reset.

This is what "crash" shows right now:


# crash
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
> panic
System Messages:

total real mem  = 7995392
total avail mem = 5361664

ESIX System 5.3.2 Rev.D

Copyright (c) 1984, 1986, 1987, 1988 AT&T
Copyright (c) 1987, 1988 Microsoft Corp.
Copyright (c) 1988, 1989, 1990 Everex Systems Inc.
All Rights Reserved


FAS 2.08.0 async driver: Unit 0-5 init state is [******]


WARNING: Excessive modem status interrupts on FAS unit 3 (check the cabling).

NOTICE: getcpages - waiting for 1 contiguous pages


No Panic
> quit

script done on Sat Jun 15 22:12:35 1991


The "modem status interrupts" warning appeared after one of the serial
ports hung and I cycled power on the modem (this was a different
scenario than that described above) -- the modem reset cleared this
particular problem.  I have also seen the modem status messages
when powering on a Microterm 5510 terminal I have configured on the
system with a null modem adapter (the terminal is acting up; I have
to power cycle it several times to get it to come up).  (Sigh, so
many problems :-(


p.s.  I've not seen the getcpages on the console.  Only crash
shows them.

This is bad.  Having to cold reset the machine usually requires 
booting from floppy to get the fs usable again.



-- 
bill@unixland.natick.ma.us     ...!uunet!think!unixland!bill
OR ..!uunet!world!unixland!bill     heiser@world.std.com
Public Access Unix 508-655-3848(2400)   508-651-8723(HST)  508-651-8733(PEP-V32)

pim@cti-software.nl (Pim Zandbergen) (06/16/91)

bill@unixland.natick.ma.us (Bill Heiser) writes:

>p.s.  I've not seen the getcpages on the console.  Only crash
>shows them.

The getcpages notice does not show on the console. It does show up
in /dev/osm once you have configured Operating System Messages
into the kernel.

Same thing on the AT&T 3B2, so I suspect this is normal behaviour
for all true System V Release 3 ports.

I often get this notice when using huge amounts of memory,
like while running gcc and X11 at the same time.
Nothing seems to go wrong, though.

Here's an explanation of getcpages I got from someone at AT&T:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

getcpages is the name of a function in the operating system. This
function gets physically contiguous pages for a process to run.
The paging daemon, vhand, is responsible for freeing up the memory
as the need arises. In a busy system with limited memory and swap
space and with several large processes, when the system is unable
to get a contiguous page for a process the NOTICE is printed on
the console.

In the chapter on Performance Management of the 3B2 System
Administrator's Guide there is a section on Tunable Parameters.
The paging parameters in that section describe the operation
of the paging daemon vhand.

In the system that exhibits the problem, increase swap space
and modify GPGSLO and GPGSHI as necessary.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
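
A minimal sketch of how low and high water marks such as GPGSLO and GPGSHI
usually behave, assuming the common reading that vhand starts stealing pages
when free memory falls below GPGSLO and stops once it rises above GPGSHI.
The function name vhand_active() and the numeric values are hypothetical;
check the Administrator's Guide for the actual semantics and defaults on a
given port.

```python
def vhand_active(free_pages, was_active, gpgslo, gpgshi):
    """Hysteresis between two water marks, both counted in pages."""
    if free_pages < gpgslo:
        return True         # too little free memory: start (or keep) stealing
    if free_pages > gpgshi:
        return False        # enough free memory again: stop
    return was_active       # in between: carry on as before


GPGSLO, GPGSHI = 25, 40     # hypothetical values, not ESIX or 3B2 defaults

active = False
history = []
for free in (60, 30, 20, 30, 45, 30):   # made-up free-page samples
    active = vhand_active(free, active, GPGSLO, GPGSHI)
    history.append(active)
```

In this model, raising GPGSLO makes the pager start earlier and raising
GPGSHI makes it keep going longer, which is the kind of adjustment the advice
above seems to have in mind.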
-- 
Pim Zandbergen                          domain : pim@cti-software.nl
CTI Software BV                         uucp   : uunet!mcsun!hp4nl!ctisbv!pim
Laan Copes van Cattenburch 70           phone  : +31 70 3542302
2585 GD The Hague, The Netherlands      fax    : +31 70 3512837

bill@unixland.natick.ma.us (Bill Heiser) (06/18/91)

In article <1991Jun16.123322.10406@cti-software.nl> pim@cti-software.nl (Pim Zandbergen) writes:
>
>The getcpages notice does not show on the console. It does
>in /dev/osm after you have configured Operating System Messages
>into the kernel. 

What's /dev/osm and "Operating System Messages?"  

I don't think ESIX has such a thing to be configured into the kernel.
Anyone out there know about this?


-- 
bill@unixland.natick.ma.us     ...!uunet!think!unixland!bill
OR ..!uunet!world!unixland!bill     heiser@world.std.com
Public Access Unix 508-655-3848(2400)   508-651-8723(HST)  508-651-8733(PEP-V32)

jackv@turnkey.tcc.com (Jack F. Vogel) (06/18/91)

In article <1991Jun18.002018.1899@unixland.natick.ma.us> bill@unixland.natick.ma.us (Bill Heiser) writes:
>
>What's /dev/osm and "Operating System Messages?"  
>
>I don't think ESIX has such a thing to be configured into the kernel.
>Anyone out there know about this?
 
"Operating System Messages" is a facility that can be installed in your
kernel. It is basically a special set of kernel printf()'s. I am not
sure about its implementation in SVR3, but in the AIX kernel these are
called 'ncprintf()'s. The idea is that these message strings are not
displayed on the console; rather, they are written to a kernel-internal
circular data buffer (called osmbuf in AIX, anyway). These are messages
that are generally more technical than the average administrator would
care about, but they can be useful at times, particularly when debugging
a problem. /dev/osm is the entry point that allows you to access that
kernel buffer; you would typically 'cat /dev/osm' to see its current
contents. At least with AIX, if you have syslogd running I believe it
also writes that content into /usr/adm/messages.
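
To make the "circular data buffer" idea concrete, here is a small Python
model. It is only a sketch: the class name OsmBuffer, its methods, and the
8-byte size are invented, and nothing here reflects how osmbuf is actually
laid out in AIX or SVR3. It shows the one property that matters, namely that
new messages overwrite the oldest bytes once the buffer is full.

```python
class OsmBuffer:
    """Fixed-size ring of bytes; the oldest data is overwritten when full."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0       # index where the next byte will be written
        self.count = 0      # number of valid bytes currently held

    def write(self, msg):
        for b in msg.encode("ascii"):
            self.buf[self.head] = b
            self.head = (self.head + 1) % self.size
            self.count = min(self.count + 1, self.size)

    def contents(self):
        """Return the held bytes, oldest first (roughly what 'cat' would show)."""
        start = (self.head - self.count) % self.size
        data = bytes(self.buf[(start + i) % self.size]
                     for i in range(self.count))
        return data.decode("ascii")


osm = OsmBuffer(8)
osm.write("abcdef")
before_wrap = osm.contents()
osm.write("ghij")           # 10 bytes written in total, only the last 8 survive
after_wrap = osm.contents()
```

A reader of such a buffer sees only the most recent messages, which is one
reason to copy them to a log file if they need to survive a busy period.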

I can't say for Esix but ISC does have the OSM option, it is one of the
facilities that can be installed when running kconfig. And there is a
/dev/osm on my system, although I don't have the facility installed.

Disclaimer: I'm a kernel hacker, not a company spokesweenie :-}.


-- 
Jack F. Vogel			jackv@locus.com
AIX370 Technical Support	       - or -
Locus Computing Corp.		jackv@turnkey.TCC.COM