[comp.unix.aix] Paging-space problems

staff@cadlab.sublink.ORG (Alex Martelli) (11/08/90)

Maybe it's the pageable kernel, who knows, but for sure AIX 3 is rather
funny wrt programs that do dynamic allocation (don't they all?-).  We
have an interactive program (a solid modeler) which allocates memory
dynamically, depending on what the user is doing, with straight calls to
malloc().  On most Unix platforms (...all of them, except AIX 3...!),
if the program (=its user) is too ambitious for the amount of paging
space, malloc() will eventually return NULL - the program religiously
tests for this (to free all that's freeable, retry the malloc(), and
if it still fails gracefully inform the user).  On some machines the
failing malloc()'s give console messages suggesting expansion of paging
space, which is bothersome enough (we DO NOT want to expand paging space
to infinity, there's NO limit on the complexity of solid models that a
user will *attempt* to build, we JUST want to inform the user that a
given model takes more memory than he/she's got, is that such an unusual
approach on our part, grumble grumble!-).
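
The check, free, retry pattern described above might look roughly like
the following C sketch.  The names malloc_or_release(), release_fn,
spare_cache and release_spare_cache() are hypothetical, made up for
illustration; the modeler's real code is not shown in this thread.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook: gives back whatever the application can spare and
 * returns nonzero if anything was actually freed. */
typedef int (*release_fn)(void);

/* Stand-in for "all that's freeable" (caches, undo buffers, ...). */
void *spare_cache = NULL;

int release_spare_cache(void)
{
    if (spare_cache == NULL)
        return 0;               /* nothing left to give back */
    free(spare_cache);
    spare_cache = NULL;
    return 1;
}

/* Try malloc(); on failure release spare memory once and retry.
 * Returns NULL only if nothing could be freed or the retry failed too,
 * so the caller can tell the user the model is too big. */
void *malloc_or_release(size_t size, release_fn release_spare)
{
    void *p = malloc(size);

    if (p == NULL && release_spare != NULL && release_spare() != 0)
        p = malloc(size);
    return p;
}
```

A caller would treat a NULL return as "this model needs more memory
than is available" and report that to the user instead of aborting.
This only helps where malloc() actually returns NULL on exhaustion.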

On AIX 3, things are worse.  It appears that malloc() does succeed, BUT
then "system paging space" gets low, and funny things happen.  If our
application does not catch SIGDANGER, it gets killed; if it DOES catch
SIGDANGER, the *X Window System* (under which the app's running) gets
killed instead!  The app does not appear to be able to really "free"
memory to the system, i.e. normal free() probably does not sbrk()
(this happens on MANY platforms... the malloc()/free() pair seems to
attempt to minimize system-call overhead).  We could try funneling
our malloc()'s through a safemalloc() which will check psdanger() and
refuse to allocate if this would take paging-space too low, but this
does not appear to solve the problem: I believe Xlib, Xt, Ingres, and
whatever else we link with our app, use raw malloc()'s.  Ok, so I COULD
completely rewrite the malloc() package and fix things for OUR process,
but this STILL would not solve it - it's quite likely that the malloc()
from OUR process happens when everything's fine, but right after that
any process running some application whose source we don't have might
well allocate more memory and cause the danger condition!
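
For what it's worth, the safemalloc() idea mentioned above could be
sketched as below.  This is only an illustration under assumptions:
headroom_fn, fake_headroom() and the 4096-byte block size are made up
here.  On AIX the callback would presumably wrap psdanger(SIGDANGER),
which is documented as reporting free paging-space blocks relative to
the warning threshold; that mapping is assumed, not tested.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGING_BLOCK_BYTES ((size_t)4096)  /* assumed block size */

/* Hypothetical callback: paging-space blocks still free above the
 * warning threshold (negative if already below it). */
typedef long (*headroom_fn)(void);

/* Stand-in callback so the sketch can be tried on any system. */
long fake_free_blocks = 0;
long fake_headroom(void) { return fake_free_blocks; }

/* Refuse the request if it would not fit in the reported headroom;
 * otherwise fall through to the ordinary malloc(). */
void *safemalloc(size_t size, headroom_fn headroom)
{
    size_t blocks_needed = size / PAGING_BLOCK_BYTES
                         + (size % PAGING_BLOCK_BYTES != 0);
    long free_blocks = headroom();

    if (free_blocks < 0 || (size_t)free_blocks < blocks_needed)
        return NULL;
    return malloc(size);
}
```

As noted above, a wrapper like this guards only the calls routed
through it; raw malloc() calls in libraries and allocations by other
processes can still push the system below the threshold afterwards.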

It upsets me that an application programmer is supposed to fix these
low-level, system-oriented things, and the only alternative appears to
be to forgo dynamic memory allocation completely!  Just having a program
(or the X Window System indispensable for interfacing to it) die on
the user when he/she attempts construction of a complex solid model
does NOT appear to be a viable approach for a commercial application!!!

A system-level solution would be best, but I can't find a good one.
I think we now have ALL the documentation IBM supplies, but I don't
see a way there to reserve some amount of resource (paging space) for
the kernel, or for root-owned processes, or whatever.  The limits file
allows things like fixing maximum amount of data area PER PROCESS, but
what good will this do me???  I can't predict how many processes WILL
be running when a dangerous situation approaches!  Why oh why can't
brk()/sbrk() just REFUSE to expand space to a dangerous situation?

Suggestions will be appreciated, particularly on how to AVOID danger
situations, but also on how to GRACEFULLY HANDLE them.  Our plight
does NOT appear to me to be a very strange one, so I would really hope
somebody else's "been there before"!  Thanks in advance.

-- 
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 45, Bologna, Italia
Email: (work:) staff@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).

richard@locus.com (Richard M. Mathews) (11/13/90)

staff@cadlab.sublink.ORG (Alex Martelli) writes:

>On AIX 3, things are worse.  It appears that malloc() does succeed, BUT
>then "system paging space" gets low, and funny things happen.

It seems that some customers insist on this behavior because they malloc
huge areas which are only sparsely used.  They want to have a virtual
process size much greater than the paging space of the system.  At least
that is what I was told when I was asked to write the SIGDANGER code for
AIX 1.x.

Richard M. Mathews			Freedom for Lithuania
Locus Computing Corporation		       Laisve!
richard@locus.com
lcc!richard@seas.ucla.edu
...!{uunet|ucla-se|turnkey}!lcc!richard

marc@arnor.uucp (11/15/90)

malloc fails when the request causes the heap to exceed the 
ulimit for data.  It has nothing to do with paging space.

In AIX V3 the default data limit is quite large, which is why
it appears to behave differently.
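
To see which data limit a process is actually running under, a small
C sketch along these lines may help.  It assumes the BSD-style
getrlimit() interface and RLIMIT_DATA are available on the system at
hand; print_data_limit() is a name invented for this example.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print the soft and hard data-segment limits of this process.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_data_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_DATA, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        puts("soft data limit: unlimited");
    else
        printf("soft data limit: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
    if (rl.rlim_max == RLIM_INFINITY)
        puts("hard data limit: unlimited");
    else
        printf("hard data limit: %llu bytes\n",
               (unsigned long long)rl.rlim_max);
    return 0;
}
```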

Marc Auslander

rhoover@arnor.uucp (11/15/90)

In article <MARC.90Nov14153807@marc.watson.ibm.com>, marc@arnor.uucp writes:
|> malloc fails when the request causes the heap to exceed the 
|> ulimit for data.  It has nothing to do with paging space.
|> 
|> In AIX V3 the default data limit is quite large, which is why
|> it appears to behave differently.
|> 
|> Marc Auslander

Well, this is not true under SunOS.  For example, consider the following program (called big.c):

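#include <stdio.h>
#include <stdlib.h>

/* Allocate (and deliberately leak) 4 MB chunks until malloc() fails. */
int main(void)
{
    while (malloc(1024*1024*4) != NULL)
	fprintf(stderr,"Another 4 meg\n");
    fprintf(stderr,"That's all folks\n");
    return 0;
}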

cirrus% limit
cputime         unlimited
filesize        unlimited
datasize        524280 kbytes
stacksize       8192 kbytes
coredumpsize    unlimited
memoryuse       unlimited
cirrus% /etc/pstat -s
15312k allocated + 4816k reserved = 20128k used, 29772k available
cirrus% big
Another 4 meg
Another 4 meg
Another 4 meg
Another 4 meg
Another 4 meg
Another 4 meg
Another 4 meg
That's all folks
cirrus% 

Every unix system that I have ever used has returned 0 when malloc can no longer allocate usable memory.  When I malloc storage, I check for NULL and if my application has files to be written out, etc, I free some storage and clean up.

I see this malloc issue as one of compatibility.  Programs should not have to be rewritten in order to run on IBM machines.  If the /6000 version of malloc is faster, then a new call ( vmalloc() ? ) should be provided for fast memory allocation under the new semantics.

Would you have been a happy camper if berkeley had replaced fork() with the vfork() semantics and had provided psfork() for compatibility?

roger
rhoover@ibm.com

dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) (11/16/90)

In article <1990Nov14.223820.29154@arnor.uucp> rhoover@cirrus.watson.ibm.com (Roger Hoover) writes:
>In article <MARC.90Nov14153807@marc.watson.ibm.com>, marc@arnor.uucp writes:
>|> malloc fails when the request causes the heap to exceed the 
>|> ulimit for data.  It has nothing to do with paging space.
>|> 
> Well, this is not true under sunos.  For example, consider the
> following program (called big.c):
[...]

It has been a while since I understood this really well, but I think
what is being described is a System V versus BSD Unix variation.

Under BSD Unix (and very old System V derivatives?), sufficient space on
the paging device to back all active virtual memory is allocated when
the memory is allocated, no matter whether the paging space is ever used
or not.  The effect of this is that you run out of paging space only when
allocating new virtual memory, i.e. when exec()ing a new program (in
which case the shell probably emits a "No memory" message) or when growing
an existing process (in which case malloc() returns a NULL value).  If
a process is started successfully it will never be terminated due to
paging space exhaustion, though requests for more memory may be denied.
Other implications are that you can't run a BSD Unix system with no
paging space, and if the size of the paging space doesn't exceed the
size of your physical memory you won't be able to use all of the latter.

System V (or at least the release I was familiar with) doesn't do this.
Instead it allocates page space dynamically, when you need to page
something out.  Running processes have no page space allocation unless
they actually have pages out on the backing store.  The good effects
of this are that you can run System V systems with no page space at
all if need be, and that the total in-use memory allowed is related
to (physical memory + page space) rather than just page space.  The
problem is that System V doesn't know if page space is exhausted
when it allocates new memory, but rather finds this out only when
it needs to page something out.  To avoid deadlock, the process which
is being paged out is killed.
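
The difference in how much memory can be in use under the two schemes
can be put as a bit of arithmetic.  The C sketch below is an idealized
model only: it ignores what the kernel itself uses, and the function
names are invented for this illustration.

```c
/* Idealized upper bound, in MB, on total in-use (modifiable) virtual
 * memory when every page must be backed by paging space up front
 * (the BSD-style scheme): RAM does not add to the limit. */
long bsd_commit_limit_mb(long ram_mb, long paging_mb)
{
    (void)ram_mb;
    return paging_mb;
}

/* Same bound when paging space is only claimed at page-out time
 * (the System V-style scheme): RAM and paging space add up. */
long sysv_commit_limit_mb(long ram_mb, long paging_mb)
{
    return ram_mb + paging_mb;
}
```

With 32 MB of RAM and 16 MB of paging space, the first model allows
16 MB (so half the RAM can never be used) and the second 48 MB; with
no paging space at all the first allows nothing, the second 32 MB.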

I think AIX exhibits the latter behaviour exactly.  Malloc() never
returns NULL because the kernel doesn't know page space is exhausted
at that point.  Sometime later, however, a process will die to pay
for this.  Note that the process which dies is hardly ever the
process which grew itself, since the latter process is obviously
active and needs its pages, but rather something that was recently
active but which is now idle, like your shell, the window system
or a daemon.  Something has to die when you get to this point, since
it isn't normally possible to free memory back to the system.  It
sounds like the SIGDANGER thing was added to AIX to give the (more
likely to be guilty) active process a chance to commit hara kiri
before an innocent dies.
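
A process that wants that chance would have to catch the signal.  A
minimal C sketch follows, assuming the usual signal() interface.
SIGDANGER exists only on AIX, so SIGUSR1 is substituted elsewhere just
to let the sketch build; install_danger_handler() and danger_pending()
are hypothetical names.

```c
#include <signal.h>

#ifdef SIGDANGER
#define DANGER_SIGNAL SIGDANGER   /* AIX */
#else
#define DANGER_SIGNAL SIGUSR1     /* stand-in on other systems */
#endif

/* Set by the handler, polled by the application's main loop. */
static volatile sig_atomic_t paging_space_low = 0;

/* The handler only records the warning; the real work (saving files,
 * freeing caches, exiting) is done later, outside signal context. */
static void on_danger(int signo)
{
    (void)signo;
    paging_space_low = 1;
}

/* Returns 0 if the handler was installed, -1 otherwise. */
int install_danger_handler(void)
{
    return signal(DANGER_SIGNAL, on_danger) == SIG_ERR ? -1 : 0;
}

/* Returns 1 if a warning arrived since the last call, else 0. */
int danger_pending(void)
{
    if (!paging_space_low)
        return 0;
    paging_space_low = 0;
    return 1;
}
```

The main loop would call danger_pending() at convenient points and,
when it returns 1, save the user's work and release or exit.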

To tell the truth, I too like the BSD behaviour a lot better (though
the implementation in a vanilla BSD Unix is old, grotty, and still
suspects all the world is a vax).  Having random processes die is
truly annoying.

Dennis Ferguson
University of Toronto

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (11/16/90)

>On 15 Nov 90 18:09:19 GMT,dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) said:

Dennis> In article <1990Nov14.223820.29154@arnor.uucp> rhoover@cirrus.watson.ibm.com (Roger Hoover) writes:
>In article <MARC.90Nov14153807@marc.watson.ibm.com>, marc@arnor.uucp writes:
>|> malloc fails when the request causes the heap to exceed the 
>|> ulimit for data.  It has nothing to do with paging space.
>|> 

Dennis> System V (or at least the release I was familiar with) doesn't
Dennis> do this.  Instead it allocates page space dynamically, when
Dennis> you need to page something out.  Running processes have no
Dennis> page space allocation unless they actually have pages out on
Dennis> the backing store.  The good effects of this are that you can
Dennis> run System V systems with no page space at all if need be, and
Dennis> that the total in-use memory allowed is related to (physical
Dennis> memory + page space) rather than just page space. 

Dennis> I think AIX exhibits the latter behaviour exactly.  

This does not mesh with my experience with AIX.  Under AIX 3.1 on my
RS/6000, I find that I cannot run jobs for which there is not enough
paging space available on the disk --- even though there is plenty of
memory to contain the job.

Examples:
(1) With 16 MB paging space, the O/S used 12 MB and reported 4 MB
free.  With either 8 MB or 32 MB installed in the machine, I was
unable to run jobs with an *active working set* larger than 4 MB.

(2) With 36 MB paging space, the O/S used 12 MB and reported 24 MB
free.  With 32 MB RAM, I was able to run jobs with *active working
sets* right up to the 24 MB paging space limit.  A check with 'ps v'
showed that the jobs were completely in RAM.

So what do I mean by *active working set*?  Well, I'm not sure how the
O/S figures it out, but the following program runs until the part of
the array that is *actually used* gets too big for the currently
available paging space:

	parameter (n = 2**22)
	doubleprecision a(n)

	do length=65536,n,65536
		do i=1,length
			a(i) = float(i)
		end do
		print *,'Size (MB) :',float(length*8)/float(2**20)
	end do
	end

So somewhere this datum has to fit into theories on how AIX does
paging....
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@vax1.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) (11/16/90)

In article <MCCALPIN.90Nov15150308@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
>>On 15 Nov 90 18:09:19 GMT,dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) said:
>Dennis> run System V systems with no page space at all if need be, and
>Dennis> that the total in-use memory allowed is related to (physical
>Dennis> memory + page space) rather than just page space. 
>
>Dennis> I think AIX exhibits the latter behaviour exactly.  

>This does not mesh with my experience with AIX.  Under AIX 3.1 on my
>RS/6000, I find that I cannot run jobs for which there is not enough
>paging space available on the disk --- even though there is plenty of
>memory to contain the job.
[some interesting examples]
>So what do I mean by *active working set*?  Well, I'm not sure how the
>O/S figures it out, but the following program runs until the part of
>the array that is *actually used* gets too big for the currently
>available paging space:

John,

Figuring out what memory has been modified is fairly simple since the
memory management hardware keeps track of this.  Note that the only
memory which may need to go out to paging space is modified data.  Text
and unmodified, initialized data pages can be paged in from the binary, while
unmodified, uninitialized data pages need not exist at all until they
are touched.  BSD kernels don't worry so much about any of this (i.e. they
allocate page space in advance for all (potentially modifiable) data
pages), but I think System V kernels do.  If you allocate page space
on the fly you can do things like allocate huge chunks of memory which
you only use little bits of without having to have a huge, mostly
unused swap area to back it.  Of course, if you do this what you
lose are the nice "No memory" messages and NULL return values from
malloc() when you run out.

You are right that your examples do not match my memory of the
behaviour of System V kernels, in fact they seem to exhibit a combination
of some of the worst characteristics of both System V and BSD paging
strategies.  What would be interesting to know, however, is whether you
were ignoring or catching the SIGwhatever which indicates low memory when
you were running these and, if you weren't, whether the behaviour you see
changes if you do ignore this.  If the latter is true what you are seeing may
be the result of an IBM value-added feature rather than a characteristic
of the underlying System V kernel.

Indeed, what would be really interesting is if someone who actually
knew what they were talking about would explain how AIX paging works
to both of us.  Whatever they have done, the results are often
unpleasant and are not helped by the greedy memory consumption
of AIX and some of its utilities.

Dennis Ferguson
University of Toronto

madd@world.std.com (jim frost) (11/17/90)

marc@arnor.uucp writes:
>malloc fails when the request causes the heap to exceed the 
>ulimit for data.  It has nothing to do with paging space.

You are mistaken -- either can cause malloc to fail.

jim frost
saber software
jimf@saber.com

geoff@edm.uucp (Geoff Coleman) (11/17/90)

From article <311@cadlab.sublink.ORG>, by staff@cadlab.sublink.ORG (Alex Martelli):
> 
> A system-level solution would be best, but I can't find a good one.
> I think we now have ALL the documentation IBM supplies, but I don't
> see a way there to reserve some amount of resource (paging space) for
> the kernel, or for root-owned processes, or whatever.  The limits file
> allows things like fixing maximum amount of data area PER PROCESS, but
> what good will this do me???  I can't predict how many processes WILL
> be running when a dangerous situation approaches!  Why oh why can't
> brk()/sbrk() just REFUSE to expand space to a dangerous situation?

	But you really should read the man page for the limits file.
It resides in hardcopy in "Files Reference". It is without a doubt the
funniest AIX man page I've found yet. Of 6 parameters all but fsize
are marked as "not used" (so why are they there?). So even if data=xxxx
would theoretically solve your problem it wouldn't really because
the value is ignored.


Geoff Coleman

> -- 
> Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 45, Bologna, Italia

rogers@rogers.austin.ibm.com (Mark D. Rogers/100000) (11/21/90)

Regarding Dennis Ferguson's request for an explanation
of how AIX paging space allocation works, here goes:

- paging disk slots are allocated on first reference to a page
  (early disk allocation, similar to the BSD-style Dennis described,
   although we do not implement quotas).

- malloc() (actually brk()/sbrk()) does permit over-allocation of
  paging space. SIGDANGER was quite correctly interpreted by Mr. Ferguson
  in a prior posting as being something that allows a process to
  `gracefully' exit. I forget what the algorithm for determining which
  process to kill is, but you understand the mechanism correctly.

SIGDANGER was actually invented for the IBM RT, and migrated to
the RISC System/6000. The RT Virtual Memory Subsystem did Late Paging
Space Allocation, and allowed malloc() to over-allocate, exactly like
the System V description.

For the RISC System/6000, a completely new file system which has directory
journalling and is tightly coupled with the paging supervisor was written.
The pager does most of the file i/o via internally mapped files.
Early allocation was done on the new pager, essentially, for two reasons:

- to allow for potential future accounting mechanisms to be implemented
  which take advantage of the fact that virtual memory is always backed
  one-to-one with a disk slot (what you guys want). I don't know what we
  are going to do in this area yet (if anything).

- the new journalled file system/paging subsystem is quite complex, 
  having many objects with many states in and among themselves, to manage.

Early allocation simplified the paging subsystem design, and leaves
room to implement accounting of sorts.
As to whether we will or will not do accounting/quotas, I really don't
know at the moment, as I am not the person tracking that.

With regard to the malloc() issue -

- you are quite right in saying that we have customers who
  want `sparse' virtual memory. That really is a big deal among
  certain applications. Being able to have a large memory object
  and only pay for what you really use can be a nice programming
  construct.

- We did provide a `safemalloc()' which goes and touches all the
  pages & checks for SIGDANGER. I thought we shipped that as a sample.
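
The page-touching half of such a safemalloc() might be sketched in C
as below.  This is not the shipped sample, which is not reproduced in
this thread; malloc_touched() and the 4096-byte page size are
assumptions, and the SIGDANGER check is left to the caller.

```c
#include <stddef.h>
#include <stdlib.h>

#define ASSUMED_PAGE_BYTES ((size_t)4096)  /* assumed page size */

/* Allocate size bytes and write one byte in every page, so that each
 * page is referenced (and its backing store claimed) right away
 * instead of at some later first use.  Returns NULL if malloc()
 * itself fails. */
void *malloc_touched(size_t size)
{
    unsigned char *p = malloc(size);
    volatile unsigned char *q = p;  /* keep the stores from being elided */
    size_t off;

    if (p == NULL)
        return NULL;
    for (off = 0; off < size; off += ASSUMED_PAGE_BYTES)
        q[off] = 0;
    if (size > 0)
        q[size - 1] = 0;            /* last page, if only partly covered */
    return p;
}
```

After the call returns, the caller would check whether SIGDANGER
arrived in the meantime and, if so, free the block and report failure.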

We have at least two very distinct classes of customers where
paging space allocation is concerned:

1.  Those such as yourselves, who want your application to either
    run, or not, based upon how much backing storage you have.
    If your app. doesn't run, go buy another disk & add paging space.

2.  Customers who like to allocate all the virtual memory they can,
    knowing that it will never all be used. This alleviates the
    need for any complicated run-time memory management schemes in
    their model. It is a very convenient programming construct.

From a historical perspective (for your information) we had
a number of `type 2' customers early on with the RT, and
that influenced the System V-like behaviour of the RT somewhat.

It also has something to do with why we, on the RISC System/6000,
still want to allow large virtual memory objects efficiently.

Basically, what we have on the RISC System/6000 is a hybrid attempt
at accommodating both types of customers. SIGDANGER is a compromise
attempt to allow them to co-exist. Admittedly, it is not perfect;
however, no matter what route one takes on this issue, we have found
thorns in the path. We are continuing to investigate the entire
issue, and welcome any comments.

Mark D. Rogers
AIX Operating System Architecture
Austin, Texas

alex@am.sublink.org (Alex Martelli) (11/23/90)

rogers@rogers.austin.ibm.com (Mark D. Rogers/100000) writes:
	...
[after a clear explanation of what AIX is doing re malloc()]
>We have at least two very distinct classes of customers where
>paging space allocation is concerned:
>1.  Those such as yourselves, who want your application to either
>    run, or not, based upon how much backing storage you have.
>    If your app. doesn't run, go buy another disk & add paging space.
>2.  Customers who like to allocate all the virtual memory they can,
>    knowing that it will never all be used. This alleviates the
>    need for any complicated run-time memory management schemes in
>    their model. It is a very convenient programming construct.

Thanks for the explanation.  However, in our case it's not really
that the application "would not run" - it would run fine as long
as the user was only doing solid models whose complexity was
compatible with his/her amount of paging space, and would give
a harmless warning if a model turned out to be too complex for
that (and in many cases there will be some work-around for such
resource limitations).
I would assume that many applications where the user interactively
asks for memory-consuming operations, from symbolic maths to
statistical data analysis, would be similar to solid modeling in
this respect.  Pity that this class of apps has been sacrificed
to others who apparently need garbage-collection stuff and appear
to be too lazy to do it:-).  
Another consideration is that many applications will be PORTED from
the huge existing Unix base to AIX; these will NOT expect unreliable,
over-committed malloc()!  For newly developed apps it may be a draw,
but considering potential portings to AIX, I believe the 2 vs 1
choice was inferior.  Pity!

Touching all pages and calling psdanger() on EACH malloc() is
WAY too much overhead, I think.  I believe I will have to rewrite a
malloc()/free()/realloc() package, putting all overhead only on the
sbrk(); it's either that, or have to explain to customers how and why
our solid modeler is so fragile on AIX, while it's solid as a rock
on HP/UX, Ultrix, Sun/Os, SONY/NeWS, and so on and on!

>From a historical perspective (for your information) we had
>a number of `type 2' customers early on with the RT, and
>that influenced the System V-like behaviour of the RT somewhat.

>It also has something to do with why we, on the RISC System/6000,
>still want to allow large virtual memory objects efficiently.

>Basically, what we have on the RISC System/6000 is a hybrid attempt
>at accommodating both types of customers. SIGDANGER is a compromise
>attempt to allow them to co-exist. Admittedly, it is not perfect;
>however, no matter what route one takes on this issue, we have found
>thorns in the path. We are continuing to investigate the entire
>issue, and welcome any comments.

I hope this input is of some use!  If you would just include, say,
a libmalloc.a which does all necessary checks on sbrk(), for
"compatibility with non-overcommitting Unices", it would be nice.
Best, I believe, would be a tunable parameter to force sbrk() to
non-overcommittance on a system-wide basis; I don't really see how
you could make SOME processes overcommit while all defaults to
safe allocation, or vice versa, but if you could, that would
definitely be a jump upwards in quality, to go with others that
AIX has undoubtedly.

Feel free to follow this up either here or by email (but in this
case to staff@cadlab.sublink.org, please - it's not really a
"personal interest" thing, although I'm replying from home - I
can't really afford a RS/6000 machine at home...:-).  And thanks
again for your clear explanation and comments.
-- 
Alex Martelli - (home snailmail:) v. Barontini 27, 40138 Bologna, ITALIA
Email: (work:) staff@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).