rudy@chukran.austin.ibm.com (02/01/91)
> > Now that we have 64 Mbytes, does anyone know how to configure
> > the kernel such that malloc will allocate 64 mbytes (or more) ?
> > Right now, malloc refuses to allocate more than 32 mbytes and
> > we have not been able to figure out how to change this.
> >
>
> I think that you need to look at the 'ulimit' command. At least on my
> machine, the limit for memory is 32M.

More specifically, check the ulimit subcommand of sh or the limit
subcommand of csh. You should be interested in the data size.
Be aware that you cannot get more than 256M of memory, no matter what
ulimit tells you (mine tells me 262000 kilobytes, which is impossible),
due to the architectural limit on segment size.

Be aware also that repeated mallocs of small sizes will fragment the
free space, so that there may be an aggregate of 32M free, but not a
contiguous 32M. To test this, try a test program which mallocs 64M
as the first thing it does.

Keep in mind that the program stack, static data, and external data
all come out of the data segment. I'd suggest checking your program
sizes with the size command to see how much data is allocated from the
data segment before your program starts to run. It may indeed be that
there is not 32M left to malloc.

*********************************************************************
IBM AIX Porting Center  | RSCS: CHUKRAN at AUSTIN
11400 Burnet Rd.        | AWDnet: rudy@chukran.austin.ibm.com
Internal ZIP 2830       | internet: chukran@austin.iinus1.ibm.com
Austin, Texas 78758     | Voice: 512-838-4674 Tieline: 678-4674
*********************************************************************
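[Editor's sketch] The test suggested above (read the data-size limit, then try one large malloc before anything has fragmented the heap) could look roughly like this. The function names are made up for illustration, and getrlimit(RLIMIT_DATA) is assumed to report the same limit that ulimit shows for data size; treat it as a starting point, not as the way the AIX limits are actually enforced.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Soft data-segment limit in bytes, or 0 if it is unlimited or
   cannot be read. (Hypothetical helper name.) */
unsigned long long data_limit_bytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_DATA, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}

/* Try one contiguous allocation of `mbytes` megabytes.
   Returns 1 if malloc succeeded, 0 if it returned NULL.
   Note that success only says the request fit under the limit;
   it does not prove paging space exists behind the pages. */
int can_malloc_mbytes(size_t mbytes)
{
    void *p = malloc(mbytes * 1024 * 1024);

    if (p == NULL)
        return 0;
    free(p);
    return 1;
}
```

Calling can_malloc_mbytes(64) as the very first statement of main(), and printing data_limit_bytes() next to the result, separates "the limit is too low" from "the heap was already fragmented or used up".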
pa@appmag.com (Pierre Asselin) (04/09/91)
The problem: as you all remember, malloc() returns NULL only
when the process exceeds its datasize limit. If malloc returns a
non-null pointer, the memory may turn out to be exceedingly
virtual: there won't be any paging space behind it. AIX runs
out of paging space when the process actually uses the memory.
Various processes die. In Info, see `List of Books', `General
Concepts and Procedures', scroll ~1/3 down, `Paging Space
Overview'. See also psmalloc.c in /usr/lpp/bos/samples. Etc etc
etc.

Personally, I think it's a bug. If there is no memory left,
malloc should return a NULL. IBM says it's a feature, catch
SIGDANGER if you don't like it. At least one person at SDRC
agrees with me:

> IBM RS/6000 Memory Allocation Problem          September, 1990
>
> ...The operating system on the IBM RS/6000 allows CAEDS to
> allocate more memory than is actually available...

Now, we meet with IBM representatives every Thursday so that all
our problems may get solved. At those meetings I am encouraged to
report these things to IBM, even if they involve design changes.
(This is where the `design APAR' mythology started. As pointed out
later on comp.unix.aix, there is no such thing as a design APAR.)

The predictable result came out today.

1) It's not a bug. It's a feature. Therefore, there is no problem.
2) If I want a change, I am to file a `DCR' with my marketing
   representative.
3) This behaviour of malloc is an IBM design and was not mandated
   by an external standard such as SVID.

The only interesting piece of information is (3).

General conclusion from this exercise:
  o IBM doesn't listen. It doesn't know how.
  o IBM doesn't respond with accurate information either. It
    doesn't know how.

General conclusions from earlier exercises:
  o Software Defect Support is officially limited to its narrow
    mandate.
  o Technical support is available for the RISC-6000's. It's
    called comp.unix.aix.
  o Accurate information on the RISC-6000's is available, but only
    on comp.unix.aix.
  o Accurate information on the IBM support structure is available,
    but only on comp.unix.aix.
  o To this day, IBM is convinced that it's doing a fine job.
  o Hardware support does work. Beats me.

Austin: STAY ON THE NET! You're the only way we'll ever get the
straight dope. (Anyone care to give us the straight dope on this one?)

Good day.

  --Pierre Asselin, R&D, Applied Magnetics Corp. I speak for me.
mbrown@testsys.austin.ibm.com (Mark Brown) (04/13/91)
| The problem: as you all remember, malloc() returns NULL only
| when the process exceeds its datasize limit. If malloc returns a
| non-null pointer, the memory may turn out to be exceedingly
| virtual: there won't be any paging space behind it. AIX runs
| out of paging space when the process actually uses the memory.
| Various processes die. In Info, see `List of Books', `General
| Concepts and Procedures', scroll ~1/3 down, `Paging Space
| Overview'. See also psmalloc.c in /usr/lpp/bos/samples. Etc etc
| etc.
|
| Personally, I think it's a bug. If there is no memory left,
| malloc should return a NULL. IBM says it's a feature, catch
| SIGDANGER if you don't like it.

Yeah, I've heard complaints (and roses) on this one.
The Rationale: Rather than panic the machine, we'd like for it to keep
running as long as possible. Hence, we try to keep running at all costs,
including doing things like this. So, when we do get close to the limit,
we send a warning, then as we go over we start killing the biggest memory
users. (Warning - the processes involved have been oversimplified.)

The Idea was to make the machine 'more reliable'. Our research led us
to believe that many processes allocated more memory than they actually
used in page space (I think) and we used this knowledge. Understandably,
many UNIX users either a) want the machine to panic, "like UNIX does"; or
b) hate our algorithm for killing jobs. I also think we don't
advertise/document the process involved enough to make it useful to users.

So, do we go back to blowing up processes that allocate too much memory,
even though that memory may actually be there by the time the process
actually uses it? Do we go back to 'panic' when page space fills? There are
reasonable arguments for doing this...

Mark Brown    IBM PSP Austin, TX.    (512) 823-3741    VNET: MBROWN@AUSVMQ
MAIL: mbrown@testsys.austin.ibm.com OR uunet!testsys.austin.ibm.com!mbrown
      Which came first: The Chicken or the Legba?
DISCLAIMER: Any personal opinions stated here are just that.
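[Editor's sketch] For readers who have not seen it, catching the warning signal described above might look like the following. SIGDANGER is AIX-specific, so the sketch compiles to a no-op elsewhere; the function and flag names are hypothetical, and what a program should actually do once the flag is set is exactly what the rest of this thread argues about.

```c
#include <signal.h>

/* Set by the handler; polled by the main program. */
static volatile sig_atomic_t paging_space_low = 0;

/* Handler: only sets a flag, since little else is safe here. */
void on_low_paging_space(int signo)
{
    (void)signo;
    paging_space_low = 1;
}

/* Install the handler where SIGDANGER exists.
   Returns 1 if installed, 0 if the platform has no SIGDANGER
   or the installation failed. */
int install_low_paging_handler(void)
{
#ifdef SIGDANGER
    return signal(SIGDANGER, on_low_paging_space) != SIG_ERR;
#else
    return 0;
#endif
}

/* Nonzero once the system has warned that paging space is low. */
int paging_space_is_low(void)
{
    return paging_space_low != 0;
}
```

The program would call install_low_paging_handler() at startup and check paging_space_is_low() at convenient points, for example before starting a large allocation.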
dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) (04/14/91)
In article <6644@awdprime.UUCP> mbrown@testsys.austin.ibm.com (Mark Brown) writes:
>| The problem: as you all remember, malloc() returns NULL only
>| when the process exceeds its datasize limit. If malloc returns a
>| non-null pointer, the memory may turn out to be exceedingly
>| virtual: there won't be any paging space behind it. AIX runs
>| out of paging space when the process actually uses the memory.
>| Various processes die. In Info, see `List of Books', `General
>| Concepts and Procedures', scroll ~1/3 down, `Paging Space
>| Overview'. See also psmalloc.c in /usr/lpp/bos/samples. Etc etc
>| etc.
>|
>| Personally, I think it's a bug. If there is no memory left,
>| malloc should return a NULL. IBM says it's a feature, catch
>| SIGDANGER if you don't like it.
>
>Yeah, I've heard complaints (and roses) on this one.
>The Rationale: Rather than panic the machine, we'd like for it to keep
>running as long as possible. Hence, we try to keep running at all costs,
>including doing things like this. So, when we do get close to the limit,
>we send a warning, then as we go over we start killing the biggest memory
>users. (Warning - the processes involved have been oversimplified.)
>
>The Idea was to make the machine 'more reliable'. Our research led us
>to believe that many processes allocated more memory than actually used in
>page space (I think) and we used this knowledge. Understandably, many
>UNIX users either a) want the machine to panic, "like UNIX does"; or
>b) hate our algorithm for killing jobs. I also think we don't advertise/
>document the process involved enough to make it useful to users.
>
>So, do we go back to blowing up processes that allocate too much memory,
>even though that memory may actually be there by the time the process
>actually uses it? Do we go back to 'panic' when page space fills? There are
>reasonable arguments for doing this...

I'm old enough to have used vanilla Version 7 Unix when PDP-11s were in
vogue, and to be brutally frank the only Unix I can remember using
which panic'd when it ran out of memory was an early AIX on an RT, a system
which I hardly think qualifies as The Definitive Unix. The behaviour of
AIX is, from the user's perspective, a whole lot like the behaviour of
vanilla System V Unix, which also kills off random processes when it runs
out of memory (or used to, at least; I haven't paid much attention
recently). The only IBM value-added bit in this is the signal. (To be
fair, I do understand that the backing store allocation policy is
different internally from System V's, and is actually more conservative.
It looks pretty similar from the user's perspective, though.)

BSD Unix doesn't (a) panic or (b) kill processes. I suspect what the
users who are complaining want is (c) malloc() to return NULL when the
machine runs out of memory, without panicking and without random
processes being killed (it is actually easier to do it this way than to
do either what System V or what AIX does).

Better to explain more exactly why AIX does what it does. It's so vendors
who want to sell crufty old Fortran programs which have no way to do
dynamic memory allocation, can ship binaries with huge static arrays
compiled in for people who want to solve big problems and still have
the same binaries run on small machines to solve small problems. To
implement this you don't allocate backing store until a page is touched,
which means malloc() can't return NULL since it can't, in general, know
if the Fortran program running at the same time is actually going to
use his pages or not.

You should understand, however, that killing off processes isn't the
"real" problem. People have used System V machines which do this for
years without complaining because, on your typical Unix box being put
to typical uses, running out of memory/page space is a rare occurrence.
On an AIX machine, however, with its humongous kernel and things like
the compiler and loader which consume prodigious amounts of memory when
running, running out of memory can be a daily occurrence. People don't
complain about System V because they never find out what happens when
memory runs out. With AIX, however, your average user ends up painfully
aware of how the system behaves when memory is used up, and so he
complains.

The real bug is that AIX is a memory pig. It would be useful to fix this
one.

Dennis Ferguson
University of Toronto
marc@ibmpa.awdpa.ibm.com (Marc Pawliger) (04/15/91)
In article <1991Apr9.024814.1141@appmag.com>, pa@appmag.com (Pierre Asselin) writes:
|> The problem: as you all remember, malloc() returns NULL only
|> when the process exceeds its datasize limit. If malloc returns a
|> non-null pointer, the memory may turn out to be exceedingly
|> virtual: there won't be any paging space behind it. AIX runs
|> out of paging space when the process actually uses the memory.
|> Various processes die. In Info, see `List of Books', `General
|> Concepts and Procedures', scroll ~1/3 down, `Paging Space
|> Overview'. See also psmalloc.c in /usr/lpp/bos/samples. Etc etc
|> etc.
|>
|> Personally, I think it's a bug. If there is no memory left,
|> malloc should return a NULL. IBM says it's a feature, catch
|> SIGDANGER if you don't like it.

[ ... ]

Note that Mach also allows you to get in over your head. You can malloc
all you want, but until you actually touch those alloc'd pages, they
_will_ _not_ _exist_. So I can malloc 2GB on my 4M machine with 10M swap
space and not see one NULL pointer return. If I try and _use_ all that
memory, though, my program will die, after thoroughly thrashing my
machine.

I have heard tales that this was due to a Lisp project at CMU that used
the high bits of an address to store the data type, so they had addresses
that were clustered in a huge virtual memory space, but they were sparse
enough so that the actual sum of all the space _used_ fit into physical
memory and swap space.

|> Austin: STAY ON THE NET! You're the only way we'll ever get the
|> straight dope. (Anyone care to give us the straight dope on this one?)

And us Palo Alto folks? And Rochester and Kingston folks? And Research?

+--Marc Pawliger----IBM Advanced Workstations Division----Palo Alto, CA---+
| Internet: marc@ibminet.awdpa.ibm.com        VNET: MARCP at AUSVM6       |
| UUCP: uunet!ibminet.awdpa.ibm.com!marc      Phone: (415) 855-3493       |
+-----IBMinet: marc@ibmpa.awdpa.ibm.com----------IBM T/L: 465-3493------+
These are my opinions, not IBM's etc etc etc
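[Editor's sketch] The point about pages not existing until they are touched can be made concrete with a helper that writes one byte into each page of a buffer. On a lazily allocating system the malloc() call itself is cheap and rarely fails; any trouble shows up inside a loop like this one. The helper name is made up, and the 4096-byte fallback page size is an assumption used only when sysconf() cannot report one.

```c
#include <stddef.h>
#include <unistd.h>

/* Write one byte into every page of buf[0..len) so the system has to
   back each page with real memory or paging space.
   Returns the number of pages touched. */
size_t touch_pages(char *buf, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096;  /* fallback is an assumption */
    size_t touched = 0;
    size_t off;

    for (off = 0; off < len; off += page) {
        /* volatile keeps the compiler from dropping the store */
        ((volatile char *)buf)[off] = 1;
        touched++;
    }
    return touched;
}
```

Running this over a buffer much larger than RAM plus swap is how one reproduces the thrashing and eventual process death described above, so try it only on a machine you are willing to bring to its knees.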
jfh@greenber.austin.ibm.com (John F Haugh II) (04/16/91)
In article <1991Apr14.030748.18052@gpu.utcs.utoronto.ca> dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) writes:
>I'm old enough to have used vanilla Version 7 Unix when PDP-11s were in
>vogue, and to be brutally frank the only Unix I can remember using
>which panic'd when it ran out of memory was an early AIX on an RT, a system
>which I hardly think qualifies as The Definitive Unix.

UNIX v7 would panic if it ran out of swap space, as would System III,
4.0, 5.0, and every swapping UNIX AT&T released. The PDP-11/45 I
learned UNIX on seldom panic'd because it seldom had the load needed
to run out of swap space. Other v7-based systems, such as Microsoft's
original Xenix, would run on machines which were capable of being
overloaded to the point of running out of swap space. I regularly
saw a client's MC68000-based Xenix system run out of swap space. It
had 768K RAM and 2MB of swap.

This is not to serve as an excuse for any vendor's kernel bloat or
utility creeping featurism, but rather to simply point out that if
you use more than what you have, you will always see some bizarre
behavior, and always have seen same.

Disclaimer: I speak for myself only. All trademarks are property of
their respective trademark owners.
--
John F. Haugh II      | I've Been Moved |    MaBellNet: (512) 838-4340
SneakerNet: 809/1D064 |     AGAIN !     |      VNET: LCCB386 at AUSVMQ
BangNet: ..!cs.utexas.edu!ibmchs!auschs!snowball.austin.ibm.com!jfh (e-i-e-i-o)
dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) (04/17/91)
In article <6670@awdprime.UUCP> jfh@greenber.austin.ibm.com (John F Haugh II) writes:
>In article <1991Apr14.030748.18052@gpu.utcs.utoronto.ca> dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) writes:
>>I'm old enough to have used vanilla Version 7 Unix when PDP-11s were in
>>vogue, and to be brutally frank the only Unix I can remember using
>>which panic'd when it ran out of memory was an early AIX on an RT, a system
>>which I hardly think qualifies as The Definitive Unix.
>
>UNIX v7 would panic if it ran out of swap space, as would System III,
>4.0, 5.0, and every swapping UNIX AT&T released. The PDP-11/45 I
>learned UNIX on seldom panic'd because it seldom had the load needed
>to run out of swap space. Other v7-based systems, such as Microsoft's
>original Xenix, would run on machines which were capable of being
>overloaded to the point of running out of swap space. I regularly
>saw a client's MC68000-based Xenix system run out of swap space. It
>had 768K RAM and 2MB of swap.

This is indeed correct, certainly as far as V7 (and V6, for that matter)
is concerned (we have those on line). I should have looked before leaping.

I do note, however, that the oldest BSD source we have around (4.1, circa
1981) doesn't panic, nor does the oldest AT&T source (System V release 1?
Files are all dated February, 1985). This is not a problem which was
only recently fixed.

>This is not to serve as an excuse for any vendor's kernel bloat or
>utility creeping featurism, but rather to simply point out that if
>you use more than what you have, you will always see some bizarre
>behavior, and always have seen same.

This is very true.

Dennis Ferguson
University of Toronto
rcd@ico.isc.com (Dick Dunn) (04/19/91)
mbrown@testsys.austin.ibm.com (Mark Brown) writes:
[lost the previous attribution for problem statement]
> | The problem: as you all remember, malloc() returns NULL only
> | when the process exceeds its datasize limit. If malloc returns a
> | non-null pointer, the memory may turn out to be exceedingly
> | virtual...
...
> | Personally, I think it's a bug. If there is no memory left,
> | malloc should return a NULL. IBM says it's a feature, catch
> | SIGDANGER if you don't like it.

The way I read this, the complaint is from the normal-programmer point of
view: There's a defined way to indicate that there's no more memory
available--return NULL from malloc(). SIGDANGER is an IBM invention.

> Yeah, I've heard complaints (and roses) on this one.
> The Rationale: Rather than panic the machine, we'd like for it to keep
> running as long as possible. Hence, we try to keep running at all costs,
> including doing things like this. So, when we do get close to the limit,
> we send a warning, then as we go over we start killing the biggest memory
> users. (Warning - the processes involved have been oversimplified.)

As various folks have pointed out, various UNIX systems have had
more-or-less graceless responses to running out of (memory+swap). One
might ask therefore that a new behavior be better, instead of just
different.

The "mistake" (if I may call it that) in what Mark is saying is that the
overcommitment of memory/pagespace is a kernel problem. The kernel
created the problem by overallocating, so the kernel (being that piece of
code responsible for allocating/managing the hardware!) should solve it
rather than handing it back to the applications. Look at the problem
from the application point of view.

> The Idea was to make the machine 'more reliable'...

I'll object to the idea that killing some arbitrary process makes the
machine "more reliable". If you want "more reliable", don't overcommit!

>...Our research led us
> to believe that many processes allocated more memory than actually used in
> page space (I think) and we used this knowledge...

There's something wrong with this. What type of programs were studied in
this "research"? I know that typical style in C is:

    p = (struct whatzit *)malloc(sizeof(struct whatzit));
    ...
    p->thing1 = stuff1;
    p->thing2 = stuff2;

where "..." is rarely more than a check for NULL.

The trouble with SIGDANGER is that it occurs at a time which makes no
sense to the programmer. Just because you happened to touch some
particular piece of memory (and it's unlikely you really know where your
page boundaries are) for the first time...or worse yet, some *other*
process touched memory for the first time!...you get SIGDANGERed up 'side
the head? What do you do? How did you get there? It's fiendishly
difficult to tie it back to a real event in terms of what the program
knows.

Add to that two other considerations:
   - SIGDANGER is not portable. While IBM may not mind having people
     write IBM-specific code, many programmers find that requirement
     objectionable (especially since it's hard to use; it's an
     anti-feature).
   - There's a defined way to report insufficient memory to a program
     (NULL from malloc()), and it happens in a way/place a programmer
     can use.
...and you can see why a programmer would get upset.

> So, do we go back to blowing up processes that allocate too much memory,
> even though that memory may actually be there by the time the process
> actually uses it?...

In the case of C programs and malloc(), yes. If you can't allocate usable
memory (meaning "usable" at the point of return from malloc()), you should
return NULL. That doesn't "blow up" the process; it gives it a fair chance
to decide what to do.
--
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...While you were reading this, Motif grew by another kilobyte.
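[Editor's sketch] The allocate-check-use style described above, written out in full. The struct fields are taken from the fragment in the post; the int field types and the function name are assumptions made only so the example compiles.

```c
#include <stdlib.h>

/* Field names come from the post; the int types are assumed. */
struct whatzit {
    int thing1;
    int thing2;
};

/* Allocate and initialise one whatzit.
   Returns NULL if malloc() reports that no memory is available,
   leaving the decision about what to do next to the caller. */
struct whatzit *new_whatzit(int stuff1, int stuff2)
{
    struct whatzit *p = malloc(sizeof *p);

    if (p == NULL)
        return NULL;

    /* The structure is written immediately after allocation, so in
       this style nearly every allocated page is touched right away. */
    p->thing1 = stuff1;
    p->thing2 = stuff2;
    return p;
}
```

The NULL check is the only place the program learns about exhaustion in a form it can act on, which is why a non-NULL return that later proves unusable defeats the pattern.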
marc@ekhomeni.austin.ibm.com (Marc Wiz) (04/19/91)
IBM is not the only company that decided not to allocate
page space until the memory was actually used.

There is at least one other Unix implementation where
this was done.

Marc Wiz                  MaBell (512)823-4780
Yes that really is my last name.
The views expressed are my own.
marc@aixwiz.austin.ibm.com
or uunet!cs.utexas.edu!ibmchs!auschs!ekhomeni.austin.ibm.com!marc
christos@theory.tn.cornell.edu (Christos S. Zoulas) (04/19/91)
In article <3800@d75.UUCP> marc@ekhomeni.austin.ibm.com (Marc Wiz) writes:
>IBM is not the only company that decided not to allocate
>page space until the memory was actually used.
>
>There is at least one other Unix implementation where
>this was done.

Couldn't the process itself at least choose the page space allocation
behavior, using some system call similar to vadvise(2)? For example,
a process that wanted to make sure that the space it allocated actually
exists could call vadvise(VA_ALLOCATE), or something along those lines,
before calling sbrk().

christos
--
Christos Zoulas         | 389 Theory Center, Electrical Engineering,
christos@ee.cornell.edu | Cornell University, Ithaca NY 14853.
christos@crnlee.bitnet  | Phone: (607) 255 0302, Fax: (607) 255 9072
tif@doorstop.austin.ibm.com (Paul Chamberlain) (04/23/91)
jfh@greenber.austin.ibm.com (John F Haugh II) writes:
>Better would be a way to say "over commit" or "don't over commit" via
>some configuration (like SMIT) option.

How about a flag, a compile option, a special routine, or something
that an application can do, to say "Please over-commit memory for me
and you can kill me if you run out." Then make that nasty Fortran
compiler do it automatically. :-)

Paul Chamberlain | I do NOT speak for IBM.          IBM VNET: PAULCC AT AUSTIN
512/838-9748     | ...!cs.utexas.edu!ibmchs!auschs!doorstop.austin.ibm.com!tif
pa@curly.appmag.com (Pierre Asselin) (04/24/91)
My newsfeed only worked one way, so I had to be quiet for a while.
Now I'm worldwide again.

Dick Dunn (rcd@ico.isc.com) summarized my own position very well.
> The way I read this, the complaint is from the normal-programmer point of
> view: There's a defined way to indicate that there's no more memory
> available--return NULL from malloc(). SIGDANGER is an IBM invention.
[... and more statements that I fully endorse]
>> So, do we go back to blowing up processes that allocate too much memory,
>> even though that memory may actually be there by the time the process
>> actually uses it?...
>
> In the case of C programs and malloc(), yes. If you can't allocate usable
> memory (meaning "usable" at the point of return from malloc()), you should
> return NULL. That doesn't "blow up" the process; it gives it a fair chance
> to decide what to do.

I have nothing against the SIGDANGER mechanism per se. It sure
beats what SysV and Mach have to offer. I have nothing against a
sparse allocator that doesn't lock paging space right away. But it
shouldn't be called malloc.

QUESTIONS: Let's say I #ifdef _AIX and I use psmalloc.
1) Can I touch everything it gives me?
2) Can I still use it twenty minutes later?
3) Can I still get burned by routines in libc.a that call the
   regular malloc?
4) Am I still subject to sudden death if some unrelated process
   bloats up?

dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) writes:
> Better to explain more exactly why AIX does what it does. It's so vendors
> who want to sell crufty old Fortran programs which have no way to do
> dynamic memory allocation, can ship binaries with huge static arrays
> compiled in for people who want to solve big problems and still have
> the same binaries run on small machines to solve small problems. To
> implement this you don't allocate backing store until a page is touched,
> which means malloc() can't return NULL since it can't, in general, know
> if the Fortran program running at the same time is actually going to
> use his pages or not.

5) OK IBM'ers. Is this true? If so, does the Fortran run-time
   support catch SIGDANGER?

I write:
> Austin: STAY ON THE NET! You're the only way we'll ever get the
> straight dope. (Anyone care to give us the straight dope on this one?)

marc@ibmpa.awdpa.ibm.com (Marc Pawliger):
> And us Palo Alto folks? And Rochester and Kingston folks? And Research?

I didn't know you existed because I rely too much on official
channels. Weeeell, OK, you can stay too.

  --Pierre Asselin, R&D, Applied Magnetics. I speak for me.
martelli@cadlab.sublink.ORG (Alex Martelli) (04/26/91)
mbrown@testsys.austin.ibm.com (Mark Brown) writes:
...
:The Rationale: Rather than panic the machine, we'd like for it to keep
:running as long as possible. Hence, we try to keep running at all costs,
That's fine with me! I just want malloc() to return NULL when there is
no more memory, rather than telling me a lie. Our solid modeler allocates
space dynamically depending on the complexity of the scene that the user
is interactively defining; if/when it detects an out-of-memory condition,
it simply informs the user, who will then have to limit his/her modeling
ambitions, buy more memory, or whatever. We regularly stress-test this
approach on each of the many WSs we support; often as paging space fills
a machine slows down a lot, often the console fills with warnings about
page space exhausted, often a system will refuse to run some other process
when this happens (we have printouts of logs with 'Out of memory: cannot
allocate 7 bytes'...!) - but NOWHERE, EVER, did we get a panic. On AIX3,
we catch SIGDANGER, but what can we do about it??? We can't even save
the modeler's state to disk - there is no virtual memory left for the
complex I/O necessary!!! So, do we just die and waste the user's work
for all of a complex modeling session??? If we don't die, the X server
does, so there is no way left to communicate with our application.
Setting process limits so that this won't happen in some "normal situation"
(when the modeler is all alone on the machine) will still make this happen
as soon as any other significant process is alive while the modeling is
done - an xclock that isn't "usually" there can easily be enough! Or an
ftp session from somewhere else. Bah! We have to print a special warning
in our application manual for AIX3, NOT to use it to build models that may
get to be very large or complex, save things often if you do, etc - and we
STILL get to hear from our customers who are bitten by this "feature".
Since no other WS seems to feel it necessary to panic - we simply get a
NULL from malloc() at some point and all is peachy - I really DO hope that
IBM will see the light on this - possibly before our large installed base
on IBM 6150's gets tired of waiting for a 'solid' version of our solid modeler
on some machine with decent performance, and migrate en masse to something
like HP's new 9000/700 machines, which, of course, we also support.
--
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 53, Bologna, Italia
Email: (work:) martelli@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434;
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).
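
[Editor's sketch] The "detect out-of-memory, tell the user, let them save" approach described above is often paired with an emergency reserve: a block grabbed at startup and released when an allocation fails, so that there is room left to write the model to disk. The sketch below assumes malloc() really does return NULL on exhaustion, which is the behaviour the poster gets on other workstations; on a system that overcommits it would not prevent the kill. The names and the 256K reserve size are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define RESERVE_BYTES (256 * 1024)   /* assumed size; tune per application */

static void *emergency_reserve = NULL;

/* Grab the reserve at startup and write to it so its pages are
   really backed. Returns 1 on success, 0 on failure. */
int reserve_init(void)
{
    if (emergency_reserve == NULL)
        emergency_reserve = malloc(RESERVE_BYTES);
    if (emergency_reserve != NULL)
        memset(emergency_reserve, 0, RESERVE_BYTES);
    return emergency_reserve != NULL;
}

/* Allocate n bytes for the model. If the first attempt fails, set
   *low_memory, give the reserve back to the heap and retry once.
   A NULL return means even the reserve was not enough. */
void *model_alloc(size_t n, int *low_memory)
{
    void *p = malloc(n);

    *low_memory = (p == NULL);
    if (p == NULL && emergency_reserve != NULL) {
        free(emergency_reserve);
        emergency_reserve = NULL;
        p = malloc(n);
    }
    return p;
}
```

When *low_memory comes back nonzero, the application would stop accepting new geometry, warn the user, and save the session while the freed reserve is still available.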