rds95@leah.Albany.Edu (Robert Seals) (01/07/89)
How much can a single malloc get? And why does Ultrix have 2 kinds of
malloc, while 4.3-tahoe (the first one, I think) only has 1?
Here's a little thing I wrote to see how big a chunk I could get from a
single malloc...
Script started on Fri Jan 6 10:06:45 1989
14 more days of President Reagan
% cat tst.c
#include <stdio.h>
#include <malloc.h>	/* not in 4.3-tahoe */

void main()
{
	char *x;
	unsigned int i, j;

	for (i = 10000; (x = malloc(i)) != NULL; i += 10000)
		free(x);
	i -= 10000;
	printf("%d\n", i);
	for (j = i; (x = malloc(j)) != NULL; j += 10)
		free(x);
	printf("%d\n", j - 10);
}
% gcc -O -s -Wall -o tst tst.c
tst.c: In function main:
tst.c:14: warning: implicit declaration of function `printf'
% time tst
4190000
4194300
0.1u 0.3s 0:00 83% 3+19k 0+0io 5pf+0w
% gcc -O -s -Wall -o tst tst.c -lmalloc
tst.c: In function main:
tst.c:14: warning: implicit declaration of function `printf'
% time tst
6620000
6619990
0.2u 1.3s 0:01 89% 8+780k 0+0io 9pf+0w
% ^D
script done on Fri Jan 6 10:08:34 1989
Eh? I thought that there must be some resource limit defined by the system,
and in Ultrix (1.2 and 2.2), there is a "ulimit" call, but 4.3-tahoe
only has "getrlimit", so I wrote a little smidge to find out what my
"RLIMIT_DATA" and "RLIMIT_RSS" were. They were way bigger than the
amount I got from malloc. Are they the right thing to check?
So I thought mebbe a physical memory limit? Nah, why would I (consistently)
get 2 different values depending on the library version I used?
Like, what good is virtual memory if I can't malloc New Jersey? );
AND, why did the 3x version (the one used when -lmalloc is included)
have such extravagant core(?) requirements - 780k vs. 19k for libc?
AND, why isn't there an (old-style) prototype for malloc anywhere in
/usr/include, even though it doesn't return an int? Isn't that the
way it's supposed to go? Like /usr/include/malloc.h...char *malloc();...
Remember, this is the NEOPHYTES group, so keep the fires low.
rob
debra@alice.UUCP (Paul De Bra) (01/07/89)
In article <1418@leah.Albany.Edu> rds95@leah.Albany.Edu (Robert Seals) writes:
>How much can a single malloc get? And why does Ultrix have 2 kinds of
>malloc, while 4.3-tahoe (the first one, I think) only has 1?
>
>Here's a little thing I wrote to see how big a chunk I could get from a
>single malloc...

For every process the kernel has to keep a table of the pages of virtual
memory used by that process.  The size of this table is usually fixed,
which means that a process cannot grow beyond that limit, regardless of
what any system call for finding out limits will tell you.

If your system were configured with big tables for each process, you
would hit another limit: your virtual memory, which is determined by the
size of your swap space (and possibly also your physical memory, on some
Unix versions), again regardless of other imposed limits.

Only if you are not exceeding any of those limits will the system look
at system-call-level limits.

Paul.
--
------------------------------------------------------
| debra@research.att.com  |  uunet!research!debra    |
------------------------------------------------------
crossgl@ingr.com (Gordon Cross) (01/09/89)
In article <1418@leah.Albany.Edu> rds95@leah.Albany.Edu (Robert Seals) writes:
>How much can a single malloc get? And why does Ultrix have 2 kinds of
>malloc, while 4.3-tahoe (the first one, I think) only has 1?

The difference between malloc(3C) and malloc(3X) is speed: the latter
version runs on average about 9 times faster.  The algorithm used is
optimized for speed at the expense of memory efficiency.  If you are
going to do LOTS of mallocs, the 3X version is usually preferable.
--
Gordon Cross            UUCP: uunet!ingr!crossgl      "all opinions are
111 Westminister Way    INTERNET: crossgl@ingr.com     mine and not those
Madison, AL 35758       MA BELL: (205) 772-7842        of my employer."
tony@hp-sdd.hp.com (Tony Parkhurst_TEMP) (01/10/89)
In article <1418@leah.Albany.Edu> rds95@leah.Albany.Edu (Robert Seals) writes:
>How much can a single malloc get?

As much as a bunch of small mallocs added together?  No, actually, a
single malloc can get as much memory as a process can get, less malloc's
overhead.  Malloc is just a library routine that uses sbrk() to get
memory for the process.  (Note that sbrk() is NOT in the SVID, but is in
the AT&T System V implementation.)

>And why does Ultrix have 2 kinds of
>malloc, while 4.3-tahoe (the first one, I think) only has 1?

Because Ultrix is using the SVID for malloc(), while 4.3-tahoe is
bazerkley (BSD).  So, the question could be: why does the SVID have two
definitions for malloc()?  Because the default one is for compatibility,
and the other (newer) one is for enhanced control.

First off, some background...

Kernighan & Ritchie in "The C Programming Language" present a simple
version of malloc() and free() that does the memory compaction (or
coalescing) during the free() routine.  These use a modified first-fit
algorithm that was probably taken from Knuth Vol ??.  One problem with
these routines is that a realloc() would not be too trivial.

At some point someone rewrote the routines so that the compaction of
free blocks was done during malloc() instead of free(), along with some
other enhancements.  This made for an easier implementation of
realloc(), and provided some interesting features.  So, at least by
Version 6 UNIX, there were some interesting points in the manual pages
for these routines:

1) "...after 'free' is performed this space is made available for
   further allocation, but its contents are left undisturbed."

   Note: I would not write programs that expect this feature, would you?

2) "'Malloc' allocates the first big enough contiguous reach of free
   space found in a circular search from the last block allocated or
   freed, coalescing adjacent free blocks as it searches."

   Note: the "last block allocated" pointer is also the "last block
   freed" pointer.
   So either a malloc or a free will change this pointer.

and

3) "'Realloc' also works if [the pointer argument] points to a block
   freed since the last call of 'malloc', 'realloc', or 'calloc'; thus
   sequences of 'free', 'malloc' and 'realloc' can exploit the search
   strategy of 'malloc' to do storage compaction."

Note that this does not really make logical sense: that is, "a block
freed since the last call of 'malloc'" and "sequences of 'free',
'malloc' and 'realloc'" are not compatible statements.  It is not clear
that a program should have the ability to continue to use memory after
it is freed.  Also, it is not clear why one would need to do routine
storage compaction as in point 3, because the compaction will occur
whenever you need more memory.

But, this notwithstanding, these points led to a very questionable
coding practice, and you will still see code like this:

	...
	free(myblock);	/* free memory to be realloced (pt 1) */
	free(dummy);	/* small block used only for the purpose of
			   moving the "last block" pointer in the malloc
			   routines away from "myblock", which forces a
			   search in realloc (pt 2) */
	dummy = malloc(1);	/* go ahead and get the dummy back; it
				   will be the same block as just freed,
				   see point 2 */
	myblock = realloc(myblock, newsize);
			/* Now, get a new (or same) block after the
			   malloc routine makes a search for a large
			   enough block, which will cause coalescing
			   during the search.  The returned block may be
			   right after "dummy" in the free block list,
			   so the "amount" of coalescing may be minimal
			   anyway. (pts 2 and 3) */
	...

If you don't believe me, look in the source for utilities like 'dc' and
'diff'.  (Of course you won't see worthwhile comments.)  Both of these
programs will break if the malloc routines do NOT allow the above
practice.
Now, this coding practice and those sections of the manual pages (point
1 about preserving freed blocks, point 2, which actually makes a
statement about the implementation of malloc, and point 3, which
suggests a method for taking advantage of a particular implementation of
memory management) are terribly non-portable, and writing malloc
routines to comply with both that manual page and those code fragments
is a pain.  What happened to the "UNIX philosophy" here?!

So, to implement new versions of malloc(), BSD has one into which
someone put much effort to preserve the compatibility; it has higher
performance (relative, since this is in user context anyway) and is more
efficient with smaller blocks.  It is a bucket-pool type of
implementation.  System V, on the other hand, has a nice elaborate
scheme with lots of control and statistics-collecting code, but since it
is not compatible, and is big, it is relegated to a separate library
instead of the default libc.

>Eh? I thought that there must be some resource limit defined by the system,
>and in Ultrix (1.2 and 2.2), there is a "ulimit" call, but 4.3-tahoe
>only has "getrlimit", so I wrote a little smidge to find out what my
>"RLIMIT_DATA" and "RLIMIT_RSS" were. They were way bigger than the
>amount I got from malloc. Are they the right thing to check?

Did you account for malloc's overhead, which is in user space along with
the requested memory?  RLIMIT_DATA should at least have been close.  Of
course this will also include all the static variable space.

>So I thought mebbe a physical memory limit? Nah, why would I (consistently)
>get 2 different values depending on the library version I used?

They are different implementations with much different constraints and
overhead.

>AND, why did the 3x version (the one used when -lmalloc is included)
>have such extravagant core(?) requirements - 780k vs. 19k for libc?

Because it came from AT&T, which has complexity on the brain.
>AND, why isn't there a (old-style) prototype for malloc anywhere in
>/usr/include, even though it doesn't return an int? Isn't that the
>way it's supposed to go? Like /usr/include/malloc.h...char *malloc();...

Usually a header file is provided if there are implementation-defined
constants:

	#define M_MXFAST...
	#define M_NLBLKS...
	#define M_GRAIN...

or structures involved:

	struct mallinfo...

which are needed for the 3X version of malloc(), but the libc version of
malloc doesn't have any.

>Remember, this is the NEOPHYTES group, so keep the fires low.

These are good questions that you asked.  Just remember, when you are
trying to figure these things out, that routines in section 3 of the
manual are library routines, not system calls, so system limits and
constraints do not correlate directly with these routines.

Hope this helps.

-- Tony

P.S.  Someone with more knowledge of malloc's history may want to fill
in the gaps.
--
Tony Parkhurst  ( tony@hp-sdd.HP.COM )
chris@mimsy.UUCP (Chris Torek) (01/10/89)
In article <1818@hp-sdd.HP.COM> tony@hp-sdd.hp.com (Tony Parkhurst)
writes a nice explanation of why there are Too Many Mallocs (with
apologies to ... to ... well, I know I stole% the phrase from
somewhere!).  But it does not quite tell why Robert Seals got the
results he did.
-----
% `Lesser artists borrow.  Great artists steal.'  (And I have forgotten
  who said that, too!  This is not my night....)
-----

>So, to implement new versions of malloc(), BSD has one that someone put
>much effort into preserving the compatibility and has a higher performance
>(relative since this is in user context anyway), and is more efficient with
>usage of smaller blocks. It is bucket pool type of implementation.

The new malloc was originally written by Chris Kingsley, then at
Caltech.  It is extremely fast, but rather wasteful of space.  It has
one notable `feature': it never coalesces small blocks, and it never
breaks up large blocks.  Since it rounds up each request to the nearest
power of two, a sequence like:

	for (i = 10000;; i <<= 1) {
		if ((p = malloc(i)) == NULL)
			break;
		free(p);
	}

winds up allocating one 16384-byte block (for malloc(10000)), one
32768-byte block (for malloc(20000)), one 65536-byte block (40000), one
131072-byte block (80000), and so forth.  After these first four
allocations, your process has consumed not 80000 bytes of its limit (as
shown by the csh `limit' built-in), but rather (32+64+128)*1024 = 229376
bytes.  But it has done so very efficiently: only three pages have
actually been touched, so only three pages have really been created.
The rest is all merely virtual space (swap backing) and PTEs.

Various people have submitted modified versions of the Caltech malloc
to Berkeley, some of which can scavenge some of the space currently
wasted.  John Linderman (allegra!jpl) probably did the most thorough of
these.  For whatever reasons, the changes have never been installed.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu	Path: uunet!mimsy!chris
chris@mimsy.UUCP (Chris Torek) (01/10/89)
In article <3405@ingr.com> crossgl@ingr.com (Gordon Cross) writes:
>The difference between malloc(3C) and malloc (3X) is speed: the latter
>version runs on average about 9 times faster. ...

Not so (see Keywords).  Malloc(3c) is slower only in SysV environments.
(In other words, Gordon is right, but not in the context under
discussion.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu	Path: uunet!mimsy!chris
mangler@cit-vax.Caltech.Edu (Don Speck) (01/28/89)
In article <15365@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> The new malloc was originally written by Chris Kingsley, then at Caltech.
> It is extremely fast, but rather wasteful of space.

He wrote it in 1982 for a VLSI layout language interpreter which
allocated hundreds of thousands of objects with an average size of 32
bytes, with about 2/3 as many frees as mallocs, growing at a rate of
typically a megabyte per minute.  No attempt was made to leave
recently-freed blocks intact.  It was really quite simple.

When I diff'd the version we had against the 4.2BSD one, it was clear
that major additions had been made to support the storage compaction
semantics documented in malloc(3).  I recall a scathing critique of
dead code, spurious comments, etc. in Unix-Wizards years ago, none of
which was Chris's fault.

This all fell in my lap, and as our chips grew to need more than 16
megabytes, I had to switch to something less wasteful of memory.  From
the University of Washington came a malloc that beat Chris's allocator
in CPU time, memory, and page faults.  Here are the identifying
comments.  "whois" says that the author's current address is
almes@rice.edu.  Don't make me feel bad by asking me for the code.  Try
asking the author.

/* An implementation of malloc(3), free(3) using the QuickFit method.
 * Guy Almes, May 1983.
 *
 * Cf. MALLOC(3) in the Unix manual for external specifications.
 *
 * Cf. Chuck Weinstock's PhD thesis for a discussion of the techniques
 * used.  (Charles Weinstock, Dynamic Storage Allocation Techniques,
 * April 1976, CMU)
steinar@fdmetd.uucp (Steinar Overbeck Cook) (03/14/89)
We have an application which uses dynamic SQL against an Oracle
database.  After extracting approx. 480 records, the application core
dumps because of SIGSEGV (segmentation violation).

The stack trace in sdb comes out thus:

	malloc: address 0x18a54
	*malloc(1032,199,147564)
	_findbuf(147564,147564,14679000)
	_filbuf(147564,0,102)
	fgets(14679000,200,147564)
	upiref(157555,1403,1403)	[upiexn.c]
	upigem(157280,157544,157280)
	sqlgem(1403,1403,14679284)
	sqlret(0,14679332,3818)
	sqlfch(151460,151464,171892)
	FETCHKol()	[DYNSEL.c:852]
	main(argc=4,argv=0xdffd58,14679404)	[DYNSEL.c:595]

Does anybody have a clue as to what is wrong?  My theory is that Oracle
overwrites the internal tables of malloc(3c).
--
Steinar Overbeck Cook, Fellesdata a.s, P.O. Box 248, 0212 OSLO 2, NORWAY
Phone : +47 2 52 80 80    Fax : +47 2 52 85 10
E-mail : ...!mcvax!ndosl!fdmetd!steinar  or  steinar@fdmetd.uucp
<The opinions expressed, if any, do not represent Fellesdata a.s>
gregg@ihlpb.ATT.COM (Wonderly) (03/17/89)
From article <364@fdmetd.uucp>, by steinar@fdmetd.uucp (Steinar Overbeck Cook):
> We have an application which uses dynamic SQL against an Oracle
> database. After extracting approx 480 records, the application core
> dumps because of SIGSEGV (Segment violation).
>
> The stack trace in sdb comes out thus:

While the trace is useful, the registers and the offending instruction
are usually a little more enlightening to me.  Try the 'x' command too,
and look for a register being used as a pointer with a value in hex
that looks like part of a string, as in 0x43546546.
--
Gregg Wonderly                    DOMAIN: gregg@ihlpb.att.com
AT&T Bell Laboratories            UUCP:   att!ihlpb!gregg
refson@castle.ed.ac.uk (Keith Refson) (08/24/90)
Does anybody have or know of a FAST replacement for malloc() which runs
on Crays under UNICOS?  I have a program in which memory allocation
appears to take an inordinate proportion of the execution time, and I
need to speed it up.  I would appreciate any advice or hints from
anybody with experience in memory allocation under UNICOS.

Thanks in advance.
----------------------------------------------------------------------------
| Keith Refson                            |ROYAL MAIL:                      |
|-----------------------------------------| Department of Earth Sciences   |
| JANET   : keith@uk.ac.ox.earth          | Parks Road                      |
| INTERNET: keith@earth.ox.ac.uk          | Oxford OX1 3PR                  |
| BITNET  : keith%uk.ac.ox.earth@ukacrl   | UK                              |
| UUCP    : K.Refson@ed.uucp              |PHONE : +44 865 272026/272016    |
|         : keith%uk.ac.ox.earth@ukc.uucp |FAX   : +44 865 272072           |
----------------------------------------------------------------------------