[comp.unix.questions] Malloc

rds95@leah.Albany.Edu (Robert Seals) (01/07/89)

How much can a single malloc get? And why does Ultrix have 2 kinds of
malloc, while 4.3-tahoe (the first one, I think) only has 1?

Here's a little thing I wrote to see how big a chunk I could get from a
single malloc...

Script started on Fri Jan  6 10:06:45 1989
14 more days of President Reagan
% cat tst.c
#include <stdio.h>
#include <malloc.h>		/* not in 4.3-tahoe */

void main()
{
    char *x;
    unsigned int i, j;


    for (i=10000; (x=malloc(i)) != NULL; i+=10000) 
	free(x);

    i -= 10000;
    printf("%d\n", i);

    for (j=i; (x=malloc(j)) != NULL; j+=10)
	free(x);
    printf("%d\n", j - 10);
}
% gcc -O -s -Wall -o tst tst.c
tst.c: In function main:
tst.c:14: warning: implicit declaration of function `printf'
% time tst
4190000
4194300
0.1u 0.3s 0:00 83% 3+19k 0+0io 5pf+0w
% gcc -O -s -Wall -o tst tst.c -lmalloc
tst.c: In function main:
tst.c:14: warning: implicit declaration of function `printf'
% time tst
6620000
6619990
0.2u 1.3s 0:01 89% 8+780k 0+0io 9pf+0w
% ^D
script done on Fri Jan  6 10:08:34 1989

Eh? I thought that there must be some resource limit defined by the system,
and in Ultrix (1.2 and 2.2), there is a "ulimit" call, but 4.3-tahoe
only has "getrlimit", so I wrote a little smidge to find out what my
"RLIMIT_DATA" and "RLIMIT_RSS" were. They were way bigger than the
amount I got from malloc. Are they the right thing to check?

So I thought mebbe a physical memory limit? Nah, why would I (consistently)
get 2 different values depending on the library version I used?

Like, what good is virtual memory if I can't malloc New Jersey? );

AND, why did the 3x version (the one used when -lmalloc is included)
have such extravagant core(?) requirements - 780k vs. 19k for libc?

AND, why isn't there an (old-style) prototype for malloc anywhere in
/usr/include, even though it doesn't return an int? Isn't that the
way it's supposed to go? Like /usr/include/malloc.h...char *malloc();...

Remember, this is the NEOPHYTES group, so keep the fires low.

rob

debra@alice.UUCP (Paul De Bra) (01/07/89)

In article <1418@leah.Albany.Edu> rds95@leah.Albany.Edu (Robert Seals) writes:
>How much can a single malloc get? And why does Ultrix have 2 kinds of
>malloc, while 4.3-tahoe (the first one, I think) only has 1?
>
>Here's a little thing I wrote to see how big a chunk I could get from a
>single malloc...

For every process the kernel has to keep a table of the pages of virtual
memory used by that process. The size of this table is usually fixed.
This means that a process cannot grow beyond that limit, regardless of
what any system call for finding out limits will tell you.

If your system were configured with big tables for each process, you
would hit another limit: your virtual memory, which is determined by the
size of your swap space (and possibly also your physical memory, on some
Unix versions), again regardless of other imposed limits.

Only if you are not exceeding any of those limits will the system look
at system-call-level limits.

Paul.

-- 
------------------------------------------------------
|debra@research.att.com   | uunet!research!debra     |
------------------------------------------------------

crossgl@ingr.com (Gordon Cross) (01/09/89)

In article <1418@leah.Albany.Edu> rds95@leah.Albany.Edu (Robert Seals) writes:
>How much can a single malloc get? And why does Ultrix have 2 kinds of
>malloc, while 4.3-tahoe (the first one, I think) only has 1?

The difference between malloc(3C) and malloc(3X) is speed:  the latter version
runs on average about 9 times faster.  The algorithm used is optimized for
speed at the expense of memory efficiency.  If you are going to do LOTS of
mallocs, the 3X version is usually preferable.
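
A crude way to see the difference for yourself is to time a burst of small
allocations, built once against the default libc malloc and once with
-lmalloc.  This is only a sketch; the count and the 32-byte request size
are arbitrary choices:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    enum { N = 100000 };
    static char *p[N];              /* pointers to the blocks we allocate */
    clock_t t0, t1;
    int i;

    t0 = clock();
    for (i = 0; i < N; i++)         /* lots of small mallocs...           */
        p[i] = malloc(32);
    for (i = 0; i < N; i++)         /* ...and the matching frees          */
        free(p[i]);
    t1 = clock();

    printf("%d malloc/free pairs took %.2f seconds of CPU time\n",
           N, (double)(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}
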
-- 

Gordon Cross             UUCP:      uunet!ingr!crossgl     "all opinions are
111 Westminister Way     INTERNET:  crossgl@ingr.com        mine and not those
Madison, AL 35758        MA BELL:   (205) 772-7842          of my employer."

tony@hp-sdd.hp.com (Tony Parkhurst_TEMP) (01/10/89)

In article <1418@leah.Albany.Edu> rds95@leah.Albany.Edu (Robert Seals) writes:

>How much can a single malloc get? 

As much as a bunch of small mallocs added together?

No, actually, a single malloc can get as much memory as a process can get,
less malloc's overhead.  Malloc is just a library routine that uses
sbrk() to get memory for the process.  (Note that sbrk() is NOT in the SVID
but is in AT&T's System V implementation.)
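
You can watch this happen: sbrk(0) returns the current break (the end of the
data segment), so printing it before and after a malloc() shows the request
plus malloc's own bookkeeping being added to the process.  A sketch; where
sbrk() is declared varies from system to system, so the <unistd.h> here is
an assumption:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    char *before, *after, *p;

    before = (char *) sbrk(0);      /* current end of the data segment */
    p = malloc(100000);             /* force malloc to go ask for more */
    after = (char *) sbrk(0);

    printf("asked malloc for 100000 bytes; the break grew by %ld bytes\n",
           (long)(after - before));
    free(p);
    return 0;
}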

>And why does Ultrix have 2 kinds of
>malloc, while 4.3-tahoe (the first one, I think) only has 1?

Because Ultrix is using the SVID for malloc(), while 4.3-tahoe is bazerkley (BSD).

So, the question could be, why does SVID have two definitions for malloc()?

Because the default one is for compatibility, and the other (newer) one is
for enhanced control.

First off, some background....

   Kernighan & Ritchie in "The C Programming Language" present a simple version
of malloc() and free() that does the memory compaction (or coalescing)
during the free() routine.  They use a modified first-fit algorithm that was
probably taken from Knuth Vol ??.  One problem with these routines is that
a realloc() would not be trivial to add.  At some point someone rewrote the
routines so that the compaction of free blocks was done during malloc() instead
of free(), along with some other enhancements.  This made for an easier
implementation of realloc(), and provided some interesting features.  So, at
least by Version 6 UNIX, there were some interesting points in the manual
pages for these routines (a sketch of a K&R-style allocator follows point 3
below):

1)  "...after 'free' is performed this space is made available for further
    allocation, but its contents are left undisturbed."

    Note:  I would not write programs that expect this feature, would you?

2)  "'Malloc' allocates the first big enough contiguous reach of free space
    found in a circular search from the last block allocated or freed,
    coalescing adjacent free blocks as it searches."

    Note:  the "last block allocated" pointer is also the "last block freed"
    pointer.  So either a malloc or a free will change this pointer.

and
3)  "'Realloc' also works if [the pointer argument] points to a block freed
    since the last call of 'malloc', 'realloc', or 'calloc'; thus sequences
    of 'free', 'malloc' and 'realloc' can exploit the search strategy of
    'malloc' to do storage compaction."

    Note that this does not really make logical sense:  "a block freed
    since the last call of 'malloc'" and "sequences of 'free', 'malloc'
    and 'realloc'" are not compatible statements.
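
For concreteness, here is a minimal sketch of a K&R-style allocator, with
the coalescing done in free() as described above.  It is simplified, and
the kr_ names, the <unistd.h> declaration of sbrk(), and the 1024-unit
minimum request are my choices, not anything from the book or from V6:

#include <unistd.h>     /* for sbrk(); adjust for older systems */

typedef long Align;     /* force blocks onto a worst-case boundary */

union header {
    struct {
        union header *ptr;      /* next block on the circular free list */
        unsigned size;          /* size of this block, in header units  */
    } s;
    Align x;
};
typedef union header Header;

static Header base;             /* degenerate list to get started    */
static Header *freep = NULL;    /* roving pointer into the free list */

void kr_free(void *ap);

static Header *morecore(unsigned nu)    /* ask the system for more memory */
{
    char *cp;
    Header *up;

    if (nu < 1024)
        nu = 1024;                      /* don't pester sbrk for crumbs */
    cp = (char *) sbrk(nu * sizeof(Header));
    if (cp == (char *) -1)              /* no space at all */
        return NULL;
    up = (Header *) cp;
    up->s.size = nu;
    kr_free((void *)(up + 1));          /* add the new arena to the free list */
    return freep;
}

void *kr_malloc(unsigned nbytes)
{
    Header *p, *prevp;
    unsigned nunits;

    nunits = (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;
    if ((prevp = freep) == NULL) {      /* no free list yet */
        base.s.ptr = freep = prevp = &base;
        base.s.size = 0;
    }
    for (p = prevp->s.ptr; ; prevp = p, p = p->s.ptr) {
        if (p->s.size >= nunits) {      /* first fit */
            if (p->s.size == nunits)    /* exact fit: unlink it */
                prevp->s.ptr = p->s.ptr;
            else {                      /* otherwise carve off the tail */
                p->s.size -= nunits;
                p += p->s.size;
                p->s.size = nunits;
            }
            freep = prevp;
            return (void *)(p + 1);
        }
        if (p == freep)                 /* wrapped around: need more core */
            if ((p = morecore(nunits)) == NULL)
                return NULL;
    }
}

void kr_free(void *ap)                  /* coalescing happens here */
{
    Header *bp, *p;

    bp = (Header *)ap - 1;              /* point back at the block header */
    for (p = freep; !(bp > p && bp < p->s.ptr); p = p->s.ptr)
        if (p >= p->s.ptr && (bp > p || bp < p->s.ptr))
            break;                      /* freed block at start or end of arena */

    if (bp + bp->s.size == p->s.ptr) {  /* join to upper neighbor */
        bp->s.size += p->s.ptr->s.size;
        bp->s.ptr = p->s.ptr->s.ptr;
    } else
        bp->s.ptr = p->s.ptr;
    if (p + p->s.size == bp) {          /* join to lower neighbor */
        p->s.size += bp->s.size;
        p->s.ptr = bp->s.ptr;
    } else
        p->s.ptr = bp;
    freep = p;
}

int main(void)                          /* trivial smoke test */
{
    char *a = kr_malloc(100), *b = kr_malloc(5000);

    kr_free(a);
    kr_free(b);
    return kr_malloc(4000) == NULL;     /* 0 on success */
}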

It is not clear that a program should have the ability to continue to
use memory after it is freed.  Also, it is not clear why one would need to
do routine storage compaction as in point 3, because the compaction will occur
whenever you need more memory.  But, this notwithstanding, these points led to
a very questionable coding practice, and you will still see code like this:

...

	free(myblock);		/* free memory to be realloced (pt 1) */

	free(dummy);		/* small block used only for the purpose of
				   moving the "last block" pointer in the
				   malloc routines away from "myblock" 
				   which forces a search in realloc (pt 2) */
	
	dummy = malloc(1);	/* go ahead and get the dummy back, will be
				   the same block as just freed, see point 2 */
	
	myblock = realloc(myblock, newsize);

				/* Now, get a new (or same) block after the
				   malloc routine makes a search for a large
				   enough block which will cause coalescing
				   during the search.  The returned block
				   may be right after "dummy" in the free
				   block list, so the "amount" of coalescing
				   may be minimal anyway.  (pts 2 and 3) */

...


If you don't believe me, look in the source for utilities like 'dc'
and 'diff'.  (Of course you won't see worthwhile comments).  Both of these
programs will break if the malloc routines do NOT allow the above practice.
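
(The portable way to write that fragment is of course just

	myblock = realloc(myblock, newsize);

with no free() games beforehand, since realloc() on a block that has NOT
been freed works everywhere.)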

Now, this coding practice, and those sections of the manual pages

(Point 1 about preserving freed blocks,
 point 2, which actually makes a statement about the implementation of malloc, 
 and point 3 which suggests a method for taking advantage of a particular
 implementation of memory management)

are terribly non-portable, and writing malloc routines to comply with both
that manual page and those code fragments is a pain.  What happened
to the "UNIX philosophy" here?!

So, to implement new versions of malloc(), BSD has one into which someone put
much effort to preserve compatibility; it has higher performance
(relative, since this is all in user context anyway) and is more efficient
in its use of smaller blocks.  It is a bucket-pool type of implementation.

System V, on the other hand, has a nice elaborate scheme with lots of
control and statistics-collecting code, but since it is not compatible,
and is big, it is relegated to a separate library instead of the default libc.



>Eh? I thought that there must be some resource limit defined by the system,
>and in Ultrix (1.2 and 2.2), there is a "ulimit" call, but 4.3-tahoe
>only has "getrlimit", so I wrote a little smidge to find out what my
>"RLIMIT_DATA" and "RLIMIT_RSS" were. They were way bigger than the
>amount I got from malloc. Are they the right thing to check?

Did you account for malloc's overhead, which is in user space along with
the requested memory?  RLIMIT_DATA should at least have been close.  Of
course this will also include all the static variable space.
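
For reference, here is a quick probe of the limits themselves.  It assumes
4.3BSD-style getrlimit() and the usual <sys/resource.h> names; a system
with only ulimit() needs the obvious substitution:

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_DATA, &rl) == 0)       /* data segment limit */
        printf("RLIMIT_DATA: cur %lu, max %lu\n",
               (unsigned long) rl.rlim_cur, (unsigned long) rl.rlim_max);
    if (getrlimit(RLIMIT_RSS, &rl) == 0)        /* resident set limit */
        printf("RLIMIT_RSS:  cur %lu, max %lu\n",
               (unsigned long) rl.rlim_cur, (unsigned long) rl.rlim_max);
    return 0;
}

Compare RLIMIT_DATA with what the probe program reports, plus a little for
malloc's bookkeeping and the program's own static data.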

>So I thought mebbe a physical memory limit? Nah, why would I (consistently)
>get 2 different values depending on the library version I used?

They are different implementations with very different constraints and overhead.

>AND, why did the 3x version (the one used when -lmalloc is included)
>have such extravagant core(?) requirements - 780k vs. 19k for libc?

Because it came from AT&T, which has complexity on the brain.

>AND, why isn't there a (old-style) prototype for malloc anywhere in
>/usr/include, even though it doesn't return an int? Isn't that the
>way it's supposed to go? Like /usr/include/malloc.h...char *malloc();...

Usually a header file is provided if there are implementation-defined constants:

#define	M_MXFAST...
#define	M_NLBLKS...
#define M_GRAIN...

or structures involved:

struct mallinfo...

which are needed for the 3x version of malloc(), but the libc version of malloc
doesn't have any.
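
Those constants and that structure are exactly what the header exists to
declare.  A sketch of the 3X tuning and statistics interface; link with
-lmalloc, and note that the fields printed here (arena, uordblks) are the
common ones but the exact set varies by system:

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>     /* the 3X header; not present on 4.3-tahoe */

int main(void)
{
    struct mallinfo mi;
    char *a, *b;

    mallopt(M_MXFAST, 64);      /* serve requests up to 64 bytes from
                                   the small-block pools */
    a = malloc(100);
    b = malloc(10);

    mi = mallinfo();
    printf("arena: %d bytes, ordinary blocks in use: %d\n",
           mi.arena, mi.uordblks);

    free(a);
    free(b);
    return 0;
}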

>Remember, this is the NEOPHYTES group, so keep the fires low.

These are good questions that you asked.  Just remember, when you are trying
to figure these things out, that routines in section 3 of the manual are
library routines, not system calls, so system limits and constraints do
not correlate directly with these routines.

Hope this helps.

-- Tony

P.S.  Someone with more knowledge of malloc's history may want to fill in gaps.

-- 

Tony Parkhurst	( tony@hp-sdd.HP.COM )

chris@mimsy.UUCP (Chris Torek) (01/10/89)

In article <1818@hp-sdd.HP.COM> tony@hp-sdd.hp.com (Tony Parkhurst)
writes a nice explanation of why there are Too Many Mallocs (with
apologies to ... to ... well, I know I stole% the phrase from somewhere!).
But it does not quite tell why Robert Seals got the results he did.

-----
% `Lesser artists borrow.  Great artists steal.'  (And I have forgotten
who said that, too!  This is not my night....)
-----

>So, to implement new versions of malloc(), BSD has one that someone put
>much effort into preserving the compatibility and has a higher performance
>(relative since this is in user context anyway), and is more efficient with
>usage of smaller blocks.  It is bucket pool type of implementation.

The new malloc was originally written by Chris Kingsley, then at Caltech.
It is extremely fast, but rather wasteful of space.  It has one notable
`feature': it never coalesces small blocks, and it never breaks up
large blocks.  Since it rounds up each request to the nearest power of two,
a sequence like:

	for (i = 10000;; i <<= 1) {
		if ((p = malloc(i)) == NULL)
			break;
		free(p);
	}

winds up allocating one 16384-byte block (for malloc(10000)), one
32768-byte block (for malloc(20000)), one 65536-byte block (40000), one
131072-byte block (80000), and so forth.  After these first four
allocations, your process has consumed not 80000 bytes of its limit (as
shown by the csh `limit' built-in), but rather (32+64+128)*1024 =
229376 bytes.  But it has done so very efficiently: only three
pages have actually been touched, so only three pages have really
been created.  The rest is all merely virtual space (swap backing)
and PTEs.
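
The rounding is easy to play with.  Here is a sketch that just prints the
bucket a Kingsley-style allocator would pick for each of those requests;
the 16-byte minimum bucket is an assumption, and the few bytes of
per-block overhead are ignored:

#include <stdio.h>

static unsigned long bucket_size(unsigned long request)
{
    unsigned long sz = 16;      /* assumed smallest bucket */

    while (sz < request)        /* round up to the next power of two */
        sz <<= 1;
    return sz;
}

int main(void)
{
    unsigned long req;

    for (req = 10000; req <= 80000; req <<= 1)
        printf("malloc(%lu) consumes a %lu-byte bucket\n",
               req, bucket_size(req));
    return 0;
}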

Various people have submitted modified versions of the Caltech malloc
to Berkeley, some of which can scavenge some of the space currently
wasted.  John Linderman (allegra!jpl) probably did the most thorough
of these.  For whatever reasons, the changes have never been installed.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (01/10/89)

In article <3405@ingr.com> crossgl@ingr.com (Gordon Cross) writes:
>The difference between malloc(3C) and malloc (3X) is speed:  the latter version
>runs on average about 9 times faster. ...

Not so (see Keywords).  Malloc(3c) is slower only in SysV environments.

(In other words, Gordon is right, but not in the context under discussion.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

mangler@cit-vax.Caltech.Edu (Don Speck) (01/28/89)

In article <15365@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> The new malloc was originally written by Chris Kingsley, then at Caltech.
> It is extremely fast, but rather wasteful of space.

He wrote it in 1982 for a VLSI layout language interpreter which
allocated hundreds of thousands of objects with an average size of
32 bytes, with about 2/3 as many free's as malloc's, growing at a
rate of typically a megabyte per minute.  No attempt was made to
leave recently-freed blocks intact.  It was really quite simple.

When I diff'd the version we had against the 4.2 BSD one, it was
clear that major additions had been made to support the storage
compaction semantics documented in malloc(3).  I recall a scathing
critique of dead code, spurious comments, etc. in Unix-Wizards
years ago, none of which was Chris's fault.

This all fell in my lap, and as our chips grew to need more than
16 megabytes, I had to switch to something less wasteful of memory.
From the University of Washington came a malloc that beat Chris's allocator
in CPU time, memory, and page faults.  Here are the identifying comments.
"whois" says that the author's current address is almes@rice.edu.  Don't
make me feel bad by asking me for the code.  Try asking the author.

/*  An implementation of malloc(3), free(3) using the QuickFit method.
 *  Guy Almes, May 1983.
 *
 *  Cf. MALLOC(3) in the Unix manual for external specifications.
 *
 *  Cf. Chuck Weinstock's PhD thesis for a discussion of the techniques
 *  used.  (Charles Weinstock, Dynamic Storage Allocation Techniques,
 *  April 1976, CMU)
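
(Roughly, the QuickFit idea looks like the sketch below: exact-size "quick
lists" for the small sizes a program allocates constantly, with a fallback
for everything else.  This is only the shape of the technique, not the
Almes code; the qf_ names and the caller-supplied size in qf_free() are
simplifications.)

#include <stdlib.h>

#define QUANTUM 8                       /* quick lists in 8-byte steps  */
#define NQUICK  16                      /* sizes 8..128 handled quickly */

struct qnode { struct qnode *next; };

static struct qnode *quick[NQUICK];     /* one LIFO free list per size  */

void *qf_malloc(size_t n)
{
    size_t i = (n + QUANTUM - 1) / QUANTUM;

    if (i >= 1 && i <= NQUICK && quick[i - 1] != NULL) {
        struct qnode *p = quick[i - 1]; /* reuse a block of this size   */
        quick[i - 1] = p->next;
        return p;
    }
    /* nothing on the quick list: fall back to the general allocator    */
    return malloc(i >= 1 && i <= NQUICK ? i * QUANTUM : n);
}

void qf_free(void *p, size_t n)
{
    size_t i = (n + QUANTUM - 1) / QUANTUM;

    if (i >= 1 && i <= NQUICK) {        /* small: push it, never coalesce */
        struct qnode *q = p;
        q->next = quick[i - 1];
        quick[i - 1] = q;
    } else
        free(p);                        /* large: give it straight back   */
}

int main(void)
{
    void *a = qf_malloc(24);

    qf_free(a, 24);                     /* lands on the 24-byte quick list */
    return qf_malloc(24) == a ? 0 : 1;  /* and is reused immediately       */
}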

steinar@fdmetd.uucp (Steinar Overbeck Cook) (03/14/89)

We have an application which uses dynamic SQL against an Oracle
database. After extracting approx 480 records, the application core
dumps because of SIGSEGV (segmentation violation).

The stack trace in sdb comes out thus:

	malloc: address 0x18a54
	*malloc(1032,199,147564)
	_findbuf(147564,147564,14679000)
	_filbuf(147564,0,102)
	fgets(14679000,200,147564)
	upiref(157555,1403,1403)   [upiexn.c]
	upigem(157280,157544,157280)
	sqlgem(1403,1403,14679284)
	sqlret(0,14679332,3818)
	sqlfch(151460,151464,171892)
	FETCHKol()   [DYNSEL.c:852]
	main(argc=4,argv=0xdffd58,14679404)   [DYNSEL.c:595]
	*

Does anybody have a clue as to what is wrong? My theory is that Oracle
overwrites the internal tables of malloc(3c).

-- 
Steinar Overbeck Cook, Fellesdata a.s, P.O. Box 248, 0212 OSLO 2, NORWAY
Phone : +47 2 52 80 80                            Fax   : +47 2 52 85 10
E-mail : ...!mcvax!ndosl!fdmetd!steinar  or       steinar@fdmetd.uucp
<The opinions expressed, if any, do not represent Fellesdata a.s>

gregg@ihlpb.ATT.COM (Wonderly) (03/17/89)

From article <364@fdmetd.uucp>, by steinar@fdmetd.uucp (Steinar Overbeck Cook):
> We have an application which uses dynamic SQL against an Oracle
> database. After extracting approx 480 records, the application core
> dumps because of SIGSEGV (Segment violation).
> 
> The stack trace in sdb comes out thus:

While the trace is useful, the registers and the offending instruction
are usually a little more enlightening to me.  Try the 'x' command too,
and look for a register being used as a pointer whose value in hex
looks like part of an ASCII string, as in 0x43546546.
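
If the theory is that something is scribbling past the end of malloc'd
blocks (the usual way malloc's internal tables get trashed), a crude
checking wrapper can confirm it before you go hunting in sdb.  A sketch,
with hypothetical dbg_malloc/dbg_free names standing in for whatever the
application actually calls:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GUARD    0xA5   /* pattern written just past the caller's bytes */
#define GUARDLEN 8

union dbghdr { size_t n; long double align; }; /* size + payload alignment */

void *dbg_malloc(size_t n)
{
    union dbghdr *h = malloc(sizeof *h + n + GUARDLEN);

    if (h == NULL)
        return NULL;
    h->n = n;                                   /* remember the size */
    memset((unsigned char *)(h + 1) + n, GUARD, GUARDLEN);
    return h + 1;
}

void dbg_free(void *vp)
{
    union dbghdr *h = (union dbghdr *)vp - 1;
    unsigned char *guard = (unsigned char *)vp + h->n;
    int i;

    for (i = 0; i < GUARDLEN; i++)
        if (guard[i] != GUARD) {                /* someone wrote past the end */
            fprintf(stderr, "dbg_free: block %p overrun\n", vp);
            break;
        }
    free(h);
}

int main(void)
{
    char *p = dbg_malloc(16);

    if (p == NULL)
        return 1;
    p[16] = 'X';        /* deliberate one-byte overrun, to show the report */
    dbg_free(p);
    return 0;
}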

-- 
Gregg Wonderly                             DOMAIN: gregg@ihlpb.att.com
AT&T Bell Laboratories                     UUCP:   att!ihlpb!gregg

refson@castle.ed.ac.uk (Keith Refson) (08/24/90)

Does anybody have or know of a FAST replacement for malloc() which
runs on Crays under UNICOS?  I have a program in which memory
allocation appears to take an inordinate proportion of the execution
time, and need to speed it up.   I would appreciate any advice or
hints from anybody with experience in memory allocation under UNICOS.

Thanks in advance.
----------------------------------------------------------------------------
|	Keith   Refson			  |ROYAL MAIL:                     |
|-----------------------------------------|   Department of Earth Sciences |
| JANET   : keith@uk.ac.ox.earth          |   Parks Road                   |
| INTERNET: keith@earth.ox.ac.uk          |   Oxford OX1 3PR               |
| BITNET  : keith%uk.ac.ox.earth@ukacrl   |   UK                           |
| UUCP    : K.Refson@ed.uucp              |PHONE   : +44 865 272026/272016 |
|         : keith%uk.ac.ox.earth@ukc.uucp |FAX     : +44 865 272072        |
----------------------------------------------------------------------------