[comp.lang.c] Dynamic Storage Allocator Pros and Cons

rh@smds.UUCP (Richard Harter) (11/15/90)

I have been asked to post a summary of pros and cons of using the
storage allocation package I recently posted versus using malloc/free
directly.  In the following text G/R (getsp/retsp) is the posted
allocator [which is actually a wrapper around malloc/free] and M/F
is malloc/free.

Performance:

Speed:	Timing studies on a Sun 3 under OS 3.4 gave roughly equivalent times
for G/R and M/F (getsp was faster than malloc and retsp was slower than
free.)  Performance on your machine may be quite different.

Storage utilization efficiency: G/R has a fixed overhead of 28 bytes/block.
Overhead for M/F depends on the implementation; 8 bytes per block is
representative.  Efficiency of storage utilization (ratio of storage
allocated to total storage required) depends on the algorithm and the
pattern of allocation block sizes requested.  Knuth cites the 2/3 rule
(equilibrium is two allocated blocks versus one free block) for allocators
which immediately merge returned blocks.  However his calculation ignores
the possibility of an exact fit.  Some years ago I did a study of 
efficiency of allocation algorithms using the assumptions of (a) random
request sizes and (b) total space large compared to request sizes.
Under these conditions I found that best fit was indeed best and that
storage efficiency was upwards of 90%.  Under the same conditions a
binary buddy system has a storage efficiency of 67% (many malloc/free
implementations use this algorithm.)  However the efficiency of the
buddy system is strongly dependent on the request size distribution
-- it can vary from 50% to 95%.
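
To see why the buddy system is so sensitive to the request size
distribution, consider how it rounds requests.  The following is a toy
illustration only (not code from either allocator): each request is
rounded up to the next power of two, so a request just over a power of
two wastes nearly half its block, while an exact power of two wastes
none.

    #include <stdio.h>

    /* Round a request up to the next power of two, as a binary buddy
       allocator does. */
    static unsigned long buddy_round(unsigned long n)
    {
        unsigned long b = 1;
        while (b < n)
            b <<= 1;
        return b;
    }

    int main(void)
    {
        unsigned long sizes[] = { 24, 33, 64, 100, 1000 };
        int i;

        for (i = 0; i < 5; i++) {
            unsigned long blk = buddy_round(sizes[i]);
            printf("request %4lu -> block %4lu (%.0f%% used)\n",
                   sizes[i], blk, 100.0 * sizes[i] / blk);
        }
        return 0;
    }

That rounding is the main reason the utilization figure moves so much
with the request mix.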

Summary -- it's probably a wash unless almost all requests are for very
small blocks.

Security and Error Checking:

This is the reason for using G/R, if it matters to you.  Specifically
the features are:

(A)	All invalid size requests (zero, negative, too large) are trapped.

(B)	All invalid 'free's are trapped.

(C)	Border overwrites are trapped.

(D)	Requests are identified so that storage allocation problems
	(bad size, overwrites, illegal returns) can be immediately
	tracked down to the offending statement.

(E)	You can print a storage allocation map with a time tag or
	equivalent for each allocated block.  The map can be sorted to
	identify memory leaks or unusual allocation patterns.

(F)	Control information and allocated memory are physically
	separated, reducing the chance that control information is
	compromised by programming errors.
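
For concreteness, here is a minimal sketch of this style of checking.
It is NOT the posted getsp/retsp code -- the names chk_alloc/chk_free,
the magic values, and the layout are made up for illustration, and
unlike (F) it keeps the control information next to the user data for
brevity.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAGIC 0x5aa5UL              /* marks blocks we allocated    */
    #define GUARD 0xdeadbeefUL          /* border word value            */

    struct hdr {
        unsigned long magic;            /* proves pointer came from us  */
        size_t size;                    /* user-requested size          */
        int reqno;                      /* caller's request identifier  */
        unsigned long guard;            /* front border word            */
    };

    void *chk_alloc(size_t size, int reqno)
    {
        struct hdr *h;
        unsigned char *user;
        unsigned long g = GUARD;

        if (size == 0 || size > (size_t)-1 / 2) {       /* (A)          */
            fprintf(stderr, "chk_alloc: bad size (request %d)\n", reqno);
            abort();
        }
        h = malloc(sizeof *h + size + sizeof g);
        if (h == NULL)
            return NULL;
        h->magic = MAGIC;
        h->size = size;
        h->reqno = reqno;                               /* (D)          */
        h->guard = GUARD;
        user = (unsigned char *) (h + 1);
        memcpy(user + size, &g, sizeof g);              /* rear border  */
        return user;
    }

    void chk_free(void *p)
    {
        struct hdr *h;
        unsigned long g;

        if (p == NULL) {                                /* (B)          */
            fprintf(stderr, "chk_free: NULL pointer\n");
            abort();
        }
        h = (struct hdr *) p - 1;
        if (h->magic != MAGIC) {                        /* (B)          */
            fprintf(stderr, "chk_free: pointer not from chk_alloc\n");
            abort();
        }
        memcpy(&g, (unsigned char *) p + h->size, sizeof g);
        if (h->guard != GUARD || g != GUARD) {          /* (C)          */
            fprintf(stderr, "chk_free: overwrite, request %d\n", h->reqno);
            abort();
        }
        h->magic = 0;                   /* catch double frees           */
        free(h);
    }

A real implementation would also keep the headers in a separate table
(feature F) and record a time tag per block so that the allocation map
(E) can be dumped and sorted.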

These features are particularly useful in large programs which make
heavy use of allocation and deallocation.  The utility of these features
probably depends strongly on the kinds of programs that you write and
on your preferred debugging style.

Along the lines of some other threads, I customarily use a suite of
pre-packaged software instrumentation that I embed in all major
applications.  My experience is that this simplifies software development
and converts many errors which would otherwise be troublesome into
trivial problems.  Then again, this may be a case of what works for one
person not working for another.

Conclusion:

Use it if you want the features.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

jimp@cognos.UUCP (Jim Patterson) (11/16/90)

In article <241@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
>Security and Error Checking:
>
>This is the reason for using G/R, if it matters to you.  Specifically
>the features are:
>
>(A)	All invalid size requests (zero, negative, too large) are trapped.
                                   ^^^^

Whether a 0 size request is invalid is a matter of interpretation.
Note that ANSI C specifically allows it; if you disallow it, then
getsp/retsp aren't really equivalent to malloc/free.

There are often times when a 0-byte request is legitimate. Usually this
comes up in logic that looks like this:

    Count the number of (some thing)
    Allocate memory for that many struct's to describe those things

(where it's legitimate for there to be 0 or more things).

As long as you only look at entries which you've counted and know are
there, the code is quite valid since it won't look at the pointer when
the count is 0.
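
For what it's worth, a bare-bones version of that pattern looks like
this (the names and the struct are invented for the example).  When the
count is 0 the returned pointer is never dereferenced, so the code is
well-defined whether malloc(0) returns NULL or a unique pointer.

    #include <stdlib.h>

    struct thing { int id; };

    struct thing *describe_things(size_t nthings)
    {
        struct thing *tab = malloc(nthings * sizeof *tab);
        size_t i;

        if (tab != NULL)
            for (i = 0; i < nthings; i++)   /* body never runs when   */
                tab[i].id = (int) i;        /* nthings is 0           */
        return tab;                         /* caller looks only at   */
    }                                       /* entries 0..nthings-1   */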

We in fact have a wrapper around malloc/free that does much the same
things as yours, and it too disallows 0 size requests. However, in
just about every case I can recall where it complained of a 0-byte
request, the code was actually not broken, it just hadn't considered 0
to be a special case. So, this check isn't really a "good thing" IMHO.
-- 
Jim Patterson                              Cognos Incorporated
UUCP:uunet!mitel!cunews!cognos!jimp        P.O. BOX 9707    
PHONE:(613)738-1440                        3755 Riverside Drive
NOT a Jays fan (not even a fan)            Ottawa, Ont  K1G 3Z4

kpv@ulysses.att.com (Phong Vo[drew]) (11/16/90)

In article <241@smds.UUCP>, rh@smds.UUCP (Richard Harter) writes:
:... 
: Storage utilization efficiency: G/R has a fixed overhead of 28 bytes/block.
: Overhead for M/F depends on the implementation; 8 bytes per block is
: representative.  Efficiency of storage utilization (ratio of storage
: allocated to total storage required) depends on the algorithm and the
: pattern of allocation block sizes requested.  Knuth cites the 2/3 rule
: (equilibrium is two allocated blocks versus one free block) for allocators
: which immediately merge returned blocks.  However his calculation ignores
: the possibility of an exact fit.  Some years ago I did a study of 
: efficiency of allocation algorithms using the assumptions of (a) random
: request sizes and (b) total space large compared to request sizes.
: Under these conditions I found that best fit was indeed best and that
: storage efficiency was upwards of 90%.  Under the same conditions a
: binary buddy system has a storage efficiency of 67% (many malloc/free
: implementations use this algorithm.)  However the efficiency of the
: buddy system is strongly dependent on the request size distribution
: -- it can vary from 50% to 95%.
: 
There is a paper that Dave Korn and I presented at the Summer '85 USENIX
conference, In Search of a Better Malloc, that gave lots of data on
different allocation strategies in actual implementations. I just want to
point out further that any allocation strategy can be modified with a
roving pointer (i.e., allocate from the last freed block if possible).
I've implemented the basic best-fit strategy with this enhancement using
a splay tree for the free blocks. This malloc has the space efficiency of
best-fit and, "in practice", performs about as well as BSD malloc, which
is a buddy system. In fact, for graphical programs (e.g., X programs)
where malloc fragmentation can cause page thrashing, the programs do
better with the new malloc than with BSD malloc even though it costs a
tiny bit more per allocation. This malloc is now distributed as standard
with SysVr4.
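
For readers who haven't seen it, the roving-pointer trick can be
sketched in a few lines.  This is only an illustration layered on plain
malloc/free, not the splay-tree malloc itself; the names are made up,
and a real allocator would take the block size from its own header
rather than from the caller.

    #include <stdlib.h>

    static void  *rover_blk;        /* most recently freed block        */
    static size_t rover_size;       /* its usable size                  */

    void *rover_alloc(size_t size)
    {
        if (rover_blk != NULL && rover_size >= size) {
            void *p = rover_blk;    /* satisfy the request from the     */
            rover_blk = NULL;       /* last freed block and skip the    */
            return p;               /* general search entirely          */
        }
        return malloc(size);        /* stand-in for the real search     */
    }

    void rover_free(void *p, size_t size)
    {
        if (rover_blk != NULL)      /* the cache holds one block; hand  */
            free(rover_blk);        /* the previous one back to the heap */
        rover_blk = p;
        rover_size = size;
    }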

: Summary -- it's probably a wash unless almost all requests are for very
: small blocks.
: 
But this is not trivial, since small-block allocation makes up a good
part of the work in significant programs. For example, compilers tend to
do lots of small mallocs for identifiers, symbols, etc.

: Security and Error Checking:
: ... list of nice features ... 
:
I had the same features in my malloc. These features can be turned on by
a compile flag. They are invaluable for debugging memory faults.

	Phong Vo, att!ulysses!kpv

rh@smds.UUCP (Richard Harter) (11/16/90)

In article <14016@ulysses.att.com>, kpv@ulysses.att.com (Phong Vo[drew]) writes:
> In article <241@smds.UUCP>, rh@smds.UUCP (Richard Harter) writes:

	[... I don't need to repeat it; I've already posted it once.]

: There is a paper that Dave Korn and I presented at the Summer '85 USENIX
: conference, In Search of a Better Malloc, that gave lots of data on
: different allocation strategies in actual implementations.

	I have heard good things about this paper -- unfortunately I
	don't have a copy.

: I just want to point out further that any allocation strategy can be
: modified with a roving pointer (i.e., allocate from the last freed block
: if possible). I've implemented the basic best-fit strategy with this
: enhancement using a splay tree for the free blocks. This malloc has the
: space efficiency of best-fit and, "in practice", performs about as well
: as BSD malloc, which is a buddy system. In fact, for graphical programs
: (e.g., X programs) where malloc fragmentation can cause page thrashing,
: the programs do better with the new malloc than with BSD malloc even
: though it costs a tiny bit more per allocation.

This is a very good point.  I haven't studied the matter, but I suspect
that the allocator I posted should be fairly good in this regard because
there is a LIFO list for each size, so either an exact fit or a split
comes from the most recently returned block.
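
In outline (not the posted code; the names and the 8-byte granularity
are invented for the example), the per-size LIFO lists look like this:

    #include <stdlib.h>

    #define NCLASS 128                      /* size classes, 8 bytes each */

    struct node { struct node *next; };

    static struct node *lifo[NCLASS];       /* one LIFO list per class    */

    void *class_alloc(size_t n)
    {
        size_t c = (n + 7) >> 3;            /* request -> size class      */
        if (c == 0)
            c = 1;
        if (c < NCLASS && lifo[c] != NULL) {
            struct node *b = lifo[c];       /* exact fit: pop the most    */
            lifo[c] = b->next;              /* recently returned block    */
            return b;
        }
        return malloc(c << 3);              /* stand-in for carve/split   */
    }

    void class_free(void *p, size_t n)
    {
        size_t c = (n + 7) >> 3;
        if (c == 0)
            c = 1;
        if (c < NCLASS) {
            struct node *b = p;             /* push: the newest block is  */
            b->next = lifo[c];              /* the first to be reused     */
            lifo[c] = b;
        } else {
            free(p);                        /* big blocks go straight back */
        }
    }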

One of the things that I have thought about but have never found the time
to implement (There is more to life than dynamic storage allocators) is a
strategy of dealing with the very common situation of allocating sundry
blocks during a routine and then deallocating them at the end (alloca, if
you will). 
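
A minimal sketch of one way to do it (not something the posted package
implements; the arena names are invented): record every block allocated
during the routine on a per-routine list, then release them all in one
call at the end.

    #include <stdlib.h>

    struct arena_link { struct arena_link *next; };
    struct arena      { struct arena_link *blocks; };

    void *arena_alloc(struct arena *a, size_t n)
    {
        struct arena_link *l = malloc(sizeof *l + n);
        if (l == NULL)
            return NULL;
        l->next = a->blocks;        /* chain the block into the arena   */
        a->blocks = l;
        return l + 1;               /* user data follows the link word  */
    }                               /* (alignment is glossed over here) */

    void arena_release(struct arena *a)
    {
        struct arena_link *l, *next;
        for (l = a->blocks; l != NULL; l = next) {
            next = l->next;         /* free everything allocated from   */
            free(l);                /* this arena in a single sweep     */
        }
        a->blocks = NULL;
    }

    /* usage:  struct arena scratch = { NULL };
     *         p = arena_alloc(&scratch, n);  ... as needed ...
     *         arena_release(&scratch);       at routine exit
     */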

: This malloc is now distributed as standard with SysVr4.

Hurrah!  This brings up something that I didn't mention in the posting.
If you are running on a wide variety of platforms it is very useful to
have reliable portable utilities, either as wrappers or as replacements.
While it is true that many platforms have excellent utilities (quite
often subtly incompatible) it is also true that there are some real
lemons out there.

> : Summary -- it's probably a wash unless almost all requests are for very
> : small blocks.

> But this is not trivial, since small-block allocation makes up a good
> part of the work in significant programs. For example, compilers tend to
> do lots of small mallocs for identifiers, symbols, etc.

Well, this is clearly an application-dependent matter.  The impact of the
overhead cost depends on the average block size.  However, the "right"
approach is to have separate pools for the short blocks, with bit maps
serving as "free lists".
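
For example, a pool for one small size, managed by a bitmap instead of
a free list, can be sketched like this (hypothetical names, sizes
chosen arbitrarily; the per-block overhead drops to a single bit):

    #include <stddef.h>

    #define NSLOTS 256                       /* slots in this pool        */
    #define SLOTSZ 16                        /* bytes per slot            */

    static unsigned char pool[NSLOTS * SLOTSZ];
    static unsigned char used[NSLOTS / 8];   /* 1 bit per slot            */

    void *pool_alloc(void)
    {
        int i;
        for (i = 0; i < NSLOTS; i++) {
            if (!(used[i >> 3] & (1 << (i & 7)))) {     /* slot i free    */
                used[i >> 3] |= (unsigned char) (1 << (i & 7));
                return pool + (size_t) i * SLOTSZ;
            }
        }
        return NULL;                         /* pool exhausted            */
    }

    void pool_free(void *p)
    {
        size_t i = (size_t) ((unsigned char *) p - pool) / SLOTSZ;
        used[i >> 3] &= (unsigned char) ~(1 << (i & 7));
    }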

> : Security and Error Checking:
> : ... list of nice features ... 

> I had the same features in my malloc. These features can be turned on by
> a compile flag. They are invaluable for debugging memory faults.

	Thank you for the kind words.  I sort of take the view that
error checking should be turned on all of the time.  My view is that
if an error crops up in production software then I want to know about it
immediately and I want all of the information I can get.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

rh@smds.UUCP (Richard Harter) (11/17/90)

In article <9052@cognos.UUCP>, jimp@cognos.UUCP (Jim Patterson) writes:
: In article <241@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:

: >(A)	All invalid size requests (zero, negative, too large) are trapped.
:                                    ^^^^

: Whether a 0 size request is invalid is a matter of interpretation.
: Note that ANSI C specifically allows it; if you disallow it, then
: getsp/retsp aren't really equivalent to malloc/free.

	Quite true, but then I never claimed it was.  However if you
	want that behaviour, change the test.

: There are often times when a 0-byte request is legitimate. Usually this
: comes up in logic that looks like this:

:     Count the number of (some thing)
:     Allocate memory for that many struct's to describe those things

: (where it's legitimate for there to be 0 or more things).

: As long as you only look at entries which you've counted and know are
: there, the code is quite valid since it won't look at the pointer when
: the count is 0.

I suspect that in this kind of case one is often implicitly
protected by a for loop that executes zero times.  One is also protected
most of the time by the fact that the allocated space (of what size??)
will hold variables of all types except structs.  This protects you
against algorithms that explicitly set location 0 of an array outside
of a loop.

: We in fact have a wrapper around malloc/free that does much the same
: things as yours, and it too disallows 0 size requests. However, in
: just about every case I can recall where it complained of a 0-byte
: request, the code was actually not broken, it just hadn't considered 0
: to be a special case. So, this check isn't really a "good thing" IMHO.

Probably differences in coding techniques.  My experience is that
0 size requests are very rare and that, when they happen, there is a real
live bug (usually a typo) in the code.  However I check things like 0
length as a matter of course (ingrained habit, not virtue).
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

njk@diku.dk (Niels J|rgen Kruse) (11/22/90)

rh@smds.UUCP (Richard Harter) writes:

>Security and Error Checking:
>This is the reason for using G/R, if it matters to you.  Specifically
>the features are:
>(A)    All invalid size requests (zero, negative, too large) are trapped.
---------------------------------------------------^^^^^^^^^
Try ``getsp (2147483646,0)''.   ;-)

Another bug: the return value of malloc in GETNODES is never
checked, so if malloc returns NULL, getsp dumps core quite
ungracefully.

BTW, thanks for posting your storage allocation package; it
makes what you have previously said about it a lot clearer.

PS: You call it a Best Fit allocator.  This is only true for
sizes smaller than 1Kb; for larger blocks the selection
strategy is LIFO.  This means that the memory utilization
achieved is quite a lot lower than that of a true Best Fit
storage allocator.

PPS: If you change the #define for BKSZ to
``( 65472 & (unsigned)-1/2 )'', the code will also work on a
Xenix286 system in large memory model.
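
(For anyone puzzling over that expression: on a machine with 16-bit
ints, (unsigned)-1/2 is 32767, so BKSZ becomes 65472 & 32767 = 32704
instead of 65472; on a machine with 32-bit ints the mask is 2147483647
and BKSZ is unchanged at 65472.)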
-- 
Niels J|rgen Kruse 	DIKU Graduate 	njk@diku.dk

rh@smds.UUCP (Richard Harter) (11/24/90)

In article <1990Nov21.212147.20783@diku.dk>, njk@diku.dk (Niels J|rgen Kruse) writes:
> rh@smds.UUCP (Richard Harter) writes:

> >(A)    All invalid size requests (zero, negative, too large) are trapped.
> ---------------------------------------------------^^^^^^^^^
> Try ``getsp (2147483646,0)''.   ;-)

	??? What machine is this on?  It worked fine on a
	couple of tests.

> Another bug: the return value of malloc in GETNODES is never
> checked, so if malloc returns NULL, getsp dumps core quite
> ungracefully.

	Quite right, thanks for pointing it out -- not that it
	is what one would call a high probability bug.

	
> PS: You call it a Best Fit allocator.  This is only true for
> sizes smaller than 1Kb; for larger blocks the selection
> strategy is LIFO.  This means that the memory utilization
> achieved is quite a lot lower than that of a true Best Fit
> storage allocator.

	Color me lazy.  I never bothered with coding up a fast
	big block best fit component.  For that matter I'm not
	sure that it is warranted.  If the block request sizes
	are "large" relative to the size of managed memory the
	best fit is not much better than lifo.  Best fit gains
	in two ways -- the percentage of fragments is smaller
	because of exact fits and the fragment size distribution
	is skewed to favor very small or very large blocks.
	Both advantages are much less when the request sizes
	are "large".

> PPS: If you change the #define for BKSZ to
> ``( 65472 & (unsigned)-1/2 )'', the code will also work on a
> Xenix286 system in large memory model.

Cute hack.  Actually I never work in Xenix286 -- God has been
merciful.

PS:  If anyone has a better allocator and wants to post it
(or make it available via FTP) I will be glad to pick it up
and try it out.

-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

njk@diku.dk (Niels J|rgen Kruse) (11/26/90)

rh@smds.UUCP (Richard Harter) writes:

rh >(A)    All invalid size requests (zero, negative, too large) are trapped.
njk>--------------------------------------------------^^^^^^^^^
njk> Try ``getsp (2147483646,0)''.   ;-)
rh >       ??? What machine is this on?  It worked fine on a
rh >       couple of tests.

Sorry, I should have given more details.  It was a Vax-11/785
running MORE/Bsd 4.3, compiling with cc -g.  You must have used
a machine that zero-fills on signed right shift.  Here is the
relevant part of a gdb session:

getsp (size=2147483646, reqno=0) (getsp.c line 150)
150       if (size<=0) {                /* Bad size request             */
(gdb)
155       size += 4;                    /* Allow for check words        */
(gdb)
156       if (ninit) initsp();          /* Initialize if needful        */
(gdb) p size
$1 = -2147483646
(gdb) s
(...)
(gdb)
165       rx=(size-1)>>NSH;             /* Get requested index          */
(gdb)
166       rsz=(1+rx)<<NSH;              /* Get rounded request size     */
(gdb) p rx
$2 = -268435456
(gdb) s
(...)
(gdb)
170       if (rx>=MXSZ) fx=MXSZ;        /* Big request, set fx for big  */
(gdb)
172         if (px[rx]) fx=rx;          /* Exact fit exists             */
(gdb)
Program received signal 11, Segmentation fault

Authors of dynamic storage allocators almost never check for
overflow when rounding up requests.  This is the first thing I
look for when I come across a new one.

rh >                                               Best fit gains
rh >       in two ways -- the percentage of fragments is smaller
rh >       because of exact fits and the fragment size distribution
rh >       is skewed to favor very small or very large blocks.
rh >       Both advantages are much less when the request sizes
rh >       are "large".

This depends very much on the shape of the request size
distribution.  Consider what happens when you have request
sizes that are very large and very rare.  With LIFO, the
largest free blocks tend to get nibbled at by smaller
requests, so when the very large request finally comes
along, no block is large enough and you have to grow the
heap.  Best Fit, on the other hand, tends to preserve the
largest free blocks.
-- 
Niels J|rgen Kruse 	DIKU Graduate 	njk@diku.dk

rh@smds.UUCP (Richard Harter) (11/26/90)

In article <1990Nov25.162505.3445@diku.dk>, njk@diku.dk (Niels J|rgen Kruse) writes:
njk rh@smds.UUCP (Richard Harter) writes:

njk rh >(A)    All invalid size requests (zero, negative, too large) are trapped.
njk njk>--------------------------------------------------^^^^^^^^^
njk njk> Try ``getsp (2147483646,0)''.   ;-)
njk rh >       ??? What machine is this on?  It worked fine on a
njk rh >       couple of tests.

njk Sorry, I should have given more details.  It was a Vax-11/785
njk running MORE/Bsd 4.3, compiling with cc -g.  You must have used
njk a machine that zero-fills on signed right shift...

	Detailed debug printout deleted.  The essence of the
	matter is that the request size is 2**31-2.  The code
	checks for <=0 first, then adds 4, and then shifts 
	right 3.  With the requested size there is overflow.
	With 1-fill the size remains negative.  Nailed me to
	the wall, he did.

njk Authors of dynamic storage allocators almost never check for
njk overflow when rounding up requests.  This is the first thing I
njk look for when I come across a new one.

I can't imagine why. :-)  Given that the allocator argument is an integer,
the best fix is probably to check the size for negative a second time
after the round-up.  [There are sundry theological reasons why I
think that the argument should be signed which we needn't go into.]
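
Something along these lines, say (a sketch of the fix only, not the
actual getsp source; on a two's-complement machine the wrapped value
shows up as negative and is rejected):

    long rounded_size(long size)
    {
        long rsz;

        if (size <= 0)
            return -1;                  /* zero or negative request     */
        rsz = size + 4;                 /* allow for check words        */
        if (rsz <= 0)
            return -1;                  /* the addition wrapped around  */
        rsz = ((rsz + 7) >> 3) << 3;    /* round up to 8-byte multiple  */
        if (rsz <= 0)
            return -1;                  /* the round-up wrapped around  */
        return rsz;
    }

A stricter version compares size against a fixed maximum before doing
any arithmetic, which avoids relying on wrap-around behaviour at all.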

njk rh >       Both advantages are much less when the request sizes
njk rh >       are "large". [re lifo vs best-fit]

njk This depends very much on the shape of the request size
njk distribution.  Consider what happens when you have request
njk sizes that are very large and very rare.  With LIFO, the
njk largest free blocks tend to get nibbled at by smaller
njk requests, so when the very large request finally comes
njk along, no block is large enough and you have to grow the
njk heap.  Best Fit, on the other hand, tends to preserve the
njk largest free blocks.

Can't argue with you there.  This was an engineering trade-off
decision on my part.  It didn't seem worth it to me at the time
I wrote the package to strive for that last bit of space efficiency.
For that matter I wasn't overwhelmingly concerned with space
efficiency per se -- witness the decision to go with separate
nodes.  I wanted time efficiency (I tend to make heavy use of
allocation) with good error checking.

I tend to be of the school that believes that fixed table sizes
are a sin against nature and an abomination unto the Lord.  I am
offended when I get messages like "arg list too long".  Call me
a crufty bigoted old curmudgeon, if you like, but that's my view.

But if you do take that view you end up doing a lot of allocation
and deallocation which, unfortunately, is a veritable hotbed
of mystery bugs.  Whence my predilection for error checking.
What I need is an OS with the commands

	fix$typos
	fix$syntax
	fix$logic

Ah well, they tell me that VMS is the OS for people who are
doing real production software.  Maybe I can find them there. :-)
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.