rh@smds.UUCP (Richard Harter) (11/15/90)
I have been asked to post a summary of pros and cons of using the storage
allocation package I recently posted versus using malloc/free directly.
In the following text G/R (getsp/retsp) is the posted allocator [which is
actually a wrapper around malloc/free] and M/F is malloc/free.

Performance:

Speed:  Timing studies on a SUN 3 OS 3.4 gave roughly equivalent times for
G/R and M/F (getsp was faster than malloc and retsp was slower than free.)
Performance on your machine may be quite different.

Storage utilization efficiency:  G/R has a fixed overhead of 28 bytes/block.
Overhead for M/F depends on the implementation; 8 bytes per block is
representative.  Efficiency of storage utilization (ratio of storage
allocated to total storage required) depends on the algorithm and the
pattern of allocation block sizes requested.  Knuth cites the 2/3 rule
(equilibrium is two allocated blocks versus one free block) for allocators
which immediately merge returned blocks.  However his calculation ignores
the possibility of an exact fit.  Some years ago I did a study of
efficiency of allocation algorithms using the assumptions of (a) random
request sizes and (b) total space large compared to request sizes.
Under these conditions I found that best fit was indeed best and that
storage efficiency was upwards of 90%.  Under the same conditions a
binary buddy system has a storage efficiency of 67% (many malloc/free
implementations use this algorithm.)  However the efficiency of the
buddy system is strongly dependent on the request size distribution
-- it can vary from 50% to 95%.

Summary -- it's probably a wash unless almost all requests are for very
small blocks.

Security and Error Checking:

This is the reason for using G/R, if it matters to you.  Specifically
the features are:

(A) All invalid size requests (zero, negative, too large) are trapped.

(B) All invalid 'free's are trapped.

(C) Border overwrites are trapped.

(D) Requests are identified so that storage allocation problems (bad size,
    overwrites, illegal returns) can be immediately tracked down to the
    offending statement.

(E) You can print a storage allocation map with a time tag or equivalent
    for each allocated block.  The map can be sorted to identify memory
    leaks or unusual allocation patterns.

(F) Control information and allocated memory are physically separated,
    reducing the chance that control information is compromised by
    programming errors.

(A rough sketch of a wrapper with this kind of checking follows this
article.)  These features are particularly useful in large programs which
make heavy use of allocation and deallocation.  The utility of these
features probably depends strongly on the kinds of programs that you
write and on your preferred debugging style.

Along the line of some other threads, I customarily use a suite of
pre-packaged software instrumentation that I embed in all major
applications.  My experience is that this simplifies software development
and converts many errors which would otherwise be troublesome into
trivial problems.  This may be a matter of what works for one person not
working for another.

Conclusion:  Use it if you want the features.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh    Phone: 508-369-7398
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.
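For concreteness, here is a minimal sketch of the kind of checked wrapper
the feature list above describes -- guard words around each block, a
request tag, and control records kept apart from the user data.  This is
not the posted getsp/retsp code; every name and constant in it (chk_get,
chk_ret, GUARD, MAXREQ) is invented for illustration.

#include <stdio.h>
#include <stdlib.h>

#define GUARD   0xDEADBEEFu            /* border check word */
#define MAXREQ  (1L << 24)             /* arbitrary "too large" limit */

struct chk_node {                      /* control info kept apart from user data */
    struct chk_node *next;
    unsigned        *mem;              /* the underlying malloc'ed block */
    long             size;             /* size the caller asked for */
    int              reqno;            /* caller-supplied request tag */
};

static struct chk_node *chk_list = NULL;

void *chk_get(long size, int reqno)
{
    size_t words;
    unsigned *raw;
    struct chk_node *node;

    if (size <= 0 || size > MAXREQ) {               /* (A): bad sizes trapped */
        fprintf(stderr, "chk_get: bad size %ld (request %d)\n", size, reqno);
        return NULL;
    }
    words = ((size_t)size + sizeof(unsigned) - 1) / sizeof(unsigned);
    raw  = malloc((words + 2) * sizeof(unsigned));  /* +2 for the guard words */
    node = malloc(sizeof *node);
    if (raw == NULL || node == NULL) {
        fprintf(stderr, "chk_get: out of memory (request %d)\n", reqno);
        free(raw); free(node);
        return NULL;
    }
    raw[0] = raw[words + 1] = GUARD;                /* (C): border check words */
    node->mem = raw; node->size = size; node->reqno = reqno;
    node->next = chk_list; chk_list = node;         /* (D)/(E): allocation map */
    return raw + 1;
}

void chk_ret(void *p)
{
    struct chk_node **pp, *node;
    size_t words;

    for (pp = &chk_list; *pp != NULL; pp = &(*pp)->next)
        if ((void *)((*pp)->mem + 1) == p)
            break;
    if (*pp == NULL) {                              /* (B): invalid free trapped */
        fprintf(stderr, "chk_ret: block %p was never allocated\n", p);
        return;
    }
    node  = *pp;
    words = ((size_t)node->size + sizeof(unsigned) - 1) / sizeof(unsigned);
    if (node->mem[0] != GUARD || node->mem[words + 1] != GUARD)
        fprintf(stderr, "chk_ret: borders overwritten (request %d)\n", node->reqno);
    *pp = node->next;                               /* unlink, then release */
    free(node->mem);
    free(node);
}

A real version of feature (E) would also record a time tag in each
control node and walk chk_list to print the map, and the linear search in
chk_ret would normally be replaced by a hash keyed on the returned
pointer.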
jimp@cognos.UUCP (Jim Patterson) (11/16/90)
In article <241@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
>Security and Error Checking:
>
>This is the reason for using G/R, if it matters to you.  Specifically
>the features are:
>
>(A) All invalid size requests (zero, negative, too large) are trapped.
                                ^^^^
Whether a 0 size request is invalid is a matter of interpretation.
Note that ANSI C specifically allows it; if you disallow it, then
getsp/retsp aren't really equivalent to malloc/free.

There are often times when a 0-byte request is legitimate.  Usually this
comes up in logic that looks like this:

    Count the number of (some thing)
    Allocate memory for that many struct's to describe those things
    (where it's legitimate for there to be 0 or more things).

As long as you only look at entries which you've counted and know are
there, the code is quite valid since it won't look at the pointer when
the count is 0.  (A short example of this pattern follows this article.)

We in fact have a wrapper around malloc/free that does much the same
things as yours, and it too disallows 0 size requests.  However, in
just about every case I can recall where it complained of a 0-byte
request, the code was actually not broken, it just hadn't considered 0
to be a special case.  So, this check isn't really a "good thing" IMHO.
-- 
Jim Patterson                              Cognos Incorporated
UUCP:uunet!mitel!cunews!cognos!jimp        P.O. BOX 9707
PHONE:(613)738-1440                        3755 Riverside Drive
NOT a Jays fan (not even a fan)            Ottawa, Ont  K1G 3Z4
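The pattern in question, sketched in C (the names are invented; the point
is only that a zero count never dereferences the pointer, and ANSI C lets
malloc(0) return either NULL or a unique, freeable pointer):

#include <stdlib.h>

struct thing { int id; };

void process_things(int count)
{
    struct thing *things = malloc(count * sizeof *things);  /* count may be 0 */
    int i;

    if (count > 0 && things == NULL)
        return;                         /* a real failure, not the 0-count case */

    for (i = 0; i < count; i++)         /* executes zero times when count == 0 */
        things[i].id = i;

    free(things);                       /* ANSI also allows free(NULL) */
}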
kpv@ulysses.att.com (Phong Vo[drew]) (11/16/90)
In article <241@smds.UUCP>, rh@smds.UUCP (Richard Harter) writes:
:...
: Storage utilization efficiency: G/R has a fixed overhead of 28 bytes/block.
: Overhead for M/F depends on the implementation; 8 bytes per block is
: representative. Efficiency of storage utilization (ratio of storage
: allocated to total storage required) depends on the algorithm and the
: pattern of allocation block sizes requested. Knuth cites the 2/3 rule
: (equilibrium is two allocated blocks versus one free block) for allocators
: which immediately merge returned blocks. However his calculation ignores
: the possibility of an exact fit. Some years ago I did a study of
: efficiency of allocation algorithms using the assumptions of (a) random
: request sizes and (b) total space large compared to request sizes.
: Under these conditions I found that best fit was indeed best and that
: storage efficiency was upwards of 90%. Under the same conditions a
: binary buddy system has a storage efficiency of 67% (many malloc/free
: implementations use this algorithm.) However the efficiency of the
: buddy system is strongly dependent on the request size distribution
: -- it can vary from 50% to 95%.
:
There is a paper that Dave Korn and I presented at the Summer '85 USENIX conference,
In Search of a Better Malloc, that gave lots of data on different allocation
strategies in actual implementations. I just want to point out further that
any allocation strategy can be modified with a roving pointer (i.e., allocate
from the last freed block if possible). I've implemented the basic best-fit strategy
with this enhancement using a splay tree for the free blocks. This malloc has the space
efficiency of best-fit and, "in practice", performs about as well as BSD malloc,
which is a buddy system. In fact, for graphical programs (e.g., X programs)
where malloc fragmentation can cause page thrashing, the programs do better with
the new malloc than with BSD malloc even though it costs a tiny bit more for
allocation. This malloc is now standardly distributed with SysVr4.
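The roving-pointer idea can be shown in miniature: hang on to the most
recently freed block and satisfy the next request from it when it fits.
The toy below (invented names, and it caches only a single block) is
layered over plain malloc/free rather than being the splay-tree allocator
described above; a real allocator would also know the block size from its
own header, so rov_free would not need a size argument.

#include <stdlib.h>

static void  *last_blk  = NULL;    /* most recently freed block, if any */
static size_t last_size = 0;

void *rov_alloc(size_t n)
{
    if (last_blk != NULL && n <= last_size) {   /* roving-pointer hit */
        void *p = last_blk;
        last_blk = NULL;
        return p;
    }
    return malloc(n);                           /* miss: fall back to the real allocator */
}

void rov_free(void *p, size_t n)
{
    if (last_blk != NULL)
        free(last_blk);                         /* only one block is cached here */
    last_blk  = p;
    last_size = n;
}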
: Summary -- it's probably a wash unless almost all requests are for very
: small blocks.
:
But this is not trivial since allocation of small blocks is a good part of
significant programs. For example, compilers tend to do lots of small mallocs
for identifiers, symbols, etc.
: Security and Error Checking:
: ... list of nice features ...
:
I had the same features in my malloc. These features can be turned on by
a compile flag. They are invaluable for debugging memory faults.
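A minimal sketch of how a compile flag can gate that kind of checking
(the flag and function names below are invented, not those of the
AT&T malloc):

#include <stdio.h>
#include <stdlib.h>

#ifdef ALLOC_DEBUG                        /* build with cc -DALLOC_DEBUG to enable */
#define ALLOC_CHECK(cond, msg) \
    do { if (!(cond)) { fprintf(stderr, "alloc: %s\n", msg); abort(); } } while (0)
#else
#define ALLOC_CHECK(cond, msg) ((void)0)  /* checking compiles away entirely */
#endif

void *checked_malloc(size_t n)
{
    void *p;

    ALLOC_CHECK(n > 0, "zero-size request");
    p = malloc(n);
    ALLOC_CHECK(p != NULL, "allocation failed");
    return p;
}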
Phong Vo, att!ulysses!kpv
rh@smds.UUCP (Richard Harter) (11/16/90)
In article <14016@ulysses.att.com>, kpv@ulysses.att.com (Phong Vo[drew]) writes:
> In article <241@smds.UUCP>, rh@smds.UUCP (Richard Harter) writes:
[... I don't need to repeat it; I've already posted it once.]

: There is a paper that Dave Korn and I presented at the Summer '85 USENIX
: conference, In Search of a Better Malloc, that gave lots of data on
: different allocation strategies in actual implementations.

I have heard good things about this paper -- unfortunately I don't have
a copy.

: I just want to point out further that any allocation strategy can be
: modified with a roving pointer (i.e., allocate from the last freed block
: if possible). I've implemented the basic best-fit strategy with this
: enhancement using a splay tree for the free blocks. This malloc has the
: space efficiency of best-fit and, "in practice", performs about as well
: as BSD malloc, which is a buddy system. In fact, for graphical programs
: (e.g., X programs) where malloc fragmentation can cause page thrashing,
: the programs do better with the new malloc than with BSD malloc even
: though it costs a tiny bit more for allocation.

This is a very good point.  I haven't studied the matter but I suspect
that the allocator I posted should be fairly good in this regard because
there is a lifo list for each size, so that either an exact fit or a
split comes from the most recently returned block.  One of the things
that I have thought about but have never found the time to implement
(There is more to life than dynamic storage allocators) is a strategy
for dealing with the very common situation of allocating sundry blocks
during a routine and then deallocating them at the end (alloca, if you
will).

: This malloc is now standardly distributed with SysVr4.

Hurrah!  This brings up something that I didn't mention in the posting.
If you are running on a wide variety of platforms it is very useful to
have reliable portable utilities, either as wrappers or as replacements.
While it is true that many platforms have excellent utilities (quite
often subtly incompatible), it is also true that there are some real
lemons out there.

> : Summary -- it's probably a wash unless almost all requests are for very
> : small blocks.

> But this is not trivial since allocation of small blocks is a good part of
> significant programs. For example, compilers tend to do lots of small mallocs
> for identifiers, symbols, etc.

Well, this is clearly an application-dependent matter.  The impact of the
overhead cost depends on the average block size.  However the "right"
approach is to have separate pools for the short blocks with bit maps for
"free lists".  (A toy sketch of that idea follows this article.)

> : Security and Error Checking:
> : ... list of nice features ...

> I had the same features in my malloc. These features can be turned on by
> a compile flag. They are invaluable for debugging memory faults.

Thank you for the kind words.  I sort of take the view that error
checking should be turned on all of the time.  My view is that if an
error crops up in production software then I want to know about it
immediately, and I want all of the information I can get.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh    Phone: 508-369-7398
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.
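The "separate pools with bit maps" idea mentioned above can be sketched
as a toy: a pool of fixed-size small blocks whose free list is just one
bit per block.  The sizes and names here are invented and this is only an
illustration of the technique, not part of the posted package; a real
version would keep one pool per small size and fall back to the general
allocator when a pool fills up.

#include <string.h>

#define POOL_BLOCKS   256               /* blocks per pool (arbitrary) */
#define POOL_BLKSIZE  16                /* one small fixed size per pool (arbitrary) */

typedef struct {
    unsigned char map[POOL_BLOCKS / 8];          /* one bit per block: 1 = in use */
    char          mem[POOL_BLOCKS * POOL_BLKSIZE];
} pool_t;

static void pool_init(pool_t *p)
{
    memset(p->map, 0, sizeof p->map);            /* all blocks start out free */
}

static void *pool_get(pool_t *p)
{
    int i;

    for (i = 0; i < POOL_BLOCKS; i++) {
        if (!(p->map[i / 8] & (1 << (i % 8)))) { /* bit clear => block is free */
            p->map[i / 8] |= (unsigned char)(1 << (i % 8));
            return p->mem + i * POOL_BLKSIZE;
        }
    }
    return NULL;                                 /* pool full; caller falls back */
}

static void pool_ret(pool_t *p, void *blk)
{
    int i = (int)(((char *)blk - p->mem) / POOL_BLKSIZE);

    p->map[i / 8] &= (unsigned char)~(1 << (i % 8));    /* clear the bit: free again */
}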
rh@smds.UUCP (Richard Harter) (11/17/90)
In article <9052@cognos.UUCP>, jimp@cognos.UUCP (Jim Patterson) writes:
: In article <241@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
: >(A) All invalid size requests (zero, negative, too large) are trapped.
:                                 ^^^^
: Whether a 0 size request is invalid is a matter of interpretation.
: Note that ANSI C specifically allows it; if you disallow it, then
: getsp/retsp aren't really equivalent to malloc/free.

Quite true, but then I never claimed it was.  However, if you want that
behaviour, change the test.  (A sketch of how the policy might be made a
compile-time choice follows this article.)

: There are often times when a 0-byte request is legitimate.  Usually this
: comes up in logic that looks like this:
:     Count the number of (some thing)
:     Allocate memory for that many struct's to describe those things
:     (where it's legitimate for there to be 0 or more things).
: As long as you only look at entries which you've counted and know are
: there, the code is quite valid since it won't look at the pointer when
: the count is 0.

I suspect that in this kind of case one is often implicitly protected by
a for loop that executes zero times.  One is also protected most of the
time by the fact that the allocated space (of what size??) will hold
variables of all types except structs.  This protects you against
algorithms that explicitly set location 0 of an array outside of a loop.

: We in fact have a wrapper around malloc/free that does much the same
: things as yours, and it too disallows 0 size requests.  However, in
: just about every case I can recall where it complained of a 0-byte
: request, the code was actually not broken, it just hadn't considered 0
: to be a special case.  So, this check isn't really a "good thing" IMHO.

Probably differences in coding techniques.  My experience is that 0 size
requests are very rare and, when they happen, there was a real live bug
(usually a typo) in the code.  However I check things like 0 length as a
matter of course (ingrained habit, not virtue).
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh    Phone: 508-369-7398
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.
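One way the zero-size policy could be made a compile-time choice rather
than hard-wired, in the spirit of "change the test" above (the names and
the ALLOW_ZERO_SIZE flag are invented; this is not the posted code):

#include <stdio.h>
#include <stdlib.h>

#ifndef ALLOW_ZERO_SIZE
#define ALLOW_ZERO_SIZE 0          /* default: treat 0 as a probable bug */
#endif

void *wrap_get(long size)
{
    if (size < 0) {
        fprintf(stderr, "wrap_get: negative size %ld\n", size);
        return NULL;
    }
    if (size == 0) {
#if ALLOW_ZERO_SIZE
        size = 1;                  /* hand back a real, freeable block for 0 bytes */
#else
        fprintf(stderr, "wrap_get: zero-size request\n");
        return NULL;
#endif
    }
    return malloc((size_t)size);
}

Building with -DALLOW_ZERO_SIZE=1 gets malloc-like behaviour for zero;
the default keeps the "probable bug" trap.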
njk@diku.dk (Niels J|rgen Kruse) (11/22/90)
rh@smds.UUCP (Richard Harter) writes:
>Security and Error Checking:
>This is the reason for using G/R, if it matters to you.  Specifically
>the features are:
>(A) All invalid size requests (zero, negative, too large) are trapped.
---------------------------------------------------^^^^^^^^^
Try ``getsp (2147483646,0)''.  ;-)

Another bug: The return value of malloc in GETNODES is never
checked, so if malloc returns NULL, getsp dumps core quite
ungracefully.

BTW, thanks for posting your storage allocation package; it makes what
you have previously said about it a lot clearer.

PS: You call it a Best Fit allocator.  This is only true for
sizes smaller than 1Kb; for larger blocks the selection
strategy is LIFO.  This means that the memory utilization
achieved is quite a lot lower than that of a true Best Fit
storage allocator.

PPS: If you change the #define for BKSZ to
``( 65472 & (unsigned)-1/2 )'', the code will also work on a
Xenix286 system in large memory model.
-- 
Niels J|rgen Kruse         DIKU Graduate         njk@diku.dk
rh@smds.UUCP (Richard Harter) (11/24/90)
In article <1990Nov21.212147.20783@diku.dk>, njk@diku.dk (Niels J|rgen Kruse) writes:
> rh@smds.UUCP (Richard Harter) writes:
> >(A) All invalid size requests (zero, negative, too large) are trapped.
> ---------------------------------------------------^^^^^^^^^
> Try ``getsp (2147483646,0)''.  ;-)

??? What machine is this on?  It worked fine on a couple of tests.

> Another bug: The return value of malloc in GETNODES is never
> checked, so if malloc returns NULL, getsp dumps core quite
> ungracefully.

Quite right, thanks for pointing it out -- not that it is what one would
call a high probability bug.  (A sketch of the obvious fix follows this
article.)

> PS: You call it a Best Fit allocator.  This is only true for
> sizes smaller than 1Kb; for larger blocks the selection
> strategy is LIFO.  This means that the memory utilization
> achieved is quite a lot lower than that of a true Best Fit
> storage allocator.

Color me lazy.  I never bothered with coding up a fast big block best
fit component.  For that matter I'm not sure that it is warranted.  If
the block request sizes are "large" relative to the size of managed
memory, then best fit is not much better than lifo.  Best fit gains
in two ways -- the percentage of fragments is smaller
because of exact fits and the fragment size distribution
is skewed to favor very small or very large blocks.
Both advantages are much less when the request sizes
are "large".

> PPS: If you change the #define for BKSZ to
> ``( 65472 & (unsigned)-1/2 )'', the code will also work on a
> Xenix286 system in large memory model.

Cute hack.  Actually I never work in Xenix286 -- God has been merciful.

PS: If anyone has a better allocator and wants to post it (or make it
available via FTP) I will be glad to pick it up and try it out.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh    Phone: 508-369-7398
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.
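The fix amounts to nothing more than testing the result; a sketch of its
shape, with invented names rather than the actual GETNODES code:

#include <stdio.h>
#include <stdlib.h>

struct node { struct node *next; };

static struct node *free_nodes = NULL;

static int get_nodes(int count)
{
    struct node *batch = malloc((size_t)count * sizeof *batch);
    int i;

    if (batch == NULL) {                  /* malloc failed: report it, don't crash */
        fprintf(stderr, "get_nodes: out of memory for %d nodes\n", count);
        return -1;
    }
    for (i = 0; i < count; i++) {         /* thread the new nodes onto the free list */
        batch[i].next = free_nodes;
        free_nodes = &batch[i];
    }
    return 0;
}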
njk@diku.dk (Niels J|rgen Kruse) (11/26/90)
rh@smds.UUCP (Richard Harter) writes:
rh >(A) All invalid size requests (zero, negative, too large) are trapped.
njk>--------------------------------------------------^^^^^^^^^
njk> Try ``getsp (2147483646,0)''. ;-)
rh > ??? What machine is this on? It worked fine on
rh > a couple of tests.
Sorry, I should have given more details. It was a Vax-11/785
running MORE/Bsd 4.3, compiling with cc -g. You must have used
a machine that zero-fills on signed right shift. Here is the
relevant part of a gdb session:
getsp (size=2147483646, reqno=0) (getsp.c line 150)
150 if (size<=0) { /* Bad size request */
(gdb)
155 size += 4; /* Allow for check words */
(gdb)
156 if (ninit) initsp(); /* Initialize if needful */
(gdb) p size
$1 = -2147483646
(gdb) s
(...)
(gdb)
165 rx=(size-1)>>NSH; /* Get requested index */
(gdb)
166 rsz=(1+rx)<<NSH; /* Get rounded request size */
(gdb) p rx
$2 = -268435456
(gdb) s
(...)
(gdb)
170 if (rx>=MXSZ) fx=MXSZ; /* Big request, set fx for big */
(gdb)
172 if (px[rx]) fx=rx; /* Exact fit exists */
(gdb)
Program received signal 11, Segmentation fault
Authors of dynamic storage allocators almost never check for
overflow when rounding up requests. This is the first thing I
look for when I come across a new one.
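A sketch of the kind of check being asked for: trap the overflow at
round-up time instead of letting the shifted size go negative.  The NSH
constant below just stands in for whatever rounding granularity an
allocator uses; this is not the getsp source.

#include <limits.h>

#define NSH 3                              /* rounding granularity: multiples of 1 << NSH */

/* Returns the rounded-up size, or -1 if the request cannot be rounded safely. */
long round_request(long size)
{
    if (size <= 0)
        return -1;                         /* zero/negative trapped as before */
    if (size > LONG_MAX - ((1L << NSH) - 1))
        return -1;                         /* the round-up itself would overflow */
    return (long)(((unsigned long)size + ((1UL << NSH) - 1))
                  & ~((1UL << NSH) - 1));
}

An equivalent fix inside an allocator that rounds with shifts is simply
to re-test the size for negative after the rounding, which is the fix
suggested in the follow-up article below.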
rh > Best fit gains
rh > in two ways -- the percentage of fragments is smaller
rh > because of exact fits and the fragment size distribution
rh > is skewed to favor very small or very large blocks.
rh > Both advantages are much less when the request sizes
rh > are "large".
This depends very much on the shape of the request size
distribution. Consider what happens when you have request
sizes that are very large and very rare. With LIFO, the
largest free blocks tend to get nibbled at by smaller
requests, so when the very large request finally comes
along, no block is large enough and you have to grow the
heap. Best Fit, on the other hand, tends to preserve the
largest free blocks.
--
Niels J|rgen Kruse DIKU Graduate njk@diku.dk
rh@smds.UUCP (Richard Harter) (11/26/90)
In article <1990Nov25.162505.3445@diku.dk>, njk@diku.dk (Niels J|rgen Kruse) writes:
njk rh@smds.UUCP (Richard Harter) writes:
njk rh >(A) All invalid size requests (zero, negative, too large) are trapped.
njk njk>--------------------------------------------------^^^^^^^^^
njk njk> Try ``getsp (2147483646,0)''.  ;-)
njk rh > ??? What machine is this on? It worked fine on
njk rh > a couple of tests.
njk Sorry, I should have given more details. It was a Vax-11/785
njk running MORE/Bsd 4.3, compiling with cc -g. You must have used
njk a machine that zero-fills on signed right shift...

Detailed debug printout deleted.  The essence of the matter is that the
request size is 2**31-2.  The code checks for <=0 first, then adds 4,
and then shifts right 3.  With the requested size there is overflow.
With 1-fill the size remains negative.  Nailed me to the wall, he did.

njk Authors of dynamic storage allocators almost never check for
njk overflow when rounding up requests. This is the first thing I
njk look for when I come across a new one.

I can't imagine why.  :-)  Given that the allocator argument is an
integer, the best fix is probably to check size for negative a second
time after the round up.  [There are sundry theological reasons why I
think that the argument should be signed which we needn't go into.]

njk rh > Both advantages are much less when the request sizes
njk rh > are "large".
[re lifo vs best-fit]
njk This depends very much on the shape of the request size
njk distribution. Consider what happens when you have request
njk sizes that are very large and very rare. With LIFO, the
njk largest free blocks tend to get nibbled at by smaller
njk requests, so when the very large request finally comes
njk along, no block is large enough and you have to grow the
njk heap. Best Fit, on the other hand, tends to preserve the
njk largest free blocks.

Can't argue with you there.  This was an engineering trade-off decision
on my part.  It didn't seem worth it to me at the time I wrote the
package to strive for that last bit of space efficiency.  For that
matter I wasn't overwhelmingly concerned with space efficiency per se --
witness the decision to go with separate nodes.  I wanted time
efficiency (I tend to make heavy use of allocation) with good error
checking.

I tend to be of the school that believes that fixed table sizes are a
sin against nature and an abomination unto the Lord.  I am offended when
I get messages like "arg list too long".  Call me a crufty bigoted old
curmudgeon, if you like, but that's my view.  But if you do take that
view you end up doing a lot of allocation and deallocation which,
unfortunately, is a veritable hotbed of mystery bugs.  Whence my
predilection for error checking.

What I need is an OS with the commands

    fix$typos
    fix$syntax
    fix$logic

Ah well, they tell me that VMS is the OS for people who are doing real
production software.  Maybe I can find them there.  :-)
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh    Phone: 508-369-7398
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.