mp@mit-eddie.UUCP (Mark Plotnick) (08/19/83)
The man page for malloc states that it behaves poorly in a large
virtual environment. For an application of mine, I need to allocate
lots of little pieces, so I decided to do a little benchmarking. I
looked at:

	(1) the malloc and free in the standard 4bsd C library
	(2) the "malloc.c.new" that came with 4.1c. Does anybody
	    know if this is bug-free?
	(3) the malloc/free package from Caltech that utah-gr!thomas
	    distributed last year.

The benchmark program was very simple; it just allocated n blocks of
m bytes each and then printed out the virtual memory statistics
(pirated from csh/sh.time.c). It will optionally allocate, and
immediately free, n*m bytes upon startup; this greatly reduces the
number of sbrk() system calls.

Here are the results for 20000 blocks of 20 bytes each. All tests
were done on a lightly loaded 750 running 4.1bsd with 2 meg of memory
and a single rm05. (Each version was run twice; the second run has
the giant free(malloc()) at the beginning.)

testing 20000 20:
normal 4bsd
	101.5u 3.3s 2:12 79% 4ktext+312kdata 476kmax 0reads 0writes 0pf 0swaps
	33.0u 1.5s 0:40 86% 4ktext+412kdata 476kmax 0reads 0writes 0pf 0swaps
4.1c
	3.8u 1.5s 0:08 66% 3ktext+239kdata 475kmax 0reads 0writes 0pf 0swaps
	3.8u 1.2s 0:06 82% 3ktext+240kdata 474kmax 0reads 0writes 0pf 0swaps
caltech
	2.3u 2.3s 0:04 114% 4ktext+320kdata 631kmax 0reads 0writes 0pf 0swaps
	2.2u 2.4s 0:04 115% 4ktext+319kdata 632kmax 0reads 0writes 0pf 0swaps

Note that the 4bsd versions actually allocate space in chunks of size
m+4 bytes (more or less). The caltech version allocates space in
chunks of m+4, rounded up to the nearest power of 2. The caltech
version also makes frequent calls to vlimit() and informs you if
you're getting near your data space limits.
As expected, the differences are less dramatic for smaller
allocations, and greater if you do a lot of malloc's, especially when
you use so much virtual memory that you start to swap:

testing 20000 100:
normal 4bsd
	727.5u 3130.5s 3:46:31 28% 2ktext+1416kdata 1713kmax 0reads 0writes 298156pf 144swaps
	77.6u 726.1s 56:34 23% 2ktext+1299kdata 1706kmax 0reads 0writes 72606pf 48swaps
4.1c
	4.1u 6.5s 0:24 43% 3ktext+978kdata 1595kmax 0reads 0writes 0pf 0swaps
	3.8u 4.6s 0:22 38% 3ktext+957kdata 1489kmax 0reads 0writes 0pf 0swaps
caltech
	3.3u 9.7s 0:26 49% 3ktext+1043kdata 1509kmax 0reads 0writes 1pf 0swaps
	3.2u 10.0s 0:26 50% 3ktext+1071kdata 1529kmax 0reads 0writes 1pf 0swaps

If somebody can come up with better benchmarks (or better malloc's),
give me a buzz. In particular, a benchmark that does a random mix of
malloc's and free's should be written. The above tests don't show it,
but the caltech version is probably better than the 4.1c version:
since it deals with only a small number of different sizes, it keeps
a separate free list for each size, which greatly speeds up
re-allocation.

	Mark Plotnick (genrad!mit-eddie!mp, eagle!mit-vax!mp)
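PS: to make the free-list-per-size idea concrete, here is a minimal
sketch of that scheme. This is my own illustration of the technique,
not the actual caltech code; the names are mine, it uses a one-word
header where the caltech version uses m+4, and it grabs fresh memory
from the system malloc rather than sbrk():

```c
#include <stdlib.h>

#define MINLOG   4          /* smallest class is 2^4 = 16 bytes */
#define NCLASSES 16         /* classes 2^4 .. 2^19 bytes */

static void *freelist[NCLASSES];   /* one free list per size class */

/* Smallest class whose block (payload + one-word header) holds n bytes. */
static int class_of(size_t n)
{
    int c = 0;
    while (c < NCLASSES && ((size_t)1 << (MINLOG + c)) < n + sizeof(size_t))
        c++;
    return c < NCLASSES ? c : -1;
}

void *qmalloc(size_t n)
{
    int c = class_of(n);
    size_t *p;

    if (c < 0)
        return NULL;                   /* request too big for any class */
    p = freelist[c];
    if (p != NULL)
        freelist[c] = *(void **)p;     /* pop a recycled block: no system call */
    else
        p = malloc((size_t)1 << (MINLOG + c));
    if (p == NULL)
        return NULL;
    p[0] = (size_t)c;                  /* remember the class in the header */
    return p + 1;
}

void qfree(void *q)
{
    size_t *p = (size_t *)q - 1;
    int c = (int)p[0];

    *(void **)p = freelist[c];         /* push onto that class's list */
    freelist[c] = p;
}
```

Freeing and then re-allocating the same size is just a pointer push
and pop, which is why a random malloc/free mix of a few distinct
sizes should favor this scheme.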