mp@mit-eddie.UUCP (Mark Plotnick) (08/19/83)
The man page for malloc states that it behaves poorly in a large
virtual environment. For an application of mine, I need to allocate
lots of little pieces, so I decided to do a little benchmarking. I
looked at:

	(1) the malloc and free in the standard 4bsd C library
	(2) the "malloc.c.new" that came with 4.1c. Does anybody
	    know if this is bug-free?
	(3) the malloc/free package from Caltech that utah-gr!thomas
	    distributed last year.

The benchmark program was very simple; it just allocated n blocks of
m bytes each and then printed out the virtual memory statistics
(pirated from csh/sh.time.c). It will optionally allocate, and
immediately free, n*m bytes upon startup; this greatly reduces the
number of sbrk() system calls.

Here are the results for 20000 blocks of 20 bytes each. All tests
were done on a lightly loaded 750 running 4.1bsd with 2 meg of memory
and a single rm05. (Each version was run twice; the second run has
the giant free(malloc()) at the beginning.)

testing 20000 20:
normal 4bsd
	101.5u 3.3s 2:12 79% 4ktext+312kdata 476kmax 0reads 0writes 0pf 0swaps
	33.0u 1.5s 0:40 86% 4ktext+412kdata 476kmax 0reads 0writes 0pf 0swaps
4.1c
	3.8u 1.5s 0:08 66% 3ktext+239kdata 475kmax 0reads 0writes 0pf 0swaps
	3.8u 1.2s 0:06 82% 3ktext+240kdata 474kmax 0reads 0writes 0pf 0swaps
caltech
	2.3u 2.3s 0:04 114% 4ktext+320kdata 631kmax 0reads 0writes 0pf 0swaps
	2.2u 2.4s 0:04 115% 4ktext+319kdata 632kmax 0reads 0writes 0pf 0swaps

Note that the 4bsd versions actually allocate space in chunks of size
m+4 bytes (more or less). The caltech version allocates space in
chunks of m+4, rounded up to the nearest power of 2. The caltech
version also makes frequent calls to vlimit() and informs you if
you're getting near your data space limits.
As expected, the differences are less dramatic for smaller
allocations, and greater if you do a lot of malloc's, especially when
you use so much virtual memory that you start to swap:

testing 20000 100:
normal 4bsd
	727.5u 3130.5s 3:46:31 28% 2ktext+1416kdata 1713kmax 0reads 0writes 298156pf 144swaps
	77.6u 726.1s 56:34 23% 2ktext+1299kdata 1706kmax 0reads 0writes 72606pf 48swaps
4.1c
	4.1u 6.5s 0:24 43% 3ktext+978kdata 1595kmax 0reads 0writes 0pf 0swaps
	3.8u 4.6s 0:22 38% 3ktext+957kdata 1489kmax 0reads 0writes 0pf 0swaps
caltech
	3.3u 9.7s 0:26 49% 3ktext+1043kdata 1509kmax 0reads 0writes 1pf 0swaps
	3.2u 10.0s 0:26 50% 3ktext+1071kdata 1529kmax 0reads 0writes 1pf 0swaps

If somebody can come up with better benchmarks (or better malloc's),
give me a buzz. In particular, a benchmark that does a random mix of
malloc's and free's should be written. The above tests don't show it,
but the caltech version is probably better than the 4.1c version:
since it deals with only a small number of different sizes, it keeps
a separate free list for each size, which greatly speeds up
re-allocation.

	Mark Plotnick (genrad!mit-eddie!mp, eagle!mit-vax!mp)
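PS: to make the free-list-per-size idea concrete, here is a minimal
sketch of that scheme. This is my own illustration of the technique,
not the actual caltech code; the names are mine, it uses a one-word
header where the caltech version uses m+4, and it grabs fresh memory
from the system malloc rather than sbrk():

```c
#include <stdlib.h>

#define MINLOG   4          /* smallest class is 2^4 = 16 bytes */
#define NCLASSES 16         /* classes 2^4 .. 2^19 bytes */

static void *freelist[NCLASSES];   /* one free list per size class */

/* Smallest class whose block (payload + one-word header) holds n bytes. */
static int class_of(size_t n)
{
    int c = 0;
    while (c < NCLASSES && ((size_t)1 << (MINLOG + c)) < n + sizeof(size_t))
        c++;
    return c < NCLASSES ? c : -1;
}

void *qmalloc(size_t n)
{
    int c = class_of(n);
    size_t *p;

    if (c < 0)
        return NULL;                   /* request too big for any class */
    p = freelist[c];
    if (p != NULL)
        freelist[c] = *(void **)p;     /* pop a recycled block: no system call */
    else
        p = malloc((size_t)1 << (MINLOG + c));
    if (p == NULL)
        return NULL;
    p[0] = (size_t)c;                  /* remember the class in the header */
    return p + 1;
}

void qfree(void *q)
{
    size_t *p = (size_t *)q - 1;
    int c = (int)p[0];

    *(void **)p = freelist[c];         /* push onto that class's list */
    freelist[c] = p;
}
```

Freeing and then re-allocating the same size is just a pointer push
and pop, which is why a random malloc/free mix of a few distinct
sizes should favor this scheme.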