[comp.lang.c] Re^2: Memory Allocation

njk@freja.diku.dk (Niels J|rgen Kruse) (05/09/89)

kpv@ulysses.homer.nj.att.com (Phong Vo[drew]) writes:

>The old malloc was based on a simple first-fit strategy with a roving pointer.
>It is intolerably slow when a process needs to do many mallocs (say thousand

Don't write off roving pointer/rotating freelist allocators
just because of a poor implementation. Rotating freelist
allocators can be made to go *very* fast. That requires an
explicit freelist however.
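To illustrate the idea, here is a minimal sketch of a next-fit search over an explicit circular free list with a roving pointer. The names and block layout are my own invention, not taken from any particular malloc; a real allocator would also handle block splitting and coalescing.

```c
/* Sketch of a rotating-freelist (next-fit) search.  The rover is
 * left at the successor of the allocated block, so successive
 * searches rotate around the list instead of rescanning the heap. */
#include <stddef.h>

struct fblk {             /* header of a free block (illustrative) */
    size_t size;          /* usable bytes in this block */
    struct fblk *next;    /* next block on the circular free list */
};

static struct fblk *rover;   /* roving pointer: search resumes here */

/* Return a block with at least 'want' bytes, unlinked from the
 * free list, or NULL if no free block is large enough. */
static struct fblk *nextfit(size_t want)
{
    struct fblk *prev, *p;

    if (rover == NULL)
        return NULL;
    prev = rover;
    p = prev->next;
    for (;;) {
        if (p->size >= want) {      /* first fit from the rover on */
            if (p == p->next)       /* only block on the list */
                rover = NULL;
            else {
                prev->next = p->next;
                rover = p->next;
            }
            return p;
        }
        if (p == rover)             /* went all the way around */
            return NULL;
        prev = p;
        p = p->next;
    }
}
```

Because only free blocks are on the list, the expected search cost depends on the number of free blocks, not on the total number of blocks in the heap.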

Even the Bsd 4.1 malloc can be sped up significantly by
simple means. If there is sufficient interest, I will post a
patch to the Bsd 4.1 malloc (as distributed with the JOVE
sources) that makes it 10 x faster (on my benchmark, your
mileage may vary).

>However, one can go a long way toward making a compromise between having good
>time behavior and good space behavior. Locally, we've been having good
>success with a modified version of the best-fit based algorithm presented
>in that paper.

Could you say how fast it is relative to the Bsd 4.3 malloc,
roughly, < 2 x slower ?, < 3 x slower ?, ...
-- 
         Niels J|rgen Kruse
Email    njk@diku.dk
Mail     Tustrupvej 7, 2 tv, 2720 Vanlose, Denmark

kpv@ulysses.homer.nj.att.com (Phong Vo[drew]) (05/10/89)

In article <4648@freja.diku.dk>, njk@freja.diku.dk (Niels J|rgen Kruse) writes:
> kpv@ulysses.homer.nj.att.com (Phong Vo[drew]) writes:
> 
> >The old malloc was based on a simple first-fit strategy with a roving pointer.
> >It is intolerably slow when a process needs to do many mallocs (say thousand
> 
> Don't write off roving pointer/rotating freelist allocators
> just because of a poor implementation. Rotating freelist
> allocators can be made to go *very* fast. That requires an
> explicit freelist however.
> 
"*very* fast" is, of course, a relative measure.
I know of at least one implementation of malloc that uses this
strategy: instead of an expected search length of n/2, where n is the
number of blocks, the expected search length is f/2, where f is the
number of free blocks. Toward the mid-life of an active program, when
memory is sufficiently fragmented and there are many free blocks, this
strategy suffers too. For more detailed data, I suggest you look at
the '85 USENIX paper that I mentioned.

> Even the Bsd 4.1 malloc can be speeded up significantly by
> simple means. If there is sufficient interest, i will post a
> patch to the Bsd 4.1 malloc (as distributed with the JOVE
> sources), that makes it 10 x faster (on my benchmark, your
> mileage may vary).
> 
> >However, one can go a long way toward making a compromise between having good
> >time behavior and good space behavior. Locally, we've been having good
> >success with a modified version of the best-fit based algorithm presented
> >in that paper.
> 
> Could you say how fast it is relative to the Bsd 4.3 malloc,
> roughly, < 2 x slower ?, < 3 x slower ?, ...
> -- 
Here's some data from running a small simulation on a VAX8650, 4.3BSD.
This simulation runs 10000 steps. At each step, a malloc request
of a size varying uniformly between 1 and 1000 bytes is performed
(before someone jumps up and down about the large size variation,
this only magnifies the relative differences between the algorithms).
Each malloced block is given a life time ranging uniformly
between 100 and 1000 steps. When a block's time is up, it is freed.
In addition, at the end of the simulation, any block that hasn't been
freed is freed. The data includes 3 algorithms, buddy-system (bsi, libc.a),
old first-fit (fsi, old libc.a) and the best-fit one (vsi). Salloc is the maximum
total size in bytes of 'in-use' allocated data at any time slice. Sarena is
the total sbrk space. Waste is (Sarena-Salloc)/Salloc, the fraction
of sbrk-ed space that is not actually needed (so 1.58 means 158%).
Time is the total user+sys time spent doing mallocs and frees
alone (no data manipulation).
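For concreteness, the simulation above can be sketched roughly as below. Only the parameters (10000 steps, sizes 1..1000, lifetimes 100..1000, free-at-end) come from the posting; the bucketing of blocks by expiry step and the function name are my own bookkeeping, and the timing harness is omitted.

```c
/* Rough sketch of the described benchmark: at each of 10000 steps,
 * malloc a block of uniformly random size 1..1000 bytes and schedule
 * it to be freed 100..1000 steps later; free everything at the end. */
#include <stdlib.h>

#define STEPS   10000
#define MAXLIFE 1000

struct node { struct node *next; void *blk; };

/* Run the simulation; returns the number of blocks freed, which must
 * equal the number allocated since everything is freed at the end. */
static long run_sim(void)
{
    /* expires[s] lists the blocks whose time is up at step s */
    static struct node *expires[STEPS + MAXLIFE + 1];
    long frees = 0;
    int step;

    srand(12345);                              /* fixed seed: repeatable run */
    for (step = 0; step < STEPS + MAXLIFE; step++) {
        if (step < STEPS) {                    /* allocation phase */
            size_t size = 1 + rand() % 1000;   /* uniform 1..1000 bytes */
            int life = 100 + rand() % 901;     /* uniform 100..1000 steps */
            struct node *n = malloc(sizeof *n);
            n->blk = malloc(size);
            n->next = expires[step + life];
            expires[step + life] = n;
        }
        while (expires[step] != NULL) {        /* free expired blocks */
            struct node *d = expires[step];
            expires[step] = d->next;
            free(d->blk);
            free(d);
            frees++;
        }
    }
    return frees;
}
```

Timing this loop against a stub that does no malloc/free would isolate the allocator cost, as in the Time row of the table.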

		bsi	fsi	vsi
Salloc		304921	304921	304921
Sarena		785804	392192	358400
Waste		1.58	0.29	0.18
Time		0.48s	2.23s	0.93s

The buddy system algorithm, bsi, wastes 158% of the space required.
As I mentioned in the previous posting, when a program does this much
malloc/free, it probably spends even more time manipulating data.
The wasted space means that page faults will be more likely.
The first-fit algorithm, fsi, is much better in space usage
but suffers in time. This run is in fact good for it since there are
relatively many frees, meaning that its chance of finding a suitable free
block at malloc time is good. There are real-life examples where the
time bottleneck is in malloc when the fsi algorithm is used. The
best-fit algorithm, vsi, represents a compromise. It is best in space
usage, wasting only 18% of space and is about twice as slow as bsi.
The speed of vsi is due to the use of a splay tree to maintain free blocks.
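The core of that lookup, finding the smallest free block at least as large as the request, can be sketched as follows. The posting says vsi uses a splay tree; the plain unbalanced BST below (my simplification) only illustrates the best-fit search itself, not the self-adjusting rotations that give the splay tree its amortized bounds.

```c
/* Sketch of best-fit lookup over free blocks kept in a tree ordered
 * by size: find the smallest node with size >= the request. */
#include <stddef.h>

struct tnode {
    size_t size;                 /* size of this free block */
    struct tnode *left, *right;
};

/* Insert a free block into the size-ordered tree (duplicates right). */
static struct tnode *insert(struct tnode *t, struct tnode *n)
{
    if (t == NULL) {
        n->left = n->right = NULL;
        return n;
    }
    if (n->size < t->size)
        t->left = insert(t->left, n);
    else
        t->right = insert(t->right, n);
    return t;
}

/* Best fit: smallest node with size >= want, or NULL if none. */
static struct tnode *bestfit(struct tnode *t, size_t want)
{
    struct tnode *best = NULL;
    while (t != NULL) {
        if (t->size >= want) {   /* candidate; look left for smaller */
            best = t;
            t = t->left;
        } else                   /* too small; must look right */
            t = t->right;
    }
    return best;
}
```

With a splay tree, the searched-for node is rotated to the root, so repeated requests for similar sizes stay cheap; the search logic itself is the same descent shown here.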

>          Niels J|rgen Kruse
> Email    njk@diku.dk
> Mail     Tustrupvej 7, 2 tv, 2720 Vanlose, Denmark

Phong Vo, ulysses!kpv