njk@freja.diku.dk (Niels J|rgen Kruse) (05/09/89)
kpv@ulysses.homer.nj.att.com (Phong Vo[drew]) writes: >The old malloc was based on a simple first-fit strategy with a roving pointer. >It is intolerably slow when a process needs to to many mallocs (say thousand Don't write off roving pointer/rotating freelist allocators just because of a poor implementation. Rotating freelist allocators can be made to go *very* fast. That requires an explicit freelist however. Even the Bsd 4.1 malloc can be speeded up significantly by simple means. If there is sufficient interest, i will post a patch to the Bsd 4.1 malloc (as distributed with the JOVE sources), that makes it 10 x faster (on my benchmark, your mileage may vary). >However, one can go a long way toward making a compromise between having good >time behavior and good space behavior. Locally, we've been having good >success with a modified version of the best-fit based algorithm presented >in that paper. Could you say how fast it is relative to the Bsd 4.3 malloc, roughly, < 2 x slower ?, < 3 x slower ?, ... -- Niels J|rgen Kruse Email njk@diku.dk Mail Tustrupvej 7, 2 tv, 2720 Vanlose, Denmark
kpv@ulysses.homer.nj.att.com (Phong Vo[drew]) (05/10/89)
In article <4648@freja.diku.dk>, njk@freja.diku.dk (Niels J|rgen Kruse) writes: > kpv@ulysses.homer.nj.att.com (Phong Vo[drew]) writes: > > >The old malloc was based on a simple first-fit strategy with a roving pointer. > >It is intolerably slow when a process needs to to many mallocs (say thousand > > Don't write off roving pointer/rotating freelist allocators > just because of a poor implementation. Rotating freelist > allocators can be made to go *very* fast. That requires an > explicit freelist however. > "*very* fast" is, of course, a relative measure. I know of at least one implementation of malloc that implements this strategy, instead of an expected search length of n/2 where n is the number of blocks, the expected search length is f/2 where f is the number of free blocks. Toward the mid-life of an active program, when memory is sufficiently fragmented and there are many free blocks, this strategy suffers too. For more detailed data, I suggest you look at the '85 USENIX paper that I mentioned. > Even the Bsd 4.1 malloc can be speeded up significantly by > simple means. If there is sufficient interest, i will post a > patch to the Bsd 4.1 malloc (as distributed with the JOVE > sources), that makes it 10 x faster (on my benchmark, your > mileage may vary). > > >However, one can go a long way toward making a compromise between having good > >time behavior and good space behavior. Locally, we've been having good > >success with a modified version of the best-fit based algorithm presented > >in that paper. > > Could you say how fast it is relative to the Bsd 4.3 malloc, > roughly, < 2 x slower ?, < 3 x slower ?, ... > -- Here's some data from running a small simulation on a VAX8650, 4.3BSD. This simulation runs 10000 steps. At each step, a malloc request of a size varying uniformly between 1 to 1000 bytes is performed (before someone jumps up and down about the large size variation, this only magnifies the relative differences between algorithms). Each malloced block is given a life time ranging uniformly between 100 to 1000 steps. When a block's time is up, it is freed. In addition, at the end of the simulation, any block that hasn't been freed is freed. The data includes 3 algorithms, buddy-system (bsi, libc.a), old first-fit (fsi, old libc.a) and the best-fit one (vsi). Salloc is the maximum total size in bytes of 'in-use' allocated data at any time slice. Sarena is the total sbrk space. Waste is (Sarena-Salloc)/Salloc, the percentage of how much space is sbrk-ed but not needed. Time is the total user+sys time spent in doing mallocs and frees alone (no data manipulation). bsi fsi vsi Salloc 304921 304921 304921 Sarena 785804 392192 358400 Waste 1.58 0.29 0.18 Time 0.48s 2.23s 0.93s The buddy system algorithm, bsi, wastes 158% of the space required. As I mentioned in the previous posting, when a program does this much malloc/free, it probably spends even more time manipulating data. The wasted space means that page faults will be more likely. The first-fit algorithm, fsi, is much better in space usage but suffers in time. This run is in fact good for it since there are relatively many frees meaning that its chance of finding a suitable free block at malloc time is good. There are real life examples where the time bottle neck is in malloc when the fsi algorithm is used. The best-fit algorithm, vsi, represents a compromise. It is best in space usage, wasting only 18% of space and is about twice as slow as bsi. The speed of vsi is due to the use of a splay tree to maintain free blocks. > Niels J|rgen Kruse > Email njk@diku.dk > Mail Tustrupvej 7, 2 tv, 2720 Vanlose, Denmark Phong Vo, ulysses!kpv