[comp.lang.c++] reference counting vs. scanning GC

tmb@ai.mit.edu (Thomas M. Breuel) (04/24/91)

   In article <1991Apr23.000632.9175@neon.Stanford.EDU> hoelzle@neon.Stanford.EDU
   (Urs Hoelzle) writes in response to an earlier message by tmb@ai.mit.edu
   (Thomas M. Breuel):
   >>[Any kind of garbage collection other than reference counting] will have
   >>much higher overhead [in the case of non-recursive data structures].

This is quoted incorrectly, and it does not represent what I was saying.

   >What is the reason why this must be so?  Do you have any numbers to
   >show that reference counting is superior in this case?  (Read: I don't
   >believe it.)

   Neither do I.  In fact, I was surprised to find that using the conservative
   garbage collector due to Hans Boehm and Mark Weiser (and described in
   Software Practice and Experience in 1988?) instead of doing reference
   counting myself was _considerably_ faster in a C++ implementation which
   used many dynamically allocated arrays.  If my memory serves me correctly,
   the running times dropped to about 60% of the original ones.  (And, yes,
   the application ran long enough to incur several garbage collections!)

Reference counting incurs costs at each copy and assignment operation
involving references to data. For most applications that makes reference
counting inferior to scanning garbage collectors.
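
To make that cost concrete, here is a minimal sketch of a counted
handle (the class names are invented for illustration, not taken from
any particular library). Note that every copy, assignment, and
destruction has to touch the count -- work that a scanning collector
never does at these points:

struct Counted {                 // shared block: count plus payload
    long    refs;
    double *data;
    Counted(long n) : refs(1), data(new double[n]) {}
    ~Counted() { delete [] data; }
};

class Handle {                   // what user code copies and assigns
    Counted *p;
public:
    Handle(long n) : p(new Counted(n)) {}
    Handle(const Handle &h) : p(h.p) { ++p->refs; }   // copy: one increment
    Handle &operator=(const Handle &h) {              // assignment: an increment,
        ++h.p->refs;                                  // a decrement, maybe a free
        if (--p->refs == 0) delete p;
        p = h.p;
        return *this;
    }
    ~Handle() { if (--p->refs == 0) delete p; }
    double &operator[](long i) const { return p->data[i]; }
};

With a scanning collector, the copy and the assignment above are each
a single pointer store; the price is paid later, when the collector
has to discover what is still reachable.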

However, while the incremental cost of collecting another object with
a scanning garbage collector is considerably smaller than the
bookkeeping cost required for reference counting that object, the
startup cost for a scanning garbage collector is much higher than the
zero startup cost of a reference counting system. This is because you
have to scan through a considerable number of objects to determine
what is alive and what is dead before you can start any storage
reclamation (generational schemes reduce, but do not eliminate, this
overhead).

Now, if you have some resource of which you can allocate only a small
number of items (say 10-100) before you need to garbage collect, the
overhead associated with scanning garbage collectors can be much more
costly than the overhead associated with updating reference counts.

Typical examples of such situations are numerical applications written
in a functional style and data-parallel languages like *Lisp (other
examples are file descriptors, network ports, communication ports, and
other kinds of very limited hardware resources). In a serial numerical
program written in a functional style, you will easily generate
garbage at rates of 10M/sec, meaning that you have to garbage collect
very frequently. The rates are much higher for data-parallel
languages.

Such languages have the additional useful feature that those objects
for which reference counting makes sense can also never be circular,
which means that reference counting not only is the most efficient
garbage collection technique, but that it also is completely
sufficient. Reference counting will most likely also give you better
locality of reference as well, since recently freed objects are likely
to have been accessed recently as well.

For C++, the implication of this is that, while it may make sense
to use scanning garbage collectors for most of the garbage, reference
counting is a sensible technique for large floating point and integer
vectors, which is all I was claiming to begin with.

gyro@kestrel.edu (Scott Layson Burson) (04/25/91)

In article <TMB.91Apr24033353@volterra.ai.mit.edu> tmb@ai.mit.edu (Thomas M. Breuel) writes:
>Such languages have the additional useful feature that those objects
>for which reference counting makes sense can also never be circular,
>which means that reference counting not only is the most efficient
>garbage collection technique, but that it also is completely
>sufficient. Reference counting will most likely also give you better
>locality of reference as well, since recently freed objects are likely
>to have been accessed recently as well.

I must challenge the latter point.  I guess you're referring to the
fact that a garbage collection requires scanning many objects which
may not have been touched recently, and that's true: the garbage
collection process *itself* does not have very good locality.
However, I think you are overlooking the fact that a well-designed
generational garbage collector can make very important improvements in
the locality of the objects as it relocates them, in such a way as to
substantially improve the paging behavior of the application itself.

This is very definitely a concern I have about C++: that when people
start manipulating 60 Mb worth of objects in one program (and I have
already seen 30 Mb process sizes), that the locality problem is going
to bring their machines to their knees, even though they would have
had no trouble running the same algorithms on the same data in Lisp.
I don't know of any allocation algorithms for C++ that pay any
attention to locality, nor do I really see how there *can* be any,
because the information about what other objects an object X points to
or is pointed to by is not available at the time X is created.
Furthermore, there is no way of doing compaction of live objects.

As I say, I am very concerned about this.  I think the C++ community
is not prepared for the magnitude of the problem.

Note, by the way, that while class-specific `operator new' and
`operator delete' might sometimes be of help, in the general case --
in all the cases I have seen -- reference locality is orthogonal to
class boundaries.  The "placement" argument to `new' (ARM 5.3.3) is
more likely to be of significant value.
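
(To make this concrete -- a sketch only, and the Arena class below is
something I am inventing here rather than any existing library -- the
placement form of `new' lets a program put objects that will be
traversed together onto the same pages:)

#include <new>          // declares placement operator new(size_t, void*)
#include <stddef.h>

class Arena {           // one contiguous chunk, carved out sequentially
    char *base, *next, *limit;
public:
    Arena(size_t bytes)
      : base(new char[bytes]), next(base), limit(base + bytes) {}
    ~Arena() { delete [] base; }
    void *alloc(size_t n) {
        n = (n + 7) & ~(size_t)7;            // keep 8-byte alignment
        if (next + n > limit) return 0;      // caller must check for exhaustion
        void *p = next; next += n; return p;
    }
};

struct Node { Node *left, *right; double value; };

// Nodes built from one arena end up contiguous, instead of scattered
// wherever the general-purpose allocator happens to have free holes.
Node *make_node(Arena &a, double v) {
    void *raw = a.alloc(sizeof(Node));
    if (raw == 0) return 0;
    Node *n = new (raw) Node;                // placement new: no heap call here
    n->left = n->right = 0;
    n->value = v;
    return n;
}

Since Node has no destructor, discarding the whole arena at once is
safe; classes with nontrivial destructors would need more care.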

>For C++, the implication of this is that, while it may make sense
>to use scanning garbage collectors for most of the garbage, reference
>counting is a sensible technique for large floating point and integer
>vectors, which is all I was claiming to begin with.

However, with these caveats about locality, I agree with your
conclusion.

-- Scott
Gyro@Reasoning.COM

tmb@ai.mit.edu (Thomas M. Breuel) (04/25/91)

In article <1991Apr24.192424.23745@kestrel.edu> gyro@kestrel.edu (Scott Layson Burson) writes:
   In article <TMB.91Apr24033353@volterra.ai.mit.edu> tmb@ai.mit.edu (Thomas M. Breuel) writes:
   >Such languages have the additional useful feature that those objects
   >for which reference counting makes sense can also never be circular,
   >which means that reference counting not only is the most efficient
   >garbage collection technique, but that it also is completely
   >sufficient. Reference counting will most likely also give you better
   >locality of reference as well, since recently freed objects are likely
   >to have been accessed recently as well.

   I must challenge the latter point.  I guess you're referring to the
   fact that a garbage collection requires scanning many objects which
   may not have been touched recently, and that's true: the garbage
   collection process *itself* does not have very good locality.
   However, I think you are overlooking the fact that a well-designed
   generational garbage collector can make very important improvements in
   the locality of the objects as it relocates them, in such a way as to
   substantially improve the paging behavior of the application itself.

This is true of compacting garbage collectors but is not relevant to
the case I was discussing.

I was talking about large heaps (many Mbytes) and large objects (about
1 Mbyte each or more). You cannot afford to compact (copy) such
objects just because you might improve locality of reference (apart
from the fact that compacting GC in C++ is tricky anyway).

Compacting, generational, scanning garbage collectors are better than
non-compacting reference counting garbage collectors for many
allocation problems. However, for large monotype arrays, reference
counting has lower overhead, and it is much simpler to implement than
a solid generational scheme. Keep this in mind before you rush out to
switch all your C++ memory management to a scanning garbage collector.

						Thomas.

cwhitley@mole.gnu.ai.mit.edu (Cecil H. Whitley) (04/25/91)

To: popkin@cs.odu.edu
Subject: Re: Turbo Questions (Turbo Vision, Objects, Etc)
Newsgroups: comp.lang.pascal,comp.lang.c,comp.lang.c++,comp.lang.misc
In-Reply-To: <1991Apr24.165725.29020@cs.odu.edu>
Organization: The Internet

Hi,
TC++ (now named Borland C++) doesn't have Turbo Vision yet.  It does
however have an external program that does some of what TV does.
It also doesn't have unit files; it has something better!  It uses .obj
files, which are pre-compiled sources from which you can use either
some or all functions/subroutines.  It also has VROOM
(virtual run-time object-oriented memory management), Borland's
overlay manager.  You will find it to be more flexible than
Turbo Pascal, especially for large programs with a lot of data.
Cecil
 

marti@mint.inf.ethz.ch (Robert Marti) (04/25/91)

In article <TMB.91Apr24033353@volterra.ai.mit.edu> tmb@ai.mit.edu
(Thomas M. Breuel) writes:
>   In article <1991Apr23.000632.9175@neon.Stanford.EDU>
>   hoelzle@neon.Stanford.EDU
>   (Urs Hoelzle) writes in response to an earlier message by tmb@ai.mit.edu
>   (Thomas M. Breuel):
>   >>[Any kind of garbage collection other than reference counting] will have
>   >>much higher overhead [in the case of non-recursive data structures].
>This is quoted incorrectly, and it does not represent what I was saying.

First off, I (and not Urs Hoelzle) was the one who was paraphrasing
what I perceived Thomas Breuel as saying.  Sorry if I misunderstood
and therefore misquoted your views, Thomas.


[Lengthy explanations deleted.]


>For C++, the implication of this is that, while it may make sense
>to use scanning garbage collectors for most of the garbage, reference
>counting is a sensible technique for large floating point and integer
>vectors, which is all I was claiming to begin with.

What exactly does "sensible" mean?  All _I_ was saying is that in my C++
application, I had "large" bit vectors and large integer vectors and
found that running times using a mark-and-sweep collector were about
60% of those I got by doing reference counting and deallocation myself.
I am a little unsure whether this means that ref counting is sensible in
this very situation ...

Robert Marti                      |  Phone:    +41 1 254 72 60
Institut fur Informationssysteme  |  FAX:      +41 1 262 39 73
ETH-Zentrum                       |  E-Mail:   marti@inf.ethz.ch
CH-8092 Zurich, Switzerland       |

tmb@ai.mit.edu (Thomas M. Breuel) (04/26/91)

   >For C++, the implication of this is that, while it may make sense
   >to use scanning garbage collectors for most of the garbage, reference
   >counting is a sensible technique for large floating point and integer
   >vectors, which is all I was claiming to begin with.

   What exactly does "sensible" mean?  All _I_ was saying is that in my C++
   application, I had "large" bit vectors and large integer vectors and
   found that running times using a mark-and-sweep collector were about
   60% of those I got by doing reference counting and deallocation myself.
   I am a little unsure whether this means that ref counting is sensible in
   this very situation ...

And I have had exactly the opposite experience. Perhaps our notions of
"large" and "frequent deallocation" are different. As I said before, I
am concerned with arrays that are a significant fraction of heap size
(about 1M out of 16M) and that are written and read only a few times during
their lifetime.

Maybe you could be more specific about your application.

jimad@microsoft.UUCP (Jim ADCOCK) (05/04/91)

In article <1991Apr24.192424.23745@kestrel.edu| gyro@kestrel.edu (Scott Layson Burson) writes:
|This is very definitely a concern I have about C++: that when people
|start manipulating 60 Mb worth of objects in one program (and I have
|already seen 30 Mb process sizes), that the locality problem is going
|to bring their machines to their knees, even though they would have
|had no trouble running the same algorithms on the same data in Lisp.
|I don't know of any allocation algorithms for C++ that pay any
|attention to locality, nor do I really see how there *can* be any,
|because the information about what other objects an object X points to
|or is pointed to by is not available at the time X is created.
|Furthermore, there is no way of doing compaction of live objects.
|
|As I say, I am very concerned about this.  I think the C++ community
|is not prepared for the magnitude of the problem.

Agreed.  I see two major impediments to solving these problems:

1) C++ presently has inadequate support for moveable objects 
[although a number of vendors have C/C++ compiler extensions for 
moveable objects]

2) C++ presently has inadequate support for GC [although the problem is
very similar to exceptions, so why not kill two birds with one stone?]

|Note, by the way, that while class-specific `operator new' and
|`operator delete' might sometimes be of help, in the general case --
|in all the cases I have seen -- reference locality is orthogonal to
|class boundaries.  The "placement" argument to `new' (ARM 5.3.3) is
|more likely to be of significant value.

A more fundamental problem is that any fixed placement strategy assumes
that the patterns of usage of objects are going to remain fixed, but this
is not true of any non-trivial app.  Thus, the only way to maintain good
locality of reference over the long run is to have moveable objects.
As an extreme, see Jul's thesis on Emerald, where objects are moved 
"automatically" over the net to the process where they are needed.

I agree with your concerns, but I don't think it's too hard to fix
these problems with some small changes to C++.  It's just that some
people don't yet seem to be interested in solving them.

[In addition to all this, note that overloaded operator dot would make
 one small contribution towards allowing moveable objects, since such
 overloading allows C++ programs to abstract the notion of an object
 distinct from the byte address used to store that object.]
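
[To sketch that idea -- all names below are invented, and since
 operator dot cannot be overloaded today, operator-> stands in for it:
 a handle names a slot in an indirection table rather than a raw byte
 address, so a collector could move the object and patch one table
 entry without invalidating any handle.]

struct Obj { int field; };

class ObjTable {                  // one forwarding slot per object
    enum { SLOTS = 1024 };
    Obj *slot[SLOTS];
public:
    void set(int i, Obj *addr) { slot[i] = addr; }
    Obj *get(int i) const      { return slot[i]; }
    // A (hypothetical) compacting collector calls set() after moving an
    // object; code holding handles never sees the raw address change.
};

class Handle {                    // held by user code instead of an Obj*
    ObjTable *table;
    int       index;
public:
    Handle(ObjTable *t, int i) : table(t), index(i) {}
    Obj *operator->() const { return table->get(index); }
};

// usage: h->field = 7;  -- still valid after the object has been moved,
// because the address is looked up through the table on every access.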

boehm@parc.xerox.com (Hans Boehm) (05/08/91)

jimad@microsoft.UUCP (Jim ADCOCK) writes:

>In article <1991Apr24.192424.23745@kestrel.edu| gyro@kestrel.edu (Scott Layson Burson) writes:
>|This is very definitely a concern I have about C++: that when people
>|start manipulating 60 Mb worth of objects in one program (and I have
>|already seen 30 Mb process sizes), that the locality problem is going
>|to bring their machines to their knees, even though they would have
>|had no trouble running the same algorithms on the same data in Lisp.
>|I don't know of any allocation algorithms for C++ that pay any
>|attention to locality, nor do I really see how there *can* be any,
>|because the information about what other objects an object X points to
>|or is pointed to by is not available at the time X is created.
>|Furthermore, there is no way of doing compaction of live objects.
>|
>|As I say, I am very concerned about this.  I think the C++ community
>|is not prepared for the magnitude of the problem.

>Agreed.  ...

>A more fundamental problem is that any fixed placement strategy assumes
>that the patterns of usage of objects are going to remain fixed, but this
>is not true of any non-trivial app.  Thus, the only way to maintain good
>locality of reference over the long run is to have moveable objects.
>As an extreme, see Jul's thesis on Emerald, where objects are moved 
>"automatically" over the net to the process where they are needed.

I agree that in a distributed memory system this makes sense.  In the
uniprocessor case, the issues are less clear.  There is some reason to
believe that objects allocated close together in time are likely to be
accessed together.  (I believe Paul Wilson's paper in the upcoming SIGPLAN
conference makes this point, but I'm sure there are better references.)
Fixed placement allocators typically allocate successively allocated
objects close to each other.  (If they don't, they should.)  Thus there
is some reason to believe that fixed placement allocators don't do that
badly with respect to locality.

Moving collectors uniformly win in that they avoid fragmentation.  But,
at least in our environment (Cedar with 30 MB or so heap and some long
lived sessions), this isn't much of an issue even with fixed allocation.

A compacting trace-and-sweep collector obviously wins over a noncompacting
one in terms of locality.  (Actually, in the presence of nonuniform
sized objects, even that's not completely clear, but it seems likely.)
But it tends to be expensive in other respects.  Copying collectors
have their own problems.  You have to be fairly clever to get any
reasonable locality.  (My intuition is that breadth-first copying is
significantly worse than fixed allocation.  I don't have numbers to
either back that up or to refute it.)  Copying on reference wins
substantially for at least one application (see Johnson's paper in
ASPLOS '91), but that's hard to do well on a conventional machine.  A
full copying collection is likely to be a disaster, given the current
ratio of cpu to disk speeds.  Moving collectors don't get along with
some programming techniques (e.g. hashing on addresses).

A comparison of some of these approaches can be found in John
DeTreville, "Experience with Concurrent Garbage Collectors for Modula-2+",
DEC SRC report no. 64.  Ben Zorn's paper in the 1990 Lisp conference also
compares copying and trace&sweep collection in one particular context.
If anybody knows of any other real data (that's not completely obsolete),
I would love to know about it.

Hans
(boehm@xerox.com)

gyro@kestrel.edu (Scott Layson Burson) (05/11/91)

In article <boehm.673720634@siria> boehm@parc.xerox.com (Hans Boehm) writes:
>Fixed placement allocators typically allocate successively allocated
>objects close to each other.  (If they don't, they should.)

They don't, at least not when used simplistically.  When objects are
freed, allocators attempt to reuse the space they took up.  The
strategies for doing this vary, of course, but the bottom line is that
after a lot of allocations and freeings, successively allocated
objects get less and less likely to be close to each other.

One factor that exacerbates the problem greatly is allocating
short-lived and long-lived objects in the same space.  So it would
seem that the first thing to do would be to try to segregate objects
by expected lifetime.

To whatever extent that can be done in a particular application, I
think it's a good start on the problem, but it isn't always possible
to do it very well.

Bartlett's generational collector may be a lifesaver for certain
applications people I know are working on.  I haven't yet heard of
anyone who's done a SPARC/G++ port, however.

-- Scott
Gyro@Reasoning.COM

boehm@parc.xerox.com (Hans Boehm) (05/14/91)

gyro@kestrel.edu (Scott Layson Burson) writes:

>In article <boehm.673720634@siria> boehm@parc.xerox.com (Hans Boehm) writes:
>>Fixed placement allocators typically allocate successively allocated
>>objects close to each other.  (If they don't, they should.)

>They don't, at least not when used simplistically.  When objects are
>freed, allocators attempt to reuse the space they took up.  The
>strategies for doing this vary, of course, but the bottom line is that
>after a lot of allocations and freeings, successively allocated
>objects get less and less likely to be close to each other.

>One factor that exacerbates the problem greatly is allocating
>short-lived and long-lived objects in the same space.  So it would
>seem that the first thing to do would be to try to segregate objects
>by expected lifetime.

Granted, things will get a little worse as more objects are reallocated, but
not terribly so.  Let's be a bit more precise.  This issue is of interest 
primarily for small (< 1 page) objects.  As an example, let's assume we're
allocating 8 byte cons cells on 4K pages.  Let's further assume that pages
contain only objects of one size.  (This is likely to be a good choice for
a nonmoving allocator.)  Let's further assume the heap is 3/4 full.

When I garbage collect, I will find that an average page contains 512/4=128
free objects.  These will end up next to each other on the size 8 free list.
Thus the probability of two subsequent allocations residing on the same
page is about 127/128, assuming all of memory is equally densely populated.
This is worse than what I would get for newly allocated memory from a
copying collector (which would get 511/512).  But it is probably better than
what I would get by considering those same list nodes after they've been
breadth-first copied a few times.
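
Just to spell the arithmetic out (a toy calculation, not a measurement):

#include <stdio.h>

int main() {
    const double page_bytes = 4096.0, cell_bytes = 8.0;
    const double cells_per_page = page_bytes / cell_bytes;   /* 512 */
    const double free_per_page  = cells_per_page / 4.0;      /* 128 if 3/4 full */

    /* The 128 free cells on a page sit next to each other on the size-8
       free list, so 127 of 128 consecutive allocations land on the same
       page as their predecessor. */
    printf("nonmoving allocator: %.4f\n", (free_per_page - 1.0) / free_per_page);

    /* A copying collector hands out a virgin 512-cell page at a time. */
    printf("copying collector:   %.4f\n", (cells_per_page - 1.0) / cells_per_page);
    return 0;
}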

Reality is of course more complicated.  Different object sizes interact,
though this is probably not a major effect, since the number of different
object sizes in a typical data structure is probably small.  In our
experience, pages tend to be very nonuniformly populated, even after
a system has been up for quite a while.  This helps
the nonmoving allocator, since it can choose not to allocate on nearly
full pages.

Breadth-first copying is of course not optimal.  But other strategies seem
to make it harder to build concurrent collectors on conventional hardware.
The whole point is still that the tradeoffs are nontrivial.

Hans
(boehm@xerox.com)

jimad@microsoft.UUCP (Jim ADCOCK) (05/17/91)

In article <1991May11.054648.2483@kestrel.edu> gyro@kestrel.edu (Scott Layson Burson) writes:
|In article <boehm.673720634@siria> boehm@parc.xerox.com (Hans Boehm) writes:
|>Fixed placement allocators typically allocate successively allocated
|>objects close to each other.  (If they don't, they should.)
|
|They don't, at least not when used simplistically.  When objects are
|freed, allocators attempt to reuse the space they took up.  The
|strategies for doing this vary, of course, but the bottom line is that
|after a lot of allocations and freeings, successively allocated
|objects get less and less likely to be close to each other.

|One factor that exacerbates the problem greatly is allocating
|short-lived and long-lived objects in the same space.  So it would
|seem that the first thing to do would be to try to segregate objects
|by expected lifetime.

If you think about it, the problem is not where to create objects that
are being born, but where one should have created the object that is
currently dying.  Without moving objects, the only way to conveniently
reuse the hole left by a dying object is for that hole to be at "exactly"
the right location for a new object to use.  I don't see any good way to
predict in advance when an object is going to die, thereby allowing it
to be placed well at the time of creation.  Some people claim the right
thing to do is to let the end programmer specify where the object will be
created, either by specifying how long the programmer guesses the object
will live:

FOO* pYoungFoo = new (SHORT_LIVED) foo(23);
FOO* pOldFoo = new (LONG_LIVED) foo(123);

or by specifying a neighborhood of objects that are all going to die
at about the same time:

FOO* pYoungFool = new (BELLEVUE) fool(34);
FOO* pOldFool = new (SEATTLE) fool(92);
FOO* pMiddleAgedFool = new (MERCER_ISLAND) fool(45);

I think this is a cop-out, in that the end user [if writing "OOP" classes]
is seldom better equipped to guess the lifetime of objects than the
person writing the memory management routines.  In particular, if one
is writing general-purpose reusable classes, one typically has no idea
at all how objects of those classes are going to be used.  So, I claim
writing reusable classes with explicit placement parameters is troublesome,
at best.  If one cannot, then, successfully predict the lifetime of objects,
so that the hole they leave behind is well placed, one needs to be able
to move that hole to a more useful location.  The only way I can see to
do that is either to move objects around, implicitly moving left-behind
holes too, or to build virtual memory paging hardware more in tune with
the needs of OOP [how often do we make 4K-sized objects?].

So, I'd like to see C++ have better support for movable objects.
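
[For concreteness, the placement syntax above presumes an overloaded
 operator new along the following lines.  This is only a sketch -- the
 per-hint pools are left out and every name is invented for
 illustration; the neighborhood flavor (BELLEVUE, SEATTLE, ...) is the
 same mechanism with a different enum.]

#include <stddef.h>
#include <stdlib.h>

enum Lifetime { SHORT_LIVED, LONG_LIVED };

// A real allocator would keep a separate region per hint; this stub
// just forwards to malloc so the example stays self-contained.
static void *hinted_alloc(Lifetime /*hint*/, size_t n) { return malloc(n); }

struct foo {
    int x;
    foo(int v) : x(v) {}
    void *operator new(size_t n, Lifetime hint) { return hinted_alloc(hint, n); }
    void  operator delete(void *p, Lifetime)    { free(p); }  // if the ctor throws
    void  operator delete(void *p)              { free(p); }
};
typedef foo FOO;   // so the examples above compile as written

// FOO* pYoungFoo = new (SHORT_LIVED) foo(23);
// FOO* pOldFoo   = new (LONG_LIVED) foo(123);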

daniel@fern.ucsc.edu (Daniel Edelson) (05/17/91)

In article <72384@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK) writes:

>So, I'd like to see C++ have better support for movable objects.

Then lobby X3J16 not to forbid taking the address of
``this'' within a member function.

-daniel
daniel@cis.ucsc.edu

jimad@microsoft.UUCP (Jim ADCOCK) (05/21/91)

In article <boehm.674163589@siria> boehm@parc.xerox.com (Hans Boehm) writes:
|Reality is of course more complicated.  Different object sizes interact,
|though this is probably not a major effect, since the number of different
|object sizes in a typical data structure is probably small.  In our
|experience, pages tend to be very nonuniformly populated, even after
|a system has been up for quite a while.  This helps
|the nonmoving allocator, since it can choose not to allocate on nearly
|full pages.

In which case you're ending up with a situation with old pages, containing
un-reclaimed holes.  Eventually pages fill up with old long-lived objects,
excepting a few holes.  Either one ignores those holes, failing to reclaim
memory, in which case you don't have a stable persistent system, or one
is forced to fill the holes in old pages, in which case you've got new
objects spread out across predominantly old pages.  The only way
around this dilemma is to move objects.  Of course, for non-stable
systems or even quasi-stable systems, it might be acceptable to "leak"
a little memory by never plugging the holes in old pages with new objects.
But, unless one can predict a finite lifetime for one's app, you've
got to go back and fill those holes eventually.

robertk@lotatg.lotus.com (Robert Krajewski) (05/21/91)

In article <72384@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK) writes:

   From: jimad@microsoft.UUCP (Jim ADCOCK)
   Newsgroups: comp.lang.c++

   Some people claim the right thing to do
   is to let the end programmer specify where the object will be created,
   either by specifying how long the programmer guesses the object will live:

   FOO* pYoungFoo = new (SHORT_LIVED) foo(23);
   FOO* pOldFoo = new (LONG_LIVED) foo(123);

   or by specifying a neighborhood of objects that are all going to die
   at about the same time:

   FOO* pYoungFool = new (BELLEVUE) fool(34);
   FOO* pOldFool = new (SEATTLE) fool(92);
   FOO* pMiddleAgedFool = new (MERCER_ISLAND) fool(45);

   I think this is a cop-out...

... but even so, one of the examples of ultimate Right Thing object
storage implementations, the Lisp Machine, had a similar strategy for
programmers who wanted to optimize locality. Lisp Machine storage
was divided into areas -- there was a normal working storage area,
but one could also direct allocation into other areas, either through
explicit arguments in special versions of CONS or MAKE-ARRAY, or
through binding a special implicit parameter to the allocator.

The Lisp Machine compiler used areas in a particularly nasty way:
after compilation was complete, the area was *reset*, that is, totally
deallocated. That saved a lot of the work for the garbage collector,
but modification of the compiler could be very dangerous if you left
things outside pointing to the compiler area. After the
generation-based GC was moved in, the compiler area was retained, but
the reset code was taken out. With this change, compilation time *sped up*
because the generational GC had a chance to reclaim new objects before
the compiler was finished; that storage was then freed up for other
objects, and the machine did not have to page to get more unused
pages.

In general, with the ephemeral GC in place, areas were much less
important than before. However, they still provide some information
about locality, so there is a place for them.

   The only way I can see to
   do that is either to move objects around, implicitly moving left-behind
   holes too, or to build virtual memory paging hardware more in tune with
   the needs of OOP [how often do we make 4K-sized objects?].

Well, it's been done, although it's more of a question of low-level
software (object storage) conventions than anything else. There's very
little about a Lisp Machine that makes it a *Lisp* machine per se. I
don't think there is ever going to be a version of C++ that's going to
allow similar object architectures to fly at full speed, since there
are some many programs that manipulate addresses in a
non-object-oriented fashion.  What is more reachable is getting enough
reasonable features and latitude in spec so that, if one wanted to,
one could write code in a clean dialect of C++ that would not be
hostile to a true object architecture.

boehm@parc.xerox.com (Hans Boehm) (05/23/91)

jimad@microsoft.UUCP (Jim ADCOCK) writes:

>In article <boehm.674163589@siria> boehm@parc.xerox.com (Hans Boehm) writes:
>|Reality is of course more complicated.  Different object sizes interact,
>|though this is probably not a major effect, since the number of different
>|object sizes in a typical data structure is probably small.  In our
>|experience, pages tend to be very nonuniformly populated, even after
>|a system has been up for quite a while.  This helps
>|the nonmoving allocator, since it can choose not to allocate on nearly
>|full pages.

>In which case you're ending up with a situation with old pages, containing
>un-reclaimed holes.  Eventually pages fill up with old long-lived objects,
>excepting a few holes.  Either one ignores those holes, failing to reclaim
>memory, in which case you don't have a stable persistent system, or one
>is forced to fill the holes in old pages, in which case you've got new
>objects spread out across predominantly old pages.  The only way
>around this dilemma is to move objects.  Of course, for non-stable
>systems or even quasi-stable systems, it might be acceptable to "leak"
>a little memory by never plugging the holes in old pages with new objects.
>But, unless one can predict a finite lifetime for one's app, you've
>got to go back and fill those holes eventually.

Our system does not fill holes on pages that are 3/4 occupied by old objects.
(The choice of 3/4 is rather arbitrary.  I don't have numbers to back up
that choice.)  This does nothing to prevent the system from being stable.
In the (unlikely) worst case, it will prevent 1/4 of the available memory
from getting used.  The decision about whether a page is 3/4 full of
old objects is recomputed at every full collection.  Thus holes are likely
to get filled eventually, without adversely affecting my original
argument about locality.
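
In pseudo-code the policy amounts to the following (a sketch with
invented field names, not our actual collector):

struct PageDesc {            /* one descriptor per heap page            */
    long obj_size;           /* pages hold objects of a single size     */
    long old_live;           /* objects surviving a previous collection */
    long capacity;           /* objects that fit on the page            */
    int  reusable;           /* may new objects fill this page's holes? */
};

/* Rerun at every full collection, so a page that is mostly old objects
   today becomes reusable once enough of those objects die. */
void reclassify(PageDesc *p) {
    p->reusable = (4 * p->old_live < 3 * p->capacity);
}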

Thus stability is not really the issue.  We're wasting at most an amount
of space proportional to the amount of live space.  This is a concern.
But it's usually even more of a concern with copying collectors.  (Again,
compacting in place wins, but ...)

Hans
(boehm@xerox.com)