[comp.lang.misc] Best garbage collection algorithm

ttwang@polyslo.CalPoly.EDU (Thomas Wang) (09/07/89)

I have decided to implement a garbage collection system for C++.  Currently
I am trying to pick the garbage collection algorithm to use.  Are there
better algorithms than 'generation scavenging'?  Does anyone have paper
references for the 'best' garbage collection algorithm?

Please e-mail or post.  Thanks.

 -Thomas Wang ("This is a fantastic comedy that Ataru and his wife Lum, an
                invader from space, cause excitement involving their neighbors."
                  - from a badly translated Urusei Yatsura poster)

                                                     ttwang@polyslo.calpoly.edu

wilson@carcoar.Stanford.EDU (Paul Wilson) (09/07/89)

In article <14236@polyslo.CalPoly.EDU> ttwang@polyslo.CalPoly.EDU (Thomas Wang) writes:
>I have decided to implement a garbage collection system for C++.  Currently
>I am trying to pick the garbage collection algorithm to use.  Are there
>better algorithms than 'generation scavenging'?  Does anyone have paper
>references for the 'best' garbage collection algorithm?
>

It's hard to say what's the "best" gc algorithm, since that varies
depending on things like what language you're doing it for.  C++
sounds like a major headache because it allows pointers to be converted
into integers or stepped past the end of an array, and doesn't distinguish
between array pointers and array element pointers.  Nasty confusing
things to do to a reasonable gc.

You probably have to go with something like a conservative mark-sweep
gc [Look for an article by Hans-Juergen Boehm et al., in Software
Practice and Experience within the last year or two, I think.]
This is not going to be the most efficient gc, but I don't know
if anything else will work for languages that weren't ever meant
to be garbage collected.  I believe Xerox' new Portable Common
Runtime system for their Cedar system uses something like this.

But if you have a reasonable language (like Lisp or Smalltalk or Modula-3) --

I'll be delivering a paper at the upcoming OOPSLA that gives my
humble view on the best gc algorithm for stock hardware, namely
mine. (:-)  Admittedly, my gc incorporates a lot of good ideas from
Ungar's Generation Scavenging, Moon's Ephemeral GC, Sobalvarro's
version of Moon's gc, and Bob Shaw's version of Generation Scavenging.
Eventually it will steal from Caudill & Wirfs-Brock's gc (for
Tektronix Smalltalk) as well.  (As Ungar & Jackson's OOPSLA 88 paper
shows, their (C & W-B's) Large Object Space is a good idea, handling large
objects specially so they don't actually have to be copied at
a scavenge.)

   I think you should have more than one generation scavenged online
(during user sessions), but I show how to reduce the likelihood
of noticeable pauses by careful scheduling.  This can also actually
increase gc efficiency by scavenging when there's not much work
to do (little live data to copy).

   I agree with Moon that things should be advanced quickly from
one generation to the next, generally after no more than two
scavenges.  I agree with Ungar, though, that one scavenge is
*not* enough -- some guarantee of age is worthwhile to
prevent premature advancment of very young objects, which
will probably die almost immediately and pollute the next older
generation.  I've simplified and modified Bob Shaw's "bucket brigade"
scheme for distinguishing objects' approximate ages by putting
them in different subdivisions of a generation.  This turns out
to allow scavenge thresholds *between* 1 and 2 scavenges, which
appears to be about right.

   Remembered set strategies also have their pros and cons.  Mine's
different from Ungar's and should work better for ill-behaved programs
(lots of writes to intermediate- or long-lifetime data), but his is
faster for nicely behaved programs.  It's not clear which is best
overall, especially since other factors enter into it, such as
whether you have a big cache.

   Oh yeah -- if you just happen to have hardware support for
fine-grained incremental garbage collection, Bob Courts' gc
is nifty, reorganizing data dynamically to improve locality 
[CACM, Sept. 88, I think].   Also, the Appel-Ellis-Li stock-hardware gc
from SIGPLAN '88 is interesting and beautiful and concurrent.
(Watch out for the chunkiness and potential burstiness of
its incrementalness, though.) 

   --  Paul

Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680             wilson@carcoar.stanford.edu
Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680