[comp.lang.c] C garbage collection

chase@Ozona.orc.olivetti.com (David Chase) (11/17/88)

In article <276@esosun.UUCP> jackson@freyja (Jerry Jackson) writes:
>(I wrote)
>
>>(Yes, a garbage collector for C --
>>           ~ftp/sun-source/gc.shar@titan.rice.edu
>>It works on Sun3s, Sun4s, Vaxes.  Send mail with subject "help" to
>>"archive-server@rice.edu" if you lack FTP access.)
>>
>
>I've written garbage collectors for lisp and have a pretty good idea
>what is involved... I can't imagine what this does, but I'm pretty sure
>it's something very different.  Could someone please explain what this
>program does?

Sure (I thought you'd never ask :-).  The collector was written by
Hans-J. Boehm, Alan J. Demers, with contributions from Mark Weiser at
Xerox and suggestions from me here.  The algorithms are described in 

  Boehm, H., and M. Weiser, "Garbage Collection in an Uncooperative
  Environment", Software Practice & Experience, September 1988.

The collector/allocator segregates objects by size (Big Bags of Pages)
and keeps a free list for each size.  Objects too large to fit in a
page, of course, are allocated using a different strategy.  The mark
bits for each piece of allocated memory are stored in a "bit array" at
the beginning of each page; there are no "header words" prepended to
allocated memory.  Pages of objects can also be tagged as
"non-pointer-containing" to improve the performance of the collector.
After collection, pages are scanned to rebuild free lists and empty
pages.  Collection itself is just trace-and-sweep, using process stack
to store temporary marking information.

The collector makes certain assumptions about pointer use that don't
hold for all C programs; however, they hold for most C programs.
These are:

1) BSS, stack, registers, and data are always "live".
2) If something "live" contains a pointer to the *beginning* of a
   block of allocated memory, then that object is "live".
3) Linked lists store their "next" pointer in the first word
   of an object (this is just an important optimization).

Obviously this collector will sometimes fail to reclaim an object,
because some integers will look like pointers.  In practice, this is
not a problem.  Obviously, this collector can be tricked into
reclaiming something to which a pointer is retained if the pointer is
encoded.  Knuth's pointer XORing trick in lists is not compatible with
this collector, and adding 4 to every pointer will also confuse it.
Note, however, that there is nothing wrong with encoded pointers to an
object AS LONG AS AT LEAST ONE pointer to the beginning remains.
Thus, this collector can be used with a "front-end allocator" if the
front-end allocator keeps its pools linked together.  Most C programs
do retain pointers to the beginning of objects, so this usually works.

How is this used with C?  Typically, I replace "malloc" with
"garbage-collected malloc" and "free" with "do-nothing".  The garbage
collector supports freeing of allocated structures, but I don't bother.
If free is do-nothing, then a trivial realloc maintains compatibility
with the old-style realloc that people seem to worry about so much
(this is so because if you can "realloc" a pointer, then you must have
a copy of it somewhere, and if you have a copy of it somewhere, then
in the garbage collector's eyes it hasn't really been freed, and thus
hasn't been overwritten or reused.  Note that with BIBOP allocation a
realloc must almost always perform a new allocation.)

What have we used this with?  Code, lots of it.  (Read the SP&E
article; they ran it with lots of stuff, also.)  Thousands of lines of
C and Modula-2+, generated by tools, compiled by three different
compilers (M2+, cc, gcc), taken from the X library, and recycled from
years ago.  We are perhaps not a typical group, since M2+ requires
garbage collection, and all code written recently (by us) was written
with the knowledge that a collector would be used.

We've only had two problems with it:
1) if you don't put your linked-list links in the first word and have
   BIG lists, you'll gobble up lots and lots of stack.  Note that
   signals are turned off during collection to avoid unpleasant
   surprises.

2) because there are no header words attached to objects, this
   allocator exposes overwriting bugs that otherwise go unnoticed
   until memory is freed (and if memory isn't freed, then they just
   go unnoticed).

For more information, obtain the collector and read the README file.
The shar file is about 67K large.

David