chase@Ozona.orc.olivetti.com (David Chase) (11/17/88)
In article <276@esosun.UUCP> jackson@freyja (Jerry Jackson) writes: >(I wrote) > >>(Yes, a garbage collector for C -- >> ~ftp/sun-source/gc.shar@titan.rice.edu >>It works on Sun3s, Sun4s, Vaxes. Send mail with subject "help" to >>"archive-server@rice.edu" if you lack FTP access.) >> > >I've written garbage collectors for lisp and have a pretty good idea >what is involved... I can't imagine what this does, but I'm pretty sure >it's something very different. Could someone please explain what this >program does? Sure (I thought you'd never ask :-). The collector was written by Hans-J. Boehm, Alan J. Demers, with contributions from Mark Weiser at Xerox and suggestions from me here. The algorithms are described in Boehm, H., and M. Weiser, "Garbage Collection in an Uncooperative Environment", Software Practice & Experience, September 1988. The collector/allocator segregates objects by size (Big Bags of Pages) and keeps a free list for each size. Objects too large to fit in a page, of course, are allocated using a different strategy. The mark bits for each piece of allocated memory are stored in a "bit array" at the beginning of each page; there are no "header words" prepended to allocated memory. Pages of objects can also be tagged as "non-pointer-containing" to improve the performance of the collector. After collection, pages are scanned to rebuild free lists and empty pages. Collection itself is just trace-and-sweep, using process stack to store temporary marking information. The collector makes certain assumptions about pointer use that don't hold for all C programs; however, they hold for most C programs. These are: 1) BSS, stack, registers, and data are always "live". 2) If something "live" contains a pointer to the *beginning* of a block of allocated memory, then that object is "live". 3) Linked lists store their "next" pointer in the first word of an object (this is just an important optimization). Obviously this collector will sometimes fail to reclaim an object, because some integers will look like pointers. In practice, this is not a problem. Obviously, this collector can be tricked into reclaiming something to which a pointer is retained if the pointer is encoded. Knuth's pointer XORing trick in lists is not compatible with this collector, and adding 4 to every pointer will also confuse it. Note, however, that there is nothing wrong with encoded pointers to an object AS LONG AS AT LEAST ONE pointer to the beginning remains. Thus, this collector can be used with a "front-end allocator" if the front-end allocator keeps its pools linked together. Most C programs do retain pointers to the beginning of objects, so this usually works. How is this used with C? Typically, I replace "malloc" with "garbage-collected malloc" and "free" with "do-nothing". The garbage collector supports freeing of allocated structures, but I don't bother. If free is do-nothing, then a trivial realloc maintains compatibility with the old-style realloc that people seem to worry about so much (this is so because if you can "realloc" a pointer, then you must have a copy of it somewhere, and if you have a copy of it somewhere, then in the garbage collector's eyes it hasn't really been freed, and thus hasn't been overwritten or reused. Note that with BIBOP allocation a realloc must almost always perform a new allocation.) What have we used this with? Code, lots of it. (Read the SP&E article; they ran it with lots of stuff, also.) Thousands of lines of C and Modula-2+, generated by tools, compiled by three different compilers (M2+, cc, gcc), taken from the X library, and recycled from years ago. We are perhaps not a typical group, since M2+ requires garbage collection, and all code written recently (by us) was written with the knowledge that a collector would be used. We've only had two problems with it: 1) if you don't put your linked-list links in the first word and have BIG lists, you'll gobble up lots and lots of stack. Note that signals are turned off during collection to avoid unpleasant surprises. 2) because there are no header words attached to objects, this allocator exposes overwriting bugs that otherwise go unnoticed until memory is freed (and if memory isn't freed, then they just go unnoticed). For more information, obtain the collector and read the README file. The shar file is about 67K large. David