steve@siemens.UUCP (09/26/86)
I received mail that apparently also went onto the net, from Dan Hoey (hoey@nrl-aic.ARPA). He discussed garbage collection in response to my unsupported allegation that, "S[ymbolics] talks about their garbage collection more, but X[erox]'s is better." I am glad to see someone taking up an informed discussion in this area. First, I briefly recap his letter, eliding (well-put) flames: + In the language of computer + science, Xerox reclaims storage using a ``reference counter'' + technique, rather than a ``garbage collector.'' + If we are to believe Xerox, the reference counter + technique is fundamentally faster, and reclaims acceptable amounts of + storage. However, it is apparent that reference counters will never + reclaim circular list structure. As a frequent user of circular list + structure (doubly-linked lists, anyone?), I find the lack tantamount to + a failure to reclaim storage. + I have never understood why Xerox continues to neglect to write a + garbage collector. It is not necessary to stop using reference counts, + but simply to have a garbage collector available for those putatively + rare occasions when they run out of memory. + Dan Hoey Xerox's system is designed for highly interactive use on a personal workstation (sound familiar?). They spread the work of storage reclamation evenly throughout the computation by keeping reference counts. Note that they have many extra tricks such as "References from the stack are not counted, but are handled separately at "sweep" time; thus the vast majority of data manipulations do not cause updates to [the reference counts]" (Interlisp-D Reference Manual, October, 1985). Even if this scheme were to use a greater total amount of CPU time than typical garbage collection, it would remain more acceptable for use on a personal, highly interactive workstation. I have no idea how it can be compared to Symbolics for overall performance, without comparing the entire Interlisp vs. Zetalisp systems. Nevertheless, I can say that my experience is that Interlisp runs a "G.C." every few seconds and it lasts, subjectively, an eyeblink. Occasionally I get it to take longer, for example when I zero my pointers to 1500 arrays in one fell swoop. I have some figures from one application, too. An old, shoddy implementation ran 113 seconds CPU and 37.5 seconds GC (25% GC). A decent implementation of the same program, running a similar problem twice, got 145 seconds CPU, but 10.8 and 20.3 seconds GC (6.9% and 12% GC). (The good implementation still doesn't have a good hashing function so it's still slower.) I cannot claim that these figures are representative. I have heard horror stories about other Lisps' GCs, although I don't have any feel for Symbolics's "Ephemeral GC". I have a strong feeling Xerox has other tricks besides the one about the stack, which they don't want to tell anyone. I know they recently fixed the reference counter from counting 16 or more references as "infinity" (and thus never reclaimable) to an overflow scheme where the reference count gets squirreled away somewhere else when it gets bigger. Finally, normally the amount of unreclaimed garbage (e.g. circular lists) grows much slower than memory fragments, so you have to rebuild your world before unreclaimed garbage becomes a problem anyway. Postfinally, Xerox makes a big deal that their scheme takes time proportional to the number of objects reclaimed, while traditional systems take time proportional to the number of objects allocated. I think Symbolics's ephemeral scheme is a clever way to consider only subsets of the universe of allocated objects, that are most likely to have garbage. I wish I knew whether it is a band-aid or an advance in the state-of-the-art. Absolutely ultimately, "traditional" GC I refer to, is known as "mark- and-sweep". Steve Clark {topaz or ihnp4}!princeton!siemens!steve