[net.ai] Garb Collect Symb vs Xerox

steve@siemens.UUCP (09/26/86)

I received mail that apparently also went onto the net, from Dan Hoey
(hoey@nrl-aic.ARPA).  He discussed garbage collection in response to
my unsupported allegation that, "S[ymbolics] talks about their garbage
collection more, but X[erox]'s is better."  I am glad to see someone
taking up an informed discussion in this area.

First, I briefly recap his letter, eliding (well-put) flames:

+ In the language of computer
+ science, Xerox reclaims storage using a ``reference counter''
+ technique, rather than a ``garbage collector.''

+ If we are to believe Xerox, the reference counter
+ technique is fundamentally faster, and reclaims acceptable amounts of
+ storage.  However, it is apparent that reference counters will never
+ reclaim circular list structure.  As a frequent user of circular list
+ structure (doubly-linked lists, anyone?), I find the lack tantamount to
+ a failure to reclaim storage.

+ I have never understood why Xerox continues to neglect to write a
+ garbage collector.  It is not necessary to stop using reference counts,
+ but simply to have a garbage collector available for those putatively
+ rare occasions when they run out of memory.

+ Dan Hoey
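
Hoey's point about circular structure holds for any pure reference-counting
scheme, and it is easy to demonstrate.  As an illustration only (in Python,
whose CPython runtime happens to combine reference counting with a tracing
cycle collector, so both behaviors can be seen side by side):

```python
import gc

class Node:
    """One cell of a doubly-linked pair: the classic refcount leak."""
    def __init__(self):
        self.other = None

gc.collect()       # start from a clean slate
gc.disable()       # now rely on reference counting alone

a, b = Node(), Node()
a.other, b.other = b, a    # each node references the other

del a, b           # drop the outside references; each node's count
                   # falls to 1, never 0, so reference counting alone
                   # never frees the pair

gc.enable()
unreachable = gc.collect() # a tracing collector does find the cycle
print(unreachable > 0)     # True
```

Precisely the doubly-linked case Hoey complains about: the cycle keeps
itself alive until something other than a reference counter looks at it.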

Xerox's system is designed for highly interactive use on a personal
workstation (sound familiar?).  They spread the work of storage reclamation
evenly throughout the computation by keeping reference counts.  Note that
they have many extra tricks, such as: "References from the stack are not
counted, but are handled separately at 'sweep' time; thus the vast majority
of data manipulations do not cause updates to [the reference counts]"
(Interlisp-D Reference Manual, October 1985).  Even if this scheme were
to use a greater total amount of CPU time than typical garbage collection,
it would remain more acceptable for use on a personal, highly interactive
workstation.  I have no idea how it could be compared to Symbolics for
overall performance without comparing the entire Interlisp and Zetalisp
systems.
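
The trick the manual describes -- not counting references from the stack,
and catching them at "sweep" time -- is what the literature calls deferred
reference counting.  A toy model of the idea (my own sketch, not Xerox's
actual mechanism):

```python
# Toy model of deferred reference counting: only heap-to-heap references
# are counted as they are created and destroyed; references from the
# stack are ignored until "sweep" time, when the stack is scanned once.

heap_count = {}        # obj -> number of references from heap objects

def new(obj):
    heap_count[obj] = 0

def add_heap_ref(obj):
    heap_count[obj] += 1      # bookkeeping only on heap stores

def drop_heap_ref(obj):
    heap_count[obj] -= 1      # no reclamation here: the stack may
                              # still be holding obj

def sweep(stack_roots):
    """Reclaim zero-count objects that the stack does not protect."""
    dead = [o for o, c in heap_count.items()
            if c == 0 and o not in stack_roots]
    for o in dead:
        del heap_count[o]
    return dead

# "x" lives only on the stack, so pushing and popping it never touches
# a count; it is reclaimed at the first sweep after the stack drops it.
new("x")
print(sweep(stack_roots={"x"}))  # []    -- still protected by the stack
print(sweep(stack_roots=set()))  # ['x'] -- reclaimed
```

This is why "the vast majority of data manipulations do not cause updates":
ordinary stack traffic costs nothing between sweeps.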

Nevertheless, I can say that in my experience Interlisp runs a "G.C."
every few seconds, and it lasts, subjectively, an eyeblink.  Occasionally
I get it to take longer, for example when I zero my pointers to 1500 arrays
in one fell swoop.  I have some figures from one application, too.  An
old, shoddy implementation ran 113 seconds CPU and 37.5 seconds GC (25% GC).
A decent implementation of the same program, running a similar problem twice,
got 145 seconds CPU, but 10.8 and 20.3 seconds GC (6.9% and 12% GC).  (The
good implementation still doesn't have a good hashing function so it's still
slower.)  I cannot claim that these figures are representative.  I have
heard horror stories about other Lisps' GCs,
although I don't have any feel for Symbolics's "Ephemeral GC".

I have a strong feeling Xerox has other tricks besides the one about the
stack, which they don't want to tell anyone.  I know they recently changed
the reference counter: it used to count 16 or more references as "infinity"
(and thus never reclaimable), but now uses an overflow scheme where the
reference count gets squirreled away somewhere else when it grows too big.
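
The fix can be sketched as a small in-line count field that, instead of
saturating at "infinity", spills the excess into a side table (a
hypothetical model -- the field width and table details are my assumptions):

```python
FIELD_MAX = 15      # a small in-line count field saturates here
                    # (16+ used to mean "infinity": never reclaimable)

inline = {}         # obj -> count kept in the object's small field
spill = {}          # obj -> excess count squirreled away in a side table

def incref(obj):
    if inline.get(obj, 0) < FIELD_MAX:
        inline[obj] = inline.get(obj, 0) + 1
    else:
        spill[obj] = spill.get(obj, 0) + 1   # overflow, not infinity

def decref(obj):
    if spill.get(obj, 0) > 0:
        spill[obj] -= 1                      # drain the side table first
    else:
        inline[obj] -= 1
    return inline[obj] + spill.get(obj, 0)   # true count is the sum

# An object with 20 references can still be reclaimed: the count comes
# all the way back down to zero instead of sticking at "infinity".
for _ in range(20):
    incref("cell")
counts = [decref("cell") for _ in range(20)]
print(counts[0], counts[-1])    # 19 0
```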

Finally, the amount of unreclaimed garbage (e.g. circular lists) normally
grows much more slowly than memory fragmentation does, so you have to
rebuild your world before unreclaimed garbage becomes a problem anyway.

Postfinally, Xerox makes a big deal that their scheme takes time proportional
to the number of objects reclaimed, while traditional systems take time
proportional to the number of objects allocated.  I think Symbolics's
ephemeral scheme is a clever way to consider only subsets of the universe
of allocated objects, that are most likely to have garbage.  I wish I knew
whether it is a band-aid or an advance in the state-of-the-art.
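
The ephemeral idea -- trace only the subset of objects most likely to
contain garbage, namely the recently allocated ones -- can be sketched as
follows (my reading of the general scheme, not Symbolics's actual design):

```python
# Bare sketch of generation scavenging: trace only the new generation,
# where most garbage is, instead of every object ever allocated.  The
# "remembered set" lists old objects holding pointers into new space,
# so the bulk of the old generation is never traced.

new_space, old_space = set(), set()
refs = {}                     # obj -> set of objects it references

def alloc(obj, pointing_to=()):
    new_space.add(obj)
    refs[obj] = set(pointing_to)

def minor_collect(roots, remembered):
    live, work = set(), list(roots | remembered)
    while work:
        o = work.pop()
        if o in live:
            continue
        live.add(o)
        # follow pointers only out of new objects and the known sources
        if o in new_space or o in roots or o in remembered:
            work.extend(refs.get(o, ()))
    dead = new_space - live
    old_space.update(new_space & live)   # survivors are tenured
    new_space.clear()
    for o in dead:
        refs.pop(o, None)
    return dead

# Old object A holds the only pointer to new object B; new object C is
# garbage.  Nothing in the old generation beyond A is ever examined.
old_space.add("A"); refs["A"] = {"B"}
alloc("B")
alloc("C")
dead = minor_collect(roots=set(), remembered={"A"})
print(sorted(dead))           # ['C']
```

The collection cost scales with the size of the nursery, not with the
total number of objects allocated -- which is the comparison Xerox makes.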

Absolutely ultimately, the "traditional" GC I refer to is known as
"mark-and-sweep".
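
For reference, that traditional scheme in miniature:

```python
# Mark-and-sweep in miniature: mark everything reachable from the roots,
# then sweep away everything unmarked.  Note the sweep visits every
# allocated object -- hence "time proportional to objects allocated".

heap = {                  # obj -> set of objects it references
    "root": {"a"},
    "a": {"b"},
    "b": {"a"},           # a <-> b is circular but reachable: kept
    "c": {"d"},
    "d": {"c"},           # c <-> d is circular and unreachable: freed,
}                         # which is exactly what refcounts cannot do

def mark(roots):
    marked, work = set(), list(roots)
    while work:
        o = work.pop()
        if o not in marked:
            marked.add(o)
            work.extend(heap.get(o, ()))
    return marked

def sweep(marked):
    dead = [o for o in heap if o not in marked]   # every object visited
    for o in dead:
        del heap[o]
    return sorted(dead)

dead = sweep(mark({"root"}))
print(dead)               # ['c', 'd']
```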

Steve Clark    {topaz or ihnp4}!princeton!siemens!steve