[comp.object] Precise GC

boehm@parc.xerox.com (Hans Boehm) (10/13/90)

>In article <1990Oct11.004854.11732@cbnewsc.att.com> lgm@cbnewsc.att.com (lawrence.g.mayka) writes:
>>b) Languages that essentially preclude precise (i.e., not
>>"conservative") garbage collection (Modula-3).
>Ah, seems to preclude C++ too.

"Precise" garbage collection is a rather imprecise concept and thus,
I would argue, a funny absolute criterion to use in language selection.
I know of many gradations of "conservatism" in garbage collection.
Here are some:

1) No structural information about heap objects; interior pointers valid
   (see the C sketch following this list).

2) No structural information about heap objects; no interior pointers.

3) Structural information on heap objects, conservative on stack and registers.
   (Recent collectors on Xerox D machines did this, as do Bartlett's
   collector and AKCL.)

4) Pointer location information maintained on stack and registers (i.e. tagged
   pointers or a compiler-generated map of where pointers are at various
   program points).  I believe Standard ML of NJ and the original KCL
   fit here.

5) Same as (4), but also clears registers (or updates the map) when their
   pointer values are known to be dead.  Some Lisps may do this.  (I wouldn't
   have much confidence in an implementation that claimed to do this, since
   it's hard to test well.)
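
To make the difference between (1) and (2) concrete, here is a small C
sketch (purely illustrative; it assumes some garbage-collected
allocator stands behind malloc):

#include <stdlib.h>
#include <string.h>

/* Return a pointer into the *middle* of a heap object, dropping the
   base pointer.  A collector of kind (1) must treat this interior
   pointer as keeping the whole 1024-byte object alive; a collector of
   kind (2), which only honors pointers to an object's first byte,
   could reclaim it out from under us. */
char *middle_of_buffer(void)
{
    char *buf = malloc(1024);
    strcpy(buf, "header:payload");
    return buf + 7;               /* interior pointer to "payload" */
}

int main(void)
{
    char *payload = middle_of_buffer();
    return payload[0] != 'p';     /* only the interior pointer survives */
}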

It is not always clear what kind of conservatism will hurt you.  For example,
it is my impression that on a SPARC (with its sparsely occupied stacks),
the step from (4) to (3) is likely to cause more spurious retention than going
from (3) to (2).  In most cases, even (1) will work fine.  But on a machine
with 16-bit pointer alignment (as in David Chase's bad experience with
hyperconservative collection) there is a good possibility that it won't.
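
Here, for instance, is a hedged C sketch of the kind of "bogus
reference" that causes spurious retention (the collector itself is
assumed, not shown):

#include <stdlib.h>

int main(void)
{
    void *obj = malloc(100000);

    /* Just integer data that happens to hold the same bit pattern as
       obj's address (assume pointers fit in a long, as on the
       machines discussed here): */
    unsigned long unlucky = (unsigned long)obj;

    obj = 0;      /* the only genuine reference is now gone ... */

    /* ... but a conservative collector scanning this frame sees
       'unlucky', cannot prove it is not a pointer, and retains the
       100000 bytes.  The denser the address space, or the looser the
       alignment rules, the likelier such coincidences become. */
    return (int)(unlucky & 1);
}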

Even (5) is not in any sense precise.  It depends both on the compiler's dead
variable analysis, and on other choices made by the compiler writer.
For example, will the following program run out of heap space?

let
    f = function(n:integer)
            let
                x = new 100000 byte object
            in
                /* drop x */; if n > 0 then f(n-1)
in
    f(10000);

If my compiler does tail recursion elimination, or does good dead variable
analysis, I'm in luck.  If it waits until x goes out of scope to clear its
register, and does not do tail recursion elimination, the garbage collector
won't be able to reclaim anything.  There is another illustration in Weiser's
and my SP&E paper ("Garbage Collection in an Uncooperative Environment,"
SP&E 18, 9 (September 1988)).
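
Rendered in C against a Boehm-Weiser-style conservative collector's
interface (GC_malloc from gc.h; take the details as a sketch rather
than gospel), the example reads:

#include <gc.h>   /* conservative collector; GC_malloc allocates
                     garbage-collected storage */

void f(int n)
{
    char *x = GC_malloc(100000);  /* new 100000 byte object */
    (void)x;                      /* "drop x": x is never used again */
    if (n > 0)
        f(n - 1);   /* unless x's register or stack slot is cleared or
                       reused before this call, each of the 10000
                       frames pins 100000 bytes */
}

int main(void)
{
    f(10000);
    return 0;
}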

To really define "precise" collection, you need a language definition that
defines accessibility of objects.  I know of no such language definition.
Even if you came up with one, I strongly suspect that it would overconstrain
the compiler writer to the point that you won't be happy with the performance.

Garbage collectors are never perfect.  But in my experience programmers
manually managing storage are much less perfect still.  And programmers
may also err in more disastrous ways.

craig@Neon.Stanford.EDU (Craig D. Chambers) (10/13/90)

In article <563@roo.UUCP> boehm@parc.xerox.com (Hans Boehm) writes:
>4) Pointer location information maintained on stack and registers (i.e. tagged
>   pointers or a compiler-generated map of where pointers are at various
>   program points).  I believe Standard ML of NJ and the original KCL
>   fit here.
>
>5) Same as (4), but also clears registers (or updates the map) when their
>   pointer values are known to be dead.  Some Lisps may do this.  (I wouldn't
>   have much confidence in an implementation that claimed to do this, since
>   it's hard to test well.)

The Self compiler does (5) (on the SPARC), using a compiler-generated
register mask word for each call site, identifying live registers.
All pointers are tagged, so the live-register mask includes all
registers that may contain a pointer.  And I have confidence in my
implementation, but I'd probably be hard pressed to convince you to
share my confidence!
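
As a rough C-style sketch of the data structure involved (field names
invented; the real tables are generated by the compiler, not written
by hand):

/* One entry per call site: the return address identifies the site,
   and bit i of live_registers is set iff register i is live (and,
   since all values are tagged, possibly a pointer) there. */
struct call_site_map {
    void        *return_address;
    unsigned int live_registers;
};

/* Walking a frame, the collector finds the entry for the frame's
   return address and scans exactly the registers whose bits are set. */
int reg_is_live(const struct call_site_map *m, int reg)
{
    return (int)((m->live_registers >> reg) & 1u);
}

int main(void)
{
    /* e.g. registers 8 and 10 live at some (fabricated) call site: */
    struct call_site_map m = { 0, (1u << 8) | (1u << 10) };
    return reg_is_live(&m, 8) ? 0 : 1;
}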

>let
>    f = function(n:integer)
>            let
>                x = new 100000 byte object
>            in
>                /* drop x */; if n > 0 then f(n-1)
>in
>    f(10000);

>If my compiler does tail recursion elimination, or does good dead variable
>analysis, I'm in luck.  If it waits until x goes out of scope to clear its
>register, and does not do tail recursion elimination, the garbage collector
>won't be able to reclaim anything.  There is another illustration in Weiser's
>and my SP&E paper.
>
>To really define "precise" collection, you need a language definition that
>defines accessibility of objects.  I know of no such language definition.
>Even if you came up with one, I strongly suspect that it would overconstrain
>the compiler writer to the point that you won't be happy with the performance.

The Self compiler is constrained to preserve the appearance of a naive
interpreter directly executing the source code, and to present a view
of the program at debug-time as if no optimizations had been
performed.  This implies that all user-visible variables (including
the "x" local variable above) may be viewed at debug time and so don't
die until the scope terminates.  This also implies that tail-recursion
elimination is practically impossible to perform without ruining the
user's view of the program (to get around this problem without
banishing loops, Self includes a _Restart primitive, which is basically
a tail-recursive call that the programmer explicitly wants eliminated).
Under these constraints, the above program will run out of memory,
unless your debugger is smart enough to allocate a bunch of "x"
objects only when the debugger views the stack frames, and then you'll
only run out of memory at debug time.  Of course, if "x" were some
internal temporary expression that the user could not view at debug
time, the Self compiler would reclaim the storage (in fact, it would
optimize away the allocation entirely as a useless computation).

If this meets your definition of a "language definition that defines
accessibility of objects," then I'd have to disagree that this
overconstrains the compiler writer (i.e. me).  We're pretty happy with
our performance, which averages around 2x slower than optimized C,
depending on the benchmarks you use (1.5x slower for the Stanford
integer benchmarks), perhaps because we don't write programs that do
what you describe above.  We've found that this restrictive debugging
model prevents very few optimizations.  The only kinds of optimizations
forbidden that I can think of are the kinds where a variable that's
still in scope is treated as dead by the compiler (e.g. tail-recursion
elimination, dead assignment elimination, reusing registers of dead
variables).  The smarter the compiler and the debugger, the more
optimizations can be performed by the compiler and still dealt with by
the debugger.

-- Craig

lgm@cbnewsc.att.com (lawrence.g.mayka) (10/16/90)

In article <563@roo.UUCP>, boehm@parc.xerox.com (Hans Boehm) writes:
> "Precise" garbage collection is a rather imprecise concept and thus,
> I would argue, a funny absolute criterion to use in language selection.
> I know of many gradations of "conservatism" in garbage collection.

I was referring to "exact" garbage collection, which collects *all*
inaccessible storage.  My application must run continuously over long
periods of time without leaking memory.  *Any* degree of conservatism
is too conservative for me (in the matter of GC, anyway).

> 4) Pointer location information maintained on stack and registers (i.e. tagged
>    pointers or a compiler-generated map of where pointers are at various
>    program points).  I believe Standard ML of NJ and the original KCL
>    fit here.

Bingo!

> Garbage collectors are never perfect.  But in my experience programmers
> manually managing storage are much less perfect still.  And programmers
> may also err in more disastrous ways.

I am certainly not advocating manual storage management.


	Lawrence G. Mayka
	AT&T Bell Laboratories
	lgm@iexist.att.com

Standard disclaimer.

boehm@parc.xerox.com (Hans Boehm) (10/17/90)

In article <1990Oct12.221032.20917@Neon.Stanford.EDU> craig@Neon.Stanford.EDU (Craig D. Chambers) writes:

[Me:]
>>To really define "precise" collection, you need a language definition that
>>defines accessibility of objects.  I know of no such language definition.
>>Even if you came up with one, I strongly suspect that it would overconstrain
>>the compiler writer to the point that you won't be happy with the performance.

[Craig:]
>The Self compiler is constrained to preserve the appearance of a naive
>interpreter directly executing the source code, and to present a view
>of the program at debug-time as if no optimizations had been
>performed.  This implies that all user-visible variables (including
>the "x" local variable above) may be viewed at debug time and so don't
>die until the scope terminates.  This also implies that tail-recursion
>elimination is practically impossible to perform without ruining the
>user's view of the program (to get around this problem without
>banishing loops, Self includes a _Restart primitive, which is basically
>a tail-recursive call that the programmer explicitly wants eliminated).
>Under these constraints, the above program will run out of memory,
>unless your debugger is smart enough to allocate a bunch of "x"
>objects only when the debugger views the stack frames, and then you'll
>only run out of memory at debug time.  Of course, if "x" were some
>internal temporary expression that the user could not view at debug
>time, the Self compiler would reclaim the storage (in fact, it would
>optimize away the allocation entirely as a useless computation).

This comes closer to defining object accessibility than anything else
I've seen.  However, I don't think it's quite sufficient, since I can
construct similar examples using debugger-inaccessible temporaries.
For example, consider the following code, where X is known to be a large
bignum.  (Similar examples can be constructed with lists, but they're a
bit less intuitive.)  I assume that bignum arithmetic dynamically allocates
space in the heap to hold results.

let
   y = X
in
   i := 0;
   while i < 10000 do
     i++;
     z := y * y + g(i);
     f(z);
   endwhile

I would certainly like my compiler to move "y * y" (or any such expression)
out of the loop.  But that isn't allowed, since doing so would preserve
the temporary across the procedure call.  If f ends up invoking the garbage
collector, I will fail to reclaim the storage used to hold y*y.  Thus some
conservatism would have crept back into my collector.
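
In C-like terms (the bignum type and operations are invented for
illustration), the optimized loop would look like this:

typedef struct bignum *bn;     /* heap-allocated big integers (hypothetical) */
extern bn   bn_mul(bn, bn);
extern bn   bn_add(bn, bn);
extern bn   g(int);
extern void f(bn);

void loop(bn y)
{
    bn t = bn_mul(y, y);       /* y * y hoisted out of the loop */
    int i;

    for (i = 1; i <= 10000; i++)
        f(bn_add(t, g(i)));    /* t is live across every call to f, so
                                  the collector must retain the product
                                  for the entire loop */
}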

>If this meets your definition of a "language definition that defines
>accessibility of objects," then I'd have to disagree that this
>overconstrains the compiler writer (i.e. me).  We're pretty happy with
>our performance, which averages around 2x slower than optimized C,
>depending on the benchmarks you use (1.5x slower for the Stanford
>integer benchmarks), perhaps because we don't write programs that do
>what you describe above.  We've found that this restrictive debugging
>model prevents very few optimizations.  The only kinds of optimizations
>forbidden that I can think of are the kinds where a variable that's
>still in scope is treated as dead by the compiler (e.g. tail-recursion
>elimination, dead assignment elimination, reusing registers of dead
>variables).  The smarter the compiler and the debugger, the more
>optimizations can be performed by the compiler and still dealt with by
>the debugger.

>-- Craig

Your factor of 2 presumably buys me all sorts of things, so I'm not quite
sure what to make of it.  For most applications, I would be very unhappy
if I had to pay a factor of 2 solely for "precise" collection. It would
be nice to know how those costs broke down.  I suspect that not being able
to reuse registers hurts a lot more on a '386 than a SPARC.

Hans

craig@Neon.Stanford.EDU (Craig D. Chambers) (10/17/90)

In article <570@roo.UUCP> boehm@parc.xerox.com (Hans Boehm) writes:
>let
>   y = X
>in
>   i := 0;
>   while i < 10000 do
>     i++;
>     z := y * y + g(i);
>     f(z);
>   endwhile
>
>I would certainly like my compiler to move "y * y" (or any such expression)
>out of the loop.  But that isn't allowed, since doing so would preserve
>the temporary across the procedure call.  If f ends up invoking the garbage
>collector, I will fail to reclaim the storage used to hold y*y.  Thus some
>conservatism would have crept back into my collector.

Yes, you are right about this aspect of the language being undefined
(i.e. when temp expressions have their space reclaimed).  All the
language specifies is the minimum time that an expression must remain
allocated, not when it is required to be reclaimed.  All kinds of
optimizations might lengthen the effective lifetime of heap storage
(one of which you mention above).

>Your factor of 2 presumably buys me all sorts of things, so I'm not quite
>sure what to make of it.  For most applications, I would be very unhappy
>if I had to pay a factor of 2 solely for "precise" collection. It would
>be nice to know how those costs broke down.  I suspect that not being able
>to reuse registers hurts a lot more on a '386 than a SPARC.

Yes, you're right again, my numbers don't really say anything about
the cost of preserving "dead" variables solely for the debugger.  My
intuition from examining the generated code is that in C-style
benchmarks, all the message sends are eliminated (in the common case,
reduced to branches), and so there is virtually no run-time cost for this
debugger model (since the only place that the debugger could look at
the values of variables is at the interrupt point at the end of the
loop, and most variables in question are out of scope by then).  Note
that I'm talking about the optimized code that contains no breakpoints
and isn't being single-stepped through.  To implement breakpoints and
single-stepping, the compiler could generate another version of the
code with more potential interrupt points, and this code might be a
bit slower than the normal run-time code.

But real measurements are needed to say anything for sure, and so I'll
add that measurement to the performance chapter of my thesis (which
I'm writing right now, BTW).  I had intended to do the analysis
anyway, but your posting will help make sure I get around to it.
Hopefully the 68k version of the new compiler will be working soon,
and we can see how much of an impact this debugger requirement has on
a relatively register-starved machine.  (Does a '386 have even fewer
registers?  I can configure the code generator to use only as many
physical registers as I tell it to, so I could crudely simulate a
machine with an arbitrarily small number of registers.)

-- Craig

mark@parc.xerox.com (Mark Weiser) (10/18/90)

In article <1990Oct16.015230.24169@cbnewsc.att.com> lgm@cbnewsc.att.com (lawrence.g.mayka) writes:

>I was referring to "exact" garbage collection, which collects *all*
>inaccessible storage.  My application must run continuously over long
>periods of time without leaking memory.  *Any* degree of conservatism
>is too conservative for me (in the matter of GC, anyway).

I am not sure you read Hans carefully.  One of his points was that
there is no such thing as exact.

With regard to running continuously over long periods of time: of
course.  We have many servers written in Cedar with this
characteristic.  Cedar has used partially conservative (from stacks)
collection since 1982 or so, and fully conservative since 1988 or so.
All is well.

And it is important to distinguish kinds of leaks.  Conservative
collectors tend to leak once and then stop--because some magic number
in the data or stack has held an object inappropriately.  But this is
not a growing leak; it is always just that one object.  There are some
cases to be careful of: if that one object is the head of a long,
growing linked list that you were hoping would be collected, you could
be in trouble.  Our practice here is to explicitly zero the references
out of linked lists as we are done with them.
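
In C terms the idiom is roughly this (types invented; assume the
cells live in collected storage):

#include <stddef.h>

struct cell {
    struct cell *next;
    void        *item;
};

/* Remove the head of the list, and break its link before letting go
   of it.  Then even if the removed cell is falsely retained by a
   magic number somewhere, it drags along one cell, not the whole
   growing tail behind it. */
struct cell *pop(struct cell **head)
{
    struct cell *c = *head;
    if (c != NULL) {
        *head = c->next;
        c->next = NULL;
    }
    return c;
}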

-mark



--
Spoken: Mark Weiser 	ARPA:	weiser@xerox.com	Phone: +1-415-494-4406

boehm@parc.xerox.com (Hans Boehm) (10/19/90)

In article <1990Oct16.015230.24169@cbnewsc.att.com> lgm@cbnewsc.att.com (lawrence.g.mayka) writes:

>In article <563@roo.UUCP>, boehm@parc.xerox.com (Hans Boehm) writes:
>> "Precise" garbage collection is a rather imprecise concept and thus,
>> I would argue, a funny absolute criterion to use in language selection.
>> I know of many gradations of "conservatism" in garbage collection.

>I was referring to "exact" garbage collection, which collects *all*
>inaccessible storage.  My application must run continuously over long
>periods of time without leaking memory.  *Any* degree of conservatism
>is too conservative for me (in the matter of GC, anyway).

The problem again is that "accessibility" of storage is not defined by
any language specification that I know of (with SELF apparently coming
closer than most).  In practice, it tends to be defined both by the
collector and the compiler.  Imposing restrictions on the collector isn't
enough.

>> 4) Pointer location information maintained on stack and registers (i.e. tagged
>>    pointers or a compiler-generated map of where pointers are at various
>>    program points).  I believe Standard ML of NJ and the original KCL
>>    fit here.

>Bingo!

Again, this isn't enough to guarantee anything interesting.  Here's another,
somewhat more involved, example.  Assume that I have a program P that
manipulates a queue Q, which is implemented as a singly linked list with
head and tail pointers H and T.  Consider the following pseudo-program:
(Scopes are indicated by indentation level.)


let
    H = a pointer variable;
    T = a pointer variable;
    F = a function variable of appropriate type;
in
    let
        first = initial queue entry (an allocated object);
    in
        let
            f = function not mentioning first;
            g = function mentioning first;
        in
            H := first; T := first; F := f; g();

P;

Assume that P runs for a long time, and adds and removes many items from Q.
Assume further that the length of the queue is bounded.
Assume also that it does no other allocation, and that it calls F occasionally.
Does this program consume more and more storage?  It does if a reference to
first is retained.  This can happen because one of the following holds:

1) The collector is conservative, and there is a bogus reference to first.
2) first was left on the stack as a tagged pointer, the location was not
  cleared, and it was not overwritten by P's data.  (This is likely
  on SPARC-like architectures that tend to leave lots of holes in the stack.)
3) The compiler decided to represent the function f as an <environment pointer,
  instruction pointer> pair.  The environment for f contains the binding for
  first.
4) The compiler copies bindings into closures, but tries to share closures if
  possible.  It created a single record containing the bindings used by both
  f and g.  Since f is still accessible, it will retain the binding of first.
5) The compiler performed some other baroque optimization that extended the
  lifetime of first.

None of these optimizations are forbidden by most language specifications.
Sometimes they can be very profitable.  (My impression is that SML of NJ
does (4), though perhaps not in this case.  Nobody other than the implementors
would know.)
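
To make (3) and (4) concrete, here is a hedged C sketch of one
plausible closure representation (all names invented):

struct obj;                          /* a queue entry */
extern void use(struct obj *);

/* One environment record shared by f and g, as in case (4): */
struct env {
    struct obj *first;               /* needed only by g */
};

/* Case (3): a function is an <environment, code> pair. */
struct closure {
    void (*code)(struct env *);
    struct env *env;
};

void f_code(struct env *e) { (void)e; }         /* f never touches first */
void g_code(struct env *e) { use(e->first); }   /* g does */

/* Once F := f stores f's closure, that closure keeps the shared env
   record reachable, and with it the binding of first, even though f
   itself never mentions first. */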

Generational collection can also introduce similar phenomena if your heap gets
big enough that you can't afford periodic full collections.

If you are writing a long-running server, it is possible to program defensively
against such things.  But the same is true for conservative collectors.
Empirically, long-running servers usually work just fine with conservative
collection.  There is a partial explanation of this, and a good discussion of
an EXTREME form of conservative garbage collection in

Wentworth, E. P., "Pitfalls of Conservative Garbage Collection",
SP&E 20, 7 (July 1990), pp. 719-727.

It points out when conservative garbage collection works even in very
densely occupied address spaces, and when it doesn't.  I claim all the
failures he discusses can also occur with nonconservative collection
and compiler optimization.  He just cranked up the probabilities high
enough to make them easily observable.

Hans