boehm@parc.xerox.com (Hans Boehm) (10/13/90)
>In article <1990Oct11.004854.11732@cbnewsc.att.com> lgm@cbnewsc.att.com (lawrence.g.mayka) writes:
>>b) Languages that essentially preclude precise (i.e., not
>>"conservative") garbage collection (Modula-3).
>Ah, seems to preclude C++ too.

"Precise" garbage collection is a rather imprecise concept and thus, I
would argue, a funny absolute criterion to use in language selection.
I know of many gradations of "conservativism" in garbage collection.
Here are some:

1) No structural information about heap objects, interior pointers valid.

2) No structural information in heap objects, no interior pointers.

3) Structural information on heap objects, conservative on stack and
   registers.  (Recent collectors on Xerox D machines did this, as do
   Bartlett's collector and AKCL.)

4) Pointer location information maintained on stack and registers (i.e.
   tagged pointers, or a compiler-generated map of where pointers are at
   various program points).  I believe Standard ML of NJ and the original
   KCL fit here.

5) Same as (4), but also clears registers (or updates the map) when their
   pointer values become known dead.  Some Lisps may do this.  (I wouldn't
   have much confidence in an implementation that claimed to do this,
   since it's hard to test well.)

It is not always clear what kind of conservativism will hurt you.  For
example, it is my impression that on a SPARC (with its sparsely occupied
stacks), the step from (4) to (3) is likely to cause more spurious
retention than going from (3) to (2).  In most cases, even (1) will work
fine.  But on a machine with 16-bit pointer alignment (as in David
Chase's bad experience with hyperconservative collection) there is a
good possibility that it won't.

Even (5) is not in any sense precise.  It depends both on the compiler's
dead-variable analysis, and on other choices made by the compiler writer.
For example, will the following program run out of heap space?

let
  f = function(n:integer)
    let
      x = new 100000 byte object
    in
      /* drop x */; if n > 0 then f(n-1)

in
  f (10000);

If my compiler does tail-recursion elimination, or does good dead-variable
analysis, I'm in luck.  If it waits until x goes out of scope to clear
its register, and does not do tail-recursion elimination, the garbage
collector won't be able to reclaim anything.  There is another
illustration in Weiser's and my SP&E paper.

To really define "precise" collection, you need a language definition
that defines accessibility of objects.  I know of no such language
definition.  Even if you came up with one, I strongly suspect that it
would overconstrain the compiler writer to the point that you won't be
happy with the performance.

Garbage collectors are never perfect.  But in my experience programmers
manually managing storage are much less perfect still.  And programmers
may also err in more disastrous ways.
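To make the most conservative end of this spectrum concrete, here is a
minimal sketch (in C) of the stack scan a collector at gradation (1)
performs.  The names heap_lo, heap_hi, and mark_object are illustrative
rather than from any particular collector, and the setjmp call is a
common portable trick for spilling registers to the stack:

    #include <setjmp.h>

    extern char *heap_lo, *heap_hi;    /* bounds of the collector's heap */
    extern void mark_object(char *p);  /* mark the object containing p   */

    /* Treat every word in [lo, hi) as a possible pointer.  Any word
     * whose value happens to fall inside the heap is assumed to point
     * at a live object -- the source of spurious retention. */
    static void scan_conservatively(void **lo, void **hi)
    {
        for (void **p = lo; p < hi; p++) {
            char *w = (char *)*p;
            if (w >= heap_lo && w < heap_hi)
                mark_object(w);   /* gradation (1): interior pointers honored */
        }
    }

    /* At the start of a collection: spill the registers to the stack,
     * then scan the whole stack.  Assumes the stack grows downward. */
    void mark_from_stack(void **stack_base)
    {
        jmp_buf regs;
        setjmp(regs);             /* flushes callee-saved registers */
        void *top;
        scan_conservatively(&top, stack_base);
    }

Any integer whose value happens to land in [heap_lo, heap_hi) keeps an
object alive; that is exactly the spurious retention discussed above.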
craig@Neon.Stanford.EDU (Craig D. Chambers) (10/13/90)
In article <563@roo.UUCP> boehm@parc.xerox.com (Hans Boehm) writes:
>4) Pointer location information maintained on stack and registers (i.e.
>   tagged pointers, or a compiler-generated map of where pointers are at
>   various program points).  I believe Standard ML of NJ and the original
>   KCL fit here.
>
>5) Same as (4), but also clears registers (or updates the map) when their
>   pointer values become known dead.  Some Lisps may do this.  (I wouldn't
>   have much confidence in an implementation that claimed to do this,
>   since it's hard to test well.)

The Self compiler does (5) (on the SPARC), using a compiler-generated
register mask word for each call site, identifying live registers.  All
pointers are tagged, so the live-registers mask includes all registers
that may contain a pointer.  And I have confidence in my implementation,
but I'd probably be hard pressed to convince you to share my confidence!

>let
>  f = function(n:integer)
>    let
>      x = new 100000 byte object
>    in
>      /* drop x */; if n > 0 then f(n-1)
>
>in
>  f (10000);
>
>If my compiler does tail-recursion elimination, or does good dead-variable
>analysis, I'm in luck.  If it waits until x goes out of scope to clear
>its register, and does not do tail-recursion elimination, the garbage
>collector won't be able to reclaim anything.  There is another
>illustration in Weiser's and my SP&E paper.
>
>To really define "precise" collection, you need a language definition
>that defines accessibility of objects.  I know of no such language
>definition.  Even if you came up with one, I strongly suspect that it
>would overconstrain the compiler writer to the point that you won't be
>happy with the performance.

The Self compiler is constrained to preserve the appearance of a naive
interpreter directly executing the source code, and to present a view of
the program at debug time as if no optimizations had been performed.
This implies that all user-visible variables (including the "x" local
variable above) may be viewed at debug time and so don't die until the
scope terminates.  This also implies that tail-recursion elimination is
practically impossible to perform without ruining the user's view of the
program (to get around this problem without banishing loops, Self
includes a _Restart primitive, which basically is a tail-recursive call
that the programmer explicitly wants eliminated).  Under these
constraints, the above program will run out of memory, unless your
debugger is smart enough to allocate a bunch of "x" objects only when
the debugger views the stack frames, and then you'll only run out of
memory at debug time.  Of course, if "x" were some internal temporary
expression that the user could not view at debug time, the Self compiler
would reclaim the storage (in fact, it would optimize away the
allocation entirely as a useless computation).

If this meets your definition of a "language definition that defines
accessibility of objects," then I'd have to disagree that this
overconstrains the compiler writer (i.e. me).  We're pretty happy with
our performance, which averages around 2x slower than optimized C,
depending on the benchmarks you use (1.5x slower for the Stanford
integer benchmarks), perhaps because we don't write programs that do
what you describe above.  We've found that this restrictive debugging
model prevents very few optimizations.  The only kinds of optimizations
forbidden that I can think of are the kinds where a variable that's
still in scope is treated as dead by the compiler (e.g. tail-recursion
elimination, dead-assignment elimination, reusing registers of dead
variables).  The smarter the compiler and the debugger, the more
optimizations can be performed by the compiler and still dealt with by
the debugger.

-- Craig
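A sketch of the mechanism Craig describes: a per-call-site map, looked
up by return address, telling the collector which registers hold live
pointers.  The names gc_map_entry, lookup_gc_map, and mark_object are
hypothetical, not the actual Self runtime interface:

    #include <stdint.h>

    #define NUM_REGS 32    /* SPARC integer registers */

    /* One record per call site, emitted by the compiler: which
     * registers hold live tagged pointers when stopped there. */
    struct gc_map_entry {
        uintptr_t return_pc;      /* identifies the call site */
        uint32_t  live_ptr_regs;  /* bit i set => register i is a live pointer */
    };

    extern struct gc_map_entry *lookup_gc_map(uintptr_t pc);
    extern void mark_object(void *p);

    /* At a GC point, mark exactly the registers the compiler says are
     * live pointers; registers holding dead values are ignored, which
     * is the step from gradation (4) to (5). */
    void mark_frame_registers(uintptr_t pc, void *saved_regs[NUM_REGS])
    {
        struct gc_map_entry *e = lookup_gc_map(pc);
        for (int i = 0; i < NUM_REGS; i++)
            if (e->live_ptr_regs & (1u << i))
                mark_object(saved_regs[i]);
    }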
lgm@cbnewsc.att.com (lawrence.g.mayka) (10/16/90)
In article <563@roo.UUCP>, boehm@parc.xerox.com (Hans Boehm) writes:
> "Precise" garbage collection is a rather imprecise concept and thus,
> I would argue, a funny absolute criterion to use in language selection.
> I know of many gradations of "conservativism" in garbage collection.

I was referring to "exact" garbage collection, which collects *all*
inaccessible storage.  My application must run continuously over long
periods of time without leaking memory.  *Any* degree of conservatism
is too conservative for me (in the matter of GC, anyway).

> 4) Pointer location information maintained on stack and registers (i.e.
>    tagged pointers, or a compiler-generated map of where pointers are at
>    various program points).  I believe Standard ML of NJ and the original
>    KCL fit here.

Bingo!

> Garbage collectors are never perfect.  But in my experience programmers
> manually managing storage are much less perfect still.  And programmers
> may also err in more disastrous ways.

I am certainly not advocating manual storage management.

	Lawrence G. Mayka
	AT&T Bell Laboratories
	lgm@iexist.att.com

Standard disclaimer.
boehm@parc.xerox.com (Hans Boehm) (10/17/90)
In article <1990Oct12.221032.20917@Neon.Stanford.EDU> craig@Neon.Stanford.EDU (Craig D. Chambers) writes:

[Me:]
>>To really define "precise" collection, you need a language definition
>>that defines accessibility of objects.  I know of no such language
>>definition.  Even if you came up with one, I strongly suspect that it
>>would overconstrain the compiler writer to the point that you won't be
>>happy with the performance.

[Craig:]
>The Self compiler is constrained to preserve the appearance of a naive
>interpreter directly executing the source code, and to present a view of
>the program at debug time as if no optimizations had been performed.
>This implies that all user-visible variables (including the "x" local
>variable above) may be viewed at debug time and so don't die until the
>scope terminates.  This also implies that tail-recursion elimination is
>practically impossible to perform without ruining the user's view of the
>program (to get around this problem without banishing loops, Self
>includes a _Restart primitive, which basically is a tail-recursive call
>that the programmer explicitly wants eliminated).  Under these
>constraints, the above program will run out of memory, unless your
>debugger is smart enough to allocate a bunch of "x" objects only when
>the debugger views the stack frames, and then you'll only run out of
>memory at debug time.  Of course, if "x" were some internal temporary
>expression that the user could not view at debug time, the Self compiler
>would reclaim the storage (in fact, it would optimize away the
>allocation entirely as a useless computation).

This comes closer to defining object accessibility than anything else
I've seen.  However, I don't think it's quite sufficient, since I can
construct similar examples using debugger-inaccessible temporaries.
For example, consider the following code, where X is known to be a
large bignum.  (Similar examples can be constructed with lists, but
they're a bit less intuitive.)  I assume that bignum arithmetic
dynamically allocates space in the heap to hold results.

let
  y = X
in
  i := 0;
  while i < 10000 do
    i++;
    z := y * y + g(i);
    f(z);
  endwhile

I would certainly like my compiler to move "y * y" (or any such
expression) out of the loop.  But that isn't allowed, since doing so
would preserve the temporary across the procedure call.  If f ends up
invoking the garbage collector, I will fail to reclaim the storage used
to hold y*y.  Thus some conservativism would have crept back into my
collector.

>If this meets your definition of a "language definition that defines
>accessibility of objects," then I'd have to disagree that this
>overconstrains the compiler writer (i.e. me).  We're pretty happy with
>our performance, which averages around 2x slower than optimized C,
>depending on the benchmarks you use (1.5x slower for the Stanford
>integer benchmarks), perhaps because we don't write programs that do
>what you describe above.  We've found that this restrictive debugging
>model prevents very few optimizations.  The only kinds of optimizations
>forbidden that I can think of are the kinds where a variable that's
>still in scope is treated as dead by the compiler (e.g. tail-recursion
>elimination, dead-assignment elimination, reusing registers of dead
>variables).  The smarter the compiler and the debugger, the more
>optimizations can be performed by the compiler and still dealt with by
>the debugger.
>-- Craig

Your factor of 2 presumably buys me all sorts of things, so I'm not
quite sure what to make of it.  For most applications, I would be very
unhappy if I had to pay a factor of 2 solely for "precise" collection.
It would be nice to know how those costs broke down.  I suspect that
not being able to reuse registers hurts a lot more on a '386 than a
SPARC.

Hans
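For concreteness, the transformation Hans would like to perform looks
roughly like this in C, with a hypothetical heap-allocating bignum API
(bn_mul, bn_add, and the rest are illustrative names):

    extern void *bn_mul(void *a, void *b);  /* allocates its result on the heap */
    extern void *bn_add(void *a, void *b);
    extern void *g(int i);
    extern void  f(void *z);

    void loop(void *y)   /* y is a large bignum */
    {
        /* Loop-invariant code motion hoists y*y out of the loop.  The
         * temporary is now live across every call to f and g, so a
         * collector honoring the compiler's liveness map must retain
         * its storage until the loop exits. */
        void *y_squared = bn_mul(y, y);
        for (int i = 1; i <= 10000; i++)
            f(bn_add(y_squared, g(i)));
    }

In the unhoisted version the y*y temporary can die at the end of each
iteration; hoisting stretches its lifetime over the whole loop.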
craig@Neon.Stanford.EDU (Craig D. Chambers) (10/17/90)
In article <570@roo.UUCP> boehm@parc.xerox.com (Hans Boehm) writes:
>let
>  y = X
>in
>  i := 0;
>  while i < 10000 do
>    i++;
>    z := y * y + g(i);
>    f(z);
>  endwhile
>
>I would certainly like my compiler to move "y * y" (or any such
>expression) out of the loop.  But that isn't allowed, since doing so
>would preserve the temporary across the procedure call.  If f ends up
>invoking the garbage collector, I will fail to reclaim the storage used
>to hold y*y.  Thus some conservativism would have crept back into my
>collector.

Yes, you are right about this aspect of the language being undefined
(i.e. when temp expressions have their space reclaimed).  All the
language specifies is the minimum time that an expression must remain
allocated, not when it is required to be reclaimed.  All kinds of
optimizations might lengthen the effective lifetime of heap storage
(one of which you mention above).

>Your factor of 2 presumably buys me all sorts of things, so I'm not
>quite sure what to make of it.  For most applications, I would be very
>unhappy if I had to pay a factor of 2 solely for "precise" collection.
>It would be nice to know how those costs broke down.  I suspect that
>not being able to reuse registers hurts a lot more on a '386 than a
>SPARC.

Yes, you're right again; my numbers don't really say anything about the
cost of preserving "dead" variables solely for the debugger.  My
intuition from examining the generated code is that in C-style
benchmarks, all the message sends are eliminated (in the common case
branches), and so there is virtually no run-time cost for this debugger
model (since the only place that the debugger could look at the values
of variables is at the interrupt point at the end of the loop, and most
variables in question are out of scope by then).  Note that I'm talking
about the optimized code that contains no breakpoints and isn't being
single-stepped through.  To implement breakpoints and single-stepping,
the compiler could generate another version of the code with more
potential interrupt points, and this code might be a bit slower than
the normal run-time code.

But real measurements are needed to say anything for sure, and so I'll
add that measurement to the performance chapter of my thesis (which I'm
writing right now, BTW).  I had intended to do the analysis anyway, but
your posting will help make sure I get around to it.  Hopefully the 68k
version of the new compiler will be working soon, and we can see how
much of an impact this debugger requirement has on a relatively
register-starved machine.  (Does a '386 have even fewer registers?  I
can configure the code generator to use only as many physical registers
as I tell it to, so I could crudely simulate a machine with an
arbitrarily small number of registers.)

-- Craig
mark@parc.xerox.com (Mark Weiser) (10/18/90)
In article <1990Oct16.015230.24169@cbnewsc.att.com> lgm@cbnewsc.att.com (lawrence.g.mayka) writes:
>I was referring to "exact" garbage collection, which collects *all*
>inaccessible storage.  My application must run continuously over long
>periods of time without leaking memory.  *Any* degree of conservatism
>is too conservative for me (in the matter of GC, anyway).

I am not sure you read Hans carefully.  One of his points was that
there is no such thing as exact.

With regard to running continuously over long periods of time: of
course.  We have many servers written in Cedar with this
characteristic.  Cedar has used partially conservative (from stacks)
collection since 1982 or so, and fully conservative collection since
1988 or so.  All is well.

And it is important to distinguish kinds of leaks.  Conservative
collectors tend to leak once and then stop--because some magic number
in the data or stack has held an object inappropriately.  But this is
not a growing leak; it is always just that one object.  There are some
cases to be careful of--if that one object is the head of a long,
growing linked list you were hoping to have collected, you could be in
trouble.  Our practice here is to explicitly zero the references out of
linked lists as we are done with them.

-mark
-- 
Spoken: Mark Weiser	ARPA: weiser@xerox.com	Phone: +1-415-494-4406
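The defensive idiom Mark describes is simple to state in code.  A
sketch in C, assuming garbage-collected nodes (process is a
hypothetical per-element routine, not from Cedar):

    extern void process(void *data);  /* hypothetical per-element work */

    struct node {
        struct node *next;
        void        *data;
    };

    /* Consume a list under a collector that may spuriously retain some
     * node.  Zeroing each link as we go ensures that one retained node
     * pins only itself, never the rest of the list. */
    void consume_list(struct node **head)
    {
        struct node *n = *head;
        *head = 0;                /* drop the head pointer itself */
        while (n) {
            struct node *next = n->next;
            n->next = 0;          /* break the chain behind us */
            process(n->data);
            n = next;
        }
    }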
boehm@parc.xerox.com (Hans Boehm) (10/19/90)
In article <1990Oct16.015230.24169@cbnewsc.att.com> lgm@cbnewsc.att.com (lawrence.g.mayka) writes:
>In article <563@roo.UUCP>, boehm@parc.xerox.com (Hans Boehm) writes:
>> "Precise" garbage collection is a rather imprecise concept and thus,
>> I would argue, a funny absolute criterion to use in language selection.
>> I know of many gradations of "conservativism" in garbage collection.
>I was referring to "exact" garbage collection, which collects *all*
>inaccessible storage.  My application must run continuously over long
>periods of time without leaking memory.  *Any* degree of conservatism
>is too conservative for me (in the matter of GC, anyway).

The problem again is that "accessibility" of storage is not defined by
any language specification that I know of (with Self apparently coming
closer than most).  In practice, it tends to be defined both by the
collector and the compiler.  Imposing restrictions on the collector
isn't enough.

>> 4) Pointer location information maintained on stack and registers (i.e.
>>    tagged pointers, or a compiler-generated map of where pointers are at
>>    various program points).  I believe Standard ML of NJ and the original
>>    KCL fit here.
>Bingo!

Again, this isn't enough to guarantee anything interesting.  Here's
another, somewhat more involved, example.  Assume that I have a program
P that manipulates a queue Q, which is implemented as a singly linked
list with head and tail pointers H and T.  Consider the following
pseudo-program (scopes are indicated by indentation level):

let
  H = a pointer variable;
  T = a pointer variable;
  F = a function variable of appropriate type;
in
  let
    first = initial Queue entry (an allocated object);
  in
    let
      f = function not mentioning first;
      g = function mentioning first;
    in
      H := first;
      T := first;
      F := f;
      g();
      P;

Assume that P runs for a long time, and adds and removes many items
from Q.  Assume further that the length of the queue is bounded, that P
does no other allocation, and that it calls F occasionally.

Does this program consume more and more storage?  It does if a
reference to first is retained.  This can happen because one of the
following holds:

1) The collector is conservative, and there is a bogus reference to
   first.

2) A reference to first was left as a tagged pointer on the stack, the
   location was not cleared, and it was not overwritten by P's data.
   (This is likely on SPARC-like architectures, which tend to leave
   lots of holes in the stack.)

3) The compiler decided to represent the function f as an <environment
   pointer, instruction pointer> pair.  The environment for f contains
   the binding for first.

4) The compiler copies bindings into closures, but tries to share
   closures if possible.  It created a single record containing the
   bindings used by both f and g.  Since f is still accessible, it will
   retain the binding of first.

5) The compiler performed some other baroque optimization that extended
   the lifetime of first.

None of these implementation choices are forbidden by most language
specifications.  Sometimes they can be very profitable.  (My impression
is that SML of NJ does 4, though perhaps not in this case.  Nobody
other than the implementors would know.)  Generational collection can
also introduce similar phenomena if your heap gets big enough that you
can't afford periodic full collections.

If you are writing a long-running server, it is possible to program
defensively against such things.  But the same is true for conservative
collectors.  Empirically, long-running servers usually work just fine
with conservative collection.  There is a partial explanation of this,
and a good discussion of an EXTREME form of conservative garbage
collection, in

  Wentworth, E. P., "Pitfalls of Conservative Garbage Collection",
  SP&E 20, 7 (July 1990), pp. 719-727.

It points out when conservative garbage collection works even in very
densely occupied address spaces, and when it doesn't.  I claim all the
failures he discusses can also occur with nonconservative collection
and compiler optimization.  He just cranked up the probabilities high
enough to make them easily observable.

Hans
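Reason (4) above is perhaps the least obvious.  Here is a rough C
rendering of what a closure-sharing compiler effectively builds; all
names are hypothetical, with gc_alloc standing in for the language's
garbage-collected allocator:

    #include <stddef.h>

    extern void *gc_alloc(size_t n);  /* stand-in for the GC'd allocator */

    /* A closure is an <environment pointer, code pointer> pair. */
    struct env     { void *first; };  /* the binding g needs */
    struct closure {
        struct env *env;
        void (*code)(struct env *);
    };

    /* The compiler builds ONE environment record for both f and g,
     * because g mentions first.  As long as f's closure is reachable
     * (through F), the shared environment -- and hence first -- stays
     * reachable, even though f never touches it. */
    void make_shared_closures(void *first,
                              void (*f_code)(struct env *),
                              void (*g_code)(struct env *),
                              struct closure *f_out,
                              struct closure *g_out)
    {
        struct env *shared = gc_alloc(sizeof *shared);
        shared->first = first;
        f_out->env = shared;  f_out->code = f_code;
        g_out->env = shared;  g_out->code = g_code;
    }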