yoram@cheshire.columbia.edu (Yoram Eisenstadter) (05/26/87)
In June 1983, a paper by Lieberman and Hewitt entitled "A Real-Time
Garbage Collector Based on the Lifetimes of Objects" appeared in CACM.
The paper proposed an algorithm for garbage collection in Lisp which it
claimed was superior to other methods, but did not present any
experimental data to validate this claim.  (The basic idea of the
algorithm is that older objects are likely to stay around longer, and
should be garbage collected less often than newer objects.)

I have the following questions:

- Does anybody know of any empirical results on the performance of this
  particular algorithm?

- Does anybody know if anybody ever collected data on the following
  properties of Lisp programs?  (What about object-oriented programs?)

    (1) Rate of object creation
    (2) Average lifetime of objects
    (3) Proportion of "forward" pointers, which point from one object
        to a more recently allocated object
    (4) Locality of reference, i.e., do programs point to "nearby" or
        "far away" objects

- What are the various tradeoffs among garbage collection algorithms?
  I see lots of systems with "stop-and-copy" garbage collectors, even
  though incremental garbage collectors are much more user-friendly.
  What is it about incremental collectors that prevents everybody from
  implementing them?  Complexity?  Inefficiency?

Since a lively discussion of garbage collectors may be of interest to
the general readership of this newsgroup, you may want to follow up to
the net rather than reply to me in person.

Thanks,
Yoram

Yoram Eisenstadter         | Arpanet: yoram@cs.columbia.edu
Columbia University        | Usenet:  seismo!columbia!cs!yoram
Dept. of Computer Science  | Bitnet:  yoram%cs.columbia.edu@WISCVM
New York, NY 10027         | Phone:   (212) 280-8180
code@sphinx.UUCP (05/27/87)
In article <4631@columbia.UUCP> yoram@cs.columbia.edu (Yoram Eisenstadter) writes:
>In June 1983, a paper by Lieberman and Hewitt entitled "A
>Real-Time Garbage Collector Based on the Lifetimes of Objects"
>appeared in CACM.
	[ ...asks for empirical evaluations, etc. ... ]

I too am interested in real-time garbage-collected systems.  Which ones
are available, and which ones are in the wings?  (Is there a forthcoming
version of Scheme, T, or Common Lisp to run on UNIX systems in general?)

Also, I am interested in interfaces to X windows and graphics toolkits
that are somewhat standardized (either across vendors, or from one
vendor for all of the major UNIX workstations -- VAXen, SUNs, Apollos,
etc.).  Preferably I'd like to see these features in a nice
object-oriented system.  Something like real-time CommonLoops with
graphics.  Any info?

(By the way, the applications I have in mind are sophisticated automatic
program animation, including animation of dynamically allocated data
structures.)

Interesting followups to the net, marginal ones to me via e-mail, please.
willc@tekchips.TEK.COM (Will Clinger) (05/28/87)
In article <4631@columbia.UUCP> yoram@cs.columbia.edu (Yoram Eisenstadter) writes:
>In June 1983, a paper by Lieberman and Hewitt entitled "A
>Real-Time Garbage Collector Based on the Lifetimes of Objects"
>appeared in CACM....
>
>I have the following questions:
>
>- Does anybody know of any empirical results on the performance
>of [the Lieberman-Hewitt] algorithm?

The following diagram relates four garbage collection algorithms:

                                real-time
      stop and copy     ----------------------->    Baker algorithm
  (Minsky-Fenichel-Yochelson)
            |                                             |
            | generations                                 | generations
            |                                             |
            V                   real-time                 V
  generation scavenging ----------------------->    Lieberman-Hewitt

That is, the Baker algorithm is a real-time version of the popular
stop-and-copy algorithm, and the Lieberman-Hewitt algorithm adds the
idea of segregating structures by age.

A scaled-down implementation of the Lieberman-Hewitt algorithm is used
by Symbolics lisp machines.  Empirical results were presented in a
paper by David Moon at the 1984 ACM Conference on Lisp and Functional
Programming.

A paper by Stoney Ballard and Stephen Shirron in Smalltalk-80: Bits of
History, Words of Advice contains a good discussion of the Baker and
Lieberman-Hewitt algorithms along with a qualitative summary of their
experience with a prototype Smalltalk interpreter for a VAX.

Common wisdom has it that the Baker and Lieberman-Hewitt algorithms
need special hardware support.  Generation scavenging, which is a
non-real-time version of the Lieberman-Hewitt algorithm, appears to be
the coming thing for stock hardware.  A paper by Pat Caudill and Allen
Wirfs-Brock at the 1986 OOPSLA conference gives empirical results for
the generation scavenging collector used in Tektronix Smalltalk.

>- Does anybody know if anybody ever collected data on the
>following properties of Lisp programs?  (What about
>object-oriented programs?)
>
>  (1) Rate of object creation
>  (2) Average lifetime of objects
>  (3) Proportion of "forward" pointers, which point from
>      one object to a more recently allocated object
>  (4) Locality of reference, i.e., do programs point
>      to "nearby" or "far away" objects

(1), (2), and (3) are very sensitive to the program you're running; (4)
is implementation-dependent.  Many Lisp, Scheme, and Smalltalk systems
can tell you the information you need to calculate (1).  Moon's article
offers data relevant to (2) for two benchmarks.  Data on (3) is scarce.

>- What are the various tradeoffs among garbage collection
>algorithms?  I see lots of systems with "stop-and-copy" garbage
>collectors, even though incremental garbage collectors are much
>more user-friendly.  What is it about incremental collectors that
>prevents everybody from implementing them?  Complexity?
>Inefficiency?

Generation-based and real-time collectors are more complex.  Real-time
collectors may also decrease efficiency on stock hardware, because they
usually require that fetches from the heap test for and/or follow
forwarding pointers.  A paper by Rod Brooks at the 1984 Lisp conference
describes this problem and some alternative implementation strategies.

I thought these were good questions.

Peace, William Clinger
Tektronix Computer Research Lab
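For readers who have not seen the stop-and-copy family before, the
Minsky-Fenichel-Yochelson ancestor at the top left of Clinger's diagram
can be sketched in a few lines.  This is a toy model with invented
names, not any of the collectors discussed above: the heap is a Python
list of tagged cells, and collection is the classic two-space copy with
a Cheney-style scan.

```python
# Toy Cheney-style stop-and-copy collector over a simulated heap.
# Cells are (tag, payload) tuples; "pair" payloads hold heap indices.
# A "fwd" entry left behind in from-space records where a cell went,
# so shared structure and cycles are copied exactly once.

def collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space."""
    to_space = []

    def forward(addr):
        tag, payload = from_space[addr]
        if tag == "fwd":                  # already evacuated: reuse address
            return payload
        new_addr = len(to_space)
        to_space.append((tag, payload))   # shallow copy; children fixed below
        from_space[addr] = ("fwd", new_addr)
        return new_addr

    new_roots = [forward(r) for r in roots]
    scan = 0                              # Cheney scan pointer: to-space
    while scan < len(to_space):           # doubles as the trace queue
        tag, payload = to_space[scan]
        if tag == "pair":
            to_space[scan] = ("pair", tuple(forward(p) for p in payload))
        scan += 1
    return to_space, new_roots

# A heap with a live pair (cell 0), a live atom (cell 1), and garbage.
heap = [("pair", (1, 1)), ("atom", 42), ("atom", "never copied")]
new_heap, roots = collect(heap, [0])
```

Only the two reachable cells are copied, and the atom shared by both
fields of the pair is copied once; the unreachable cell is simply never
touched, which is why the cost of a copying collection is proportional
to live data rather than to heap size.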
rbbb@titan.rice.edu@rice.EDU (David Chase) (05/28/87)
In article <4631@columbia.UUCP> yoram@cs.columbia.edu (Yoram Eisenstadter) writes:
>In June 1983, a paper by Lieberman and Hewitt ....
>I have the following questions:
>
>- Does anybody know of any empirical results on the performance
>of this particular algorithm?

The garbage collector on the Symbolics 3600 is (was) similar.  Try
reading "Garbage Collection in a Large Lisp System", by Moon, 1984
SIGPLAN Symposium on LISP and Functional Programming, and "Architecture
of the Symbolics 3600", by Moon, 1985 International Symposium on
Computer Architecture.

>- Does anybody know if anybody ever collected data on the
>following properties of Lisp programs?  (What about
>object-oriented programs?)
>
>  (1) Rate of object creation
>  (2) Average lifetime of objects
>  (3) Proportion of "forward" pointers, which point from
>      one object to a more recently allocated object
>  (4) Locality of reference, i.e., do programs point
>      to "nearby" or "far away" objects

Yes.  Yes.  See:

  Daniel G. Bobrow and Douglas W. Clark, "Compact Encodings of List
  Structure", ACM Transactions on Programming Languages and Systems
  1(2), October 1979.

  Douglas W. Clark, "Measurements of Dynamic List Structure Use in
  Lisp", IEEE Transactions on Software Engineering 5(1), January 1979.

  D. W. Clark and C. C. Green, "An Empirical Study of List Structure in
  LISP", Communications of the ACM 20(2), February 1977.

>- What are the various tradeoffs among garbage collection
>algorithms?  I see lots of systems with "stop-and-copy" garbage
>collectors, even though incremental garbage collectors are much
>more user-friendly.  What is it about incremental collectors that
>prevents everybody from implementing them?  Complexity?
>Inefficiency?
1) Tradeoffs, in no particular order:

     collection delay (essentially zero in a truly incremental system)
     collection "software" overhead
     collection "hardware" overhead
     need for help from the programmer
     language restrictions

Software overhead is the time spent (or not saved) maintaining
invariants on the run-time environment for the sake of the collector.
This includes both tag-bit manipulations and optimizations left undone.

Hardware overhead is the extra memory needed to store tag bits, the
microcode (I'll call it hardware, anyway) assist, and the "barrier" on
the memory system.

Need for help from the programmer: some systems work much better if the
programmer helps out, either by providing clues or by explicitly
reusing storage.

Language restrictions include (for example) the ability to perform
static type-checking and forbidding the use of "upward FUNARGs" (using
a function with lexically bound variables as a return value).  Static
type-checking can permit the use of specially-compiled garbage
collectors and allocation from "pointer-free" storage (which need not
be traced).  Disallowing upward FUNARGs guarantees that activation
records can be stack allocated.  Compiler analysis can also discover
this in many situations.

People are less likely to implement "real-time" collectors (as
described by Baker in CACM 21(4)) because of the overhead placed on CAR
and CDR operations.  This overhead can be shifted to hardware (as in
the Symbolics machines), but then you must pay for the extra hardware.

The problem with special-purpose hardware is that it is, well, special
purpose.  There is less demand for it, and there are fewer things that
you can do with it.  Because there is less demand for it, chip
manufacturers don't work hard to make it go fast.  Look at how the
68000 series has improved over the last 7 years, for example.  With a
Sun 3 and a smart compiler, you can compete with a 3600.  If you decide
that you hate LISP, then there is still a market for used Suns.
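The overhead placed on CAR and CDR can be made concrete with a toy read
barrier of the kind Baker's incremental scheme requires.  This is a
sketch under made-up names (`Cell`, `evacuate`, `barrier_car`), not
Baker's actual design or the Symbolics microcode; it only shows why
every accessor pays a per-access check on stock hardware.

```python
# Toy read barrier: during an incremental collection the mutator may
# see pointers into from-space, so every accessor must check for (and
# perform) evacuation before returning a value.  This per-access check
# is the software overhead that hardware assist is meant to remove.

class Cell:
    def __init__(self, car, cdr):
        self.car, self.cdr = car, cdr
        self.forward = None              # set once the cell is evacuated

def evacuate(obj, from_space, to_space):
    """Return a to-space version of obj, copying it on first touch."""
    if not isinstance(obj, Cell) or obj not in from_space:
        return obj                       # immediate value, or already safe
    if obj.forward is None:              # first access: copy now
        obj.forward = Cell(obj.car, obj.cdr)
        to_space.append(obj.forward)
    return obj.forward

def barrier_car(cell, from_space, to_space):
    # Every CAR pays for the check, and also repairs the slot so that
    # later accesses take the cheap path.
    cell.car = evacuate(cell.car, from_space, to_space)
    return cell.car

# Example: b's car points into from-space; the first CAR evacuates it.
a = Cell(1, None)
b = Cell(a, None)
from_space, to_space = {a}, []
x = barrier_car(b, from_space, to_space)   # copies a into to-space
y = barrier_car(b, from_space, to_space)   # cheap path: already repaired
```

The second access finds the repaired slot and skips the copy, but note
that it still executes the barrier test; it is exactly this test on
every fetch that the special-purpose hardware hides.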
There was also a concurrent collector implemented for Cedar Mesa at
Xerox.  It ran every so often, mostly concurrently with ordinary
execution.  See "On Adding Garbage Collection and Runtime Types to a
Strongly-Typed, Statically-Checked, Concurrent Language", Xerox PARC TR
CSL-84-7.  It helps to first read "An Efficient, Incremental, Automatic
Garbage Collector" by Deutsch and Bobrow, CACM 19(9).

Also, there has been a shift toward RISC machines in the last few
years, and so far those (except for work at UCB) tend not to help with
garbage collection.  The work at Berkeley has been in support of a
not-really-incremental collector that apparently works very, very well.
See David Ungar's article in Proc. of the ACM SIGSOFT/SIGPLAN Software
Engineering Symposium on Practical Software Development Environments.
His collector runs every few seconds, but the collection times are
quite fast (not "real-time", but suitable for interactive use; I've
even met "satisfied customers").

I could say more, but I had better stop now.  There's a LOT to be said
about garbage collection.

Oh yeah, object-oriented statistics: see "Smalltalk-80: Bits of
History, Words of Advice", ed. Glenn Krasner, Addison-Wesley, 1983.

David Chase
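A rough sense of why the pauses in Ungar's collector stay short: each
collection examines only the small new generation, and objects that
survive long enough are promoted out of it.  The sketch below is a
deliberately simplified model of that idea; the tenuring threshold, the
(obj, age) layout, and all names are invented for illustration and are
not Ungar's implementation.

```python
# Toy generation scavenger.  new_space holds (obj, age) pairs; one
# scavenge touches only reachable new-space objects, so its cost is
# proportional to survivors, not to total heap size.  Objects that
# survive enough scavenges are tenured into old_space, which a real
# system collects far less often.

TENURE_AFTER = 2   # scavenges survived before promotion (illustrative)

def scavenge(new_space, old_space, live):
    """One scavenge pass over the new generation."""
    survivors = []
    for obj, age in new_space:
        if obj not in live:
            continue                      # young garbage: reclaimed for free
        if age + 1 >= TENURE_AFTER:
            old_space.append(obj)         # veteran: promote to old generation
        else:
            survivors.append((obj, age + 1))
    return survivors

# Example: "c" dies young, "b" dies later, "a" survives and is tenured.
new_space = [("a", 0), ("b", 0), ("c", 0)]
old_space = []
new_space = scavenge(new_space, old_space, live={"a", "b"})
new_space = scavenge(new_space, old_space, live={"a"})
```

Because most objects die young, most of each pass consists of *not*
copying things, which is what keeps the per-collection pause down to
the fraction of a second reported for Berkeley Smalltalk.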
pierson@encore.UUCP (05/29/87)
The main followup work appears to have been done in the Smalltalk
world.  The SOAR project at Berkeley developed a "Generation
Scavenging" garbage collector based on the same idea as Lieberman and
Hewitt's.  It was published a few years ago (I don't have the reference
with me).  The SOAR project decided that many of their ideas failed.
Does anyone know how they felt about this one?

The latest Symbolics garbage collector is also supposed to use object
lifetime as part of its strategy.
-- 
dan
{talcott,linus,vaxine,axiom,necis,decvax,ihnp4,allegra}!encore!pierson
bs@hplabsc.UUCP (Bob Shaw) (05/29/87)
Generation collectors can be more "real-time" than real-time
collectors.

In discussing garbage collection techniques, a point has been raised
which I'm quite curious about.  The term "real-time" has been used in
all of the articles I've seen in the discussion.  What is the nature of
the "real-time" response that people are seeking?  Is it simply a
matter of not wanting a several-second (or several-minute) delay in
response while editing a program, or are we talking about the need for
smooth 30-frame-per-second animation, or is it motivated by the need to
respond to asynchronous events within microseconds of the event?  Also,
is this "real-time" behaviour sustained (e.g., as in a heart arrhythmia
monitor) or bursty (e.g., as in gathering data from a two-second car
barrier crash)?

Depending on the definition of "real-time" used, generation schemes can
easily match or exceed the performance of the real-time collectors
(especially in bursty situations).  In Baker's paper (Comm. of the ACM,
April 1978), when talking about adding vectors and arrays to his model,
he states: "However, the method can no longer claim to be real-time
because neither the time taken by the array allocation (ARRAY-CONS) nor
the time taken by the array element accessing function is bounded by a
constant."  In practice, it is possible to bound these operations by
placing a limit on the size of allowable structures.

Later in the same paper, Baker comments: "Our scheme is still real-time
on a virtual memory computer, but the bounds on the elementary list
operations now have the order of magnitude of secondary storage
operations."  Thus, given systems with vectors and virtual memory, the
bound on delays is now in the range of tens of milliseconds (for common
secondary storage technology).
In a 1984 Lisp Conference paper, Dave Moon described some users'
reactions to the original incremental (real-time) collector implemented
on the 3600s: "They found that garbage collection made interactive
response so poor and unpredictable, and its overhead was so high, that
it was preferable to turn off garbage collection and simply wait for
the inevitable exhaustion of virtual address space (at which point the
system crashes and must be re-booted)."

According to Dave Ungar's dissertation (Berkeley Tech Report UCB/CSD
86/287), on a 68010-class machine running Berkeley Smalltalk, the mean
time between Generation Scavenging collections was 16 seconds, and the
pauses due to collections ranged from 90 to 330 milliseconds with a
mean of 160 milliseconds.  In all fairness, his algorithm is designed
to minimize VM paging, and these numbers do not include the initial
paging necessary to bring the pages in the first time.  Nonetheless,
most people would have a tough time spotting a 0.16-second delay during
an editing session.

I'd like to propose that "real-time" be replaced by "incremental" in
describing the algorithms, since the bounding values can be large and
difficult to establish.  For many situations, the performance of
Generation schemes is such that a user would not be able to determine
whether a Generation or "real-time" algorithm was in use.  Do people
really desire guaranteed time bounds on garbage collection?

Over and out,
Bob Shaw
HP Labs
bs@hplabs.hp.com

P.S.  For those interested, I've written a tech report describing a
Generation scheme that runs on stock hardware with no generation tags
and essentially no runtime slowdown.  It uses a simple heap layout and
information from the virtual memory system.  In the worst case, it
degenerates into a conventional stop-and-copy algorithm.  Empirical
data about Lisp data lifetimes (boy, are they short!) is also included.
It is available as Stanford Technical Report CSL-TR-87-323 and entitled, "Improving Garbage Collector Performance in Virtual Memory."