wilson@carcoar.Stanford.EDU (Paul Wilson) (02/23/89)
In an earlier posting to comp.arch, I asked if anybody was trying to keep the Newest generation of a generational gc entirely within a cache. I suggested a small (about 50 KB) generation with a large (about 100KB) cache. The earlier posting didn't make clear why I think this is a good idea -- one response said that the cache misses caused by a scavenge would be no big deal compared to the usual costs of actually running the user program. Actually, what I meant to say is that keeping the user program in cache should keep the user program from having many cache misses. (In exactly the same sense that generational gc's improve locality for the running program, at the scale of main memory. Only on a smaller scale.) My reasoning is this: Lisp and Smalltalk allocate huge amounts of short-lived data. In the absence of garbage collection, the cache will tend to be full of recently-allocated (hence recently touched) data. And it will always be allocating new data in new locations. This will cause continual cache misses, to bring the newly-allocated locations into the cache. In addition, this will cause some older data to be bumped out of the cache; you lose coming and going. (The old data will generally be things that were allocated a couple of seconds before. Since they were given new values on allocation, they *will* be dirty and they *will* be written out. Yuck.) To give this a (VERY APPROXIMATE) quantitative flavor, suppose you have a 128KB cache, and a 5 MIPS processor; suppose also that you have a 20% load ratio, 1% miss rate, and uniformly distributed misses, for an average cache lifetime of about 3 seconds. (This VERY off-the-cuff supposition is due to Doug Johnson, who should not be held accountable. Feel free to suggest better numbers.) Now suppose you try to run Lisp instead of "normal" programs. Suppose that on your 5 MIPS processor a compute-bound Lisp program allocates 50 to 100 KB of data per second. (This not-quite off-the-cuff supposition comes from the programs Bob Shaw analyzed in his Stanford dissertation last year. His programs allocated between 40 KB (or thereabouts) and over 200 KB (!) per second on a 4 MIPS processor.) At 50 KB/s allocation, cache misses due to sheer allocation will be competitive with the usual kinds of cache misses. Over 100 KB/s, they will dominate. It seems to me that this will greatly decrease the usefulness of a cache in a Lisp system. Two strategies for dealing with these costs: 1) Keep the newest generation so small that it lives in the cache, reusing the same locations over and over again. 2) Tell the cache not to bother to swap in blocks that we're allocating fresh, so at least we don't swap in garbage only to initialize it as new objects with new values. (I believe the Symbolics does something similar with paging (?), "creating" pages rather than reusing them. That way, the VM system is free to reuse any page frame rather than swapping garbage in from disk.) The big questions are whether (1) is worth the cost and whether (2) is possible in a given hardware configuration. (1) does seem reasonable to me, again based on data from Bob Shaw's dissertation. I calculate that even with a 32 KB newest generation, his programs will only spend between about 1% and 4% of their time copying data. (Using Ungar's measurements that say you can copy about a half a megabyte per second on a ~4 MIPS 68020, and assuming you advance things to the next generation if they survive one scavenge.) If that's representative, then the question reduces to whether a two or three percent cost in copying is repaid by our improved cache hit rate. My guess is yes. Anybody else care to guess? Anybody got numbers that will decide the issue? About point (2), Allen Baum sez both IBM and HP have patents on caches that could support this sort of thing (e.g., caches exposed to software control). So is anybody actively researching/developing such things for garbage collected systems? (BTW, it seems that such a strategy is going to have big problems with multitasking. Looking at Hennessy et al.'s recent (TOPLAS?) paper, it didn't appear that his locality measurements were for any Lisps with neat locality tricks. (Most of their measurements were for non-Lisp programs.) It seems that gc strategies introduce a bunch of new parameters that all could affect locality in funky ways.) -- Paul Paul R. Wilson Human-Computer Interaction Laboratory lab ph.: (312) 413-0042 U. of Ill. at Chi. EECS Dept. (M/C 154) Box 4348 Chicago,IL 60680 wilson@carcoar.stanford.edu Paul R. Wilson Human-Computer Interaction Laboratory lab ph.: (312) 413-0042 U. of Ill. at Chi. EECS Dept. (M/C 154) Box 4348 Chicago,IL 60680 wilson@carcoar.stanford.edu