aarons@syma.sussex.ac.uk (Aaron Sloman) (03/15/90)
wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
> Date: 13 Mar 90 01:11:46 GMT
> Reply-To: wilson@carcoar.Stanford.EDU (Paul Wilson)
>
> As somebody who's written a state-of-the-art gc (in some respects, at
> least), ...
> ... Don't expect a fast language on stock hardware to have gc speed
> costs of less than about 10% on average. I think in the 8-15 percent
> range myself. ...
>
> Paul R. Wilson
> Software Systems Laboratory          lab ph.: (312) 996-9216
> U. of Illin. at C. EECS Dept. (M/C 154)  wilson@bert.eecs.uic.edu
> Box 4348 Chicago, IL 60680

I thought 8-15 percent was rather high, so I did some experiments in
Pop-11 (a Lisp-like language with a Pascal-like syntax) in the Poplog
system on a Sun4/260 (stock hardware, I think).

I ran Poplog with the heap limit set to vary between 800,000 bytes and
2,000,000 bytes, and then did a range of things, including: running the
integrated editor on some fairly large files; using the incremental
compiler to re-compile a little 1700-line expert system shell several
times; running a small planning program that uses the shell several
times; computing factorial(1000) several times (to generate lots of
"big" integers and so force garbage collections); and doing some double
precision decimal arithmetic (which causes garbage collections) and
some single precision (which doesn't). At various times I recorded the
ratio of GC time to total CPU time. It varied between 1.5% and 3.2%.

Typical times for individual garbage collections were about 0.15
seconds - obviously it would have been more for a larger process, or
less if I had "locked" the part of the heap containing non-garbage.
Also, by allocating more heap space I could reduce the frequency of
garbage collections and so bring the percentage of GC time down.

These times were obtained using the default Poplog garbage collector,
which uses the "stop-and-copy" method.
Had I forced it to use the non-copying ("compacting") garbage collector
(which is invoked when there is not enough memory or swap space for the
copying mechanism), it would have been a bit slower (e.g. 0.22 seconds
per GC). It would also have slowed down had there been more data
structures: e.g. a program with about 2 Mbytes in the heap might spend
a couple of seconds on a garbage collection. The actual time would
depend on whether there's enough physical memory to make paging
unnecessary.

The above times compare quite well with stories about having to stop
for coffee whenever a Lisp machine has a garbage collection, and of
course as "stock" hardware gets faster, the GC times will reduce,
though not necessarily the percentage.

Aaron Sloman,
School of Cognitive and Computing Sciences,
Univ of Sussex, Brighton, BN1 9QH, England
EMAIL   aarons@cogs.sussex.ac.uk
    or: aarons%uk.ac.sussex.cogs@nsfnet-relay.ac.uk
        aarons%uk.ac.sussex.cogs%nsfnet-relay.ac.uk@relay.cs.net
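For concreteness, the "stop-and-copy" method Aaron mentions can be sketched in a few lines of Python. This is an illustration of the general Cheney-style two-space technique, not Poplog's actual collector; the cell and pointer representation (cells as lists, pointers as `('ptr', index)` tuples) is invented for the example.

```python
# Sketch of a Cheney-style stop-and-copy collector (illustrative only;
# not Poplog's implementation). A heap is a list of cells; each cell is
# a list of fields; a field is either an immediate value or a pointer
# written as ('ptr', index-into-the-current-semispace).

def collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    Returns (to_space, new_roots). Unreachable cells are simply left
    behind in from_space - that is the whole trick: GC cost is
    proportional to live data, not to garbage.
    """
    to_space = []
    forward = {}                     # from-space index -> to-space index

    def evacuate(ptr):
        if ptr in forward:           # already copied
            return forward[ptr]
        new = len(to_space)
        to_space.append(list(from_space[ptr]))   # shallow copy of the cell
        forward[ptr] = new
        return new

    # Evacuate the roots, then scan to-space breadth-first: the `scan`
    # pointer chases the allocation pointer (Cheney's algorithm).
    new_roots = [evacuate(r) for r in roots]
    scan = 0
    while scan < len(to_space):
        cell = to_space[scan]
        for i, f in enumerate(cell):
            if isinstance(f, tuple):             # a pointer field
                cell[i] = ('ptr', evacuate(f[1]))
        scan += 1
    return to_space, new_roots
```

After a collection the two semispaces swap roles, which is why a copying collector needs enough memory (or swap) for both - the condition under which Poplog falls back to its compacting collector.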
jack@cs.glasgow.ac.uk (Jack Campin) (03/15/90)
aarons@syma.sussex.ac.uk (Aaron Sloman) wrote:
> wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
>> Don't expect a fast language on stock hardware to have gc speed
>> costs of less than about 10% on average.
> [Poplog] varied between 1.5% and 3.2%.

PS-algol uses about 2.9%, according to some measurements done a couple
of years ago.

--
Jack Campin  Computing Science Department, Glasgow University,
17 Lilybank Gardens, Glasgow G12 8QQ, Scotland
041 339 8855 x6044 work  041 556 1878 home
JANET: jack@cs.glasgow.ac.uk  BANG!net: via mcvax and ukc
FAX: 041 330 4913  INTERNET: via nsfnet-relay.ac.uk
BITNET: via UKACRL  UUCP: jack@glasgow.uucp
wilson@carcoar.Stanford.EDU (Paul Wilson) (03/16/90)
In article <4836@vanuata.cs.glasgow.ac.uk> jack@cs.glasgow.ac.uk (Jack
Campin) writes:
>aarons@syma.sussex.ac.uk (Aaron Sloman) wrote:
>> wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
>>> Don't expect a fast language on stock hardware to have gc speed
>>> costs of less than about 10% on average.
>> [Poplog] varied between 1.5% and 3.2%.
>
>PS-algol uses about 2.9% according to some measurements done a couple
>of years ago.

Again, we may be having a little misunderstanding (generational vs.
not, etc.), but I find this interesting. Questions:

  1) How fast is PS-Algol, flat out?  Less-than-great code quality can
     flatter gc performance.
  2) What kind of gc was used?
  3) What kind of programs were used for testing?
  4) Did this involve persistent objects?

-- Paul the garbageman

Paul R. Wilson
Software Systems Laboratory          lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)  wilson@bert.eecs.uic.edu
Box 4348 Chicago, IL 60680
jack@cs.glasgow.ac.uk (Jack Campin) (03/17/90)
wilson@carcoar.Stanford.EDU (Paul Wilson) wrote:
> jack@cs.glasgow.ac.uk (Jack Campin) writes:
>> aarons@syma.sussex.ac.uk (Aaron Sloman) wrote:
>>> wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
>>>> Don't expect a fast language on stock hardware to have gc speed
>>>> costs of less than about 10% on average.
>>> [Poplog] varied between 1.5% and 3.2%.
>> PS-algol uses about 2.9% according to some measurements done a
>> couple of years ago.

> 1) how fast is PS-Algol, flat out? Less-than-great code quality can
>    flatter gc performance.

Very fast for an interpreted language - much better than any Smalltalk
I've seen (Smalltalk is probably the closest familiar programming
environment). There is a much faster native-code version used by ICL on
their mainframes; I don't have any performance figures on that (none
that I can give out, anyway). Maybe comparable to a good interpreted
Lisp, but that's guesswork. I wasn't surprised that Poplog led to
similar figures; has anyone got a nice closure-bashing benchmark that
we could try in Smalltalk, Scheme, Poplog and PS-algol to firm this up
a bit?

> 2) what kind of gc was used?

Sliding compaction; Deutsch-Schorr-Waite or Morris, I forget which.
Later versions add a feature that makes a big difference: sometimes
treating read-only persistent objects as garbage (you can always get
them back from disk later if this turns out to be wrong). This can
reduce heap size drastically.

> 3) what kind of programs were used for testing?

Recompiling the compiler.

> 4) did this involve persistent objects?

Yes, but that doesn't make much difference to in-core gc, except for
the point mentioned in 2). The garbage collector only needs to test
bit 31 of a 4-byte address to tell persistent and virtual-memory
pointers apart. Persistent storage garbage collection is a different
kettle of ballgames: a gc of our persistent store (~100 MB of data,
running on fast Suns) takes about a lunchbreak, and is only done every
few weeks.
--
Jack Campin  Computing Science Department, Glasgow University,
17 Lilybank Gardens, Glasgow G12 8QQ, Scotland
041 339 8855 x6044 work  041 556 1878 home
JANET: jack@cs.glasgow.ac.uk  BANG!net: via mcvax and ukc
FAX: 041 330 4913  INTERNET: via nsfnet-relay.ac.uk
BITNET: via UKACRL  UUCP: jack@glasgow.uucp
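Two of Jack's points can be made concrete with a small Python sketch: the one-bit test that separates persistent-store pointers from ordinary heap pointers, and a sliding (order-preserving) compaction. The compaction below is a simple table-based "Lisp-2" variant, not the Deutsch-Schorr-Waite or Morris algorithms he names, and the heap/pointer representation is invented for illustration.

```python
PERSISTENT_BIT = 1 << 31   # high bit set => persistent-store address

def is_persistent(addr):
    """Jack's trick: one bit test on a 4-byte address distinguishes
    persistent-store pointers from virtual-memory pointers."""
    return bool(addr & PERSISTENT_BIT)

def mark(heap, roots):
    """Mark phase: find all live in-core cells, never following
    pointers out into the persistent store."""
    live, stack = set(), [r for r in roots if not is_persistent(r)]
    while stack:
        a = stack.pop()
        if a in live:
            continue
        live.add(a)
        for f in heap[a]:
            if isinstance(f, tuple) and not is_persistent(f[1]):
                stack.append(f[1])
    return live

def slide(heap, roots):
    """Sliding compaction: live cells keep their relative order and
    move down toward address 0.  Note the several passes over the
    data - the cost Wilson points out later in the thread."""
    live = mark(heap, roots)
    new_addr, next_free = {}, 0
    for a in range(len(heap)):          # pass 1: forwarding addresses
        if a in live:
            new_addr[a] = next_free
            next_free += 1
    for a in live:                      # pass 2: fix up pointer fields
        cell = heap[a]
        for i, f in enumerate(cell):
            if isinstance(f, tuple) and not is_persistent(f[1]):
                cell[i] = ('ptr', new_addr[f[1]])
    compacted = [heap[a] for a in sorted(live)]   # pass 3: slide down
    new_roots = [r if is_persistent(r) else new_addr[r] for r in roots]
    return compacted, new_roots
```

Unlike stop-and-copy, this needs no second semispace, which is exactly why Poplog (and presumably PS-algol) falls back to compaction when memory is tight.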
wilson@carcoar.Stanford.EDU (Paul Wilson) (03/20/90)
In article <4862@vanuata.cs.glasgow.ac.uk> jack@cs.glasgow.ac.uk (Jack
Campin) writes:
>wilson@carcoar.Stanford.EDU (Paul Wilson) wrote:
>> 1) how fast is PS-Algol, flat out? Less-than-great code quality can
>>    flatter gc performance.
>
>Very fast for an interpreted language. Much better than any Smalltalk
>I've seen (Smalltalk is probably the closest familiar programming
>environment).

Okay, that makes sense. Not to demean either Smalltalk or PS-Algol
(two languages I like a lot), but this doesn't sound really fast, by
my standards. I'm thinking of things like T (the object-oriented
extended Scheme from Yale) and commercial Common Lisps, which have
optimizing native-code compilers. (Though I'm actually unclear on
current CL speeds.) Self seems to be approaching this kind of
performance, and the techniques involved should work at least as well
for future-generation Smalltalks.

And the fast implementations should get even faster over time. GC's
will have to keep up. I'd guess that if you worked at it, a good
native-code compiled PS-Algol would run several times as fast, and
your gc costs, as a percentage of run time, would be several times
higher. This may or may not be a big issue for PS-Algol, depending on
whether typical applications are compute-bound or I/O-bound.

>There is a much faster native-code-generated version used by ICL on
>their mainframes; I don't have any performance figures on this (that
>I can give out, anyway). Maybe comparable to a good interpreted Lisp,
>but that's guesswork. I wasn't surprised that Poplog led to similar
>figures on something; anyone got a nice closure-bashing benchmark
>that we could try in Smalltalk, Scheme, Poplog and PS-algol to firm
>this up a bit?

There's a closure-intensive benchmark or two in the appendix to David
Kranz' thesis on Orbit (the T compiler) from Yale. I don't know
whether that's the right thing to test, though. It's hard to come up
with a fair closure benchmark that will work across languages. In
some languages you might use closures in certain situations, but in
others you might not, even if they're available. For example, in
vanilla Scheme (or PS-Algol) you might use closures to implement
objects (which can get you in trouble performance-wise -- there's
another subtle interaction with generational gc's). But in T, a full
object system is provided, so you'd use that instead.

>> 2) what kind of gc was used?
>
>Sliding compaction; Deutsch-Schorr-Waite or Morris, I forget which.

I take it this is not a generational gc, though you're probably
treating the persistent store as a kind of separate entity. Also, for
programs with a lot of live data at gc time, sliding compaction,
etc., can be pretty slow -- several passes over the data.

>Later versions add a feature that makes a big difference: sometimes
>treating read-only persistent objects as garbage (you can always get
>them back from disk later if this is wrong). This can reduce heap
>size drastically.

That seems very reasonable.

>> 3) what kind of programs were used for testing?
>
>Recompiling the compiler.

Depending on your compiler, this may or may not be a reasonable test.
For our bytecode compiler, our generational gc is extremely
effective. If the first generation is of reasonable size, nothing
except the actual generated code (and associated data structures like
literal frames) survives to be advanced out of the first generation.

When the compiler starts compiling modules instead of just
independent procedures, we expect the copying costs to go up
significantly. (At a gc, we'll typically have some intermediate data
structures of a whole module to copy, as well as those involved in
compiling a single procedure.) Further in this direction is Lucid's
Common Lisp compiler, which may have a *megabyte* of interprocedural
analysis goop lying around when doing an optimized system build. For
that kind of "serious" stuff you'd want a serious generational gc.

>> 4) did this involve persistent objects?
>
>Yes, but that doesn't make much difference to in-core gc, except for
>the point mentioned in 2). The garbage collector only needs to test
>bit 31 of a 4-byte address to tell persistent and virtual-memory
>pointers apart. Persistent storage garbage collection is a different
>kettle of ballgames. Time for a lunchbreak with our persistent store
>(~100 MB of data, gc running on fast Suns). Only done every few
>weeks.

I seem to recall that PS-Algol used to have a rather strange
allocation/gc scheme, with objects allocated in areas by type for
locality reasons. Were any locality effects ever measured? (Also, I
seem to recall someone saying that there was a possibility of storage
leaks in the P-store, under certain circumstances, or something like
that. Is the same scheme in use, or has it been modified?)

-- Paul

Paul R. Wilson
Software Systems Laboratory          lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)  wilson@bert.eecs.uic.edu
Box 4348 Chicago, IL 60680
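The generational effect Wilson describes - short-lived compiler temporaries dying in the first generation while generated code is advanced - can be sketched in a toy Python model. This is an illustration of the idea, not his collector; the class and object names are invented, and the real remembered-set machinery is simplified to a full scan of the old generation.

```python
class TwoGenHeap:
    """Toy two-generation heap.  New objects go in the nursery; a
    minor collection promotes only the objects still reachable, so
    temporaries die cheaply and only long-lived data pays copying
    costs."""

    def __init__(self):
        self.nursery = {}        # name -> set of referenced names
        self.old = {}
        self.roots = set()

    def alloc(self, name, refs=()):
        self.nursery[name] = set(refs)

    def minor_collect(self):
        """Collect the nursery only.  Returns how many objects were
        promoted ('advanced out of the first generation')."""
        # Nursery objects are live if reachable from the roots or
        # referenced from the old generation (a real collector would
        # keep a remembered set rather than scanning all of old space).
        reachable, stack = set(), list(self.roots)
        for refs in self.old.values():
            stack.extend(refs)
        while stack:
            n = stack.pop()
            if n in reachable or n not in self.nursery:
                continue
            reachable.add(n)
            stack.extend(self.nursery[n])
        promoted = {n: self.nursery[n] for n in reachable}
        self.old.update(promoted)    # advance the survivors
        self.nursery.clear()         # everything else reclaimed at once
        return len(promoted)
```

In Wilson's bytecode-compiler scenario, only the generated code and its literal frames would show up as promoted here; the parse trees and other intermediate structures simply vanish with the nursery.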
jack@cs.glasgow.ac.uk (Jack Campin) (03/21/90)
wilson@carcoar.Stanford.EDU (Paul Wilson) wrote:
>> anyone got a nice closure-bashing benchmark that we could try
>> in Smalltalk, Scheme, Poplog and PS-algol to firm this up a bit?
> There's a closure-intensive benchmark or two in the appendix to
> David Kranz' thesis on Orbit (the T compiler) from Yale. I don't
> know whether that's the right thing to test, though. It's hard to
> come up with a fair closure benchmark that will work across
> languages. In some languages you might use closures in certain
> situations, but in others you might not, even if they're available.
> For example, in vanilla Scheme (or PS-Algol) you might use closures
> to implement objects (which can get you in trouble performance-wise
> -- there's another subtle interaction with generational gc's). But
> in T, a full object system is provided, so you'd use that instead.

I don't see why closures and explicitly created objects should affect
gc techniques differently. The way PS-algol does closures certainly
can, but that's a detail of implementation. (Too much is retained,
since the abstract machine objects that hold the environment of a
partially applied procedure are way too big and flat.) As the storage
system sees it, closures and objects are the same thing.

>>> what kind of programs were used for testing?
>> Recompiling the compiler.
> Depending on your compiler, this may or may not be a reasonable
> test. For our bytecode compiler, our generational gc is extremely
> effective. If the first generation is of reasonable size, nothing
> except the actual generated code (and associated data structures
> like literal frames) survives to be advanced out of the first
> generation. When the compiler starts compiling modules instead of
> just independent procedures, we expect the copying costs to go up
> significantly. (At a gc, we'll typically have some intermediate
> data structures of a whole module to copy, as well as those
> involved in compiling a single procedure.)
> Further in this direction is Lucid's Common Lisp compiler, which
> may have a *megabyte* of interprocedural analysis goop lying around
> when doing an optimized system build. For that kind of "serious"
> stuff you'd want a serious generational gc.

The test I was quoting would have taken 12MB of heap to run without
gc; the heap was constrained to be 1MB. The compiler takes a fairly
simple-minded approach and passes the entire parse tree between the
parser and code generator in one lump. How does this relate to your
setup?

> I seem to recall that PS-Algol used to have a rather strange
> allocation/gc scheme, with objects allocated in areas by type for
> locality reasons. Were any locality effects ever measured? (Also,
> I seem to recall someone saying that there was a possibility of
> storage leaks in the P-store, under certain circumstances, or
> something like that. Is the same scheme in use, or has it been
> modified?)

This is correct. The persistent store is partitioned into containers
(mis)named "databases". The idea of this model is that programmers
would control inter-database references so that the databases formed
a directed acyclic graph; the leaf databases could then be
garbage-collected individually without disturbing the others. I
implemented a persistent store garbage collector that took advantage
of this; it permitted storage leaks, but these were completely
avoidable by any sane PS-algol programmer. The locality effects are
thus very dependent on what people do with the system.

The problem was getting our users to realize the persistent store
implications of their programs' name structures. Let enough naive
student hackers loose with no documentation, and inter-database
references will rapidly degenerate into a spaghetti mess that can
only be globally gc'd. We have another gc that does mark/sweep on the
whole persistent universe - obviously this is slower - and we didn't
have much option but to use it instead (this one does clear up the
storage leaks).
Forethought and a tougher attitude to system administration would
have prevented this debacle - but since nobody else at the time had a
persistent store capable of accumulating a ragbag of
inter-referential student projects for several years, we couldn't
predict what would happen.

A final question: how do object sizes affect gc techniques? Most of
the fans of generational methods seem to be LISPers, whose objects
are mostly the same size. PS-algol object sizes are more or less
Poisson-distributed (mean 50-60 bytes), and I suspect that Smalltalk
and Poplog ones are too.

--
Jack Campin  Computing Science Department, Glasgow University,
17 Lilybank Gardens, Glasgow G12 8QQ, Scotland
041 339 8855 x6044 work  041 556 1878 home
JANET: jack@cs.glasgow.ac.uk  BANG!net: via mcvax and ukc
FAX: 041 330 4913  INTERNET: via nsfnet-relay.ac.uk
BITNET: via UKACRL  UUCP: jack@glasgow.uucp
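The per-database discipline Jack describes can be modelled in a few lines of Python. One reading of his scheme - and it is only my reading, not a detail from the PS-algol implementation - is that a database is safe to garbage-collect in isolation when no other database holds references into it; once the students' spaghetti references appear, nothing qualifies and only the global mark/sweep works. The `db_graph` representation and function names below are invented for the sketch.

```python
def safe_to_collect_alone(db_graph, db):
    """db_graph maps each database name to the set of databases it
    holds references into.  A database with no incoming inter-database
    references can be traced from its own roots alone, so its garbage
    can be reclaimed without scanning the rest of the store."""
    return all(db not in targets
               for other, targets in db_graph.items()
               if other != db)

def standalone_collectable(db_graph):
    """All databases that can currently be gc'd individually.  When
    inter-database references degenerate into a cycle-ridden mess,
    this set is empty and only a global mark/sweep will do."""
    return {db for db in db_graph
            if safe_to_collect_alone(db_graph, db)}
```

The point of the DAG discipline is exactly to keep this set large, so that the expensive whole-universe collection stays a once-in-a-while event.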