wilson@carcoar.Stanford.EDU (Paul Wilson) (03/16/90)
In article <2356@syma.sussex.ac.uk> aarons@syma.sussex.ac.uk (Aaron Sloman) writes:
> I (Paul Wilson) write:
>
>> ...................Don't expect a fast language on stock
>> hardware to have gc speed costs of less than about 10% on average.
>> I think in the 8-15 percent range myself.
> ..........
>
>I thought 8-15 percent was rather high, so I thought I would do some
>experiments in Pop-11 (a Lisp-like language with a Pascal-like
>syntax) in the Poplog system on a Sun4/260 (stock hardware I think).
>
> [ ...description of programs run, short resulting gc times...]

Two points: I probably didn't make it clear in my posting that I was
referring to *generational* gc's.  They have a higher base overhead, but
perform better when faced with large, data-hungry applications.  (They
also tend to have short pause times that are less likely to interfere
with interactive response.)  Also, I'm unclear about the system you're
running -- I don't know much about Pop.

The programs you're testing appear not to stress a gc much.  This is
likely because they're short and they don't ever have much live data.

Running a small program several times is NOT at all like running a big
program once.  Such experiments can be *very* misleadingly optimistic.
(That's one of the reasons why I'm trying to collect scalable (Scheme)
programs for benchmarking gc's on different problem sizes.)

Think about it this way.  After every execution of a program, all of its
intermediate data objects, and probably its results, are garbage.  They
will incur _no_ copying cost.  If you execute small programs repeatedly
between gc's, you generally only copy live data for those few iterations
that a gc happens to occur during.  Objects from other iterations are
all dead.

Running programs that do a lot of number consing is particularly bad,
because those objects are typically short-lived and will flatter your
gc efficiency.  (Unless you store pointers to the heap-allocated numbers
into objects in an older generation.  Then your performance can get bad
fast.)

Garbage collection times correspond roughly to the average amount of
data that is live during execution.  If you run a benchmark a bunch of
times between scavenges, the raw memory usage scales up, but the copying
cost does not.  This creates an illusion that the gc is working very
well.  In real long-running programs, the amount of live data *is*
typically larger, sometimes dramatically larger, and the copying cost
goes up significantly.

A big hunk of the cost of a generational gc is incurred _during_
_program_execution_, not just at scavenges.  (Stores into the heap have
to be checked and/or recorded, to keep track of intergenerational
pointers.)  So if you just record pause times, you miss this component
of gc cost entirely.  For little programs with little live data, this
is likely to be the major cost of generational garbage collection.  (I
estimate a minimum of about 3%, but probably more like 6%, and maybe
more, for fast optimized code.)  Again, you can get _very_optimistic_
estimates of gc performance if you ignore this cost.  The right way to
test this is to recompile the code, leaving out the checking/marking
code emitted for generational gc, and see how much faster your programs
run without that continual overhead.

(But maybe Pop-11 doesn't have a generational gc.  If not, I guess it
doesn't have a lot of heap-resident system stuff like big editors,
browsers and compilers; that would increase the pause times.)

   -- Paul

Paul R. Wilson
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C.  EECS Dept. (M/C 154)  wilson@bert.eecs.uic.edu
Box 4348   Chicago, IL 60680
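To make the continual-overhead point concrete, here is a minimal C
sketch of the kind of write barrier a generational collector imposes on
every pointer store.  It is an illustration only, not the code of Poplog
or any particular system; the names, the address-range test, and the
fixed-size remembered set are all invented for the example.

    /* Sketch of a generational-gc write barrier (all names invented).
     * Every store of a pointer into a heap object pays this check during
     * ordinary execution -- the cost that pause-time measurements miss. */
    #include <stddef.h>
    #include <stdint.h>

    typedef struct object { struct object *fields[4]; } object;

    /* Assume the young generation occupies one contiguous address range. */
    static uintptr_t young_lo, young_hi;        /* set up by the allocator */
    #define IS_YOUNG(p) ((uintptr_t)(p) >= young_lo && (uintptr_t)(p) < young_hi)

    /* Remembered set: old objects that may point into the young generation. */
    static object *remembered[4096];
    static size_t  n_remembered;

    /* All pointer stores go through this instead of a bare assignment. */
    static void write_field(object *obj, size_t i, object *val)
    {
        obj->fields[i] = val;
        if (!IS_YOUNG(obj) && val != NULL && IS_YOUNG(val)
            && n_remembered < 4096)
            remembered[n_remembered++] = obj;  /* extra root for next scavenge */
    }

At the next scavenge the collector treats the remembered objects as
extra roots, so it never has to scan the whole old generation to find
old-to-young pointers.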
gza@mentor.cc.purdue.edu (William R Burdick) (03/20/90)
Dave Ungar's paper in OOPSLA 89 (the first paper in the book; I forget
the title, but it starts with "Tenuring") is about a scheme to tenure
objects based on continual feedback from the garbage collector.  He uses
only 2 generations (young and old) and claims an *average* of 3%
overhead for interactive programs in Smalltalk-80 (which generates
garbage much faster than LISP -- I've heard estimates of 10 times that
of LISP).

For details about how his scheme works, you can also look at his paper
in ACM Software Engineering Notes, May 1984 (pages 157-167).  Or, if
enough people want it, I'll post a summary of the techniques.

--
-- Bill Burdick
   burdick@cello.ecn.purdue.edu
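The feedback idea can be caricatured in a few lines of C.  This is only
a rough reading of the scheme for illustration, not Ungar's code; the
age limit, the byte budget, and the names are all invented.

    /* Rough sketch of feedback-mediated tenuring (invented constants).
     * After each scavenge, compare the volume of survivors against a
     * pause-time budget and move the tenuring-age threshold accordingly. */
    #include <stddef.h>

    #define MAX_AGE          16           /* scavenges an object may survive  */
    #define SURVIVOR_BUDGET  (64 * 1024)  /* bytes of survivors we'll tolerate */

    size_t bytes_surviving_at_age[MAX_AGE];  /* filled in by the scavenger */
    int    tenure_age_threshold = MAX_AGE;   /* start out tenuring nothing */

    void adjust_tenuring_after_scavenge(void)
    {
        size_t total = 0;
        for (int age = 0; age < MAX_AGE; age++)
            total += bytes_surviving_at_age[age];

        if (total <= SURVIVOR_BUDGET) {
            /* Under budget: tenure less, so short-lived data isn't promoted. */
            if (tenure_age_threshold < MAX_AGE)
                tenure_age_threshold++;
            return;
        }

        /* Over budget: find the smallest age threshold that, by promoting
         * everything at or above it, brings survivors back under budget. */
        size_t kept = total;
        int age = MAX_AGE;
        while (age > 1 && kept > SURVIVOR_BUDGET)
            kept -= bytes_surviving_at_age[--age];
        tenure_age_threshold = age;
    }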
aarons@syma.sussex.ac.uk (Aaron Sloman) (03/20/90)
wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
> Date: 15 Mar 90 23:13:31 GMT
>
> In article <2356@syma.sussex.ac.uk> aarons@syma.sussex.ac.uk (Aaron Sloman)
> writes:
> > I (Paul Wilson) write:
> >
> >> ...................Don't expect a fast language on stock
> >> hardware to have gc speed costs of less than about 10% on average.
> >> I think in the 8-15 percent range myself.
> > ..........
> >(Sloman:)
> >I thought 8-15 percent was rather high, so I thought I would do some
> >experiments in Pop-11 (a Lisp-like language with a Pascal-like
> >syntax) in the Poplog system on a Sun4/260 (stock hardware I think).
> >
> > [ ...description of programs run, short resulting gc times...]
>
> (Wilson:)
> Two points: I probably didn't make it clear in my posting that I was
> referring to *generational* gc's.

I did not realize that was your intention.  Poplog does not at present
have a generational gc.  It has a stop-and-copy gc for use when there's
lots of memory/swap space available, and a shuffling Morris gc for use
when there isn't, or paging is too slow.  It also allows the heap to be
"locked": e.g. if you have compiled a very large suite of programs, you
can lock the heap after compilation, and then only a small amount of
store will need to be copied on each GC, though all of it still has to
be scanned.  (This, I believe, produces the advantages of a generational
GC, except that it isn't automated.)

Poplog's heap can have "holes" in it for items allocated by external
programs linked in (e.g. C, Fortran).  It can also store non-relocatable
Poplog data in such holes, e.g. for use in connection with external
programs.

> ........                          Also, I'm unclear about the
> system you're running -- I don't know much about Pop.
>
> The programs you're testing appear not to stress a gc much.

It's hard to stress the Poplog GC as it is very fast.  So only frequent
GC's cause problems, and, as I said, you can do things to reduce their
frequency, like expanding heap space.

> ...                                                     This is
> likely because they're short and they don't ever have much live
> data.

One reason I chose compilation for part of the test is that the amount
of "live" data steadily increases, at least with Poplog's incremental
compiler, which allocates procedures in heap space, along with
everything else.  During compilation of the Prolog subsystem of Poplog,
GC time was about 3 percent.  When compiling the Common Lisp extension,
which increases the used heap area by a megabyte, the GC time was about
3.7 percent.  Both percentages can be decreased by allocating more heap
space, and/or by locking the heap after each file is compiled.

> Running a small program several times is NOT at all like running a
> big program once. Such experiments can be *very* misleadingly optimistic.

I agree, but part of my point was also that for a given program you can
vary the percentage of GC time by setting a small or large limit for the
heap size.  By allocating a VERY large heap (system memory and swap
space permitting) I can reduce the frequency of garbage collections and
thereby substantially reduce the percentage of time spent on gc.  So
statistics are more or less useless without full information about such
variables.

> Running programs that do a lot of number consing is particularly bad,
> because those objects are typically short-lived and will flatter
> your gc efficiency. (Unless you store pointers to the heap-allocated
> numbers into objects in an older generation. Then your performance
> can get bad fast.)

Alternatively, you may have very little spare heap space and therefore
have to reclaim space FREQUENTLY, in which case the performance will
also get bad.

> ....          In real long-running programs, the amount of live data
> *is* typically larger, sometimes dramatically larger, and the copying
> cost goes up significantly.

I guess what's real, or typical, is a matter of style, which is why
benchmarking is so difficult.  I do a lot of real work in the Poplog
editor - reading and sending mail, editing, developing programs,
examining online documentation, browsing through news bulletins, running
a Pop-11 based troff previewer, etc. etc., and many of the files I
regularly read into the editor for browsing and editing are a few
hundred kilobytes or more in size.  The editor uses a vector of strings
for a file, and while you work on a line, the corresponding string will
be discarded every time it needs to grow, and every time you move off a
string with wasted space to edit another line.  So garbage is being
created all the time.  Yet with a big heap you can work for ages without
invoking a GC, and if you don't actually trace GC's you won't notice
them, because they are so fast (except on an overloaded shared machine).

> (But maybe Pop-11 doesn't have a generational gc. If not, I guess
> it doesn't have a lot of heap-resident system stuff like big editors,
> browsers and compilers; that would increase the pause times.)

No, it doesn't have a generational gc, and most of the core system
stuff, like the editor, compilers, utilities, etc., is not in the heap.
However, there are large libraries of utilities, which get added to the
heap if used.  (On VMS, SunOS, and Dynix, saved images can be created in
which user code and data are in a shared, non-garbage-collectable
segment.)

Anyhow, it appears, from what you say, that generational GCs have a
serious performance penalty, and that it may be preferable to have a
very fast, infrequently invoked, alternative mechanism.

Apologies for letting this grow too long.  It's all that heap size
available....

Aaron Sloman,
School of Cognitive and Computing Sciences,
Univ of Sussex, Brighton, BN1 9QH, England
EMAIL   aarons@cogs.sussex.ac.uk
        aarons%uk.ac.sussex.cogs@nsfnet-relay.ac.uk
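The heap-locking behaviour described above -- locked store is never
copied, but it still has to be scanned on every GC -- can be pictured
with a small sketch.  This is a guess at the general mechanism for
illustration only, not Poplog's implementation; all of the names are
invented.

    /* Sketch of a copy phase that respects a heap "lock point" (invented
     * names).  Objects below the lock point are never copied, but the whole
     * locked region is still walked on every collection -- which is why
     * locking cuts copying cost while the scanning cost remains. */
    #include <stddef.h>
    #include <stdint.h>

    typedef struct object {
        struct object *forward;      /* forwarding pointer once copied     */
        size_t         nfields;
        struct object *fields[8];    /* fixed size keeps the sketch simple */
    } object;

    uintptr_t heap_lock_point;               /* everything below is locked   */
    object *copy_to_tospace(object *o);      /* Cheney-style copy, not shown */

    /* Relocate one reference during a collection. */
    object *forward_ref(object *o)
    {
        if (o == NULL || (uintptr_t)o < heap_lock_point)
            return o;                        /* locked or null: left in place */
        if (o->forward)
            return o->forward;               /* already copied this cycle     */
        return o->forward = copy_to_tospace(o);
    }

    /* The locked region acts as an extra root set: every collection scans
     * it and forwards any pointers it holds into the unlocked heap. */
    void scan_locked_region(object **locked, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < locked[i]->nfields; j++)
                locked[i]->fields[j] = forward_ref(locked[i]->fields[j]);
    }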
wilson@carcoar.Stanford.EDU (Paul Wilson) (03/21/90)
In article <2372@syma.sussex.ac.uk> aarons@syma.sussex.ac.uk (Aaron Sloman) writes:
>wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
>
>> Date: 15 Mar 90 23:13:31 GMT
>>
>> In article <2356@syma.sussex.ac.uk> aarons@syma.sussex.ac.uk (Aaron Sloman)
>> writes:
>> > I (Paul Wilson) write:
>> >
>> >> ...................Don't expect a fast language on stock
>> >> hardware to have gc speed costs of less than about 10% on average.
>> >> I think in the 8-15 percent range myself.
>> > ..........
>
>> >(Sloman:)
>> >I thought 8-15 percent was rather high, so I thought I would do some
>> >experiments in Pop-11 (a Lisp-like language with a Pascal-like
>> >syntax) in the Poplog system on a Sun4/260 (stock hardware I think).
>> >
>> > [ ...description of programs run, short resulting gc times...]
>>
>> (Wilson:)
>> Two points: I probably didn't make it clear in my posting that I was
>> referring to *generational* gc's.
>
>I did not realize that was your intention. Poplog does not at present
>have a generational gc. It has a stop-and-copy gc for use when there's
>lots of memory/swap space available and a shuffling Morris gc for use
>when there isn't, or paging is too slow. It also allows the heap to be
>"locked": e.g. if you have compiled a very large suite of programs, lock
>the heap after compilation, and then only a small amount of store will
>need to be copied on each GC, though all still has to be scanned. (This,
>I believe, produces the advantages of a generational GC, except that it
>isn't automated.)

This is a reasonable intermediate strategy, between a simple gc and a
generational one.  Since scanning is several times faster than copying
(as long as no paging is involved), it can be a win.  It avoids the
generation checking and/or marking of a fully generational gc, but still
gives you part of the benefit.

For a large system with large heap images (either system goop or
application data), though, it will not do well.  Scanning a
many-megabyte image can thrash the system at every gc.

>> ........                          Also, I'm unclear about the
>> system you're running -- I don't know much about Pop.
>>
>> The programs you're testing appear not to stress a gc much.
>
>It's hard to stress the Poplog GC as it is very fast. So only
>frequent GC's cause problems, and, as I said, you can do things
>to reduce frequency, like expanding heap space.

But memory is not free.  In a high-performance native-code-compiled
system, your performance will generally drop off quickly at the point
where you're allocating and scavenging more virtual memory than you can
keep in physical memory.  This is becoming more severe all the time, as
processor speed gains continue to outstrip disk speed gains.  Decreasing
memory costs tend to counter this trend, but it's unclear what the
bottom line is.  (And the various trends have different impacts on
different users.)

>> ...                                                     This is
>> likely because they're short and they don't ever have much live
>> data.
>
>One reason I chose compilation for part of the test is that the amount
>of "live" data steadily increases, at least with Poplog's incremental
>compiler, which allocates procedures in heap space, along with
>everything else. During compilation of the Prolog subsystem of Poplog,
>GC time was about 3 percent. When compiling the Common Lisp extension,
>which increases the used heap area by a megabyte, the GC time was about
>3.7 percent. Both percentages can be decreased by allocating more heap
>space, and/or by locking the heap after each file is compiled.

>> Running a small program several times is NOT at all like running a
>> big program once. Such experiments can be *very* misleadingly optimistic.
>
>I agree, but part of my point was also that for a given program you can
>vary the percentage of GC time by setting a small or large limit for the
>heap size. By allocating a VERY large heap (system memory and swap space
>permitting) I can reduce the frequency of garbage collections and
>thereby substantially reduce the percentage of time spent on gc.

Until you outrun physical memory and start paging.  You have to trade
off increased copying against increased locality.  The interaction
between heap sizes and gc efficiency is important, but can't be viewed
in isolation.

And a simple compiler may not be a good test, for two reasons.  As I
said previously, the average amount of live intermediate data may be
small, when compared to an optimizing compiler.  But in your case, it
would appear that the dominant cost would be the repeated copying of the
"results" (the already-created procedure objects, etc.) while creating
more results.

This could get much worse, too.  An optimizing compiler may do several
times as much computation, allocating lots of space and forcing several
times as many gc's.  So the dominant cost may be increased by several
times.

>So statistics are more or less useless without full information about
>such variables.

Right.  You need to know when to bill the gc for whatever page faults
occur, rather than just measuring pause times.  There's no simple
answer, because it depends on how much memory you have, and how much of
that you had to buy precisely because of the characteristics of the
system you're measuring.

>> Running programs that do a lot of number consing is particularly bad,
>> because those objects are typically short-lived and will flatter
>> your gc efficiency. (Unless you store pointers to the heap-allocated
>> numbers into objects in an older generation. Then your performance
>> can get bad fast.)
>
>Alternatively you have very little spare heap space and therefore
>have to reclaim space FREQUENTLY, in which case the performance will
>also get bad.

Right.  A generational gc will tend to compensate somewhat for
short-lived numbers, as long as you don't store pointers to them into
old objects.  But a non-generational gc will simply copy (or scan) all
the data that much more often.

(Though serious number consing may make the program enough slower than a
well-compiled program that it will flatter the gc significantly more.
Less number consing tends to go hand-in-hand with faster code,
increasing the relative importance of the *rest* of the gc's costs...
yikes!)

>> ....          In real long-running programs, the amount of live data
>> *is* typically larger, sometimes dramatically larger, and the copying
>> cost goes up significantly.
>
>I guess what's real, or typical, is a matter of style, which is why
>benchmarking is so difficult. I do a lot of real work in the Poplog
>editor - reading and sending mail, editing, developing programs,
>examining online documentation, browsing through news bulletins, running
>a Pop-11 based troff previewer, etc. etc., and many of the files I
>regularly read into the editor for browsing and editing are a few
>hundred kilobytes or more in size. The editor uses a vector of strings
>for a file, and while you work on a line, the corresponding string will
>be discarded every time it needs to grow, and every time you move off a
>string with wasted space to edit another line. So garbage is being
>created all the time. Yet with a big heap you can work for ages without
>invoking a GC and if you don't actually trace GC's you won't notice them
>because they are so fast (except on an overloaded shared machine).

A lot of the (mostly interactive) things you mention *do* account for a
lot of people's primary usage patterns.  But for most of these things,
raw general computation speed is likely not to be the limiting factor
either -- the continual overhead of a generational gc probably won't be
a big deal.  You're more likely to be concerned with the speed of
graphics & disk operations, etc.  And a generational gc will allow you a
more integrated system, with utilities implemented in the language and
resident in the heap.

>> (But maybe Pop-11 doesn't have a generational gc. If not, I guess
>> it doesn't have a lot of heap-resident system stuff like big editors,
>> browsers and compilers; that would increase the pause times.)
>
>No, it doesn't have a generational gc, and most of the core system
>stuff, like the editor, compilers, utilities, etc., is not in the
>heap. However, there are large libraries of utilities, which get
>added to the heap if used. (On VMS, SunOS, and Dynix, saved images
>can be created in which user code and data are in a shared,
>non-garbage-collectable segment.)
>
>Anyhow, it appears, from what you say, that generational GCs have a
>serious performance penalty, and that it may be preferable to have a
>very fast, infrequently invoked, alternative mechanism.

Right -- if you're willing to live with a less congenial development
environment in creating the system.  (You give up writing almost
everything in the same language, with all of your debugging tools
available.)

And for a lot of people, it's still not going to work well.  If your
applications have megabytes of live data, or if they include large,
heap-resident libraries, you're going to run into annoying gc pauses and
overhead, or thrash.

My own preference is for a generational gc, at least in a general
development environment; its performance may have a higher baseline
overhead, but it's more robust in the face of big hairy programs.  You
trade a little processor overhead for a lot of locality, and save
significant money on memory.

For a system that generates stand-alone programs, though, you'd like the
option of specifying which gc strategy it should use, with three
choices.  A simple gc will give the fastest performance for programs
with little data live at any given time; a generational gc will be
better for programs with lots of live data; and a fine-grained
incremental gc will be necessary for difficult real-time applications.
(Its performance will be degraded significantly, but response times will
be reliably predictable.)

   -- Paul

Paul R. Wilson
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C.  EECS Dept. (M/C 154)  wilson@bert.eecs.uic.edu
Box 4348   Chicago, IL 60680
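The heap-size trade-off that runs through this exchange can be put in
back-of-envelope terms for a two-space copying collector: each
collection copies the live data once, and between collections you can
allocate (semispace minus live data) bytes.  The sketch below uses
invented constants purely for illustration, and it deliberately ignores
paging -- which is exactly where "just grow the heap" stops winning.

    /* Copying cost per megabyte allocated, as a function of total heap size,
     * for a two-space stop-and-copy collector.  Invented constants; the model
     * ignores paging entirely. */
    #include <stdio.h>

    int main(void)
    {
        double live = 2.0;                        /* MB live at each collection */
        double heaps[] = { 8, 16, 32, 64, 128 };  /* total heap sizes, MB       */

        for (int i = 0; i < 5; i++) {
            double semispace = heaps[i] / 2.0;
            double alloc_between_gcs = semispace - live;      /* MB per cycle   */
            double copy_per_mb_alloc = live / alloc_between_gcs;
            printf("heap %4.0f MB: copy %.2f MB of live data per MB allocated\n",
                   heaps[i], copy_per_mb_alloc);
        }
        return 0;
    }

With 2 MB of live data, doubling the heap roughly halves the copying
work charged to each allocated megabyte -- until the larger heap no
longer fits in physical memory, at which point page faults swamp
everything, as discussed above.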