acuff@SUMEX-AIM.STANFORD.EDU (Richard Acuff) (06/17/89)
For a while we've been having problems with running out of address space on the Explorer. We had been assuming that our programs were simply consing and holding onto a lot of data, until we noticed an application that ran fine without TGC but ran out of address space with it. We started poking around and discovered that data was being promoted out of generation 2 into generation 3 when the system automatically collected generation 2. This differs from TGC's initial behavior (from Rel 3). It means that data that "survives" (i.e., is held onto) for a long time but is then dropped can easily get "tenured" into a generation that isn't collected.

TI claims that the amount of data mistakenly tenured is very small, but I think there are applications or usage patterns where this isn't the case. So if you've been mysteriously running out of address space, would you try checking <Term> G when you're near the end and see how much data has survived a generation 2 collection?

I believe this is a problem, and I believe the problem will get worse if the indicated changes in Rel 6 happen. The default for SYS:*GC-MAX-INCREMENTAL-GENERATION* will be reduced from 2 to 1, and the thresholds for generations 1 and 2 will be reduced to 20% and 40% of RAM respectively. I think this implies that garbage will accumulate in generation 2.

The system as it stands is "unsafe", since a program that at any one time uses less data than the available address space can still run out of address space. This may very well be a good tradeoff for performance: in most cases the amount reclaimed from generation 2 is quite small, yet collecting it takes a lot of time. But it is still unsafe.

I would like to see TI do three things:

1. Discuss the issues in the release notes and/or other documentation so that users can decide on the tradeoff.
2. Add a facility for controlling whether the highest automatically collected generation, SYS:*GC-MAX-INCREMENTAL-GENERATION*, is automatically promoted or not.
3. Allow the user to tune the generation thresholds. Thus if I have a program that conses more than 40% of RAM, holds onto most of it for a while, and then drops it, I can tune the system to that without having to resort to batch GC.

Please let me know what you think about this issue.

-- Rich
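A toy model may make the mechanism in Rich's post concrete. This is a sketch under stated assumptions, not TI's TGC: the workload (one object per tick, every tenth held for 30 ticks and then dropped), the collection schedule, and the names `Obj`, `collect`, `simulate`, and `MAX_INCREMENTAL_GEN` are all hypothetical, chosen only to mimic a setup where generations 0 to 2 are collected automatically and generation 3 never is. With `promote_top=True`, survivors of a generation 2 collection move into generation 3, so anything dropped afterwards is stranded there; with `promote_top=False` it stays in generation 2 and is reclaimed at that generation's next collection.

```python
from dataclasses import dataclass

# Hypothetical analogue of SYS:*GC-MAX-INCREMENTAL-GENERATION*:
# generations 0..2 are collected automatically, generation 3 never is.
MAX_INCREMENTAL_GEN = 2


@dataclass
class Obj:
    dies: int     # tick at which the program drops its last reference
    gen: int = 0  # generation the object currently lives in


def collect(heap, level, now, promote_top):
    """Collect generations 0..level and return the objects that remain."""
    remaining = []
    for obj in heap:
        if obj.gen > level:
            remaining.append(obj)      # generation not collected this time
        elif obj.dies > now:           # still reachable, so it survives
            if obj.gen < MAX_INCREMENTAL_GEN or promote_top:
                obj.gen += 1           # promote one generation up
            remaining.append(obj)
        # otherwise the object is garbage and is reclaimed
    return remaining


def simulate(promote_top, ticks=100):
    """Return how many dead objects end up stranded above MAX_INCREMENTAL_GEN."""
    heap = []
    for now in range(ticks):
        # Hypothetical workload: most objects are dropped after 1 tick,
        # but every 10th one is held for 30 ticks and then dropped.
        life = 30 if now % 10 == 0 else 1
        heap.append(Obj(dies=now + life))
        # Hypothetical schedule: generation 0 every tick, up to
        # generation 1 every 5 ticks, up to generation 2 every 20 ticks.
        if now % 20 == 19:
            level = 2
        elif now % 5 == 4:
            level = 1
        else:
            level = 0
        heap = collect(heap, level, now, promote_top)
    last = ticks - 1
    return sum(1 for o in heap if o.gen > MAX_INCREMENTAL_GEN and o.dies <= last)


print(simulate(promote_top=True))   # 7
print(simulate(promote_top=False))  # 0
```

Under these assumptions the promoting policy leaves seven dead objects in generation 3 that no automatic collection will ever reclaim, while the non-promoting policy leaves none. The price in this model is that long-lived data is rescanned at every generation 2 collection, which is the performance side of the tradeoff Rich describes.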
pf@islington-terrace.csc.ti.com (Paul Fuqua) (06/17/89)
Date: Friday, June 16, 1989 2:45pm (CDT)
From: Richard Acuff <acuff at sumex-aim.stanford.edu>
Subject: Running out of address space

> I believe this is a problem, and I believe the problem will get worse if the indicated changes in Rel 6 happen. The default for SYS:*GC-MAX-INCREMENTAL-GENERATION* will be reduced from 2 to 1, and the thresholds for generations 1 and 2 will be reduced to 20% and 40% of RAM respectively. I think this implies that garbage will accumulate in generation 2.

I've been using a pre-release 6.0 band for a few months now, and the last couple of bands have had the indicated GC parameters. In fact, I noticed the increased number of level 1 and 2 collections and reported the new thresholds as a bug.

> This may very well be a good tradeoff for performance: in most cases the amount reclaimed from generation 2 is quite small, yet collecting it takes a lot of time. But it is still unsafe.

My usual long-term garbage percentage in level 2 is 15%. I'm one of those people who like to run for weeks at a time between boots.

> 3. Allow the user to tune the generation thresholds. Thus if I have a program that conses more than 40% of RAM, holds onto most of it for a while, and then drops it, I can tune the system to that without having to resort to batch GC.

I want this. I want to go back to the original generation sizes, more or less. I want at least the TGC equivalent of the old flip-ratio control. It would be trivial to change the threshold constants (and I may do this privately).

What I do at the moment is set the max incremental generation to 2 and run a background process that wakes up at 4:00am and does (gc-immediately :max-gen 3 :promote nil) after flushing histories and kill rings. (Kill rings and input histories are often-forgotten sources of long-term pointers to nearly everything.) That typically finds about 15% of level 3 to be garbage.

> Please let me know what you think about this issue.
As customers, you have much greater influence than internal TI people like me, so speak up.

Paul Fuqua
pf@csc.ti.com
{smu,texsun,cs.utexas.edu,rice}!ti-csl!pf
Texas Instruments Computer Science Center
PO Box 655474 MS 238, Dallas, Texas 75265
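Paul's nightly job follows a simple pattern: wait until a quiet hour, clear the places that silently retain pointers, then run a full collection without promotion. The sketch below illustrates that loop in Python; it is not Explorer code. `seconds_until`, `nightly_gc_loop`, and the two callback hooks are hypothetical names, and on the Explorer the collection step would be the `(gc-immediately :max-gen 3 :promote nil)` call from his post.

```python
import datetime as dt
import time
from typing import Callable


def seconds_until(hour: int, now: dt.datetime) -> float:
    """Seconds from `now` until the next hour:00 (naive local time, no DST handling)."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += dt.timedelta(days=1)  # already past today's slot, use tomorrow's
    return (target - now).total_seconds()


def nightly_gc_loop(flush_histories: Callable[[], None],
                    full_collect: Callable[[], None],
                    hour: int = 4) -> None:
    """Forever: sleep until `hour`, drop stale references, then collect everything."""
    while True:
        time.sleep(seconds_until(hour, dt.datetime.now()))
        flush_histories()  # hypothetical hook: clear kill rings and input histories
        full_collect()     # hypothetical hook standing in for the full, non-promoting GC


print(seconds_until(4, dt.datetime(1989, 6, 17, 3, 0)))  # 3600.0
```

The order matters: since kill rings and input histories can keep nearly everything reachable, a collection run before flushing them would likely reclaim much less.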
mxh@SUMEX-AIM.STANFORD.EDU (Max Hailperin) (06/18/89)
It's a bit unfair for me to respond to Rich Acuff's solicitation of opinions on the TGC promotion-from-last-generation-collected issue, since I'm one of the users at his site who discovered the problem in the first place. Nonetheless, I feel compelled to add the following to the discussion.

Promoting out of the last generation collected has the two cardinal hallmarks of a *genuine bug*:

1) It is counter to the documentation.
2) It made my runs crash, and they stopped crashing once the bug was patched.

Note that these were *real runs*, not artificial test cases: they were the very thing that we paid TI good money for Explorers to do.

Rich's suggestion of user control is good, but inessential. What is absolutely essential is that safety takes precedence over "performance" (at least by default), and that the documentation reflects reality. I lost several weeks to this bug, and I would be *incensed* if TI were to deny that it is one. Having lost that much time over it, I'd be reluctant ever to support the purchase of a machine from a company that denied its bugness.

OK, I feel a bit better now. I hope some TI folks are reading this. Thanks.