[comp.sys.ti.explorer] Running out of address space

acuff@SUMEX-AIM.STANFORD.EDU (Richard Acuff) (06/17/89)

For a while we've been having problems with running out of address space on
the Explorer.  We'd been assuming that our programs were just consing and
holding onto a lot of data until we noticed an application that ran fine
without TGC but ran out of address space with it.  We started poking around
and discovered that when the system automatically collected generation 2, it
promoted the surviving data out of generation 2 into generation 3.  This
differs from TGC's initial behavior (in Rel 3).  It means that data that
"survives" (i.e., is held onto) for a long time but is then dropped can easily
get "tenured" into a generation that is never collected automatically.

   TI claims that the amount of data mistakenly tenured is very small, but I
think there are applications or usage patterns where that isn't true.  So, if
you've been mysteriously running out of address space, would you check
<Term> G when you're close to the limit and see how much data has survived a
generation 2 collection?

   I believe this is a problem, and I believe it will get worse if the
changes indicated for Rel 6 happen.  The default for
SYS:*GC-MAX-INCREMENTAL-GENERATION* will be reduced from 2 to 1, and the
thresholds for generations 1 and 2 will be reduced to 20% and 40% of RAM
respectively.  Since generation 2 will then no longer be collected
automatically, I think this implies that garbage will accumulate there.

   The system as it stands is "unsafe": a program that never uses more data
at any one time than the available address space can still run out of address
space.  This may very well be a good performance tradeoff, since in most cases
little is reclaimed from generation 2 while collecting it takes a lot of time,
but it is still unsafe.
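To make the "unsafe" argument concrete, here is a toy model in Common Lisp.
It is a sketch under simplifying assumptions, not TI's TGC code: the function
SIMULATE-TENURING and its parameters are hypothetical.  Each cycle allocates a
batch of objects that become garbage LIFETIME cycles later, then collects every
generation up to MAX-GEN (standing in for SYS:*GC-MAX-INCREMENTAL-GENERATION*).
Survivors move up one generation; survivors of MAX-GEN move into the
never-collected generation above it only when PROMOTE is true.

```lisp
;;; Toy model of generational promotion; NOT TI's TGC implementation.
;;; Each object is a cons (GENERATION . DEATH-CYCLE).
(defun simulate-tenuring (&key (cycles 100) (batch 10) (lifetime 5)
                               (max-gen 2) (promote t))
  "Return how many objects (live or garbage) occupy the heap after CYCLES.
Generations 0..MAX-GEN are collected every cycle; MAX-GEN+1 never is."
  (let ((heap '()))
    (dotimes (now cycles (length heap))
      ;; Allocate this cycle's batch in generation 0.
      (loop repeat batch
            do (push (cons 0 (+ now lifetime)) heap))
      ;; Collect generations 0..MAX-GEN, reclaiming objects that are dead.
      ;; Objects in older generations are kept even if they are dead.
      (setf heap
            (loop for (gen . death) in heap
                  for collected-p = (<= gen max-gen)
                  unless (and collected-p (<= death now))
                    collect (cons (cond ((not collected-p) gen)     ; uncollected generation
                                        ((< gen max-gen) (1+ gen))  ; normal aging
                                        (promote (1+ gen))          ; tenured out of MAX-GEN
                                        (t gen))                    ; stays in MAX-GEN
                                  death))))))
```

With the defaults (100 cycles of 10 objects, each living 5 cycles), only 50
objects are ever live after a collection.  (simulate-tenuring :promote nil)
ends with just those 50, but (simulate-tenuring :promote t) ends with 1000:
every object reaches generation 3 before it dies, so 950 of them are garbage
that is never reclaimed.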

I would like to see TI do three things:

1. Discuss the issues in the release notes and/or other documentation so that
   users can decide the tradeoff.

2. Add a facility for controlling whether survivors of the highest
   automatically collected generation, SYS:*GC-MAX-INCREMENTAL-GENERATION*,
   are automatically promoted.

3. Allow the user to tune the generation thresholds.  Then, if I have a
   program that conses more than 40% of RAM, holds onto most of it for a
   while, and then drops it, I can tune the system for that without having
   to resort to batch GC.

Please let me know what you think about this issue.

	-- Rich

pf@islington-terrace.csc.ti.com (Paul Fuqua) (06/17/89)

    Date: Friday, June 16, 1989  2:45pm (CDT)
    From: Richard Acuff <acuff at sumex-aim.stanford.edu>
    Subject: Running out of address space
    
       I believe this is a problem, and I believe the problem will get worse if
    indicated changes in Rel 6 happen.  The default for
    SYS:*GC-MAX-INCREMENTAL-GENERATION* will be reduced from 2 to 1, and the
    thresholds for 1 and 2 will be reduced to 20% of RAM and 40% of RAM
    respectively.  I think this implies that garbage will accumulate in
    generation 2.

I've been using a pre-release 6.0 band for a few months now, and the
last couple of bands have had the indicated GC parameters.  In fact, I
noticed the increased number of level 1 and 2 collections and reported
the new thresholds as a bug.
    
	    This may very well be a good tradeoff against performance since in
    most cases the amount collected from generation 2 is quite small but it takes
    a lot of time to collect, but it is still unsafe.

My usual long-term garbage percentage in level 2 is 15%.  I'm one of
those people who like to run for weeks at a time between boots.
    
    3. Allow the user to tune the generation thresholds.  Thus if I have a
       program that conses more than 40% of RAM, holds onto most of it for a while,
       and then drops it, I can tune the system to that without having to resort
       to batch GC.

I want this.  I want to go back to the original generation sizes, more
or less.  I want at least the TGC equivalent of the old flip-ratio
control.  It would be trivial to change the threshold constants (and I
may do this privately).

What I do at the moment is set the max incremental generation to 2 and
run a background process that wakes up at 4:00am and does
(gc-immediately :max-gen 3 :promote nil) after flushing histories and
kill rings.  (Kill rings and input histories are often-forgotten sources
of long-term pointers to nearly everything.)  That typically finds about
15% of level 3 to be garbage.
    
    Please let me know what you think about this issue.

As customers, you have much greater influence than internal TI people
like me, so speak up.

Paul Fuqua                     pf@csc.ti.com
                               {smu,texsun,cs.utexas.edu,rice}!ti-csl!pf
Texas Instruments Computer Science Center
PO Box 655474 MS 238, Dallas, Texas 75265

mxh@SUMEX-AIM.STANFORD.EDU (Max Hailperin) (06/18/89)

It's a bit unfair for me to respond to Rich Acuff's solicitation of opinions
on the TGC promotion-from-last-generation-collected issue, since I'm one of
his site's users who discovered the problem in the first place.
Nonetheless, I feel compelled to add the following to the discussion:

Promoting out of the last generation collected has the following two
cardinal hallmarks of a *genuine bug*:
 1) It is counter to the documentation.
 2) It made my runs crash; with the bug patched, they didn't.
    Note that these were *real runs*, not artificial test cases: they
    were the very thing that we paid TI good money for Explorers to do.

Rich's suggestion of user control is good, but inessential.  What is
absolutely essential is that safety takes precedence over "performance"
(at least by default), and that the documentation reflects reality.

I lost several weeks to this bug, and I would be *incensed* if TI
were to deny that it is one.  Having lost that much time over it, I'd
be reluctant ever to support buying a machine from a company that
denied it was a bug.

Ok, I feel a bit better now.  I hope some TI folks are reading this.
Thanks.