aarons@syma.sussex.ac.uk (Aaron Sloman) (03/15/90)
wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
> Date: 13 Mar 90 01:11:46 GMT
> Reply-To: wilson@carcoar.Stanford.EDU (Paul Wilson)
>
> As somebody who's written a state-of-the-art gc (in some respects, at
> least), ...
> ... Don't expect a fast language on stock hardware to have gc speed
> costs of less than about 10% on average. I think in the 8-15 percent
> range myself. ...
>
> Paul R. Wilson
> Software Systems Laboratory          lab ph.: (312) 996-9216
> U. of Illin. at C. EECS Dept. (M/C 154)  wilson@bert.eecs.uic.edu
> Box 4348 Chicago, IL 60680

I thought 8-15 percent was rather high, so I did some experiments in
Pop-11 (a Lisp-like language with a Pascal-like syntax) in the Poplog
system on a Sun4/260 (stock hardware, I think).

I ran Poplog with the heap limit set to vary between 800,000 bytes and
2,000,000 bytes, and then did a range of things, including: running the
integrated editor on some fairly large files; using the incremental
compiler to re-compile a little 1700-line expert system shell several
times; running a small planning program that uses the shell several
times; computing factorial(1000) several times (to generate lots of
"big" integers and so force garbage collections); and doing some double
precision decimal arithmetic (which causes garbage collections) and
some single precision (which doesn't). At various times I recorded the
ratio of GC time to total CPU time. It varied between 1.5% and 3.2%.

Typical times for individual garbage collections were about 0.15
seconds - obviously it would have been more for a larger process, or
less if I had "locked" the part of the heap containing non-garbage.
Also, by allocating more heap space I could reduce the frequency of
garbage collections and so bring the percentage of GC time down.

These times were obtained using the default Poplog garbage collector,
which uses the "stop-and-copy" method.
Had I forced it to use the non-copying ("compacting") garbage collector
(which is invoked when there is not enough memory or swap space for the
copying mechanism), it would have been a bit slower (e.g. 0.22 seconds
per GC). It would also have slowed down had there been more data
structures: e.g. a program with about 2 Mbytes in the heap might spend
a couple of seconds on a garbage collection. The actual time would
depend on whether there's enough physical memory to make paging
unnecessary.

The above times compare quite well with stories about having to stop
for coffee whenever a Lisp machine has a garbage collection, and of
course as "stock" hardware gets faster, the GC times will reduce,
though not necessarily the percentage.

Aaron Sloman,
School of Cognitive and Computing Sciences,
Univ of Sussex, Brighton, BN1 9QH, England
EMAIL   aarons@cogs.sussex.ac.uk
    or: aarons%uk.ac.sussex.cogs@nsfnet-relay.ac.uk
        aarons%uk.ac.sussex.cogs%nsfnet-relay.ac.uk@relay.cs.net
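For concreteness, the "stop-and-copy" method Aaron mentions can be sketched in a few lines of Python. This is an illustration of the general Cheney-style two-space technique, not Poplog's actual collector; the cell and pointer representation (cells as lists, pointers as `('ptr', index)` tuples) is invented for the example.

```python
# Sketch of a Cheney-style stop-and-copy collector (illustrative only;
# not Poplog's implementation). A heap is a list of cells; each cell is
# a list of fields; a field is either an immediate value or a pointer
# written as ('ptr', index-into-the-current-semispace).

def collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    Returns (to_space, new_roots). Unreachable cells are simply left
    behind in from_space - that is the whole trick: GC cost is
    proportional to live data, not to garbage.
    """
    to_space = []
    forward = {}                     # from-space index -> to-space index

    def evacuate(ptr):
        if ptr in forward:           # already copied
            return forward[ptr]
        new = len(to_space)
        to_space.append(list(from_space[ptr]))   # shallow copy of the cell
        forward[ptr] = new
        return new

    # Evacuate the roots, then scan to-space breadth-first: the `scan`
    # pointer chases the allocation pointer (Cheney's algorithm).
    new_roots = [evacuate(r) for r in roots]
    scan = 0
    while scan < len(to_space):
        cell = to_space[scan]
        for i, f in enumerate(cell):
            if isinstance(f, tuple):             # a pointer field
                cell[i] = ('ptr', evacuate(f[1]))
        scan += 1
    return to_space, new_roots
```

After a collection the two semispaces swap roles, which is why a copying collector needs enough memory (or swap) for both - the condition under which Poplog falls back to its compacting collector.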
jack@cs.glasgow.ac.uk (Jack Campin) (03/15/90)
aarons@syma.sussex.ac.uk (Aaron Sloman) wrote:
> wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
>> Don't expect a fast language on stock hardware to have gc speed
>> costs of less than about 10% on average.
> [Poplog] varied between 1.5% and 3.2%.

PS-algol uses about 2.9%, according to some measurements done a couple
of years ago.

--
Jack Campin  Computing Science Department, Glasgow University,
17 Lilybank Gardens, Glasgow G12 8QQ, Scotland
041 339 8855 x6044 work  041 556 1878 home
JANET: jack@cs.glasgow.ac.uk  BANG!net: via mcvax and ukc
FAX: 041 330 4913  INTERNET: via nsfnet-relay.ac.uk
BITNET: via UKACRL  UUCP: jack@glasgow.uucp
wilson@carcoar.Stanford.EDU (Paul Wilson) (03/16/90)
In article <4836@vanuata.cs.glasgow.ac.uk> jack@cs.glasgow.ac.uk (Jack
Campin) writes:
>aarons@syma.sussex.ac.uk (Aaron Sloman) wrote:
>> wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
>>> Don't expect a fast language on stock hardware to have gc speed
>>> costs of less than about 10% on average.
>> [Poplog] varied between 1.5% and 3.2%.
>
>PS-algol uses about 2.9% according to some measurements done a couple
>of years ago.

Again, we may be having a little misunderstanding (generational vs.
not, etc.), but I find this interesting. Questions:

  1) How fast is PS-Algol, flat out?  Less-than-great code quality can
     flatter gc performance.
  2) What kind of gc was used?
  3) What kind of programs were used for testing?
  4) Did this involve persistent objects?

-- Paul the garbageman

Paul R. Wilson
Software Systems Laboratory          lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)  wilson@bert.eecs.uic.edu
Box 4348 Chicago, IL 60680
jack@cs.glasgow.ac.uk (Jack Campin) (03/17/90)
wilson@carcoar.Stanford.EDU (Paul Wilson) wrote:
> jack@cs.glasgow.ac.uk (Jack Campin) writes:
>> aarons@syma.sussex.ac.uk (Aaron Sloman) wrote:
>>> wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
>>>> Don't expect a fast language on stock hardware to have gc speed
>>>> costs of less than about 10% on average.
>>> [Poplog] varied between 1.5% and 3.2%.
>> PS-algol uses about 2.9% according to some measurements done a
>> couple of years ago.

> 1) how fast is PS-Algol, flat out? Less-than-great code quality can
>    flatter gc performance.

Very fast for an interpreted language - much better than any Smalltalk
I've seen (Smalltalk is probably the closest familiar programming
environment). There is a much faster native-code version used by ICL on
their mainframes; I don't have any performance figures on that (none
that I can give out, anyway). Maybe comparable to a good interpreted
Lisp, but that's guesswork. I wasn't surprised that Poplog led to
similar figures; has anyone got a nice closure-bashing benchmark that
we could try in Smalltalk, Scheme, Poplog and PS-algol to firm this up
a bit?

> 2) what kind of gc was used?

Sliding compaction; Deutsch-Schorr-Waite or Morris, I forget which.
Later versions add a feature that makes a big difference: sometimes
treating read-only persistent objects as garbage (you can always get
them back from disk later if this turns out to be wrong). This can
reduce heap size drastically.

> 3) what kind of programs were used for testing?

Recompiling the compiler.

> 4) did this involve persistent objects?

Yes, but that doesn't make much difference to in-core gc, except for
the point mentioned in 2). The garbage collector only needs to test
bit 31 of a 4-byte address to tell persistent and virtual-memory
pointers apart. Persistent storage garbage collection is a different
kettle of ballgames: a gc of our persistent store (~100 MB of data,
running on fast Suns) takes about a lunchbreak, and is only done every
few weeks.
--
Jack Campin  Computing Science Department, Glasgow University,
17 Lilybank Gardens, Glasgow G12 8QQ, Scotland
041 339 8855 x6044 work  041 556 1878 home
JANET: jack@cs.glasgow.ac.uk  BANG!net: via mcvax and ukc
FAX: 041 330 4913  INTERNET: via nsfnet-relay.ac.uk
BITNET: via UKACRL  UUCP: jack@glasgow.uucp
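Two of Jack's points can be made concrete with a small Python sketch: the one-bit test that separates persistent-store pointers from ordinary heap pointers, and a sliding (order-preserving) compaction. The compaction below is a simple table-based "Lisp-2" variant, not the Deutsch-Schorr-Waite or Morris algorithms he names, and the heap/pointer representation is invented for illustration.

```python
PERSISTENT_BIT = 1 << 31   # high bit set => persistent-store address

def is_persistent(addr):
    """Jack's trick: one bit test on a 4-byte address distinguishes
    persistent-store pointers from virtual-memory pointers."""
    return bool(addr & PERSISTENT_BIT)

def mark(heap, roots):
    """Mark phase: find all live in-core cells, never following
    pointers out into the persistent store."""
    live, stack = set(), [r for r in roots if not is_persistent(r)]
    while stack:
        a = stack.pop()
        if a in live:
            continue
        live.add(a)
        for f in heap[a]:
            if isinstance(f, tuple) and not is_persistent(f[1]):
                stack.append(f[1])
    return live

def slide(heap, roots):
    """Sliding compaction: live cells keep their relative order and
    move down toward address 0.  Note the several passes over the
    data - the cost Wilson points out later in the thread."""
    live = mark(heap, roots)
    new_addr, next_free = {}, 0
    for a in range(len(heap)):          # pass 1: forwarding addresses
        if a in live:
            new_addr[a] = next_free
            next_free += 1
    for a in live:                      # pass 2: fix up pointer fields
        cell = heap[a]
        for i, f in enumerate(cell):
            if isinstance(f, tuple) and not is_persistent(f[1]):
                cell[i] = ('ptr', new_addr[f[1]])
    compacted = [heap[a] for a in sorted(live)]   # pass 3: slide down
    new_roots = [r if is_persistent(r) else new_addr[r] for r in roots]
    return compacted, new_roots
```

Unlike stop-and-copy, this needs no second semispace, which is exactly why Poplog (and presumably PS-algol) falls back to compaction when memory is tight.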
wilson@carcoar.Stanford.EDU (Paul Wilson) (03/20/90)
In article <4862@vanuata.cs.glasgow.ac.uk> jack@cs.glasgow.ac.uk (Jack
Campin) writes:
>wilson@carcoar.Stanford.EDU (Paul Wilson) wrote:
>> 1) how fast is PS-Algol, flat out? Less-than-great code quality can
>>    flatter gc performance.
>
>Very fast for an interpreted language. Much better than any Smalltalk
>I've seen (Smalltalk is probably the closest familiar programming
>environment).

Okay, that makes sense. Not to demean either Smalltalk or PS-Algol
(two languages I like a lot), but this doesn't sound really fast, by
my standards. I'm thinking of things like T (the object-oriented
extended Scheme from Yale) and commercial Common Lisps, which have
optimizing native-code compilers. (Though I'm actually unclear on
current CL speeds.) Self seems to be approaching this kind of
performance, and the techniques involved should work at least as well
for future-generation Smalltalks.

And the fast implementations should get even faster over time. GC's
will have to keep up. I'd guess that if you worked at it, a good
native-code compiled PS-Algol would run several times as fast, and
your gc costs, as a percentage of run time, would be several times
higher. This may or may not be a big issue for PS-Algol, depending on
whether typical applications are compute-bound or I/O-bound.

>There is a much faster native-code-generated version used by ICL on
>their mainframes; I don't have any performance figures on this (that
>I can give out, anyway). Maybe comparable to a good interpreted Lisp,
>but that's guesswork. I wasn't surprised that Poplog led to similar
>figures on something; anyone got a nice closure-bashing benchmark
>that we could try in Smalltalk, Scheme, Poplog and PS-algol to firm
>this up a bit?

There's a closure-intensive benchmark or two in the appendix to David
Kranz' thesis on Orbit (the T compiler) from Yale. I don't know
whether that's the right thing to test, though. It's hard to come up
with a fair closure benchmark that will work across languages. In
some languages you might use closures in certain situations, but in
others you might not, even if they're available. For example, in
vanilla Scheme (or PS-Algol) you might use closures to implement
objects (which can get you in trouble performance-wise -- there's
another subtle interaction with generational gc's). But in T, a full
object system is provided, so you'd use that instead.

>> 2) what kind of gc was used?
>
>Sliding compaction; Deutsch-Schorr-Waite or Morris, I forget which.

I take it this is not a generational gc, though you're probably
treating the persistent store as a kind of separate entity. Also, for
programs with a lot of live data at gc time, sliding compaction,
etc., can be pretty slow -- several passes over the data.

>Later versions add a feature that makes a big difference: sometimes
>treating read-only persistent objects as garbage (you can always get
>them back from disk later if this is wrong). This can reduce heap
>size drastically.

That seems very reasonable.

>> 3) what kind of programs were used for testing?
>
>Recompiling the compiler.

Depending on your compiler, this may or may not be a reasonable test.
For our bytecode compiler, our generational gc is extremely
effective. If the first generation is of reasonable size, nothing
except the actual generated code (and associated data structures like
literal frames) survives to be advanced out of the first generation.

When the compiler starts compiling modules instead of just
independent procedures, we expect the copying costs to go up
significantly. (At a gc, we'll typically have some intermediate data
structures of a whole module to copy, as well as those involved in
compiling a single procedure.) Further in this direction is Lucid's
Common Lisp compiler, which may have a *megabyte* of interprocedural
analysis goop lying around when doing an optimized system build. For
that kind of "serious" stuff you'd want a serious generational gc.

>> 4) did this involve persistent objects?
>
>Yes, but that doesn't make much difference to in-core gc, except for
>the point mentioned in 2). The garbage collector only needs to test
>bit 31 of a 4-byte address to tell persistent and virtual-memory
>pointers apart. Persistent storage garbage collection is a different
>kettle of ballgames. Time for a lunchbreak with our persistent store
>(~100 MB of data, gc running on fast Suns). Only done every few
>weeks.

I seem to recall that PS-Algol used to have a rather strange
allocation/gc scheme, with objects allocated in areas by type for
locality reasons. Were any locality effects ever measured? (Also, I
seem to recall someone saying that there was a possibility of storage
leaks in the P-store, under certain circumstances, or something like
that. Is the same scheme in use, or has it been modified?)

-- Paul

Paul R. Wilson
Software Systems Laboratory          lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)  wilson@bert.eecs.uic.edu
Box 4348 Chicago, IL 60680
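The generational effect Wilson describes - short-lived compiler temporaries dying in the first generation while generated code is advanced - can be sketched in a toy Python model. This is an illustration of the idea, not his collector; the class and object names are invented, and the real remembered-set machinery is simplified to a full scan of the old generation.

```python
class TwoGenHeap:
    """Toy two-generation heap.  New objects go in the nursery; a
    minor collection promotes only the objects still reachable, so
    temporaries die cheaply and only long-lived data pays copying
    costs."""

    def __init__(self):
        self.nursery = {}        # name -> set of referenced names
        self.old = {}
        self.roots = set()

    def alloc(self, name, refs=()):
        self.nursery[name] = set(refs)

    def minor_collect(self):
        """Collect the nursery only.  Returns how many objects were
        promoted ('advanced out of the first generation')."""
        # Nursery objects are live if reachable from the roots or
        # referenced from the old generation (a real collector would
        # keep a remembered set rather than scanning all of old space).
        reachable, stack = set(), list(self.roots)
        for refs in self.old.values():
            stack.extend(refs)
        while stack:
            n = stack.pop()
            if n in reachable or n not in self.nursery:
                continue
            reachable.add(n)
            stack.extend(self.nursery[n])
        promoted = {n: self.nursery[n] for n in reachable}
        self.old.update(promoted)    # advance the survivors
        self.nursery.clear()         # everything else reclaimed at once
        return len(promoted)
```

In Wilson's bytecode-compiler scenario, only the generated code and its literal frames would show up as promoted here; the parse trees and other intermediate structures simply vanish with the nursery.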
jack@cs.glasgow.ac.uk (Jack Campin) (03/21/90)
wilson@carcoar.Stanford.EDU (Paul Wilson) wrote:
>> anyone got a nice closure-bashing benchmark that we could try
>> in Smalltalk, Scheme, Poplog and PS-algol to firm this up a bit?
> There's a closure-intensive benchmark or two in the appendix to
> David Kranz' thesis on Orbit (the T compiler) from Yale. I don't
> know whether that's the right thing to test, though. It's hard to
> come up with a fair closure benchmark that will work across
> languages. In some languages you might use closures in certain
> situations, but in others you might not, even if they're available.
> For example, in vanilla Scheme (or PS-Algol) you might use closures
> to implement objects (which can get you in trouble performance-wise
> -- there's another subtle interaction with generational gc's). But
> in T, a full object system is provided, so you'd use that instead.

I don't see why closures and explicitly created objects should affect
gc techniques differently. The way PS-algol does closures certainly
can, but that's a detail of implementation. (Too much is retained,
since the abstract machine objects that hold the environment of a
partially applied procedure are way too big and flat.) As the storage
system sees it, closures and objects are the same thing.

>>> what kind of programs were used for testing?
>> Recompiling the compiler.
> Depending on your compiler, this may or may not be a reasonable
> test. For our bytecode compiler, our generational gc is extremely
> effective. If the first generation is of reasonable size, nothing
> except the actual generated code (and associated data structures
> like literal frames) survives to be advanced out of the first
> generation. When the compiler starts compiling modules instead of
> just independent procedures, we expect the copying costs to go up
> significantly. (At a gc, we'll typically have some intermediate
> data structures of a whole module to copy, as well as those
> involved in compiling a single procedure.)
> Further in this direction is Lucid's Common Lisp compiler, which
> may have a *megabyte* of interprocedural analysis goop lying around
> when doing an optimized system build. For that kind of "serious"
> stuff you'd want a serious generational gc.

The test I was quoting would have taken 12MB of heap to run without
gc; the heap was constrained to be 1MB. The compiler takes a fairly
simple-minded approach and passes the entire parse tree between the
parser and code generator in one lump. How does this relate to your
setup?

> I seem to recall that PS-Algol used to have a rather strange
> allocation/gc scheme, with objects allocated in areas by type for
> locality reasons. Were any locality effects ever measured? (Also,
> I seem to recall someone saying that there was a possibility of
> storage leaks in the P-store, under certain circumstances, or
> something like that. Is the same scheme in use, or has it been
> modified?)

This is correct. The persistent store is partitioned into containers
(mis)named "databases". The idea of this model is that programmers
would control inter-database references so that the databases formed
a directed acyclic graph; the leaf databases could then be
garbage-collected individually without disturbing the others. I
implemented a persistent store garbage collector that took advantage
of this; it permitted storage leaks, but these were completely
avoidable by any sane PS-algol programmer. The locality effects are
thus very dependent on what people do with the system.

The problem was getting our users to realize the persistent store
implications of their programs' name structures. Let enough naive
student hackers loose with no documentation, and inter-database
references will rapidly degenerate into a spaghetti mess that can
only be globally gc'd. We have another gc that does mark/sweep on the
whole persistent universe - obviously this is slower - and we didn't
have much option but to use it instead (this one does clear up the
storage leaks).
Forethought and a tougher attitude to system administration would
have prevented this debacle - but since nobody else at the time had a
persistent store capable of accumulating a ragbag of
inter-referential student projects for several years, we couldn't
predict what would happen.

A final question: how do object sizes affect gc techniques? Most of
the fans of generational methods seem to be LISPers, whose objects
are mostly the same size. PS-algol object sizes are more or less
Poisson-distributed (mean 50-60 bytes), and I suspect that Smalltalk
and Poplog ones are too.

--
Jack Campin  Computing Science Department, Glasgow University,
17 Lilybank Gardens, Glasgow G12 8QQ, Scotland
041 339 8855 x6044 work  041 556 1878 home
JANET: jack@cs.glasgow.ac.uk  BANG!net: via mcvax and ukc
FAX: 041 330 4913  INTERNET: via nsfnet-relay.ac.uk
BITNET: via UKACRL  UUCP: jack@glasgow.uucp
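The per-database discipline Jack describes can be modelled in a few lines of Python. One reading of his scheme - and it is only my reading, not a detail from the PS-algol implementation - is that a database is safe to garbage-collect in isolation when no other database holds references into it; once the students' spaghetti references appear, nothing qualifies and only the global mark/sweep works. The `db_graph` representation and function names below are invented for the sketch.

```python
def safe_to_collect_alone(db_graph, db):
    """db_graph maps each database name to the set of databases it
    holds references into.  A database with no incoming inter-database
    references can be traced from its own roots alone, so its garbage
    can be reclaimed without scanning the rest of the store."""
    return all(db not in targets
               for other, targets in db_graph.items()
               if other != db)

def standalone_collectable(db_graph):
    """All databases that can currently be gc'd individually.  When
    inter-database references degenerate into a cycle-ridden mess,
    this set is empty and only a global mark/sweep will do."""
    return {db for db in db_graph
            if safe_to_collect_alone(db_graph, db)}
```

The point of the DAG discipline is exactly to keep this set large, so that the expensive whole-universe collection stays a once-in-a-while event.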