[comp.lang.misc] garbage collection

sommar@enea.se (Erland Sommarskog) (03/15/88)

Edmund E Howland (cbseeh@fang.ATT.COM) writes:
>Huh?
>Since when have these three ever had a need for garbage collection?
>I am not too familiar with Ada, but garbage collection exists in
>languages for whom dynamic binding is a way of life, not an option.

If I define and implement a package for a data type of some sort in 
Ada I can hide the implementation of the type, just providing a set
of operations. At a later state I can change the implementation without
affecting the interface. 
  However, I must already in the first step, decide whether objects of 
this type should be dynamic or not. Not so much for a allocation routine, 
you quite often need an initiation routine, anyway. But do I need a 
deallocation routine? If there is garbage collection, I don't have to 
think of this problem.
  Yes, I can one there just in case, but then the calling units have to
bother about using them, which may require a lot of clean-up code, which
might be totally unnecessary.

Generally, garbage collection is a must in object-oriented programming.
(No, I don't belive Ada is a true OO-language.)

-- 
Erland Sommarskog       
ENEA Data, Stockholm        
sommar@enea.UUCP           "Si tu crois l'amour tabou...
                            Regarde bien, les yeux d'un fou!!!" -- Ange

g-rh@cca.CCA.COM (Richard Harter) (03/31/88)

In article <628@ra.rice.edu> boehm@titan.rice.edu (Hans Boehm) writes:

>  My experience is that any attempt at manipulation of interesting linked
>structures in a language that doesn't support real automatic storage
>management will either fail, or waste large amounts of debugging time.
>(My experience includes a (working) 40,000 line compiler, written in C, that
>manipulates a reference counted syntax "tree", or more accurately, a
>reference counted syntax graph.)  Normally, it is extremely difficult
>to track down bugs created by premature deallocation.  When such bugs are
>finally removed, the resulting programs normally include a substantial
>number of storage leaks.
>  Some recent experiments by Mark Weiser and myself with retrofitting a
>garbage collector to existing C programs, verify the latter point.
>(The garbage collector should never free anything since that was the
>programmers responsibility.  It does.  In other people's code.
>Our paper on this should appear in Software P&E shortly.)
>Mike Caplinger reported similar results for another application at the last
>USENIX conference, I believe.  We have resurrected C code with storage
>management bugs by linking it against a garbage collector (which in the case
>of C doesn't always work either, but it has a better track record than manual
>storage management).

	There is problably something I am missing, but I don't quite see
how you can implement garbage collection in C without collateral assumptions
-- how is the garbage collector supposed to know that that a block is free?

	Your main point is certainly true - manual allocation and deallocation
with the programmer being responsible for ensuring that all comes out right
really doesn't work very well.  This was a critical issue for us -- our
software has to run on systems which do not have pseudo-infinite memory
and it sometimes has to run for very long runs.  We can't afford memory
leaks (or premature deallocation).  What we did was to write our own
storage allocation package.  Salient features:

The allocation routine takes two arguments, a size and an ID.  The latter
is a unique integer specifying that particular call.  (We manage the ID list
manually.)  The allocator creates a node for each request block.  In this
node are sundry links, the ID, and (in effect) a date stamp.  There is also
a hash table that contains the address of every allocated block.  All allocator
control information is stored in a separate space from allocated memory itself.
(This eliminates a large class of mystery bugs associated with overwriting
allocator control data.)

The deallocation routine takes one argument, the address of the block being
returned.  The deallocation routine validates the address as being an
allocated address.  This eliminates another class of bugs caused by passing
an improper or stale address to the deallocation routine.

There is a facility for printing out a complete map of allocated memory
including call ID's and date stamps.  We use this, from time to time, to
check the code for storage leaks.

This scheme is not as good as automatic garbage collection, but we have
found it to be satisfactory.  
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

boehm@titan.rice.edu (Hans Boehm) (04/01/88)

In reference to g-rh@cca.CCA.COM (Richard Harter)'s question:
>        There is problably something I am missing, but I don't quite see
>how you can implement garbage collection in C without collateral assumptions
>-- how is the garbage collector supposed to know that that a block is free?

  This is usually, but not always, possible.  The idea is to use a traditional
non-compacting mark-sweep collector.  On a UN*X system, we start by scanning
the registers, stack, data, and (statically allocated) bss segments for
any addresses of valid allocated objects.  We subsequently scan allocated
objects we find in the same way.  For a typical C program, all reachable
objects can be found in this manner.  (Storing exclusive or'ed pointer
values, tagged pointers, and similiar programming tricks, as well as some
rarely used (for C) compiler optimization techniques may break this scheme.)
  In theory, we may end up marking a few unreachable objects as reachable
as well, since integers may be misinterpreted as pointers.  The collector
can be designed so that the only result of this is excess storage retention.
(This does require that we don't compact.)  In my experience, I have never
seen any indication of such unnecessary retention in practice.  There is
an argument to be made that no garbage collector is completely immune
from this phenomenon.
  The fact that the collector has to perform a fairly complicated (but still
O(1)) check for pointer validity does slow it down a little.  On a Sun 3/260,
with a 2 Meg heap, half of which is in use, we normally see collection times of
about 3 seconds.  (This assumes longword aligned pointers and small data and
static bss areas.)  All of this is largely irrelevant if the
collector is used as a debugging tool to detect storage leaks.
  More details can be found in Mark Weiser's and my upcoming Software P&E
article entitled "Garbage Collection in an Uncooperative Environment".

Hans-J. Boehm
boehm@rice.edu

g-rh@cca.CCA.COM (Richard Harter) (04/01/88)

In article <632@ra.rice.edu> boehm@titan.rice.edu (Hans Boehm) writes:
>>        There is problably something I am missing, but I don't quite see
>>how you can implement garbage collection in C without collateral assumptions
>>-- how is the garbage collector supposed to know that that a block is free?

>  This is usually, but not always, possible.  The idea is to use a traditional
>non-compacting mark-sweep collector.  On a UN*X system, we start by scanning
>the registers, stack, data, and (statically allocated) bss segments for
>any addresses of valid allocated objects.  We subsequently scan allocated
>objects we find in the same way.  For a typical C program, all reachable
>objects can be found in this manner.  (Storing exclusive or'ed pointer
>values, tagged pointers, and similiar programming tricks, as well as some
>rarely used (for C) compiler optimization techniques may break this scheme.)

I see.  My block was in not thinking of the entire address space of the
program as accessible.  For our purposes it is not (we live in portability
land and are not allowed to do things like that :-)).  That is one reason
we did our own allocator (in C) -- it allows us to track down memory problems
portably.

I still don't see how a garbage collector helps with premature deallocation,
unless you scrap deallocation entirely, and rely entirely on garbage
collection.

The problem with garbage collectors in the old days was that they did
garbage collection when the available space was allocated.  This works
well enough until the actual amount of space allocated starts to approach
the amount of available space, and then the garbage collector starts
thrashing.  Is this a problem?  If not, how do you deal with it?
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

boehm@titan.rice.edu (Hans Boehm) (04/02/88)

  Richard Harter writes:
>I still don't see how a garbage collector helps with premature deallocation,
>unless you scrap deallocation entirely, and rely entirely on garbage
>collection.

>The problem with garbage collectors in the old days was that they did
>garbage collection when the available space was allocated.  This works
>well enough until the actual amount of space allocated starts to approach
>the amount of available space, and then the garbage collector starts
>thrashing.  Is this a problem?  If not, how do you deal with it?

  Our current version of the collector doesn't help you with
premature deallocation.  Further instrumenting it might help.  For example,
you could use the marking algorithm to determine whether "free"d objects
really were inaccessible.  (This requires a coding style in which
unused references are explicitly cleared.  Replacing "free" by a
macro that clears its argument takes care of a lot of that and,
by itself, occasionally helps track down bugs.)  I don't have much experience
with this though.
  In non-real-time applications, we tend to use explicit deallocation where
it's easy to do, and let the collector take care of the error-prone cases.
  You are right in that the garbage collector code is not completely portable.
It should really be viewed as part of the run-time library, which is rarely
completely portable anyway.  Actually porting it isn't terribly hard.  There
is only about a screenful of machine dependent code in the collector.
The longest part of that deals with marking from the machine registers.
  Frequent collections in small address spaces can become a problem.
We operate in a virtual memory environment, so we deal with it by expanding
the heap size whenever a collection fails to return enough memory.  This
requires allocating a bit of extra memory.  For debugging, this hardly
matters.  (Memory is supposedly cheap.) In a production environment,
especially without virtual memory, it could be a problem.  On the other hand,
for some existing code, this can be much less significant than memory
otherwise lost to leaks.

Hans-J. Boehm  (boehm@rice.edu)

nevin1@ihlpf.ATT.COM (00704a-Liber) (04/05/88)

[followups to comp.lang.misc]

In article <26358@cca.CCA.COM> g-rh@CCA.CCA.COM.UUCP (Richard Harter) writes:
>	There is problably something I am missing, but I don't quite see
>how you can implement garbage collection in C without collateral assumptions
>-- how is the garbage collector supposed to know that that a block is free?

The only way to do this is to allocate a giant heap with malloc()
and do *all* the memory management on your own.  It is no different
than implementing in C or C++ a language that does automatic garbage
collection.

The problem I found with languages that do automatic garbage collection is
that there is no way for me to modify the implementation of the class (or
the type, for those of you unfamiliar with C++ lingo :-)) to make it better
suit my needs.

For applications where the programmer cost is greater than the run-time
cost (such as prototypes, personal tools, one-shot programs), I tend to use
languages with built in memory management.  For other applications, however,
I would rather use a language like C++ which allows me to change the
implementation of a class without affecting the rest of my code.
-- 
 _ __			NEVIN J. LIBER	..!ihnp4!ihlpf!nevin1	(312) 510-6194
' )  )				"The secret compartment of my ring I fill
 /  / _ , __o  ____		 with an Underdog super-energy pill."
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah

ifocs9d@aucs.uucp (Rick Giles) (08/01/90)

I'm investigating memory management and garbage collection schemes for
Prograph, an object-oriented, data-flow programming language. The current
implementation of Prograph uses reference counting for garbage collection.
We're considering the pros and cons of using mark and sweep or generational
garbage collection algorithms. If you have some first-hand recommendations
or information I'd be pleased to hear from you. I can give further info
on Prograph and the memory management issues of the language if that
will help. Thanks.

Rick Giles

Bitnet:   FRGILES@Acadia.ca
Internet: FRGILES%Acadia.BITNET@CUNYVM.CUNY.EDU 
UUCP:     uunet!dalcs!aucs!ifocs9d