[comp.sw.components] Component with garbage collection

johnson@p.cs.uiuc.edu (09/27/89)

I'm building a compiler with a complex type system.  There is an
abstract data type "type" that describes the various kinds of types.
There are only a few public operations on types, but there are many
private operations that are used to implement the public operations.

Types are immutable values, so it should be fairly easy to manage their
storage by hand.  However, types take up a significant percentage of
memory, so we work hard to have different types share structure.  This
works well because they are immutable.  However, sharing structure means
that deleting type T will not necessarily delete the components of T.
This might seem like a good place to use reference counts, but
unfortunately types are circular structures.  Thus, there is really
no alternative but garbage collection.

We could avoid garbage collection by not sharing structure.  This would add
a megabyte to the amount of memory needed to run our programs.  Moreover,
since the compiler doesn't work all that well, we only compile small programs.
It is hard to tell how much memory structure sharing would save on a large
program, but I bet it would be a lot.

If I were programming this in C++ then I would write the type module
to collect its own garbage.  However, this would certainly be no
faster and would be a lot more error-prone than automatic garbage collection.

Ralph Johnson

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)

From article <130200012@p.cs.uiuc.edu>, by johnson@p.cs.uiuc.edu:
> [Types in a compiler, which share structure]

   This is basically equivalent to the disconnected subgraph problem;
   both are tantamount to presenting the general garbage collection
   problem and saying "here's a problem which requires garbage collection".

   Obviously, for any problem which either is the general GC problem or
   is analytically equivalent, GC algorithms can be used.  Maybe there
   are even a fair number of situations in which this is the case.  If
   so, then GC is worth keeping around.  In these cases in which the ADT
   implementor doesn't have any useful information to communicate to the
   storage management system; the GC system can still raise an exception 
   if it can't find any more space.  Also, the implementor can still do
   a managed version if variable response time is a problem. 

   In general, though, it still seems more efficient to communicate any
   information regarding the availability of storage which might be known
   by the ADT implementor, so that the work of searching for garbage can
   be concentrated on those situations in which it cannot be avoided. 


   Bill Wolfe, wtwolfe@hubcap.clemson.edu

scc@cl.cam.ac.uk (Stephen Crawley) (09/27/89)

Bill Wolfe writes:
> Obviously, for any problem which either is the general GC problem or
> is analytically equivalent, GC algorithms can be used.

.. GC algorithms MUST be used.

> Maybe there are even a fair number of situations in which this is the case. 

Indeed there are!

Furthermore, there are a fair number of other situations where the problem
is not analytically equivalent to the general GC problem, BUT a GC solution
is the most efficient solution anyway. 

> If so, then GC is worth keeping around.

[Shock horror!]

> In these cases in which the ADT implementor doesn't have any useful 
> information to communicate to the storage management system; the GC 
> system can still raise an exception if it can't find any more space.  

I'm not sure what you are getting at here.  There is the obvious point
that any dynamic storage management system can completely run out of 
space, be it a GC'ed on a non-GC'ed system.  However, I've argued before 
that a non-GC'ed system is likely to get into this state sooner than
a GC'ed system:

  o If the non-GC'ed solution needs to be implemented using reference 
    counts, there is a 1 word overhead for each ref counted object.
    
  o Objects are likely to be explicitly freed later in the non-GC'ed 
    case than they become inaccessible in a GC'ed case.  [Admittedly,
    this argument assumes a degree of intelligence on the part of the
    compiler in the GC'ed case]

> Also, the implementor can still do a managed version if variable 
> response time is a problem.

Alternatively use a real-time garbage collector.

> In general, though, it still seems more efficient to communicate any
> information regarding the availability of storage which might be known
> by the ADT implementor, so that the work of searching for garbage can
> be concentrated on those situations in which it cannot be avoided. 

Alternatively let a smart compiler optimise the storage reclamation.
At least that way we know that the program will work.