johnson@p.cs.uiuc.edu (09/27/89)
I'm building a compiler with a complex type system. There is an abstract data type "type" that describes the various kinds of types. There are only a few public operations on types, but there are many private operations that are used to implement the public operations. Types are immutable values, so it should be fairly easy to manage their storage by hand. However, types take up a significant percentage of memory, so we work hard to have different types share structure. This works well because they are immutable. However, sharing structure means that deleting type T will not necessarily delete the components of T. This might seem like a good place to use reference counts, but unfortunately types are circular structures. Thus, there is really no alternative but garbage collection. We could avoid garbage collection by not sharing structure. This would add a megabyte to the amount of memory needed to run our programs. Moreover, since the compiler doesn't work all that well, we only compile small programs. It is hard to tell how much memory structure sharing would save on a large program, but I bet it would be a lot. If I were programming this in C++ then I would write the type module to collect its own garbage. However, this would certainly be no faster and would be a lot more error-prone than automatic garbage collection. Ralph Johnson
billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)
From article <130200012@p.cs.uiuc.edu>, by johnson@p.cs.uiuc.edu:
> [Types in a compiler, which share structure]
This is basically equivalent to the disconnected subgraph problem;
both are tantamount to presenting the general garbage collection
problem and saying "here's a problem which requires garbage collection".
Obviously, for any problem which either is the general GC problem or
is analytically equivalent, GC algorithms can be used. Maybe there
are even a fair number of situations in which this is the case. If
so, then GC is worth keeping around. In these cases in which the ADT
implementor doesn't have any useful information to communicate to the
storage management system; the GC system can still raise an exception
if it can't find any more space. Also, the implementor can still do
a managed version if variable response time is a problem.
In general, though, it still seems more efficient to communicate any
information regarding the availability of storage which might be known
by the ADT implementor, so that the work of searching for garbage can
be concentrated on those situations in which it cannot be avoided.
Bill Wolfe, wtwolfe@hubcap.clemson.edu
scc@cl.cam.ac.uk (Stephen Crawley) (09/27/89)
Bill Wolfe writes: > Obviously, for any problem which either is the general GC problem or > is analytically equivalent, GC algorithms can be used. .. GC algorithms MUST be used. > Maybe there are even a fair number of situations in which this is the case. Indeed there are! Furthermore, there are a fair number of other situations where the problem is not analytically equivalent to the general GC problem, BUT a GC solution is the most efficient solution anyway. > If so, then GC is worth keeping around. [Shock horror!] > In these cases in which the ADT implementor doesn't have any useful > information to communicate to the storage management system; the GC > system can still raise an exception if it can't find any more space. I'm not sure what you are getting at here. There is the obvious point that any dynamic storage management system can completely run out of space, be it a GC'ed on a non-GC'ed system. However, I've argued before that a non-GC'ed system is likely to get into this state sooner than a GC'ed system: o If the non-GC'ed solution needs to be implemented using reference counts, there is a 1 word overhead for each ref counted object. o Objects are likely to be explicitly freed later in the non-GC'ed case than they become inaccessible in a GC'ed case. [Admittedly, this argument assumes a degree of intelligence on the part of the compiler in the GC'ed case] > Also, the implementor can still do a managed version if variable > response time is a problem. Alternatively use a real-time garbage collector. > In general, though, it still seems more efficient to communicate any > information regarding the availability of storage which might be known > by the ADT implementor, so that the work of searching for garbage can > be concentrated on those situations in which it cannot be avoided. Alternatively let a smart compiler optimise the storage reclamation. At least that way we know that the program will work.