[comp.sw.components] Problem requiring GC -- try again.

djones@megatest.UUCP (Dave Jones) (09/27/89)

From article <911@scaup.cl.cam.ac.uk>, by scc@cl.cam.ac.uk (Stephen Crawley):
> Golden Richard III writes:
> 
> Here are some problems where programmer-controlled storage management 
> would be very difficult or impossible.
> 
>   Interactive editting of cyclic graph data structures.  You have a 
>   heterogeneous cyclic graph data structure and the editor must be able
>   to make arbitrary sequences of changes to the data structure.  How
>   do you detect that a subgraph has become detached?   
>   

Funny you should ask, because I'm working on such a problem even as we
speak, sans garbage-collector. The editor is not interactive -- it is
essentially an optimizer which reduces graphs by calculating equivalency
classes of subgraphs -- but I don't see that that matters much whether it
is interactive or not. I handled the sitch by making an "arrow" a proper
data-object, not just a vanilla pointer. One extra pointer's worth of
memory per arrow, but cheap at twice the price!

Before I had that brainstorm I had all sorts of problems, the very least
of which was reclaiming memory. (A wise man once said, if you
don't understand the problem well enough to know when an object
is unreachable, you don't understand the problem well enough.)

Here it is, simplified only a little, to be generic:

    typedef struct {
      Info node_info;     /* Info that the node carries */
      List arrows_out;    /* List of all arrows pointing to this node */
      List arrows_in;     /* List of all arrows pointing out of this node */	
    }Node;

    typedef struct {
      Symbol label;       /* Symbol which labels this arrow */
      Node  *from;        /* Node at the base of this arrow */
      Node  *to;          /* Node at the point of this arrow */
    }Arrow;

scc@cl.cam.ac.uk (Stephen Crawley) (09/28/89)

I wrote:
>> Here are some problems where programmer-controlled storage management 
>> would be very difficult or impossible.
>>
>>   Interactive editting of cyclic graph data structures.  You have a 
>>   heterogeneous cyclic graph data structure and the editor must be able
>>   to make arbitrary sequences of changes to the data structure.  How
>>   do you detect that a subgraph has become detached?

Dave Jones replied:
> Funny you should ask, because I'm working on such a problem even as we
> speak, sans garbage-collector. The editor is not interactive -- it is
> essentially an optimizer which reduces graphs by calculating equivalency
> classes of subgraphs -- but I don't see that that matters much whether it
> is interactive or not. I handled the sitch by making an "arrow" a proper
> data-object, not just a vanilla pointer. One extra pointer's worth of
> memory per arrow, but cheap at twice the price!

First: you are solving a special case of a graph consisting of nodes
connected by two way links.  I was thinking of the more general case 
(harder) problem where the links are ordinary pointers.  

Try to imagine a randomly connected network of records of various types 
with pointers going all over the place.  The problem is to construct
a graph editor that will allow me to make modifications to this data 
structure in place.  It might be part of an interactive debugger, or
a tool for patching a knowledge base or something like that.

Second: have you tried measuring the performance of your non-GC'ed
solution to your problem against that of an equivalent GC'ed solution
(same problem, same language/compiler, decent garbage collector ...) 
Without taking proper measurements, your pronouncements about the 
efficiency advantages of your solution have no scientific basis.
Even then, the special nature of the graphs you are dealing with
and the calculations your program does tends to make performance
generalisations.

Third: I presume that you realise that with your data structures, adding 
and deleting an arrow both have complexity order (M + N) where M and N
are the average number of in arrows and out arrows respectively per node.
You are trading off reclamation overheads against the overheads of 
constructing and mutating the data structure.  (Of course there may be
other application specific reasons for using your data structures ...
but that is not at issue)

Finally: you must admit that even your special case solution was hard
to arrive at, and that at least the storage management aspects would
have been considerably easier if you had a GC in your language.  

> (A wise man once said, if you don't understand the problem well 
> enough to know when an object is unreachable, you don't understand 
> the problem well enough.)

That sounds more like the pronouncement of a fool to me ...  

-- Steve

dwiggins@atsun.a-t.com (Don Dwiggins) (09/29/89)

OK, here's another shot at a problem that seems to call for GC; let's see
what y'all can make of it.

One of the elements of the Prolog language is the "atom" (read "name" or
"symbol").  In the language model, atoms aren't created or destroyed, they
just *are* (like the integers, for instance).  To support this model, the
Prolog interpreter must allocate a block in "atom space" whenever a mention
of a new atom is required (either read in or needed as the result of a
string-to-atom conversion).  Problem: when to delete blocks in atom space?
The Prolog user is no help, having no concept of "atom deletion".

Reference counting is one possibility, but it has one serious defect.
During backtracking, which is a common Prolog operation, potentially large
chunks of stack space are "cut away"; to have to wade through that space
finding pointers and decrementing counts would be a major performance hit.
IMHO, a good asynchronous garbage collector working in the background would
be the best solution.

Historical note: at the time I was last involved in implementations of
Prolog (several years ago), atom space was generally not "cleaned up" at
all; when it filled up, the system had to abort.

--
Don Dwiggins				"Solvitur Ambulando"
Ashton-Tate, Inc.
dwiggins@ashtate.uucp

djones@megatest.UUCP (Dave Jones) (09/29/89)

From article <923@scaup.cl.cam.ac.uk>, by scc@cl.cam.ac.uk (Stephen Crawley):
...
> 
> First: you are solving a special case of a graph consisting of nodes
> connected by two way links.  I was thinking of the more general case 
> (harder) problem where the links are ordinary pointers.  
> 

I was solving a problem in graph reduction. Using two way links was
an implementation decision. 

> Try to imagine a randomly connected network of records of various types 
> with pointers going all over the place.  The problem is to construct
> a graph editor that will allow me to make modifications to this data 
> structure in place.

Sounds a lot like the LISP heap to me. Garbage collection is known
to be a dandy way to handle that. I'll give that much ground. I have
yet to encounter a real problem where I ultimately decided that the data
should be quite that scrambled and unorganized, but if I do, I'll clammor
for LISP or something like it. It's not a religion with me.

> Second: have you tried measuring the performance of your non-GC'ed
> solution to your problem against that of an equivalent GC'ed solution
> (same problem, same language/compiler, decent garbage collector ...) 
> Without taking proper measurements, your pronouncements about the 
> efficiency advantages of your solution have no scientific basis.

At a midwestern university, I once taught a couple of graduate courses in
the Design and Analysis of Algorithms. (Ooooooh. Impressed?  No?
Oh, well.) I somehow managed to convince some of my students that you could
usually make accurate estimates as to how fast things would run without
actually running them. Almost convinced myself at one point.

> ...
> 
> Finally: you must admit that even your special case solution was hard
> to arrive at, and that at least the storage management aspects would
> have been considerably easier if you had a GC in your language.  
> 

So what? I like a good challenge. Keeps me out of the cardrooms at night.
Well, some nights.