[comp.lang.c++] GC

bet@orion.mc.duke.edu (Bennett Todd) (06/14/89)

In article <77300030@p.cs.uiuc.edu>, johnson@p.cs writes:
>[...]
>                                      However, the main reason for a garbage
>collector in C++ would be to catch "space leaks", so it would not have to
>be called very often.

I think this observation is crucial to appreciating the gap between
those of you who like garbage-collecting environments and those of us
who don't (what, me biased?:-). Part of the job I take on in program
design and implementation is closely tracking allocation of crucial
resources, including memory. I view a memory leak as a bug; when I write
something big enough that I am not absolutely positive I didn't miss any
potential leaks, I always drive it for a long while with a steady load
of work which should settle down to a steady state of memory
consumption, then sit back and examine its memory use via top(1) (or
ps(1) or whatever tool is the least unpleasant to use for the job:-).
I'm not sure where the damage was done to my brain; I guess it is
probably because I got taught techniques that enable me to write
extremely large, complex applications while tracking the memory
consumption myself, before I ever got exposed to a garbage-collecting
environment. By then I didn't feel any strong need to abandon these
techniques, since they don't seem cumbersome to me.

In particular I remember fondly a really bizarre package I wrote, more
as an exercise in "can I do this?" than for practical application. As I
recall, I wanted to write the closed form roots of polynomial expression
in C, so I wrote a complex library. I wanted to be able to write
composed expressions, and not have the implicit intermediates get
dropped on the floor, e.g.

	result = plus(complex(1.0,0.0),complex(0.0,1.0));

Necessarily the two calls to the constructor complex() will produce two
intermediates; I decided to mandate that all arithmetic operators would
destroy their arguments. I called this convention "parameter passing by
call and destroy" and it always struck me as pleasantly perverse. These
days, with structs being treated to call-by-value and properly supported
assignment, and struct return generating suitably allocated anonymous
copies, the above shenanigans aren't needed any more (and C++ offers,
at last, the ability to make a complex type both as convenient to use
and as efficient as if it were defined as an arithmetic primitive in the
language -- death to Fortran!:-). Nonetheless, it wouldn't appeal to me
to create a package which goes about allocating memory, and dropping it
on the floor, on the assumption that a garbage collector will pick up
the pieces after me.

-Bennett
bet@orion.mc.duke.edu

bet@orion.mc.duke.edu (Bennett Todd) (06/15/89)

In article <14739@duke.cs.duke.edu>, I wrote:
>                                                             [...] As I
>recall, I wanted to write the closed form roots of polynomial expression
>in C, [...]
Needless to say, I should have proofread more closely. What I was
actually coding up was the closed form roots of *cubic* polynomials.
Sorry if I got anyone's hopes up:-).

-Bennett
bet@orion.mc.duke.edu

jima@hplsla.HP.COM (Jim Adcock) (06/22/89)

I started playing with Boehm's garbage collector yesterday, and found it
easy [dare I say "trivial"] to get it running with C++.  It took maybe 2 hours
for me to get up to speed on this with C++.  I think many people will find this
an entirely suitable GC for many C++ applications where traditionally one
would want to use GC.  Clearly, the 2.0 features ark has been talking about
in terms of "new" and "delete" overloading are what one wants to be able
to hook in various memory management schemes.  On my mid-range 68020 
system GC took 1-3 seconds on a couple meg of heap with a few thousand
"lost" objects floating around.  Your mileage may vary.  Boehm's GC worked
well even when I was deliberately very malicious to it -- filling up my
objects with random numbers many of which might very well look like pointers.

If you'd like a copy of the couple little files I needed to write to do
this, please ask. [These files are not "publication quality", but would
provide good hints to get you started.]

Boehm's GC clearly isn't the solution for people with strict real-time
needs.  It would be nice if some C++ specific hooks could be placed in the
GC to allow a virtual destructor to be called [where applicable] when such
a C++ object is found.  Right now when a "lost" object is found, no destructor
is called [which is really pretty reasonable, since you lost the object in
the first place.]  Even if the GC were modified to call the appropriate
destructor, you'll still have no guarantee that destructors are called in
the order you might like.  So destructors with side effects might still 
be problematic.  Still, I think this is about what one can reasonably expect
from a GC.  As it stands, the GC is fine for traditional situations where
one loses track of objects -- lisp-like cells, binary math ops that need
to create temporaries, etc.

"Obviously," Boehm's collector is exactly what one needs to "solve" [patch]
the problem that Johnson mentions where a program runs for a couple of days,
weeks, or months, and "suddenly" springs a leak.

On my system the entire GC archive is about 32K.  In a program that just
filled objects with random numbers then "lost" track of them, memory 
allocation, management and garbage collection took about 10% of the total
execution time, with the GC's marking routine taking most of the time.

parag@hp-ses.SDE.HP.COM (Parag Patel) (06/22/89)

[drifting away here]

> As far as I know, it is possible to provide garbage collection for C,
> and some systems have actually done so.  The GC has to be "conservative"
> in that it must regard anything that might be a pointer as a pointer.
> So there might be some garbage it can't collect, but it will never
> collect non-garbage.

This sort of garbage-collector has an interesting problem handling C++
classes because there's no way to guarantee that the destructor for
every class could actually be called.  All it could do would be to
counteract memory leaks and free unused memory (still useful tho').

Now if it could figure out whether a random memory object has a virtual
jump table, it could at least handle the classes with virtual
destructors (I think).  Then if you want your object GCed, make your
destructor virtual, then link to the libGC++malloc.a instead!

Still not quite a general-purpose GC, but it could be useful...

jima@hplsla.HP.COM (Jim Adcock) (06/24/89)

The following simple program makes random length trees -- and fools Boehm's
GC.  I seem to remember something about this in his README file.  Maybe one
had better bone up on his algorithm before getting too deep in this.

(change either 2 to a 3 (25% chance versus 50% chance) in randomlinks and
 the GC works fine.  Program is not "publication quality.")

-----------
#include <stdio.h>
#include <stdlib.h>   /* rand(), exit() */
#include "gc++.h"

void* operator new(size_t sz)
{
  return gc_malloc((sz < 8) ? 8 : sz);
}

void operator delete(void *p)
{
  if (p) gc_free(p);
}

class link
{
public:
  link* head;
  link* tail;
  link() {head = 0; tail = 0;}
  void randomlinks();
};

long sum;

void link::randomlinks()
{
  if (!(rand() & 2))
  {
    head = new link;
    head->randomlinks();
    sum += sizeof(link);
  }

  if (!(rand() & 2))
  {
    tail = new link;
    tail->randomlinks();
    sum += sizeof(link);
  }
}
  
int main()
{
  link * p[4096];
  initialize_allocator();

  long cntinc = 4096;
  int n, q;

  while (1)
  {
      q = 0x0ff & rand();
      p[q] = new link;
      sum += sizeof(link);
      (p[q])->randomlinks();

      if (sum > cntinc)
      {
	  cntinc <<= 1;
	  printf("\n *** total 'lost' so far: %ld ***\n\n",sum);
	  if (sum > 4000000) exit(0);
      }
  }
}

jima@hplsla.HP.COM (Jim Adcock) (06/29/89)

> The following simple program makes random length trees -- and fools Boehm's
> GC.  I seem to remember something about this in his README file.  Maybe one
> had better bone up on his algorithm before getting too deep in this.
> 
> (change either 2 to a 3 (25% chance versus 50% chance) in randomlinks and
>  the GC works fine.  Program is not "publication quality.")

Oops, sorry, as Boehm pointed out to me, the problem is apparently not with
Boehm's GC, but in my "logic" in analysing the expected behavior of my
"simple" program.

I didn't check how much these things were actually growing.  I thought
about a single leaf, and said to myself that with each child having a
50% probability of not existing, there is a 25% probability of any given
leaf not having any children, therefore the tree "must" terminate.

But when I think of it another way, each leaf has an "expected" number of
2*0.5 = 1.0 children, (2*0.5)*(2*0.5) = 4*0.25 = 1.0 grandchildren, 
(2*0.5)*(2*0.5)*(2*0.5) = 8*0.125 = 1.0 great-grand-children.... so the
size of a tree would be marginally ill-conditioned, and could grow 
"indefinitely."

Changing one of the 2s to a 3 makes for an "expected" 0.75 children, 0.5625
grand-children, etc., so then one doesn't have unbounded growth of the trees.

Now, if Boehm's GC *could* handle my program, then he'd *really* have
something! :-)

drich@klaatu.lanl.gov (David O. Rich) (05/08/91)

>   For a discussion of the "mostly copying" garbage collection algorithm, see
>   "Compacting Garbage Collection with Ambiguous Roots", WRL Research Report
>   88/2, February 1988, and "Mostly-Copying Garbage Collection Picks Up
>   Generations and C++", WRL Technical Note TN-12, October 1989.

Are these TR's FTP-able from somewhere? Thanks!

--Dave

bartlett@decwrl.dec.com (Joel Bartlett) (05/08/91)

Both tech reports are available via DECWRL's tech report server.  Send a
message to WRL-Techreports@decwrl.dec.com with "help" in the subject
line and it will return you instructions.

jfb