[comp.lang.c++] garbage collection and member functions

dougm@zamenhof.rice.edu (Doug Moore) (12/20/90)

Please give a thought to the following problem if garbage collection
interests you.  Otherwise, skip it.

I have implemented a mini-Scheme interpreter in C++.  Underlying the
implementation are the classes Cons and SExpression, where

class Cons
{
   SExpression car, cdr;
};

and

class SExpression
{
  bool isAnAtom;
  union
    {
      Atom *xatom;
      Cons *xcons;
    };
};

are the basic representations of the two types.  I have implemented
the evaluator as a recursive member function of SExpression, and I
have lots of other SExpression member functions lying about.

The implementation as it stands allocates new Conses regularly, and
most of them become garbage; I desire to add a copy-collect garbage
collector, to better control memory usage.  In order to avoid
rewriting all my code, I choose to overload operator new for Conses,
so that it copies and collects when a call for a new Cons cannot be
immediately satisfied.

But this blows my member functions all to hell.

Consider one particular context.  The evaluator has called for the
evaluation of a subexpression and, indirectly, triggered a garbage
collection.  *This SExpression, the car part of a Cons, is dead.
That is, this points to garbage.  I can take precautions to make sure
that 'this' SExpression is properly copied from the dying Cons space to
the new Cons space during the garbage collection, and I do, but 'this'
continues to point to something in the dead Cons space.

So what can I do?  Well, I already add *this to a set of SExpression
roots to ensure that it is copied, and I can manage to retrieve its
new location after I finish the subexpression evaluation, but then
what?  Assign the new location to 'this'?  I'd rather not use obsolete
language features.  Don't use the implicit this anymore, but instead
explicitly dereference 'that', a pointer that I update after every
operation that might have triggered a garbage collection?  Well, that
idea makes me regret ever using member functions.

Or, there's idea that I just tried without success: add a reference to
'this' to a collection of SExpression pointer references updated by
the garbage collector.  The intent was that the garbage collector
would know where 'this' lived ('this' the SExpression *), and update
it when *this was copied, so that 'this' would be magically changed to
point into the new Cons space after garbage collection.  Well, the
compiler I used wasn't too fond of references to 'this'.

I can certainly solve my problems if I'm willing to abandon member
functions and let my memory management implementation spill all over
the rest of my code.  But then I'd might as well be writing C if I
have to do all that.  How can I do it elegantly?

In case you wondered, this was a school assignment.  I was the
assigner, not an assignee, and I'm beginning to wonder if it is an
appropriate assignment in light of the ugliness of any solution I can
find.

Thanks for any suggestions you have to offer.

Doug Moore
(dougm@rice.edu)

aaron@backyard.Berkeley.EDU (Aaron Akman) (12/21/90)

In article <1990Dec20.045909.7681@rice.edu>, dougm@zamenhof.rice.edu (Doug Moore) writes:
|> 
|> Please give a thought to the following problem if garbage collection
|> interests you.  Otherwise, skip it.
|> 
To be honest I didn't exactly understand the problem you are running into,
but I tried to implement something similar:  lisp-style cons-ing w/auto
garbage collection.  I also overrode new for underlying class(es?), and
I had some success.  I don't have the code before me, but here's what
I remember:

I created 2 objects, one roughly the same as your SExpression and one
called LispObj.  When I declared:

	LispObj x;

The object 'x' automatically linked itself into a list that I wanted to
mimic the old Obarray in lisp (I hope I'm getting the jargon right).  That's
the list of things my garbage collector won't collect (their still in-use).

The LispObj points to one of my SExpression objects which it gets by
calling the SExpression's overridden new().  The SExpression's are
managed in a big pool, and new() gets one from the free list, which starts
out as everything.

When 'x' goes out of scope, and its destructor is called, it takes itself
out of the list of LispObj's that are still in-use.  I used a regular mark
and sweep GC, and used my Obarray to identify what's still in-use.

My CONS function operated on 2 LispObj's, and I had user-defined conversions
from string, SExpression, and integer to LispObj.  The compiler-supplied
LispObj didn't go out of scope until the next close brace, so this:

	LispObj y;

	{
		x = cons("hello", cons("goodbye", y));
	}

would safely tuck away anything important dangling off of y before the
close brace, and any excess would automatically get GC-ed.

I realize you are trying to do something much bigger (your own eval()),
but the basic idea I used was:  have 2 objects, (1) that comes from the
pool and isn't used directly, and (2) one that encapsulates the first
and links itself into an Obarray.  GC uses the list of (2) to figure out
which (1) are still in-use.

Hope that helps.

Aaron, bellcore!cheetah!aaron

thomasw@hpcupt1.cup.hp.com (Thomas Wang) (12/21/90)

/ hpcupt1:comp.lang.c++ / dougm@zamenhof.rice.edu (Doug Moore) /  8:59 pm  Dec 19, 1990 /

>Please give a thought to the following problem if garbage collection
>interests you.  Otherwise, skip it.

> The implementation as it stands allocates new Conses regularly, and
> most of them become garbage; I desire to add a copy-collect garbage
> collector, to better control memory usage.  In order to avoid
> rewriting all my code, I choose to overload operator new for Conses,
> so that it copies and collects when a call for a new Cons cannot be
> immediately satisfied.

> But this blows my member functions all to hell.

This is a known problem with C++.  Copy collector does not work, because
of the passing of 'this' pointer.

Only way for copying collector to work for C++ is to change the compiler.

Precise collector without supporting multiple inheritance can be implemented
without changing the compiler.  I did the implementation as a master thesis,
so this might not be a good class assignment.

Conservative collectors are always available though.  You can get Boehm's
collector via anonymous FTP.

> In case you wondered, this was a school assignment.  I was the
> assigner, not an assignee, and I'm beginning to wonder if it is an
> appropriate assignment in light of the ugliness of any solution I can
> find.

Most straight forward thing I can think of is to use reference counting.
It's only a class assignment, so kludgy solution should be fine.  Or even
don't delete any memory!  As long as the program don't run too long, the
program should run fine.

>Doug Moore


 -Thomas Wang
              (Everything is an object.)
                                                     wang@hpdmsjlm.cup.hp.com
                                                     thomasw@hpcupt1.cup.hp.com

cline@cheetah.ece.clarkson.edu (Marshall Cline) (12/22/90)

In article <1990Dec20.045909.7681@rice.edu> dougm@zamenhof.rice.edu (Doug Moore) writes:
[...]
>I have implemented a mini-Scheme interpreter in C++.  Underlying the
>implementation are the classes Cons and SExpression, where
>   class Cons { SExpression car, cdr; };
>and
>   class SExpression { bool isAnAtom; union { Atom *xatom; Cons *xcons; }; };
>are the basic representations of the two types.  I have implemented
>the evaluator as a recursive member function of SExpression, and I
>have lots of other SExpression member functions lying about.

This wasn't your point, but wouldn't an SExpr be either an Atom or a Cons?
Said another way, a Cons is-a SExpr and an Atom is-a SExpr.
Then you use virtual dispatching to determine which you have rather than
using a union discriminator...
If you needed to make the thing value-oriented, you could wrap an SExpr* in
a class SExpression, and then SExpression::~SExpression() would delete the
pointed-to SExpr.
Just a thought.

>The implementation as it stands allocates new Conses regularly, and
>most of them become garbage; I desire to add a copy-collect garbage
>collector, to better control memory usage.  In order to avoid
>rewriting all my code, I choose to overload operator new for Conses,
>so that it copies and collects when a call for a new Cons cannot be
>immediately satisfied.
>But this blows my member functions all to ****.
>Consider one particular context.  The evaluator has called for the
>evaluation of a subexpression and, indirectly, triggered a garbage
>collection.  *This SExpression, the car part of a Cons, is dead.
>That is, this points to garbage.  I can take precautions to make sure
>that 'this' SExpression is properly copied from the dying Cons space to
>the new Cons space during the garbage collection, and I do, but 'this'
>continues to point to something in the dead Cons space.

So the basic problem is that C++ pointers aren't ``smart'' enough.  Ie:
a copy-collect GC routine moves objects, and the pointers are machine
addresses rather than some higher level `logical' pointers, so the
machine addresses become invalid.

A possible solution is to ban `Cons*' and replace it with ConsPtr,
which is a smart pointer.  GC will then inform any ConsPtrs if it moves
their Cons cell, so all accesses to a Cons thru its ConsPtr will be ok.

The `this' pointer is still a machine address, naturally, and therefore
has no smarts.  Therefore a Cons:: member function might have its `this'
invalidated, just as before.  However if you only do things in ConsPtr's
rather than Cons objects themselves, then you'll be alright (ie: if the
Cons' machine address changes during some subcall, the ConsPtr will know
and update itself appropriately, so subsequent references to the Cons
will remain logically correct, albeit they'll refer to a different piece
of physical RAM).

Perhaps you're jumping from the frying pan into the fire (since you may
have to collect dead ConsPtr's now), but it may be a start.

Marshall Cline

PS: ConsPtr is going to cause tons of -> in your code.  It may look
cleaner to have a struct called `Cons_' that actually contains the cons'
data ptrs, then have a `Cons' object which acts like a *reference* to a
`Cons_'.  Then all your code will *look* the same as it does presently
(ie: you might pass `Cons& c' to a function f(), where you could say
c.foo() [rather than c->foo() if you had passed a ConsPtr], etc).  The
car and cdr in a `Cons_' could also be `Cons' (ie: smart references to
other `Cons_' cells).

If you adopt the earlier idea of using inheritance/polymorphism rather
than tagged unions, then the light-weight `wrapper' class called
`SExpression' could be the `reference to an SExpr' abstraction.  This
would give you reasonably clean syntax, *and* virtual dispatching to
determine whether an SExpression has a car, cdr, is an atom, etc.

Pushing this one step further, you might be able to ban the null pointer
and replace it with an object of class Nil.  Nil would also be-a SExpr,
but the recursive routines (such as `do X on yourself and pass the
message to your cdr') would be no-ops.  The end result is that you may
have fewer `if (cons_cell.cdr != 0) ...'  tests in your code.  Obviously
there needs be only one physical object of class Nil, and the SExpresion
which `refers' to that Nil object would be what Scheme programs know as
`nil'.

--
PS: If your company is interested in on-site C++/OOD training, drop me a line!
PPS: Career search in progress; ECE faculty; research oriented; will send vita.
--
Marshall Cline / Asst.Prof / ECE Dept / Clarkson Univ / Potsdam, NY 13676
cline@sun.soe.clarkson.edu / Bitnet:BH0W@CLUTX / uunet!clutx.clarkson.edu!bh0w
Voice: 315-268-3868 / Secretary: 315-268-6511 / FAX: 315-268-7600