jordan@aero.org (Larry M. Jordan) (03/13/91)
I'm endeavoring to implement an interpreter/compiler for a LISP-like language
in C++. The question I wish to have answered is "Can I implement a
type-less lang. like LISP in C++ without resorting at some point to
actually asking an object for its type?" I just don't see how not to.
Also, I'm implementing a simple mark/sweep garbage collector which takes
liberties with Cons cell pointers to avoid recursion. I don't see a
reasonable way to do this without asking objects during the mark phase
'are you a Cons?'. (Of course, I will need to implement the predicate
functions like 'null, consp, listp, etc.'). I'd also like some
remarks on the 'reasonableness' of the implementation scheme I've
chosen (to be described next paragraph).
Techniques I've seen in the past for implementing LISP objects in
languages like C or Modula-2 define an enumeration type and create
a discriminated union using the enum as a type 'tag' or discriminant.
XLISP and XSCHEME are implemented this way and it seems perfectly
reasonable. I've taken a different approach since I'm using C++.
I'm implementing LISP objects as C++ classes--Cons, Vector, Fixnum,
Closure, Environment...--derived from a common base class Node. Most
methods are virtual and defined in class Node with a call to an error
routine. It is each derived class's responsiblity to redefine the
methods which are meaningfull--i.e. the Cons class redefines
'car() and cdr()':
class Cons : Node {
Node *theCar;
Node *theCdr;
virtual Node *car() { return theCar; }
virtual Node *cdr() { return theCdr; }
virtual void setCar(Node *n) { theCar = n; }
virtual void setCdr(Node *n) { theCdr = n; }
...
};
Consequently, there are a large number of virtual methods in the base
class Node--20+ and growing. At first I thought there would be a lot
of space dedicated to large vft's--sizeof(char*) *
NumberOfVirtualMethodsInBase for each LISP object
derived from Node! I was concerned about using vast amounts of memory
for vft's. But then I considered what the alternative would be--
using code space to check the type of the object I'm about to
reference as a 'Cons' object (which would be an operation that
would widely distributed throughout the system) rather that just
envoke the method. I get the error checking at the expense of a
calling a virtual method. I'm convinced that this experiment is
worthwhile to continue.
Thanks in advance.
--Larrydsr@mir.mitre.org (Douglas S. Rand) (03/13/91)
In article <1991Mar12.221015.22144@aero.org>, jordan@aero.org (Larry M. Jordan) writes: > I'm endeavoring to implement an interpreter/compiler for a LISP-like language > in C++. The question I wish to have answered is "Can I implement a > type-less lang. like LISP in C++ without resorting at some point to > actually asking an object for its type?" I just don't see how not to. > Also, I'm implementing a simple mark/sweep garbage collector which takes > liberties with Cons cell pointers to avoid recursion. I don't see a > reasonable way to do this without asking objects during the mark phase > 'are you a Cons?'. (Of course, I will need to implement the predicate > functions like 'null, consp, listp, etc.'). I'd also like some > remarks on the 'reasonableness' of the implementation scheme I've > chosen (to be described next paragraph). > Well to start with, why not have a virtual function like GCmarkPhase which does something with conses or their decendents but not with other LISP objects? > Techniques I've seen in the past for implementing LISP objects in > languages like C or Modula-2 define an enumeration type and create > a discriminated union using the enum as a type 'tag' or discriminant. > XLISP and XSCHEME are implemented this way and it seems perfectly > reasonable. I've taken a different approach since I'm using C++. It is if the language doesn't do this for you with virtuals. > ... some text deleted ... > Consequently, there are a large number of virtual methods in the base > class Node--20+ and growing. At first I thought there would be a lot > of space dedicated to large vft's--sizeof(char*) * > NumberOfVirtualMethodsInBase for each LISP object > derived from Node! I was concerned about using vast amounts of memory > for vft's. But then I considered what the alternative would be-- Why, do you think each instance would have a copy of the vtab? The vtab is equivalent to a class variable. There is one copy per class. > using code space to check the type of the object I'm about to > reference as a 'Cons' object (which would be an operation that > would widely distributed throughout the system) rather that just > envoke the method. I get the error checking at the expense of a And the type check via the virtual function call is much faster and more foolproof. > calling a virtual method. I'm convinced that this experiment is > worthwhile to continue. > Have fun. -- Douglas S. Rand Internet: <dsrand@mitre.org> Snail: MITRE, Burlington Road, Bedford, MA Disclaimer: MITRE might agree with me - then again... Amateur Radio: KC1KJ
jordan@aero.org (Larry M. Jordan) (03/14/91)
Newsgroups: comp.lang.c++
Subject: Re: Implementing LISP in C++ (type discrimination)
Summary:
Expires:
References: <17238@cadillac.CAD.MCC.COM> <1991Mar12.221015.22144@aero.org> <1991Mar13.145107.19724@linus.mitre.org>
Sender:
Followup-To:
Distribution:
Organization: The Aerospace Corporation, El Segundo, CA
Keywords:
Doug,
Could you elaborate on the marking technique via 'GCmarkPhase'. I
just don't see what you are driving at. What I'm currently planning
to do is this:
[1] for each LISP object class (I realize there is only one vtab per
class) I have overloaded 'new' and 'delete'. Each class
'manages' its own object space. (i.e. when a new Cons cell is
'allocated', the method Cons::new() will fetch and return a Cons
cell from Cons::FreeList. If none are available, the
garbage collector will be called. It (gc) will need to
mark all objects in use, won't it? See [2]. Then sweep. If
Cons::FreeList is still empty, then the Cons object
space manager will allocate a new chuck of Cons's
and return one of these.
[2] during the mark phase, each LISP object class will be asked to
mark all its internal objects which are in use. (i.e. the Symbol
class maintains the Obarray. The Interpreter maintains the
evaluation stack, registers, etc.).
[3] each LISP object class will be asked (after marking) to
sweep, collecting objects into free lists.
'sweep' will be a static member in each class. There will be a static
'mark' as well, but this will call a generic mark function 'void mark(Node*)'
which knows how to traverse LISP structures, as I indicated before
in a non recursive way, saving backpointer when chasing car's and
cdr's and restoring pointers when unwinding. Writing such a generic
mark function seems to require doing a case analysis, doesn't it?
I have envisioned two methods one for marking during descent--
'int markAndDecend(?)' which marks and does the pointer reversal--
and a second on ascent--'int ascend(?)' which will restore pointers
up to a point where descending can continue or the marking phase is
complete. These are two more virtual functions I will need to add
to the base class 'Node'! (The number continues to grow). But, more
importantly, there are a lot of calls to virtual methods during
the marking phase. How can this be as efficient as an up front
case analysis?! I am concerned about the overhead. Is the technique
you've intimated substantially different? Maybe more sophisticated
storage reclamation techniques are called for?!
--Larry
chip@tct.uucp (Chip Salzenberg) (03/16/91)
According to jordan@aero.org (Larry M. Jordan): >... a generic mark function 'void mark(Node*)' which knows how to >traverse LISP structures, as I indicated before in a non recursive >way, saving backpointer when chasing car's and cdr's and restoring >pointers when unwinding. Writing such a generic mark function seems >to require doing a case analysis, doesn't it? In general, an "isKindOf()" test can almost always be replaced by a virtual function. For example: switch (p->type()) { case Cons: do_cons_thing((Cons *)p); break; case String: do_string_thing((String *)p); break; } can be replaced with a virtual do_thing() function, with implementations Cons::do_thing() and String::do_thing(). Then the above code becomes simply: p->do_thing(); >there are a lot of calls to virtual methods during the marking phase. >How can this be as efficient as an up front case analysis?! A switch() is typically a table lookup. So is a virtual function lookup. Any more questions? -- Chip Salzenberg at Teltronics/TCT <chip@tct.uucp>, <uunet!pdn!tct!chip> "Most of my code is written by myself. That is why so little gets done." -- Herman "HLLs will never fly" Rubin
jordan@aero.org (Larry M. Jordan) (03/20/91)
I'm implementing a simple LISP interpreter in C++...
I've succeeded in implementing a marking routine (for a mark and sweep
garbage collector) WITOUT type discrimination!
I'll present type-discrimating version first (which is implemented in Ada)
and follow with the C++ version.
--Ada
type NodeKind is (Free, Cons, Fixnum, ...);
type MarkSet is (Unmarked, Marked, MarkLeft);
type NodePtr is access Node;
type Node(kind: NodeKind) is
record
mark: MarkSet;
case kind is
when Free => null;
when Cons => car, cdr: NodePtr;
when Fixnum => fix: FixnumType;
...
end case;
end record;
Here's the mark function, a tricky non-recursive, routine for marking
all active structures (alg. thanks to David Betz):
procedure mark(p: NodePtr) is
current: NodePtr := p;
previous: NodePtr := null;
begin
--mark this node
loop
--descend as far as we can
while not current.mark = Marked loop
--mark this node and trace its children
case current.kind is
when Cons =>
current.mark := Marked;
if car(current) /= null then
declare
down: NodePtr := car(current);
begin
current.mark := MarkLeft;
rplaca(current, previous);
previous = current;
current = down;
end;
elsif cdr(current) /= null then
declare
down: NodePtr := car(current);
begin
rplacd(current, previous);
previous = current;
current = down;
end;
end if;
-- mark other nodes that require tracing
when ...
-- mark objects that don't contain pointers
when Fixnu =>
current.mark := Marked;
when ...
end case;
end loop;
--backup to a point where we can continue descending
loop
--make sure there is a previous node
if previous /= null then
if previous.mark = MarkLeft then --came from left side
previous.mark := Marked;
declare
up: NodePtr := car(previous);
begin
rplaca(previous, current);
if cdr(previous) /= null then
rplacd(previous, up);
exit;
else
current := cdr(previous); --step back up the branch
previous := up;
end if;
end
else --came from right side
declare
up: NodePtr := cdr(previous);
begin
rplacd(previous, current);
current := previous; --step back up the branch
previous := up;
end;
end if;
else
-- no previous node, must be done
return;
end if;
end loop;
end loop;
end mark;
Granted this is tricky to follow and the logic of what is really
happening here is hard to follow. Worse, every time I add a new kind of
object to the LISP system, say Symbols or Environments, I must
change this marking procedure (not to mention the many other places where
case analysis is going on).
Here's the not-type-descriminating version implemented in C++!
I've made Node a base class and derive all Node kinds from this class.
The mark routine is implemented entirely abstractly in terms of Nodes
and the case anaysis is replaced with dynamic binding--virtual method calls.
class Node;
typedef boolean int;
enum { FALSE, TRUE };
struct MarkState {
Node *current;
Node *previous;
};
class Node {
enum MarkSet { Unmarked, Marked, MarkLeft };
virtual boolean descend(MarkState &m) = 0; --implementation deferred
virtual boolean ascend(MarkState &m) = 0; --implementation deferred
}
void mark(Node *p)
{
MarkState m;
m.previous = NIL;
m.current = p;
while (1) {
while (m.current->descend(m))
;
while (1) {
if (m.previous == NIL)
return;
if (!m.previous->ascend(m))
break;
}
}
}
// that's it!
class Cons : Node {
...
virtual boolean descend(MarkState &m)
{
// mark and save back pointers
// return TRUE if still descending else return FALSE
}
virtual boolean ascend(MarkState &m)
{
// ascend (restoring pointers) to a point where descending can continue
// and return TRUE
// else return FALSE if cannot ascend
}
}
Chip Salzenburg raised an interesting point: 'switch() is a table
lookup'. Has anyone done a study to see just how much storage is
used in switch() jump tables for a large non-OOP implementation vs. an
OOP implementation? I'm less concern about the storage occupied by vft's
than I earlier--I'd rather pay the price for the dispatch table once
(in the class) than many times (at each switch()!?).
--Larryroger@eos.eos.scg.hac.com (Roger Sheldon (Loral)) (04/12/91)
In article <1991Mar12.221015.22144@aero.org> jordan@aero.org (Larry M. Jordan) writes: >I'm endeavoring to implement an interpreter/compiler for a LISP-like language >in C++. Larry, I've been working on Lisp in C++ also. I've discovered a garbage collection technique which you and others may find useful. Here's a summary of the technique. The class hierarchy is similar to yours: Base -------------- Object |-----------------|---------------| Cons Symbol Integer | |-----|-----| Null Static_symbol Function (The relationship of the Object class will be explained below.) Garbage collection is accomplished via a combination of reference maintenance and temporary objects. Reference maintenance is done with a few member functions: Base *Base::Ref() { refs++; return this; } void Base::Deref() { if (--refs == 0) delete this; } void Cons::Deref() { // hides inhereted Base::Deref() Base *a = this; while (a != nil.value && --a->refs==0) { a->Car().Deref(); // delete car obj (clobber if last ref) Base *save = & a->Cdr(); delete a; // may be deleting *this* ! a = save; } } The Cons::Deref() method is a variation of the deref() method found in Gnu's libg++ parameterized List class. (Thanks FSF) That's pretty much it for reference maintenance. The Null and Static_symbol classes override the inherited Deref() method with no ops so there's no need to worry about refs and derefs to nil and t. Now, the temporary objects work with the reference maintenance as follows. The Object class is defined as: class Object { Base *value; public: Object() { value = nil.value; } Object(Object &a) { value = a.value->Ref(); } Object(Base &a) { value = a.Ref(); } ~Object() { value->Deref(); } operator=(Object &a){ Base *save=value; value = a.value->Ref(); save->Deref(); } ... }; User code manipulates Lisp objects only through the Object class. Specifically, all user functions must use the same signature: Object foo(Object &a, Object &b <possibly more args>); The main point is that Lisp objects are always returned through copied objects. This allows user code to be written without the need to do reference maintenance -- the Object class takes care of reference maintenance automatically. Here's an example of some user code: Object shove_pl(Object &variable, Object &item, Object &alist) { if (!alist) return list(list(variable, list(item))); else if (variable == caar(alist)) return cons(list(variable, append(cadar(alist), list(item))), cdr(alist)); else return cons(car(alist), shove_pl(variable, item, cdr(alist))); } This is a translation of the Common Lisp shove_pl function found in Winston's Lisp book: (defun shove-pl (variable item a-list) (cond ((null a-list) (list (list variable (list item)))) ((equal variable (caar a-list)) (cons (list variable (append (cadar a-list) (list item))) (cdr a-list))) (t (cons (car a-list) (shove-pl variable item (cdr a-list)))))) OK, so this approach allows you to write elegant code. But how expensive is this garbage collection technique (which I'm calling "Immediate" garbage collection since objects are clobbered immediately after the last reference to it is removed) ? I'm not going to get into a big discussion on performance. I will say that I've converted user functions to do their own reference maintenance (they deal with Base *'s rather than Objects) and the performance improvement has ranged from 0 to 3 times. Feel free to call me at 301 805-0463. Roger Sheldon
bpendlet@bambam.dsd.es.com (Bob Pendleton) (04/18/91)
In article <14342@hacgate.UUCP>, roger@eos.eos.scg.hac.com (Roger Sheldon (Loral)) writes: > OK, so this approach allows you to write elegant code. But how > expensive is this garbage collection technique (which I'm calling > "Immediate" garbage collection since objects are clobbered immediately > after the last reference to it is removed) ? It's called reference counting. It's been used since the dawn of time. Reference counting usually uses more total time than traditional garbage collection. The other cost of reference counting is the storage used by the count associated with each object. For cons cells the counts add 2 to 4 bytes to the size of each cell. Usually a cons cell is only 8 bytes to begin with. A cons cell can be as small as 4 bytes if it contains indexes instead of pointers. So a reference count can chew up a lot of memory. In older languages keeping track of the reference count was error prone. But C++ classes let you hide the details and be sure that the reference count is incremented and decrement at the right times. (Some of the folks around here have coffee cups with "How could the reference count be zero?" (Hi Janet) on them. They're left over from a project written in C that used reference counting. It was a famous and much dreaded runtime error message.) The advantage of reference counting is that the program never "goes to lunch" garbage collecting. Reference counting gives the user consistent response times. Which is something that users love. I think that this one benefit more than justifies the use of reference counts. One thing to consider doing; When the reference count goes to zero put the object on a type specific free list. When you need a new object of a given type try to get it from its type specific free list first. This way your not always going through the general purpose heap code. Avoiding the general purpose code can give you a real performance boost. -- Bob Pendleton, speaking only for myself. bpendlet@dsd.es.com or decwrl!esunix!bpendlet or utah-cs!esunix!bpendlet Tools, not rules.
davis@barbes.ilog.fr (Harley Davis) (04/22/91)
In article <1991Apr17.233653.25149@dsd.es.com> bpendlet@bambam.dsd.es.com (Bob Pendleton) writes:
It's called reference counting. It's been used since the dawn of time.
Reference counting usually uses more total time than traditional
garbage collection. The other cost of reference counting is the
storage used by the count associated with each object.
One of the most important problems with reference counting is that it
doesn't collect all the garbage. In particular, circular data
structures never get collected. Consider the following program:
(defvar *some-list* '(a b))
(setf (cddr *some-list*) *some-list*)
(setf *some-list* ())
Now, despite the fact that there is no way to access any of the cells
in *some-list*, all of them have a reference count of 1. Oops.
-- Harley Davis
--
------------------------------------------------------------------------------
nom: Harley Davis ILOG S.A.
net: davis@ilog.fr 2 Avenue Gallie'ni, BP 85
tel: (33 1) 46 63 66 66 94253 Gentilly Cedex, Francegintera@fsd.cpsc.ucalgary.ca (Andrew Ginter) (04/22/91)
In article <DAVIS.91Apr22105508@barbes.ilog.fr> davis@barbes.ilog.fr (Harley Davis) writes: >One of the most important problems with reference counting is that it >doesn't collect all the garbage. In particular, circular data >structures never get collected. This was true of reference counting schemes until 1986. An article published in Software-Practice and Experience that year describes how reference counting systems can reclaim circular lists fairly simply. The exact name and author of the article escape me right now - they're in my other office. Drop me a note if you're desperate and I can dig them up for you. The circular list collection technique goes something like this: * Scan the collected heap. For each object in the heap, chase all pointers in the object and decrement the refcount on the target object. At the end of the scan, all of the contribution of internal pointers to refcount fields has been eliminated. * Scan the heap again. Mark any object with a non-zero refcount field. These are the objects which are referenced by "root pointers" (pointers which reside outside of the heap and which refer to collected objects). * Scan the heap again. Recursively mark any object with a non-zero refcount, incrementing reference counts as you traverse pointers in objects. * Scan the heap again. Any objects in the heap which still has a zero reference count can be reclaimed - these are the objects which were part of unreachable circular lists. I'm writing this from memory, so this may not be exactly right, but it gives the general idea. I suspect that some of the heap scans can be collapsed too, but again, I don't remember the details. Andrew Ginter, 403-220-6320, gintera@cpsc.ucalgary.ca
tmb@ai.mit.edu (Thomas M. Breuel) (04/23/91)
From: davis@barbes.ilog.fr (Harley Davis)
In article <1991Apr17.233653.25149@dsd.es.com> bpendlet@bambam.dsd.es.com (Bob Pendleton) writes:
It's called reference counting. It's been used since the dawn of time.
Reference counting usually uses more total time than traditional
garbage collection. The other cost of reference counting is the
storage used by the count associated with each object.
One of the most important problems with reference counting is that it
doesn't collect all the garbage. In particular, circular data
structures never get collected. [...]
Reference counting is, however, ideal for garbage collecting large,
non-recursive objects like arrays of type real, complex, bit, etc.
Any other kind of garbage collection will have much higher overhead
in this case, and you can never build circular data structures out
of them.hoelzle@neon.Stanford.EDU (Urs Hoelzle) (04/23/91)
tmb@ai.mit.edu (Thomas M. Breuel) writes: > [Others saying that reference counting is slower than other forms > of GC] >Reference counting is, however, ideal for garbage collecting large, >non-recursive objects like arrays of type real, complex, bit, etc. >Any other kind of garbage collection will have much higher overhead ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >in this case, and you can never build circular data structures out >of them. What is the reason why this must be so? Do you have any numbers to show that reference counting is superior in this case? (Read: I don't believe it.) -- ------------------------------------------------------------------------------ Urs Hoelzle hoelzle@cs.stanford.EDU Center for Integrated Systems, CIS 42, Stanford University, Stanford, CA 94305
marti@mint.inf.ethz.ch (Robert Marti) (04/23/91)
In article <1991Apr23.000632.9175@neon.Stanford.EDU> hoelzle@neon.Stanford.EDU (Urs Hoelzle) writes in response to an earlier message by tmb@ai.mit.edu (Thomas M. Breuel): >>[Any kind of garbage collection other than reference counting] will have >>much higher overhead [in the case of non-recursive data structures]. >What is the reason why this must be so? Do you have any numbers to >show that reference counting is superior in this case? (Read: I don't >believe it.) Neither do I. In fact, I was surprised to find that using the conservative garbage collector due to Hans Boehm and Mark Weiser (and described in Software Practice and Experience in 1988?) instead of doing reference counting myself was _considerably_ faster in a C++ implementation which used many dynamically allocated arrays. If my memory serves me correctly, the running times dropped to about 60% of the original ones. (And, yes, the application ran long enough to incur several garbage collections!) Robert Marti | Phone: +41 1 254 72 60 Institut fur Informationssysteme | FAX: +41 1 262 39 73 ETH-Zentrum | E-Mail: marti@inf.ethz.ch CH-8092 Zurich, Switzerland |
davis@barbes.ilog.fr (Harley Davis) (04/24/91)
In article <1991Apr22.165825.13552@cpsc.ucalgary.ca> gintera@fsd.cpsc.ucalgary.ca (Andrew Ginter) writes: In article <DAVIS.91Apr22105508@barbes.ilog.fr> davis@barbes.ilog.fr (Harley Davis) writes: >One of the most important problems with reference counting is that it >doesn't collect all the garbage. In particular, circular data >structures never get collected. This was true of reference counting schemes until 1986. An article published in Software-Practice and Experience that year describes how reference counting systems can reclaim circular lists fairly simply. The exact name and author of the article escape me right now - they're in my other office. Drop me a note if you're desperate and I can dig them up for you. The circular list collection technique goes something like this: [mark and sweep too] Of course, if you use other gc methods with reference count, the problem can be fixed. But this was obvious way before 1986. I think Danny Bobrow did an implementation of a mixed ref count/full scan gc in InterLisp in the late 70s/early 80s. -- Harley Davis -- ------------------------------------------------------------------------------ nom: Harley Davis ILOG S.A. net: davis@ilog.fr 2 Avenue Gallie'ni, BP 85 tel: (33 1) 46 63 66 66 94253 Gentilly Cedex, France