[comp.lang.smalltalk] implementing an interpreter

rick@tripos.com (Rick Moll) (03/08/91)

I have been thinking a lot about data-structures for implementing 
a Scheme interpreter, and have the following question:

There appears to be two fundamentally different approaches to the 
design of language interpreters.

1) Make all pointers to objects have a type tag.  This is typical 
in lisp and scheme.

2) Make all pointers to objects be in index into an object table 
that contains the type information.  The object's data may be in 
the object table for small objects, or point to a data-structure 
elsewhere for large object types.  This is typical in SmallTalk.

The first approach allows low cost access to the type info.  But 
is more costly to do a compaction after a garbage collection.

The second approach is more costly to access the type, but allows 
compaction with only one pointer being changed since all objects 
point to other objects only indirectly through the object table.

I'm looking for information about this issue.  Has anyone studied 
the tradeoffs involved here for traditional architecture machines 
(68030, 386, etc)?

Thanks in advance for your help,

    Rick Moll
    503 McLain Lane
    Kirkwood, MO 63122
    314/822-4263

    uunet!tripos!rick

moss@cs.umass.edu (Eliot Moss) (03/13/91)

In article <1991Mar8.153120.19836@tripos.com> rick@tripos.com (Rick Moll) writes:

   I have been thinking a lot about data-structures for implementing 
   a Scheme interpreter, and have the following question:

   There appears to be two fundamentally different approaches to the 
   design of language interpreters.

   1) Make all pointers to objects have a type tag.  This is typical 
   in lisp and scheme.

   2) Make all pointers to objects be in index into an object table 
   that contains the type information.  The object's data may be in 
   the object table for small objects, or point to a data-structure 
   elsewhere for large object types.  This is typical in SmallTalk.

How about what most modern Smalltalk systems do:

3) Make all pointers to object point to the objects, which have a header that
includes the type.

Given Ungar's generation scavenging, etc., there has been a move away from
object tables towards direct pointers to objects in the heap. You still need
pointer/non-pointer tags on the quantities that can be stored in slots
(instance variables, etc.), to distinguish small integers (e.g.) from
pointers, but in our Smalltalk implementation, all pointers objects have an 8
byte header consisting of the class and some packed information about the
format and size of the object. If the object has indexable fields, then there
are two more words: a pointer to those fields (which need not be continuous),
and the number of indexables (i.e., the size; prepended to the indexables).
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206, 545-1249 (fax); Moss@cs.umass.edu

mikel@Apple.COM (Mikel Evins) (03/14/91)

In article <1991Mar8.153120.19836@tripos.com> rick@tripos.com (Rick Moll) writes:
>I have been thinking a lot about data-structures for implementing 
>a Scheme interpreter, and have the following question:
>
>There appears to be two fundamentally different approaches to the 
>design of language interpreters.
>
>1) Make all pointers to objects have a type tag.  This is typical 
>in lisp and scheme.
>
>2) Make all pointers to objects be in index into an object table 
>that contains the type information.  The object's data may be in 
>the object table for small objects, or point to a data-structure 
>elsewhere for large object types.  This is typical in SmallTalk.

A third approach is the Big Bag of Pages (BiBoP) arrangement, in
which objects of various types are each allocated only in certain
ranges of addresses. In this way, an object's type can be
determined by examining its address. This scheme somewhat complicates
an already somewhat complicated gc algorithm if you plan to
use, let us say, generation scavenging, but type determination
is fairly cheap.

moss@cs.umass.edu (Eliot Moss) (03/15/91)

I don't think BiBoP would be all that good for Smalltalk, because (unlike
LISP), Smalltalk has a very open ended collection of types of objects (users
add new classes). If you're just interested in segregating based on low level
properties (e.g., bytes objects versus pointers objects versus methods (which
mix the two)), it might work, but there are many flavors of each of these.
Perhaps the only aspect I would consider is classes treated specially by the
interpreter, such as Float. It's not at all clear that the gain in complexity
is worth the space savings, or that there is any time performance advantage
(and perhaps a disadvantage).
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206, 545-1249 (fax); Moss@cs.umass.edu

gjc@paradigm.com (03/15/91)

In article <50211@apple.Apple.COM>, mikel@Apple.COM (Mikel Evins) writes:
> 
> A third approach is the Big Bag of Pages (BiBoP) arrangement, in
> which objects of various types are each allocated only in certain
> ranges of addresses. In this way, an object's type can be
> determined by examining its address. This scheme somewhat complicates
> an already somewhat complicated gc algorithm if you plan to
> use, let us say, generation scavenging, but type determination
> is fairly cheap.

No heaper than the smalltalk technique that the poster mentioned.
(Because of the extra shift. Although you may need a shift anyway).

In fact, as far as types are concerned the smalltalk technique
could be considered BIBOP with a page size of 1 word.

int bibop_typeof(ptr)
 LISP ptr;
{return(type_table[((long)ptr) >> 9].typeof);}

Although specific type checking is usually optimized by having a precooked
set of bits, thereby avoiding:

(defun numberp (x) (memq (type-of x) '(fixnum bignum flonum))

Instead:

LISP numberp(ptr)
 LISP ptr;
{if (type_table[((long)ptr) >> 9].numberp)
  return(A_TRUE_VALUE);
 else
  return(A_FALSE_VALUE);}

mikel@Apple.COM (Mikel Evins) (03/15/91)

In article <3653@paradigm.com> gjc@paradigm.com writes:
>In article <50211@apple.Apple.COM>, mikel@Apple.COM (Mikel Evins) writes:
>> 
>> A third approach is the Big Bag of Pages (BiBoP) arrangement, in
>> which objects of various types are each allocated only in certain
>> ranges of addresses. In this way, an object's type can be
>> determined by examining its address. This scheme somewhat complicates
>> an already somewhat complicated gc algorithm if you plan to
>> use, let us say, generation scavenging, but type determination
>> is fairly cheap.
>
>No heaper than the smalltalk technique that the poster mentioned.
>(Because of the extra shift. Although you may need a shift anyway).

Sorry, I didn't mean to imply that BiBoP was cheaper, only that it
was an, as yet unmentioned, alternative.

oz@yunexus.yorku.ca (Ozan Yigit) (03/18/91)

In article <1991Mar8.153120.19836@tripos.com> rick@tripos.com
(Rick Moll) writes:

>I'm looking for information about this issue.  Has anyone studied 
>the tradeoffs involved here for traditional architecture machines 
>(68030, 386, etc)?

Rick, you may find the following reference useful. It includes a cost
study of low/high-bit tagging and pointer spaces.

%A Stan T. Shebs
%A Robert R. Kessler
%T Automatic Design and Implementation of Language Data Types
%J Sigplan 87 Symposium on Interpreters and Interpretive Techniques
%I Association for Computing Machinery
%C St Paul, Minnesota
%P 26-37
%D June 87
%O Published as Sigplan Notices Vol. 22 Number 7, July 87

enjoy.	oz

---
Not all good things come with three	 | internet: oz@nexus.yorku.ca
pages of dogma and an attitude. - anon   | uucp: utzoo/utai!yunexus!oz