rick@tripos.com (Rick Moll) (03/08/91)
I have been thinking a lot about data-structures for implementing a Scheme interpreter, and have the following question: There appears to be two fundamentally different approaches to the design of language interpreters. 1) Make all pointers to objects have a type tag. This is typical in lisp and scheme. 2) Make all pointers to objects be in index into an object table that contains the type information. The object's data may be in the object table for small objects, or point to a data-structure elsewhere for large object types. This is typical in SmallTalk. The first approach allows low cost access to the type info. But is more costly to do a compaction after a garbage collection. The second approach is more costly to access the type, but allows compaction with only one pointer being changed since all objects point to other objects only indirectly through the object table. I'm looking for information about this issue. Has anyone studied the tradeoffs involved here for traditional architecture machines (68030, 386, etc)? Thanks in advance for your help, Rick Moll 503 McLain Lane Kirkwood, MO 63122 314/822-4263 uunet!tripos!rick
moss@cs.umass.edu (Eliot Moss) (03/13/91)
In article <1991Mar8.153120.19836@tripos.com> rick@tripos.com (Rick Moll) writes:
I have been thinking a lot about data-structures for implementing
a Scheme interpreter, and have the following question:
There appears to be two fundamentally different approaches to the
design of language interpreters.
1) Make all pointers to objects have a type tag. This is typical
in lisp and scheme.
2) Make all pointers to objects be in index into an object table
that contains the type information. The object's data may be in
the object table for small objects, or point to a data-structure
elsewhere for large object types. This is typical in SmallTalk.
How about what most modern Smalltalk systems do:
3) Make all pointers to object point to the objects, which have a header that
includes the type.
Given Ungar's generation scavenging, etc., there has been a move away from
object tables towards direct pointers to objects in the heap. You still need
pointer/non-pointer tags on the quantities that can be stored in slots
(instance variables, etc.), to distinguish small integers (e.g.) from
pointers, but in our Smalltalk implementation, all pointers objects have an 8
byte header consisting of the class and some packed information about the
format and size of the object. If the object has indexable fields, then there
are two more words: a pointer to those fields (which need not be continuous),
and the number of indexables (i.e., the size; prepended to the indexables).
--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206, 545-1249 (fax); Moss@cs.umass.edu
mikel@Apple.COM (Mikel Evins) (03/14/91)
In article <1991Mar8.153120.19836@tripos.com> rick@tripos.com (Rick Moll) writes: >I have been thinking a lot about data-structures for implementing >a Scheme interpreter, and have the following question: > >There appears to be two fundamentally different approaches to the >design of language interpreters. > >1) Make all pointers to objects have a type tag. This is typical >in lisp and scheme. > >2) Make all pointers to objects be in index into an object table >that contains the type information. The object's data may be in >the object table for small objects, or point to a data-structure >elsewhere for large object types. This is typical in SmallTalk. A third approach is the Big Bag of Pages (BiBoP) arrangement, in which objects of various types are each allocated only in certain ranges of addresses. In this way, an object's type can be determined by examining its address. This scheme somewhat complicates an already somewhat complicated gc algorithm if you plan to use, let us say, generation scavenging, but type determination is fairly cheap.
moss@cs.umass.edu (Eliot Moss) (03/15/91)
I don't think BiBoP would be all that good for Smalltalk, because (unlike LISP), Smalltalk has a very open ended collection of types of objects (users add new classes). If you're just interested in segregating based on low level properties (e.g., bytes objects versus pointers objects versus methods (which mix the two)), it might work, but there are many flavors of each of these. Perhaps the only aspect I would consider is classes treated specially by the interpreter, such as Float. It's not at all clear that the gain in complexity is worth the space savings, or that there is any time performance advantage (and perhaps a disadvantage). -- J. Eliot B. Moss, Assistant Professor Department of Computer and Information Science Lederle Graduate Research Center University of Massachusetts Amherst, MA 01003 (413) 545-4206, 545-1249 (fax); Moss@cs.umass.edu
gjc@paradigm.com (03/15/91)
In article <50211@apple.Apple.COM>, mikel@Apple.COM (Mikel Evins) writes: > > A third approach is the Big Bag of Pages (BiBoP) arrangement, in > which objects of various types are each allocated only in certain > ranges of addresses. In this way, an object's type can be > determined by examining its address. This scheme somewhat complicates > an already somewhat complicated gc algorithm if you plan to > use, let us say, generation scavenging, but type determination > is fairly cheap. No heaper than the smalltalk technique that the poster mentioned. (Because of the extra shift. Although you may need a shift anyway). In fact, as far as types are concerned the smalltalk technique could be considered BIBOP with a page size of 1 word. int bibop_typeof(ptr) LISP ptr; {return(type_table[((long)ptr) >> 9].typeof);} Although specific type checking is usually optimized by having a precooked set of bits, thereby avoiding: (defun numberp (x) (memq (type-of x) '(fixnum bignum flonum)) Instead: LISP numberp(ptr) LISP ptr; {if (type_table[((long)ptr) >> 9].numberp) return(A_TRUE_VALUE); else return(A_FALSE_VALUE);}
mikel@Apple.COM (Mikel Evins) (03/15/91)
In article <3653@paradigm.com> gjc@paradigm.com writes: >In article <50211@apple.Apple.COM>, mikel@Apple.COM (Mikel Evins) writes: >> >> A third approach is the Big Bag of Pages (BiBoP) arrangement, in >> which objects of various types are each allocated only in certain >> ranges of addresses. In this way, an object's type can be >> determined by examining its address. This scheme somewhat complicates >> an already somewhat complicated gc algorithm if you plan to >> use, let us say, generation scavenging, but type determination >> is fairly cheap. > >No heaper than the smalltalk technique that the poster mentioned. >(Because of the extra shift. Although you may need a shift anyway). Sorry, I didn't mean to imply that BiBoP was cheaper, only that it was an, as yet unmentioned, alternative.
oz@yunexus.yorku.ca (Ozan Yigit) (03/18/91)
In article <1991Mar8.153120.19836@tripos.com> rick@tripos.com (Rick Moll) writes: >I'm looking for information about this issue. Has anyone studied >the tradeoffs involved here for traditional architecture machines >(68030, 386, etc)? Rick, you may find the following reference useful. It includes a cost study of low/high-bit tagging and pointer spaces. %A Stan T. Shebs %A Robert R. Kessler %T Automatic Design and Implementation of Language Data Types %J Sigplan 87 Symposium on Interpreters and Interpretive Techniques %I Association for Computing Machinery %C St Paul, Minnesota %P 26-37 %D June 87 %O Published as Sigplan Notices Vol. 22 Number 7, July 87 enjoy. oz --- Not all good things come with three | internet: oz@nexus.yorku.ca pages of dogma and an attitude. - anon | uucp: utzoo/utai!yunexus!oz