dean@sun.soe.clarkson.edu (Dean Swan) (03/12/91)
I'm working on an implementation of Smalltalk, designed to work with large disk based images, and run on machines with limited primary memory (i.e. 640k MS-DOS machines, and the like). I will probably base my class hierarchy on that delivered with GNU Smalltalk, which means that my implementation will ultimately end up as "free software". My implementation is most likely to be completely interpreted, except for a minimal set of primitives, due to the nature of my storage manager, which uses no object pointer table, and is not based on absolute pointers either. My storage manager *does* however, support incremental garbage collection, and fast object lookups without using generation scavenging, or reference counts. It also incorporates image compression automatically, so redundant chunks of memory are minimized. The virtual machine "runs" the compressed image, not an uncompressed copy. My motivation for this project: dBase sucks. I'm intending to use the finished product to produce large information systems, based on a large, persistent object space. Now, on to my question: Could someone be so kind as to explain the workings of the Behavior, Class, MetaClass hierarchy? I've read the Blue Book, traced through Smalltalk/V, and GNU Smalltalk, and I'm still thouroghly confused. As I understand it, each instance of MetaClass corresponds to a 'real' Class, but in Smalltalk/V, Class has no instances, and no subclasses, so how are it's instance methods ever reached? Behavior makes sense in that it just defines a common protocol for Class and MetaClass. Question #2: What is the purpose of this convoluted Behavior, Class, MetaClass thing? What does it do for you? I've got my storage manager designed, but the design is incomplete at this point, because some of it's operation must be dependent on the structure of the virtual image. I'd appreciate any and all help that anyone can offrer on this problem. E-mail replies would be preferred, and I can post a summary. The sooner I understand this, the sooner I can finish this thing, start using it, and give it to people. -Dean Swan Independent Software Consultant dean@sun.soe.clarkson.edu Tel: (315) 265-2679
sbb@laplace.eng.sun.com (Steve Byrne) (03/12/91)
In article <9103111620.AA01484@sun.soe.clarkson.edu> dean@sun.soe.clarkson.edu (Dean Swan) writes:
Path: exodus!newstop!sun-barr!cs.utexas.edu!usc!rpi!clarkson!sun.soe.clarkson.edu!dean
From: dean@sun.soe.clarkson.edu (Dean Swan)
Newsgroups: comp.lang.smalltalk
Date: 11 Mar 91 16:20:45 GMT
Lines: 45
I'm working on an implementation of Smalltalk, designed to work with large
disk based images, and run on machines with limited primary memory (i.e.
640k MS-DOS machines, and the like).
I will probably base my class hierarchy on that delivered with GNU Smalltalk,
which means that my implementation will ultimately end up as "free software".
Wonderful!!!
...
Now, on to my question: Could someone be so kind as to explain the workings
of the Behavior, Class, MetaClass hierarchy? I've read the Blue Book, traced
through Smalltalk/V, and GNU Smalltalk, and I'm still thouroghly confused.
As I understand it, each instance of MetaClass corresponds to a 'real' Class,
but in Smalltalk/V, Class has no instances, and no subclasses, so how are
it's instance methods ever reached? Behavior makes sense in that it just
defines a common protocol for Class and MetaClass.
Stare at the complex complete class (and metaclass) diagram in the blue book.
It takes time to figure out how things are arranged. At least it took me a
while before I was comfortable with it.
Question #2: What is the purpose of this convoluted Behavior, Class,
MetaClass thing? What does it do for you?
It makes the entire class system well defined in the sense that by exploring
the class hierarchy you won't get to a point at which you get a non-object as a
return value from a method. There are other ways to design the class hierarhcy
so that this is not required, but this is the "purest" in many respects.
Some places the class and parent class hierarchys are deliberately looped back,
such as Object class ==> Class; this is an example of this closing of the
inheritance hierarchy (here "closed" is used in the algebra sense of "closed
under addition").
I know it's confusing...but if you stare at the diagram and read the rules (1-7
just before the complex diagram) long enough, it will begin to make sense.
It may also be helpful to think about how you'd set things up differently ...
if you think things through well enough, you'll see the points at which it's
logical to do the things that the Smalltalk designers did.
Steve
dean@sun.soe.clarkson.edu (Dean Swan) (03/12/91)
> >My implementation is most likely to be completely interpreted, except for a > >minimal set of primitives, due to the nature of my storage manager, which > >uses no object pointer table, and is not based on absolute pointers either. > >My storage manager *does* however, support incremental garbage collection, > >and fast object lookups without using generation scavenging, or reference > >counts. It also incorporates image compression automatically, so > >redundant chunks of memory are minimized. The virtual machine "runs" the > >compressed image, not an uncompressed copy. > > > > A very impressive-sounding scheme. I'd be interested in knowing how > that works. This would seem to be reversing the standard tradeoff > (using space to get speed) in order to fit into limited memory, but is > then likely to suffer from very slow speeds, both from disk accesses > and from dealing with compressed data. > The scheme sounds like it would be of value even for "normal" > machines and such technology would be a good thing to feed back into > GNU Smalltalk. How it works? Hmm. Where to start? Ok... My basic model for memory is a large array of what I call 'associations'. All objects in my system are assigned a 32 bit unsigned number when they are created. An 'association' is an ordered pair of object numbers, a 'from' object and a 'to' object. This is the heart of the system. To build a complex object, you simply create as many associations as you need, where the 'from' object is the same for each association, and the 'to' object is the object number of a component object of the complex object you are building. I impose a few rules to make this scheme practical. First, memory is a sorted array of associations sorted by 'from', then by 'to'. The object number FFFFFFFF (hex) is used to denote the NULL object, and it is ignored by the sort algorithm for memory. This way, you can delete an object by blanking both the 'from' and 'to' fields to NULL, and leave a hole in memory that will get cleaned up by the incremental garbage collector. Obviously, some other object numbers will have to be reserved for things like primitive methods, character constants, small integers, and the like. The second rule is that only the highest numbered object in the system can grow (i.e. add more associations to the object). This is not a big problem since all newly created objects are the highest numbered object when they are created, and for the few other objects that need to get bigger dynamically, they will be 'translated' to the highest number, by creating a new object, copying the 'to' fields of the old object into the new object and fixing up references to the old object, then deleteing the old object by filling it with NULLs. To speed up the fix of references to the old object number, every object will maintain a Dependents collection that can be referred to ONLY by it's owner object. This is also not a problem Smalltalk/V does this anyway in the global dictionary 'Dependents'. Also, to avoid wild amounts of object tranlation, every class will have an 'initialSize', and a 'growSize'. For fixed size objects (which most objects are) the 'growSize' would be 0. For variable sized objects, the 'growSize' can be tuned based on profiles of the running system to optimize performance. Unused associations that are created in a 'grow' operation will have their 'to' field set to NULL. The garbage collector will also incrementally translate objects down to the lowest unused object number, so you don't run out of object numbers. Now regarding the disk read time problem, main memory of the target machine will be used as a dynamic object cache. Ideally I'd like to implement an 'association' cache rather than an object cache, so the system can deal with objects that are larger than main memory. One of my potential applications would be working with digitally sampled sounds, which could easily be very large. As to the speed of dealing with a compressed image, I expect that the compression will help rather that hurt performance. I think I/O is the real bottleneck on most machines, so the fewer bytes you have to deal with, the faster it should run. My compression scheme is based on the idea that, for example, text is composed of paragraphs, which are composed of sentances, which are composed of phrases, which are composed of words, which are composed of syllables, which are composed of syllables, which are composed of characters. The idea is that you build a tree of characters, where each leaf node represents a syllable, then you build a tree of syllables where each leaf node represents a word, then a tree of words, where each leaf is the first word of a phrase, and so on. To add a paragraph to a document, you just add a single 'association' to the document object. To compare two words, you need only compare the 'to' field of two associations. Since the each word will exist exactly once, if the object numbers are identical, the two objects being compared point to the same word. This concept can be extrapolated to encompass any kind of complex object. Comparisons for greater-than and less-than would be a little more involved, but if you keep each level of the trees sorted, then it would be just a comparison of two pointers again. Keeping the trees sorted might cost more than it's worth though. I don't know yet. It would make 'read' operations much faster, but would slow down object creation quite a bit, I suspect. Object lookups will be performed by a modified binary search on the memory image. (The array of 'associations', of course exists as a large disk file). Given that the average object is small, and that most objects don't change size, I don't think that the restriction of memory space being sorted will pose a big problem. Most objects will be sorted as they are created, and remain that way. The *MAJOR* advantage of this whole scheme is that object numbers have no direct correlation, whatsoever, to physical addresses. This means that as long as you can live with the limitations of 4 billion or so objects, the image can grow as large as disk space allows, and you don't have to deal with lots of messy pointers that are 'married' in some way to physical addresses. I don't have any real numbers yet, but if an average system has an image size of say 8 Meg, you'll never have to probe the image more than 23 times to find the object you need, and the binary search algorithm could be optimized by caching physical adresses of recently used objects to narrow down the search. I'd like to think that the real base image will be well under 1 Meg, with the source code included, so it could concievably even run off a high density floppy. If the caching scheme is done right, this could work well. Another thought for speed optimization would be 'expression caching'. Numbers, for example are immutable objects, so if you perform a multiplication of two numbers, (especially Floats or LargeIntegers), as long as you have space in the image (i.e. free disk space) you could cache expressions and perform lookups instead of recomputing things. This would greatly speed up operations like Fourier Transforms, and the like. This expression cacheing could all be implemented in Smalltalk, and just have the garbage collector generate an interrupt so that Smalltalk code could purge the expression cache when the image starts to get 'full'. Because of the built in compression scheme, each number that is in use by the system will exist in the image exactly once, so expression caching shouldn't cause the image to grow too much. Other expression types besides numeric computations could also be similarly cached. > >My motivation for this project: dBase sucks. I'm intending to use the > >finished product to produce large information systems, based on a large, > >persistent object space. > > Don't forget hypercard/form generator like tools. A nice class > library for that sort of thing would be very useful. Definately! Forms are very common in everyday business applications, and my implementation of Smalltalk must include a real development environment, and user interface tools. > >Now, on to my question: Could someone be so kind as to explain the workings > >of the Behavior, Class, MetaClass hierarchy? I've read the Blue Book, traced > >through Smalltalk/V, and GNU Smalltalk, and I'm still thouroghly confused. > > The superclass of an Object is nil, BUT the superclass of the class > Object is Class. Thus class methods go further up the chain than > object, and finally reach Behavior, where 'new' is defined. Ok, this makes sense. Not obvious though, even when looking at a running system. > Note: Class and Metaclass have lots of subclasses, they're just > invisible. How is it that they are invisible? > What this gets you is reflectiveness. I.E. the way that the entire > One might also consider that metaclasses must be objects and members > of some class. This could be a metametaclass, but in fact here the > system twists around on itself and a metaclass is an instance (and a > subclass) of Metaclass. (I think, even I'm getting confused now). So basically this is all just to avoid infinite recursion, yes? > Anyway, the best thing to do is avoid thinking about these things > whenever possible. Agreed, but unfortunately, I have to think about it for a while, at least until I get an initial image built and working. I'm still a little confused about the actual mechanism, but at least now I know why it's there. > Alan Knight knight@mrco.carleton.ca Outlaw > Dept. of Mechanical and Aeronautical Engineering Software > Carleton University, Ottawa, Ontario, Canada, K1S 5B6 Patents -Dean Swan Independent Software Consultant dean@sun.soe.clarkson.edu Tel: (315) 265-2679
johnson@cs.uiuc.EDU (Ralph Johnson) (03/13/91)
Here are the rules that I use to understand the subclasses of Behavior. 1) Everything in Smalltalk is an object. 2) Every object is a class. Obviously, classes must be objects, too. Therefore, they must have classes. So, what is the class of Array, or View, or Object? There are two possibilities: all classes can be instances of a single class Class, or different classes can have different subclasses. In order to reduce confusion, I'll call the class of a class a metaclass. Thus, I'll rephrase this as: there is only one metaclass, or different classes can have different metaclasses. It is a lot simpler if all classes had the same class, i.e. if there was only one metaclass, but that is not the way Smalltalk did it. If it had, Class would be the only metaclass, and in fact it wouldn't even be worth calling it a metaclass. If it had, Smalltalk wouldn't have class methods, and class methods are useful (though not necessary). 3) Every (nonmetaclass) class has its own metaclass. Smalltalk not only lets every class have its own metaclass, it forces each class to have its own metaclass. Metaclasses don't have their own names, thus the name of the class of Array is 'Array class'. All metaclasses are instances of Metaclass, thus 'Array class class' is Metaclass. Pretty boring. Thus, class can be divided into regular classes, which each have their own metaclasses, and metaclasses, which each are instances of Metaclass. 4) The metaclass of a superclass is the superclass of a metaclass. This seems pretty confusing when you say it in English, but it makes a lot of sense in a picture. Basically, it means that since Set is a subclass of Collection, Set class is a subclass of Collection class. There is a general rule that 'If X is a Y then the class of X is a subclass of the class of Y or one of its subclasses'. Thus, since Array is a class then the class of Array is a subclass of Class or one of its subclasses. By the same logic, the class of Object is a subclass of Class or one of its subclasses. We know that Object has no superclasses. However, we just said that Object class is a subclass of Class. Thus, we might as well make Object class be a direct subclass of Class. This results in some funny code, since the methods you see in the 'instance' side of the browser on Class get inherited by the 'class' side of the browser on all classes (including Class!). One of the interesting things about the class/metaclass notion is that it is completely invisible to the interpreter. As far as the interpreter is concerned, only Behavior (the common superclass of Class and Metaclass) is real. The only thing the interpreter does with classes is look up methods in their method dictionary, and any kind of Behavior does that just fine. Thus, the class/metaclass notion is defined entirely by the programming environment of Smalltalk, and not the interpreter. Thus, if you don't like it, you could change it! (Good luck!!!!) Ralph Johnson -- University of Illinois at Urbana-Champaign
briot@pollux.usc.edu (Jean-Pierre Briot) (03/14/91)
After the nice and concise explanation (*) of Ralph about the metaclass architecture of Smalltalk, I'd just like to add some pointers to alternative designs. ("Thus if you don't like it, you could change it! (Good luck!)" [Ralph #2846].) (*: this is indeed not easy to explain reflexive systems within a single pass, just as to implement them:-) One alternative is the DeltaTalk design, which basically backtracks to Smalltalk-76 architecture with a UNIQUE metaclass, named Class. The resulting architecture is definitely SIMPLER, BUT there are no more class methods (actually there are but they are shared by all classes), and other (less famous: see below) possibilities of metaclasses. (LittleSmalltalk is even further simplified having none metaclass if I remember well. My bibliography is out of close hand at that time). An Empirically and Aesthetically Motivated Simplification of the Smalltalk-80 Language, A. Borning and T. O'Shea, ECOOP'87, Lecture Notes in Computer Science, No 276, pages 1-10, Springer Verlag, June 1987. In an opposite direction, the ClassTalk design, keeps metaclasses, but makes them being explicit (and not implicit as in Smalltalk). It is also based on one unique PRIMITIVE metaclass, named Class, like in DeltaTalk, BUT it is EXTENSIBLE (i.e., add new metaclasses). The key is the design of Class as a self-descriptive (and consequently instance of itself) class. This avoids the structural distinction between metaclasses (instances of Metaclass) and other classes which leads to the Class/Metaclass duality and complexity/opacity of the kernel. ClassTalk has been implemented as a simulation in Smalltalk-80. Historically it is a transposition of the original explicit metaclasses architecture of ObjVlisp (ref. in the paper, also more ref. about so-called reflective class-based systems, such as CLOS, and the "advanced" uses of metaclasses). This architecture is both simple and extensible. However having explicit metaclasses leads to some slight incompatibilities with the standard modeling/organization of class lattices in Smalltalk-80 (further discussed in the paper). Programming with ObjVlisp Metaclasses in Smalltalk-80, J.-P. Briot and P. Cointe, OOPSLA'89, Special Issue of Sigplan Notices, Vol. 24, No 10, pages 419-432, ACM, October 1989. I personally believe that although there were good reasons for the current Smalltalk-80 metaclass architecture, this design does improve it. But the standard Smalltalk-80 class lattice then needs some kind of reorganization for a better compatibility and to fully use the new possibilities. (This is the experience we got from our simulation of the architecture in Smalltalk-80). Thus it depends how you want to keep the full compatibility with standard Smalltalk-80, and reuse the class lattice as it is, or to propose a new improved system on many respects. Through your description, it seems that your focus is on different points (management of objects). On the other hand, using (various) metaclasses to control the implementation of objects may be useful (ref. on use of metaclasses in CLOS in the paper). I don't want to go into more details in this followup to keep it short (failed already!-), please see in the papers. (They should be easy to access to). Good luck for your project, Dean. Sounds great! Have a good reading. Jean-Pierre Briot LITP University Pierre et Marie Curie Paris, France briot@litp.ibp.fr currently visiting: Dept. of Computer Science University of Southern California L.A., CA briot@pollux.usc.edu