fritz@polecat.caltech.edu (Fritz Nordby) (05/22/87)
I've received a number of requests for information regarding my posting about object-oriented systems, in which I mentioned a machine called the Caltech Object Machine (or, to us cognoscenti, the COM). Since I don't want to work on the project just at the moment, and since I'd rather postpone the next round of (for the net, unusually polite, but still inevitable) "my RISC's better than your RISC," I thought I'd try to stir up a discussion of new, different, and even radical architectures by posting something about the machine we're working on.

First, some background. Only one paper has been published on the COM: W.J. Dally & J.T. Kajiya, ``An Object Oriented Architecture,'' Proceedings of the 12th Annual International Symposium on Computer Architecture, 1985, pp. 154-161. That paper describes the original ideas in the machine; however, since I took over the project, certain aspects of the architecture and many of the ideas about its implementation have evolved to the point of being almost unrecognizable compared to the original. We (F. Nordby & J. Kajiya) have submitted a paper to this year's OOPSLA conference covering the present form of the architecture, including how and why it has changed.

Now to some meat. In many regards, the COM appears to the user as a traditional 3-address register-register load-store machine. Operands are limited to registers, small integer constants, and constants taken from a portion of the code space. There are hardware instructions for integer arithmetic, logic operations, and so on, just like a conventional machine. One notable omission, however: there is no (explicit) subroutine call instruction. This omission is critical.

The first (and perhaps most striking) difference between the COM and traditional machine architectures is in instruction interpretation. In traditional systems, the association between opcode and instruction semantics is fixed at design time: there are different instructions for integer addition and floating-point addition, and doing a matrix or bignum addition requires explicit invocation of a specific subroutine to perform the operation in question. Late-binding programming languages and systems (a la Smalltalk-80, APL, etc.) pay a large performance penalty for this static opcode-operation binding.

The COM takes a different approach. Each instruction in the COM is an "abstract instruction": it includes an opcode and a specification of three operands (typically two inputs and one output), but the opcode alone does not determine the function to be performed. Instead, it is the combination of the opcode and the operand types (or, in Smalltalk lingo, the receiver and argument classes) which determines how to execute the instruction. If the combination is recognized as primitive (e.g., adding two SmallIntegers), the appropriate operation is performed and the next instruction is started; otherwise, a subroutine call (a method call, to all you Smalltalk fans) is made to a code block determined by an associative lookup on the opcode and receiver type (er, class).

In terms of implementation, our present design is quite simple and rather conservative: it is essentially a traditional data path (made with byte-slice ALUs), a couple of stack caches (that's right, two stacks), some main memory, and one cache (to do method lookups when appropriate).
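To make the dispatch concrete, here is a rough sketch in C of how a single abstract instruction might be executed. Let me be clear that this is my illustration, not the machine's microcode: the class tags, register file, and every name below are invented for the example. The point is only that the opcode by itself selects nothing; the combination of opcode and operand classes either hits a primitive case or falls through to the method lookup.

    /* A rough sketch, in C, of COM-style instruction dispatch.  All
       names, tags, and layouts here are invented for illustration --
       this is NOT the machine's actual microcode. */

    #include <stdio.h>

    typedef enum { CLASS_SMALLINT, CLASS_BIGNUM, CLASS_OTHER } Class;

    typedef struct {            /* a tagged value: every operand */
        Class klass;            /* carries its class with it     */
        long  val;
    } Object;

    typedef struct {            /* an "abstract instruction":    */
        int opcode;             /* opcode plus three operand     */
        int dst, src1, src2;    /* register numbers              */
    } Insn;

    enum { OP_ADD = 1 };

    Object regs[16];

    /* Software path: an associative lookup on (opcode, receiver
       class) selects a code block, which is then called.  Stubbed. */
    static void method_call(int opcode, Object *recv, Object *arg,
                            Object *dst)
    {
        printf("miss: opcode %d, receiver class %d -> method call\n",
               opcode, (int)recv->klass);
        /* ... look up and invoke the method for this combination ... */
    }

    /* Execute one abstract instruction: if the opcode/class
       combination is recognized as primitive, do it directly;
       otherwise fall through to the method lookup. */
    static void execute(Insn *i)
    {
        Object *r = &regs[i->src1];     /* receiver */
        Object *a = &regs[i->src2];     /* argument */
        Object *d = &regs[i->dst];      /* result   */

        if (i->opcode == OP_ADD &&
            r->klass == CLASS_SMALLINT && a->klass == CLASS_SMALLINT) {
            d->klass = CLASS_SMALLINT;  /* primitive: SmallInteger add */
            d->val   = r->val + a->val;
        } else {
            method_call(i->opcode, r, a, d);   /* unrecognized combo */
        }
    }

    int main(void)
    {
        Insn add;
        add.opcode = OP_ADD;
        add.dst = 0;  add.src1 = 1;  add.src2 = 2;

        regs[1].klass = CLASS_SMALLINT;  regs[1].val = 3;
        regs[2].klass = CLASS_SMALLINT;  regs[2].val = 4;
        execute(&add);                  /* primitive path */
        printf("primitive result: %ld\n", regs[0].val);

        regs[1].klass = CLASS_BIGNUM;   /* same instruction, new class */
        execute(&add);                  /* method-call path */
        return 0;
    }

The same ADD instruction thus serves SmallIntegers in hardware and everything else through the method machinery, which is part of why the instruction set needs no explicit call instruction.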
The result is a machine that executes Smalltalk-80 code at a rate competitive with a Sun-3/160 running compiled C code: about 0.8 times the Sun's speed on the sieve of Eratosthenes, which does no calls, and 2.0 times the Sun's speed on Ackermann's function, which is call/return intensive.

A more advanced implementation could achieve much higher performance. Techniques such as distributed instruction decode, result futures, register flagging and scoreboarding, etc. are quite applicable to this architecture. Simpler still, interleaved memory would provide a significant performance benefit over our present system, as would instruction and data caches.

More interesting than the raw performance (at least to me) is that, while the COM can execute instructions at a rate similar to other, more traditional machines, the COM is actually doing more during the execution of those instructions. For example, the Ackermann's benchmark, running the same code, would still work (albeit a bit more slowly) if the result were a bignum rather than a SmallInteger. Factorial code on the COM works smoothly and gracefully whether evaluating 10! or 1000!, because the machine handles overflow transparently: if an integer multiplication (for example) would overflow the range of SmallInteger, the instruction is "unrecognized" by the hardware and reverts to a software backup routine which can promote the result to a bignum. Thus, for a quite similar investment of hardware, the inherent instruction-level polymorphism of the COM allows each instruction to do a great deal more work than an instruction on a traditional machine.
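To illustrate that last point, here is a similar sketch of the overflow case, again in C and again purely illustrative: the 30-bit SmallInteger range and every name below are assumptions of mine, not the machine's real tag layout.

    /* A sketch of the overflow case: a multiply whose result leaves
       the SmallInteger range is "unrecognized" and reverts to a
       software backup.  The range and names are assumed for
       illustration only. */

    #include <stdio.h>

    #define SMALLINT_MAX  ((1L << 30) - 1)   /* assumed tagged range */
    #define SMALLINT_MIN  (-(1L << 30))

    /* software backup routine: promote the result to a bignum
       (stubbed -- a real one would build a multi-word integer) */
    static void promote_to_bignum(long a, long b)
    {
        printf("%ld * %ld overflows SmallInteger: bignum in software\n",
               a, b);
    }

    /* "hardware" multiply: completes only while the result remains
       a SmallInteger; anything else traps to the backup routine */
    static void mul(long a, long b)
    {
        long r = a * b;
        if ((b != 0 && r / b != a) ||        /* overflowed long itself */
            r > SMALLINT_MAX || r < SMALLINT_MIN)
            promote_to_bignum(a, b);
        else
            printf("%ld * %ld = %ld (SmallInteger)\n", a, b, r);
    }

    int main(void)
    {
        mul(10, 9);                     /* fits: handled in hardware */
        mul(1L << 15, 1L << 15);        /* 2^30: software backup     */
        return 0;
    }

The essential point is that the same multiply instruction serves both cases; the program itself never tests for overflow.

Fritz Nordby.
fritz@vlsi.caltech.edu
cit-vax!fritz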