[comp.arch] The COM -- an object-oriented architecture

fritz@polecat.caltech.edu (Fritz Nordby) (05/22/87)

I've received a number of requests for information regarding my posting
about object-oriented systems where I mentioned a machine called the
Caltech Object Machine (or, to us cognoscenti, the COM).  Since I don't
want to work on the project just at the moment, and since I'd rather
postpone the next round of (for the net, unusually polite, but still
inevitable) "my RISC's better than your RISC," I thought I'd try to
stir up something by way of a discussion on new, different, and even
radical architectures by posting something about the machine we're
working on.

First, background: There has only been one paper published on the COM:
	W.J. Dally & J.T. Kajiya, ``An Object Oriented Architecture,''
		Proceedings of the 12th Annual International Symposium
		on Computer Architecture, 1985, pp. 154-161.
This paper describes the original ideas in the machine; however, since
I took over the project certain aspects of the architecture and many
of the ideas about its implementation have evolved to the point of
being almost unrecognizable compared to the original.  We (F.Nordby &
J.Kajiya) have submitted a paper to this year's OOPSLA conference
covering the present form of the architecture including how and why
it has changed.

Now to some meat.

In many regards, the COM appears to the user as a traditional 3-address
register-register load-store machine.  Operands are limited to registers,
small integer constants, and constants taken from a portion of the code
space.  There are hardware instructions for integer arithmetic and logic
operations, etc., just like a conventional machine.  One notable omission,
however, is that there is no (explicit) subroutine call instruction.

This last is critical.  The first (and perhaps most striking) difference
between the COM and traditional machine architectures is in instruction
interpretation.  In traditional systems, the association between opcode
and instruction semantics is fixed at design time: there are different
instructions for integer addition and floating point addition, and doing
a matrix or bignum addition requires explicit invocation of a specific
subroutine to perform the operation in question.  Late-binding programming
languages and systems (a la Smalltalk-80, APL, etc.) pay a large
performance cost because of this static opcode-operation binding.
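
To make that cost concrete, here is a rough sketch in C (my own
illustration, not anything from the COM paper) of the kind of dispatch
sequence a late-binding language must compile in-line on a conventional
machine; the tag names and the send_plus() routine are made up for the
example:

    /* Hypothetical tagged operands; names are illustrative only. */
    enum { TAG_SMALLINT, TAG_BIGNUM /* , ... */ };

    typedef struct {
        int  tag;      /* runtime type of the operand */
        long small;    /* valid when tag == TAG_SMALLINT */
        void *ref;     /* valid otherwise */
    } operand_t;

    extern operand_t send_plus(operand_t a, operand_t b); /* full dispatch */

    /* What a late-bound "+" must do on a fixed-semantics ISA:
     * test the tags in software, then either add or call out.
     * (Overflow handling is omitted here; it comes up later.) */
    operand_t late_bound_add(operand_t a, operand_t b)
    {
        if (a.tag == TAG_SMALLINT && b.tag == TAG_SMALLINT) {
            operand_t r = { TAG_SMALLINT, a.small + b.small, 0 };
            return r;   /* fast path, but still several instructions */
        }
        return send_plus(a, b);   /* general case: a subroutine call */
    }

Every late-bound "+" pays for those tag tests and branches as explicit
instructions; the COM's approach, described next, folds them into the
instruction interpretation itself.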

The COM takes a different approach.  Each instruction in the COM is an
"abstract instruction": it includes an opcode and specification of three
operands (typically two inputs and one output), but the opcode alone
does not determine the function to be performed.  Instead, it is the
combination of the opcode and the operand types (or, in Smalltalk lingo,
the receiver and argument classes) which determines how to execute
the instruction.  If the combination is recognized as being primitive
(e.g., adding two SmallIntegers), then the appropriate operation is
performed, and the next instruction is started; otherwise, a subroutine
call (method call to all you Smalltalk fans) is performed to a code
block determined by an associative lookup on the opcode and receiver
type (er, class).
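
A rough software model of this dispatch rule (again my own sketch, not
the actual COM hardware; primitive_ok(), lookup_method(), and
call_method() are stand-ins for hardware mechanisms):

    #include <stdint.h>

    typedef struct { uint8_t tag; long val; } operand_t;  /* simplified */

    extern int       primitive_ok(uint8_t op, uint8_t t1, uint8_t t2);
    extern operand_t do_primitive(uint8_t op, operand_t recv, operand_t arg);
    extern void     *lookup_method(uint8_t op, uint8_t rclass);
    extern void      call_method(void *code, operand_t recv, operand_t arg,
                                 operand_t *dst);

    /* One abstract instruction: the opcode plus the operand tags
     * together select the action. */
    void execute(uint8_t opcode, operand_t recv, operand_t arg,
                 operand_t *dst)
    {
        if (primitive_ok(opcode, recv.tag, arg.tag)) {
            /* recognized combination, e.g. SmallInteger + SmallInteger */
            *dst = do_primitive(opcode, recv, arg);
        } else {
            /* unrecognized: method call found by an associative lookup
             * on (opcode, receiver class) */
            call_method(lookup_method(opcode, recv.tag), recv, arg, dst);
        }
    }

Here recv.tag stands in for whatever class tag the hardware carries on
its operands.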

In terms of implementation, our present design is quite simple and
rather conservative: it is essentially a traditional data path (made
with byte-slice ALUs), a couple of stack caches (that's right, two
stacks), some main memory and one cache (to do method lookups when
appropriate).  The result is a machine that executes Smalltalk-80
code at a rate competitive with a Sun-3/160 running compiled C code
(about 0.8*Sun for the sieve of Eratosthenes, which does no calls;
2.0*Sun for Ackermann's function, which is call/return intensive).
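
The organization of that method-lookup cache isn't spelled out above, so
the following is only a plausible guess at its behavior: a small
direct-mapped cache keyed on (opcode, receiver class), with the slow
associative search on a miss (full_lookup() is a stand-in for that
search):

    #include <stdint.h>
    #include <stddef.h>

    typedef struct { uint8_t op; uint8_t rclass; void *code; } mc_entry_t;

    #define MC_SIZE 256
    static mc_entry_t mcache[MC_SIZE];

    extern void *full_lookup(uint8_t op, uint8_t rclass);  /* slow path */

    void *lookup_method(uint8_t op, uint8_t rclass)
    {
        mc_entry_t *e = &mcache[((size_t)op * 31u + rclass) % MC_SIZE];
        if (e->code != NULL && e->op == op && e->rclass == rclass)
            return e->code;           /* hit: no associative search */
        e->op = op;                   /* miss: refill the entry */
        e->rclass = rclass;
        e->code = full_lookup(op, rclass);
        return e->code;
    }

Software Smalltalk-80 implementations use the same trick (a method
cache) to avoid full lookups; the difference is doing it in the memory
system rather than in compiled code.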

A more advanced implementation could achieve much higher performance.
Techniques such as distributing instruction decode, returning result
futures, register flagging and scoreboarding, etc. are quite applicable
to this architecture.  Even simpler, interleaving memory would
provide a significant performance benefit over our present system,
as would the use of instruction and data caches.

More interesting than the raw performance (at least, more interesting
to me) is that, while the COM can execute instructions at a similar
rate to other (more traditional) machines, the COM is actually doing
more during the execution of those instructions.  For example, the
Ackermann benchmark, running the same code, would still work (albeit a
bit more slowly) if the result were a bignum rather than a SmallInteger.
Factorial code on the COM will work smoothly and gracefully whether
evaluating 10! or 1000!, because the machine gracefully handles the case
of overflow: if an integer multiplication, for example, would overflow
the range of SmallInteger, the instruction will be "unrecognized" by the
hardware and will revert to a software backup routine which can promote
the result to a bignum.  Thus, for
a quite similar investment of hardware, the inherent instruction-level
polymorphism of the COM allows each instruction to do a great deal more
work than an instruction on a traditional machine.
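
In software terms, that promotion path might look like the sketch below
(mine, not the COM's actual backup code; bignum_mul() is a hypothetical
bignum routine, and __builtin_mul_overflow is the GCC/Clang builtin used
here just to model the hardware's overflow detection):

    #include <stddef.h>

    typedef struct bignum bignum_t;
    extern bignum_t *bignum_mul(long a, long b);  /* hypothetical */

    typedef struct {
        int       is_big;
        long      small;   /* valid when !is_big */
        bignum_t *big;     /* valid when is_big */
    } integer_t;

    /* SmallInteger multiply with the overflow fallback: the fast case
     * stays a SmallInteger; on overflow the operation is "unrecognized"
     * and software promotes the result to a bignum. */
    integer_t smallint_mul(long a, long b)
    {
        integer_t r = { 0, 0, NULL };
        if (!__builtin_mul_overflow(a, b, &r.small))
            return r;              /* fits: the hardware fast path */
        r.is_big = 1;              /* overflow: software backup */
        r.big = bignum_mul(a, b);
        return r;
    }

The point is that the compiled code contains no overflow test at all:
the check rides along with the tag checking the hardware is doing
anyway.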

		Fritz Nordby.	fritz@vlsi.caltech.edu	cit-vax!fritz