[comp.lang.smalltalk] The Caltech Object Machine

fritz@polecat.caltech.edu (Fritz "3 points" Nordby) (08/27/87)

In article <1061@pdn.UUCP> ken@pdn.UUCP (Ken Auer) writes:
>The other question is whether anyone is working on a commercial
>workstation based on the COM (Caltech Object Machine) architecture on 
>which to run Smalltalk-80.  There was an article about the machine in 
>comp.arch in May by Fritz Nordby which really impressed me.  Some of 
>the highlights of the article included:
[excerpt deleted for brevity]
>Perhaps Fritz will get psyched and repost the article for us
>Smalltalk-types, and give us an update.

Well, I'd rather not repost the whole thing; I'd rather post something
interesting.  Failing that ...

Some background:

The COM developed out of a language-directed architecture project that
a few people started working on here a few years ago.  Originally the
machine was being designed for a different language (not Smalltalk-80,
but a stack-based object-oriented language); by the time I got into the
project, though, the machine was intended to be for running Smalltalk.

There's been one paper published:
	W.J.Dally & J.T.Kajiya, ``An Object Oriented Architecture,''
		Proceedings of the 12th Annual International Symposium
		on Computer Architecture 1985, pp.154-161.
That paper was written before I took over the project, and many of the
architectural features it includes have been eliminated from the COM
(things like floating-point addresses), and others have been changed
to the point of unrecognizability (like the procedure linkage mechanism).

An overview of the architecture:

Perhaps the most basic idea in the COM architecture is that machine
instructions should provide the same late-bound polymorphism that
Smalltalk expressions provide.  COM instructions look like instructions
for most other 3-address, register-register machines: "ADD r1,r2->r3"
for example.  The difference is that in the COM there's only one "ADD"
instruction (not one for integers, and one for floats, and one for
double precision floats, etc.).  The action taken by the machine for
such an instruction depends on the types of the instruction's operands.

To support this idea, the machine must provide two basic facilities:
first, the machine must provide a simple facility for determining
instruction operand types at instruction execution time; and second,
the machine must provide a simple facility for handling exceptions
and traps arising from either unknown operand type combinations or
from some condition encountered in executing an instruction.

The first of these is straightforward enough: each word of COM memory
is tagged according to the type of data it contains.  This extends
to tagging pointers (Smalltalk OOPs) according to the type of object
they point to.  (Note: the decision to put the class of the object
in the pointer tag was made for performance reasons; it is slightly
more flexible to put the class in the object table or in an object
header, but it takes much longer to access the information.  The
only case when the class of an object changes is when a "become:"
is done between objects of different classes, which is fairly rare;
all other cases ("become:" between objects of the same class, growing
an object, etc.) are still handled fairly easily.)  During instruction
execution, the tags are fetched along with the data, and are used to
determine how the instruction should be executed.
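
A rough sketch of the tag-dispatch idea, in Python rather than anything
COM-specific.  The tag names ("int", "float", "Point"), the Word class and
the Trap exception are made up for illustration; the real COM tag encoding
is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One tagged memory word: the tag travels with the data."""
    tag: str        # hypothetical tag names: "int", "float", or a class name
    value: object


class Trap(Exception):
    """Raised when there is no built-in action for the operand tags."""


def add(a: Word, b: Word) -> Word:
    """A single ADD whose behavior is chosen by the operand tags."""
    tags = (a.tag, b.tag)
    if tags == ("int", "int"):
        return Word("int", a.value + b.value)
    if tags == ("float", "float"):
        return Word("float", a.value + b.value)
    # Any other tag combination is left to the trap mechanism.
    raise Trap(f"ADD not handled directly for tags {tags}")


print(add(Word("int", 2), Word("int", 3)).value)   # prints 5
```

The same "ADD" is used for every operand type; only the tags fetched with
the operands decide which action is taken.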

The second facility required to support late-binding instruction-level
polymorphism as provided by the COM is a fast, simple, yet flexible
exception- and trap-handling mechanism.  If an instruction fails for
any reason, the COM does a method call.  A new context is allocated,
and the operands of the failing instruction are saved there; then
a method lookup is performed, and the resulting block of code is called
in the new context.  When the called method returns, the value it gives
is stored in the destination of the failing instruction.  In order to
keep method invocation time to a minimum, the machine keeps a cache
of method call translations; if the desired translation cannot be found
there, a backup method is called which typically does the full lookup
and method call needed.
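
The sequence above can be sketched roughly as follows.  This is a toy model
under stated assumptions, not the COM's actual mechanism: the Machine class,
the dictionary used as the translation cache, the (selector, operand tags)
cache key, and the lookup function are all hypothetical.

```python
class Machine:
    def __init__(self, full_lookup):
        self.registers = {}
        self.cache = {}                 # (selector, operand tags) -> method
        self.full_lookup = full_lookup  # backup path used on a cache miss
        self.full_lookups = 0

    def trap(self, selector, operands, dest):
        """Turn a failing instruction into a method call."""
        # 1. Allocate a new context and save the operands there.
        context = {"operands": operands}
        # 2. Look up the method, trying the translation cache first.
        key = (selector, tuple(tag for tag, _ in operands))
        method = self.cache.get(key)
        if method is None:
            self.full_lookups += 1
            method = self.full_lookup(*key)
            self.cache[key] = method
        # 3. Call the method in the new context; its result goes to the
        #    destination of the failing instruction.
        self.registers[dest] = method(context)


def lookup(selector, tags):
    """Stand-in for a full method lookup (hypothetical)."""
    if selector == "ADD" and tags == ("Point", "Point"):
        def add_points(context):
            (_, p), (_, q) = context["operands"]
            return ("Point", (p[0] + q[0], p[1] + q[1]))
        return add_points
    raise LookupError(f"no method for {selector} on {tags}")


com = Machine(lookup)
for _ in range(2):
    com.trap("ADD", [("Point", (1, 2)), ("Point", (3, 4))], dest="r3")
print(com.registers["r3"], com.full_lookups)   # ('Point', (4, 6)) 1
```

The second trap on the same selector and operand tags is served from the
cache, so the full lookup runs only once.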

Performance estimates:

We've evaluated the performance in two basic ways.  First, we took the
worst-case execution times for the COM instruction set.  Then we looked
at the Smalltalk-80 Virtual Machine bytecode instruction set, and we
made what we felt would be an appropriate translation of bytecodes into
COM instructions.  (For example, we eliminated all the "push temporary"
bytecodes, since the COM doesn't take any extra time to access temporary
variables with its 3-address instructions; for the "add" bytecode, we
took the translation to be the COM "ADD" instruction; etc.)  Finally,
based on the bytecode frequencies from the analysis at HP, given in the
Green Book (the one by Krasner, not the one by Kaddhafi! -- see the
reference below), we calculated an equivalent bytecode rate for the COM.
With a 75ns clock and a single-bank, non-interleaved memory, the COM
should show an equivalent bytecode rate of 3.3 Mbytecodes/sec.  Improving
the memory system should raise this above 4 Mbytecodes/sec.
(The reference is:
	J.Falcone, ``The Analysis of the Smalltalk-80 System at Hewlett-
		Packard,'' in _Smalltalk-80, Bits of History, Words of
		Advice_, G.Krasner (ed.), Addison-Wesley, 1983.)
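
For a sense of scale, the rates quoted above can be turned into clock cycles
per equivalent bytecode.  This is only arithmetic on the 75ns clock and the
3.3 and 4 Mbytecodes/sec figures; the function name is made up, and nothing
here reproduces the actual bytecode-frequency calculation.

```python
CLOCK_NS = 75.0   # clock period given in the text


def clocks_per_bytecode(mbytecodes_per_sec, clock_ns=CLOCK_NS):
    """Average clock cycles spent per equivalent bytecode."""
    ns_per_bytecode = 1000.0 / mbytecodes_per_sec
    return ns_per_bytecode / clock_ns


print(round(clocks_per_bytecode(3.3), 2))   # 4.04
print(round(clocks_per_bytecode(4.0), 2))   # 3.33
```

So the single-bank estimate corresponds to roughly four clocks per
equivalent bytecode on average.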

The second evaluation of the COM is a bit unorthodox, at least in the
Smalltalk world.  We took two traditional benchmarks, the sieve of
Eratosthenes and Ackermann's function, and we evaluated the performance
of the COM for these problems.  We didn't use the standard Smalltalk
benchmarks because (1) the microbenchmarks are too specific to the
Smalltalk-80 Virtual Machine, and (2) the macrobenchmarks require more
of the system than we've been able to bring up yet (although I'm working
on that).  Also, we realized that the COM is fast, and we want to compare
its performance for Smalltalk to other machines' performance for languages
like C, a comparison that the standard Smalltalk benchmarks don't easily
allow.

The two benchmarks are very different.  The sieve does no procedure calls
(and the translation into Smalltalk was done with some care to avoid any
non-primitive methods, and to do flow control as conditional branches);
its execution is dominated by memory access, simple integer arithmetic,
and simple conditional flow control.  Ackermann's function, on the other
hand, is doubly recursive: evaluating (3 ackermann: 6) on the COM involves
172233 method calls in 1462223 instructions (one call and one return every
8.5 instructions).  Execution of the Ackermann's function benchmark measures
the call-return speed of the machine.
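
The call count can be checked with a short Python sketch, assuming the usual
two-argument definition of Ackermann's function with (3 ackermann: 6) read
as m=3, n=6.  This is not the Smalltalk benchmark source, and the global
counter is just for illustration.

```python
import sys

sys.setrecursionlimit(10000)   # the recursion nests several hundred deep

calls = 0


def ackermann(m, n):
    """Usual doubly recursive definition, counting every call."""
    global calls
    calls += 1
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))


result = ackermann(3, 6)
print(result, calls)                # 509 172233
print(round(1462223 / calls, 2))    # 8.49 instructions per call
```

The count agrees with the 172233 method calls quoted above, and dividing the
instruction count by it gives the "every 8.5 instructions" figure.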

The results are just as one might expect.  For the call-return intensive
Ackermann's benchmark, the fast procedure call mechanism of the COM shows
up quite clearly: the Smalltalk version on the COM is about twice as fast
as the standard compiled C version on a Sun-3/160, and about 10 times faster
than a VAX-11/780.  The `flat' sieve benchmark shows the somewhat slower
speed of the COM, and especially of its memory: here, the COM came out to
about 0.8 times the speed of the Sun-3/160, and about 1.6 times the speed
of the VAX-11/780.

Status:

Actually, we're a bit up in the air right now.  We've got a design for the
first implementation (which is what all the performance figures above are
for; we call it the COM v.0), and an undergraduate (J.P.Alfke, now graduated)
has modified the standard Smalltalk compiler to produce COM assembly code
(the modified version is called, appropriately enough, the COMpiler).
I've written a microcode-level simulator for the COM v.0, and recently I've
been working on modifying the SystemTracer to produce an image that I can
run under the simulator.

We'd very much like to build the thing and see if it'll really run as fast
as we think it will.  Unfortunately, we haven't found anyone out there with
money who's also interested in having us build one, and if we don't find
someone soon the project is going to go on the shelf and start gathering
dust.  No, I don't know of anybody who's building (or thinking of building)
a machine based on the COM architecture.  Quite honestly, I don't think
it's ready for commercial implementation -- there are just too many questions
which need to be answered.  Another year or so of research is needed (at
minimum) before this beast will be ready to fly in the commercial world.


		Fritz Nordby.	fritz@vlsi.caltech.edu	cit-vax!fritz