fritz@polecat.caltech.edu (Fritz "3 points" Nordby) (08/27/87)
In article <1061@pdn.UUCP> ken@pdn.UUCP (Ken Auer) writes:
>The other question is whether anyone is working on a commercial
>workstation based on the COM (Caltech Object Machine) architecture on
>which to run Smalltalk-80. There was an article about the machine in
>comp.arch in May by Fritz Nordby which really impressed me. Some of
>the highlights of the article included:
[excerpt deleted for brevity]
>Perhaps Fritz will get psyched and repost the article for us
>Smalltalk-types, and give us an update.

Well, I'd rather not repost the whole thing, I'd rather post something interesting. Failing that ...

Some background:

The COM developed out of a language-directed architecture project that a few people started working on here a few years ago. Originally the machine was being designed for a different language (not Smalltalk-80, but a stack-based object-oriented language); by the time I got into the project, though, the machine was intended to be for running Smalltalk. There's been one paper published:

    W.J.Dally & J.T.Kajiya, ``An Object Oriented Architecture,'' Proceedings of the 12th Annual International Symposium on Computer Architecture 1985, pp.154-161.

That paper was written before I took over the project, and many of the architectural features it includes have been eliminated from the COM (things like floating-point addresses), and others have been changed to the point of unrecognizability (like the procedure linkage mechanism).

An overview of the architecture:

Perhaps the most basic idea in the COM architecture is that machine instructions should provide the same late-bound polymorphism that Smalltalk expressions provide. COM instructions look like instructions for most other 3-address, register-register machines: "ADD r1,r2->r3" for example. The difference is that in the COM there's only one "ADD" instruction (not one for integers, and one for floats, and one for double precision floats, etc.).
The action taken by the machine for such an instruction depends on the types of the instruction's operands.

To support this idea, the machine must provide two basic facilities: first, the machine must provide a simple facility for determining instruction operand types at instruction execution time; and second, the machine must provide a simple facility for handling exceptions and traps arising from either unknown operand type combinations or from some condition encountered in executing an instruction.

The first of these is straightforward enough: each word of COM memory is tagged according to the type of data it contains. This extends to tagging pointers (Smalltalk OOPs) according to the type of object they point to. (Note: the decision to put the class of the object in the pointer tag was made for performance reasons; it is slightly more flexible to put the class in the object table or in an object header, but it takes much longer to access the information. The only case when the class of an object changes is when a "become:" is done between objects of different classes, which is fairly rare; all other cases ("become:" between objects of the same class, growing an object, etc.) are still handled fairly easily.) During instruction execution, the tags are fetched along with the data, and are used to determine how the instruction should be executed.

The second facility required to support late-binding instruction-level polymorphism as provided by the COM is a fast, simple, yet flexible exception- and trap-handling mechanism. If an instruction fails for any reason, the COM does a method call. A new context is allocated, and the operands of the failing instruction are saved there; then a method lookup is performed, and the resulting block of code is called in the new context. When the called method returns, the value it gives is stored in the destination of the failing instruction.
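
For readers who'd like to see the tag-dispatch and trap idea in executable form, here is a rough Python sketch. It is only an illustration of the scheme described above, not the COM's actual microcode or data layout: the tag names, the PRIMITIVE_ADD table, the METHODS dictionary, the Context class, the "+" selector, and the Fraction example are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Word:
    """One tagged memory word: the tag travels with the data."""
    tag: str
    value: Any


@dataclass
class Context:
    """Fresh context holding the operands of a failing instruction."""
    operands: Tuple[Word, Word]


# Hypothetical: operand-tag combinations the hardware handles directly.
PRIMITIVE_ADD: Dict[Tuple[str, str], Callable[[Any, Any], Word]] = {
    ("SmallInteger", "SmallInteger"): lambda a, b: Word("SmallInteger", a + b),
    ("Float", "Float"): lambda a, b: Word("Float", a + b),
}


def fraction_plus(ctx: Context) -> Word:
    """Hypothetical software method run when ADD traps on a Fraction."""
    a, b = ctx.operands
    return Word("Fraction", a.value + Fraction(b.value))


# Hypothetical method dictionary keyed by (receiver tag, selector).
METHODS: Dict[Tuple[str, str], Callable[[Context], Word]] = {
    ("Fraction", "+"): fraction_plus,
}


def add(regs: List[Optional[Word]], src1: int, src2: int, dst: int) -> None:
    """The single ADD: behavior is chosen by operand tags at run time."""
    a, b = regs[src1], regs[src2]
    primitive = PRIMITIVE_ADD.get((a.tag, b.tag))
    if primitive is not None:
        regs[dst] = primitive(a.value, b.value)
        return
    # Trap: save operands in a new context, look up a method, call it,
    # and store its return value in the failing instruction's destination.
    ctx = Context(operands=(a, b))
    method = METHODS[(a.tag, "+")]  # KeyError stands in for a failed lookup
    regs[dst] = method(ctx)


regs: List[Optional[Word]] = [
    Word("SmallInteger", 2),
    Word("SmallInteger", 3),
    None,
    Word("Fraction", Fraction(1, 2)),
]
add(regs, 0, 1, 2)  # handled directly from the tags
add(regs, 3, 0, 1)  # traps into fraction_plus
```

The point of the sketch is that the caller issues the same add() in both cases; whether the result comes from the fast path or from a called method is invisible at the instruction level.
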
In order to keep method invocation time to a minimum, the machine keeps a cache of method call translations; if the desired translation cannot be found there, a backup method is called which typically does the full lookup and method call needed.

Performance estimates:

We've evaluated the performance in two basic ways. First, we took the worst-case execution times for the COM instruction set. Then we looked at the Smalltalk-80 Virtual Machine bytecode instruction set, and we made what we felt would be an appropriate translation of bytecodes into COM instructions. (For example, we eliminated all the "push temporary" bytecodes, since the COM doesn't take any extra time to access temporary variables with its 3-address instructions; for the "add" bytecode, we took the translation to be the COM "ADD" instruction; etc.) Finally, based on the bytecode frequencies from the analysis at HP, given in the Green Book (the one by Krasner, not the one by Kaddhafi! -- see the reference below), we calculated an equivalent bytecode rate for the COM. With a 75ns clock and a single-bank, non-interleaved memory, the COM should show an equivalent bytecode rate of 3.3 Mbytecodes/sec. Improving the memory system should raise this above 4 Mbytecodes/sec. (The reference is:

    J.Falcone, ``The Analysis of the Smalltalk-80 System at Hewlett-Packard,'' in _Smalltalk-80, Bits of History, Words of Advice_, G.Krasner (ed.), Addison-Wesley, 1983.)

The second evaluation of the COM is a bit unorthodox, at least in the Smalltalk world. We took two traditional benchmarks, the sieve of Eratosthenes and Ackermann's function, and we evaluated the performance of the COM for these problems. We didn't use the standard Smalltalk benchmarks because (1) the microbenchmarks are too specific to the Smalltalk-80 Virtual Machine, and (2) the macrobenchmarks require more of the system than we've been able to bring up yet (although I'm working on that).
Also, we realized that the COM is fast, and we want to compare its performance for Smalltalk to other machines' performance for languages like C, a comparison that the standard Smalltalk benchmarks don't easily allow.

The two benchmarks are very different. The sieve does no procedure calls (and the translation into Smalltalk was done with some care to avoid any non-primitive methods, and to do flow control as conditional branches); its execution is dominated by memory access, simple integer arithmetic, and simple conditional flow control. Ackermann's function, on the other hand, is doubly recursive: evaluating (3 ackermann: 6) on the COM involves 172233 method calls in 1462223 instructions (one call and one return every 8.5 instructions). Execution of the Ackermann's function benchmark measures the call-return speed of the machine.

The results are just as one might expect. For the call-return intensive Ackermann's function benchmark, the fast procedure call mechanism of the COM shows up quite clearly: the Smalltalk version on the COM is about twice as fast as the standard compiled C version on a Sun-3/160, and about 10 times faster than on a VAX-11/780. The `flat' sieve benchmark shows the somewhat slower speed of the COM, and especially of its memory: here, the COM came out to about 0.8 times the speed of the Sun-3/160, and about 1.6 times the speed of the VAX-11/780.

Status:

Actually, we're a bit up in the air right now. We've got a design for the first implementation (which is what all the performance figures above are for; we call it the COM v.0), and an undergraduate (J.P.Alfke, now graduated) has modified the standard Smalltalk compiler to produce COM assembly code (the modified version is called, appropriately enough, the COMpiler). I've written a microcode-level simulator for the COM v.0, and recently I've been working on modifying the SystemTracer to produce an image that I can run under the simulator.
We'd very much like to build the thing and see if it'll really run as fast as we think it will. Unfortunately, we haven't found anyone out there with money who's also interested in having us build one, and if we don't find someone soon the project is going to go on the shelf and start gathering dust.

No, I don't know of anybody who's building (or thinking of building) a machine based on the COM architecture. Quite honestly, I don't think it's ready for commercial implementation -- there are just too many questions which need to be answered. Another year or so of research is needed (at minimum) before this beast will be ready to fly in the commercial world.

Fritz Nordby.
fritz@vlsi.caltech.edu
cit-vax!fritz