wilson@carcoar.Stanford.EDU (Paul Wilson) (09/27/89)
First, let me say that *yes*, it is worth compiling dynamically typed languages. Even if you still do all your type checking at run time, things will usually go several times as fast. The big win is that you avoid the overhead of interpreting the statements of the program one at a time. For a great introduction to interpretation and compilation, see Abelson and Sussman (with Sussman), _Structure_and_Interpretation_of_ Computer_Programs_." It's not light reading, but it is jammed with elegant, enlightening stuff. Scheme is used to implement a Scheme interpreter, as well as a Scheme compiler. (Scheme is a very elegant, modern, lexically-scoped dialect of Lisp.) Anybody who's interested in advanced compilers for dynamically-typed languages should check out Ungar & Chambers' compiler for the language Self, a prototype-based object-oriented language. Self is probably as dynamically typed as languages get, even more so than Smalltalk. See their paper in the SIGPLAN '89 PLDI Conference proceedings (which was a special issue of SIGPLAN Notices a few months ago). The Self compiler does very sophisticated dynamic compilation. That is, it compiles the procedures that are actually executed on an as-needed basis. It tentatively inlines and inter-procedurally optimizes methods (object-oriented procedures), invalidating and recompiling code when the compiler assumptions are violated. It does several kinds of type inference. (For example, noticing that the receiver of a message that an object sends to itself is necessarily of the same class as the sender. And by propagating type information differentially along branches from tests, and by splitting the control flow into two versions where type information can be preserved that way.) Some similar ideas were used in some APL compilers in the late 70's, and apparently forgotten by almost everybody since. (Greg Jaxon pointed this out to me when I started to reinvent these wheels and posted to the net about it.) See the APL '79 proceedings, and an article in the Hewlett-Packard Technical Journal from 1978 or thereabouts, called something like "The Dynamic Incremental Compiler of APL/3000" or something like that. (Sorry for the approximate citations. I'll try to dig up the details.) These techniques may still be in use in APL, but just not talked about much. Ungar and Chambers came up with the same basic idea, but have taken it *much* farther. I think theirs is the sexiest compiler in existence, bar none. (Subjective judgement -- please, no flames. Compilers that are really transparent just appeal to me.) The APL work does have its own interesting features, though. In APL you can tell statically if an array ever has more than one type of data stored into it. When you first encounter, for example, a 30-element one-dimensional array of integers at a certain point in the program, you can compile a special version of the code for it. In this case, it could be an unrolled loop with the type-testing moved out of the loop. Next time, you check to see if you're operating on another 30-element one-dimensional array again. If so, you just execute the same code. If not, you recompile, relaxing the assumptions and creating more generic (but slower) code. I think this could be a big win for polymorphic languages with homogeneous collection types. For example, I could write a program that applies a generic operation to every element of a collection, without knowing what kind of collection or what kind of elements it holds. (And without the compiler being able to statically determine it, as Ada generics require.) If the running program encountered an array of floats, it would compile an unrolled loop to apply the appropriate version of the generic operation (say, the floating point "add") with no type checks. And if next time it encountered a linked list of integers, it would compile an optimized routine for that as well. In a language with an extensible type system, this amounts to lazily compiling a potentially infinite number of special-case optimized code segments. (Which is why you can't do it statically.) Similar techniques could be applied to structures with typed fields. (E.g., variable x may be a foo or a bar, but if it's a bar, its first field is always an int.) It's important to note that none of this sacrifices dynamic type checking -- it just optimizes most of it away. This is important in interactive programming environments and fault-tolerant systems, because you don't want a simple bug to corrupt the integrity of data structures. -- Paul ~ Paul R. Wilson Software Systems Laboratory lab ph.: (312) 996-9216 U. of Illin. at C. EECS Dept. (M/C 154) wilson@carcoar.stanford.edu Box 4348 Chicago,IL 60680 -- Send compilers articles to compilers@esegue.segue.boston.ma.us {spdcc | ima | lotus}!esegue. Meta-mail to compilers-request@esegue. Please send responses to the author of the message, not the poster.