moss@cs.umass.edu (Eliot Moss) (11/25/90)
Mention of architectural speedups via special branch-target-cache mechanisms, and the fact that they do not apply to "object-oriented" procedure calls because those calls are indirect, encouraged me to open up a wider discussion of the issues surrounding object-oriented procedure calls (OOPCs for short).

First, a review of some of the linguistic features. A number of languages can compile a "dynamically bound" (OO) procedure call into an indirect branch, i.e., a branch through a compile-time-known offset in a run-time-known table. Other OO languages require lookup that makes obtaining a fixed, compile-time-known offset difficult. Smalltalk, for example, requires run-time method lookup in general. On the other hand, it has been empirically demonstrated that any given call site rarely varies in the class/type of the argument(s) that determine the call binding (i.e., which actual code is run).

In 1984 (see POPL for that year) Deutsch and Schiffman presented the *inline method cache*: a direct call is performed, and the prolog at the top of each method verifies that the class of the receiver is the same as it was at the previous call from the same call site. If the class is the same, execution proceeds; if it is different, a lookup is performed (using a full, software-managed cache, backed up by an exhaustive search up the class hierarchy), the code at the call site is changed to refer to the appropriate method, and the class-last-time is changed to reflect the new binding. The calling sequence is something like this:

	call	code-last-time
	.word	class-last-time
	...				; return point

Note that the return is 4 bytes beyond the usual return address (it must skip over the inline class word). Anyway, this technique is very effective (a very good hit rate), but it does require code modification. It appears it might work well with a properly designed smart I-cache, though.
Another thing I should mention: some informal, unpublished measurements I have heard about indicate that over 95% of call sites in a statically typed OO language (Trellis/Owl) could be statically bound. That is, at that many call sites there is only one *possible* target for the dynamically bound call. Clearly this is an opportunity for compiler optimization! It does require *global knowledge* of the program, though.

The inline method cache and similar techniques can bring the cost of an OOPC within an instruction or so of an ordinary procedure call, even given hardware speedups for statically known calls. The REAL performance loss of an object-oriented style is losing the ability to inline the whole procedure call! (IMHO.) In the long run, we will take a considerable performance hit if we do not perform appropriate global analysis and optimization to recognize and statically bind every call we can, and thus gain the ability to inline and apply further optimizations. Interesting related ideas include method splitting and customized compilation (see the Chambers and Ungar paper from SIGPLAN 90, for example).

This is yet another case where we need experts from different specialties to try to understand each other's areas to some extent, and to cooperate to achieve the best result.

Yours in OO performance .....		Eliot Moss
--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA  01003
(413) 545-4206; Moss@cs.umass.edu
--
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.