[comp.lang.c++] Optimized virtual function invocation

druschel@cs.arizona.edu (Peter Druschel) (08/25/90)

Consider the following code (compiled with g++ 1.37.1):

class Base {
 public:
  inline virtual void foo() {}
};

class Derived : public Base {
 public:
  inline virtual void foo() {}
};

class Container {
 public:
  inline void bar(Base &b) { b.foo(); }
};

int main()
{
  Derived d;
  Container c;

  d.foo();  /* this call to Derived::foo() is statically resolved and 
               inline expanded
             */

  c.bar(d); /* this call to Container::bar() is also statically resolved 
               and inline expanded 
               the virtual function foo(), called within bar(), however,
	       is invoked indirectly through Derived's virtual table.
             */    
}

   Note that bar() is inline expanded in main() and the argument d passed
   to bar() is known to be of class Derived. Thus, the exact type of the 
   argument passed to foo() from within bar() is also known. Consequently,
   foo() could also be statically resolved and inline expanded.
   It seems to me that with little effort, the compiler could notice
   this situation and take advantage of it.

   This optimization would be very useful; in many cases, it could even 
   eliminate the need for parametric container classes.
   Usually, the member functions of container classes invoke small virtual 
   functions of their arguments (e.g. the contained objects). In many cases,
   the member functions (of the container class) themselves are short and 
   can be inlined. 
   If the compiler were capable of doing the above optimization, it could
   eliminate all the virtual invocations. The resulting container class
   would be just as efficient as a parametric container class created
   specifically for a particular (contained) object class.
   
   What do people think about this? Is there any chance this (and other
   strategies for exploring static resolution of virtual functions, as
   hinted in the "ideas" file in the distribution) will be implemented
   any time soon?

   I have thought about making the required modifications to g++, but I
   don't know enough about g++ internals to be able to estimate what
   would be involved. Some information about how g++ does inlining
   and a few hints might encourage me to do it.

-- 
Peter Druschel				Internet:  druschel@cs.arizona.edu
Dept of Computer Science
University of Arizona 
Tucson, AZ 85721   USA

dl@g.g.oswego.edu (Doug Lea) (08/25/90)

As Paul Vaughan mentioned (thanks!), I've looked into ways of
permitting and performing static resolution, and described my best
thoughts for doing so in the '90 USENIX C++ proceedings. I'll outline
the highlights here.

First, it's important to realize that the goal of static resolution is
NOT merely to replace indirect function calls with direct function
calls. Happily, virtual function calls usually result in very small,
often unnoticeable, performance penalties over direct calls. The goal
is instead to enable PROCEDURAL INTEGRATION, achieved by the inlining
of a typically small number of crucial member functions, which in turn
enables other standard powerful optimization techniques to take place.
For example, the sum(Matrix&) routine listed in my paper, in which all
access functions (rows(), operator()(i, j), etc.)  are virtual,
executes less than twice as fast when virtual calls are replaced by
direct calls, but runs 25 times faster when hand-optimized in a way
that static resolution could automate. (These are on a Sun4/110 using
g++.)

Factors of 25 are hard to ignore. People (myself included) sometimes
use highly compromised, obscure, and even unsafe design and coding
schemes to avoid such things. My goals in exploring static resolution
lay not so much in getting raw speed per se (since this can almost
always be done now via ugly hacks) but instead in encouraging better
design in the first place, by eliminating much of the motivation for
such compromise. (See my discussions about `Abstract Base Classes'
(ABCs), etc., in the paper.)

After thinking to themselves, ``why can't my compiler just figure
out how to optimize this code all by itself'', most people's first
good idea about how to enable static resolution is to be able to
somehow indicate whether a class is a `leaf' class. Since methods of
such a class could not be overridden, hardwiring would be possible. I
found this unacceptable for several reasons:

    1) As Steve Clamage wrote, it disables an important software reuse
        idiom in a thoroughly unacceptable manner, since leaf classes
        could never be subclassed.  Thus, the strategy hurts the
        common practice of inheriting mechanism across implementation
        classes.

    2) Unless ALL leaf classes are required to be designated as such
        (and perhaps even still, depending on various other details),
        programmers would still have to write out each and every
        special leaf class version of polymorphic functions
        themselves, even though, as is almost always the case, the
        special versions would contain EXACTLY the same C++
        statements, just optimized in different ways for different
        arguments. This gets out of hand very quickly for functions
        with several arguments. Moreover, if a programmer decides to
        modify a hierarchy, changing a leaf into a non-leaf class, all
        of this work would have to be manually redone.

   3) Because of (2), as Steve and others have also pointed out, this
        leads to no pragmatic improvement over existing constructs,
        since you can now manually invoke subclass routines inside
        functions anyway, albeit in a sometimes-ugly and
        always-dangerous manner. (Dangerous, since such routines can
        fail when new subclasses are introduced. Despite other failings,
        specially denoting leaf classes would prevent this.)

The alternative strategy I laid out is to make support for
polymorphism-by-synthesis (as opposed to polymorphism-by-dispatch) a
first-class mechanism in C++. The technique is called `customization',
and is based on work originally described by Ungar, Chambers, and
others developing the SELF compiler. The strategy amounts to alerting
the compiler (via a barely-defensible bastardization/alternative usage
of the `template' keyword) that all specializations of a method or
function should be resynthesized (customized) by the compiler rather
than relying on virtual function calls that otherwise allow the same
executable code to work with each different subclass argument it might
encounter.  This can be implemented via an internal
macro-expander-like system driven by a type inferencer.

For example, if there were a base class `Matrix', a function `float
sum(template Matrix& m)', a Matrix subclass `RowMajorMatrix', and
finally client code
    RowMajorMatrix a; float s = sum(a);
then a specialized version of `float sum(RowMajorMatrix& m)',
specifically optimized for RowMajorMatrix (and not any subclass
thereof!)  would automatically be synthesized to handle the `float
s = sum(a)' call.  Of course, there are requirements for backup
strategies, involving continued use of dispatch-based versions for
cases where the exact type of an argument is unknown at compile time,
along with various other niceties described in the paper (as well
as a few that weren't -- write for details if you are interested.)

The most controversial aspects of the proposal are 

    1) It has surprisingly far-reaching effects on the C++ type system
        itself, including a few outright conflicts where static
        resolution and dynamic resolution can currently have differing
        effects.  (Of course, I argue that all such changes are for
        the best!  Just for one example, it provides a way for
        programmers to avoid `slicing' via customized call by value,
        which addresses a topic recently discussed here.)

    2) Even though several of the most important cases where
        customization can be applied are relatively straightforward to
        implement, getting the whole thing right doesn't look to be
        doable in an afternoon, a week, a month, or perhaps even a
        year. 

    3) It has consequences for C++ environments. Means for programmers
        to have some say about tradeoffs of potential code bloat for
        efficiency, prospects for integrating customization with
        parameterized types, and  impact on the separate
        compilation model all need to be addressed.

Many people apparently believe that there might be a means to get
similar effects with less impact on the language and current
compilers.  I remain unconvinced that this is possible. The fact that
both the SELF and TS (typed smalltalk) designers and implementers use
variations on customization extensively to obtain their impressive
performance improvements over standard OO compilation schemes lends
credence to the notion that customization can play an important
role in maintaining C++'s usual edge in efficiency compared to other
OO languages.

However, one viable alternative is to incorporate customization into a
metalanguage with associated tools comprising a C++ programmer's
workbench, rather than into the language proper. I've been exploring
this too.

-Doug
--
Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2688
email: dl@g.oswego.edu            or dl@cat.syr.edu
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl

brucec@phoebus.phoebus.labs.tek.com (Bruce Cohen;;50-662;LP=A;) (08/28/90)

In article <DL.90Aug25123348@g.g.oswego.edu> dl@g.g.oswego.edu (Doug Lea) writes:
... a description of his proposal for "customization" ...
> 
> Many people apparently believe that there might be a means to get
> similar effects with less impact on the language and current
> compilers.  I remain unconvinced that this is possible. The fact that
> both the SELF and TS (typed smalltalk) designers and implementers use
> variations on customization extensively to obtain their impressive
> performance improvements over standard OO compilation schemes lends
> credence to the notion that customization can play an important
> role in maintaining C++'s usual edge in efficiency compared to other
> OO languages.
> 
It seems to me there's another viable approach to the optimization of leaf
object functions, etc: making the decisions in the linker rather than the
compiler.  The problem with doing this sort of thing in the compiler has
always been the file granularity of compilation in the Unix(tm)/C model of
development, which results in the compiler not being able to guarantee that
it has all the global information it needs to make the decisions.

On the other hand, the linker is in the right place in the pipeline; we
just have to ensure that the information it needs is available.  We also
have to make sure that the linker is smart enough, and the object module
format comprehensive enough, to be able to sew the functions and their
calls together properly.  In principle, a smart enough linker could even
sew an inline version of a virtual function into a place where it can
guarantee the class of the object being invoked.

The compiler and linker techniques are complementary, and could easily be
made symbiotic.  I'll have to read your paper to be able to comment on
how other aspects of customization fit in, but I wouldn't be surprised if
some of the work could be foisted off on the linker.

> However, one viable alternative is to incorporate customization into a
> metalanguage with associated tools comprising a C++ programmers
> workbench, rather than into the language proper. I've been exploring
> this too.
> 
Are you proposing something like Demeter, which provides an object
metalanguage that can be compiled into code skeletons for several different
languages?  Or a language for controlling the tools separately from
specifying the program they'll transform?

---------------------------------------------------------------------------
NOTE: USE THIS ADDRESS TO REPLY, REPLY-TO IN HEADER MAY BE BROKEN!
Bruce Cohen, Computer Research Lab        email: brucec@tekcrl.labs.tek.com
Tektronix Laboratories, Tektronix, Inc.                phone: (503)627-5241
M/S 50-662, P.O. Box 500, Beaverton, OR  97077


glenn@huxley.huxley.bitstream.com (Glenn P. Parker) (08/29/90)

In article <BRUCEC.90Aug27134836@phoebus.phoebus.labs.tek.com> brucec@phoebus.phoebus.labs.tek.com (Bruce Cohen;;50-662;LP=A;) writes:
> It seems to me there's another viable approach to the optimization of leaf
> object functions, etc: making the decisions in the linker rather than the
> compiler.

This is what Eiffel does.  The big problem is the roughly n-squared time
(it may not be quite that bad in practice) it takes to link large
projects.  A recent article in HOOT pointed this out as a serious
drawback, but I agree that the linker is generally the right place for
this kind of work.

--
--------------------------------------------------------------------------
Glenn P. Parker        Bitstream, Inc.
uunet!huxley!glenn     215 First Street
glenn@bitstream.com    Cambridge, MA 02142-1270