[comp.object] Static typing and OOP efficiency

pcg@cs.aber.ac.uk (Piercarlo Grandi) (02/28/91)

On 22 Feb 91 13:58:57 GMT, chip@tct.uucp (Chip Salzenberg) said:

chip> [ Please direct followups appropriately. ]

chip> But I am not willing to jump on the Objective-C bandwagon, if only
chip> because static typing is a useful way to avoid a great deal of
chip> run-time messaging and debugging overhead.

chip> According to pcg@cs.aber.ac.uk (Piercarlo Grandi):
pcg> I think this is a grave mistake of perspective, at least for application
pcg> programming, for the following reasons:
pcg>
pcg> 1) Objective C does allow you to declare explicitly the type of objects,
pcg> via a cast like notation.

chip> Allow, yes, but not require.

Just like C++, in the opposite direction. We should not be arguing about
which way the default should be; that's pretty irrelevant.

chip> Besides, as far as I know, there is no guarantee that the message
chip> set supported by a given class will not expand after the users of
chip> that class have already been compiled.

Same problem in C++, really, but in C++ it triggers recompilation
instead of just relinking like in Objective C (if you are happy with
dynamic overloading).

The idea in Objective C is that you resolve statically what you can, and
turn to dynamic overloading for what you cannot.

chip> So compiler is not free to reject a message out of hand on the
chip> basis that the recipient will not understand it.

Yeah, instead of rejecting, it just gets resolved dynamically instead of
statically. In C++ you simply don't have this option in the general case.

pcg> 2) In any case even in C++ there is a tendency to declared everything
pcg> virtual, just in case, so you still have dynamic dispatching.

chip> I only make a function virtual if I have an explicit reason for
chip> doing so.

That's what I do too, but I tend to have complete control over the shape
of my programs, because I tend to do monolithic standalone systems
software. The problem is that OO, at the application level, is supposed
to be about more reuse, and therefore libraries.

When you write a library in C++ you tend to make everything virtual that
can plausibly be, just in case. The people at AT&T did not make certain
things virtual in the 'iostream' library and a lot of people are unhappy
about that.

But after all virtual is not that bad; in many cases overloading
resolution can still be done at compiletime (especially if you use
appropriate casts), just like Objective C can.

pcg> 3) Dynamic dispatchin can be made impressively fast, by the use of
pcg> various tricks (hashed tables, type deductive compilers, clever
pcg> linkers, hinting and caching).

chip> There's no way, however, that the dynamic dispatching of
chip> Objective-C can outperform even a virtual C++ member function,
chip> much less a normal member function, assuming that the C++
chip> implementor was in her right mind when she wrote the code
chip> generator.

First, the speed on non dynamically overloaded messages is exactly the
same in C++ and Objective C, and as you note later, this is the
prevalent case.

As to dynamic overloading, C++ has an advantage in that it has manifest
types with bounded runtime overloading, while Objective C has latent
types with unbounded runtime overloading. This means that C++ can use
double indirection into vtbls, where Objective C has to use hash tables
or caching, hinting, and the like.

So Objective C is at a disadvantage here. There are some questions
though to consider:

[1] How large is the extra cost?
[2] How frequently is it incurred?
[3] Does it make a large difference to the overall runtime?

The answers are, according to some scant but reliable evidence:

[1] Possibly around 100-200%, if you are very clever.
[2] Depends on the application, but usually not often.
[3] For most applications, a quite small difference.

I would really like somebody who has got both compilers, e.g. a NeXT
user, to do a table with the relative costs of calling non virtual in
the two languages, and then virtual.

chip> [ ... C++ ... ] permitting most (yes, _most_) member functions to
chip> be non-virtual.

Agreed, that's why it is not so important that Objective C is at as
disadvantage for virtual functions.

Actually C++ has some advantages over Objective C, but these are not
speed advantages related to dynamic overloading. They are that static
overload resolution is done on all arguments to a function and not just
the first, and possibly
multiple inheritance.

Efficiency advantages that are important for systems programming, but
not for most applications, are C++ support for inline functions and
objects. These are IMNHO far more important than the relative speed of
relatively infrequent dynamic overload resolution.

I am not very current on Objective C, so maybe inline functions have
been added as well as inline objects. Hopefully the new Objective C book
will come out eventually.

pcg> On the other hand I have been serious considering writing an OS kernel
pcg> in bytecoded OO scheme, as after all a well designed OS kernel should be
pcg> very rarely invoked and so on...

chip> I hold almost the opposite view: A well-designed kernel is fast enough
chip> so that applications programmers will call it often, not fearing that
chip> the frequent calls will make their application inefficient.  Look at
chip> the periodic brouhaha about using spawn() instead of fork()/exec() for
chip> efficiency, all of which is moot if fork() is implemented efficiently.

Of course, but my kernel architecture does not look (internally)
anything like Unix; the traditional Unix kernel architecture is
acknowledged to advise fairly large granularity of interaction with it,
even if it written in C, but other architectures need not have the same
property.

For example in my kernel architecture fork() is not a kernel primitive,
and neither are the scheduler, memory manager, or drivers, for that
matter.

My kernel architecture just defines capability based communications
channels between user state modules, and these do all the work, and be
written in any language whatsoever. If you have a more conventional
frame of mind, my kernel architecture is just that of a dynamic linker.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

sdm@cs.brown.edu (Scott Meyers) (02/28/91)

In article <PCG.91Feb27190941@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
| First, the speed on non dynamically overloaded messages is exactly the
| same in C++ and Objective C, and as you note later, this is the
| prevalent case.

Do either of you have a reference for this (presumably empirical)
observation?  Frankly, I'm skeptical.  Based purely on a seat-of-the-pants
feeling, my guess would be that the dynamic type of a pointer/reference in
C++ frequently differs from its static type, and that most function calls
are to virtual functions.  Or did I misunderstand the claim?

Scott

-------------------------------------------------------------------------------
What do you say to a convicted felon in Providence?  "Hello, Mr. Mayor."

jbuck@galileo.berkeley.edu (Joe Buck) (03/01/91)

In article <PCG.91Feb27190941@odin.cs.aber.ac.uk>, pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
|> When you write a library in C++ you tend to make everything virtual that
|> can plausibly be, just in case. The people at AT&T did not make certain
|> things virtual in the 'iostream' library and a lot of people are unhappy
|> about that.

Only people who don't understand the design.  They'd be even more unhappy
if the functions were virtual because they'd get function calls per
character of I/O.

streams and iostreams are designed to have the "streambuf" class
overloaded; almost everything you can do by overloading a hypothetical
iostream class with everything virtual you can do by overloading streambuf
instead, and more efficiently.  The important streambuf methods are
virtual, and the stream classes interface to streambuf and derived
classes in an efficient way.

That is, iostream is the way it is to exploit both the speed of inline
nonvirtual function calls and the flexibility of inheritance.  It's far
from perfect, but it is the way it is for good reason.

--
Joe Buck
jbuck@galileo.berkeley.edu	 {uunet,ucbvax}!galileo.berkeley.edu!jbuck

hoelzle@Neon.Stanford.EDU (Urs Hoelzle) (03/01/91)

pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

>pcg> 3) Dynamic dispatchin can be made impressively fast, by the use of
>pcg> various tricks (hashed tables, type deductive compilers, clever
>pcg> linkers, hinting and caching).

>chip> There's no way, however, that the dynamic dispatching of
>chip> Objective-C can outperform even a virtual C++ member function,
>chip> much less a normal member function, assuming that the C++
>chip> implementor was in her right mind when she wrote the code
>chip> generator.

Sure it can.  The fact that x.foo *conceptually* is a dynamically-
dispatched call does not mean that there has to be a dynamic dispatch
*at run-time*.  The keywords here are type "analysis" and "inlining".
If the compiler knows the type of x (e.g. because the previous
statement was "x = new baz") it can statically bind the call, and if
bar is small, it can inline it, thus eliminating the call overhead
entirely (and opening new opportunities for global optimizations).

For example, on a simple nested loop benchmark, the Self compiler
generates code which performs more than two conceptual message sends
*per machine cycle* because it is able to completely eliminate the
dynamically-bound sends.  (In Self, every method send is a "virtual"
in C++ terminology, and there are no type declarations.)  The
techniques used to achieve this are beyond the scope of this posting,
but I can give references if you need them.

This proves that, at least in principle, your assertion is incorrect.
Now, I'm not claiming that dynamically-dispatched calls are always
faster in untyped languages, or even that they are faster on average.
What I am saying is that dynamically-dispatched calls can be faster
than C++ virtual function calls *as they are implemented today*.  C++
could actually benefit from the same optimization techniques that the
Self compiler uses.

-- 
------------------------------------------------------------------------------
Urs Hoelzle                                            hoelzle@cs.stanford.EDU
Center for Integrated Systems, CIS 42, Stanford University, Stanford, CA 94305

andyk@kermit.UUCP (Andy Klapper) (03/01/91)

In article <66645@brunix.UUCP> sdm@cs.brown.edu (Scott Meyers) writes:
>In article <PCG.91Feb27190941@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>| First, the speed on non dynamically overloaded messages is exactly the
>| same in C++ and Objective C, and as you note later, this is the
>| prevalent case.
>
>Do either of you have a reference for this (presumably empirical)
>observation?  Frankly, I'm skeptical.  Based purely on a seat-of-the-pants
>feeling, my guess would be that the dynamic type of a pointer/reference in
>C++ frequently differs from its static type, and that most function calls
>are to virtual functions.  Or did I misunderstand the claim?

The Objective-C uses a table of function pointers to handle what we call
'staticly' bound messages (ie. where the compiler knows the type of the
object and the name of the selector(method) at compile time).  If I understand
how C++ works (and I could well be wrong), this means that Objective-C
staticly bound messages are just as fast as C++ virtual functions.  While
slightly slower than C++'s name mangleing scheme it does allow you to change
any header files you like without haveing to recompile the world.  In
addition I would like to state that it is POSSIBLE to make dynamic
binding (type and/or selector(method) name are UNKNOWN at compile 
time) as fast as a compare and an indirect function call 85% of the
time.  (I hope to get these optimizations added to the Stepstone version
of Objective-C at some point in time.  I really don't know what changes to
the compiler NeXT made or is planning to make).

How much messageing do you guys do relative to the rest of your application ?
Did you know that the function calling sequence used by Pascal is faster AND
smaller than the one used in C ?  Why aren't you using Pascal ?

My much biased point is that concentrating on this one issue is myopic.  The
overall hit to your system is small and it can be made smaller.  If you are
that close to the edge 1) You most likely shouldn't be using Objective-C ,
2) you are most likely spending an additional 40% of your time just trying to
make your application fit/fast enough than if you were not so close to the
edge.

I hope the first paragraph at least was informative.

-- 
The Stepstone Corporation                    Andy Klapper
75 Glen Rd.                                  andyk@stepstone.com
Sandy Hook, CT 06482                         uunet!stpstn!andyk
(203) 426-1875    fax (203)270-0106

chip@tct.uucp (Chip Salzenberg) (03/02/91)

[ Followups to comp.lang.c++ ]

According to sdm@cs.brown.edu (Scott Meyers):
>Based purely on a seat-of-the-pants feeling, my guess would be that the
>dynamic type of a pointer/reference in C++ frequently differs from its
>static type, and that most function calls are to virtual functions.

My code has innumerable non-virtual member functions, most of which
are inherited by derived classes and made public.  Virtual functions
are relatively rare, but what they lack in numbers they make up for in
importance.

I suspect that converted C programmers like me tend to non-virtuals by
default, while converted Objective C and Smalltalk programmers tend to
virtuals by default.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>

pcg@cs.aber.ac.uk (Piercarlo Grandi) (03/02/91)

On 28 Feb 91 23:39:25 GMT, jbuck@galileo.berkeley.edu (Joe Buck) said:

jbuck> In article <PCG.91Feb27190941@odin.cs.aber.ac.uk>,
jbuck> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pcg> When you write a library in C++ you tend to make everything virtual that
pcg> can plausibly be, just in case. The people at AT&T did not make certain
pcg> things virtual in the 'iostream' library and a lot of people are unhappy
pcg> about that.

jbuck> Only people who don't understand the design.  They'd be even more
jbuck> unhappy if the functions were virtual because they'd get function
jbuck> calls per character of I/O.

Let me remark that this is the least of the overhead problems in the
AT&T streambuf. Still, the cost of the virtual call could be obviated if
the functions had been declared virtual inline. In many (most) cases the
overloading could be statically resolved.

jbuck> [ ... ] That is, iostream is the way it is to exploit both the
jbuck> speed of inline nonvirtual function calls and the flexibility of
jbuck> inheritance.  It's far from perfect, but it is the way it is for
jbuck> good reason.

I don't agree with many of the tradeoffs of iostream, and in particular
that some choices have been made with good reason. I must admit though
that IMNHO the particular one about making certain things non virtual is
probably a good choice.

But I was not arguing on this point; my point was that, right or wrong,
the debate about virtual in the stream and iostream classes is one of
the elements that has prompted several people to say "virtual unless you
are damned sure nobody will ever need otherwise".

A bit like "protected unless you are damn sure nobody will ever need
those names". Programmer's sociology, not the technical merits.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

cox@stpstn.UUCP (Brad Cox) (03/04/91)

In article <PCG.91Feb27190941@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>As to dynamic overloading, C++ has an advantage in that it has manifest
>types with bounded runtime overloading, while Objective C has latent
>types with unbounded runtime overloading. This means that C++ can use
>double indirection into vtbls, where Objective C has to use hash tables
>or caching, hinting, and the like.

Objective-C uses a central messager routine. This routine is replaceable
and customizable by any user. Objective-C does not *have* to use hash tables,
caching, etc. These are merely space optimization techniques that can
be changed by any user.

At one point we actually supported three different versions, one that
optimizes space (linear lookup), another speed (precisely the same as
C++'s vtbl idea), and one that puts an upper bound on space and speed.
We finally settled on the latter specifically to overcome the geometrical
growth in space with inheritance depth of the vtbl scheme.

-- 

Brad Cox; cox@stepstone.com; CI$ 71230,647; 203 426 1875
The Stepstone Corporation; 75 Glen Road; Sandy Hook CT 06482

chip@tct.uucp (Chip Salzenberg) (03/04/91)

According to andyk@kermit.UUCP (Andy Klapper):
>Objective-C staticly bound messages are just as fast as C++ virtual functions.

I believe this statement to be correct.

>How much messageing do you guys do relative to the rest of your application?

Given the low cost of "messaging" (calling of member functions) in
C++, I rely on it quite heavily -- certainly more than I would in
Objective C.

>Did you know that the function calling sequence used by Pascal is faster AND
>smaller than the one used in C ?  Why aren't you using Pascal ?

I am disappointed to see such misinformation from a language
technician.  Neither C, nor C++, nor Pascal has _a_ calling sequence.

It is true that typical C implementations have been less efficient
than typical Pascal implementations.  That is true only because
pre-ANSI C compilers were unable to determine whether a called
function was variadic.  ANSI C has made function prototypes mandatory
for variadic functions, and C++ requires prototypes for all functions.
So Pascal, C and C++ compilers are on equal ground.

This point is not simply academic.  Zortech C++, for example, uses the
Microsoft Pascal calling sequence for all non-variadic C++ functions.

Now perhaps we can leave aside piddling issues like calling sequence
and tend to the more significant issues of dynamic and static typing.

>My much biased point is that concentrating on this one issue is myopic.

Concentration on any one issue is myopic, if it is at the permanent
expense of other issues.

dsouza@gwen.cad.mcc.com (Desmond Dsouza) (03/05/91)

In article <11545@pasteur.Berkeley.EDU> jbuck@galileo.berkeley.edu (Joe Buck) writes:

  >   |> When you write a library in C++ you tend to make everything virtual that
  >   |> can plausibly be, just in case. The people at AT&T did not make certain
  >   |> things virtual in the 'iostream' library and a lot of people are unhappy
  >   |> about that.
  >
  >   Only people who don't understand the design.  They'd be even more unhappy
  >   if the functions were virtual because they'd get function calls per
  >   character of I/O.
  >

hoelzle@Neon.Stanford.EDU (Urs Hoelzle) writes:

  > This proves that, at least in principle, your assertion is incorrect.
  > Now, I'm not claiming that dynamically-dispatched calls are always
  > faster in untyped languages, or even that they are faster on average.
  > What I am saying is that dynamically-dispatched calls can be faster
  > than C++ virtual function calls *as they are implemented today*.  C++
  > could actually benefit from the same optimization techniques that the
  > Self compiler uses.

The Self compiler technique of 'customizing' (OOPSLA '89) seems like
it could really eliminate *much* of the overhead of virtual functions.

Doug Lea's paper in Usenix/C++ 90 described it well, if you are interested.

--
Desmond.
--

-------------------------------------------------------------------------------
 Desmond D'Souza, MCC CAD Program | ARPA: dsouza@mcc.com | Phone: [512] 338-3324
 Box 200195, Austin, TX 78720 | UUCP: {uunet,harvard,gatech,pyramid}!cs.utexas.edu!milano!cadillac!dsouza

tma@m5.COM (Tim Atkins) (03/23/91)

In article <PCG.91Feb27190941@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>
>pcg> 3) Dynamic dispatchin can be made impressively fast, by the use of
>pcg> various tricks (hashed tables, type deductive compilers, clever
>pcg> linkers, hinting and caching).
>
>chip> There's no way, however, that the dynamic dispatching of
>chip> Objective-C can outperform even a virtual C++ member function,
>chip> much less a normal member function, assuming that the C++
>chip> implementor was in her right mind when she wrote the code
>chip> generator.
>
>As to dynamic overloading, C++ has an advantage in that it has manifest
>types with bounded runtime overloading, while Objective C has latent
>types with unbounded runtime overloading. This means that C++ can use
>double indirection into vtbls, where Objective C has to use hash tables
>or caching, hinting, and the like.
>

At my last company I spent a bit of time examining and working on improved
runtime dynamic message binding schemes.  My final dispatcher for Objective
C does a full run-time dynamic bind with only ~20% more time overhead
than C++ uses for its limited form of dynamic binding with known type
family and relative position of implementation info.  The principal change
to the scheme used by Objective C and Smalltalk was to invert the conceptual
lookup table so that the primary index is by message (selector) instead
of by class.   Both classes and selectors are encoded as relative indices
and the actual class/implementation list is further indexed by an
augmented bit-vector with on-bits signifying that the corresponding class
implements the message.    Therefore the principal space cost is the size
of the bit vector.  At a modest decrease in efficiency the bit vector may
be trimmed to only the set of classes that implement the message.  In space
this dispatch mechanism is smaller than that of either Objective C or 
C++.  The mechanism also does not preclude Multiple Inheritance as the standard
Objective C mechanism does.  

Of course the binding time is significantly improved in the presence of compile
time type hints.  The dispatch time is constant unlike in the traditional 
scheme and is twice as fast for a full dispatch as the speed acheived for
a cache hit in the original Objective C dispatcher!

One thing that seems to be forgotten in such comparisons is that the comparison
should not be the ratio of d(A) to d(B) where d is dispatch speed and A and
B are languages but should instead be the ration of (f + d(A)) to (f + d(B))
where f is the time taken for the average method execution.  This of course
assumes that f is comparable in both languages which is the case when 
comparing Objective C and C++ for methods that do normal C-ish things.

- Tim Atkins