burley@gnu.ai.mit.edu (Craig Burley) (05/03/91)
I disagree that "inline" in C/C++ compilers has been oversold... at least, it hasn't in any literature I've read. I agree that arbitrary inlining can be disadvantageous to performance and program size. However, the "marketing" for inline has always been, for me, the ability to compose smaller, semantically correct functions without undue performance implications.

For example: it used to be that if I wrote a name-space manager that happened to keep all the names it knew about on a linked list, any users of that manager might do the following to get the next name:

    name n;   /* "name" is, say, "typedef struct {...} *name;" */
    ...
    n = n->next;

Then I learned (through education and the necessities of maintenance and debugging) to abstract such things away, so the implementation was known only to the name-space manager, not to all its users:

    n = name_next(n);

(Or, better yet, define a nameHandle typedef and let users use that, in case I needed to change name's implementation of lists for optimization or other purposes.)

The obvious way to do this in C is via a macro (in name.h):

    #define name_next(n) ((n)->next)

All the parens are, in the general case, necessary. Now, when I started down this path, I naturally found out (as do most C programmers of library packages of this sort) that macro replacement (string substitution) has its problems. Parenthesizing to avoid surprising precedence problems is one thing, but two other problems come up. The first is multiple side effects when an argument is referenced twice, as in the trivial case of:

    #define some_function(i) ((i) * 3 + (i) / 2)

(If "some_function" is invoked with an expression having a side effect, the side effect happens twice.)
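The double-side-effect problem can be made visible with a small sketch (next_val and the call counter are my own illustration, not from the original post):

```c
#include <assert.h>

/* The macro from the text: each textual use of the argument is
   re-expanded, so a side-effecting actual argument runs twice. */
#define some_function(i) ((i) * 3 + (i) / 2)

/* A helper with a visible side effect: it counts its own calls. */
static int calls = 0;
static int next_val(void)
{
    calls++;
    return 4;
}

/* some_function(next_val()) expands to
       ((next_val()) * 3 + (next_val()) / 2)
   so next_val() is invoked twice: the result is 4*3 + 4/2 = 14,
   and 'calls' goes up by 2, not 1. */
```

Had `some_function` been a real function, `next_val()` would of course have been evaluated exactly once.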
The other, nastier problem (since I wasn't used to it) hit me the other day when I used a macro like this:

    #define symbol_set_value(s,value) ((s) = expression_eval(value))

The compiler barfed, and after an unpleasant amount of mucking around, I discovered that the macro's use of "value" as a dummy conflicted with expression_eval (a macro) referencing a structure member named "value" (a member of the dummy's structure type, for example), causing substitution where it wasn't wanted. Ideally, the "definer" of symbol_set_value shouldn't have to care what goes on in the implementation of expression_eval. (Changing the dummy argument name from "value" to "v", for example, solved the problem, but obviously shouldn't be necessary when looking at the situation as an abstractionist.)

Anyway, the "proper" solution to all this is to use the mechanisms already provided in C to make binding and evaluation happen the way people expect when they use function-invocation form: i.e., use functions. Unfortunately, calling functions for things like evaluating "n->next" given n is almost always (on most architectures) slower AND larger (in program memory size) than doing these things directly.

So the reason for inline in C (as GCC provides it), at least as I've always understood it, is to allow the (usually) more semantically reliable and predictable mechanism of function definition and reference (vs. macro definition and reference) without sacrificing performance, by putting the function definition in the #include file with the "inline" attribute. That is, to promote data abstraction and encapsulation to a higher degree by offering more reliability without sacrificing performance.
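What the name-space example looks like with an inline function in the header instead of a macro might be sketched like this (the struct layout and names are my assumptions; `static inline` is used so the sketch is valid in both GNU C and later C standards):

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical name-space manager type, as in the text. */
typedef struct name_rec {
    struct name_rec *next;
    /* ... other members known only to the manager ... */
} name_rec, *name;

/* Instead of  #define name_next(n) ((n)->next)  -- a static inline
   function gives ordinary argument binding and scoping (no surprise
   substitution, no double evaluation), yet a compiler like GCC can
   still emit the same direct member access as the macro would. */
static inline name name_next(name n)
{
    return n->next;
}
```

The dummy argument `n` here is a real function parameter, so it cannot collide with member names used inside other macros or functions it calls.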
In C++, since the idea is to use such abstraction as much as possible, and things like references are provided so that macros like

    #define set_me(x,v) ((x) = (v))

can be transparently made into functions (which they can't be in ANSI C), that apparently drove the original decision to provide the "inline" directive, so such trivial abstractions can still be compiled into efficient code. (At least I think "inline" first appeared in C++, but I'm not sure.)

There are arguments against inlining that still apply to trivial abstractions like the ones I've shown above:

- Changing them requires recompiling all user modules. (Sure, that's true, but changes to trivial abstractions, at least in my experience, tend to be changes to names and formal parameters at least as often as to implementation, and the former changes require recompilation anyway.)

- Inlining slows down compilation. (Yes, of course abstraction where direct expression entry would suffice will slow down compilation in most cases, but that's a case of having the computer do the kind of work it is supposed to do -- it should "know" what "symbol_next" means and how to do it, rather than the programmer "knowing" to write "n->next" each time and having to change every such reference when the implementation changes.)

- Inlining slows down optimization. (Yes, but the optimization is typically much better for trivial abstractions, since code surrounding these trivial inlines doesn't get broken up by a call possibly requiring dumping/restoring registers and breaking control-flow analysis.)

- Inlining doesn't always do as good a job as interprocedural analysis. (Yes, and interprocedural analysis is orders of magnitude more difficult to implement so that it does at least as well as inlining in most cases... and even then, inlining gives interprocedural analysis a great head start as long as it is used intelligently.)

Inlining naturally has advantages beyond what I've discussed for trivial abstractions.
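The point about references can be seen by trying to turn set_me into a C function: with no reference types, the closest one gets is passing a pointer, which changes the call syntax. A minimal sketch (the `int` specialization is my assumption; the macro itself works for any assignable lvalue):

```c
#include <assert.h>

/* The macro assigns through any lvalue transparently: */
#define SET_ME(x, v) ((x) = (v))

/* A C function can't bind the caller's lvalue directly, so the caller
   must pass a pointer: the call changes from set_me(x, v) to
   set_me(&x, v).  A C++ inline function taking a reference,
   inline void set_me(int &x, int v) { x = v; },
   would keep the original call syntax -- which is why inline slots
   into C++ so naturally. */
static inline void set_me(int *x, int v)
{
    *x = v;
}
```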
For example, in cases where an actual argument to an inline procedure is a constant, the inline function definition might easily be optimized by the compiler to fold the constant; in cases where a formal argument isn't even referenced by the inline function definition, the compiler might decide to avoid wasting time constructing the corresponding actual argument (evaluating the expression, except perhaps for its side effects); and so on.

However, I don't think "inline" is anything close to a GENERAL solution for control over optimization of an application, and I always support the idea of doing before/after performance evaluation when turning on inlining. (Note: "turning on" inlining in a decent compiler like GCC near the end of a large programming project is much less likely to introduce lots of new bugs than "turning on" macroization of trivial abstractions, as has been typical in the past; I've participated in enough of the latter sort of project to know how dangerous it is, which is why I like "inline" so much.)

For example, when it comes to a function definition that isn't "trivial" (a simple get or set of a member value, or evaluation of a simple function) but is still smallish, one doesn't always want to say "inline me always" in the definition so much as "inline this guy here" at the reference, such as inside a tight loop. On the other hand, using naming conventions, #include, and so on, I suppose "inline" could be used to create a mechanism for doing this in a standard way on a given project.

Generally, I wouldn't recommend "inline" for any function that incorporates a loop or a decision without first evaluating exactly how much the inline code differs from the code needed to call a procedure (non-inline) version of the function. (That would be a "nonportable" decision, i.e. one which would need to be reevaluated for every port to a new machine; it is rarely necessary to do so for inlining trivial abstractions.)
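The constant-folding advantage can be sketched as follows (`scale` and `scaled_width` are hypothetical names of my own, chosen only to illustrate the point):

```c
#include <assert.h>

/* A trivial abstraction: one expression, no loops or decisions. */
static inline int scale(int x)
{
    return x * 3 + 1;
}

/* Because scale() is inlined and its argument here is a constant, a
   compiler can fold the whole body down to the constant 43 at compile
   time.  An out-of-line call to the same function cannot be folded
   without interprocedural analysis. */
static int scaled_width(void)
{
    return scale(14);
}
```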
Oh, one other advantage of inline that I'm not sure of, but that makes conceptual sense anyway: to a debugger, an inline function has a much better chance of looking like something useful (something "invokable" at the debugger command line, for example) than a corresponding macro does (since the latter is usually removed by the preprocessor and thus never seen by the debugger).

Hope this rambling helps people who thought inline was great and then weren't so sure. It's great when you remember why it was created (IMHO): for fairly trivial abstractions/functions. To make a lexer run faster by inlining the function that implements it? Naaah. :-)
--
James Craig Burley, Software Craftsperson    burley@gnu.ai.mit.edu
--
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.