jj@idris.id.dk (Jesper Joergensen [ris]) (11/30/89)
PREFACE:
I already mailed this letter on Nov 27, 1989 as a reply to a message from
tiemann@arkesden.Eng.Sun.COM, but due to trouble with our local networks
I'm not sure whether it reached you at all. To be sure, I'm sending it again,
this time through "the official channels".
ATTN: Michael Tiemann
Sorry to have to bother you again, but my problem is still there. You
mailed me a sample program, which you've probably forgotten everything
about, so I include it below:
======= SAMPLE PROGRAM START =======
class Base {
public:
  int b ;
  Base() { b = 2 ; }
  Base(Base& x) { b = x.b ; }
  const Base operator=(const Base& x)
    { b = - x.b ;
      return *this ; }
} ;
class Dpub : public Base {
public:
  int d;
  Dpub() { d = 7; }
  Dpub(Base& x) : (x) { d = 1001; }
} ;
extern "C" void printf (char *, ...);
int main()
{
  Base b ;
  Dpub dpub ;
  b = b ;
  dpub = dpub ;
  b = dpub ;
  printf ("d.b == %d; d.d == %d\n", dpub.b, dpub.d);
  return 0 ;
}
======== SAMPLE PROGRAM END ========
The program worked as you predicted (even under version 1.35.0), so I first
thought (polite as I am) that I'd made a mistake. However, after
double-checking everything I found that the problem was still present in
my original program, so I tried to figure out what the differences between
the two cases were. I FOUND A GENERAL DIFFERENCE: you have data members in
the derived class, I HAVEN'T. When I removed the data member d from the
derived class in your program, the error reappeared both in the output
(dpub.b is 2, not -2, after the assignment) and in the generated code. NOTICE
that this is independent of whether you use "-O" or not. Just include the class:
class Dpub2 : public Base {
public:
  Dpub2() { }
  Dpub2(Base& x) : (x) { }
} ;
and the extra statements:
Dpub2 dpub2 ; /* In declarations */
:
dpub2 = dpub2 ; /* After 2nd assignment */
:
printf ("d2.b == %d\n", dpub2.b); /* After original printf */
and you'll get the output:
d.b == -2; d.d == 7
d2.b == 2
My guess is that you have an error in some sort of shortcut optimization for
derived classes without data members, but that is ONLY A GUESS (don't try
to figure out what you need a derived class without data members for;
I DO NEED ONE, but that hasn't got anything to do with the actual problem).
I hope this presentation keeps to the short and precise style that
you appreciate. I've been constructing compilers myself, so I like pursuing
the problems, and if it can help you the incentive is even greater (that's
the least I can do, since I don't pay you).
Concerning the implementation of a "-Wref" option, you mention:
> ... Adding any kind of feature,
> be it a language extension or a flag, has the cost that it cannot
> easily be removed. I want to keep the compiler's growth rate down to
> some finite amount.
You've got an important point there, which I didn't think of in the first
place (though I should have, since I've been maintaining software myself).
Thanks for the "do it yourself" tip; maybe I'll try it someday, now that I
know where to start.
Thanks also for the reference to the C++ Report; I'll try to get a copy.
You write that my statement about improved code with the "-fforce-mem"
option is vacuous, and you're right. I was too hasty and inaccurate when
reporting my observations with this option. Let me clarify:
1/ Computed temporary addresses were kept in registers instead
of being hidden within complex addressing modes, which is faster
when the addresses are used several times.
2/ If such addresses were used only once they didn't get a register,
but stayed hidden within complex addressing modes, which is optimal
in this case.
3/ Many tests on memory operands were removed, since loading them into
registers implicitly sets the condition codes on a VAX.
These three points were observed in code generated for manipulations of
linked lists, which I think is some of the most common code of all. As an
example let me mention the construct:
if (p->next)
/* use next pointer */
else
/* what a pity ... */
Normally the "next" field would be tested in memory through an offset reference
from "p", followed by a zero-branch to the else part, after which the "next"
pointer would be loaded into a register to be used as an offset. With the
"-fforce-mem" option in effect the "next" field was loaded immediately and
the separate test therefore became superfluous. I know it is a very small
optimization, but the construct occurs in many places (in arithmetic ifs as
well as in hidden logical && constructs), often within loops, which makes it
more significant. In general I observed only advantages, no disadvantages.
Concerning the VAX, it seems that you're a bit misinformed (no offense
intended) when you write:
> There are a number of problems with the VAX. The general call
> instruction is very expensive (= very slow). In fact, I would not be
> surprised if the microcode did not take the parameters from the memory
> argument list and shuttle them to the stack before making the call.
> Also, you have to know how to return from callg. Currently, the
> compiler always expects to be returning from a calls.
Since you've hit an area which I happen to know a lot about (believe
me, VAX assembler has been my job for three years), I'll try to explain
why I'm almost certain that you'll be able to improve performance when
virtual functions are called (you are still advertising for ideas in the
manual).
First of all, CALLG is not slower than CALLS, it is faster. I've just tested
it by running a test loop 1,000,000 times (corrected for the rest of the
instructions in the loop) calling an empty procedure, saving no registers
and having no arguments, which should give the pure call overhead. With
CALLG the procedure call takes approximately 8-9% less time than with CALLS.
Every time a register has to be saved and restored, this adds an
amount of time corresponding to the initial difference between the two;
pushed and popped arguments cost the same for CALLS.
The initial difference is easily explained, since CALLS has to push the
number of arguments to complete the argument list on the stack. Notice
that the procedure body does not differ for a CALLS or a CALLG call; the RET
instruction and the data in the stack frame take care of that (in fact
both the stack pointer and the argument pointer may be corrupt, since only
the frame pointer is used).
CALLG **doesn't** push the referenced argument list onto the stack (this
is easily verified with a debugger), and why the hell should it? It doesn't
even have to push the number of arguments. CALLG just loads the argument
pointer with the address of the given argument list after creating the
common stack frame. Originally this was meant for use in FORTRAN, where
the argument lists could be laid out in storage by the compiler, but such
static lists are not reentrant, so CALLS is needed for languages with
recursive procedures.
The CALLG technique, however, applies just as well in the case of chained
(virtual) function calls, where a suitable argument list and count are
already on the stack. This use is also reentrant, provided that you don't
modify the argument list **itself** within the procedure, which is the
greatest sin of all.
The main problem, of course, is to incorporate this information in a general
and simple way into your compiler, which cannot handle it as it is now;
I know that perfectly well. Your goal is of course generality for all kinds
of machines, so I don't expect you to deal with this problem for a single
type of machine. The virtual function problem is interesting and I'd like
to contribute ideas myself (I work for the government, so I haven't
got any money to give you), but your advertisement in the manual doesn't
describe the problem clearly, nor does it describe your normal member
calling standard (I assume that you haven't got the time to write about
it, but I can eat any private shorthand notation raw, if you have one).
Finally, I'll just remind you that many performance statements
about the VAX aren't up to date anymore. Some tests I've seen (especially
during the RISC vs. CISC debate) dated back to the old 11/7XX days,
when there were still bugs and unoptimized passages in the microcode,
which has been changed through the years (yes! the old models had a
writable microcode store, loaded from a console diskette at boot time).
I've disproved many of them myself by running them on our upgraded 11/780
and our new MicroVAXes and workstations.
Make your own tests and be careful, especially about what DEC and their
competitors say (don't believe people who stand to make money on it).
I have a tendency to become very talkative, also in writing; please let me
know if I'm poking into too many problems that aren't worth it. I'm just
trying to share the knowledge I have.
It's late and I'm starting to see pictures of beers on the screen,
have a nice day
Jesper Jorgensen (known as 'JJ the famous')
Research associate
Department of Computer Science
Technical University of Denmark
DK-2800 Lyngby
DENMARK