jj@idris.id.dk (Jesper Joergensen [ris]) (11/30/89)
PREFACE: I already mailed this letter on Nov 27, 1989, as a reply to a message from tiemann@arkesden.Eng.Sun.COM, but because of trouble with our local networks I'm not sure it reached you at all. To be safe, I'm sending it again, this time through "the official channels".

ATTN: Michael Tiemann

Sorry to bother you again, but my problem is still there. You mailed me a sample program, which you have probably forgotten all about, so I include it below:

======= SAMPLE PROGRAM START =======
class Base {
public:
    int b ;
    Base() { b = 2 ; }
    Base(Base& x) { b = x.b ; }
    const Base operator=(const Base& x) { b = - x.b ; return *this ; }
} ;

class Dpub : public Base {
public:
    int d;
    Dpub() { d = 7; }
    Dpub(Base& x) : (x) { d = 1001; }
} ;

extern "C" void printf (char *, ...);

int main()
{
    Base b ;
    Dpub dpub ;
    b = b ;
    dpub = dpub ;
    b = dpub ;
    printf ("d.b == %d; d.d == %d\n", dpub.b, dpub.d);
    return 0 ;
}
======== SAMPLE PROGRAM END ========

The program worked as you predicted (even under version 1.35.0), so at first I thought, polite as I am, that I had made a mistake. However, after double-checking everything I found that the problem was still present in my original program, so I tried to figure out what the difference between the two cases was.

I FOUND A GENERAL DIFFERENCE: your derived class has data members and mine HASN'T. If I remove the data member d from the derived class in your program, the error reappears, both in the output (dpub.b is 2, not -2, after the assignment) and in the generated code. NOTE that this happens whether or not you use "-O".
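For comparison, the sketch below lets you check on a present-day compiler what the language rules call for. It is an adaptation, not the test program itself. The `: (x)` base initializers and the extra constructors are dropped, and `operator=` returns `Base&` rather than `const Base`. The helper names `self_assign_with_member`, `self_assign_without_member` and `print_results` are made up for illustration. An implicitly-defined copy assignment of a derived class calls the base class's `operator=` whether or not the derived class has data members of its own. Both helpers should therefore return -2, and a result of 2 for the member-less class is the symptom described above.

```cpp
#include <cstdio>

// Base's copy assignment deliberately negates the value, so the result
// shows whether it was actually called.
class Base {
public:
    int b;
    Base() : b(2) {}
    Base(const Base& x) : b(x.b) {}
    Base& operator=(const Base& x) {
        b = -x.b;
        return *this;
    }
};

// Derived class WITH a data member (corresponds to Dpub).
class Dpub : public Base {
public:
    int d;
    Dpub() : d(7) {}
};

// Derived class WITHOUT data members (corresponds to Dpub2). Its
// implicitly-defined copy assignment must still call Base::operator=.
class Dpub2 : public Base {
public:
    Dpub2() {}
};

// Hypothetical helpers: perform the self-assignment from the report and
// return the resulting value of the inherited member b.
int self_assign_with_member() {
    Dpub dpub;
    dpub = dpub;    // implicit Dpub::operator= calls Base::operator=
    return dpub.b;
}

int self_assign_without_member() {
    Dpub2 dpub2;
    dpub2 = dpub2;  // same rule applies, even though Dpub2 adds no members
    return dpub2.b;
}

// Prints both values in the same style as the report.
void print_results() {
    std::printf("d.b == %d\n", self_assign_with_member());
    std::printf("d2.b == %d\n", self_assign_without_member());
}
```

Calling `print_results()` from a `main` of your own should print `d.b == -2` and `d2.b == -2` on a conforming compiler.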
To see it in your program, just add the class:

class Dpub2 : public Base {
public:
    Dpub2() { }
    Dpub2(Base& x) : (x) { }
} ;

and the extra statements:

    Dpub2 dpub2 ;                       /* In declarations */
    :
    dpub2 = dpub2 ;                     /* After 2nd assignment */
    :
    printf ("d2.b == %d\n", dpub2.b);   /* After original printf */

and you'll get the output:

d.b == -2; d.d == 7
d2.b == 2

My guess is that there is an error in some sort of shortcut optimization for derived classes without data members, but that is ONLY A GUESS. (Don't try to figure out what I need a derived class without data members for. I DO NEED ONE, but that has nothing to do with the actual problem.) I hope this report lives up to the short, precise style you appreciate. I've built compilers myself, so I enjoy chasing problems like this, and if it helps you the incentive is even greater (it's the least I can do, since I don't pay you).

Concerning the implementation of a "-Wref" option, you mention:

> ... Adding any kind of feature,
> be it a language extension or a flag, has the cost that it cannot
> easily be removed. I want to keep the compiler's growth rate down to
> some finite amount.

You've got an important point there, which I didn't think of at first (though I should have, since I've maintained software myself). Thanks for the "do-it-yourself" tip. Maybe I'll try it someday, now that I know where to start. Thanks also for the reference to the C++ Report; I'll try to get a copy.

You write that my statement about improved code with the "-fforce-mem" option is vacuous, and you're right. I was too hasty and imprecise in reporting what I observed with this option, so let me clarify:

1/ Computed temporary addresses were kept in registers instead of being hidden within complex addressing modes. This is faster when the addresses are used several times.
2/ If such an address was used only once, it didn't get a register but was folded into a complex addressing mode, which is optimal in that case.
3/ Many tests on memory operands were removed, since loading an operand into a register implicitly sets the condition codes on a VAX.

I observed these three points in code generated for linked-list manipulation, which I think is some of the most common code of all. As an example, take the construct:

    if (p->next)
        /* use next pointer */
    else
        /* what a pity ... */

Normally the "next" field would be tested in memory through an offset from "p", followed by a branch-on-zero to the else part. After that, the "next" pointer would be loaded into a register for use as an offset. With the "-fforce-mem" option in effect, the "next" field was loaded immediately, which made the separate test superfluous. I know it is a very small optimization, but the construct occurs in many places (in arithmetic ifs as well as hidden in logical && constructs), often within loops, which makes it more significant. In general I observed only advantages, no disadvantages.

Concerning the VAX, it seems you're a bit misinformed (no offense intended) when you write:

> There are a number of problems with the VAX. The general call
> instruction is very expensive (= very slow). In fact, I would not be
> surprised if the microcode did not take the parameters from the memory
> argument list and shuttle them to the stack before making the call.
> Also, you have to know how to return from callg. Currently, the
> compiler always expects to be returning from a calls.

You've hit an area I happen to know a lot about (believe me, VAX assembler was my job for three years). I'll try to explain why I'm almost certain you can improve the performance of virtual function calls (you are still asking for ideas on this in the manual).
First of all, CALLG is not slower than CALLS; it is faster. I've just tested it with a loop that runs 1,000,000 times (corrected for the rest of the instructions in the loop). The loop calls an empty procedure that saves no registers and takes no arguments, which should give the pure call overhead. With CALLG it takes approximately 8-9% less time than with CALLS. Each register that has to be saved and restored adds roughly as much time as that initial difference, and each argument pushed and popped costs about the same for CALLS. The initial difference is easily explained: CALLS has to push the argument count to complete the argument list on the stack. Notice that the procedure body does not differ for a CALLS or a CALLG call. The RET instruction and the data in the stack frame take care of that. In fact, both the stack pointer and the argument pointer may be corrupted, since only the frame pointer is used.

CALLG **doesn't** push the referenced argument list onto the stack (this is easily verified with a debugger), and why the hell should it? It doesn't even have to push the argument count. CALLG just loads the argument pointer with the address of the given argument list after creating the common stack frame. Originally this was meant for FORTRAN, where the compiler could lay out argument lists in storage. That approach is not reentrant, so CALLS is needed for languages with recursive procedures. The CALLG technique, however, applies just as well to chained (virtual) function calls where a suitable argument list and count are already on the stack. This use is also reentrant, provided that you don't modify the argument list **itself** within the procedure, which is the greatest sin of all. The main problem, of course, is to incorporate this information into your compiler in a general and simple way. I know perfectly well that it cannot handle this as it is now.
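The sketch below makes the argument-list difference concrete with a deliberately simplified model of the two conventions. Everything in it is an assumption for illustration. `ToyVax`, its members and the two forwarding helpers are hypothetical names. Memory is just a vector of longwords with a downward-growing stack. The call frame, register saving and RET are left out entirely. The model counts only the longwords written to build argument lists. A CALLS-style call writes the arguments plus the count. A CALLG-style call just points AP at a list that already exists, so a chained call can pass its own incoming list along without copying it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model, not real VAX semantics: memory is a vector of
// 32-bit longwords and the stack grows downward from the end. Only the
// building of argument lists is modelled; the call frame is ignored.
struct ToyVax {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(1024, 0u);
    std::size_t sp = 1024;   // stack pointer (index of the top longword)
    std::size_t ap = 0;      // argument pointer (index of the count longword)
    std::size_t writes = 0;  // longwords written to build argument lists

    void push(std::uint32_t v) {
        mem[--sp] = v;
        ++writes;
    }

    // CALLS-style: the arguments are pushed last-to-first, then the count,
    // so AP points at the count with arg1..argN following it.
    void calls(const std::vector<std::uint32_t>& args) {
        for (std::size_t i = args.size(); i-- > 0;) push(args[i]);
        push(static_cast<std::uint32_t>(args.size()));
        ap = sp;
    }

    // CALLG-style: AP is simply loaded with the address of an existing
    // argument list; nothing is pushed.
    void callg(std::size_t arglist) { ap = arglist; }

    std::uint32_t argc() const { return mem[ap]; }
    // 1-based, mirroring 4(AP), 8(AP), ... on the real machine.
    std::uint32_t arg(std::size_t i) const { return mem[ap + i]; }
};

// A caller passes three arguments to a forwarding (virtual) function, which
// hands its own incoming list to the real callee with CALLG. This is only
// safe if neither function writes to the shared list.
std::size_t writes_forwarding_with_callg() {
    ToyVax m;
    m.calls({10, 20, 30});  // outer call: 3 arguments + count
    m.callg(m.ap);          // chained call reuses the same list
    return m.writes;
}

// The same chain, but the forwarding function rebuilds the list with CALLS.
std::size_t writes_forwarding_with_calls() {
    ToyVax m;
    m.calls({10, 20, 30});
    std::vector<std::uint32_t> copy;
    for (std::size_t i = 1; i <= m.argc(); ++i) copy.push_back(m.arg(i));
    m.calls(copy);          // chained call pushes everything again
    return m.writes;
}
```

In this model the CALLG-style chain writes 4 longwords and the CALLS-style chain writes 8. The gap grows with the number of arguments and with the depth of the chain. It says nothing about cycle counts, which only measurements like the loop test above can settle.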
Your goal is of course generality across all kinds of machines, so I don't expect you to deal with this problem for a single type of machine. The virtual function problem is interesting and I'd like to contribute ideas myself (I work for the government, so I haven't got any money to give you). However, the request in your manual doesn't describe the problem clearly, nor does it describe your normal member-function calling convention. I assume you haven't had time to write about it, but I can eat any shorthand private notation raw, if you have one.

Finally, I'll just remind you that many performance statements about the VAX are no longer up to date. Some tests I've seen (especially during the RISC vs. CISC debate) dated back to the old 11/7XX days, when there were still bugs and unoptimized paths in the microcode. That microcode has been changed over the years (yes! the old models had a writable microcode store, loaded from a console diskette at boot time). I've disproved many of these statements myself by running the tests on our upgraded 11/780 and our new MicroVAXes and workstations. Make your own tests and be careful, especially about what DEC and its competitors say (don't believe people who stand to make money on it).

I tend to become very talkative in writing too, so please let me know if I'm poking into too many problems that aren't worth it. I'm just trying to share the knowledge I have.

It's late and I'm starting to see pictures of beers on the screen. Have a nice day.

Jesper Jorgensen (known as 'JJ the famous')
Research associate
Department of Computer Science
Technical University of Denmark
DK-2800 Lyngby
DENMARK