[gnu.g++.bug] Derived class bug IS STILL THERE

jj@idris.id.dk (Jesper Joergensen [ris]) (11/30/89)
PREFACE:

    I already mailed this letter on Nov 27 1989 as a reply to a message from
    tiemann@arkesden.Eng.Sun.COM, but due to trouble with our local networks
    I'm not sure whether it has reached you at all. To be sure, I am sending it
    again, this time through "the official channels".

ATTN: Michael Tiemann

    Sorry to have to bother you again, but my problem is still there. You
    mailed me a sample program, which you've probably forgotten everything
    about, so I include it below:

    ======= SAMPLE PROGRAM START =======
    class Base {
    public:
      int b ;
      Base() { b = 2 ; }
      Base(Base& x) { b = x.b ; }
      const Base operator=(const Base& x)
	{ b = - x.b ;
	  return *this ; }
    } ;

    class Dpub : public Base {
    public:
      int d;
      Dpub() { d = 7; }
      Dpub(Base& x) : (x) { d = 1001; }
    } ;
    extern "C" void printf (char *, ...);
    int main()
    {
      Base b ;
      Dpub dpub ;

      b = b ;
      dpub = dpub ;
      b = dpub ;

      printf ("d.b == %d; d.d == %d\n", dpub.b, dpub.d);
      return 0 ;
    }
    ======== SAMPLE PROGRAM END ========

    The program worked as you predicted (even under version 1.35.0), so I first
    thought --- polite as I am --- that I'd made a mistake. However, after
    double-checking everything I found that the problem was still present in
    my original program, so I tried to figure out what the differences between
    the two cases were. I FOUND A GENERAL DIFFERENCE: you have data members in
    the derived class, I HAVEN'T, and if I remove the data member d from the
    derived class in your program the error reappears both in the output
    (dpub.b is 2, not -2, after the assignment) and in the generated code.
    NOTICE that this is independent of whether you use "-O" or not. Just
    include the class:

      class Dpub2 : public Base {
      public:
        Dpub2() { }
        Dpub2(Base& x) : (x) { }
      } ;

    and the extra statements:

      Dpub2 dpub2 ;			/* In declarations */
       :
      dpub2 = dpub2 ;			/* After 2nd assignment */
       :
      printf ("d2.b == %d\n", dpub2.b);	/* After original printf */

    and you'll get the output:

      d.b == -2; d.d == 7
      d2.b == 2

    My guess is that you have an error in some sort of shortcut optimization for
    derived classes without data members, but that is ONLY A GUESS (don't try
    to figure out what you need a derived class without data members for,
    I DO NEED ONE, but that hasn't got anything to do with the actual problem).

    I hope this presentation keeps to the short and precise style that you
    appreciate. I've constructed compilers myself, so I like pursuing such
    problems, and if it can help you the incentive is even greater (that's
    the least I can do, since I don't pay you).


    Concerning the implementation of a "-Wref" option, you mention:

      > ... Adding any kind of feature,
      > be it a language extension or a flag, has the cost that it cannot
      > easily be removed.  I want to keep the compiler's growth rate down to
      > some finite amount.

    You've got an important point there, which I didn't think of in the first
    place (though I should have, since I've been maintaining software myself).
    Thanks for the "do it yourself tip", maybe I'll try it someday, now that I
    know where to start.


    Thanks also for the reference to the C++ Report; I'll try to get a copy.


    You write that my statement about improved code with the "-fforce-mem"
    option is vacuous, and you're right. I was too hasty and inaccurate when
    reporting my observations with this option, so let me clarify:
        1/ Computed temporary addresses were kept in registers instead
           of being hidden within complex addressing modes, which is faster
           when the addresses are used several times.
        2/ If such addresses were used only once they didn't get a register,
           but were hidden within complex addressing modes, which is optimal
           in this case.
        3/ Many tests on memory operands were removed, since loading them into
           registers implicitly sets the condition codes on a VAX.
    These three points were observed in code generated for manipulations of
    linked lists, which I think is some of the most common code of all. As an
    example let me mention the construct:
        if (p->next)
          /* use next pointer */
        else
          /* what a pity ... */
    Normally the "next" field would be tested in memory with an offset reference
    from "p", followed by a branch-on-zero to the else part, after which the
    "next" pointer would be loaded into a register to be used as offset. With
    the "-fforce-mem" option in effect the "next" field was loaded immediately,
    which made the separate test superfluous. I know it is a very small
    optimization, but the construct occurs in many places (in arithmetic ifs as
    well as hidden in logical && constructs), often within loops, which makes
    it more significant. In general I observed only advantages, no
    disadvantages.


    Concerning the VAX, it seems that you're a bit misinformed (no offense
    intended) when you write:

      > There are a number of problems with the VAX.  The general call
      > instruction is very expensive (= very slow).  In fact, I would not be
      > surprised if the microcode did not take the parameters from the memory
      > argument list and shuttle them to the stack before making the call.
      > Also, you have to know how to return from callg.  Currently, the
      > compiler always expects to be returning from a calls.

    Since you've hit an area which I happen to know a lot about (believe
    me, VAX assembler has been my job for three years) I'll try to explain
    why I'm almost certain that you'll be able to improve performance when
    virtual functions are called (you are still advertising for ideas in the
    manual).

    First of all, CALLG is not slower than CALLS, it is faster. I've just tested
    it by running a test loop 1,000,000 times (corrected for the rest of the
    instructions in the loop) calling an empty procedure that saves no registers
    and has no arguments, which should give the pure call overhead. With
    CALLG the procedure call takes approximately 8-9% less time than with CALLS.
    Every time a register has to be saved and restored, this adds an
    amount of time corresponding to the initial difference between the two;
    each pushed and popped argument costs the same for CALLS.

    The initial difference is easily explained, since CALLS has to push the
    number of arguments to complete the argument list on the stack. Notice
    that the procedure body does not differ for a CALLS or CALLG call; the RET
    instruction and the data in the stack frame take care of that (in fact
    both the stack pointer and the argument pointer may be corrupt, since only
    the frame pointer is used).

    CALLG  **doesn't**  push the referenced argument list onto the stack (it
    is easily verified with a debugger) and why the hell should it, it doesn't
    even have to push the number of arguments. CALLG just loads the argument
    pointer with the address of the given argument list after creating the
    common stack frame. Originally this was meant for use in FORTRAN, where
    the argument lists could be laid out in storage by the compiler, but it
    fails to be reentrant so CALLS is needed for languages with recursive
    procedures.

    The CALLG technique, however, applies just as well in the case of chained
    (virtual) function calls where a suitable argument list and count are
    already on the stack. The latter is also reentrant provided that you don't
    modify the argument list **itself** within the procedure, which is the
    greatest sin of all.

    The main problem of course is to incorporate this information in a general
    and simple way into your compiler, which cannot handle it as it is now;
    I know that perfectly well. Your goal is of course generality for all kinds
    of machines, so I don't expect you to deal with this problem for a single
    type of machine. The virtual function problem is interesting and I'd like
    to contribute ideas myself (I work for the government, so I haven't
    got any money to give you), but your advertising in the manual doesn't
    describe the problem clearly, nor does it describe your normal member
    calling standard (I assume that you haven't got the time to write about
    it, but I can eat any shorthand private notation raw, if you have one).


    Finally I'll just remind you to be aware that many performance statements
    about the VAX aren't up to date anymore. Some tests I've seen (especially
    during the RISC vs. CISC debate) dated back to the old 11/7XX days
    when there were still bugs and unoptimized passages in the microcode,
    which has been changed through the years (yes! the old models had a
    writeable microcode store, loaded from a console diskette at boot time).
    I've refuted many of them myself by running them on our upgraded 11/780
    and our new MicroVAXes and workstations.

    Make your own tests and be careful, especially about what DEC and their
    competitors say (don't believe people who stand to make money on it).


    I have a tendency to become very talkative, also in writing; please let me
    know if I'm poking into too many problems that aren't worth it. I just try
    to share the knowledge I have.


	It's late and I'm starting to see pictures of beers on the screen,
	have a nice day

	Jesper Jorgensen (known as 'JJ the famous')

	Research associate
	Department of Computer Science
	Technical University of Denmark
	DK-2800 Lyngby
	DENMARK