timh@.mentorg.com (Tim Harvey @ APD x1381) (01/12/91)
Peter Shirley posted the article (# 10640) "why is this program slow?" I found the topic he brought up (C++ versus C performance) and the discussion that followed useful to some work I'm doing and I wrote up the following summary. I'm posting it for the benefit of any one else who might be interested this topic. Of course, this summary isn't the "final word", so I hope people will post corrections, clarifications, and comments. Credit is due to those people upon whose contributions this summary is based: Peter Shirley, Griff Smith, Steve Simmons, Stephen Clamage, Steve Madere, Paul Vaughan, and Bruce Hoult. THE ISSUE --------- C++ is often charged with being an inherently slower language than C. A simple case was developed where a C++ program took almost twice as long (about 170%) to run as a similar C program. It illustrates one situation where C++ appears slower than C. The example involves nearly "equivalent" matrix addition C and C++ programs. Matrix B is repeatly added to matrix A in a FOR loop. EXAMPLE ------- /******************************************/ /* C Example */ /******************************************/ #include <stdio.h> main() { unsigned long i ; double a[ 3 ], b[ 3 ] ; a[ 0 ] = a[ 1 ] = a[ 2 ] = 0.0 ; b[ 0 ] = 1.0 ; b[ 1 ] = 2.0 ; b[ 2 ] = 3.0 ; for ( i = 0 ; i < 1000000 ; i++ ) { a[ 0 ] = a[ 0 ] + b[ 0 ] ; a[ 1 ] = a[ 1 ] + b[ 1 ] ; a[ 2 ] = a[ 2 ] + b[ 2 ] ; } fprintf( stderr, "%lf %lf %lf\n", a[ 0 ], a[ 1 ], a[ 2 ] ) ; } /******************************************/ /* C++ Example */ /******************************************/ #include <math.h> #include <stream.h> class Vector { friend ostream& operator<<( ostream&, Vector& ) ; friend Vector operator+ ( Vector&, Vector& ) ; protected: double data[ 3 ] ; public: Vector() ; Vector( double, double, double ) ; void operator=( Vector& ) ; }; inline Vector::Vector() { } inline Vector::Vector( double a, double b, double c ) { data[ 0 ] = a ; data[ 1 ] = b ; data[ 2 ] = c ; } inline void Vector::operator=( Vector& v ) { data[ 0 ] = v.data[ 0 ] ; data[ 1 ] = v.data[ 1 ] ; data[ 2 ] = v.data[ 2 ] ; } inline Vector& operator+( Vector& u, Vector& v ) { static Vector temp ; temp.data[ 0 ] = u.data[ 0 ] + v.data[ 0 ] ; temp.data[ 1 ] = u.data[ 1 ] + v.data[ 1 ] ; temp.data[ 2 ] = u.data[ 2 ] + v.data[ 2 ] ; return temp ; } ostream& operator<<( ostream& s, vector& t ) { return s << form( "%f %f %f", t.data[ 0 ], t.data[ 1 ], t.data[ 2 ] ) ; } main() { Vector a( 0.0, 0.0, 0.0 ) ; Vector b( 1.0, 2.0, 3.0 ) ; for ( int i = 0 ; i < 1000000 ; i++ ) { a = a + b ; } cout << a << endl ; } REASON FOR SLOWER PERFORMANCE ----------------------------- The C program is written in terms of basic operations that the compiler can easily optimize while the C++ version isn't. The C++ version ends up having to construct and copy a temporary class Vector object. This extra work accounts for the difference in run time. When the form a = a + b ; is used, since `a' is a structure, the `+' operator first has to store the sum in hidden (contained in the emitted cfront C code), temporary matrix storage, then the `=' operator must copy the contents of the temporary storage into `a'. This takes a string move at best, and a slow function call at worst. Another way of looking at it is to make the C program truely equivalent to the C++ version. The C program segment would then look something like the following to do what the C++ version actually does. for ( i = 0 ; i < 1000000 ; i++ ) { temp = calloc( 3, sizeof( *temp ) ) ; temp[ 0 ] = a[ 0 ] + b[ 0 ] ; temp[ 1 ] = a[ 1 ] + b[ 1 ] ; temp[ 2 ] = a[ 2 ] + b[ 2 ] ; a[ 0 ] = temp[ 0 ] ; a[ 1 ] = temp[ 1 ] ; a[ 2 ] = temp[ 2 ] ; free( temp ) ; } DISCUSSION ---------- The problem is that cfront isn't smart enough to construct the vector sum directly in the target variable 'a' instead of creating an intermediary temporary object. Given the two addition/assignment forms 1. a[ 0 ] = a[ 0 ] + b[ 0 ] ; 2. a[ 0 ] += b[ 0 ] ; modern compilers should identify the two cases as equivalent and generate the most efficient code, but cfront is not so fortunate. It treats both forms as different. In the first case, cfront will generates code that copies the result of the '+' operator twice on the way to its final destination. The first time a bitwise copy (the default copy constructor) is used, and the second time the defined '=' operator is used. The problem goes beyond cfront not being able to recognize equivalent forms. The sort of C code cfront generates can actually interfer with the downstream C compiler's ability of perform optimization. For example, something like int a = 0 ; for ( int i = 0 ; i < 1000000 ; i++ ) { a = a + 1 ; } when written in C is easily optimized by most C compilers to: a = 1000000 ; But the same operation can be written in C++ so that C code generated by cfront will "hide" the optimization opportunity and leave the downstream C compiler with no other choice but to generate unoptimized literal code for a lengthy loop construction. Eventually, the code optimization restrictions and problems of cfront will be resolved by one or more of the following: o cfront is modified to generate optimized C code, o cfront is changed to generate C code in a form that can be easily optimized by existing C compilers, o C compilers are developed that are better tuned to the the C code that cfront presently generates. o reliable, fully optimized C++ native code comilers are developed. WORKAROUND ---------- One way to work around the problem is to define a '+=' operator for the Vector class. There are many ways to do this. Perhaps the simplest is to define and use the '+=' operator as a member in the following way: class Vector { ... public: ... void operator+=( Vector& ) ; }; void Vector::operator+=( Vector& v ) { data[ 0 ] += v.data[ 0 ] ; data[ 1 ] += v.data[ 1 ] ; data[ 2 ] += v.data[ 2 ] ; } main() { ... for ( int i = 0 ; i < 1000000 ; i++ ) { a += b ; ... }
keffert@jacobs.CS.ORST.EDU (Thomas Keffer) (01/13/91)
In article <1991Jan12.040453.6887@mentor.com> timh@.mentorg.com (Tim Harvey @ APD x1381) writes: >Peter Shirley posted the article (# 10640) "why is this program slow?" > >C++ is often charged with being an inherently slower language than C. > >A simple case was developed where a C++ program took almost twice as >long (about 170%) to run as a similar C program. It illustrates one >situation where C++ appears slower than C. Now hold on! We have done extensive testing of our C++ Math.h++ class library versus both C and Fortran and found only a very slight penalty that can very easily be recovered by using the more sophisticated algorithms made possible by the encapsulation, etc. properties of C++. Here's some hard data. The times shown are the times required to multiply two matrices together of the given size. The machine is a Sun 3/260 with a 68881 coprocessor. N**2 g++ -m68881 f77 -f68881 25 4.7s 3.9 100 16.4 15.5 400 64.2 62.1 You might want to implement a special "small matrix" class if you are going to be working with very small N (2, 3 or 4) matrices (e.g., graphics transformation matricies). But for larger N (5 and up), the time differences are completely trivial. The trick is careful attention to detail: use reference counting, minimize memory allocations, maintain a pool of the small (reference) structures, etc. -tk Disclaimer: I work for a company that vends a C++ math class. ___ Thomas Keffer Rogue Wave PO Box 2328 Corvallis, OR 97330 (503) 745-5908
mat@mole-end.UUCP (Mark A Terribile) (01/14/91)
> ... Of course, this summary isn't the "final word", so I hope people will > post corrections, clarifications, and comments. > THE ISSUE > --------- > A simple case was developed where a C++ program took almost twice as > long (about 170%) to run as a similar C program. It illustrates one > situation where C++ appears slower than C. > REASON FOR SLOWER PERFORMANCE > ----------------------------- > The C program is written in terms of basic operations that the compiler > can easily optimize while the C++ version isn't. The C++ version ends up > having to construct and copy a temporary class Vector object. This extra > work accounts for the difference in run time. > DISCUSSION > ---------- > The problem is that cfront isn't smart enough to construct the vector sum > directly in the target variable 'a' instead of creating an intermediary > temporary object. Well, I'm not so sure of that. There is a guarantee that the temporary objects WILL be created so that the side effects of constructors will occur-- EXCEPT in a very few circumstances involving a copy constructor used as the second constructor in an initialization. The problem, as coded, explicitly excludes any of these circumstances. (It's not that difficult to change so that one of the circumstances occurs, as we have seen.) In theory, a `sufficiently smart' compiler could determine that there are no side effects and computation can proceed without creating the temporary, and since the function is expanded inline, this would be one of the easier cases for the compiler to detect. (Without the inlining, the optimization and code generation would have to take place AFTER symbol resolution, that is, in the middle of the linking process.) > 1. a[ 0 ] = a[ 0 ] + b[ 0 ] ; > 2. a[ 0 ] += b[ 0 ] ; > > modern compilers should identify the two cases as equivalent and generate the > most efficient code, but cfront is not so fortunate. ... Modern compilers which have semantics for the type built-in, yes. For C++, no. Perhaps it will one day be possible to specify more about the operator semantics, but for a language like C++, I suspect that awaits some theory of programming language semantics. > The problem goes beyond cfront not being able to recognize equivalent forms. > The sort of C code cfront generates can actually interfer with the > downstream C compiler's ability of perform optimization. ... C++ asks more of a compiler, and for the near future, that will mean that certain things that can be optimized in C-like code cannot be optimized when they cross the boundaries provided by C++'s ways of organizing programs. If the low-level performance of a type is important, then the type must be designed and written carefully. It's true that we've come to rely a lot on optimizing compilers and less on how we code, but we can't rely on compilers to optimize across separate compilation boundaries, nor to replace weak expensive algorithms with good ones (even if code hoisting and strength reduction can sometimes have that effect). We have to seek to reduce wasted motion in more-or-less machine independent ways. Such ways exist (e.g. the value-building constructor in the return statement). > Eventually, the code optimization restrictions and problems of cfront > will be resolved by one or more of the following: I would say rather `ameliorated,' since it's not clear to me that a compiler, given seperate code generation, can analyze away all the extra motion that C++'s model seems (in the absence of optimization) to imply. > o cfront is modified to generate optimized C code, > > o cfront is changed to generate C code in a form that can be > easily optimized by existing C compilers, > > o C compilers are developed that are better tuned to the the > C code that cfront presently generates. > > o reliable, fully optimized C++ native code comilers are > developed. I believe that another step is needed: code generation and optimization AFTER symbol resolution. Unfortunately, it will be years before we see this, even if one language or machine vendor proves conclusively its worth. > WORKAROUND > ---------- > > One way to work around the problem is to define a '+=' operator for the > Vector class. ... Another is to reduce by at least half the wasted motion in the operator+() , by using the constructor value builder. With inlines, this may clue the compiler that all the temporaries can be eliminated. Mark T. !FunkyStuff -- (This man's opinions are his own.) From mole-end Mark Terribile
don@usna.navy.mil (Mr. Donald W. Garner (CADIG STAFF) <don@usna.navy.mil>) (01/17/91)
... it was noted sometime ago that the use of stream.h in a C++ program can make it larger and slower than a C program that performs the same functions without streams ...
jimad@microsoft.UUCP (Jim ADCOCK) (01/17/91)
In article <470@mole-end.UUCP> mat@mole-end.UUCP (Mark A Terribile) writes: |> The problem is that cfront isn't smart enough to construct the vector sum |> directly in the target variable 'a' instead of creating an intermediary |> temporary object. | |Well, I'm not so sure of that. There is a guarantee that the temporary |objects WILL be created so that the side effects of constructors will occur-- |EXCEPT in a very few circumstances involving a copy constructor used as the |second constructor in an initialization. Ack! I disagree that any such guarantee exists -- I claim that contrar-wise ARM goes out of its way to state that any and all such uses of temporaries is implementation dependent, and that the compiler does indeed have latitude to generate or not generate temporaries as it so chooses. The commentary on Page 299 of ARM states: "The fundamental rule is that the introduction of a temporary object and the calls of its constructor/destructor pair may be eliminated if the only way the user can detect its elimination or introduction is by observing side effects generated by the constructor or destructor calls...." This "fundamental rule" unfortunately, does nothing to clarify where a compiler _ought_ to be generating a temporary, that it then might possibly be "optimized away" using the above "fundamental rule." One thing [about the only thing] you can be sure of is that if you initialize a reference with a temporary, the temporary exists for the duration of the scope it is created in [but I can't find the pertinent chapter and verse in ARM today] For a really interesting and related issue, see the Dec. "Journal of C Language Translation" regards the issue of whether [C] functions must return values by copying. Clearly, in C++ return-by-reference requires that answers be returned without copying -- but, where is it stated that non-return-by-reference functions must return their "value" by making a copy of that value? ....Kind of makes me wonder how, if at all, a C++ programmer can certainly force a copy [other than introducing a named variable to represent that copy, and is even that guaranteed to work?] and how in general C++ programmers are suppose to control object identity -- when they're doing OOP, that is. ???
jimad@microsoft.UUCP (Jim ADCOCK) (01/26/91)
In article <471@mole-end.UUCP| mat@mole-end.UUCP (Mark A Terribile) writes: |In article <70067@microsoft.UUCP>, jimad@microsoft.UUCP (Jim ADCOCK) writes: |> |... There is a guarantee that the temporary objects WILL be created so that |> |the side effects of constructors will occur--EXCEPT in a very few circum- |> |tances involving a copy constructor ... the second ... in an initialization. | |> Ack! I disagree that any such guarantee exists -- I claim that contrar-wise |> ARM goes out of its way to state that any and all such uses of temporaries is |> implementation dependent, ... | |If you go back to the original problem, you'll see that the `temporaries' |in questions were not compiler-generated, but were written by the programmer. |(Perhaps I should have written `intermediate' instead of `temporary'?) And |there is a place where the compiler states that an EXCPLICITLY WRITTEN |temporary/intermediate may, in effect, be eliminated: | | X f( . . . ) | { | . . . | return X( xarg1, xarg2 ); | } | |Here the compiler, instead of creating a local X to evaluate the value |builder, and then using it to initialize the returned value, may use the |constructor value builder directly to initialize the returned value. Agreed. A named local variable is not a temporary. Temporaries are unnamed objects. See ARM page 22: "A NAMED LOCAL OBJECT may not be destroyed before the end of its block nor may a local named object of a class with a constructor or a destructor with side effects be eliminated even if it appears to be unused." [emphasis mine] Contrarwise, whether an UNNAMED TEMPORARY is actually created, when either "explicitly" or "implicitly" called for by a programmer is strictly implementation dependent. UNNAMED TEMPORARIES can be "optimized away" even if their constructors or destructors have side effects. The basic rule is that if you want to ensure that a copy is being made, you must explicitly give that copy a name. This has some interesting consequences. Consider what the following function does: Foo do_I_make_a_copy(Foo* foo) { return *foo; } ??? Interestingly, if a temporary is given a name by assigning a reference to it, then the rules change and it essentially becomes a named object, and is guaranteed to stay alive for the scope of the reference. So, it appears to be the named'ness of an object that is the determining factor in whether a compiler can optimize it away [and therefor its constructor/destructor side effects] Another unsolved question: Can a compiler optimize-away a named object that *doesn't* have a constructor nor destructor with side effects? This is still an interesting question in dealing with object identity.