jon@hanauma (Jon Claerbout) (07/26/89)
ABSTRACT: C++ can be customized to look like Fortran in its array subscription, but it has an advantage over Fortran that variable dimensioned arrays can be dynamically allocated. A benchmark shows C++ about half as fast on a DEC 3100. ANALYSIS: fortran C real x(n,m) x = alloc( n*m* sizeof(float)) x(i,j) = etc x[ i + index[j] ] = etc COMMENT: This is a great example for C++ textbook writers. COMPILER: AT&T C++ version 1.2.1 DEMONSTRATION BELOW ------------------------------------------------- /* Benchmark Fortran equivalent (actually ratfor) real total, x(200,200) integer i,j,n n = 200 total = 0. do i=1,n do j=1,n x(i,j) = sqrt(1.+i+j) do i=1,n do j=1,n do k=1,n total = total + x(i,k) * x(k,j) write(6,10) total 10 format(' 1576677724. if double precision '/f15.0,' here in single') call exit(0); end */ //------------------------------------ C++ user program equivalent to Fortran #include <CC/stdio.h> main () { double sqrt(float); int i, j, k, n=200; float total=0.; Matrix x(n,n); for( i=0; i<n; i++ ) for( j=0; j<n; j++ ) x(i,j) = sqrt(1.+(i+1)+(j+1)); for( i=0; i<n; i++ ) for( j=0; j<n; j++ ) for( k=0; k<n; k++ ) total = total + x(i,k) * x(k,j); printf("total=%f\n 1576677724. if double precision\n",total ); } // 7.0sec 1575556480. DEC 3100 (f77 -O3) // 13.1sec 1575559552 DEC 3100 (C++ -O2) // //--------------- C++ header to handle arrays like fortran. class Matrix { // by dave nichols private: int *index; float *data; public: Matrix( int, int); inline float& operator()( int i, int j ) { return ( data[i + index[j] ] ); } ~Matrix(); }; Matrix::Matrix( int n1, int n2) { index = new int[n2]; for( int i =0; i<n2; i++ ) { index[i] = i*n1; } data = new float[n1*n2]; } Matrix::~Matrix() { delete index; delete data; }
grunwald@flute.cs.uiuc.edu (Dirk Grunwald) (07/26/89)
I modified your example slightly and tried it on a Sun-3/60 w/68881 running SunOS 4.0.3, using g++ 1.35.1+ I got the following results: Floating Point Compiled by: Time -------------------------------------------------------------------------- real f77 -O 84.7 real f77 -O4 85.4 double precision f77 -O4 88.1 float g++ -O -fstrength-reduce 85.5 double g++ -O -fstrength-reduce 95.9 double, global vars g++ -O -fstrength-reduce 95.3 double, no dope vector g++ -O -fstrength-reduce 97.3 Conclusions: + I can't wait to get a DEC-3100 + G++ isn't so bad for numeric codes + G++/Gcc needs better optimization for `doubles' + G++ needs to improve optimization for induction variables that are class member functions [posted to gnu.g++.bug] Changes to the code & why: + g++ doesn't grok inlines if they're in the class defn, so I pulled it out + G++ has dynamic local vectors, so you don't need the mallocs, but you do need a macro [curse the language]. I think this saves MUCHO time. You could pull a similar trick using ``alloca'' on a vanilla-flavoured C++ backend. You could also make the variables be global, since FORTRAN does that (no saving in of itself) + used inline math library (<math.h> includes <math-68881.h>) //--------------- C++ header to handle arrays like fortran. // by dave nichols, hack by d. grunwald class Matrix { private: int *index; double *data; public: Matrix( int n1, int n2, int *indx, double *dat); double& operator()( int i, int j ); }; static inline double& Matrix::operator()( int i, int j ) { return ( data[i + index[j] ] ); } static inline Matrix::Matrix( int n1, int n2, int *indx, double *dat) { index = indx; data = dat; for( int i =0; i<n2; i++ ) { index[i] = i*n1; } } #include <stdio.h> #include <math.h> #include <generic.h> #define MATRIX(name,n1,n2)\ int name2(name,index)[n1];\ double name2(name,data)[n1*n2]; \ Matrix name(n1, n2, name2(name,index), name2(name,data)); main () { int i, j, k; double total=0.; const int n=200; MATRIX(x,n,n); for( i=0; i<n; i++ ) for( j=0; j<n; j++ ) x(i,j) = sqrt(1.+(i+1)+(j+1)); for( i=0; i<n; i++ ) for( j=0; j<n; j++ ) for( k=0; k<n; k++ ) total = total + x(i,k) * x(k,j); printf("total=%f\n 1576677724. if double precision\n", (double) total ); } -- Dirk Grunwald -- Univ. of Illinois (grunwald@flute.cs.uiuc.edu)
paul@gill.UUCP (Paul Nordstrom) (07/29/89)
In article <3967@portia.Stanford.EDU>, jon@hanauma (Jon Claerbout) writes: > > ABSTRACT: C++ can be customized to look like Fortran in its > array subscription, but it has an advantage > over Fortran that variable dimensioned arrays > can be dynamically allocated. > A benchmark shows C++ about half as fast on a DEC 3100. > > ANALYSIS: fortran C > real x(n,m) x = alloc( n*m* sizeof(float)) > x(i,j) = etc x[ i + index[j] ] = etc > > COMMENT: This is a great example for C++ textbook writers. > > COMPILER: AT&T C++ version 1.2.1 > > DEMONSTRATION BELOW ------------------------------------------------- > /* Benchmark Fortran equivalent (actually ratfor) code went here >// 7.0sec 1575556480. DEC 3100 (f77 -O3) >// 13.1sec 1575559552 DEC 3100 (C++ -O2) >// On a sun 3/60 w/ 12 MB and Oregon c++ 1.2Ag /usr/local/paul % time bench total=1576677723.402206 1576677724. if double precision 108.7u 0.3s 1:52 96% 0+12k 5+1io 4pf+0w /usr/local/paul % !! time bench total=1576677723.402206 1576677724. if double precision 108.8u 0.2s 1:50 98% 0+12k 0+0io 0pf+0w One thing bothers me though... my results are so accurate I think that the compiler is using double precision, even though the program specifies float. Havn't found anything relevant in the manual. Obviously that would explain some of the difference in the timing. I don't think that that is the the entire difference, though :-). -- Paul Nordstrom Gill & Co., L.P. uunet!gill!paul
schwartz@shire.cs.psu.edu (Scott Schwartz) (07/30/89)
In article <3967@portia.Stanford.EDU>, jon@hanauma (Jon Claerbout) writes: |ABSTRACT: C++ can be customized to look like Fortran in its | array subscription, but it has an advantage | over Fortran that variable dimensioned arrays | can be dynamically allocated. | A benchmark shows C++ about half as fast on a DEC 3100. | |ANALYSIS: fortran C | real x(n,m) x = alloc( n*m* sizeof(float)) | x(i,j) = etc x[ i + index[j] ] = etc | |COMMENT: This is a great example for C++ textbook writers. | |COMPILER: AT&T C++ version 1.2.1 | |DEMONSTRATION BELOW ------------------------------------------------- |// 7.0sec 1575556480. DEC 3100 (f77 -O3) |// 13.1sec 1575559552 DEC 3100 (C++ -O2) Results for a Sun4/280: # f77 -O4 -R x.r /usr/lib/libm.il time xr 1576677724. if double precision 1575556480. here in single 7.4u 0.1s 0:11 64% 0+432k 0+0io 0pf+0w # g++ -O -fno-defer-pop -fstrength-reduce x.cc -lm time xc total=1575556480.000000 1576677724. if double precision 19.3u 0.1s 0:34 56% 0+208k 0+0io 0pf+0w Using Dirk Grunwald's modifications: time yc total=1576677723.737338 1576677724. if double precision 24.6u 0.1s 0:27 90% 0+360k 0+0io 0pf+0w -- Scott Schwartz <schwartz@shire.cs.psu.edu>
grunwald@flute.cs.uiuc.edu (Dirk Grunwald) (07/30/89)
Is there a math inline library for the SPARC? The SQRT computation is a fairly heafty part of the entire computation. On a Sun-3, the math-inline library helps a good deal (that's what the .il library is doing). From what I saw on the Sun-3, f77 -O4 appears to force the inlines (I didn't specify them, but calls to fsqrtx were generated anyway). It's hard to believe that using stacked arrays would make it *slower* that a version using malloc; did my modified version eliminate the dope vectors? I do notice that it used doubles, not floats. -- Dirk Grunwald -- Univ. of Illinois (grunwald@flute.cs.uiuc.edu)
dl@cmx.npac.syr.edu (Doug Lea) (08/05/89)
There is a thorny problem in attempting to write C++ matrix code (among similar sorts of applications) that can be compiled, via an optimizing compiler like g++, to run as fast as well-optimized Fortran code. Instead of using the example of Dave Nichols & Dirk Grunwald, I'll use a simpler example, that makes this clearer. In fact, the example can be written using only C constructs, even though it is definitely a C++-ish problem. struct imatrix /* a matrix of integers */ { int rows; /* number of rows */ int cols; /* number of columns */ int* data; /* pointer to rows*cols ints from freestore (or wherever) */ }; void fill(struct imatrix* mp, int x) { int i, j; for (i = 0; i < mp->rows; ++i) for (j = 0; j < mp->cols; ++j) *(mp->data + i * mp->cols + j) = x; /* traverse in row-major order */ } [In C++, you'd probably write this as something like class Imatrix { private: int rows; int cols; int* data; public: Imatrix(int r, int c) :rows(r), cols(c), data(new int[r * c]) {} int& operator() (int i, int j) { return *(data + i * cols + j); } void fill(int x) { for (int i = 0; i < rows; ++i) for (int j = 0; j < cols; ++j) (*this)(i, j) = x; } }; but nothing I'll say depends on these differences.] Fill() is an example of a function that cannot be optimized as well in C or C++ as it could in other languages because the compiler cannot safely assume that memory accesses through mp->data do not affect mp->rows or mp->cols. Thus, neither mp->rows nor mp->cols may be cached in a register, and very few other optimizations are possible. This is easy to see by comparing the code generated for fill with that for a similar fill2. void fill2(int rows, int cols, int* data, int x) { int i, j; for (i = 0; i < rows; ++i) for (j = 0; j < cols; ++j) *(data + i * cols + j) = x; } Using g++ (or gcc) (-O -fstrength-reduce), on a 1000 X 1000 imatrix, the first fill is FIFTY TO EIGHT-HUNDRED PERCENT SLOWER than fill2, across a few different machines and versions of g++/gcc. The problem ought to show up most dramatically on RISC machines, where memory access requires significant load delays over register access. Without register caching, every single matrix reference must first retrieve `mp->cols' from memory. Worse, no strength-reducing optimizations are possible that would replace the i * mp->cols multiply by simple address increments each time through the loop, as may be done (and is done, in most recent versions of g++, on most machines) with fill2. Thus, programming this kind of function in an `object-oriented' manner results in an unacceptable performance hit. (Of course, there is no reason to traverse in row-major order in `fill', making it a poor example in this sense, but the principle holds across more realistic situations. The differences are slightly less dramatic in the nichols/grunwald example because the time-consuming `sqrt' calculation masks these effects to some extent on many machines.) This is a very real, and very frustrating problem in g++. It is not really a `problem' in AT&T cfront, since cfront generates C code; thus nearly all optimization is in the hands of the backend C compiler, and fewer such C++ (non-C) language issues impact the resulting code efficiency. Nevertheless, I see the problem as a C++ issue, even though it also occurs in C, because in C, you can always program things along the lines of `fill2', while in C++, you cannot do this if you want to simultaneously take advantages of the benefits of object-oriented programming features. Optimizers cannot pull the kinds of invariants needed to produce good code out of thin air. There is nothing about the imatrix declaration that allows a compiler to safely assume that `data' does not point to the `rows' or `cols' fields. In fact, in the g++/gcc optimizer, even if `data' is of type `double*', as would be the case in floating point matrices, the optimizer still isn't sure that `data' doesn't point to these fields, since a programmer could have applied a coercion (i.e., mp->data = (double*)(mp->rows)). There *may* be one way out of this using existing C++ constructs. If the `rows' and `cols' fields were declared to be `const', struct imatrix { const int rows, const int cols, int* data; }; then the optimizer might be allowed to assume that access through `data' couldn't affect them. However, even this is possibly unsafe, since a programmer could still have legally applied a coercion (i.e. mp->data = (int*)(mp->rows)). I cannot find anything officially stated in the C++ reference manual that would enable a compiler to assume that const fields are never altered via this kind of `side-channel' access. Moreover, even if there were, using const fields in order to guide optimization would preclude ever resizing the matrix. In fact, this is probably the most important kind of drawback of using `const' in such applications. It seems that any other solution requires some kind of minor extension in the C++ language. Here are my best three ideas about this: *** Array References *** I'll start with the simplest, but probably most controversial idea. Suppose there were a way to change C++ so that so that the difference between pointers/references to arrays and pointers/references to single objects were *enforced*. For example if `data' were a pointer or reference to an array of ints, it could not point to (refer to) a single int. Surprisingly, there is a tolerably natural syntax available for this. Consider struct imatrix { int rows, int cols, int[]& data; } Define this to mean that data *MUST* refer to an array. This would cure the kinds of problems mentioned above with fill, since `data' could not possibly refer (point) to `rows' or `cols'. `data' elements can be accessed in the usual way via data[i]. For good reason, this syntax bears strong resemblance to that for `vector_new' (e.g., `new int[100]'). Unfortunately, there is also the need to support the `star' version as well, mainly to support vector_new. `int[]* p' must mean that p points to an array of ints, elementwise dereferenceable via (*p)[i]. This *might* be easier to implement than it first seems. I see only a few major changes that would be required minimally: 1) Any <type>[]& may be SILENTLY, AUTOMATICALLY coerced into a <type>*, without `wasting' a coercion (Recall the C++ only-one coercion rule). 2) The opposite coercion <type>* to <type>[]& is NEVER performed, except by manual intervention. An optimizing compiler is allowed to ignore any aliasing to non-array objects created by such coercions. 3) If `a' is declared as <type> a[<count>], the type of a mention of `a' is <type>[]&. (When coupled with (1), this change is transparent to older existing C & C++ code). 4) `vector_new' must be retyped to return <type>[] *. and `vector_delete' should take a <type>[]*. Note that `int[]* a' is VERY different from `int* a[]' (== `int** a') and that `int a[]' still means the same as `int* a'. The only bit of strangeness to this proposal is that `int[]& a' and `int[]* p' serve only as idioms. Just using `int[] a' isn't meaningful. Note also that, depending on details that ought to be hammered out, these rules seem to allow for reference assignments, unlike rules for regular references. I'm not sure whether this is good or bad. The construct might also be very useful for function arguments. In void f(int[]& d, int& x); the compiler knows that x isn't aliased by d (although x *could* still be an element of d). More importantly, f will only take arrays, not pointers to single objects, as arguments: void usef() { int a; int[]& b = *(new int[100]); int c[100]; int* d = new int; int x; f(&a, x); // ERROR f(b, x); // OK f(c, x); // OK f(d, x); // ERROR b = c; // what do you think? } If programmers used this construct consistently whenever using arrays, they might be spared a great many pointer errors. Perhaps its greatest attraction is that the construct significantly improves the type safety of C++. I think that this idea contains a lot of promise. However, it sidesteps the basic aliasing issue. The next idea has the opposite features. *** Register struct/class fields *** According to ANSI and the C++ reference manual, if an object is declared as `register', there are two consequences: the compiler is encouraged to allocate a register for it, and, even if not, its address may not be taken. Thus a register variable or argument may never have an alias. With a good register-allocating compiler like g++/gcc, the storage class specification aspect of `register' is almost archaic. Given this, `register' is only useful in its second meaning, as a *type qualifier*, in the same league as `const' and `volatile'. This is in fact what I propose: That `register' be treated as a type qualifier with respect to struct/class components, as in struct imatrix { register int rows; register int cols; int* data; }; So far as I can see, this usage of `register' does not create any new C++ parsing difficulties. Again, encouragement to place such fields in registers is only indirectly related to the use of `register' in this context. The meaning of interest would be that For any struct imatrix imat, constructs of the form int* rp = &imat.rows int& rr = imat.rows; would be illegal. Constructs like imatrix* mp = &imat would still be OK (unless, of course, imat itself were declared as register). In other words, register fields may not have aliases, although the structs they are a part of may be aliased. Thus, imat's rows field could be changed only by direct modification (imat.rows = x). An imatrix* mp could have its rows changed either manually or indirectly via any f(imatrix*) function. (I take it as irrelevant to the definition of such fields as registers that `imat.rows = x' implicitly requires an offset address to be computed inside the compiler.) There is a very unfortunate snag: the cast int* p = (int*)(&imat); would still enable side-channel modification (as in *p = 1234 causing imat.rows to be 1234). There seem to be two routes in dealing with this. A `weak' interpretation would disable optimization if any potential anything* alias for m existed and were destructively accessed. Sadly, the `weak' interpretation does not help out at all, since, for example, fill's mp->data could be an alias for mp. A `strong' interpretation of `register' would be to assume that no side-channel access occurs, treating any mp->rows as potentially volatile only when mp (or a possible imatrix* alias) modifies rows or is possibly changed via a f(imatrix*). It appears that a compiler operating under such an interpretation would generate essentially identical code for both fill and fill2. This means that, in order to be useful, the `strong' interpretation would be required, even though its assumptions cannot be enforced. I believe that it should be adopted anyway. In fact, I think that an optimizer should be able to assume lack of side channel access in all three of the cases discussed: * when a field is marked as `register' * when a field is marked as `const' * when there is a pointer type mismatch (e.g., double* data vs int rows) For the same reason that C++ compilers emit warnings when a const is coerced to a non-const, it would be nice to have a warning if a pointer to a struct containing register fields were coerced into something else. There are several other related niceties that could be introduced to make this a stronger proposal. Entities other than struct components could similarly profit from the `register-as-type-qualifier' notion. Note that this use of `register' is NOT the same as the proposed ANSI `noalias' qualifier: `register' means `cannot have an alias', while `noalias' means `does not have an alias'. This difference is actually quite substantial: Noalias can be applied to pointers to indicate uniqueness. For example, void f(noalias struct imatrix* a, noalias struct imatrix* b); means that *a and *b are guaranteed distinct. This condition is extremely difficult to enforce, and, in any case applies to a whole different set of problems (chiefly, parallelization issues). No such effect seems inherent in any use of `register' as described here. *** Compile-time directives *** It is possible, although incredibly tedious, to use some kind of #pragmas or asserts that, if recognized by the compiler, would provide the same kind information. For example: void fill(struct imatrix* mp, int x) { assert(!((mp->data >= mp && mp->data < (char*)mp + sizeof(imatrix)) || (mp->data + mp->rows * mp->cols >= mp && mp->data + mp->rows * mp->cols < (char*)mp + sizeof(imatrix)))) /* i.e., no requested offset of data == rows or cols */ int i, j; for (i = 0; i < mp->rows; ++i) for (j = 0; j < mp->cols; ++j) *(mp->data + i * mp->cols + j) = x; } Pretty ugly, huh? ---- If anyone has any other ideas or comments on this, I'd love to hear them! This issue is among the chief reasons that the libg++ matrix library has not yet been released. I refuse to distribute a package that is guaranteed to convince users that C++ is not a serious candidate for fast Matrix computations! Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367 email: dl@oswego.edu or dl%oswego.edu@nisc.nyser.net UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl