daveb@gonzo.UUCP (Dave Brower) (04/30/89)
Looking into this, I discovered at least one counter-intuitive result, so this might be worth reading even if you've grown tired of the thread. In article <1266@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes: >Now suppose I am doing some serious array operations, and I have to know >whether one array buffer is longer than another. The elements are of type >long. Do I have to do this multiplying and dividing by 4 all the time? >Another example of "user-friendly" which turns out to be "user-inimical." Many people have jumped on this, but it does point to one of the fundamental C mind sets, which turns out to have some exceptions: "For heavy array operations, it is usually faster to use pointers than array indexing." - me. That is because the scaling operation is only done once, when the pointer is incremented, rather than at each reference to the array[index]. Specifically, where a and b are arrays of non-char types: for( ap = a, bp = b; i--; ) /* even I will use a comma op here! */ *ap++ = *bp++; Is likely to be noticably faster than the functionally equivalent: while( i-- ) a[i] = b[i]; While a hyper-optimising compiler _might_ manage to write code with equivalent performance, we all believe that to rarely be the case. Now, when looking into this, I wrote some test programs just to make sure I wasn't lying. These were compiled with gcc 1.34 -O, the most optimising compiler I have handy. # define LEN 10 int a[LEN], b[LEN]; void f1() { register int i = LEN; register int *ap, *bp; for( ap = a, bp = b; i--; ) *ap++ = *bp++; } void f2() { register int i = LEN; while( i-- ) a[i] = b[i]; } The scaling operation done in the loop costs two extra instructions. pointer: L%5: << one instruction loop body mov.l (%a0)+,(%a1)+ << move word + bump pointers L%2: dbra %d0,L%5 array: L%9: << three instruction loop body mov.l %d1,%d0 << grab index asl.l &2,%d0 << scale mov.l (%a0,%d0.l),(%a1,%d0.l) << then move word L%7: dbra %d1,L%9 What came as a surprise to me was that pointer operations may NOT be faster than arrays for use with char types. Changing the int arrays to char arrays gives the following single instructions for the loop body. pointer: mov.b (%a0)+,(%a1)+ array: mov.b (%a0,%d0.l),(%a1,%d0.l) No scaling is involved, so it comes down to the price of using an indexed move vs. an unindexed move with two pointer increments. This is heaviliy in the realm of architecture and hardware implementation. One can imagine machines where pointers are faster, and others where the reverse was true. Thus, the maxim becomes modified: "For heavy array operations, it is usually faster to use pointers than array indexing. "If you are really trying to squeeze out the last cycle, you ought to try it both ways and measure the result because what is good on Brand X may be bad on Brand Y." - me. Few really large programs will be affected by this sort of thing. Clarity of expression may make arrays preferable, particularly if a published algorithm you are stealing is written that way. Given a large enough search space, a poorly written hash or tree will still be a lot faster than a great register pointer based linear search. But in quite a lot of smaller programs this sort of thing can have a significant affect. For instance, the recently posted "splay compress" functions can be speeded up 50% by changing arrays to pointers. -dB -- "An expert is someone who's right 75% of the time" {sun,mtxinu,amdahl,hoptoad}!rtech!gonzo!daveb daveb@gonzo.uucp
david@sun.com (The Amazing Elvis Hypnodisk) (05/03/89)
In article <629@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes: >Looking into this, I discovered at least one counter-intuitive result, >Many people have jumped on this, but it does point to one of the >fundamental C mind sets, which turns out to have some exceptions: > > "For heavy array operations, it is usually faster to > use pointers than array indexing." > - me. > >That is because the scaling operation is only done once, when the >pointer is incremented, rather than at each reference to the >array[index]. Specifically, where a and b are arrays of non-char types: > > for( ap = a, bp = b; i--; ) /* even I will use a comma op here! */ > *ap++ = *bp++; > >Is likely to be noticably faster than the functionally equivalent: > > while( i-- ) > a[i] = b[i]; Another counter-intuitive result is the Sun-4. The current Sun SPARC C compiler will usually generate better code for arrays than for pointers. This is mostly because of the SPARC instruction set (it has reg + reg addressing but no autoincrement or memory to memory ops). The best pointer code would be something like (neglecting loop unrolling): 1: ld [%ap], %tmp inc size, %ap st %tmp, [%bp] deccc %i bnz 1b inc size, %bp ! delay slot while the best array code might be: 1: ld [%aend + %off], %tmp inccc size, %off bnz 1b st %tmp, [%bend + %off] ! delay slot where aend = a + i, bend = b + i - 1, off = -i * sizeof *a, and of course sizeof *a == sizeof *b. Not that a compiler couldn't tranform one into the other. Not that I'm in the Sun compiler group. Remember, cc -S is your friend. -- David DiGiacomo, Sun Microsystems, Mt. View, CA sun!david david@sun.com