[comp.lang.c] Another s*y question

daveb@gonzo.UUCP (Dave Brower) (04/30/89)

Looking into this, I discovered at least one counter-intuitive result,
so this might be worth reading even if you've grown tired of the thread.

In article <1266@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>Now suppose I am doing some serious array operations, and I have to know
>whether one array buffer is longer than another.  The elements are of type
>long.  Do I have to do this multiplying and dividing by 4 all the time?
>Another example of "user-friendly" which turns out to be "user-inimical."

Many people have jumped on this, but it does point to one of the
fundamental C mind sets, which turns out to have some exceptions:

	"For heavy array operations, it is usually faster to
	use pointers than array indexing."
			- me.

That is because the scaling operation is only done once, when the
pointer is incremented, rather than at each reference to the
array[index].  Specifically, where a and b are arrays of non-char types:

	for( ap = a, bp = b; i--;  ) /* even I will use a comma op here! */
		*ap++ = *bp++;

Is likely to be noticably faster than the functionally equivalent:

	while( i-- )
		a[i] = b[i];

While a hyper-optimising compiler _might_ manage to write code with
equivalent performance, we all believe that to rarely be the case.

Now, when looking into this, I wrote some test programs just to make
sure I wasn't lying.  These were compiled with gcc 1.34 -O, the most
optimising compiler I have handy.

	# define LEN 10
	int a[LEN], b[LEN];
	void f1()
	{
		register int i = LEN;
		register int *ap, *bp;
		for( ap = a, bp = b; i--; )
			*ap++ = *bp++;
	}
	void f2()
	{
		register int i = LEN;
		while( i-- )
			a[i] = b[i];
	}

The scaling operation done in the loop costs two extra instructions. 

    pointer:
	L%5:			<< one instruction loop body
		mov.l (%a0)+,(%a1)+	<< move word + bump pointers
	L%2:
		dbra %d0,L%5

    array:
	L%9:			<< three instruction loop body
		mov.l %d1,%d0			<< grab index
		asl.l &2,%d0			<< scale
		mov.l (%a0,%d0.l),(%a1,%d0.l)	<< then move word
	L%7:
		dbra %d1,L%9

What came as a surprise to me was that pointer operations may NOT
be faster than arrays for use with char types.  Changing the int arrays
to char arrays gives the following single instructions for the loop body.

   pointer:
		mov.b (%a0)+,(%a1)+

   array:
		mov.b (%a0,%d0.l),(%a1,%d0.l)

No scaling is involved, so it comes down to the price of using an
indexed move vs. an unindexed move with two pointer increments.  This is
heaviliy in the realm of architecture and hardware implementation.  One
can imagine machines where pointers are faster, and others where the
reverse was true.

Thus, the maxim becomes modified:

	"For heavy array operations, it is usually faster to
	use pointers than array indexing.

	"If you are really trying to squeeze out the last cycle,
	you ought to try it both ways and measure the result because
	what is good on Brand X may be bad on Brand Y."

			- me.

Few really large programs will be affected by this sort of thing.
Clarity of expression may make arrays preferable, particularly if a
published algorithm you are stealing is written that way.  Given a large
enough search space, a poorly written hash or tree will still be a lot
faster than a great register pointer based linear search.

But in quite a lot of smaller programs this sort of thing can have a
significant affect.  For instance, the recently posted "splay compress"
functions can be speeded up 50% by changing arrays to pointers.

-dB
-- 
"An expert is someone who's right 75% of the time"
{sun,mtxinu,amdahl,hoptoad}!rtech!gonzo!daveb	daveb@gonzo.uucp

david@sun.com (The Amazing Elvis Hypnodisk) (05/03/89)

In article <629@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>Looking into this, I discovered at least one counter-intuitive result,
>Many people have jumped on this, but it does point to one of the
>fundamental C mind sets, which turns out to have some exceptions:
>
>	"For heavy array operations, it is usually faster to
>	use pointers than array indexing."
>			- me.
>
>That is because the scaling operation is only done once, when the
>pointer is incremented, rather than at each reference to the
>array[index].  Specifically, where a and b are arrays of non-char types:
>
>	for( ap = a, bp = b; i--;  ) /* even I will use a comma op here! */
>		*ap++ = *bp++;
>
>Is likely to be noticably faster than the functionally equivalent:
>
>	while( i-- )
>		a[i] = b[i];

Another counter-intuitive result is the Sun-4.  The current Sun SPARC C
compiler will usually generate better code for arrays than for pointers.
This is mostly because of the SPARC instruction set (it has reg + reg
addressing but no autoincrement or memory to memory ops).  The best
pointer code would be something like (neglecting loop unrolling):

1:	ld	[%ap], %tmp
	inc	size, %ap
	st	%tmp, [%bp]
	deccc	%i
	bnz	1b
	inc	size, %bp	! delay slot

while the best array code might be:

1:	ld	[%aend + %off], %tmp
	inccc	size, %off
	bnz	1b
	st	%tmp, [%bend + %off] ! delay slot

where aend = a + i, bend = b + i - 1, off = -i * sizeof *a, and of course
sizeof *a == sizeof *b.  Not that a compiler couldn't tranform one into
the other.  Not that I'm in the Sun compiler group.

Remember, cc -S is your friend.

-- 
David DiGiacomo, Sun Microsystems, Mt. View, CA  sun!david david@sun.com