[comp.arch] Copying bytes quickly, was RISC bashing at USENIX

walter@garth.UUCP (Walter Bays) (07/16/88)
	register long count;
	register long *src, *dst;
	
	   while( --count )
	   {
	      *dst++ = *src++;
	   }

In article <1746@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:
>Your problem is that the above C code is grossly non-optimal. Assuming
>that "count" is typically fairly large, the optimal C code is the
>following:
>     bcopy ((char*)src, (char*)dst, count*sizeof(long));
>If for some bizarre reason your C compiler doesn't come with a "bcopy"
>routine, I suggest [unrolling the loop]

Neal trusts the library routine to be at least as fast as what he can
write in C.  Bcopy should use an unrolled loop, and should do a word
copy if the alignment is right, a byte copy if not.  You can probably
even arrange to guarantee word alignment.  Since speed is important
here, I'd use something like:

      assert (((unsigned)src & 3) == 0 && ((unsigned)dst & 3) == 0);
      bcopy ((char*)src, (char*)dst, count*sizeof(long));
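
For illustration only, here is roughly what such a routine might do
internally.  This is a sketch, not the source of any real bcopy:
"copy_sketch" is a made-up name, the argument order merely mirrors
bcopy (source first), the unroll factor of four is arbitrary, and it
assumes sizeof(long) is a power of two and that the two regions do not
overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a library copy routine (NOT the real bcopy).
   Copies n bytes from src to dst; the regions must not overlap. */
void copy_sketch(const void *src, void *dst, size_t n)
{
    const unsigned char *s = src;
    unsigned char *d = dst;

    /* Take the word path only if both pointers are long-aligned.
       Assumes sizeof(long) is a power of two. */
    if ((((uintptr_t)s | (uintptr_t)d) & (sizeof(long) - 1)) == 0) {
        const long *ws = (const long *)s;
        long *wd = (long *)d;
        size_t words = n / sizeof(long);

        /* Main loop, unrolled by four words per iteration. */
        while (words >= 4) {
            wd[0] = ws[0];
            wd[1] = ws[1];
            wd[2] = ws[2];
            wd[3] = ws[3];
            ws += 4;
            wd += 4;
            words -= 4;
        }
        /* Leftover whole words (at most three). */
        while (words > 0) {
            *wd++ = *ws++;
            words--;
        }

        s = (const unsigned char *)ws;
        d = (unsigned char *)wd;
        n %= sizeof(long);      /* bytes left after the last whole word */
    }

    /* Byte loop: the tail of an aligned copy, or the whole copy
       when either pointer is misaligned. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

A real library version would likely be hand-tuned assembler and would
also handle the case where source and destination are misaligned by the
same amount, which this sketch simply sends down the byte path.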

>There are, of course, many variations, and it's hard to tell which will
>be best on any particular processor, which is why "bcopy" was invented.

A smart compiler could inline the bcopy for small counts.  It could even
use the assertion to bypass the run-time alignment-checking code.
All of which gets back to the previous poster's point: use the library.
-- 
------------------------------------------------------------------------------
My opinions are my own.  Objects in mirror are closer than they appear.
E-Mail route: ...!pyramid!garth!walter		(415) 852-2384
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California 94303
------------------------------------------------------------------------------