[net.unix-wizards] Speedup to bcopy

root (07/13/82)

We did some local monitoring, and discovered that on our system bcopy()
is always called with an even number of bytes (i.e. whole words), always
starting on a word boundary. A byte-at-a-time copy is wasteful compared
to a word copy, so we implemented the following change. Note that it
still tests for odd counts and odd addresses, since just 'cuz we never
saw it happen doesn't mean somebody won't copy an odd number of bytes or
use an odd address (although we doubt it). Anyway, bcopy gets called an
AWFUL lot, so this should speed things up a bit for you overloaded
folks. (We are running BSD something-or-other.)

			-Dan Klein & Tron McConnell

-------------------------------------------------------------------------
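/*
 * Copy count bytes from "from" to "to".
 * count must be nonzero: the do-while loops below always copy at
 * least one item. The copy runs forward, so "to" must not overlap
 * the part of "from" that is still to be read.
 */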
bcopy(from, to, count)
caddr_t from, to;
register count;
{

#ifdef MI_BCOPY
	if ((count|(int)from|(int)to)&1) {	/* RARE case of odd bytes */
#endif
		register char *f, *t;
		f = from;
		t = to;
		do
			*t++ = *f++;
		while(--count);
#ifdef MI_BCOPY
	} else {
		register short *f, *t;	/* 2-byte words; an int is 4 bytes on a VAX */
		f = (short *)from;
		t = (short *)to;
		count >>= 1;		/* Quick divide by 2: bytes to words */
		do
			*t++ = *f++;
		while(--count);
	}
#endif
}
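
For readers who want to try the same test-then-copy idea with a current
compiler, here is a minimal sketch in standard C. It is not the code
posted above: the name word_bcopy is made up for illustration, uint16_t
stands in for the 2-byte machine word, and the loops are written so that
a zero count copies nothing. It assumes the caller's buffers may
legitimately be accessed as 16-bit objects when they are word aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * word_bcopy: hypothetical name, not the kernel's bcopy().
 * Copies count bytes forward; the regions must not overlap.
 */
void word_bcopy(const void *from, void *to, size_t count)
{
    assert(count == 0 || (from != NULL && to != NULL));

    /*
     * OR the count and both addresses together. If the low bit of the
     * result is set, the length is odd or an address is not on a
     * 2-byte boundary, so fall back to the byte loop.
     */
    if ((count | (uintptr_t)from | (uintptr_t)to) & 1) {
        const unsigned char *f = from;
        unsigned char *t = to;

        while (count-- > 0)
            *t++ = *f++;
    } else {
        const uint16_t *f = from;
        uint16_t *t = to;

        count >>= 1;            /* bytes to 16-bit words */
        while (count-- > 0)
            *t++ = *f++;
    }
}
```

A single OR followed by one mask test keeps the common (even, aligned)
case down to one branch, which is the point of the original change.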

dan (07/14/82)

Several years ago (when v7 was just a rumor) I got excited about
cpu time spent in bcopy() and rewrote it in assembler to reduce
loop control overhead by a factor of 8.  Some time after I did
this, I took a good look at how the kernel actually used bcopy()
and decided that I had probably been wasting my time.  If you are
into shaving microseconds, this is one way to do it.  Notice that
the length argument is in words.  More modern systems may use byte
counts here.

This code is for pdp11s.  VAXen should (and do) use the movc3 instruction.

	.globl	_bcopy		/call: bcopy(from,to,wordcount);
	_bcopy:	mov	sp,r0
		mov	r2,-(sp)
		mov	r3,-(sp)
		tst	(r0)+
		mov	(r0)+,r2	/par#1 - from pointer
		mov	(r0)+,r3	/par#2 - to pointer
		mov	*r0,r1		/par#3 - word count
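		clr	r0		/r0,r1 = 32-bit dividend (word count)
		div	$8,r0		/r0 = count/8, r1 = count%8
		inc	r0		/passes = quotient + 1
		asl	r1		/each mov below is 2 bytes long
		neg	r1
		jmp	2f(r1)		/enter so count%8 movs run first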
	1:	mov	(r2)+,(r3)+
		mov	(r2)+,(r3)+
		mov	(r2)+,(r3)+
		mov	(r2)+,(r3)+
		mov	(r2)+,(r3)+
		mov	(r2)+,(r3)+
		mov	(r2)+,(r3)+
		mov	(r2)+,(r3)+
	2:	sob	r0,1b		/decrement r0, loop while nonzero
		mov	(sp)+,r3
		mov	(sp)+,r2
		rts	pc
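
The control flow of the routine above can be hard to follow from the
assembler alone, so here is a rough C rendering of it. This is a sketch
for illustration only: the name unrolled_wcopy is invented, uint16_t
stands in for a pdp11 word, and the switch plays the part of the
computed jmp 2f(r1) by entering the unrolled block so that the
remainder (count mod 8) of the moves runs first. Each later pass does
all eight moves, so loop control is paid once per eight words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * unrolled_wcopy: hypothetical name. Copies `words` 16-bit words
 * forward; the regions must not overlap.
 */
void unrolled_wcopy(const uint16_t *from, uint16_t *to, size_t words)
{
    size_t passes = words / 8 + 1;      /* div $8,r0 then inc r0 */

    assert(words == 0 || (from != NULL && to != NULL));

    switch (words % 8) {                /* jmp 2f(r1) */
        do {
                *to++ = *from++;        /* reached only on full passes */
    case 7:     *to++ = *from++;
    case 6:     *to++ = *from++;
    case 5:     *to++ = *from++;
    case 4:     *to++ = *from++;
    case 3:     *to++ = *from++;
    case 2:     *to++ = *from++;
    case 1:     *to++ = *from++;
    case 0:     ;                       /* label 2: no move, just the test */
        } while (--passes != 0);        /* sob r0,1b */
    }
}
```

With a count of 11 words, for example, the first entry performs 3 moves
and the one remaining pass performs 8. A count of 0 enters at case 0,
decrements passes to zero and returns without copying anything.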