[net.lang.c] Tricky way to unroll string copy loops

msb@lsuc.UUCP (Mark Brader) (04/07/85)

All think talk about string copies made me figure it was time to
repost Duff's Device.  This was invented by Tom Duff when he was
at Lucasfilm, and he posted it about last May with the comment that
"I feel a combination of pride and revulsion at this discovery".

It was originally written in a slightly different form; I'll transform
it slightly so that it copies n bytes from "from" to "to", like strncpy
except that it won't treat nulls specially.

The tricky part is the way the last loop is handled.  Watch carefully:

	tdsdcopy (to, from, count)
	register char *to, *from;
	register int count;
	{
		register n;
		if (count > 0) {
			n = (count+7) / 8;
			switch (count % 8) {

			case 0: do {	*to++ = *from++;
			case 7:		*to++ = *from++;
			case 6:		*to++ = *from++;
			case 5:		*to++ = *from++;
			case 4:		*to++ = *from++;
			case 3:		*to++ = *from++;
			case 2:		*to++ = *from++;
			case 1:		*to++ = *from++;
				} while (--n > 0);
			}
		}
	}

Duff also remarked as follows:

	Dennis Ritchie has endorsed it as legal C.
	...
	Many people (even Brian Kernighan?) have said that the worst feature
	of C is that switches don't break automatically before each case label.
	This code forms some sort of argument in that debate, but I'm not sure
	whether it's for or against.

Mark Brader

garys@bunker.UUCP (Gary M. Samuelson) (04/09/85)

I would like to suggest one aesthetic improvement to
Duff's Device.  It would look slightly less revolting
(in my opinion) if all of the case labels were at the
same block level:

Original:
> 			case 0: do {	*to++ = *from++;
> 			case 7:		*to++ = *from++;
>			...
> 			case 1:		*to++ = *from++;
> 				} while (--n > 0);

With my proposed change:
> 				do {
> 			case 0: 	*to++ = *from++;
> 			case 7:		*to++ = *from++;
>			...
> 			case 1:		*to++ = *from++;
> 				} while (--n > 0);

As I said, it's an aesthetic change, but then again, there
might be a case where some compiler does something with
either the 'do' or the brace which makes a difference.

I'm not sure I would use it either way.

Gary Samuelson
ittvax!bunker!garys

thoth@tellab2.UUCP (Marcus Hall) (04/11/85)

In article <582@lsuc.UUCP> msb@lsuc.UUCP (Mark Brader) codes:
 (inside a routine to string copying in an unrolled loop)
>		if (count > 0) {
>			n = (count+7) / 8;
>			switch (count % 8) {
>
>			case 0: do {	*to++ = *from++;
>			case 7:		*to++ = *from++;
>			case 6:		*to++ = *from++;
>			case 5:		*to++ = *from++;
>			case 4:		*to++ = *from++;
>			case 3:		*to++ = *from++;
>			case 2:		*to++ = *from++;
>			case 1:		*to++ = *from++;
>				} while (--n > 0);
>			}
>		}

Which is very clever, but it seems that still better code should be generated
by the following:

		n = count / 8;
		switch (count % 8) {
		case 7:		*to++ = *from++;
		case 6:		*to++ = *from++;
		case 5:		*to++ = *from++;
		case 4:		*to++ = *from++;
		case 3:		*to++ = *from++;
		case 2:		*to++ = *from++;
		case 1:		*to++ = *from++;
		}
		while (n-- > 0) {
			*to++ = *from++; *to++ = *from++;
			*to++ = *from++; *to++ = *from++;
			*to++ = *from++; *to++ = *from++;
			*to++ = *from++; *to++ = *from++;
		}

Which reads much more clearly.  Note that if the compiler is at all smart in
combining code segments, it will generate code very similar to the first
routine.  The first version, however, has to do an extra test (if count is
considered signed, my routine would have to make this test as well, but copying
a negative number of bytes never made much sense to me) and an extra add
when generating n to make the loop work out right.

I guess that the moral is that cleverness may be an advantage some times, but
sometimes the direct approach can be better.

marcus hall
..!ihnp4!tellab1!tellab2!thoth