[comp.arch] Duff on i386

kds@mipos2.intel.com (Ken Shoemaker ~) (08/30/88)

I played around with the Duff block move thingy on the Sun 386i box and got
some pretty predictable results.  If you take the program as written, the
Duff block move takes ~1/2 as long as the normal block move.  However, if
you go in and do a little assembly language hack, taking advantage of the
repeat move string instruction, you speed up the move considerably such that 
the Duff code takes ~1.8 times as long as the assembly string move.  In other
words:

	C string move:			2.0
	Duff string move:		1.0
	Assembly string move:		0.55

where the numbers are normalized times.  The assembly language hack amounted
to adding 3 lines of assembly code and removing the code the compiler 
generated for the block move.
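
For readers without the test program at hand, the two C variants being compared probably look something like the sketch below.  This is a reconstruction, not the program that was timed: the function names plain_copy and duff_copy are made up here, and the 8-way unrolling and the use of int as the word type (4 bytes on the 386) are assumptions.

```c
#include <stddef.h>

/* Plain word-at-a-time block move: one loop test and branch per word. */
void plain_copy(int *to, const int *from, size_t count)
{
    while (count-- > 0)
        *to++ = *from++;
}

/* Duff's device adapted to a block move (both pointers advance).
 * The switch jumps into the middle of the unrolled loop body to handle
 * the count % 8 leftover words, then the do/while runs whole groups of 8,
 * so there is one loop test and branch per 8 words instead of per word. */
void duff_copy(int *to, const int *from, size_t count)
{
    size_t n;

    if (count == 0)
        return;                 /* the unrolled body always copies >= 1 word */

    n = (count + 7) / 8;        /* number of passes through the do/while */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

The factor of two between the first two rows of the table is consistent with the loop overhead being roughly as expensive as the copy itself in the plain version.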

Another thing that I noticed when looking at the assembly language for the
string move part of the Duff move was that the block of code generated for each
of the "cases" does almost exactly what the string move instruction does, since
the array pointer variables are declared as register type.  Replacing each of
those blocks with a single string move instruction gives a normalized time of:

	Modified Duff string move:	0.69

Yet another thing that can be done is to let the compiler generate the 
iteration variable, but to use the string move instruction to do the work
of the loop.  Note that in this case the compiler puts the iteration variable
in memory, not in a register.  The normalized time for this is:

	Modified C string move:		1.6
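
A rough sketch of what this variant amounts to, written with GCC-style inline assembly rather than by editing the compiler output as was done for the timings.  The function name loop_movsl is hypothetical, int is assumed to be 4 bytes, and the direction flag is assumed to be clear (as the usual calling conventions guarantee).  On a non-x86 target the sketch falls back to a plain C copy.

```c
#include <stddef.h>

/* C supplies the loop and its counter; a single movsl does each copy.
 * movsl moves 4 bytes from (%esi) to (%edi) and advances both pointers. */
void loop_movsl(int *to, const int *from, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
        __asm__ volatile ("movsl"
                          : "+D" (to), "+S" (from)   /* edi/rdi, esi/rsi */
                          :
                          : "memory");
#else
        *to++ = *from++;        /* portable fallback, same effect */
#endif
    }
}
```

A modern compiler will most likely keep i in a register, so this does not reproduce the in-memory iteration variable described above; it only shows the division of labor between the C loop and the string instruction.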

For the curious, the assembly language generated by the compiler to do the
string move is:

	movl	%edi,%eax
	addl	$4,%edi
	movl	%esi,%edx
	addl	$4,%esi
	movl	(%edx),%ecx
	movl	%ecx,(%eax)

which, as far as the business end is concerned, is the same as a single

	movsl
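
With a repeat prefix, that one instruction becomes the whole block move.  The three lines of assembly used for the timings are not shown in this post, so the following is only a guess at an equivalent, using GCC-style inline assembly.  The function name string_move is hypothetical, int is assumed to be 4 bytes, and the direction flag is assumed to be clear; on a non-x86 target the sketch falls back to memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Whole block move as one repeated string instruction.
 * rep movsl copies %ecx words from (%esi) to (%edi), advancing both
 * pointers and counting %ecx down to zero, with no loop in the C code. */
void string_move(int *to, const int *from, size_t count)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile ("rep movsl"
                      : "+D" (to), "+S" (from), "+c" (count)
                      :
                      : "memory");
#else
    memcpy(to, from, count * sizeof *to);   /* portable fallback */
#endif
}
```

A count of zero is harmless here, since rep executes the instruction zero times when the count register is zero.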

Have fun...
-------------------
If you break a law to prove a law, you're on pretty shakey moral grounds
						-- Ian Shoales
Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{hplabs|decwrl|amdcad|qantel|pur-ee|scgvaxd|oliveb}!intelca!mipos3!kds