[net.lang.c] Fast strcpy nitpicking

gnu@hoptoad.uucp (John Gilmore) (09/29/86)

In article <7503@sun.uucp>, guy@sun.uucp (Guy Harris) writes:
>       The following, courtesy of John Gilmore and Vaughan Pratt, is what is
> actually used in the (3.2 version of) "strcpy", etc.:
> 
> 	moveq	#-1,d1		| maximum possible (16-bit) count
> hardloop:
> 	movb	FROM@+,TO@+	| copy...
> 	dbeq	d1,hardloop	| until we copy a null or the count is -1
> 	bne	hardloop	| if not-null, continue copying with count
> 				| freshly initialized to -1

Gee, Guy, something must have gotten lost in the translation.  I never
suggested that you needed a moveq there.  It works great no matter what
value happens to be in d1.  Unless d1 always has a small value at that point
in the code, you'll hit the end of the string before the dbra expires,
saving an instruction.  					:-)

PS:  My heart really goes out to the poor guy who wrote his own strcpy
and it broke.  Gee, he'll have to #define it to the one that came with 
the system.  Tears are streaming down my face.			;-)
-- 
John Gilmore  {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu   jgilmore@lll-crg.arpa
		     May the Source be with you!

guy@sun.uucp (Guy Harris) (09/29/86)

> Gee, Guy, something must have gotten lost in the translation.  I never
> suggested that you needed a moveq there.  It works great no matter what
> value happens to be in d1.  Unless d1 always has a small value at that point
> in the code, you'll hit the end of the string before the dbra expires,
> saving an instruction.  					:-)

No, I stuck the "moveq" in there; the credit was for the general cliché
for "dbCC with a 32-bit count".  The "moveq" is cheaper than the "bne", both
on the 010 and on the 020; if it prevents the "dbeq" from running out enough
times, it'll be worth it.
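
The trade-off being argued here can be modelled in C.  What follows is only
a rough sketch, not the library code: the function name "dbeq_strcpy" and
the "bne_taken" counter are made up for illustration, and "d1" stands for
whatever the low 16 bits of the register hold on entry.  It counts how
often the counter expires and the trailing "bne" has to be taken.

```c
#include <stdint.h>

/*
 * Hypothetical C model of the movb/dbeq/bne loop.  "d1" is the low word
 * of the register on entry; *bne_taken receives the number of times the
 * 16-bit counter ran out and the trailing "bne" branched back.
 */
char *dbeq_strcpy(char *to, const char *from, uint16_t d1,
                  unsigned long *bne_taken)
{
    char *ret = to;
    unsigned long refills = 0;

    for (;;) {
        char c = *from++;
        *to++ = c;              /* movb FROM@+,TO@+ */
        if (c == '\0')
            break;              /* dbeq sees "eq": fall through, bne not taken */
        d1--;                   /* dbeq decrements the 16-bit counter */
        if (d1 != 0xFFFF)
            continue;           /* counter is not -1 yet: branch to hardloop */
        refills++;              /* counter hit -1: bne taken, d1 is -1 again */
    }
    if (bne_taken)
        *bne_taken = refills;
    return ret;
}
```

In this model, entering with d1 = 0xFFFF (the "moveq" case) leaves the
count at zero for any string shorter than 65536 bytes, while entering with
d1 = 0 takes the "bne" once, right after the first non-null byte.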

Unfortunately, how often it will prevent the "dbeq" from running out depends
on the frequency distribution of incoming values in "d1" and of string
lengths (and on the correlation, if any, between those values); lacking any
such data, there's no way of saying which is better.  This situation also
occurs on the VAX; since the VAX's string instructions are oriented towards
counted strings rather than null-terminated strings, the assembler-language
version of "strcpy" must first do a "locc" to find the length of the string
to copy before it does the "movc3" to move the string.  As such, there are
cases where the simple-minded "strcpy" outperforms the assembler-language
version (I don't remember what the lengths in those cases were on the 780; I
did that a long time ago).
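
The shape of the two approaches can be sketched in C.  This is an analogy
only, assuming "strlen" plays the part of the "locc" scan and "memcpy" the
part of "movc3"; the function names are hypothetical, and nothing here
says which one wins at a given length.

```c
#include <stddef.h>
#include <string.h>

/* Two passes over the source: find the terminator, then a counted copy. */
char *strcpy_two_pass(char *to, const char *from)
{
    size_t n = strlen(from);    /* pass 1: locate the null (the "locc" role) */
    memcpy(to, from, n + 1);    /* pass 2: counted move, null included */
    return to;
}

/* One pass: test each byte as it is copied. */
char *strcpy_simple(char *to, const char *from)
{
    char *ret = to;
    while ((*to++ = *from++) != '\0')
        ;                       /* copy until the null has been moved */
    return ret;
}
```

The two-pass version touches every source byte twice, which is the fixed
overhead the simple loop avoids on short strings.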

I doubt anybody'd have the statistics on values of "d1" upon entry to
"strcpy", but does anybody have a distribution of string lengths in
"strcpy"?  The trouble is that you'd have to build an instrumented version
of "strcpy" and rebuild some reasonably large subset of commands with this
(unless you had it in a shared library that you could replace the standard
library with *without* rebuilding the commands - this would probably require
that library references be bound to addresses at run time).
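
An instrumented version along those lines might look something like the
sketch below.  Everything in it is an assumption made for illustration:
the names "counted_strcpy", "length_hist" and "dump_length_hist", the
power-of-two bucketing, and the choice of 17 buckets.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NBUCKETS 17             /* last bucket collects lengths >= 32768 */

static unsigned long length_hist[NBUCKETS];

/* Bucket 0 holds length 0; bucket b >= 1 holds lengths 2^(b-1) .. 2^b - 1. */
static int bucket_for(size_t n)
{
    int b = 0;
    while (n != 0 && b < NBUCKETS - 1) {
        n >>= 1;
        b++;
    }
    return b;
}

/* Stand-in for strcpy that records the length of every string copied. */
char *counted_strcpy(char *to, const char *from)
{
    size_t n = strlen(from);
    length_hist[bucket_for(n)]++;
    memcpy(to, from, n + 1);
    return to;
}

/* Print one line per non-empty bucket: lower bound of the range, count. */
void dump_length_hist(FILE *out)
{
    int b;
    for (b = 0; b < NBUCKETS; b++) {
        if (length_hist[b] != 0)
            fprintf(out, "%lu+\t%lu\n",
                    b == 0 ? 0UL : 1UL << (b - 1), length_hist[b]);
    }
}
```

Getting the commands to call it is the hard part described above; one
option, in the spirit of the earlier "#define" remark, would be to rebuild
them with "strcpy" defined to "counted_strcpy".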
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)