[comp.os.minix] A replacement string package for PC's

nfs@notecnirp.Princeton.EDU (Norbert Schlenker) (10/10/89)

Having done some a favour by posting a new stdio package, which I believe
to be portable across Minix platforms, I am now going to do others a
favour (but to ast, perhaps a disservice) by posting an assembly language
version of the ANSI C string package for PC's and compatibles.  All of
the required routines, save for strcoll() and strxfrm() [missing since
I don't have a use for locale specific routines] and strerror() [missing
since I see no advantage over the C implementation], are included.  The
Berkeley compatibility routines and memccpy() are not included either.
I realize that this posting may cause some distress because of
nonportability, but I believe the performance gain to be worthwhile.

Henry Spencer's C routines are widely used, very reliable, very portable,
and are easily compiled into reasonably efficient code.  They can take no
advantage, of course, of special architectural features, which Intel
processors possess in abundance in this case.  If the best that could be
done was a 10-20% improvement in the string code, which I would consider
fairly typical for assembly over C, I wouldn't consider it worthwhile.
But my rewritten routines show much larger improvements for typical
inputs - from 40% to 95% depending on function.

The code that I write tends to use a lot of str[len|cpy|cmp]() and
mem[set|cpy]().  The improvements for these routines are substantial
enough that I use my assembly language versions.  The recent Dhrystone
1.1 posting by ast shows a 40% increase in Dhrystone rating on my 
machine with these routines (only strcpy() and strcmp() are used there).

The code is faster for four reasons.  It uses special instructions not
generated by the C compiler;  it pays careful attention to register
contents; it unrolls most loops once; and it takes advantage of word
alignment where possible.  The first two involve fairly simple adap-
tations of Spencer's C code to the Intel architecture.  The last two are
sometimes unpleasant.  Unrolling loops once saves 10-15% in most cases;
the attention to alignment saves 3-5% on top of that.  The code is less
clear (in some cases, MUCH less clear) and harder to debug, but 20% is
not to be sniffed at.

The code was optimized on a Toshiba 5100, which has an 80386 and uncached
1-wait state 32 bit memory.  As a result, the code may not be optimal on
other machines.  I expect it to be quite good for 16 bit CPU's, with
perhaps slightly less improvement on 8088's, where the attention to
alignment is wasted.

I am open to bug reports and suggestions for improvement.  I am also
interested in reports of performance on other machines.  To that end, I
have included a program that computes the improvement for a variety of
routines automagically; please email the results to me.  The program
asks for a description of the machine, which it uses as a header - the
CPU and a description of the memory architecture are what I'd like to see.
I have included a copy of the program's output for my machine in the
file Perf.T5100; I hope the improvements shown there will be typical.

Feed the following to /bin/sh to unpack it.  If you have a '386, link
memset386.S to memset.S; otherwise link memset86.S.  By default, the
makefile generates the performance comparison with the existing library.
If you "make install", packed versions of the routines will be installed
in /usr/lib/libc.a.  (After installation in the library, the performance
comparison cannot be done, since the old routines are gone.)

Enjoy!

------------------------------ Cut here --------------------------------
echo x - Copyright
sed '/^X/s///' > Copyright << '/'
XCopyright 1989 Norbert Schlenker.  All rights reserved.
X
XThe copyright holder hereby grants to the public the right to use,
Xmodify, and redistribute this software freely.
X
XThis software is provided "as is" and carries no warranty of any kind.
/
echo x - Makefile
sed '/^X/s///' > Makefile << '/'
XSRCS =	memchr.S memcmp.S memmove.S memset.S strcat.S strchr.S \
X	strcmp.S strcpy.S strcspn.S strlen.S strncat.S strncmp.S \
X	strncpy.S strpbrk.S strrchr.S strspn.S strstr.S strtok.S
X
XOBJS =	memchr.s memcmp.s memmove.s memset.s strcat.s strchr.s \
X	strcmp.s strcpy.s strcspn.s strlen.s strncat.s strncmp.s \
X	strncpy.s strpbrk.s strrchr.s strspn.s strstr.s strtok.s
X
X.SUFFIXES: .s .S .o .c .y
X
X.S.s:
X	libpack < $*.S | \
X	sed "/_[MS]/s/`echo $* | tr '[a-z]' '[A-Z]'`/$*/" | \
X	sed "/^\$$/d" >$*.s
X
Xperf:	perf.c $(SRCS)
X	cc -o perf perf.c $(SRCS)
X	perf >Perf.local
X
Xinstall: $(OBJS)
X	ar d /usr/lib/libc.a memcpy.s	# Spencer's memcpy replaced by memmove
X	ar rlv /usr/lib/libc.a $?
/
echo x - perf.c
sed '/^X/s///' > perf.c << '/'
X#include <stdio.h>
X#include <string.h>
X#include <time.h>
X#include <sys/times.h>
X
X#define INNERLOOPCOUNT 1000
X
X#define TIME_EXECUTION(x, t) {			\
X  times(&start);				\
X  for (i = loop_count; --i >= 0; )		\
X    for (j = INNERLOOPCOUNT; --j >= 0; )	\
X      x;					\
X  times(&end);					\
X  t = end.tms_utime - start.tms_utime;		\
X  }
X
X#define GAIN(argc)				\
X	(((new_time - empty_function_time[argc]) * 100 + 50) /	\
X	  (old_time - empty_function_time[argc]))
X
X
Xstatic char ATOE[] = "ABCDE";
Xstatic char ATOZ[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
X
Xint empty_function(x, y, z)
Xchar *x;
Xchar *y;
Xchar *z;
X{
X}
X
Xmain()
X{
X  struct tms start, end;
X  register int i, j;
X  int loop_count;
X  clock_t empty_function_time[4];
X  clock_t old_time;
X  clock_t new_time;
X  clock_t fake_time;
X  char *s;
X  static char s1[31];
X  static char s2[30];
X  static char bigbuf[1024];
X  static char bigbuf2[1024];
X
X  fprintf(stderr, "This test takes about 7 minutes to run.\n");
X  fprintf(stderr, "Describe the machine you are running on: ");
X  fgets(bigbuf, 60, stdin);
X  printf("Machine: %s\n", bigbuf);
X
X/* --- Find a loop count that results in nontrivial function call times --- */
X  empty_function_time[0] = 0;
X  empty_function_time[1] = 0;
X  for (loop_count = 10; empty_function_time[1] < CLK_TCK/10; loop_count += 5)
X    TIME_EXECUTION(empty_function("x"), empty_function_time[1]);
X  TIME_EXECUTION(empty_function("x","y"), empty_function_time[2]);
X  TIME_EXECUTION(empty_function("x","y","z"), empty_function_time[3]);
X  printf("Functions called %d,000 times each\n\n", loop_count);
X  printf("Function call    \t\t%% of old time\n");
X  printf("-----------------\t\t-------------\n");
X
X/* --- memmove --- */
X  TIME_EXECUTION(memcpy(s1 + 1, s2, 4), old_time);
X  TIME_EXECUTION(MEMCPY(s1 + 1, s2, 4), new_time);
X  printf("memcpy(s1, s2, n)\t\t[n=4]:%d", GAIN(3));
X  TIME_EXECUTION(memcpy(s1, s2 + 1, 25), old_time);
X  TIME_EXECUTION(MEMCPY(s1, s2 + 1, 25), new_time);
X  printf("\t[n=25]:%d", GAIN(3));
X  TIME_EXECUTION(memcpy(bigbuf, bigbuf2, 1024), old_time);
X  TIME_EXECUTION(MEMCPY(bigbuf, bigbuf2, 1024), new_time);
X  printf("\t[n=1024]:%d\n", GAIN(3));
X
X/* --- strcpy --- */
X  TIME_EXECUTION(strcpy(s1, ATOE), old_time);
X  TIME_EXECUTION(STRCPY(s1, ATOE), new_time);
X  printf("strcpy(s1, s2)\t\t\t[s2=ATOE]:%d", GAIN(2));
X  TIME_EXECUTION(strcpy(s1, ATOZ), old_time);
X  TIME_EXECUTION(STRCPY(s1, ATOZ), new_time);
X  printf("\t[s2=ATOZ]:%d\n", GAIN(2));
X
X/* --- strncpy --- */
X  TIME_EXECUTION(strncpy(s1, ATOZ, 10), old_time);
X  TIME_EXECUTION(STRNCPY(s1, ATOZ, 10), new_time);
X  printf("strncpy(s1, s2, n)\t\t[s2=ATOZ,n=10]:%d\n", GAIN(3));
X
X/* --- strcat --- */
X  strcpy(s1, ATOZ);
X  times(&start);
X  for (i = loop_count; --i >= 0; )
X    for (j = INNERLOOPCOUNT; --j >= 0; ) {
X      s1[11] = '\0';
X      strcat(s1, ATOE + 1);
X    }
X  times(&end);
X  old_time = end.tms_utime - start.tms_utime;
X  times(&start);
X  for (i = loop_count; --i >= 0; )
X    for (j = INNERLOOPCOUNT; --j >= 0; ) {
X      s1[11] = '\0';
X      STRCAT(s1, ATOE + 1);
X    }
X  times(&end);
X  new_time = end.tms_utime - start.tms_utime;
X  printf("strcat(ATOJ, ATOE + 1)\t\t%d\n", GAIN(2));
X
X/* --- memcmp --- */
X  memset(bigbuf, 0, 1024);
X  memset(bigbuf2, 0, 1024);
X  TIME_EXECUTION(memcmp(bigbuf, bigbuf2, 4), old_time);
X  TIME_EXECUTION(MEMCMP(bigbuf, bigbuf2, 4), new_time);
X  printf("memcmp(buf, buf2, n)\t\t[n=4]:%d", GAIN(3));
X  TIME_EXECUTION(memcmp(bigbuf, bigbuf2, 25), old_time);
X  TIME_EXECUTION(MEMCMP(bigbuf, bigbuf2, 25), new_time);
X  printf("\t[n=25]:%d", GAIN(3));
X  TIME_EXECUTION(memcmp(bigbuf, bigbuf2, 1024), old_time);
X  TIME_EXECUTION(MEMCMP(bigbuf, bigbuf2, 1024), new_time);
X  printf("\t[n=1024]:%d\n", GAIN(3));
X
X/* --- strcmp --- */
X  strcpy(s1, ATOE);
X  TIME_EXECUTION(strcmp(s1, ATOE), old_time);
X  TIME_EXECUTION(STRCMP(s1, ATOE), new_time);
X  printf("strcmp(s1, s2)\t\t\t[len=5]:%d", GAIN(2));
X  strcpy(s1, ATOZ);
X  TIME_EXECUTION(strcmp(s1 + 1, ATOZ + 1), old_time);
X  TIME_EXECUTION(STRCMP(s1 + 1, ATOZ + 1), new_time);
X  printf("\t[len=25]:%d\n", GAIN(2));
X
X/* --- strncmp --- */
X  strcpy(s1, ATOZ);
X  TIME_EXECUTION(strncmp(s1, ATOZ, 4), old_time);
X  TIME_EXECUTION(STRNCMP(s1, ATOZ, 4), new_time);
X  printf("strncmp(s1, s2, n)\t\t[n=4]:%d", GAIN(3));
X  TIME_EXECUTION(strncmp(s1, ATOZ, 25), old_time);
X  TIME_EXECUTION(STRNCMP(s1, ATOZ, 25), new_time);
X  printf("\t[n=25]:%d\n", GAIN(3));
X
X/* --- memchr --- */
X  TIME_EXECUTION(memchr(ATOZ, 'E', 25), old_time);
X  TIME_EXECUTION(MEMCHR(ATOZ, 'E', 25), new_time);
X  printf("memchr(ATOZ, c, 25)\t\t[c='E']:%d", GAIN(3));
X  TIME_EXECUTION(memchr(ATOZ, 'Z', 25), old_time);
X  TIME_EXECUTION(MEMCHR(ATOZ, 'Z', 25), new_time);
X  printf("\t[c='Z']:%d\n", GAIN(3));
X
X/* --- strchr --- */
X  strcpy(s1, ATOZ);
X  strcpy(s2, s1);
X  TIME_EXECUTION(strchr(ATOZ, 'E'), old_time);
X  TIME_EXECUTION(STRCHR(ATOZ, 'E'), new_time);
X  printf("strchr(ATOZ, c))\t\t[c='E']:%d", GAIN(2));
X  TIME_EXECUTION(strchr(ATOZ, 'Z'), old_time);
X  TIME_EXECUTION(STRCHR(ATOZ, 'Z'), new_time);
X  printf("\t[c='Z']:%d\n", GAIN(2));
X
X/* --- strcspn --- */
X  TIME_EXECUTION(strcspn("word list", " "), old_time);
X  TIME_EXECUTION(STRCSPN("word list", " "), new_time);
X  printf("strcspn(\"word list\", s)\t\t[s=\" \"]:%d", GAIN(2));
X  TIME_EXECUTION(strcspn("word list", " \t\r\n"), old_time);
X  TIME_EXECUTION(STRCSPN("word list", " \t\r\n"), new_time);
X  printf("\t[s=\" \\t\\r\\n\"]:%d\n", GAIN(2));
X
X/* --- strpbrk --- */
X  TIME_EXECUTION(strpbrk("word list", " "), old_time);
X  TIME_EXECUTION(STRPBRK("word list", " "), new_time);
X  printf("strpbrk(\"word list\", s)\t\t[s=\" \"]:%d", GAIN(2));
X  TIME_EXECUTION(strpbrk("word list", " \t\r\n"), old_time);
X  TIME_EXECUTION(STRPBRK("word list", " \t\r\n"), new_time);
X  printf("\t[s=\" \\t\\r\\n\"]:%d\n", GAIN(2));
X
X/* --- strrchr --- */
X  TIME_EXECUTION(strrchr(ATOZ, 'A'), old_time);
X  TIME_EXECUTION(STRRCHR(ATOZ, 'A'), new_time);
X  printf("strrchr(ATOZ, c)\t\t[c='A']:%d", GAIN(2));
X  TIME_EXECUTION(strrchr(ATOZ, 'M'), old_time);
X  TIME_EXECUTION(STRRCHR(ATOZ, 'M'), new_time);
X  printf("\t[c='M']:%d", GAIN(2));
X  TIME_EXECUTION(strrchr(ATOZ, 'Z'), old_time);
X  TIME_EXECUTION(STRRCHR(ATOZ, 'Z'), new_time);
X  printf("\t[c='Z']:%d\n", GAIN(2));
X
X/* --- strspn --- */
X  TIME_EXECUTION(strspn("0175713", "01234567"), old_time);
X  TIME_EXECUTION(STRSPN("0175713", "01234567"), new_time);
X  printf("strspn(\"0175713\", \"01234567\")\t%d\n", GAIN(2));
X
X/* --- strstr --- */
X  TIME_EXECUTION(strstr(ATOZ, "a"), old_time);
X  TIME_EXECUTION(STRSTR(ATOZ, "a"), new_time);
X  printf("strstr(ATOZ, s)\t\t\t[s=\"a\"]:%d", GAIN(2));
X  TIME_EXECUTION(strstr(ATOZ, "y"), old_time);
X  TIME_EXECUTION(STRSTR(ATOZ, "y"), new_time);
X  printf("\t[s=\"y\"]:%d", GAIN(2));
X  TIME_EXECUTION(strstr(ATOZ, "klmnop"), old_time);
X  TIME_EXECUTION(STRSTR(ATOZ, "klmnop"), new_time);
X  printf("\t[s=\"klmnop\"]:%d\n", GAIN(2));
X
X/* --- strtok --- */
X  times(&start);
X  for (i = loop_count; --i >= 0; )
X    for (j = INNERLOOPCOUNT; --j >= 0; ) {
X      strcpy(s1, "a a a a a a a a a a a a");
X      empty_function(NULL, " ");
X      empty_function(NULL, " ");
X      empty_function(NULL, " ");
X      empty_function(NULL, " ");
X      empty_function(NULL, " ");
X      empty_function(NULL, " ");
X      empty_function(NULL, " ");
X      empty_function(NULL, " ");
X      empty_function(NULL, " ");
X      empty_function(NULL, " ");
X      empty_function(NULL, " ");
X      empty_function(NULL, " ");
X    }
X  times(&end);
X  fake_time = end.tms_utime - start.tms_utime;
X  times(&start);
X  for (i = loop_count; --i >= 0; )
X    for (j = INNERLOOPCOUNT; --j >= 0; ) {
X      strcpy(s1, "a a a a a a a a a a a a");
X      strtok(s1, " ");
X      strtok(NULL, " ");
X      strtok(NULL, " ");
X      strtok(NULL, " ");
X      strtok(NULL, " ");
X      strtok(NULL, " ");
X      strtok(NULL, " ");
X      strtok(NULL, " ");
X      strtok(NULL, " ");
X      strtok(NULL, " ");
X      strtok(NULL, " ");
X      strtok(NULL, " ");
X    }
X  times(&end);
X  old_time = end.tms_utime - start.tms_utime - fake_time;
X  times(&start);
X  for (i = loop_count; --i >= 0; )
X    for (j = INNERLOOPCOUNT; --j >= 0; ) {
X      strcpy(s1, "a a a a a a a a a a a a");
X      STRTOK(s1, " ");
X      STRTOK(NULL, " ");
X      STRTOK(NULL, " ");
X      STRTOK(NULL, " ");
X      STRTOK(NULL, " ");
X      STRTOK(NULL, " ");
X      STRTOK(NULL, " ");
X      STRTOK(NULL, " ");
X      STRTOK(NULL, " ");
X      STRTOK(NULL, " ");
X      STRTOK(NULL, " ");
X      STRTOK(NULL, " ");
X    }
X  times(&end);
X  new_time = end.tms_utime - start.tms_utime - fake_time;
X  printf("strtok(\"a a ... a a\",\" \") ...\n");
X  printf("strtok(NULL, \" \") ...\t\t%d\n", GAIN(0));
X
X/* --- memset --- */
X  TIME_EXECUTION(memset(bigbuf, 0, 4), old_time);
X  TIME_EXECUTION(MEMSET(bigbuf, 0, 4), new_time);
X  printf("memset(buf, 0, n)\t\t[n=4]:%d", GAIN(3));
X  TIME_EXECUTION(memset(bigbuf, 0, 1024), old_time);
X  TIME_EXECUTION(MEMSET(bigbuf, 0, 1024), new_time);
X  printf("\t[n=1024]:%d\n", GAIN(3));
X
X/* --- strlen --- */
X  TIME_EXECUTION(strlen(ATOE), old_time);
X  TIME_EXECUTION(STRLEN(ATOE), new_time);
X  printf("strlen(s)\t\t\t[s=ATOE]:%d", GAIN(1));
X  TIME_EXECUTION(strlen(ATOZ), old_time);
X  TIME_EXECUTION(STRLEN(ATOZ), new_time);
X  printf("\t[s=ATOZ]:%d\n", GAIN(1));
X
X  exit(0);
X}
/
echo x - Perf.T5100
sed '/^X/s///' > Perf.T5100 << '/'
XMachine: Toshiba 5100 (80386; 16 MHz; 1-ws 32 bit memory; no cache)
X
XFunctions called 25,000 times each
X
XFunction call    		% of old time
X-----------------		-------------
Xmemcpy(s1, s2, n)		[n=4]:35	[n=25]:18	[n=1024]:7
Xstrcpy(s1, s2)			[s2=ATOE]:58	[s2=ATOZ]:47
Xstrncpy(s1, s2, n)		[s2=ATOZ,n=10]:46
Xstrcat(ATOJ, ATOE + 1)		51
Xmemcmp(buf, buf2, n)		[n=4]:31	[n=25]:13	[n=1024]:7
Xstrcmp(s1, s2)			[len=5]:47	[len=25]:37
Xstrncmp(s1, s2, n)		[n=4]:51	[n=25]:33
Xmemchr(ATOZ, c, 25)		[c='E']:31	[c='Z']:23
Xstrchr(ATOZ, c))		[c='E']:51	[c='Z']:33
Xstrcspn("word list", s)		[s=" "]:33	[s=" \t\r\n"]:36
Xstrpbrk("word list", s)		[s=" "]:35	[s=" \t\r\n"]:38
Xstrrchr(ATOZ, c)		[c='A']:31	[c='M']:25	[c='Z']:18
Xstrspn("0175713", "01234567")	30
Xstrstr(ATOZ, s)			[s="a"]:55	[s="y"]:55	[s="klmnop"]:48
Xstrtok("a a ... a a"," ") ...
Xstrtok(NULL, " ") ...		45
Xmemset(buf, 0, n)		[n=4]:50	[n=1024]:9	[n=1024(386)]:5
Xstrlen(s)			[s=ATOE]:47	[s=ATOZ]:38
/
echo x - memchr.S
sed '/^X/s///' > memchr.S << '/'
X| memchr.s
X|	void *memchr(const void *s, int c, size_t n)
X
X|	Returns a pointer to the first occurrence of c (converted to
X|	unsigned char) in the object pointed to by s, NULL if none.
X
X.define	_MEMCHR
X.globl	_MEMCHR
X.text
X_MEMCHR:
X	push	bp
X	mov	bp,sp
X	push	di
X	xor	bx,bx		| default result is NULL
X	mov	cx,8(bp)
X	jcxz	exit		| early exit if n == 0
X	movb	al,6(bp)
X	mov	di,4(bp)
X	cld
X	repne
X	scab
X	jne	exit
X	lea	bx,-1(di)	| [dec di; mov bx,di] better on 8088/8086
Xexit:
X	mov	ax,bx
X	pop	di
X	mov	sp,bp
X	pop	bp
X	ret
/
echo x - memcmp.S
sed '/^X/s///' > memcmp.S << '/'
X| memcmp.s
X|	int memcmp(const void *s1, const void *s2, size_t n)
X
X|	Compares the first n characters of the objects pointed to by
X|	s1 and s2.  Returns zero if all characters are identical, a
X|	positive number if s1 greater than s2, a negative number otherwise.
X
XBYTE_LIMIT = 10			| if n is above this, we work with words
X
X.define	_MEMCMP
X.globl	_MEMCMP
X.text
X_MEMCMP:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	xor	ax,ax		| default return is equality
X	mov	cx,8(bp)
X	jcxz	exit		| early exit if n == 0
X	mov	si,4(bp)
X	mov	di,6(bp)
X	cmp	si,di
X	je	exit		| early exit if s1 == s2
X	cld
X	cmp	cx,#BYTE_LIMIT
X	ja	word_compare
Xbyte_compare:
X	repe
X	cmpb
X	je	exit
X	jmp	find_difference
Xword_compare:
X	test	si,#1		| align s1 on word boundary
X	jz	word_aligned
X	cmpb
X	jne	find_difference
X	dec	cx
Xword_aligned:
X	mov	dx,cx		| save count
X	shr	cx,#1		| compare words, not bytes
X	jz	almost_done
X	repe
X	cmp
X	je	almost_done
X	mov	ax,-2(si)	| fetch mismatched words
X	sub	ax,-2(di)
X	orb	al,al
X	jz	find_difference	| if low bytes match, high byte must not
X	cbw
X	jmp	.dsret
Xalmost_done:			| most of string compared equal
X	test	dx,#1
X	jz	exit
X	inc	si
X	inc	di
Xfind_difference:
X	movb	al,-1(si)	| mismatch - determine > or <
X	subb	al,-1(di)
X	cbw
Xexit:
X	jmp	.dsret
/
echo x - memmove.S
sed '/^X/s///' > memmove.S << '/'
X| memmove.s
X|	void *memmove(void *s1, const void *s2, size_t n)
X|	void *memcpy(void *s1, const void *s2, size_t n)
X
X|	Copy n characters from the object pointed to by s2 into the
X|	object pointed to by s1.  Copying takes place as if the n
X|	characters pointed to by s2 are first copied to a temporary
X|	area and then copied to the object pointed to by s1.
X
X|	Per X3J11, memcpy may have undefined results if the objects
X|	overlap; since the performance penalty is insignificant, we
X|	use the safe memmove code for it as well.
X
XBYTE_LIMIT = 12			| if n is above this, we work with words
X
X.define	_MEMMOVE, _MEMCPY
X.globl	_MEMMOVE, _MEMCPY
X.text
X_MEMMOVE:
X_MEMCPY:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	cx,8(bp)
X	jcxz	exit		| early exit if n == 0
X	mov	di,4(bp)
X	mov	si,6(bp)
X	cmp	si,di
X	je	exit		| early exit if s1 == s2
X	ja	left_to_right
X	std			| s1 < s2: move right to left
X	add	si,cx		| compute objects' end addresses
X	dec	si
X	add	di,cx
X	dec	di
X	cmp	cx,#BYTE_LIMIT
X	jbe	byte_move
X	test	si,#1		| align source on word boundary
X	jnz	1f
X	movb
X	dec	cx
X1:
X	dec	si		| adjust to word boundary
X	dec	di
X	jmp	word_move
Xleft_to_right:
X	cld			| s1 > s2: move left to right
X	cmp	cx,#BYTE_LIMIT
X	jbe	byte_move
X	test	si,#1		| align source on word boundary
X	jz	word_move
X	movb
X	dec	cx
Xword_move:
X	shr	cx,#1		| move words, not bytes
X	rep
X	movw
X	jnc	exit
X	inc	cx		| set up to move leftover byte
Xbyte_move:
X	rep
X	movb
Xexit:
X	jmp	.dsret
/
echo x - memset386.S
sed '/^X/s///' > memset386.S << '/'
X| memset.s
X|	void *memset(void *s, int c, size_t n)
X
X|	Copies the value of c (converted to unsigned char) into the
X|	first n locations of the object pointed to by s.
X
X| !! 386 version !!
X
XOPERAND_SIZE = 102		| operand size override (i.e. force 32 bits)
XBYTE_LIMIT = 16			| if n is above this, we work with words
X
X.define	_MEMSET
X.globl	_MEMSET
X.text
X_MEMSET:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	cx,8(bp)
X	jcxz	exit		| early exit if n == 0
X	mov	di,4(bp)
X	movb	al,6(bp)
X	cld
X	cmp	cx,#BYTE_LIMIT
X	jbe	byte_set
X	movb	ah,al		| set up second byte
X	test	di,#1		| align on word boundary
X	jz	1f
X	stob
X	dec	cx
X1:
X	test	di,#2		| align on doubleword boundary
X	jz	2f
X	stow
X	dec	cx
X	dec	cx
X2:
X	mov	dx,ax		| duplicate byte in all bytes of EAX
X	.byte	OPERAND_SIZE	| shl eax,16
X	.byte	193
X	.byte	224
X	.byte	16
X	mov	ax,dx
X	mov	dx,cx		| save count
X	shr	cx,#1		| set double words, not bytes
X	shr	cx,#1
X	rep
X	.byte	OPERAND_SIZE
X	stow
X	and	dx,#3		| set up to set leftover bytes
X	jz	exit
X	mov	cx,dx
Xbyte_set:
X	rep
X	stob
Xexit:
X	jmp	.dsret
/
echo x - memset86.S
sed '/^X/s///' > memset86.S << '/'
X| memset.s
X|	void *memset(void *s, int c, size_t n)
X
X|	Copies the value of c (converted to unsigned char) into the
X|	first n locations of the object pointed to by s.
X
XBYTE_LIMIT = 10			| if n is above this, we work with words
X
X.define	_MEMSET
X.globl	_MEMSET
X.text
X_MEMSET:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	cx,8(bp)
X	jcxz	exit		| early exit if n == 0
X	mov	di,4(bp)
X	movb	al,6(bp)
X	cld
X	cmp	cx,#BYTE_LIMIT
X	jbe	byte_set
X	movb	ah,al		| set up high byte
X	test	di,#1		| align on word boundary
X	jz	1f
X	stob
X	dec	cx
X1:
X	shr	cx,#1		| set words, not bytes
X	rep
X	stow
X	jnc	exit
X	inc	cx		| set up for last byte
Xbyte_set:
X	rep
X	stob
Xexit:
X	jmp	.dsret
/
echo x - strcat.S
sed '/^X/s///' > strcat.S << '/'
X| strcat.s
X|	char *strcat(char *s1, const char *s2)
X
X|	Concatenates the string pointed to by s2 onto the end of the
X|	string pointed to by s1.  Returns s1.
X
X.define _STRCAT
X.globl	_STRCAT
X.text
X_STRCAT:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	di,4(bp)
X	mov	bx,di
X	mov	si,6(bp)
X	cld
X	mov	cx,#-1		| find end of s1
X	xorb	al,al
X	repne
X	scab
X	dec	di		| point back at null character
X	test	si,#1		| align source on word boundary
X	jz	word_copy
X	lodb
X	stob
X	orb	al,al
X	jz	exit
Xword_copy:			| loop to copy words
X	lodw
X	orb	al,al
X	jz	move_last_byte	| early exit if low byte == 0
X	stow
X	orb	ah,ah
X	jnz	word_copy
X	jmp	exit
Xmove_last_byte:
X	stob			| add odd zero byte
Xexit:
X	mov	ax,bx
X	jmp	.dsret
/
echo x - strchr.S
sed '/^X/s///' > strchr.S << '/'
X| strchr.s
X|	char *strchr(const char *s, int c)
X
X|	Returns location of the first occurrence of c (converted to char)
X|	in the string pointed to by s.  Returns NULL if c does not occur.
X
X.define	_STRCHR
X.globl	_STRCHR
X.text
X_STRCHR:
X	push	bp
X	mov	bp,sp
X	push	si
X	mov	si,4(bp)
X	movb	dl,6(bp)
X	cld
X	test	si,#1		| align source on word boundary
X	jz	word_loop
X	lodb
X	cmpb	al,dl
X	je	one_past
X	orb	al,al
X	jz	no_match
Xword_loop:
X	lodw
X	cmpb	al,dl
X	je	two_past
X	orb	al,al
X	jz	no_match
X	cmpb	ah,dl
X	je	one_past
X	orb	ah,ah
X	jnz	word_loop
Xno_match:
X	xor	ax,ax
X	jmp	.sret
Xtwo_past:
X	dec	si
Xone_past:
X	dec	si
X	mov	ax,si
X	jmp	.sret
/
echo x - strcmp.S
sed '/^X/s///' > strcmp.S << '/'
X| strcmp.s
X|	int strcmp(const char *s1, const char *s2)
X
X|	Compares the strings pointed to by s1 and s2.  Returns zero if
X|	strings are identical, a positive number if s1 greater than s2,
X|	and a negative number otherwise.
X
X.define	_STRCMP
X.globl	_STRCMP
X.text
X_STRCMP:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	xor	ax,ax		| default return is equality
X	mov	si,4(bp)
X	mov	di,6(bp)
X	cmp	si,di
X	je	exit		| early exit if s1 == s2
X	cld
X	test	si,#1		| align s1 on word boundary
X	jz	word_loop
X	lodb
X	orb	al,al
X	jz	last_byte_test
X	subb	al,(di)
X	jnz	exit
X	inc	di
Xword_loop:			| loop through string by words
X	mov	ax,(si)
X	orb	al,al
X	jz	last_byte_test
X	orb	ah,ah
X	jz	high_byte_zero
X	cmp
X	je	word_loop
X	mov	ax,-2(si)	| find mismatch in final word
X	sub	ax,-2(di)
X	orb	al,al
X	jnz	exit
X	movb	al,ah
X	jmp	exit
Xhigh_byte_zero:
X	subb	al,(di)
X	jnz	exit
X	movb	al,ah
X	inc	di
Xlast_byte_test:
X	subb	al,(di)
Xexit:
X	cbw
X	jmp	.dsret
/
echo x - strcpy.S
sed '/^X/s///' > strcpy.S << '/'
X| strcpy.s
X|	char *strcpy(char *s1, const char *s2)
X
X|	Copy the string pointed to by s2, including the terminating null
X|	character, into the array pointed to by s1.  Returns s1.
X
X.define	_STRCPY
X.globl	_STRCPY
X.text
X_STRCPY:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	di,4(bp)
X	mov	bx,di
X	mov	si,6(bp)
X	cld
X	test	si,#1		| align source on word boundary
X	jz	word_copy
X	lodb
X	stob
X	orb	al,al
X	jz	exit
Xword_copy:			| loop to copy words
X	lodw
X	orb	al,al
X	jz	move_last_byte	| early exit if low byte == 0
X	stow
X	orb	ah,ah
X	jnz	word_copy
X	jmp	exit
Xmove_last_byte:
X	stob			| add odd zero byte
Xexit:
X	mov	ax,bx
X	jmp	.dsret
/
echo x - strcspn.S
sed '/^X/s///' > strcspn.S << '/'
X| strcspn.s
X|	size_t strcspn(const char *s1, const char *s2)
X
X|	Returns the length of the longest prefix of the string pointed
X|	to by s1 that has none of the characters in the string s2.
X
X.define	_STRCSPN
X.globl	_STRCSPN
X.text
X_STRCSPN:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	si,4(bp)
X	mov	di,6(bp)
X	cld
X	mov	bx,#-1		| set up character count (-1 for faster loops)
X	cmpb	(di),*0
X	jz	s1_length	| if s2 has length zero, we return s1's length
X	inc	di
X	cmpb	(di),*0
X	jz	find_match	| if s2 has length one, we take a shortcut
X	mov	cx,bx		| find length of s2
X	xorb	al,al
X	repne
X	scab
X	not	cx
X|	dec	cx		| don't need this: we started at s2[1]
X	mov	dx,cx		| save length of s2
Xs1_loop:			| loop through s1 looking for matches with s2
X	lodb
X	inc	bx
X	orb	al,al
X	je	exit
X	mov	di,6(bp)
X	mov	cx,dx
X	repne
X	scab
X	jne	s1_loop
X	jmp	exit
Xs1_length:			| find length of s1
X	mov	di,si
X	mov	cx,bx
X	xorb	al,al
X	repne
X	scab
X	not	cx
X	dec	cx
X	mov	bx,cx
X	jmp	exit
Xfind_match:			| find a match for *s2 in s1
X	movb	dl,-1(di)
X	test	si,#1		| align source on word boundary
X	jz	word_loop
X	lodb
X	inc	bx
X	orb	al,al
X	je	exit
X	cmpb	al,dl
X	je	exit
Xword_loop:
X	lodw
X	inc	bx
X	orb	al,al
X	je	exit
X	cmpb	al,dl
X	je	exit
X	inc	bx
X	orb	ah,ah
X	je	exit
X	cmpb	ah,dl
X	jne	word_loop
Xexit:
X	mov	ax,bx
X	jmp	.dsret
/
echo x - strlen.S
sed '/^X/s///' > strlen.S << '/'
X| strlen.s
X|	size_t strlen(const char *s)
X
X|	Returns the length of the string pointed to by s.
X
X.define	_STRLEN
X.globl	_STRLEN
X.text
X_STRLEN:
X	push	bp
X	mov	bp,sp
X	push	di
X	mov	di,4(bp)
X	cld
X	mov	cx,#-1
X	xorb	al,al
X	repne
X	scab
X	not	cx		| silly trick gives length (including null)
X	dec	cx		| forget about null
X	mov	ax,cx
X	pop	di
X	mov	sp,bp
X	pop	bp
X	ret
/
echo x - strncat.S
sed '/^X/s///' > strncat.S << '/'
X| strncat.s
X|	char *strncat(char *s1, const char *s2, size_t n)
X
X|	Concatenates up to n characters of the string pointed to by s2
X|	onto the end of the string pointed to by s1.  A terminating
X|	null character is always appended.  Returns s1.
X
X.define	_STRNCAT
X.globl	_STRNCAT
X.text
X_STRNCAT:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	cx,8(bp)
X	jcxz	exit		| early exit if n == 0
X	mov	di,4(bp)
X	mov	si,6(bp)
X	cld
X	mov	cx,#-1		| find end of s1
X	xorb	al,al
X	repne
X	scab
X	dec	di
X	mov	cx,8(bp)
Xbyte_loop:			| loop to copy bytes
X	lodb
X	stob
X	orb	al,al
X	loopnz	byte_loop
X	jz	exit
X	movb	(di),*0		| add terminating null character
Xexit:
X	mov	ax,4(bp)
X	jmp	.dsret
/
echo x - strncmp.S
sed '/^X/s///' > strncmp.S << '/'
X| strncmp.s
X|	int strncmp(const char *s1, const char *s2, size_t n)
X
X|	Compares up to n characters from the strings pointed to by s1
X|	and s2.  Returns zero if the (possibly null terminated) arrays
X|	are identical, a positive number if s1 is greater than s2, and
X|	a negative number otherwise.
X
X.define	_STRNCMP
X.globl	_STRNCMP
X.text
X_STRNCMP:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	xor	ax,ax		| default result is equality
X	mov	cx,8(bp)
X	jcxz	exit		| early exit if n == 0
X	mov	si,4(bp)
X	mov	di,6(bp)
X	cmp	si,di
X	je	exit		| early exit if s1 == s2
X	cld
X	test	si,#1		| align s1 on word boundary
X	jz	set_length
X	lodb
X	orb	al,al
X	jz	last_byte_test
X	subb	al,(di)
X	jne	exit
X	dec	cx
X	jz	exit		| early exit if n == 1
X	inc	di
Xset_length:
X	mov	dx,cx		| save count
X	shr	cx,#1		| work with words, not bytes
X	jz	fetch_last_byte
Xword_loop:			| loop through string by words
X	mov	ax,(si)
X	orb	al,al
X	jz	last_byte_test
X	orb	ah,ah
X	jz	high_byte_zero
X	cmp
X	loope	word_loop
X	je	fetch_last_byte
X	mov	ax,-2(si)	| find mismatch in final word
X	sub	ax,-2(di)
X	orb	al,al
X	jnz	exit
X	movb	al,ah
X	jmp	exit
Xfetch_last_byte:
X	xor	ax,ax
X	test	dx,#1
X	jz	exit
X	movb	al,(si)
X	jmp	last_byte_test
Xhigh_byte_zero:
X	subb	al,(di)
X	jnz	exit
X	movb	al,ah
X	inc	di
Xlast_byte_test:
X	subb	al,(di)
Xexit:
X	cbw
X	jmp	.dsret
/
echo x - strncpy.S
sed '/^X/s///' > strncpy.S << '/'
X| strncpy.s
X|	char *strncpy(char *s1, const char *s2, size_t n)
X
X|	Copy up to n characters from the string pointed to by s2 to
X|	the array pointed to by s1.  If the source string is shorter
X|	than n characters, the remainder of the destination is padded
X|	with null characters.  If the source is longer than n characters,
X|	the destination will not be null terminated.  Returns s1.
X
XBYTE_LIMIT = 10			| if n is above this, zero fill with words
X
X.define	_STRNCPY
X.globl	_STRNCPY
X.text
X_STRNCPY:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	cx,8(bp)
X	jcxz	exit		| early exit if n == 0
X	mov	di,4(bp)
X	mov	si,6(bp)
X	cld
X	cmpb	(si),*0
X	je	zero_fill	| if s2 has length zero, take a short cut
X	test	si,#1		| align source on word boundary
X	jz	set_length
X	movb
X	dec	cx
X	jz	exit		| early exit if n == 1
Xset_length:
X	mov	dx,cx		| save count
X	shr	cx,#1		| copy words, not bytes
X	jz	last_byte
Xword_copy:			| loop to copy words
X	lodw
X	orb	al,al
X	jz	restore_length	| early exit if low byte == 0
X	stow
X	orb	ah,ah
X	loopnz	word_copy
X	jz	restore_length
Xlast_byte:
X	test	dx,#1		| move leftover byte
X	jz	exit
X	movb
X	jmp	exit
Xrestore_length:			| retrieve remaining length (in bytes)
X	shl	cx,#1
X	and	dx,#1
X	add	cx,dx
Xzero_fill:			| add null characters if necessary
X	xor	ax,ax
X	cmp	cx,#BYTE_LIMIT
X	jbe	zero_bytes
X	test	di,#1		| align destination on word boundary
X	jz	zero_words
X	stob
X	dec	cx
Xzero_words:
X	mov	dx,cx		| save count
X	shr	cx,#1		| zero words, not bytes
X	rep
X	stow
X	test	dx,#1
X	jz	exit
X	inc	cx		| set up to zero leftover byte
Xzero_bytes:
X	rep
X	stob
Xexit:
X	mov	ax,4(bp)
X	jmp	.dsret
/
echo x - strpbrk.S
sed '/^X/s///' > strpbrk.S << '/'
X| strpbrk.s
X|	char *strpbrk(const char *s1, const char *s2)
X
X|	Returns the address of the first character of the string pointed
X|	to by s1 that is in the string pointed to by s2.  Returns NULL
X|	if no such character exists.
X
X.define	_STRPBRK
X.globl	_STRPBRK
X.text
X_STRPBRK:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	si,4(bp)
X	mov	di,6(bp)
X	cld
X	xor	ax,ax		| default return value is NULL
X	cmpb	(di),*0
X	jz	exit		| if s2 has length zero, we are done
X	inc	di
X	cmpb	(di),*0
X	jz	find_match	| if s2 has length one, we take a shortcut
X	mov	cx,#-1		| find length of s2
X	repne
X	scab
X	not	cx
X|	dec	cx		| don't need this: we started at s2[1]
X	mov	dx,cx		| save length of s2
Xs1_loop:			| loop through s1 looking for matches with s2
X	lodb
X	orb	al,al
X	jz	exit
X	mov	di,6(bp)
X	mov	cx,dx
X	repne
X	scab
X	jne	s1_loop
X	lea	ax,-1(si)
X	jmp	exit
Xfind_match:			| find a match for *s2 in s1
X	movb	dl,-1(di)
X	test	si,#1		| align source on word boundary
X	jz	word_loop
X	lodb
X	cmpb	al,dl
X	je	one_past
X	orb	al,al
X	jz	no_match
Xword_loop:
X	lodw
X	cmpb	al,dl
X	je	two_past
X	orb	al,al
X	jz	no_match
X	cmpb	ah,dl
X	je	one_past
X	orb	ah,ah
X	jnz	word_loop
Xno_match:
X	xor	ax,ax
X	jmp	.dsret
Xtwo_past:
X	dec	si
Xone_past:
X	dec	si
X	mov	ax,si
Xexit:
X	jmp	.dsret
/
echo x - strrchr.S
sed '/^X/s///' > strrchr.S << '/'
X| strrchr.s
X|	char *strrchr(const char *s, int c)
X
X|	Locates final occurrence of c (as unsigned char) in string s.
X
X.define	_STRRCHR
X.globl	_STRRCHR
X.text
X_STRRCHR:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	xor	bx,bx		| default result is NULL
X	mov	di,4(bp)
X	cld
X	mov	cx,#-1		| find end of string
X	xorb	al,al
X	repne
X	scab
X	not	cx		| silly trick gives length (including null)
X	dec	di		| point back at null character
X	movb	al,6(bp)	| find last occurrence of c
X	std
X	repne
X	scab
X	jne	exit
X	lea	bx,1(di)	| [inc di; mov bx,di] better on 8088/8086
Xexit:
X	mov	ax,bx
X	pop	di
X	mov	sp,bp
X	pop	bp
X	ret
/
echo x - strspn.S
sed '/^X/s///' > strspn.S << '/'
X| strspn.s
X|	size_t strspn(const char *s1, const char *s2)
X
X|	Returns the length of the longest prefix of the string pointed
X|	to by s1 that is made up of the characters in the string s2.
X
X.define	_STRSPN
X.globl	_STRSPN
X.text
X_STRSPN:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	si,4(bp)
X	mov	di,6(bp)
X	cld
X	xor	bx,bx		| zero count of valid characters
X	cmpb	(di),*0
X	jz	exit		| if s2 has length zero, we are done
X	inc	di
X	cmpb	(di),*0
X	jz	find_mismatch	| if s2 has length one, we take a shortcut
X	mov	cx,#-1		| find length of s2
X	xorb	al,al
X	repne
X	scab
X	not	cx
X|	dec	cx		| don't need this: we started at s2[1]
X	mov	dx,cx		| save length of s2
X	dec	bx		| set up for faster loop
Xs1_loop:			| loop through s1 looking for matches with s2
X	lodb
X	inc	bx
X	orb	al,al
X	je	exit
X	mov	di,6(bp)
X	mov	cx,dx
X	repne
X	scab
X	je	s1_loop
X	jmp	exit
Xfind_mismatch:			| find a character in s1 that isn't *s2
X	movb	al,-1(di)
X	mov	di,si
X	mov	cx,#-1
X	repe
X	scab
X	dec	di		| point back at mismatch
X	mov	bx,di
X	sub	bx,si		| number of matched characters
Xexit:
X	mov	ax,bx
X	jmp	.dsret
/
echo x - strstr.S
sed '/^X/s///' > strstr.S << '/'
X| strstr.s
X|	char * strstr(const char *s1, const char *s2)
X
X|	Returns a pointer to the first occurrence in the string pointed
X|	to by s1 that is made up of the characters in the string s2.
X
X.define	_STRSTR
X.globl	_STRSTR
X.text
X_STRSTR:
X	push	bp
X	mov	bp,sp
X	sub	sp,#2		| make room for locals
X	push	si
X	push	di
X	mov	si,4(bp)
X	mov	di,6(bp)
X	mov	bx,si		| default result is s1
X	movb	ah,(di)		| fetch first character of s2
X	orb	ah,ah
X	je	exit		| if s2 is null, we are done
X	cld
X	mov	cx,#-1		| find length of s2
X	xorb	al,al
X	repne
X	scab
X	not	cx
X	dec	cx
X	mov	-2(bp),cx	| save length of s2
X	mov	cx,#-1		| find length + 1 of s1
X	mov	di,si
X	repne
X	scab
X	not	cx
X	sub	cx,-2(bp)	| |s1| - |s2| + 1 is number of possibilities
X	jbe	not_found	| if |s1| < |s2|, give up right now
X	mov	dx,cx
X	inc	dx		| set up for faster loop
X	dec	bx
Xs1_loop:
X	dec	dx
X	jz	not_found
X	inc	bx
X	cmpb	ah,(bx)
X	jne	s1_loop		| if first characters don't match, try another
X	mov	di,6(bp)
X	mov	si,bx
X	mov	cx,-2(bp)
X	repe
X	cmpb
X	jne	s1_loop
X	jmp	exit
Xnot_found:
X	xor	bx,bx
Xexit:
X	mov	ax,bx
X	jmp	.dsret
/
echo x - strtok.S
sed '/^X/s///' > strtok.S << '/'
X| strtok.s
X|	char *strtok(char *s1, const char *s2)
X
X|	Returns a pointer to the "next" token in s1.  Tokens are 
X|	delimited by the characters in the string pointed to by s2.
X
X.define	_STRTOK
X.globl	_STRTOK
X.data
Xscan:	.word	0
X.text
X_STRTOK:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	cld
X	mov	bx,4(bp)
X	or	bx,bx		| if s != NULL,
X	jnz	s2_length	|   we start a new string
X	mov	bx,scan
X	or	bx,bx		| if old string exhausted,
X	jz	exit		|   exit early
Xs2_length:			| find length of s2
X	mov	di,6(bp)
X	mov	cx,#-1
X	xorb	al,al
X	repne
X	scab
X	not	cx
X	dec	cx
X	jz	string_finished	| if s2 has length zero, we are done
X	mov	dx,cx		| save length of s2
X
X	mov	si,bx
X	xor	bx,bx		| return value is NULL
Xdelim_loop:			| dispose of leading delimiters
X	lodb
X	orb	al,al
X	jz	string_finished
X	mov	di,6(bp)
X	mov	cx,dx
X	repne
X	scab
X	je	delim_loop
X
X	lea	bx,-1(si)	| return value is start of token
Xtoken_loop:			| find end of token
X	lodb
X	orb	al,al
X	jz	string_finished
X	mov	di,6(bp)
X	mov	cx,dx
X	repne
X	scab
X	jne	token_loop
X	movb	-1(si),*0	| terminate token
X	mov	scan,si		| set up for next call
X	jmp	exit
Xstring_finished:
X	mov	scan,#0		| ensure NULL return in future
Xexit:
X	mov	ax,bx
X	jmp	.dsret
/

henry@utzoo.uucp (Henry Spencer) (10/11/89)

In article <20136@princeton.Princeton.EDU> nfs@notecnirp.UUCP (Norbert Schlenker) writes:
>... The recent Dhrystone
>1.1 posting by ast shows a 40% increase in Dhrystone rating on my 
>machine with these routines (only strcpy() and strcmp() are used there).

One should not put too much emphasis on this.  This is a known bug in
Dhrystone -- it puts too much weight on the string functions and uses
them in atypically favorable ways.  Many system suppliers find that it
is well worth their while to optimize the bejesus out of their string
routines for the sake of Dhrystone numbers.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

nfs@notecnirp.Princeton.EDU (Norbert Schlenker) (10/11/89)

I just hate to look like an idiot in front of thousands, but this 
morning's posting of string routines is going to accomplish exactly
that.  I didn't try linking (out of the library) four routines that
I changed just recently.  Naturally, I got bitten by my oversight.
A slight code rearrangement is needed to placate asld.  Here are
the context diffs for the routines that will cause problems.

My profuse apologies.

-------------------------------- Cut here --------------------------------
echo x - memcmp.S.cdif
sed '/^X/s///' > memcmp.S.cdif << '/'
X*** memcmp.S.orig	Fri Oct  6 10:01:32 1989
X--- memcmp.S	Tue Oct 10 19:20:19 1989
X***************
X*** 5,14 ****
X  |	s1 and s2.  Returns zero if all characters are identical, a
X  |	positive number if s1 greater than s2, a negative number otherwise.
X  
X! BYTE_LIMIT = 10			| if n is above this, we work with words
X! 
X! .define	_MEMCMP
X! .globl	_MEMCMP
X  .text
X  _MEMCMP:
X  	push	bp
X--- 5,15 ----
X  |	s1 and s2.  Returns zero if all characters are identical, a
X  |	positive number if s1 greater than s2, a negative number otherwise.
X  
X! .define	_MEMCMP
X! .globl	_MEMCMP
X! 
X! BYTE_LIMIT = 10			| if n is above this, we work with words
X! 
X  .text
X  _MEMCMP:
X  	push	bp
/
echo x - memmove.S.cdif
sed '/^X/s///' > memmove.S.cdif << '/'
X*** memmove.S.orig	Fri Oct  6 10:04:03 1989
X--- memmove.S	Tue Oct 10 19:20:29 1989
X***************
X*** 11,20 ****
X  |	overlap; since the performance penalty is insignificant, we
X  |	use the safe memmove code for it as well.
X  
X! BYTE_LIMIT = 12			| if n is above this, we work with words
X! 
X! .define	_MEMMOVE, _MEMCPY
X! .globl	_MEMMOVE, _MEMCPY
X  .text
X  _MEMMOVE:
X  _MEMCPY:
X--- 11,21 ----
X  |	overlap; since the performance penalty is insignificant, we
X  |	use the safe memmove code for it as well.
X  
X! .define	_MEMMOVE, _MEMCPY
X! .globl	_MEMMOVE, _MEMCPY
X! 
X! BYTE_LIMIT = 12			| if n is above this, we work with words
X! 
X  .text
X  _MEMMOVE:
X  _MEMCPY:
/
echo x - strncpy.S.cdif
sed '/^X/s///' > strncpy.S.cdif << '/'
X*** strncpy.S.orig	Fri Oct  6 11:11:42 1989
X--- strncpy.S	Tue Oct 10 19:20:41 1989
X***************
X*** 7,16 ****
X  |	with null characters.  If the source is longer than n characters,
X  |	the destination will not be null terminated.  Returns s1.
X  
X! BYTE_LIMIT = 10			| if n is above this, zero fill with words
X! 
X! .define	_STRNCPY
X! .globl	_STRNCPY
X  .text
X  _STRNCPY:
X  	push	bp
X--- 7,17 ----
X  |	with null characters.  If the source is longer than n characters,
X  |	the destination will not be null terminated.  Returns s1.
X  
X! .define	_STRNCPY
X! .globl	_STRNCPY
X! 
X! BYTE_LIMIT = 10			| if n is above this, zero fill with words
X! 
X  .text
X  _STRNCPY:
X  	push	bp
/