[comp.os.minix] Faster string package

nfs@notecnirp.Princeton.EDU (Norbert Schlenker) (10/19/89)

Here is an updated version of the string routines I recently posted.
Only one serious bug was found (but that may be because nobody is
using the code!).  This version differs in the following ways:

  - Standard headers included.
  - Strerror() included (by popular request).
  - Packaging problems fixed.
  - Faster for small inputs (improved linkage).
  - Uses Bruce Evans' method of generating assembler code through
    the C preprocessor.  That makes generating different versions
    for different machines a little simpler.

I have yet to implement strcoll() and strxfrm(), which are ANSI
standard, because I can't figure out what the standard intends.
Does anyone know how to implement the rigmarole involving <locale.h>?
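
For the default "C" locale at least, there is little to do: collation
order is plain byte order.  A minimal sketch in C under that assumption
follows.  The names c_strcoll and c_strxfrm are made up for illustration,
and nothing here deals with setlocale() or any other locale.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, chosen so as not to clash with a real libc. */

/* In the "C" locale, collation order is plain byte order, so
 * strcoll() can simply defer to strcmp(). */
int c_strcoll(const char *s1, const char *s2)
{
    return strcmp(s1, s2);
}

/* strxfrm() must produce a string such that strcmp() on two
 * transformed strings agrees with strcoll() on the originals.
 * In the "C" locale the identity transformation does that.
 * Returns the length of the transformed string (excluding the
 * null); if that is >= n, dst is left untouched. */
size_t c_strxfrm(char *dst, const char *src, size_t n)
{
    size_t len = strlen(src);

    if (len < n)
        memcpy(dst, src, len + 1);  /* fits, null included */
    return len;
}
```

A real implementation would consult whatever collation table setlocale()
selected; the sketch covers only the case where no such table exists.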

Although I received a few requests for it, memccpy() isn't implemented
either.  I have inconsistent documentation regarding its function, so
am disinclined to foist an implementation on an unsuspecting public.
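
One commonly documented behaviour, the one in the System V manuals and
later adopted by POSIX, is that memccpy() copies bytes from src to dst,
stops after copying the first occurrence of c, and returns a pointer to
the byte following that copy in dst, or NULL if c does not occur in the
first n bytes.  A sketch of that reading in C, under the hypothetical
name sv_memccpy, and not part of the package below:

```c
#include <stddef.h>

/* Hypothetical name; follows the System V / POSIX description only. */
void *sv_memccpy(void *dst, const void *src, int c, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    unsigned char stop = (unsigned char) c;

    while (n-- > 0) {
        if ((*d++ = *s++) == stop)
            return d;       /* one past the copy of c in dst */
    }
    return NULL;            /* c not seen in the first n bytes */
}
```

Other manuals may describe the return value differently, which is the
inconsistency mentioned above; the sketch takes no side beyond the
reading stated in its lead-in.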

Thanks to Earl Chew for pointing out the serious bug in memmove().
Thanks to Bruce Evans for his constructive suggestions about speed.
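
For readers who prefer C to assembler, the direction test in memmove.x
can be sketched roughly as follows.  This is an illustration only
(sketch_memmove is a hypothetical name), it moves bytes rather than
words, and the pointer comparison assumes both pointers live in one flat
address space, as they do in a Minix data segment.

```c
#include <stddef.h>

/* Hypothetical C rendering of the direction test in memmove.x. */
void *sketch_memmove(void *s1, const void *s2, size_t n)
{
    unsigned char *d = s1;
    const unsigned char *s = s2;

    if (n == 0 || d == s)
        return s1;                  /* nothing to do */

    /* Assumes a flat address space so that d < s is meaningful. */
    if (d < s || (size_t)(d - s) >= n) {
        /* Destination below source, or no overlap: left to right. */
        while (n-- > 0)
            *d++ = *s++;
    } else {
        /* Destination overlaps the tail of the source: right to left. */
        d += n;
        s += n;
        while (n-- > 0)
            *--d = *--s;
    }
    return s1;
}
```

Copying right to left only when the destination starts inside the source
keeps the common non-overlapping case on the forward path.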

----------------------------  Cut here -------------------------------
#!/bin/sh
echo x - Readme
sed '/^X/s///' > Readme << '/'
XHaving done some people a favour by posting a new stdio package, which I
Xbelieve to be portable across Minix platforms, I am now going to do others a
Xfavour (but to ast, perhaps a disservice) by posting an assembly language
Xversion of the ANSI C string package for PC's and compatibles.  All of
Xthe required routines, save for strcoll() and strxfrm() [missing since
XI don't have a use for locale specific routines] are included.  The
XBerkeley compatibility routines are not included, although the included
X<string.h> defines them as macros.  memccpy() is not included, since I
Xcannot find consistent documentation as to its function.
X
XHenry Spencer's C routines are widely used, very reliable, very portable,
Xand are easily compiled into reasonably efficient code.  They can take no
Xadvantage, of course, of special architectural features, which Intel
Xprocessors possess in abundance.  If the best that could be done were a
X10-20% improvement in the string code, which I would consider fairly
Xtypical for assembly over C, I wouldn't consider it worthwhile.
XBut my rewritten routines show much larger improvements for typical
Xinputs - from 40% to 95% depending on function.
X
XThe code that I write tends to use a lot of str[len|cpy|cmp]() and
Xmem[set|cpy]().  The improvements for these routines are substantial
Xenough that I use my assembly language versions.  The recent Dhrystone
X1.1 posting by ast shows a 40% increase in Dhrystone rating on my 
Xmachine with these routines (only strcpy() and strcmp() are used there).
X
XThe code is faster for a number of reasons.  It uses special instructions
Xnot generated by the C compiler, pays careful attention to register
Xcontents, uses simplified linkage, unrolls most loops once, and takes
Xadvantage of word alignment where possible.  The first three involve
Xfairly simple adaptations of Spencer's C code to the Intel architecture.
XThe last two are sometimes unpleasant.  Unrolling loops once saves 10-15%
Xin most cases; the attention to alignment saves 3-5% on top of that.  The
Xcode is less clear (in some cases, MUCH less clear) and harder to debug,
Xbut a combined 13-20% is not to be sniffed at.
X
XThe code was optimized on a Toshiba 5100, which has an 80386 and uncached
X1-wait state 32 bit memory.  As a result, the code may not be optimal on
Xother machines.  I expect it to be quite good for 16 bit CPU's, with
Xperhaps slightly less improvement on 8088's, where the 8 bit bus means
Xthe attention to alignment is wasted.
X
XI am open to bug reports and suggestions for improvement.  I am also
Xinterested in reports of performance on other machines.  To that end, I
Xhave included a program that computes the improvement for a variety of
Xroutines automagically; please email the results to me.  The program
Xasks for a description of the machine, which it uses as a header - the
XCPU and a description of the memory architecture are what I'd like to see.
XI have included a copy of the program's output for my machine in the
Xfile Perf.T5100; I hope the improvements shown there will be typical.
X
XTo use the new string package, check that the macro definitions at the
Xtop of the makefile are compatible with your configuration.  By default,
Xthe makefile generates the performance comparison with the existing library.
XIf you "make install", packed versions of the routines will be installed in
Xyour library.  New versions of two headers, <prototype.h> and <string.h>,
Xwill also be copied into your header directory.
X
XEnjoy!
/
echo x - Copyright
sed '/^X/s///' > Copyright << '/'
XCopyright 1989 Norbert Schlenker.  All rights reserved.
X
XThe copyright holder hereby grants to the public the right to use,
Xmodify, and redistribute this software freely.
X
XThis software is provided "as is" and carries no warranty of any kind.
/
echo x - Makefile
sed '/^X/s///' > Makefile << '/'
X# Should reflect the location of your C preprocessor.
XCPP = /usr/lib/cpp
X
X# Should reflect the location of your headers.
XINCLUDE = /usr/include
X
X# Should reflect the location of your library.
XLIBC = /usr/lib/libc.a
X
X# Should reflect the target machine.
XTARGET = i8088
X#TARGET = i80286
X#TARGET = i80386
X
X# The rest should be fine as is.
XSRCS =	memchr.x memcmp.x memmove.x memset.x strcat.x strchr.x \
X	strcmp.x strcpy.x strcspn.x strerror.x strlen.x strncat.x \
X	strncmp.x strncpy.x strpbrk.x strrchr.x strspn.x strstr.x \
X	strtok.x
X
XCPPS =	memchr.S memcmp.S memmove.S memset.S strcat.S strchr.S \
X	strcmp.S strcpy.S strcspn.S strlen.S strncat.S strncmp.S \
X	strncpy.S strpbrk.S strrchr.S strspn.S strstr.S strtok.S
X
XOBJS =	memchr.s memcmp.s memmove.s memset.s strcat.s strchr.s \
X	strcmp.s strcpy.s strcspn.s strerror.s strlen.s strncat.s \
X	strncmp.s strncpy.s strpbrk.s strrchr.s strspn.s strstr.s \
X	strtok.s
X
X.SUFFIXES: .s .S .x .o .c .y
X
X.x.S:
X	$(CPP) -D$(TARGET) $*.x | \
X	sed '/^$$/d; /^\#/d' >$*.S
X
X.x.s:
X	$(CPP) -D$(TARGET) $*.x | \
X	sed '/^$$/d; /^\#/d' | \
X	sed 's/_MEM/_mem/g; s/_STR/_str/g' | \
X	libpack >$*.s
X
Xperf:	perf.c $(CPPS)
X	cc -o perf perf.c $(CPPS)
X	perf >Perf.local
X
Xinstall: $(OBJS)
X	cp prototype.h string.h $(INCLUDE)
X	ar dv $(LIBC) memcpy.s		# Spencer's memcpy replaced by memmove
X	ar rlv $(LIBC) $?
/
echo x - Perf.T5100
sed '/^X/s///' > Perf.T5100 << '/'
XMachine: Toshiba 5100 (80386; 16 MHz; 1-ws 32 bit memory; no cache)
X
XFunctions called 25,000 times each
X
XFunction call    		% of old time
X-----------------		-------------
Xmemcpy(s1, s2, n)		[n=4]:33	[n=25]:18	[n=1024]:7
Xstrcpy(s1, s2)			[s2=ATOE]:44	[s2=ATOZ]:44
Xstrncpy(s1, s2, n)		[s2=ATOZ,n=10]:41
Xstrcat(ATOJ, ATOE + 1)		46
Xmemcmp(buf, buf2, n)		[n=4]:25	[n=25]:12	[n=1024]:7
Xstrcmp(s1, s2)			[len=5]:39	[len=25]:35
Xstrncmp(s1, s2, n)		[n=4]:44	[n=25]:31
Xmemchr(ATOZ, c, 25)		[c='E']:27	[c='Z']:21
Xstrchr(ATOZ, c)			[c='E']:37	[c='Z']:32
Xstrcspn("word list", s)		[s=" "]:34	[s=" \t\r\n"]:38
Xstrpbrk("word list", s)		[s=" "]:29	[s=" \t\r\n"]:33
Xstrrchr(ATOZ, c)		[c='A']:31	[c='M']:25	[c='Z']:18
Xstrspn("0175713", "01234567")	30
Xstrstr(ATOZ, s)			[s="a"]:55	[s="y"]:55	[s="klmnop"]:48
Xstrtok("a a ... a a"," ") ...
Xstrtok(NULL, " ") ...		38
Xmemset(buf, 0, n)		[n=4]:30	[n=1024]:9	[n=1024(386)]:5
Xstrlen(s)			[s=ATOE]:40	[s=ATOZ]:35
/
echo x - prototype.h
sed '/^X/s///' > prototype.h << '/'
X#ifndef __PROTOTYPE_H
X#define __PROTOTYPE_H
X
X#ifdef __STDC__
X#define _PROTO(p) p
X#else
X#define _PROTO(p) ()
X#define const
X#endif
X
X#endif /* !defined __PROTOTYPE_H */
/
echo x - string.h
sed '/^X/s///' > string.h << '/'
X#ifndef __STRING_H
X#define __STRING_H
X
X/* --- Inclusions --- */
X#include "prototype.h"
X
X/* --- Constants --- */
X#ifndef NULL
X#ifndef __STDC__
X#define NULL	0
X#else
X#define NULL	((void *) 0)
X#endif
X#endif
X
X/* --- Types --- */
X#ifndef __SIZE_T
X#define __SIZE_T
Xtypedef unsigned int size_t;
X#endif
X
X/* --- Prototypes --- */
Xvoid	*memcpy		_PROTO((void *dst, const void *src, size_t n));
Xvoid	*memmove	_PROTO((void *dst, const void *src, size_t n));
Xchar	*strcpy		_PROTO((char *dst, const char *src));
Xchar	*strncpy	_PROTO((char *dst, const char *src, size_t n));
Xchar	*strcat		_PROTO((char *dst, const char *src));
Xchar	*strncat	_PROTO((char *dst, const char *src, size_t n));
Xint	memcmp		_PROTO((const void *s1, const void *s2, size_t n));
Xint	strcmp		_PROTO((const char *s1, const char *s2));
Xint	strcoll		_PROTO((const char *s1, const char *s2));
Xint	strncmp		_PROTO((const char *s1, const char *s2, size_t n));
Xsize_t	strxfrm		_PROTO((char *dst, const char *src, size_t n));
Xvoid	*memchr		_PROTO((const void *s, int c, size_t n));
Xchar	*strchr		_PROTO((const char *s, int c));
Xsize_t	strcspn		_PROTO((const char *s, const char *reject));
Xchar	*strpbrk	_PROTO((const char *s, const char *breakat));
Xchar	*strrchr	_PROTO((const char *s, int c));
Xsize_t	strspn		_PROTO((const char *s, const char *accept));
Xchar	*strstr		_PROTO((const char *s, const char *wanted));
Xchar	*strtok		_PROTO((char *s, const char *delim));
Xvoid	*memset		_PROTO((void *s, int c, size_t n));
Xchar	*strerror	_PROTO((int errnum));
Xsize_t	strlen		_PROTO((const char *s));
X
X/*
X * V7 and Berklix compatibility.
X */
X#ifdef _V7
X#define index(s, c)		strchr(s, c)
X#define rindex(s, c)		strrchr(s, c)
X#endif
X#ifdef _BSD
X#define bcopy(src, dst, n)	memcpy(dst, src, n)
X#define bcmp(s1, s2, n)		memcmp(s1, s2, n)
X#define bzero(dst, n)		memset(dst, 0, n)
X#endif
X
X#endif /* !defined __STRING_H */
/
echo x - memchr.x
sed '/^X/s///' > memchr.x << '/'
X/* memchr.x
X *	void *memchr(const void *s, int c, size_t n)
X *
X *	Returns a pointer to the first occurrence of c (converted to
X *	unsigned char) in the object pointed to by s, NULL if none.
X */
X
X.define	_MEMchr
X.globl	_MEMchr
X.text
X_MEMchr:
X	mov	bx,di		/* save di */
X	mov	di,sp
X	xor	dx,dx		/* default result is NULL */
X	mov	cx,6(di)
X	jcxz	exit		/* early exit if n == 0 */
X	movb	al,4(di)
X	mov	di,2(di)
X	cld
X	repne
X	scab
X	jne	exit
X#ifdef i8088
X	dec	di
X	mov	dx,di
X#else
X	lea	dx,-1(di)
X#endif
Xexit:
X	mov	di,bx		/* restore di */
X	mov	ax,dx
X	ret
/
echo x - memcmp.x
sed '/^X/s///' > memcmp.x << '/'
X/* memcmp.x
X *	int memcmp(const void *s1, const void *s2, size_t n)
X *
X *	Compares the first n characters of the objects pointed to by
X *	s1 and s2.  Returns zero if all characters are identical, a
X *	positive number if s1 greater than s2, a negative number otherwise.
X */
X
X#define BYTE_LIMIT 10		/* if n is above this, work with words */
X
X.define	_MEMcmp
X.globl	_MEMcmp
X.text
X_MEMcmp:
X	mov	bx,sp
X	push	si
X	push	di
X	xor	ax,ax		/* default return is equality */
X	mov	cx,6(bx)
X	jcxz	exit		/* early exit if n == 0 */
X	mov	si,2(bx)
X	mov	di,4(bx)
X	cmp	si,di
X	je	exit		/* early exit if s1 == s2 */
X	cld
X	cmp	cx,*BYTE_LIMIT
X	ja	word_compare
Xbyte_compare:
X	repe
X	cmpb
X	jne	find_difference
X	pop	di
X	pop	si
X	ret
Xword_compare:
X	test	si,#1		/* align s1 on word boundary */
X	jz	word_aligned
X	cmpb
X	jne	find_difference
X	dec	cx
Xword_aligned:
X	mov	dx,cx		/* save count */
X	shr	cx,#1		/* compare words, not bytes */
X	jz	almost_done
X	repe
X	cmp
X	je	almost_done
X	mov	ax,-2(si)	/* fetch mismatched words */
X	sub	ax,-2(di)
X	orb	al,al
X	jz	find_difference	/* if low bytes match, high byte must not */
X	cbw
X	pop	di
X	pop	si
X	ret
Xalmost_done:			/* most of string compared equal */
X	test	dx,#1
X	jz	exit
X	inc	si
X	inc	di
Xfind_difference:
X	movb	al,-1(si)	/* mismatch - determine > or < */
X	subb	al,-1(di)
X	cbw
Xexit:
X	pop	di
X	pop	si
X	ret
/
echo x - memmove.x
sed '/^X/s///' > memmove.x << '/'
X/* memmove.x
X *	void *memmove(void *s1, const void *s2, size_t n)
X *	void *memcpy(void *s1, const void *s2, size_t n)
X *
X *	Copy n characters from the object pointed to by s2 into the
X *	object pointed to by s1.  Copying takes place as if the n
X *	characters pointed to by s2 are first copied to a temporary
X *	area and then copied to the object pointed to by s1.
X *
X *	Per X3J11, memcpy may have undefined results if the objects
X *	overlap; since the performance penalty is insignificant, we
X *	use the safe memmove code for it as well.
X */
X
X#define BYTE_LIMIT 10		/* if n is above this, work with words */
X
X.define	_MEMmove, _MEMcpy
X.globl	_MEMmove, _MEMcpy
X.text
X_MEMmove:
X_MEMcpy:
X	mov	bx,si		/* save si and di */
X	mov	dx,di
X	mov	di,sp
X	mov	cx,6(di)
X	mov	si,4(di)
X	mov	di,2(di)
X	mov	ax,di		/* save a copy of s1 */
X	jcxz	exit		/* early exit if n == 0 */
X	sub	di,si
X	je	exit		/* early exit if s1 == s2 */
X	jb	left_to_right	/* left to right if s1 < s2 */
X	cmp	di,cx
X	jae	left_to_right	/* left to right if no overlap */
Xright_to_left:
X	mov	di,ax		/* retrieve s1 */
X	std
X	add	si,cx		/* compute objects' end addresses */
X	dec	si
X	add	di,cx
X	dec	di
X	cmp	cx,#BYTE_LIMIT
X	jbe	byte_move
X	test	si,#1		/* align source on word boundary */
X	jnz	1f
X	movb
X	dec	cx
X1:
X	dec	si		/* adjust to word boundary */
X	dec	di
X	shr	cx,#1		/* move words, not bytes */
X	rep
X	movw
X	jnc	exit
X#ifdef i8088
X	inc	si		/* fix up addresses for right to left moves */
X	inc	di
X	movb			/* move leftover byte */
X#else
X	movb	cl,1(si)
X	movb	1(di),cl	/* move leftover byte */
X#endif
X	jmp	exit
Xleft_to_right:
X	mov	di,ax		/* retrieve s1 */
X	cld
X	cmp	cx,#BYTE_LIMIT
X	jbe	byte_move
X	test	si,#1		/* align source on word boundary */
X	jz	word_move
X	movb
X	dec	cx
Xword_move:
X	shr	cx,#1		/* move words, not bytes */
X	rep
X	movw
X	rcl	cx,#1		/* set up to move leftover byte */
Xbyte_move:
X	rep
X	movb
Xexit:
X	cld			/* restore direction flag */
X	mov	si,bx		/* restore si and di */
X	mov	di,dx
X	ret
/
echo x - memset.x
sed '/^X/s///' > memset.x << '/'
X/* memset.x
X *	void *memset(void *s, int c, size_t n)
X *
X *	Copies the value of c (converted to unsigned char) into the
X *	first n locations of the object pointed to by s.
X */
X
X#ifdef i80386
X#define BYTE_LIMIT	16	/* if n is above this, work with doublewords */
X#define SIZE_OVERRIDE	.byte 102 /* force 32 bits */
X#define SHLAX(n)	.byte 193,224,n
X#define SHRCX(n)	.byte 193,233,n
X#else
X#define BYTE_LIMIT	10	/* if n is above this, work with words */
X#endif
X
X.define	_MEMset
X.globl	_MEMset
X.text
X_MEMset:
X	mov	bx,di		/* save di */
X	mov	di,sp
X	mov	cx,6(di)
X	jcxz	exit		/* early exit if n == 0 */
X	movb	al,4(di)
X	mov	di,2(di)
X	cld
X	cmp	cx,*BYTE_LIMIT
X	jbe	byte_set
X	movb	ah,al		/* set up second byte */
X	test	di,#1		/* align on word boundary */
X	jz	word_aligned
X	stob
X	dec	cx
Xword_aligned:
X#ifdef i80386
X	test	di,#2		/* align on doubleword boundary */
X	jz	dword_aligned
X	stow
X	sub	cx,*2
Xdword_aligned:
X	mov	dx,ax		/* duplicate byte in all bytes of EAX */
X	SIZE_OVERRIDE
X	SHLAX	(16)
X	mov	ax,dx
X	mov	dx,cx		/* save count */
X	SHRCX	(2)
X	rep
X	SIZE_OVERRIDE
X	stow
X	and	dx,#3		/* set up to set leftover bytes */
X	mov	cx,dx
X#else
X	shr	cx,#1		/* set words, not bytes */
X	rep
X	stow
X	rcl	cx,#1		/* set up to set leftover byte */
X#endif
Xbyte_set:
X	rep
X	stob
Xexit:
X	mov	di,sp
X	mov	ax,2(di)	/* return value is s */
X	mov	di,bx		/* restore di */
X	ret
/
echo x - strcat.x
sed '/^X/s///' > strcat.x << '/'
X/* strcat.x
X *	char *strcat(char *s1, const char *s2)
X *
X *	Concatenates the string pointed to by s2 onto the end of the
X *	string pointed to by s1.  Returns s1.
X */
X
X.define _STRcat
X.globl	_STRcat
X.text
X_STRcat:
X	mov	bx,si		/* save si and di */
X	mov	dx,di
X	mov	si,sp
X	mov	di,2(si)
X	push	di		/* save return value */
X	mov	si,4(si)
X	cmpb	(si),*0
X	je	exit		/* early exit if s2 is the null string */
X	cld
X	mov	cx,#-1		/* find end of s1 */
X	xorb	al,al
X	repne
X	scab
X	dec	di		/* point back at null character */
X	test	si,#1		/* align source on word boundary */
X	jz	word_copy
X	movb
Xword_copy:			/* loop to copy words */
X	lodw
X	orb	al,al
X	jz	move_last_byte	/* exit if low byte == 0 */
X	stow
X	orb	ah,ah
X	jnz	word_copy
X	jmp	exit
Xmove_last_byte:
X	stob			/* add odd zero byte */
Xexit:
X	mov	si,bx
X	mov	di,dx
X	pop	ax
X	ret
/
echo x - strchr.x
sed '/^X/s///' > strchr.x << '/'
X/* strchr.x
X *	char *strchr(const char *s, int c)
X *
X *	Returns location of the first occurrence of c (converted to char)
X *	in the string pointed to by s.  Returns NULL if c does not occur.
X */
X
X.define	_STRchr
X.globl	_STRchr
X.text
X_STRchr:
X	mov	bx,si		/* save si */
X	mov	si,sp
X	movb	dl,4(si)
X	mov	si,2(si)
X	cld
X	test	si,#1		/* align string on word boundary */
X	jz	word_loop
X	lodb
X	cmpb	al,dl
X	je	one_past
X	orb	al,al
X	jz	no_match
Xword_loop:			/* look for c word by word */
X	lodw
X	cmpb	al,dl
X	je	two_past
X	orb	al,al
X	jz	no_match
X	cmpb	ah,dl
X	je	one_past
X	orb	ah,ah
X	jnz	word_loop
Xno_match:
X	xor	ax,ax
X	mov	si,bx		/* restore si */
X	ret
Xtwo_past:
X	dec	si
Xone_past:
X#ifdef i8088
X	dec	si
X	mov	ax,si
X#else
X	lea	ax,-1(si)
X#endif
X	mov	si,bx		/* restore si */
X	ret
/
echo x - strcmp.x
sed '/^X/s///' > strcmp.x << '/'
X/* strcmp.x
X *	int strcmp(const char *s1, const char *s2)
X *
X *	Compares the strings pointed to by s1 and s2.  Returns zero if
X *	strings are identical, a positive number if s1 greater than s2,
X *	and a negative number otherwise.
X */
X
X.define	_STRcmp
X.globl	_STRcmp
X.text
X_STRcmp:
X	mov	bx,si		/* save si and di */
X	mov	cx,di
X	mov	di,sp
X	mov	si,2(di)
X	mov	di,4(di)
X	xor	ax,ax		/* default return is equality */
X	cmp	si,di
X	je	exit		/* early exit if s1 == s2 */
X	cld
X	test	si,#1		/* align s1 on word boundary */
X	jz	word_loop
X	lodb
X	orb	al,al
X	jz	last_byte_test
X	subb	al,(di)
X	jnz	exit
X	inc	di
Xword_loop:			/* loop through string by words */
X	mov	ax,(si)
X	orb	al,al
X	jz	last_byte_test
X	orb	ah,ah
X	jz	high_byte_zero
X	cmp
X	je	word_loop
X	mov	ax,-2(si)	/* find mismatch in final word */
X	sub	ax,-2(di)
X	orb	al,al
X	jnz	exit
X	movb	al,ah
X	jmp	exit
Xhigh_byte_zero:
X	subb	al,(di)
X	jnz	exit
X/*	movb	al,ah		/* don't need this: al == ah == 0 */
X	inc	di
Xlast_byte_test:
X	subb	al,(di)
Xexit:
X	cbw
X	mov	si,bx		/* restore si and di */
X	mov	di,cx
X	ret
/
echo x - strcpy.x
sed '/^X/s///' > strcpy.x << '/'
X/* strcpy.x
X *	char *strcpy(char *s1, const char *s2)
X *
X *	Copy the string pointed to by s2, including the terminating null
X *	character, into the array pointed to by s1.  Returns s1.
X */
X
X.define	_STRcpy
X.globl	_STRcpy
X.text
X_STRcpy:
X	mov	bx,si		/* save si and di */
X	mov	cx,di
X	mov	di,sp
X	mov	si,4(di)
X	mov	di,2(di)
X	mov	dx,di
X	cld
X	test	si,#1		/* align source on word boundary */
X	jz	word_copy
X	lodb
X	stob
X	orb	al,al
X	jz	exit
Xword_copy:			/* loop to copy words */
X	lodw
X	orb	al,al
X	jz	move_last_byte	/* early exit if low byte == 0 */
X	stow
X	orb	ah,ah
X	jnz	word_copy
X	jmp	exit
Xmove_last_byte:
X	stob			/* add odd zero byte */
Xexit:
X	mov	ax,dx
X	mov	si,bx		/* restore si and di */
X	mov	di,cx
X	ret
/
echo x - strcspn.x
sed '/^X/s///' > strcspn.x << '/'
X/* strcspn.x
X *	size_t strcspn(const char *s1, const char *s2)
X *
X *	Returns the length of the longest prefix of the string pointed
X *	to by s1 that has none of the characters in the string s2.
X */
X
X.define	_STRcspn
X.globl	_STRcspn
X.text
X_STRcspn:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	si,4(bp)
X	mov	di,6(bp)
X	cld
X	mov	bx,#-1		/* set up count (-1 for faster loops) */
X	cmpb	(di),*0
X	jz	s1_length	/* if s2 is null, we return s1's length */
X	cmpb	1(di),*0
X	jz	find_match	/* if s2 has length one, we take a shortcut */
X	mov	cx,bx		/* find length of s2 */
X	xorb	al,al
X	repne
X	scab
X	not	cx
X	dec	cx
X	mov	dx,cx		/* save length of s2 */
Xs1_loop:			/* loop over s1 looking for matches with s2 */
X	lodb
X	inc	bx
X	orb	al,al
X	jz	exit
X	mov	di,6(bp)
X	mov	cx,dx
X	repne
X	scab
X	jne	s1_loop
X	jmp	exit
Xs1_length:			/* find length of s1 */
X	mov	di,si
X	mov	cx,bx
X	xorb	al,al
X	repne
X	scab
X	not	cx
X	dec	cx
X	mov	bx,cx
X	jmp	exit
Xfind_match:			/* find a match for *s2 in s1 */
X	movb	dl,(di)
X	test	si,#1		/* align source on word boundary */
X	jz	word_loop
X	lodb
X	inc	bx
X	orb	al,al
X	je	exit
X	cmpb	al,dl
X	je	exit
Xword_loop:
X	lodw
X	inc	bx
X	orb	al,al
X	je	exit
X	cmpb	al,dl
X	je	exit
X	inc	bx
X	orb	ah,ah
X	je	exit
X	cmpb	ah,dl
X	jne	word_loop
Xexit:
X	mov	ax,bx
X	pop	di
X	pop	si
X	mov	sp,bp
X	pop	bp
X	ret
/
echo x - strerror.x
sed '/^X/s///' > strerror.x << '/'
X/* strerror.x
X *	char *strerror(int errnum)
X *
X *	Returns a pointer to an appropriate error message string.
X */
X
X.define	_STRerror
X.globl	_STRerror
X.data
Xunknown: .asciz 'Unknown error'
X.text
X_STRerror:
X	mov	bx,sp
X	mov	bx,2(bx)
X	mov	ax,#unknown	/* default return is "Unknown error" */
X	or	bx,bx
X	jle	exit
X	cmp	bx,_sys_nerr
X	jge	exit
X	sal	bx,#1
X	mov	ax,_sys_errlist(bx)
Xexit:
X	ret
/
echo x - strlen.x
sed '/^X/s///' > strlen.x << '/'
X/* strlen.x
X *	size_t strlen(const char *s)
X *
X *	Returns the length of the string pointed to by s.
X */
X
X.define	_STRlen
X.globl	_STRlen
X.text
X_STRlen:
X	mov	bx,di		/* save di */
X	mov	di,sp
X	mov	di,2(di)
X	mov	cx,#-1
X	xorb	al,al
X	cld
X	repne
X	scab
X	not	cx		/* silly trick gives length (including null) */
X	dec	cx		/* forget about null */
X	mov	ax,cx
X	mov	di,bx		/* restore di */
X	ret
/
echo x - strncat.x
sed '/^X/s///' > strncat.x << '/'
X/* strncat.x
X *	char *strncat(char *s1, const char *s2, size_t n)
X *
X *	Concatenates up to n characters of the string pointed to by s2
X *	onto the end of the string pointed to by s1.  A terminating
X *	null character is always appended.  Returns s1.
X */
X
X.define	_STRncat
X.globl	_STRncat
X.text
X_STRncat:
X	mov	bx,si		/* save si and di */
X	mov	dx,di
X	mov	si,sp
X	mov	cx,6(si)
X	mov	di,2(si)
X	push	di		/* save return value */
X	jcxz	exit		/* early exit if n == 0 */
X	cld
X	mov	cx,#-1		/* find end of s1 */
X	xorb	al,al
X	repne
X	scab
X	dec	di
X	mov	cx,6(si)
X	mov	si,4(si)
Xbyte_loop:			/* loop to copy bytes */
X	lodb
X	stob
X	orb	al,al
X	loopnz	byte_loop
X	jz	exit
X	movb	(di),*0		/* add terminating null character */
Xexit:
X	mov	si,bx		/* restore si and di */
X	mov	di,dx
X	pop	ax
X	ret
/
echo x - strncmp.x
sed '/^X/s///' > strncmp.x << '/'
X/* strncmp.x
X *	int strncmp(const char *s1, const char *s2, size_t n)
X *
X *	Compares up to n characters from the strings pointed to by s1
X *	and s2.  Returns zero if the (possibly null terminated) arrays
X *	are identical, a positive number if s1 is greater than s2, and
X *	a negative number otherwise.
X */
X
X.define	_STRncmp
X.globl	_STRncmp
X.text
X_STRncmp:
X	mov	bx,sp
X	push	si
X	push	di
X	xor	ax,ax		/* default result is equality */
X	mov	cx,6(bx)
X	jcxz	exit		/* early exit if n == 0 */
X	mov	si,2(bx)
X	mov	di,4(bx)
X	cmp	si,di
X	je	exit		/* early exit if s1 == s2 */
X	cld
X	test	si,#1		/* align s1 on word boundary */
X	jz	set_length
X	lodb
X	orb	al,al
X	jz	last_byte_test
X	subb	al,(di)
X	jne	exit
X	dec	cx
X	jz	exit		/* early exit if n == 1 */
X	inc	di
Xset_length:
X	mov	dx,cx		/* save count */
X	shr	cx,#1		/* work with words, not bytes */
X	jz	fetch_last_byte
Xword_loop:			/* loop through string by words */
X	mov	ax,(si)
X	orb	al,al
X	jz	last_byte_test
X	orb	ah,ah
X	jz	high_byte_zero
X	cmp
X	loope	word_loop
X	je	fetch_last_byte
X	mov	ax,-2(si)	/* find mismatch in final word */
X	sub	ax,-2(di)
X	orb	al,al
X	jnz	exit
X	movb	al,ah
X	jmp	exit
Xfetch_last_byte:
X	xor	ax,ax
X	test	dx,#1
X	jz	exit
X	movb	al,(si)
X	jmp	last_byte_test
Xhigh_byte_zero:
X	subb	al,(di)
X	jnz	exit
X	movb	al,ah
X	inc	di
Xlast_byte_test:
X	subb	al,(di)
Xexit:
X	cbw
X	pop	di
X	pop	si
X	ret
/
echo x - strncpy.x
sed '/^X/s///' > strncpy.x << '/'
X/* strncpy.x
X *	char *strncpy(char *s1, const char *s2, size_t n)
X *
X *	Copy up to n characters from the string pointed to by s2 to
X *	the array pointed to by s1.  If the source string is shorter
X *	than n characters, the remainder of the destination is padded
X *	with null characters.  If the source is longer than n characters,
X *	the destination will not be null terminated.  Returns s1.
X */
X
X#define BYTE_LIMIT 10		/* if n is above this, zero fill with words */
X
X.define	_STRncpy
X.globl	_STRncpy
X.text
X_STRncpy:
X	mov	bx,sp
X	push	si
X	push	di
X	mov	cx,6(bx)
X	jcxz	exit		/* early exit if n == 0 */
X	mov	di,2(bx)
X	mov	si,4(bx)
X	cld
X	cmpb	(si),*0
X	je	zero_fill	/* if s2 has length zero, take a short cut */
X	test	si,#1		/* align source on word boundary */
X	jz	set_length
X	movb
X	dec	cx
X	jz	exit		/* early exit if n == 1 */
Xset_length:
X	mov	dx,cx		/* save count */
X	shr	cx,#1		/* copy words, not bytes */
X	jz	last_byte
Xword_copy:			/* loop to copy words */
X	lodw
X	orb	al,al
X	jz	restore_length	/* early exit if low byte == 0 */
X	stow
X	orb	ah,ah
X	loopnz	word_copy
X	jz	restore_length
Xlast_byte:
X	test	dx,#1		/* move leftover byte */
X	jz	exit
X	movb
X	jmp	exit
Xrestore_length:			/* retrieve remaining length (in bytes) */
X	shl	cx,#1
X	and	dx,#1
X	add	cx,dx
Xzero_fill:			/* add null characters if necessary */
X	xor	ax,ax
X	cmp	cx,*BYTE_LIMIT
X	jbe	zero_bytes
X	test	di,#1		/* align destination on word boundary */
X	jz	zero_words
X	stob
X	dec	cx
Xzero_words:
X	shr	cx,#1		/* zero words, not bytes */
X	rep
X	stow
X	rcl	cx,#1		/* set up for leftover byte */
Xzero_bytes:
X	rep
X	stob
Xexit:
X	pop	di
X	pop	si
X	mov	ax,2(bx)
X	ret
/
echo x - strpbrk.x
sed '/^X/s///' > strpbrk.x << '/'
X/* strpbrk.x
X *	char *strpbrk(const char *s1, const char *s2)
X *
X *	Returns the address of the first character of the string pointed
X *	to by s1 that is in the string pointed to by s2.  Returns NULL
X *	if no such character exists.
X */
X
X.define	_STRpbrk
X.globl	_STRpbrk
X.text
X_STRpbrk:
X	mov	bx,sp
X	push	si
X	push	di
X	mov	si,2(bx)
X	mov	di,4(bx)
X	mov	bx,di		/* save a copy of s2 */
X	cld
X	xor	ax,ax		/* default return value is NULL */
X	cmpb	(di),*0
X	jz	exit		/* if s2 has length zero, we are done */
X	cmpb	1(di),*0
X	jz	find_match	/* if s2 has length one, we take a shortcut */
X	mov	cx,#-1		/* find length of s2 */
X	repne
X	scab
X	not	cx
X	dec	cx
X	mov	dx,cx		/* save length of s2 */
Xs1_loop:			/* loop through s1 to find matches with s2 */
X	lodb
X	orb	al,al
X	jz	exit
X	mov	di,bx
X	mov	cx,dx
X	repne
X	scab
X	jne	s1_loop
X#ifdef i8088
X	dec	si
X	mov	ax,si
X#else
X	lea	ax,-1(si)
X#endif
X	pop	di
X	pop	si
X	ret
Xfind_match:			/* find a match for *s2 in s1 */
X	movb	dl,(di)
X	test	si,#1		/* align source on word boundary */
X	jz	word_loop
X	lodb
X	cmpb	al,dl
X	je	one_past
X	orb	al,al
X	jz	no_match
Xword_loop:
X	lodw
X	cmpb	al,dl
X	je	two_past
X	orb	al,al
X	jz	no_match
X	cmpb	ah,dl
X	je	one_past
X	orb	ah,ah
X	jnz	word_loop
Xno_match:
X	xor	ax,ax
X	pop	di
X	pop	si
X	ret
Xtwo_past:
X	dec	si
Xone_past:
X#ifdef i8088
X	dec	si
X	mov	ax,si
X#else
X	lea	ax,-1(si)
X#endif
Xexit:
X	pop	di
X	pop	si
X	ret
/
echo x - strrchr.x
sed '/^X/s///' > strrchr.x << '/'
X/* strrchr.x
X *	char *strrchr(const char *s, int c)
X *
X *	Locates final occurrence of c (as unsigned char) in string s.
X */
X
X.define	_STRrchr
X.globl	_STRrchr
X.text
X_STRrchr:
X	mov	bx,di		/* save di */
X	mov	di,sp
X	xor	dx,dx		/* default result is NULL */
X	movb	ah,4(di)
X	mov	di,2(di)
X	cld
X	mov	cx,#-1		/* find end of string */
X	xorb	al,al
X	repne
X	scab
X	not	cx		/* silly trick gives length (including null) */
X	dec	di		/* point back at null character */
X	movb	al,ah		/* find last occurrence of c */
X	std
X	repne
X	scab
X	jne	exit
X#ifdef i8088
X	inc	di
X	mov	dx,di
X#else
X	lea	dx,1(di)
X#endif
Xexit:
X	cld			/* clear direction flag */
X	mov	di,bx		/* restore di */
X	mov	ax,dx
X	ret
/
echo x - strspn.x
sed '/^X/s///' > strspn.x << '/'
X/* strspn.x
X *	size_t strspn(const char *s1, const char *s2)
X *
X *	Returns the length of the longest prefix of the string pointed
X *	to by s1 that is made up of the characters in the string s2.
X */
X
X.define	_STRspn
X.globl	_STRspn
X.text
X_STRspn:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	mov	si,4(bp)
X	mov	di,6(bp)
X	cld
X	xor	ax,ax		/* default return value is zero */
X	cmpb	(di),*0
X	jz	exit		/* if s2 has length zero, we are done */
X	cmpb	1(di),*0
X	jz	find_mismatch	/* if s2 has length one, we take a shortcut */
X	mov	cx,#-1		/* find length of s2 */
X	repne
X	scab
X	not	cx
X	dec	cx
X	mov	dx,cx		/* save length of s2 */
X	mov	bx,#-1		/* set up byte count for faster loop */
Xs1_loop:			/* loop over s1 looking for matches with s2 */
X	lodb
X	inc	bx
X	orb	al,al
X	jz	s1_done		/* end of s1: every character matched */
X	mov	di,6(bp)
X	mov	cx,dx
X	repne
X	scab
X	je	s1_loop
Xs1_done:
X	mov	ax,bx
X	jmp	exit
Xfind_mismatch:			/* find a character in s1 that isn't *s2 */
X	movb	al,(di)
X	mov	di,si
X	mov	cx,#-1
X	repe
X	scab
X	dec	di		/* point back at mismatch */
X	mov	ax,di
X	sub	ax,si		/* number of matched characters */
Xexit:
X	pop	di
X	pop	si
X	mov	sp,bp
X	pop	bp
X	ret
/
echo x - strstr.x
sed '/^X/s///' > strstr.x << '/'
X/* strstr.x
X *	char * strstr(const char *s1, const char *s2)
X *
X *	Returns a pointer to the first occurrence in the string pointed
X *	to by s1 of the string pointed to by s2, NULL if none.  If s2
X *	is the null string, returns s1.
X */
X
X.define	_STRstr
X.globl	_STRstr
X.text
X_STRstr:
X	push	bp
X	mov	bp,sp
X	sub	sp,#2		/* make room for locals */
X	push	si
X	push	di
X	mov	si,4(bp)
X	mov	di,6(bp)
X	mov	bx,si		/* default result is s1 */
X	movb	ah,(di)		/* fetch first character of s2 */
X	orb	ah,ah
X	je	exit		/* if s2 is null, we are done */
X	cld
X	mov	cx,#-1		/* find length of s2 */
X	xorb	al,al
X	repne
X	scab
X	not	cx
X	dec	cx
X	mov	-2(bp),cx	/* save length of s2 */
X	mov	cx,#-1		/* find length + 1 of s1 */
X	mov	di,si
X	repne
X	scab
X	not	cx
X	sub	cx,-2(bp)	/* |s1| - |s2| + 1 is number of possibilities */
X	jbe	not_found	/* if |s1| < |s2|, give up right now */
X	mov	dx,cx
X	inc	dx		/* set up for faster loop */
X	dec	bx
Xs1_loop:
X	dec	dx
X	jz	not_found
X	inc	bx
X	cmpb	ah,(bx)
X	jne	s1_loop		/* if first characters don't match, try another */
X	mov	di,6(bp)
X	mov	si,bx
X	mov	cx,-2(bp)
X	repe
X	cmpb
X	jne	s1_loop
X	jmp	exit
Xnot_found:
X	xor	bx,bx
Xexit:
X	mov	ax,bx
X	pop	di
X	pop	si
X	mov	sp,bp
X	pop	bp
X	ret
/
echo x - strtok.x
sed '/^X/s///' > strtok.x << '/'
X/* strtok.x
X *	char *strtok(char *s1, const char *s2)
X *
X *	Returns a pointer to the "next" token in s1.  Tokens are 
X *	delimited by the characters in the string pointed to by s2.
X */
X
X.define	_STRtok
X.globl	_STRtok
X.data
Xscan:	.word	0
X.text
X_STRtok:
X	push	bp
X	mov	bp,sp
X	push	si
X	push	di
X	cld
X	mov	bx,4(bp)
X	or	bx,bx		/* if s != NULL, */
X	jnz	s2_length	/*   we start a new string */
X	mov	bx,scan
X	or	bx,bx		/* if old string exhausted, */
X	jz	exit		/*   exit early */
Xs2_length:			/* find length of s2 */
X	mov	di,6(bp)
X	mov	cx,#-1
X	xorb	al,al
X	repne
X	scab
X	not	cx
X	dec	cx
X	jz	string_finished	/* if s2 has length zero, we are done */
X	mov	dx,cx		/* save length of s2 */
X
X	mov	si,bx
X	xor	bx,bx		/* return value is NULL */
Xdelim_loop:			/* dispose of leading delimiters */
X	lodb
X	orb	al,al
X	jz	string_finished
X	mov	di,6(bp)
X	mov	cx,dx
X	repne
X	scab
X	je	delim_loop
X
X	lea	bx,-1(si)	/* return value is start of token */
Xtoken_loop:			/* find end of token */
X	lodb
X	orb	al,al
X	jz	string_finished
X	mov	di,6(bp)
X	mov	cx,dx
X	repne
X	scab
X	jne	token_loop
X	movb	-1(si),*0	/* terminate token */
X	mov	scan,si		/* set up for next call */
X	jmp	exit
Xstring_finished:
X	mov	scan,#0		/* ensure NULL return in future */
Xexit:
X	mov	ax,bx
X	pop	di
X	pop	si
X	mov	sp,bp
X	pop	bp
X	ret
/