nfs@notecnirp.Princeton.EDU (Norbert Schlenker) (11/22/89)
Following is an updated version of my string package for PC's. This package now passes all of the following tests: - Spencer's test routines for <string.h>, except those where the tests conflict with the rules in the December 1988 ANSI draft - My own functional tests, which exercise a few areas where Spencer's tests are weak (primarily unaligned strings and some boundary cases) - The Cagney/Chew/Evans .s->.asm translation - The "install and use it" test for the naive user As before, the routines are 40%-95% faster than the generic routines. Relinking utilities that use <string.h> is generally worthwhile. Enjoy! ----------------------------- Snip snip -------------------------------- #!/bin/sh echo x - Readme sed '/^X/s///' > Readme << '/' XHaving done some a favour by posting a new stdio package, which I believe Xto be portable across Minix platforms, I am now going to do others a Xfavour (but to ast, perhaps a disservice) by posting an assembly language Xversion of the ANSI C string package for PC's and compatibles. All of Xthe required routines, save for strcoll() and strxfrm() [missing since XI don't have a use for locale specific routines] are included. The XBerkeley compatibility routines are not included, although the included X<string.h> defines them as macros. memccpy() is not included, since I Xcannot find consistent documentation as to its function. X XHenry Spencer's C routines are widely used, very reliable, very portable, Xand are easily compiled into reasonably efficient code. They can take no Xadvantage, of course, of special architectural features, which Intel Xprocessors possess in abundance in this case. If the best that could be Xdone was a 10-20% improvement in the string code, which I would consider Xfairly typical for assembly over C, I wouldn't consider it worthwhile. XBut my rewritten routines show much larger improvements for typical Xinputs - from 40% to 95% depending on function. X XThe code that I write tends to use a lot of str[len|cpy|cmp]() and Xmem[set|cpy](). The improvements for these routines are substantial Xenough that I use my assembly language versions. The recent Dhrystone X1.1 posting by ast shows a 40% increase in Dhrystone rating on my Xmachine with these routines (only strcpy() and strcmp() are used there). X XThe code is faster for a number of reasons. It uses special instructions Xnot generated by the C compiler, pays careful attention to register Xcontents, uses simplified linkage, unrolls most loops once, and takes Xadvantage of word alignment where possible. The first three involve Xfairly simple adaptations of Spencer's C code to the Intel architecture. XThe last two are sometimes unpleasant. Unrolling loops once saves 10-15% Xin most cases; the attention to alignment saves 3-5% on top of that. The Xcode is less clear (in some cases, MUCH less clear) and harder to debug, Xbut 20% is not to be sniffed at. X XThe code was optimized on a Toshiba 5100, which has an 80386 and uncached X1-wait state 32 bit memory. As a result, the code may not be optimal on Xother machines. I expect it to be quite good for 16 bit CPU's, with Xperhaps slightly less improvement on 8088's, where the attention to Xalignment is wasted. X XI am open to bug reports and suggestions for improvement. I am also Xinterested in reports of performance on other machines. To that end, I Xhave included programs that compute the improvement for a variety of Xroutines automagically; please email the results to me along with a Xdescription of your CPU and memory architecture. I have included a Xcopy of the program's output for my machine in the file Perf.T5100; XI hope the improvements shown there will be typical. X XTo use the new string package, check that the macro definitions at the Xtop of the makefile are compatible with your configuration. By default, Xthe makefile generates the performance comparison with the existing library. XIf you "make install", packed versions of the routines will be installed in Xyour library. New versions of two headers, <prototype.h> and <string.h>, Xwill also be copied into your header directory. X XEnjoy! / echo x - Copyright sed '/^X/s///' > Copyright << '/' XCopyright 1989 Norbert Schlenker. All rights reserved. X XThe copyright holder hereby grants to the public the right to use, Xmodify, and redistribute this software freely. X XThis software is provided "as is" and carries no warranty of any kind. / echo x - Makefile sed '/^X/s///' > Makefile << '/' X# Should reflect the location of your C preprocessor. XCPP = /usr/lib/cpp X X# Should reflect the location of your headers. XINCLUDE = /usr/include X X# Should reflect the location of your library. XLIBC = /usr/lib/libc.a X X# Should reflect the target machine. XTARGET = i8088 X#TARGET = i80286 X#TARGET = i80386 X X# The rest should be fine as is. XHDRS = prototype.h string.h X XSRCS = memchr.x memcmp.x memmove.x memset.x strcat.x strchr.x \ X strcmp.x strcpy.x strcspn.x strlen.x strncat.x strncmp.x \ X strncpy.x strpbrk.x strrchr.x strspn.x strstr.x strtok.x \ X strerror.x X XOBJS = memchr.s memcmp.s memmove.s memset.s strcat.s strchr.s \ X strcmp.s strcpy.s strcspn.s strlen.s strncat.s strncmp.s \ X strncpy.s strpbrk.s strrchr.s strspn.s strstr.s strtok.s \ X strerror.s X X.SUFFIXES: .s .x .o .c .y X X.x.s: X $(CPP) -D$(TARGET) $*.x | sed '/^$$/d; /^\#/d' | libpack >$*.s X Xperf: operf nperf analyze X @echo X @echo Analyzing relative performance of old and new string packages ... X @echo ' This test may take up to 10 minutes to run.' X @echo X @operf >operf.out X @nperf >nperf.out X @analyze operf.out nperf.out | tee Perf.local X Xoperf: perf.s X cc -o operf perf.s X Xnperf: perf.s $(OBJS) X cc -o nperf perf.s $(OBJS) X Xanalyze: analyze.c X cc -o analyze analyze.c X Xinstall: $(OBJS) X mv prototype.h string.h $(INCLUDE) X ar dv $(LIBC) memcpy.s # Spencer's memcpy replaced by memmove X ar rlv $(LIBC) $? X Xstring.shar: Readme Copyright Makefile Perf.T5100 perf.c analyze.c $(HDRS) $(SRCS) X shar \ X Readme Copyright Makefile Perf.T5100 \ X perf.c analyze.c \ X $(HDRS) \ X $(SRCS) \ X >string.shar X Xclean: X rm operf nperf analyze operf.out nperf.out Perf.local *.s / echo x - Perf.T5100 sed '/^X/s///' > Perf.T5100 << '/' Xmemcpy(s1, s2, n): [n=4]: 21 [n=25]: 14 [n=1024]: 7 Xstrcpy(s1, s2): [s2=ATOE]: 42 [s2=ATOZ]: 44 Xstrncpy(s1, s2, n): [s2=ATOZ,n=10]: 39 Xmemcmp(buf, buf2, n): [n=4]: 20 [n=25]: 13 [n=1024]: 7 Xstrcmp(s1, s2): [2*ATOE]: 36 [2*ATOZ]: 33 Xstrncmp(s1, s2, n): [n=4]: 6 [n=25]: 25 Xmemchr(ATOZ, c, 25): [c='E']: 19 [c='Z']: 19 Xstrchr(ATOZ, c): [c='E']: 35 [c='Z']: 31 Xstrpbrk("word list", s):[s=" "]: 26 [s=" \t\r\n"]: 35 Xstrrchr(ATOZ, c): [c='A']: 30 [c='E']: 29 [c='Z']: 17 Xstrstr(ATOZ, s): [s="a"]: 55 [s="y"]: 55 [s="klmnop"]: 47 Xmemset(buf, 0, n): [n=4]: 18 [n=1024]: 4 Xstrlen(s): [s=ATOE]: 41 [s=ATOZ]: 34 / echo x - perf.c sed '/^X/s///' > perf.c << '/' X#include <stdio.h> X#include <string.h> X#include <time.h> X#include <sys/types.h> X#include <sys/times.h> X X/* If you have old Minix headers, you may need to define the following. */ X/* You should be using the stdio package I posted (unpaid political ad). */ X/* #define CLK_TCK 60 */ X/* typedef long clock_t; */ X X#define INNERLOOPCOUNT 1000 X X#define TIME_EXECUTION(x, t) { \ X times(&start); \ X for (i = loop_count; --i >= 0; ) \ X for (j = INNERLOOPCOUNT; --j >= 0; ) \ X dummy = (int) x; \ X times(&end); \ X t = end.tms_utime - start.tms_utime; \ X } X X#define PRINT_TIME(x, args, note) { \ X TIME_EXECUTION(x, elapsed_time); \ X printf("\t%s: %d\n", note, elapsed_time - empty_function_time[args]); \ X } X Xstatic char ATOE[] = "ABCDE"; Xstatic char ATOZ[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; X Xstatic int empty_function(x, y, z) Xint x, y, z; X{ X return x; X} X Xmain() X{ X struct tms start, end; X int loop_count; X clock_t empty_function_time[4]; X long ns; X clock_t elapsed_time; X static char s1[31]; X static char s2[30]; X static char buf[1024]; X static char buf2[1024]; X register int i, j; X int dummy; X int a1, a2, a3; X X/* --- Find a loop count that results in nontrivial function call times --- */ X empty_function_time[0] = 0; X empty_function_time[1] = 0; X for (loop_count = 6; empty_function_time[1] < CLK_TCK/10; loop_count += 2) X TIME_EXECUTION(empty_function(a1), empty_function_time[1]); X TIME_EXECUTION(empty_function(a1, a2), empty_function_time[2]); X TIME_EXECUTION(empty_function(a1, a2, a3), empty_function_time[3]); X ns = (1000000L * empty_function_time[2]) / (CLK_TCK * loop_count); X printf("Loop count: %d,000\tTypical function call: %d.%1d us\n", X loop_count, (int) ns/1000, (int) ((ns%1000) + 50)/100); X X/* --- memcpy --- */ X printf("memcpy(s1, s2, n):\n"); X PRINT_TIME(memcpy(s1 + 1, s2, 4), 3, "[n=4]"); X PRINT_TIME(memcpy(s1, s2 + 1, 25), 3, "[n=25]"); X PRINT_TIME(memcpy(buf, buf2, 1024), 3, "[n=1024]"); X X/* --- strcpy --- */ X printf("strcpy(s1, s2):\n"); X PRINT_TIME(strcpy(s1, ATOE), 2, "[s2=ATOE]"); X PRINT_TIME(strcpy(s1, ATOZ), 2, "[s2=ATOZ]"); X X/* --- strncpy --- */ X printf("strncpy(s1, s2, n):\n"); X PRINT_TIME(strncpy(s1, ATOZ, 10), 3, "[s2=ATOZ,n=10]"); X X/* --- memcmp --- */ X printf("memcmp(buf, buf2, n):\n"); X memset(buf, 0, 1024); X memset(buf2, 0, 1024); X PRINT_TIME(memcmp(buf, buf2, 4), 3, "[n=4]"); X PRINT_TIME(memcmp(buf, buf2, 25), 3, "[n=25]"); X PRINT_TIME(memcmp(buf, buf2, 1024), 3, "[n=1024]"); X X/* --- strcmp --- */ X printf("strcmp(s1, s2):\n"); X strcpy(s1, ATOE); X PRINT_TIME(strcmp(s1, ATOE), 2, "[2*ATOE]"); X strcpy(s1, ATOZ); X PRINT_TIME(strcmp(s1 + 1, ATOZ + 1), 2, "[2*ATOZ]"); /* unaligned */ X X/* --- strncmp --- */ X printf("strncmp(s1, s2, n):\n"); X strcpy(s1, ATOZ); X PRINT_TIME(strncmp(s1, ATOZ, 4), 3, "[n=4]"); X PRINT_TIME(strncmp(s1, ATOZ, 25), 3, "[n=25]"); X X/* --- memchr --- */ X printf("memchr(ATOZ, c, 25):\n"); X PRINT_TIME(memchr(ATOZ, 'E', 25), 3, "[c='E']"); X PRINT_TIME(memchr(ATOZ, 'Z', 25), 3, "[c='Z']"); X X/* --- strchr --- */ X printf("strchr(ATOZ, c):\n"); X PRINT_TIME(strchr(ATOZ, 'E'), 2, "[c='E']"); X PRINT_TIME(strchr(ATOZ, 'Z'), 2, "[c='Z']"); X X/* --- strpbrk --- */ X printf("strpbrk(\"word list\", s):\n"); X PRINT_TIME(strpbrk("word list", " "), 2, "[s=\" \"]"); X PRINT_TIME(strpbrk("word list", " \t\r\n"), 2, "[s=\" \\t\\r\\n\"]"); X X/* --- strrchr --- */ X printf("strrchr(ATOZ, c):\n"); X PRINT_TIME(strrchr(ATOZ, 'A'), 2, "[c='A']"); X PRINT_TIME(strrchr(ATOZ, 'E'), 2, "[c='E']"); X PRINT_TIME(strrchr(ATOZ, 'Z'), 2, "[c='Z']"); X X/* --- strstr --- */ X printf("strstr(ATOZ, s):\n"); X PRINT_TIME(strstr(ATOZ, "a"), 2, "[s=\"a\"]"); X PRINT_TIME(strstr(ATOZ, "y"), 2, "[s=\"y\"]"); X PRINT_TIME(strstr(ATOZ, "klmnop"), 2, "[s=\"klmnop\"]"); X X/* --- memset --- */ X printf("memset(buf, 0, n):\n"); X PRINT_TIME(memset(buf, 0, 4), 3, "[n=4]"); X PRINT_TIME(memset(buf, 0, 1024), 3, "[n=1024]"); X X/* --- strlen --- */ X printf("strlen(s):\n"); X PRINT_TIME(strlen(ATOE), 1, "[s=ATOE]"); X PRINT_TIME(strlen(ATOZ), 1, "[s=ATOZ]"); X X exit(0); X} / echo x - analyze.c sed '/^X/s///' > analyze.c << '/' X/* analyze - analyze relative performance of two string packages */ X/* Input is expected in the form produced by perf. */ X X#include <stdio.h> Xlong atol(); X X#define MAXLINE 80 X#define OLD 0 X#define NEW 1 X#define TWOTABS 16 X#define THREETABS 24 X Xstatic void error_exit(rc, s) Xint rc; Xchar *s; X{ X fprintf(stderr, "analyze: %s\n", s); X exit(rc); X} X Xmain(argc, argv) Xint argc; Xchar *argv[]; X{ X FILE *f[2]; X char line[2][MAXLINE]; X int loops[2]; X long time[2]; X int pct; X int n; X X if (argc != 3 || X (f[OLD] = fopen(argv[1], "r")) == NULL || X (f[NEW] = fopen(argv[2], "r")) == NULL) X error_exit(1, "Usage: analyze file1 file2"); X X if (fgets(line[OLD], MAXLINE, f[OLD]) == NULL || X sscanf(line[OLD], "Loop count: %d,000", &loops[OLD]) != 1) X error_exit(2, "no header in first file"); X if (fgets(line[NEW], MAXLINE, f[NEW]) == NULL || X sscanf(line[NEW], "Loop count: %d,000", &loops[NEW]) != 1) X error_exit(2, "no header in second file"); X X fgets(line[OLD], MAXLINE, f[OLD]); X while (!feof(f[OLD])) { /* header line */ X if (fgets(line[NEW], MAXLINE, f[NEW]) == NULL || X strcmp(line[OLD], line[NEW]) != 0) X error_exit(3, "synchronization failure"); X n = strlen(line[OLD]) - 1; X printf("%.*s", n, line[OLD]); X if (n < TWOTABS) X putchar('\t'); X if (n < THREETABS) X putchar('\t'); X fgets(line[OLD], MAXLINE, f[OLD]); X while (!feof(f[OLD]) && line[OLD][0] == '\t') { /* detail line */ X if (fgets(line[NEW], MAXLINE, f[NEW]) == NULL) X error_exit(3, "synchronization failure"); X if ((n = strcspn(line[OLD], ":")) >= strlen(line[OLD])) X error_exit(4, "input failure"); X if (strncmp(line[OLD], line[NEW], n + 1) != 0) X error_exit(3, "synchronization failure"); X time[OLD] = atol(&line[OLD][n+2]); X time[NEW] = atol(&line[NEW][n+2]); X pct = ((100 * time[NEW] + time[OLD]/2) * loops[OLD]) / X (( time[OLD] ) * loops[NEW]); X printf("%-10.*s %2d\t", n, &line[OLD][1], pct); X fgets(line[OLD], MAXLINE, f[OLD]); X } X putchar('\n'); X } X exit(0); X} / echo x - prototype.h sed '/^X/s///' > prototype.h << '/' X#ifndef __PROTOTYPE_H X#define __PROTOTYPE_H X X#ifdef __STDC__ X#define _PROTO(p) p X#else X#define _PROTO(p) () X#define const X#endif X X#endif /* !defined __PROTOTYPE_H */ / echo x - string.h sed '/^X/s///' > string.h << '/' X#ifndef __STRING_H X#define __STRING_H X X/* --- Inclusions --- */ X#include "prototype.h" X X/* --- Constants --- */ X#ifndef __STDC__ X#define NULL 0 X#else X#define NULL ((void *) 0) X#endif X X/* --- Types --- */ X#ifndef __SIZE_T X#define __SIZE_T Xtypedef unsigned int size_t; X#endif X X/* --- Prototypes --- */ Xvoid *memcpy _PROTO((void *dst, const void *src, size_t n)); Xvoid *memmove _PROTO((void *dst, const void *src, size_t n)); Xchar *strcpy _PROTO((char *dst, const char *src)); Xchar *strncpy _PROTO((char *dst, const char *src, size_t n)); Xchar *strcat _PROTO((char *dst, const char *src)); Xchar *strncat _PROTO((char *dst, const char *src, size_t n)); Xint memcmp _PROTO((const void *s1, const void *s2, size_t n)); Xint strcmp _PROTO((const char *s1, const char *s2)); Xint strcoll _PROTO((const char *s1, const char *s2)); Xint strncmp _PROTO((const char *s1, const char *s2, size_t n)); Xsize_t strxfrm _PROTO((char *dst, const char *src, size_t n)); Xvoid *memchr _PROTO((const void *s, int c, size_t n)); Xchar *strchr _PROTO((const char *s, int c)); Xsize_t strcspn _PROTO((const char *s, const char *reject)); Xchar *strpbrk _PROTO((const char *s, const char *breakat)); Xchar *strrchr _PROTO((const char *s, int c)); Xsize_t strspn _PROTO((const char *s, const char *accept)); Xchar *strstr _PROTO((const char *s, const char *wanted)); Xchar *strtok _PROTO((char *s, const char *delim)); Xvoid *memset _PROTO((void *s, int c, size_t n)); Xchar *strerror _PROTO((int errnum)); Xsize_t strlen _PROTO((const char *s)); X X/* X * V7 and Berklix compatibility. X */ X#ifdef _V7 X#define index(s, c) strchr(s, c) X#define rindex(s, c) strrchr(s, c) X#endif X#ifdef _BSD X#define bcopy(src, dst, n) memcpy(dst, src, n) X#define bcmp(s1, s2, n) memcmp(s1, s2, n) X#define bzero(dst, n) memset(dst, 0, n) X#endif X X#endif /* !defined __STRING_H */ / echo x - memchr.x sed '/^X/s///' > memchr.x << '/' X/* memchr.x X * void *memchr(const void *s, int c, size_t n) X * X * Returns a pointer to the first occurrence of c (converted to X * unsigned char) in the object pointed to by s, NULL if none. X */ X X.define _memchr X.text X_memchr: X mov bx,di /* save di */ X mov di,sp X xor dx,dx /* default result is NULL */ X mov cx,6(di) X jcxz exit /* early exit if n == 0 */ X movb al,4(di) X mov di,2(di) X cld X repne X scab X jne exit X#ifdef i8088 X dec di X mov dx,di X#else X lea dx,-1(di) X#endif Xexit: X mov di,bx /* restore di */ X mov ax,dx X ret / echo x - memcmp.x sed '/^X/s///' > memcmp.x << '/' X/* memcmp.x X * int memcmp(const void *s1, const void *s2, size_t n) X * X * Compares the first n characters of the objects pointed to by X * s1 and s2. Returns zero if all characters are identical, a X * positive number if s1 greater than s2, a negative number otherwise. X */ X X#define BYTE_LIMIT 10 /* if n is above this, work with words */ X X.define _memcmp X.text X_memcmp: X mov bx,sp X push si X push di X xor ax,ax /* default return is equality */ X mov cx,6(bx) X jcxz exit /* early exit if n == 0 */ X mov si,2(bx) X mov di,4(bx) X cmp si,di X je exit /* early exit if s1 == s2 */ X cld X cmp cx,*BYTE_LIMIT X ja word_compare Xbyte_compare: X repe X cmpb X jne one_past_mismatch X pop di X pop si X ret Xword_compare: X test si,#1 /* align s1 on word boundary */ X jz word_aligned X cmpb X jne one_past_mismatch X dec cx Xword_aligned: X mov dx,cx /* save count */ X shr cx,#1 /* compare words, not bytes */ X jz almost_done X repe X cmp X je almost_done X sub si,#2 X sub di,#2 X cmpb X jne one_past_mismatch X jmp at_mismatch Xalmost_done: X test dx,#1 X jz exit X jmp at_mismatch Xone_past_mismatch: X dec si X dec di Xat_mismatch: X xorb ah,ah X movb al,(si) X subb al,(di) X jae exit X cbw Xexit: X pop di X pop si X ret / echo x - memmove.x sed '/^X/s///' > memmove.x << '/' X/* memmove.x X * void *memmove(void *s1, const void *s2, size_t n) X * void *memcpy(void *s1, const void *s2, size_t n) X * X * Copy n characters from the object pointed to by s2 into the X * object pointed to by s1. Copying takes place as if the n X * characters pointed to by s2 are first copied to a temporary X * area and then copied to the object pointed to by s1. X * X * Per X3J11, memcpy may have undefined results if the objects X * overlap; since the performance penalty is insignificant, we X * use the safe memmove code for it as well. X */ X X#define BYTE_LIMIT 10 /* if n is above this, work with words */ X X.define _memmove, _memcpy X.text X_memmove: X_memcpy: X mov bx,si /* save si and di */ X mov dx,di X mov di,sp X mov cx,6(di) X mov si,4(di) X mov di,2(di) X mov ax,di /* save a copy of s1 */ X jcxz exit /* early exit if n == 0 */ X sub di,si X je exit /* early exit if s1 == s2 */ X jb left_to_right /* left to right if s1 < s2 */ X cmp di,cx X jae left_to_right /* left to right if no overlap */ Xright_to_left: X mov di,ax /* retrieve s1 */ X std X add si,cx /* compute where objects end */ X dec si X add di,cx X dec di X cmp cx,#BYTE_LIMIT X jbe byte_move X test si,#1 /* align source on word boundary */ X jnz word_unaligned X movb X dec cx Xword_unaligned: X dec si /* adjust to word boundary */ X dec di X shr cx,#1 /* move words, not bytes */ X rep X movw X jnc exit X#ifdef i8088 X inc si /* fix up addresses for right to left moves */ X inc di X movb /* move leftover byte */ X#else X movb cl,1(si) X movb 1(di),cl /* move leftover byte */ X#endif X jmp exit Xleft_to_right: X mov di,ax /* retrieve s1 */ X cld X cmp cx,#BYTE_LIMIT X jbe byte_move X test si,#1 /* align source on word boundary */ X jz word_move X movb X dec cx Xword_move: X shr cx,#1 /* move words, not bytes */ X rep X movw X rcl cx,#1 /* set up to move leftover byte */ Xbyte_move: X rep X movb Xexit: X cld /* restore direction flag */ X mov si,bx /* restore si and di */ X mov di,dx X ret / echo x - memset.x sed '/^X/s///' > memset.x << '/' X/* memset.x X * void *memset(void *s, int c, size_t n) X * X * Copies the value of c (converted to unsigned char) into the X * first n locations of the object pointed to by s. X */ X X#ifdef i80386 X#define BYTE_LIMIT 16 /* if n is above this, work with doublewords */ X#define SIZE_OVERRIDE .byte 102 /* force 32 bits */ X#define SHLAX(n) .byte 193,224,n X#define SHRCX(n) .byte 193,233,n X#else X#define BYTE_LIMIT 10 /* if n is above this, work with words */ X#endif X X.define _memset X.text X_memset: X push di X mov di,sp X mov cx,8(di) X movb al,6(di) X mov di,4(di) X mov bx,di /* return value is s */ X jcxz exit /* early exit if n == 0 */ X cld X cmp cx,*BYTE_LIMIT X jbe byte_set X movb ah,al /* set up second byte */ X test di,#1 /* align on word boundary */ X jz word_aligned X stob X dec cx Xword_aligned: X#ifdef i80386 X test di,#2 /* align on doubleword boundary */ X jz dword_aligned X stow X sub cx,*2 Xdword_aligned: X mov dx,ax /* duplicate byte in all bytes of EAX */ X SIZE_OVERRIDE X SHLAX (16) X mov ax,dx X mov dx,cx /* save count */ X SHRCX (2) X rep X SIZE_OVERRIDE X stow X and dx,#3 /* set up to set leftover bytes */ X mov cx,dx X#else X shr cx,#1 /* set words, not bytes */ X rep X stow X rcl cx,#1 /* set up to set leftover byte */ X#endif Xbyte_set: X rep X stob Xexit: X pop di X mov ax,bx X ret / echo x - strcat.x sed '/^X/s///' > strcat.x << '/' X/* strcat.x X * char *strcat(char *s1, const char *s2) X * X * Concatenates the string pointed to by s2 onto the end of the X * string pointed to by s1. Returns s1. X */ X X.define _strcat X.text X_strcat: X mov bx,si /* save si and di */ X mov dx,di X mov si,sp X mov di,2(si) X push di /* save return value */ X mov si,4(si) X cmpb (si),*0 X je exit /* early exit if s2 is the null string */ X cld X mov cx,#-1 /* find end of s1 */ X xorb al,al X repne X scab X dec di /* point back at null character */ X test si,#1 /* align source on word boundary */ X jz word_copy X movb Xword_copy: /* loop to copy words */ X lodw X orb al,al X jz move_last_byte /* exit if low byte == 0 */ X stow X orb ah,ah X jnz word_copy X jmp exit Xmove_last_byte: X stob /* add odd zero byte */ Xexit: X mov si,bx X mov di,dx X pop ax X ret / echo x - strchr.x sed '/^X/s///' > strchr.x << '/' X/* strchr.x X * char *strchr(const char *s, int c) X * X * Returns location of the first occurrence of c (converted to char) X * in the string pointed to by s. Returns NULL if c does not occur. X */ X X.define _strchr X.text X_strchr: X mov bx,si /* save si */ X mov si,sp X movb dl,4(si) X mov si,2(si) X cld X test si,#1 /* align string on word boundary */ X jz word_loop X lodb X cmpb al,dl X je one_past X orb al,al X jz no_match Xword_loop: /* look for c word by word */ X lodw X cmpb al,dl X je two_past X orb al,al X jz no_match X cmpb ah,dl X je one_past X orb ah,ah X jnz word_loop Xno_match: X xor ax,ax X mov si,bx /* restore si */ X ret Xtwo_past: X dec si Xone_past: X#ifdef i8088 X dec si X mov ax,si X#else X lea ax,-1(si) X#endif X mov si,bx /* restore si */ X ret / echo x - strcmp.x sed '/^X/s///' > strcmp.x << '/' X/* strcmp.x X * int strcmp(const char *s1, const char *s2) X * X * Compares the strings pointed to by s1 and s2. Returns zero if X * strings are identical, a positive number if s1 greater than s2, X * and a negative number otherwise. X */ X X.define _strcmp X.text X_strcmp: X mov bx,si /* save si and di */ X mov cx,di X mov di,sp X mov si,2(di) X mov di,4(di) X xor ax,ax /* default return is equality */ X cmp si,di X je exit /* early exit if s1 == s2 */ X cld X test si,#1 /* align s1 on word boundary */ X jz setup_loop X lodb X orb al,al X jz last_byte_test X cmpb al,(di) X jne last_byte_test X inc di Xsetup_loop: X sub di,#2 /* set up for faster loop */ Xword_loop: /* loop through string by words */ X lodw X add di,#2 X orb al,al X jz last_byte_test X cmp ax,(di) X jne find_mismatch X orb ah,ah X jnz word_loop X xor ax,ax X jmp exit Xfind_mismatch: X cmpb al,(di) X jne last_byte_test X movb al,ah X inc di Xlast_byte_test: /* Expects: (al)=char of s1; (di)->char of s2 */ X xorb ah,ah X subb al,(di) X jae exit X cbw Xexit: X mov si,bx /* restore si and di */ X mov di,cx X ret / echo x - strcpy.x sed '/^X/s///' > strcpy.x << '/' X/* strcpy.x X * char *strcpy(char *s1, const char *s2) X * X * Copy the string pointed to by s2, including the terminating null X * character, into the array pointed to by s1. Returns s1. X */ X X.define _strcpy X.text X_strcpy: X mov bx,si /* save si and di */ X mov cx,di X mov di,sp X mov si,4(di) X mov di,2(di) X mov dx,di X cld X test si,#1 /* align source on word boundary */ X jz word_copy X lodb X stob X orb al,al X jz exit Xword_copy: /* loop to copy words */ X lodw X orb al,al X jz move_last_byte /* early exit if low byte == 0 */ X stow X orb ah,ah X jnz word_copy X jmp exit Xmove_last_byte: X stob /* add odd zero byte */ Xexit: X mov ax,dx X mov si,bx /* restore si and di */ X mov di,cx X ret / echo x - strcspn.x sed '/^X/s///' > strcspn.x << '/' X/* strcspn.x X * size_t strcspn(const char *s1, const char *s2) X * X * Returns the length of the longest prefix of the string pointed X * to by s1 that has none of the characters in the string s2. X */ X X.define _strcspn X.text X_strcspn: X push bp X mov bp,sp X push si X push di X mov si,4(bp) X mov di,6(bp) X cld X mov bx,#-1 /* set up count (-1 for faster loops) */ X cmpb (di),*0 X jz s1_length /* if s2 is null, we return length of s1 */ X cmpb 1(di),*0 X jz find_match /* if s2 has length one, we take a shortcut */ X mov cx,bx /* find length of s2 */ X xorb al,al X repne X scab X not cx X dec cx X mov dx,cx /* save length of s2 */ Xs1_loop: /* loop over s1 looking for matches with s2 */ X lodb X inc bx X orb al,al X jz exit X mov di,6(bp) X mov cx,dx X repne X scab X jne s1_loop X jmp exit Xs1_length: /* find length of s1 */ X mov di,si X mov cx,bx X xorb al,al X repne X scab X not cx X dec cx X mov bx,cx X jmp exit Xfind_match: /* find a match for *s2 in s1 */ X movb dl,(di) X test si,#1 /* align source on word boundary */ X jz word_loop X lodb X inc bx X orb al,al X je exit X cmpb al,dl X je exit Xword_loop: X lodw X inc bx X orb al,al X je exit X cmpb al,dl X je exit X inc bx X orb ah,ah X je exit X cmpb ah,dl X jne word_loop Xexit: X mov ax,bx X pop di X pop si X mov sp,bp X pop bp X ret / echo x - strlen.x sed '/^X/s///' > strlen.x << '/' X/* strlen.x X * size_t strlen(const char *s) X * X * Returns the length of the string pointed to by s. X */ X X.define _strlen X.text X_strlen: X mov bx,di /* save di */ X mov di,sp X mov di,2(di) X mov cx,#-1 X xorb al,al X cld X repne X scab X not cx /* silly trick gives length (including null) */ X dec cx /* forget about null */ X mov ax,cx X mov di,bx /* restore di */ X ret / echo x - strncat.x sed '/^X/s///' > strncat.x << '/' X/* strncat.x X * char *strncat(char *s1, const char *s2, size_t n) X * X * Concatenates up to n characters of the string pointed to by s2 X * onto the end of the string pointed to by s1. A terminating X * null character is always appended. Returns s1. X */ X X.define _strncat X.text X_strncat: X mov bx,si /* save si and di */ X mov dx,di X mov si,sp X mov cx,6(si) X mov di,2(si) X push di /* save return value */ X jcxz exit /* early exit if n == 0 */ X cld X mov cx,#-1 /* find end of s1 */ X xorb al,al X repne X scab X dec di X mov cx,6(si) X mov si,4(si) Xbyte_loop: /* loop to copy bytes */ X lodb X stob X orb al,al X loopnz byte_loop X jz exit X movb (di),*0 /* add terminating null character */ Xexit: X mov si,bx /* restore si and di */ X mov di,dx X pop ax X ret / echo x - strncmp.x sed '/^X/s///' > strncmp.x << '/' X/* strncmp.x X * int strncmp(const char *s1, const char *s2, size_t n) X * X * Compares up to n characters from the strings pointed to by s1 X * and s2. Returns zero if the (possibly null terminated) arrays X * are identical, a positive number if s1 is greater than s2, and X * a negative number otherwise. X */ X X.define _strncmp X.text X_strncmp: X mov bx,sp X push si X push di X xor ax,ax /* default result is equality */ X mov cx,6(bx) X jcxz exit /* early exit if n == 0 */ X mov si,2(bx) X mov di,4(bx) X cmp si,di X je exit /* early exit if s1 == s2 */ X cld X test si,#1 /* align s1 on word boundary */ X jz setup_loop X lodb X orb al,al X jz last_byte_test X cmpb al,(di) X jne last_byte_test X xor ax,ax X dec cx X jz exit /* early exit if n == 1 */ X inc di Xsetup_loop: X mov dx,cx /* save count */ X shr cx,#1 /* work with words, not bytes */ X jz fetch_last_byte X sub di,#2 /* set up for faster loop */ Xword_loop: /* loop through string by words */ X lodw X add di,#2 X orb al,al X jz last_byte_test X cmp ax,(di) X jne find_mismatch X orb ah,ah X loopnz word_loop X mov ax,#0 /* zero return value (without setting flags) */ X jz exit X test dx,#1 /* check for odd byte at end */ X jz exit X add di,#2 Xfetch_last_byte: X movb al,(si) X jmp last_byte_test Xfind_mismatch: /* check word for mismatched byte */ X cmpb al,(di) X jne last_byte_test X movb al,ah X inc di Xlast_byte_test: /* Expects: (al)=char of s1; (di)->char of s2 */ X xorb ah,ah X subb al,(di) X jae exit X cbw Xexit: X pop di X pop si X ret / echo x - strncpy.x sed '/^X/s///' > strncpy.x << '/' X/* strncpy.x X * char *strncpy(char *s1, const char *s2, size_t n) X * X * Copy up to n characters from the string pointed to by s2 to X * the array pointed to by s1. If the source string is shorter X * than n characters, the remainder of the destination is padded X * with null characters. If the source is longer than n characters, X * the destination will not be null terminated. Returns s1. X */ X X#define BYTE_LIMIT 10 /* if n is above this, zero fill with words */ X X.define _strncpy X.text X_strncpy: X mov bx,sp X push si X push di X mov cx,6(bx) X jcxz exit /* early exit if n == 0 */ X mov di,2(bx) X mov si,4(bx) X cld X cmpb (si),*0 X je zero_fill /* if s2 has length zero, take a short cut */ X test si,#1 /* align source on word boundary */ X jz set_length X movb X dec cx X jz exit /* early exit if n == 1 */ Xset_length: X mov dx,cx /* save count */ X shr cx,#1 /* copy words, not bytes */ X jz last_byte Xword_copy: /* loop to copy words */ X lodw X orb al,al X jz restore_length /* early exit if low byte == 0 */ X stow X orb ah,ah X loopnz word_copy X jz restore_length Xlast_byte: X test dx,#1 /* move leftover byte */ X jz exit X movb X jmp exit Xrestore_length: /* retrieve remaining length (in bytes) */ X shl cx,#1 X and dx,#1 X add cx,dx Xzero_fill: /* add null characters if necessary */ X xor ax,ax X cmp cx,*BYTE_LIMIT X jbe zero_bytes X test di,#1 /* align destination on word boundary */ X jz zero_words X stob X dec cx Xzero_words: X shr cx,#1 /* zero words, not bytes */ X rep X stow X rcl cx,#1 /* set up for leftover byte */ Xzero_bytes: X rep X stob Xexit: X pop di X pop si X mov ax,2(bx) X ret / echo x - strpbrk.x sed '/^X/s///' > strpbrk.x << '/' X/* strpbrk.x X * char *strpbrk(const char *s1, const char *s2) X * X * Returns the address of the first character of the string pointed X * to by s1 that is in the string pointed to by s2. Returns NULL X * if no such character exists. X */ X X.define _strpbrk X.text X_strpbrk: X mov bx,sp X push si X push di X mov si,2(bx) X mov di,4(bx) X mov bx,di /* save a copy of s2 */ X cld X xor ax,ax /* default return value is NULL */ X cmpb (di),*0 X jz exit /* if s2 has length zero, we are done */ X cmpb 1(di),*0 X jz find_match /* if s2 has length one, we take a shortcut */ X mov cx,#-1 /* find length of s2 */ X repne X scab X not cx X dec cx X mov dx,cx /* save length of s2 */ Xs1_loop: /* loop through s1 to find matches with s2 */ X lodb X orb al,al X jz exit X mov di,bx X mov cx,dx X repne X scab X jne s1_loop X#ifdef i8088 X dec si X mov ax,si X#else X lea ax,-1(si) X#endif X pop di X pop si X ret Xfind_match: /* find a match for *s2 in s1 */ X movb dl,(di) X test si,#1 /* align source on word boundary */ X jz word_loop X lodb X cmpb al,dl X je one_past X orb al,al X jz no_match Xword_loop: X lodw X cmpb al,dl X je two_past X orb al,al X jz no_match X cmpb ah,dl X je one_past X orb ah,ah X jnz word_loop Xno_match: X xor ax,ax X pop di X pop si X ret Xtwo_past: X dec si Xone_past: X#ifdef i8088 X dec si X mov ax,si X#else X lea ax,-1(si) X#endif Xexit: X pop di X pop si X ret / echo x - strrchr.x sed '/^X/s///' > strrchr.x << '/' X/* strrchr.x X * char *strrchr(const char *s, int c) X * X * Locates final occurrence of c (as unsigned char) in string s. X */ X X.define _strrchr X.text X_strrchr: X mov bx,di /* save di */ X mov di,sp X xor dx,dx /* default result is NULL */ X movb ah,4(di) X mov di,2(di) X cld X mov cx,#-1 /* find end of string */ X xorb al,al X repne X scab X not cx /* silly trick gives length (including null) */ X dec di /* point back at null character */ X movb al,ah /* find last occurrence of c */ X std X repne X scab X jne exit X#ifdef i8088 X inc di X mov dx,di X#else X lea dx,1(di) X#endif Xexit: X cld /* clear direction flag */ X mov di,bx /* restore di */ X mov ax,dx X ret / echo x - strspn.x sed '/^X/s///' > strspn.x << '/' X/* strspn.x X * size_t strspn(const char *s1, const char *s2) X * X * Returns the length of the longest prefix of the string pointed X * to by s1 that is made up of the characters in the string s2. X */ X X.define _strspn X.text X_strspn: X push bp X mov bp,sp X push si X push di X mov si,4(bp) X mov di,6(bp) X cld X xor bx,bx /* default return value is zero */ X cmpb (di),*0 X jz exit /* if s2 has length zero, we are done */ X cmpb 1(di),*0 X jz find_mismatch /* if s2 has length one, we take a shortcut */ X mov cx,#-1 /* find length of s2 */ X xorb al,al X repne X scab X not cx X dec cx X mov dx,cx /* save length of s2 */ X dec bx /* set up byte count for faster loop */ Xs1_loop: /* loop over s1 looking for matches with s2 */ X lodb X inc bx X orb al,al X jz exit X mov di,6(bp) X mov cx,dx X repne X scab X je s1_loop X jmp exit Xfind_mismatch: /* find a character in s1 that is not *s2 */ X movb al,(di) X mov di,si X mov cx,#-1 X repe X scab X dec di /* point back at mismatch */ X mov bx,di X sub bx,si /* number of matched characters */ Xexit: X mov ax,bx X pop di X pop si X mov sp,bp X pop bp X ret / echo x - strstr.x sed '/^X/s///' > strstr.x << '/' X/* strstr.x X * char * strstr(const char *s1, const char *s2) X * X * Returns a pointer to the first occurrence in the string pointed X * to by s1 that is made up of the characters in the string s2. X */ X X.define _strstr X.text X_strstr: X push bp X mov bp,sp X sub sp,#2 /* make room for locals */ X push si X push di X mov si,4(bp) X mov di,6(bp) X mov bx,si /* default result is s1 */ X movb ah,(di) /* fetch first character of s2 */ X orb ah,ah X je exit /* if s2 is null, we are done */ X cld X mov cx,#-1 /* find length of s2 */ X xorb al,al X repne X scab X not cx X dec cx X mov -2(bp),cx /* save length of s2 */ X mov cx,#-1 /* find length + 1 of s1 */ X mov di,si X repne X scab X not cx X sub cx,-2(bp) /* |s1| - |s2| + 1 is number of possibilities */ X jbe not_found /* if |s1| < |s2|, give up right now */ X mov dx,cx X inc dx /* set up for faster loop */ X dec bx Xs1_loop: X dec dx X jz not_found X inc bx X cmpb ah,(bx) X jne s1_loop /* no match on first character - try another */ X mov di,6(bp) X mov si,bx X mov cx,-2(bp) X repe X cmpb X jne s1_loop X jmp exit Xnot_found: X xor bx,bx Xexit: X mov ax,bx X pop di X pop si X mov sp,bp X pop bp X ret / echo x - strtok.x sed '/^X/s///' > strtok.x << '/' X/* strtok.x X * char *strtok(char *s1, const char *s2) X * X * Returns a pointer to the "next" token in s1. Tokens are X * delimited by the characters in the string pointed to by s2. X */ X X.define _strtok X.data Xscan: .word 0 X.text X_strtok: X push bp X mov bp,sp X push si X push di X cld X mov bx,4(bp) X or bx,bx /* if s != NULL, */ X jnz s2_length /* we start a new string */ X mov bx,scan X or bx,bx /* if old string exhausted, */ X jz exit /* exit early */ Xs2_length: /* find length of s2 */ X mov di,6(bp) X mov cx,#-1 X xorb al,al X repne X scab X not cx X dec cx X jz string_finished /* if s2 has length zero, we are done */ X mov dx,cx /* save length of s2 */ X X mov si,bx X xor bx,bx /* return value is NULL */ Xdelim_loop: /* dispose of leading delimiters */ X lodb X orb al,al X jz string_finished X mov di,6(bp) X mov cx,dx X repne X scab X je delim_loop X X lea bx,-1(si) /* return value is start of token */ Xtoken_loop: /* find end of token */ X lodb X orb al,al X jz string_finished X mov di,6(bp) X mov cx,dx X repne X scab X jne token_loop X movb -1(si),*0 /* terminate token */ X mov scan,si /* set up for next call */ X jmp exit Xstring_finished: X mov scan,#0 /* ensure NULL return in future */ Xexit: X mov ax,bx X pop di X pop si X mov sp,bp X pop bp X ret / echo x - strerror.x sed '/^X/s///' > strerror.x << '/' X/* strerror.x X * char *strerror(int errnum) X * X * Returns a pointer to an appropriate error message string. X */ X X.define _strerror X.data X.extern _sys_nerr, _sys_errlist Xunknown: .asciz 'Unknown error' X.text X_strerror: X mov bx,sp X mov bx,2(bx) X mov ax,#unknown /* default return is "Unknown error" */ X or bx,bx X jle exit X cmp bx,_sys_nerr X jge exit X sal bx,#1 X mov ax,_sys_errlist(bx) Xexit: X ret / exit