nfs@notecnirp.Princeton.EDU (Norbert Schlenker) (10/10/89)
Having done some a favour by posting a new stdio package, which I believe to be portable across Minix platforms, I am now going to do others a favour (but to ast, perhaps a disservice) by posting an assembly language version of the ANSI C string package for PC's and compatibles. All of the required routines, save for strcoll() and strxfrm() [missing since I don't have a use for locale specific routines] and strerror() [missing since I see no advantage over the C implementation], are included. The Berkeley compatibility routines and memccpy() are not included either. I realize that this posting may cause some distress because of nonportability, but I believe the performance gain to be worthwhile. Henry Spencer's C routines are widely used, very reliable, very portable, and are easily compiled into reasonably efficient code. They can take no advantage, of course, of special architectural features, which Intel processors possess in abundance in this case. If the best that could be done was a 10-20% improvement in the string code, which I would consider fairly typical for assembly over C, I wouldn't consider it worthwhile. But my rewritten routines show much larger improvements for typical inputs - from 40% to 95% depending on function. The code that I write tends to use a lot of str[len|cpy|cmp]() and mem[set|cpy](). The improvements for these routines are substantial enough that I use my assembly language versions. The recent Dhrystone 1.1 posting by ast shows a 40% increase in Dhrystone rating on my machine with these routines (only strcpy() and strcmp() are used there). The code is faster for four reasons. It uses special instructions not generated by the C compiler; it pays careful attention to register contents; it unrolls most loops once; and it takes advantage of word alignment where possible. The first two involve fairly simple adap- tations of Spencer's C code to the Intel architecture. The last two are sometimes unpleasant. Unrolling loops once saves 10-15% in most cases; the attention to alignment saves 3-5% on top of that. The code is less clear (in some cases, MUCH less clear) and harder to debug, but 20% is not to be sniffed at. The code was optimized on a Toshiba 5100, which has an 80386 and uncached 1-wait state 32 bit memory. As a result, the code may not be optimal on other machines. I expect it to be quite good for 16 bit CPU's, with perhaps slightly less improvement on 8088's, where the attention to alignment is wasted. I am open to bug reports and suggestions for improvement. I am also interested in reports of performance on other machines. To that end, I have included a program that computes the improvement for a variety of routines automagically; please email the results to me. The program asks for a description of the machine, which it uses as a header - the CPU and a description of the memory architecture are what I'd like to see. I have included a copy of the program's output for my machine in the file Perf.T5100; I hope the improvements shown there will be typical. Feed the following to /bin/sh to unpack it. If you have a '386, link memset386.S to memset.S; otherwise link memset86.S. By default, the makefile generates the performance comparison with the existing library. If you "make install", packed versions of the routines will be installed in /usr/lib/libc.a. (After installation in the library, the performance comparison cannot be done, since the old routines are gone.) Enjoy! ------------------------------ Cut here -------------------------------- echo x - Copyright sed '/^X/s///' > Copyright << '/' XCopyright 1989 Norbert Schlenker. All rights reserved. X XThe copyright holder hereby grants to the public the right to use, Xmodify, and redistribute this software freely. X XThis software is provided "as is" and carries no warranty of any kind. / echo x - Makefile sed '/^X/s///' > Makefile << '/' XSRCS = memchr.S memcmp.S memmove.S memset.S strcat.S strchr.S \ X strcmp.S strcpy.S strcspn.S strlen.S strncat.S strncmp.S \ X strncpy.S strpbrk.S strrchr.S strspn.S strstr.S strtok.S X XOBJS = memchr.s memcmp.s memmove.s memset.s strcat.s strchr.s \ X strcmp.s strcpy.s strcspn.s strlen.s strncat.s strncmp.s \ X strncpy.s strpbrk.s strrchr.s strspn.s strstr.s strtok.s X X.SUFFIXES: .s .S .o .c .y X X.S.s: X libpack < $*.S | \ X sed "/_[MS]/s/`echo $* | tr '[a-z]' '[A-Z]'`/$*/" | \ X sed "/^\$$/d" >$*.s X Xperf: perf.c $(SRCS) X cc -o perf perf.c $(SRCS) X perf >Perf.local X Xinstall: $(OBJS) X ar d /usr/lib/libc.a memcpy.s # Spencer's memcpy replaced by memmove X ar rlv /usr/lib/libc.a $? / echo x - perf.c sed '/^X/s///' > perf.c << '/' X#include <stdio.h> X#include <string.h> X#include <time.h> X#include <sys/times.h> X X#define INNERLOOPCOUNT 1000 X X#define TIME_EXECUTION(x, t) { \ X times(&start); \ X for (i = loop_count; --i >= 0; ) \ X for (j = INNERLOOPCOUNT; --j >= 0; ) \ X x; \ X times(&end); \ X t = end.tms_utime - start.tms_utime; \ X } X X#define GAIN(argc) \ X (((new_time - empty_function_time[argc]) * 100 + 50) / \ X (old_time - empty_function_time[argc])) X X Xstatic char ATOE[] = "ABCDE"; Xstatic char ATOZ[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; X Xint empty_function(x, y, z) Xchar *x; Xchar *y; Xchar *z; X{ X} X Xmain() X{ X struct tms start, end; X register int i, j; X int loop_count; X clock_t empty_function_time[4]; X clock_t old_time; X clock_t new_time; X clock_t fake_time; X char *s; X static char s1[31]; X static char s2[30]; X static char bigbuf[1024]; X static char bigbuf2[1024]; X X fprintf(stderr, "This test takes about 7 minutes to run.\n"); X fprintf(stderr, "Describe the machine you are running on: "); X fgets(bigbuf, 60, stdin); X printf("Machine: %s\n", bigbuf); X X/* --- Find a loop count that results in nontrivial function call times --- */ X empty_function_time[0] = 0; X empty_function_time[1] = 0; X for (loop_count = 10; empty_function_time[1] < CLK_TCK/10; loop_count += 5) X TIME_EXECUTION(empty_function("x"), empty_function_time[1]); X TIME_EXECUTION(empty_function("x","y"), empty_function_time[2]); X TIME_EXECUTION(empty_function("x","y","z"), empty_function_time[3]); X printf("Functions called %d,000 times each\n\n", loop_count); X printf("Function call \t\t%% of old time\n"); X printf("-----------------\t\t-------------\n"); X X/* --- memmove --- */ X TIME_EXECUTION(memcpy(s1 + 1, s2, 4), old_time); X TIME_EXECUTION(MEMCPY(s1 + 1, s2, 4), new_time); X printf("memcpy(s1, s2, n)\t\t[n=4]:%d", GAIN(3)); X TIME_EXECUTION(memcpy(s1, s2 + 1, 25), old_time); X TIME_EXECUTION(MEMCPY(s1, s2 + 1, 25), new_time); X printf("\t[n=25]:%d", GAIN(3)); X TIME_EXECUTION(memcpy(bigbuf, bigbuf2, 1024), old_time); X TIME_EXECUTION(MEMCPY(bigbuf, bigbuf2, 1024), new_time); X printf("\t[n=1024]:%d\n", GAIN(3)); X X/* --- strcpy --- */ X TIME_EXECUTION(strcpy(s1, ATOE), old_time); X TIME_EXECUTION(STRCPY(s1, ATOE), new_time); X printf("strcpy(s1, s2)\t\t\t[s2=ATOE]:%d", GAIN(2)); X TIME_EXECUTION(strcpy(s1, ATOZ), old_time); X TIME_EXECUTION(STRCPY(s1, ATOZ), new_time); X printf("\t[s2=ATOZ]:%d\n", GAIN(2)); X X/* --- strncpy --- */ X TIME_EXECUTION(strncpy(s1, ATOZ, 10), old_time); X TIME_EXECUTION(STRNCPY(s1, ATOZ, 10), new_time); X printf("strncpy(s1, s2, n)\t\t[s2=ATOZ,n=10]:%d\n", GAIN(3)); X X/* --- strcat --- */ X strcpy(s1, ATOZ); X times(&start); X for (i = loop_count; --i >= 0; ) X for (j = INNERLOOPCOUNT; --j >= 0; ) { X s1[11] = '\0'; X strcat(s1, ATOE + 1); X } X times(&end); X old_time = end.tms_utime - start.tms_utime; X times(&start); X for (i = loop_count; --i >= 0; ) X for (j = INNERLOOPCOUNT; --j >= 0; ) { X s1[11] = '\0'; X STRCAT(s1, ATOE + 1); X } X times(&end); X new_time = end.tms_utime - start.tms_utime; X printf("strcat(ATOJ, ATOE + 1)\t\t%d\n", GAIN(2)); X X/* --- memcmp --- */ X memset(bigbuf, 0, 1024); X memset(bigbuf2, 0, 1024); X TIME_EXECUTION(memcmp(bigbuf, bigbuf2, 4), old_time); X TIME_EXECUTION(MEMCMP(bigbuf, bigbuf2, 4), new_time); X printf("memcmp(buf, buf2, n)\t\t[n=4]:%d", GAIN(3)); X TIME_EXECUTION(memcmp(bigbuf, bigbuf2, 25), old_time); X TIME_EXECUTION(MEMCMP(bigbuf, bigbuf2, 25), new_time); X printf("\t[n=25]:%d", GAIN(3)); X TIME_EXECUTION(memcmp(bigbuf, bigbuf2, 1024), old_time); X TIME_EXECUTION(MEMCMP(bigbuf, bigbuf2, 1024), new_time); X printf("\t[n=1024]:%d\n", GAIN(3)); X X/* --- strcmp --- */ X strcpy(s1, ATOE); X TIME_EXECUTION(strcmp(s1, ATOE), old_time); X TIME_EXECUTION(STRCMP(s1, ATOE), new_time); X printf("strcmp(s1, s2)\t\t\t[len=5]:%d", GAIN(2)); X strcpy(s1, ATOZ); X TIME_EXECUTION(strcmp(s1 + 1, ATOZ + 1), old_time); X TIME_EXECUTION(STRCMP(s1 + 1, ATOZ + 1), new_time); X printf("\t[len=25]:%d\n", GAIN(2)); X X/* --- strncmp --- */ X strcpy(s1, ATOZ); X TIME_EXECUTION(strncmp(s1, ATOZ, 4), old_time); X TIME_EXECUTION(STRNCMP(s1, ATOZ, 4), new_time); X printf("strncmp(s1, s2, n)\t\t[n=4]:%d", GAIN(3)); X TIME_EXECUTION(strncmp(s1, ATOZ, 25), old_time); X TIME_EXECUTION(STRNCMP(s1, ATOZ, 25), new_time); X printf("\t[n=25]:%d\n", GAIN(3)); X X/* --- memchr --- */ X TIME_EXECUTION(memchr(ATOZ, 'E', 25), old_time); X TIME_EXECUTION(MEMCHR(ATOZ, 'E', 25), new_time); X printf("memchr(ATOZ, c, 25)\t\t[c='E']:%d", GAIN(3)); X TIME_EXECUTION(memchr(ATOZ, 'Z', 25), old_time); X TIME_EXECUTION(MEMCHR(ATOZ, 'Z', 25), new_time); X printf("\t[c='Z']:%d\n", GAIN(3)); X X/* --- strchr --- */ X strcpy(s1, ATOZ); X strcpy(s2, s1); X TIME_EXECUTION(strchr(ATOZ, 'E'), old_time); X TIME_EXECUTION(STRCHR(ATOZ, 'E'), new_time); X printf("strchr(ATOZ, c))\t\t[c='E']:%d", GAIN(2)); X TIME_EXECUTION(strchr(ATOZ, 'Z'), old_time); X TIME_EXECUTION(STRCHR(ATOZ, 'Z'), new_time); X printf("\t[c='Z']:%d\n", GAIN(2)); X X/* --- strcspn --- */ X TIME_EXECUTION(strcspn("word list", " "), old_time); X TIME_EXECUTION(STRCSPN("word list", " "), new_time); X printf("strcspn(\"word list\", s)\t\t[s=\" \"]:%d", GAIN(2)); X TIME_EXECUTION(strcspn("word list", " \t\r\n"), old_time); X TIME_EXECUTION(STRCSPN("word list", " \t\r\n"), new_time); X printf("\t[s=\" \\t\\r\\n\"]:%d\n", GAIN(2)); X X/* --- strpbrk --- */ X TIME_EXECUTION(strpbrk("word list", " "), old_time); X TIME_EXECUTION(STRPBRK("word list", " "), new_time); X printf("strpbrk(\"word list\", s)\t\t[s=\" \"]:%d", GAIN(2)); X TIME_EXECUTION(strpbrk("word list", " \t\r\n"), old_time); X TIME_EXECUTION(STRPBRK("word list", " \t\r\n"), new_time); X printf("\t[s=\" \\t\\r\\n\"]:%d\n", GAIN(2)); X X/* --- strrchr --- */ X TIME_EXECUTION(strrchr(ATOZ, 'A'), old_time); X TIME_EXECUTION(STRRCHR(ATOZ, 'A'), new_time); X printf("strrchr(ATOZ, c)\t\t[c='A']:%d", GAIN(2)); X TIME_EXECUTION(strrchr(ATOZ, 'M'), old_time); X TIME_EXECUTION(STRRCHR(ATOZ, 'M'), new_time); X printf("\t[c='M']:%d", GAIN(2)); X TIME_EXECUTION(strrchr(ATOZ, 'Z'), old_time); X TIME_EXECUTION(STRRCHR(ATOZ, 'Z'), new_time); X printf("\t[c='Z']:%d\n", GAIN(2)); X X/* --- strspn --- */ X TIME_EXECUTION(strspn("0175713", "01234567"), old_time); X TIME_EXECUTION(STRSPN("0175713", "01234567"), new_time); X printf("strspn(\"0175713\", \"01234567\")\t%d\n", GAIN(2)); X X/* --- strstr --- */ X TIME_EXECUTION(strstr(ATOZ, "a"), old_time); X TIME_EXECUTION(STRSTR(ATOZ, "a"), new_time); X printf("strstr(ATOZ, s)\t\t\t[s=\"a\"]:%d", GAIN(2)); X TIME_EXECUTION(strstr(ATOZ, "y"), old_time); X TIME_EXECUTION(STRSTR(ATOZ, "y"), new_time); X printf("\t[s=\"y\"]:%d", GAIN(2)); X TIME_EXECUTION(strstr(ATOZ, "klmnop"), old_time); X TIME_EXECUTION(STRSTR(ATOZ, "klmnop"), new_time); X printf("\t[s=\"klmnop\"]:%d\n", GAIN(2)); X X/* --- strtok --- */ X times(&start); X for (i = loop_count; --i >= 0; ) X for (j = INNERLOOPCOUNT; --j >= 0; ) { X strcpy(s1, "a a a a a a a a a a a a"); X empty_function(NULL, " "); X empty_function(NULL, " "); X empty_function(NULL, " "); X empty_function(NULL, " "); X empty_function(NULL, " "); X empty_function(NULL, " "); X empty_function(NULL, " "); X empty_function(NULL, " "); X empty_function(NULL, " "); X empty_function(NULL, " "); X empty_function(NULL, " "); X empty_function(NULL, " "); X } X times(&end); X fake_time = end.tms_utime - start.tms_utime; X times(&start); X for (i = loop_count; --i >= 0; ) X for (j = INNERLOOPCOUNT; --j >= 0; ) { X strcpy(s1, "a a a a a a a a a a a a"); X strtok(s1, " "); X strtok(NULL, " "); X strtok(NULL, " "); X strtok(NULL, " "); X strtok(NULL, " "); X strtok(NULL, " "); X strtok(NULL, " "); X strtok(NULL, " "); X strtok(NULL, " "); X strtok(NULL, " "); X strtok(NULL, " "); X strtok(NULL, " "); X } X times(&end); X old_time = end.tms_utime - start.tms_utime - fake_time; X times(&start); X for (i = loop_count; --i >= 0; ) X for (j = INNERLOOPCOUNT; --j >= 0; ) { X strcpy(s1, "a a a a a a a a a a a a"); X STRTOK(s1, " "); X STRTOK(NULL, " "); X STRTOK(NULL, " "); X STRTOK(NULL, " "); X STRTOK(NULL, " "); X STRTOK(NULL, " "); X STRTOK(NULL, " "); X STRTOK(NULL, " "); X STRTOK(NULL, " "); X STRTOK(NULL, " "); X STRTOK(NULL, " "); X STRTOK(NULL, " "); X } X times(&end); X new_time = end.tms_utime - start.tms_utime - fake_time; X printf("strtok(\"a a ... a a\",\" \") ...\n"); X printf("strtok(NULL, \" \") ...\t\t%d\n", GAIN(0)); X X/* --- memset --- */ X TIME_EXECUTION(memset(bigbuf, 0, 4), old_time); X TIME_EXECUTION(MEMSET(bigbuf, 0, 4), new_time); X printf("memset(buf, 0, n)\t\t[n=4]:%d", GAIN(3)); X TIME_EXECUTION(memset(bigbuf, 0, 1024), old_time); X TIME_EXECUTION(MEMSET(bigbuf, 0, 1024), new_time); X printf("\t[n=1024]:%d\n", GAIN(3)); X X/* --- strlen --- */ X TIME_EXECUTION(strlen(ATOE), old_time); X TIME_EXECUTION(STRLEN(ATOE), new_time); X printf("strlen(s)\t\t\t[s=ATOE]:%d", GAIN(1)); X TIME_EXECUTION(strlen(ATOZ), old_time); X TIME_EXECUTION(STRLEN(ATOZ), new_time); X printf("\t[s=ATOZ]:%d\n", GAIN(1)); X X exit(0); X} / echo x - Perf.T5100 sed '/^X/s///' > Perf.T5100 << '/' XMachine: Toshiba 5100 (80386; 16 MHz; 1-ws 32 bit memory; no cache) X XFunctions called 25,000 times each X XFunction call % of old time X----------------- ------------- Xmemcpy(s1, s2, n) [n=4]:35 [n=25]:18 [n=1024]:7 Xstrcpy(s1, s2) [s2=ATOE]:58 [s2=ATOZ]:47 Xstrncpy(s1, s2, n) [s2=ATOZ,n=10]:46 Xstrcat(ATOJ, ATOE + 1) 51 Xmemcmp(buf, buf2, n) [n=4]:31 [n=25]:13 [n=1024]:7 Xstrcmp(s1, s2) [len=5]:47 [len=25]:37 Xstrncmp(s1, s2, n) [n=4]:51 [n=25]:33 Xmemchr(ATOZ, c, 25) [c='E']:31 [c='Z']:23 Xstrchr(ATOZ, c)) [c='E']:51 [c='Z']:33 Xstrcspn("word list", s) [s=" "]:33 [s=" \t\r\n"]:36 Xstrpbrk("word list", s) [s=" "]:35 [s=" \t\r\n"]:38 Xstrrchr(ATOZ, c) [c='A']:31 [c='M']:25 [c='Z']:18 Xstrspn("0175713", "01234567") 30 Xstrstr(ATOZ, s) [s="a"]:55 [s="y"]:55 [s="klmnop"]:48 Xstrtok("a a ... a a"," ") ... Xstrtok(NULL, " ") ... 45 Xmemset(buf, 0, n) [n=4]:50 [n=1024]:9 [n=1024(386)]:5 Xstrlen(s) [s=ATOE]:47 [s=ATOZ]:38 / echo x - memchr.S sed '/^X/s///' > memchr.S << '/' X| memchr.s X| void *memchr(const void *s, int c, size_t n) X X| Returns a pointer to the first occurrence of c (converted to X| unsigned char) in the object pointed to by s, NULL if none. X X.define _MEMCHR X.globl _MEMCHR X.text X_MEMCHR: X push bp X mov bp,sp X push di X xor bx,bx | default result is NULL X mov cx,8(bp) X jcxz exit | early exit if n == 0 X movb al,6(bp) X mov di,4(bp) X cld X repne X scab X jne exit X lea bx,-1(di) | [dec di; mov bx,di] better on 8088/8086 Xexit: X mov ax,bx X pop di X mov sp,bp X pop bp X ret / echo x - memcmp.S sed '/^X/s///' > memcmp.S << '/' X| memcmp.s X| int memcmp(const void *s1, const void *s2, size_t n) X X| Compares the first n characters of the objects pointed to by X| s1 and s2. Returns zero if all characters are identical, a X| positive number if s1 greater than s2, a negative number otherwise. X XBYTE_LIMIT = 10 | if n is above this, we work with words X X.define _MEMCMP X.globl _MEMCMP X.text X_MEMCMP: X push bp X mov bp,sp X push si X push di X xor ax,ax | default return is equality X mov cx,8(bp) X jcxz exit | early exit if n == 0 X mov si,4(bp) X mov di,6(bp) X cmp si,di X je exit | early exit if s1 == s2 X cld X cmp cx,#BYTE_LIMIT X ja word_compare Xbyte_compare: X repe X cmpb X je exit X jmp find_difference Xword_compare: X test si,#1 | align s1 on word boundary X jz word_aligned X cmpb X jne find_difference X dec cx Xword_aligned: X mov dx,cx | save count X shr cx,#1 | compare words, not bytes X jz almost_done X repe X cmp X je almost_done X mov ax,-2(si) | fetch mismatched words X sub ax,-2(di) X orb al,al X jz find_difference | if low bytes match, high byte must not X cbw X jmp .dsret Xalmost_done: | most of string compared equal X test dx,#1 X jz exit X inc si X inc di Xfind_difference: X movb al,-1(si) | mismatch - determine > or < X subb al,-1(di) X cbw Xexit: X jmp .dsret / echo x - memmove.S sed '/^X/s///' > memmove.S << '/' X| memmove.s X| void *memmove(void *s1, const void *s2, size_t n) X| void *memcpy(void *s1, const void *s2, size_t n) X X| Copy n characters from the object pointed to by s2 into the X| object pointed to by s1. Copying takes place as if the n X| characters pointed to by s2 are first copied to a temporary X| area and then copied to the object pointed to by s1. X X| Per X3J11, memcpy may have undefined results if the objects X| overlap; since the performance penalty is insignificant, we X| use the safe memmove code for it as well. X XBYTE_LIMIT = 12 | if n is above this, we work with words X X.define _MEMMOVE, _MEMCPY X.globl _MEMMOVE, _MEMCPY X.text X_MEMMOVE: X_MEMCPY: X push bp X mov bp,sp X push si X push di X mov cx,8(bp) X jcxz exit | early exit if n == 0 X mov di,4(bp) X mov si,6(bp) X cmp si,di X je exit | early exit if s1 == s2 X ja left_to_right X std | s1 < s2: move right to left X add si,cx | compute objects' end addresses X dec si X add di,cx X dec di X cmp cx,#BYTE_LIMIT X jbe byte_move X test si,#1 | align source on word boundary X jnz 1f X movb X dec cx X1: X dec si | adjust to word boundary X dec di X jmp word_move Xleft_to_right: X cld | s1 > s2: move left to right X cmp cx,#BYTE_LIMIT X jbe byte_move X test si,#1 | align source on word boundary X jz word_move X movb X dec cx Xword_move: X shr cx,#1 | move words, not bytes X rep X movw X jnc exit X inc cx | set up to move leftover byte Xbyte_move: X rep X movb Xexit: X jmp .dsret / echo x - memset386.S sed '/^X/s///' > memset386.S << '/' X| memset.s X| void *memset(void *s, int c, size_t n) X X| Copies the value of c (converted to unsigned char) into the X| first n locations of the object pointed to by s. X X| !! 386 version !! X XOPERAND_SIZE = 102 | operand size override (i.e. force 32 bits) XBYTE_LIMIT = 16 | if n is above this, we work with words X X.define _MEMSET X.globl _MEMSET X.text X_MEMSET: X push bp X mov bp,sp X push si X push di X mov cx,8(bp) X jcxz exit | early exit if n == 0 X mov di,4(bp) X movb al,6(bp) X cld X cmp cx,#BYTE_LIMIT X jbe byte_set X movb ah,al | set up second byte X test di,#1 | align on word boundary X jz 1f X stob X dec cx X1: X test di,#2 | align on doubleword boundary X jz 2f X stow X dec cx X dec cx X2: X mov dx,ax | duplicate byte in all bytes of EAX X .byte OPERAND_SIZE | shl eax,16 X .byte 193 X .byte 224 X .byte 16 X mov ax,dx X mov dx,cx | save count X shr cx,#1 | set double words, not bytes X shr cx,#1 X rep X .byte OPERAND_SIZE X stow X and dx,#3 | set up to set leftover bytes X jz exit X mov cx,dx Xbyte_set: X rep X stob Xexit: X jmp .dsret / echo x - memset86.S sed '/^X/s///' > memset86.S << '/' X| memset.s X| void *memset(void *s, int c, size_t n) X X| Copies the value of c (converted to unsigned char) into the X| first n locations of the object pointed to by s. X XBYTE_LIMIT = 10 | if n is above this, we work with words X X.define _MEMSET X.globl _MEMSET X.text X_MEMSET: X push bp X mov bp,sp X push si X push di X mov cx,8(bp) X jcxz exit | early exit if n == 0 X mov di,4(bp) X movb al,6(bp) X cld X cmp cx,#BYTE_LIMIT X jbe byte_set X movb ah,al | set up high byte X test di,#1 | align on word boundary X jz 1f X stob X dec cx X1: X shr cx,#1 | set words, not bytes X rep X stow X jnc exit X inc cx | set up for last byte Xbyte_set: X rep X stob Xexit: X jmp .dsret / echo x - strcat.S sed '/^X/s///' > strcat.S << '/' X| strcat.s X| char *strcat(char *s1, const char *s2) X X| Concatenates the string pointed to by s2 onto the end of the X| string pointed to by s1. Returns s1. X X.define _STRCAT X.globl _STRCAT X.text X_STRCAT: X push bp X mov bp,sp X push si X push di X mov di,4(bp) X mov bx,di X mov si,6(bp) X cld X mov cx,#-1 | find end of s1 X xorb al,al X repne X scab X dec di | point back at null character X test si,#1 | align source on word boundary X jz word_copy X lodb X stob X orb al,al X jz exit Xword_copy: | loop to copy words X lodw X orb al,al X jz move_last_byte | early exit if low byte == 0 X stow X orb ah,ah X jnz word_copy X jmp exit Xmove_last_byte: X stob | add odd zero byte Xexit: X mov ax,bx X jmp .dsret / echo x - strchr.S sed '/^X/s///' > strchr.S << '/' X| strchr.s X| char *strchr(const char *s, int c) X X| Returns location of the first occurrence of c (converted to char) X| in the string pointed to by s. Returns NULL if c does not occur. X X.define _STRCHR X.globl _STRCHR X.text X_STRCHR: X push bp X mov bp,sp X push si X mov si,4(bp) X movb dl,6(bp) X cld X test si,#1 | align source on word boundary X jz word_loop X lodb X cmpb al,dl X je one_past X orb al,al X jz no_match Xword_loop: X lodw X cmpb al,dl X je two_past X orb al,al X jz no_match X cmpb ah,dl X je one_past X orb ah,ah X jnz word_loop Xno_match: X xor ax,ax X jmp .sret Xtwo_past: X dec si Xone_past: X dec si X mov ax,si X jmp .sret / echo x - strcmp.S sed '/^X/s///' > strcmp.S << '/' X| strcmp.s X| int strcmp(const char *s1, const char *s2) X X| Compares the strings pointed to by s1 and s2. Returns zero if X| strings are identical, a positive number if s1 greater than s2, X| and a negative number otherwise. X X.define _STRCMP X.globl _STRCMP X.text X_STRCMP: X push bp X mov bp,sp X push si X push di X xor ax,ax | default return is equality X mov si,4(bp) X mov di,6(bp) X cmp si,di X je exit | early exit if s1 == s2 X cld X test si,#1 | align s1 on word boundary X jz word_loop X lodb X orb al,al X jz last_byte_test X subb al,(di) X jnz exit X inc di Xword_loop: | loop through string by words X mov ax,(si) X orb al,al X jz last_byte_test X orb ah,ah X jz high_byte_zero X cmp X je word_loop X mov ax,-2(si) | find mismatch in final word X sub ax,-2(di) X orb al,al X jnz exit X movb al,ah X jmp exit Xhigh_byte_zero: X subb al,(di) X jnz exit X movb al,ah X inc di Xlast_byte_test: X subb al,(di) Xexit: X cbw X jmp .dsret / echo x - strcpy.S sed '/^X/s///' > strcpy.S << '/' X| strcpy.s X| char *strcpy(char *s1, const char *s2) X X| Copy the string pointed to by s2, including the terminating null X| character, into the array pointed to by s1. Returns s1. X X.define _STRCPY X.globl _STRCPY X.text X_STRCPY: X push bp X mov bp,sp X push si X push di X mov di,4(bp) X mov bx,di X mov si,6(bp) X cld X test si,#1 | align source on word boundary X jz word_copy X lodb X stob X orb al,al X jz exit Xword_copy: | loop to copy words X lodw X orb al,al X jz move_last_byte | early exit if low byte == 0 X stow X orb ah,ah X jnz word_copy X jmp exit Xmove_last_byte: X stob | add odd zero byte Xexit: X mov ax,bx X jmp .dsret / echo x - strcspn.S sed '/^X/s///' > strcspn.S << '/' X| strcspn.s X| size_t strcspn(const char *s1, const char *s2) X X| Returns the length of the longest prefix of the string pointed X| to by s1 that has none of the characters in the string s2. X X.define _STRCSPN X.globl _STRCSPN X.text X_STRCSPN: X push bp X mov bp,sp X push si X push di X mov si,4(bp) X mov di,6(bp) X cld X mov bx,#-1 | set up character count (-1 for faster loops) X cmpb (di),*0 X jz s1_length | if s2 has length zero, we return s1's length X inc di X cmpb (di),*0 X jz find_match | if s2 has length one, we take a shortcut X mov cx,bx | find length of s2 X xorb al,al X repne X scab X not cx X| dec cx | don't need this: we started at s2[1] X mov dx,cx | save length of s2 Xs1_loop: | loop through s1 looking for matches with s2 X lodb X inc bx X orb al,al X je exit X mov di,6(bp) X mov cx,dx X repne X scab X jne s1_loop X jmp exit Xs1_length: | find length of s1 X mov di,si X mov cx,bx X xorb al,al X repne X scab X not cx X dec cx X mov bx,cx X jmp exit Xfind_match: | find a match for *s2 in s1 X movb dl,-1(di) X test si,#1 | align source on word boundary X jz word_loop X lodb X inc bx X orb al,al X je exit X cmpb al,dl X je exit Xword_loop: X lodw X inc bx X orb al,al X je exit X cmpb al,dl X je exit X inc bx X orb ah,ah X je exit X cmpb ah,dl X jne word_loop Xexit: X mov ax,bx X jmp .dsret / echo x - strlen.S sed '/^X/s///' > strlen.S << '/' X| strlen.s X| size_t strlen(const char *s) X X| Returns the length of the string pointed to by s. X X.define _STRLEN X.globl _STRLEN X.text X_STRLEN: X push bp X mov bp,sp X push di X mov di,4(bp) X cld X mov cx,#-1 X xorb al,al X repne X scab X not cx | silly trick gives length (including null) X dec cx | forget about null X mov ax,cx X pop di X mov sp,bp X pop bp X ret / echo x - strncat.S sed '/^X/s///' > strncat.S << '/' X| strncat.s X| char *strncat(char *s1, const char *s2, size_t n) X X| Concatenates up to n characters of the string pointed to by s2 X| onto the end of the string pointed to by s1. A terminating X| null character is always appended. Returns s1. X X.define _STRNCAT X.globl _STRNCAT X.text X_STRNCAT: X push bp X mov bp,sp X push si X push di X mov cx,8(bp) X jcxz exit | early exit if n == 0 X mov di,4(bp) X mov si,6(bp) X cld X mov cx,#-1 | find end of s1 X xorb al,al X repne X scab X dec di X mov cx,8(bp) Xbyte_loop: | loop to copy bytes X lodb X stob X orb al,al X loopnz byte_loop X jz exit X movb (di),*0 | add terminating null character Xexit: X mov ax,4(bp) X jmp .dsret / echo x - strncmp.S sed '/^X/s///' > strncmp.S << '/' X| strncmp.s X| int strncmp(const char *s1, const char *s2, size_t n) X X| Compares up to n characters from the strings pointed to by s1 X| and s2. Returns zero if the (possibly null terminated) arrays X| are identical, a positive number if s1 is greater than s2, and X| a negative number otherwise. X X.define _STRNCMP X.globl _STRNCMP X.text X_STRNCMP: X push bp X mov bp,sp X push si X push di X xor ax,ax | default result is equality X mov cx,8(bp) X jcxz exit | early exit if n == 0 X mov si,4(bp) X mov di,6(bp) X cmp si,di X je exit | early exit if s1 == s2 X cld X test si,#1 | align s1 on word boundary X jz set_length X lodb X orb al,al X jz last_byte_test X subb al,(di) X jne exit X dec cx X jz exit | early exit if n == 1 X inc di Xset_length: X mov dx,cx | save count X shr cx,#1 | work with words, not bytes X jz fetch_last_byte Xword_loop: | loop through string by words X mov ax,(si) X orb al,al X jz last_byte_test X orb ah,ah X jz high_byte_zero X cmp X loope word_loop X je fetch_last_byte X mov ax,-2(si) | find mismatch in final word X sub ax,-2(di) X orb al,al X jnz exit X movb al,ah X jmp exit Xfetch_last_byte: X xor ax,ax X test dx,#1 X jz exit X movb al,(si) X jmp last_byte_test Xhigh_byte_zero: X subb al,(di) X jnz exit X movb al,ah X inc di Xlast_byte_test: X subb al,(di) Xexit: X cbw X jmp .dsret / echo x - strncpy.S sed '/^X/s///' > strncpy.S << '/' X| strncpy.s X| char *strncpy(char *s1, const char *s2, size_t n) X X| Copy up to n characters from the string pointed to by s2 to X| the array pointed to by s1. If the source string is shorter X| than n characters, the remainder of the destination is padded X| with null characters. If the source is longer than n characters, X| the destination will not be null terminated. Returns s1. X XBYTE_LIMIT = 10 | if n is above this, zero fill with words X X.define _STRNCPY X.globl _STRNCPY X.text X_STRNCPY: X push bp X mov bp,sp X push si X push di X mov cx,8(bp) X jcxz exit | early exit if n == 0 X mov di,4(bp) X mov si,6(bp) X cld X cmpb (si),*0 X je zero_fill | if s2 has length zero, take a short cut X test si,#1 | align source on word boundary X jz set_length X movb X dec cx X jz exit | early exit if n == 1 Xset_length: X mov dx,cx | save count X shr cx,#1 | copy words, not bytes X jz last_byte Xword_copy: | loop to copy words X lodw X orb al,al X jz restore_length | early exit if low byte == 0 X stow X orb ah,ah X loopnz word_copy X jz restore_length Xlast_byte: X test dx,#1 | move leftover byte X jz exit X movb X jmp exit Xrestore_length: | retrieve remaining length (in bytes) X shl cx,#1 X and dx,#1 X add cx,dx Xzero_fill: | add null characters if necessary X xor ax,ax X cmp cx,#BYTE_LIMIT X jbe zero_bytes X test di,#1 | align destination on word boundary X jz zero_words X stob X dec cx Xzero_words: X mov dx,cx | save count X shr cx,#1 | zero words, not bytes X rep X stow X test dx,#1 X jz exit X inc cx | set up to zero leftover byte Xzero_bytes: X rep X stob Xexit: X mov ax,4(bp) X jmp .dsret / echo x - strpbrk.S sed '/^X/s///' > strpbrk.S << '/' X| strpbrk.s X| char *strpbrk(const char *s1, const char *s2) X X| Returns the address of the first character of the string pointed X| to by s1 that is in the string pointed to by s2. Returns NULL X| if no such character exists. X X.define _STRPBRK X.globl _STRPBRK X.text X_STRPBRK: X push bp X mov bp,sp X push si X push di X mov si,4(bp) X mov di,6(bp) X cld X xor ax,ax | default return value is NULL X cmpb (di),*0 X jz exit | if s2 has length zero, we are done X inc di X cmpb (di),*0 X jz find_match | if s2 has length one, we take a shortcut X mov cx,#-1 | find length of s2 X repne X scab X not cx X| dec cx | don't need this: we started at s2[1] X mov dx,cx | save length of s2 Xs1_loop: | loop through s1 looking for matches with s2 X lodb X orb al,al X jz exit X mov di,6(bp) X mov cx,dx X repne X scab X jne s1_loop X lea ax,-1(si) X jmp exit Xfind_match: | find a match for *s2 in s1 X movb dl,-1(di) X test si,#1 | align source on word boundary X jz word_loop X lodb X cmpb al,dl X je one_past X orb al,al X jz no_match Xword_loop: X lodw X cmpb al,dl X je two_past X orb al,al X jz no_match X cmpb ah,dl X je one_past X orb ah,ah X jnz word_loop Xno_match: X xor ax,ax X jmp .dsret Xtwo_past: X dec si Xone_past: X dec si X mov ax,si Xexit: X jmp .dsret / echo x - strrchr.S sed '/^X/s///' > strrchr.S << '/' X| strrchr.s X| char *strrchr(const char *s, int c) X X| Locates final occurrence of c (as unsigned char) in string s. X X.define _STRRCHR X.globl _STRRCHR X.text X_STRRCHR: X push bp X mov bp,sp X push si X push di X xor bx,bx | default result is NULL X mov di,4(bp) X cld X mov cx,#-1 | find end of string X xorb al,al X repne X scab X not cx | silly trick gives length (including null) X dec di | point back at null character X movb al,6(bp) | find last occurrence of c X std X repne X scab X jne exit X lea bx,1(di) | [inc di; mov bx,di] better on 8088/8086 Xexit: X mov ax,bx X pop di X mov sp,bp X pop bp X ret / echo x - strspn.S sed '/^X/s///' > strspn.S << '/' X| strspn.s X| size_t strspn(const char *s1, const char *s2) X X| Returns the length of the longest prefix of the string pointed X| to by s1 that is made up of the characters in the string s2. X X.define _STRSPN X.globl _STRSPN X.text X_STRSPN: X push bp X mov bp,sp X push si X push di X mov si,4(bp) X mov di,6(bp) X cld X xor bx,bx | zero count of valid characters X cmpb (di),*0 X jz exit | if s2 has length zero, we are done X inc di X cmpb (di),*0 X jz find_mismatch | if s2 has length one, we take a shortcut X mov cx,#-1 | find length of s2 X xorb al,al X repne X scab X not cx X| dec cx | don't need this: we started at s2[1] X mov dx,cx | save length of s2 X dec bx | set up for faster loop Xs1_loop: | loop through s1 looking for matches with s2 X lodb X inc bx X orb al,al X je exit X mov di,6(bp) X mov cx,dx X repne X scab X je s1_loop X jmp exit Xfind_mismatch: | find a character in s1 that isn't *s2 X movb al,-1(di) X mov di,si X mov cx,#-1 X repe X scab X dec di | point back at mismatch X mov bx,di X sub bx,si | number of matched characters Xexit: X mov ax,bx X jmp .dsret / echo x - strstr.S sed '/^X/s///' > strstr.S << '/' X| strstr.s X| char * strstr(const char *s1, const char *s2) X X| Returns a pointer to the first occurrence in the string pointed X| to by s1 that is made up of the characters in the string s2. X X.define _STRSTR X.globl _STRSTR X.text X_STRSTR: X push bp X mov bp,sp X sub sp,#2 | make room for locals X push si X push di X mov si,4(bp) X mov di,6(bp) X mov bx,si | default result is s1 X movb ah,(di) | fetch first character of s2 X orb ah,ah X je exit | if s2 is null, we are done X cld X mov cx,#-1 | find length of s2 X xorb al,al X repne X scab X not cx X dec cx X mov -2(bp),cx | save length of s2 X mov cx,#-1 | find length + 1 of s1 X mov di,si X repne X scab X not cx X sub cx,-2(bp) | |s1| - |s2| + 1 is number of possibilities X jbe not_found | if |s1| < |s2|, give up right now X mov dx,cx X inc dx | set up for faster loop X dec bx Xs1_loop: X dec dx X jz not_found X inc bx X cmpb ah,(bx) X jne s1_loop | if first characters don't match, try another X mov di,6(bp) X mov si,bx X mov cx,-2(bp) X repe X cmpb X jne s1_loop X jmp exit Xnot_found: X xor bx,bx Xexit: X mov ax,bx X jmp .dsret / echo x - strtok.S sed '/^X/s///' > strtok.S << '/' X| strtok.s X| char *strtok(char *s1, const char *s2) X X| Returns a pointer to the "next" token in s1. Tokens are X| delimited by the characters in the string pointed to by s2. X X.define _STRTOK X.globl _STRTOK X.data Xscan: .word 0 X.text X_STRTOK: X push bp X mov bp,sp X push si X push di X cld X mov bx,4(bp) X or bx,bx | if s != NULL, X jnz s2_length | we start a new string X mov bx,scan X or bx,bx | if old string exhausted, X jz exit | exit early Xs2_length: | find length of s2 X mov di,6(bp) X mov cx,#-1 X xorb al,al X repne X scab X not cx X dec cx X jz string_finished | if s2 has length zero, we are done X mov dx,cx | save length of s2 X X mov si,bx X xor bx,bx | return value is NULL Xdelim_loop: | dispose of leading delimiters X lodb X orb al,al X jz string_finished X mov di,6(bp) X mov cx,dx X repne X scab X je delim_loop X X lea bx,-1(si) | return value is start of token Xtoken_loop: | find end of token X lodb X orb al,al X jz string_finished X mov di,6(bp) X mov cx,dx X repne X scab X jne token_loop X movb -1(si),*0 | terminate token X mov scan,si | set up for next call X jmp exit Xstring_finished: X mov scan,#0 | ensure NULL return in future Xexit: X mov ax,bx X jmp .dsret /
henry@utzoo.uucp (Henry Spencer) (10/11/89)
In article <20136@princeton.Princeton.EDU> nfs@notecnirp.UUCP (Norbert Schlenker) writes: >... The recent Dhrystone >1.1 posting by ast shows a 40% increase in Dhrystone rating on my >machine with these routines (only strcpy() and strcmp() are used there). One should not put too much emphasis on this. This is a known bug in Dhrystone -- it puts too much weight on the string functions and uses them in atypically favorable ways. Many system suppliers find that it is well worth their while to optimize the bejesus out of their string routines for the sake of Dhrystone numbers. -- A bit of tolerance is worth a | Henry Spencer at U of Toronto Zoology megabyte of flaming. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
nfs@notecnirp.Princeton.EDU (Norbert Schlenker) (10/11/89)
I just hate to look like an idiot in front of thousands, but this morning's posting of string routines is going to accomplish exactly that. I didn't try linking (out of the library) four routines that I changed just recently. Naturally, I got bitten by my oversight. A slight code rearrangement is needed to placate asld. Here are the context diffs for the routines that will cause problems. My profuse apologies. -------------------------------- Cut here -------------------------------- echo x - memcmp.S.cdif sed '/^X/s///' > memcmp.S.cdif << '/' X*** memcmp.S.orig Fri Oct 6 10:01:32 1989 X--- memcmp.S Tue Oct 10 19:20:19 1989 X*************** X*** 5,14 **** X | s1 and s2. Returns zero if all characters are identical, a X | positive number if s1 greater than s2, a negative number otherwise. X X! BYTE_LIMIT = 10 | if n is above this, we work with words X! X! .define _MEMCMP X! .globl _MEMCMP X .text X _MEMCMP: X push bp X--- 5,15 ---- X | s1 and s2. Returns zero if all characters are identical, a X | positive number if s1 greater than s2, a negative number otherwise. X X! .define _MEMCMP X! .globl _MEMCMP X! X! BYTE_LIMIT = 10 | if n is above this, we work with words X! X .text X _MEMCMP: X push bp / echo x - memmove.S.cdif sed '/^X/s///' > memmove.S.cdif << '/' X*** memmove.S.orig Fri Oct 6 10:04:03 1989 X--- memmove.S Tue Oct 10 19:20:29 1989 X*************** X*** 11,20 **** X | overlap; since the performance penalty is insignificant, we X | use the safe memmove code for it as well. X X! BYTE_LIMIT = 12 | if n is above this, we work with words X! X! .define _MEMMOVE, _MEMCPY X! .globl _MEMMOVE, _MEMCPY X .text X _MEMMOVE: X _MEMCPY: X--- 11,21 ---- X | overlap; since the performance penalty is insignificant, we X | use the safe memmove code for it as well. X X! .define _MEMMOVE, _MEMCPY X! .globl _MEMMOVE, _MEMCPY X! X! BYTE_LIMIT = 12 | if n is above this, we work with words X! X .text X _MEMMOVE: X _MEMCPY: / echo x - strncpy.S.cdif sed '/^X/s///' > strncpy.S.cdif << '/' X*** strncpy.S.orig Fri Oct 6 11:11:42 1989 X--- strncpy.S Tue Oct 10 19:20:41 1989 X*************** X*** 7,16 **** X | with null characters. If the source is longer than n characters, X | the destination will not be null terminated. Returns s1. X X! BYTE_LIMIT = 10 | if n is above this, zero fill with words X! X! .define _STRNCPY X! .globl _STRNCPY X .text X _STRNCPY: X push bp X--- 7,17 ---- X | with null characters. If the source is longer than n characters, X | the destination will not be null terminated. Returns s1. X X! .define _STRNCPY X! .globl _STRNCPY X! X! BYTE_LIMIT = 10 | if n is above this, zero fill with words X! X .text X _STRNCPY: X push bp /