ok@edai.UUCP (05/11/84)
................
Here is a collection of useful string-handling functions in/for C.
Many of them are based on UNIX routines with the same names, but the
code was independently derived. Each of the routines has been tested,
and seems to work. But free software is worth what you pay for it, and
I dare say this is no exception. I make no claim that any of this is
good for anything at all. Use it at your peril. Manual pages? You
must be joking. 2853 lines follow the dots.
................
# to unbundle, csh this file
echo Makefile
cat >Makefile <<'EOF'
# File   : strings.d/Makefile
# Author : Richard A. O'Keefe.
# Updated: 4 May 1984.
# Purpose: UNIX make(1)file for the strings library.
# If you are not using a Vax, or if your strings might be 2^16
# characters long or longer, use
#     CFLAGS=-O
# On the Vax we can use the string instructions some but not all the time.
CFLAGS=-O -DVaxAsm
# The SIII functions are the ones described in the System III
# string(3) manual page, and also in ctype(3), atoi(3).
SIII=strcat.o strncat.o strcmp.o strncmp.o strcpy.o strncpy.o strlen.o\
	strchr.o strrchr.o strpbrk.o strspn.o strcspn.o strtok.o\
	_c2type.o str2int.o getopt.o
# The BSD2 functions are the ones described in the 4.2bsd
# bstring(3) manual page, plus a couple of my additions.
# All except ffs have VAX-specific machine code versions.
BSD2=bcmp.o bcopy.o bfill.o bmove.o bzero.o ffs.o
# The "mine" functions are the ones which are entirely my own
# invention, though they are supposed to fit into the SIII conventions.
mine=strmov.o strnmov.o strrpt.o strnrpt.o strcase.o strncase.o strend.o\
	strnlen.o strcpbrk.o int2str.o _str2map.o _str2pat.o _str2set.o\
	strpack.o strcpack.o strtrans.o strntrans.o strpref.o strsuff.o\
	strtrim.o strctrim.o strfield.o strkey.o
# The "find" functions are my code, but they are based on published
# work by Boyer, Moore, and Horspool. (See _str2pat.c.)
find=strfind.o strrepl.o
strings.a: ${SIII} ${BSD2} ${mine} ${find}
	rm strings.a; ar rc strings.a *.o; ranlib strings.a
scan=strpbrk.o strcpbrk.o strspn.o strcspn.o strpack.o strcpack.o \
	strtrim.o strctrim.o strtok.o
${scan} _str2set.o: _str2set.h
tran=strtrans.o strntrans.o
${tran} _str2map.o: _str2map.h
${find}: _str2pat.h
str2int.o: ctypes.h
${SIII} ${BSD2} ${mine} ${find}: strings.h
clean:
	-rm *.o
'EOF'
echo READ-ME
cat >READ-ME <<'EOF'
File   : READ-ME
Author : Richard A. O'Keefe.
Updated: 30 April 1984
Purpose: Explain the new strings package.

The UNIX string libraries (described in the string(3) manual page)
differ from UNIX to UNIX (e.g. strtok is not in V7 or 4.1bsd). Worse,
the sources are not in the public domain, so that if there is a string
routine which is nearly what you want but not quite you can't take a
copy and modify it. And of course C programmers on non-UNIX systems
are at the mercy of their supplier.

This package was designed to let me do reasonable things with C's
strings whatever UNIX (V7, PaNiX, UX63, 4.1bsd) I happen to be using.
Everything in the System III manual is here and does just what the S3
manual says it does. There are also lots of new goodies. I'm sorry
about the names, but the routines do have to work on
asphyxiated-at-birth systems which truncate identifiers. The
convention is that a routine is called
	str [n] [c] <operation>
If there is an "n", it means that the function takes an (int) "length"
argument, which bounds the number of characters to be moved or looked
at. If the function has a "set" argument, a "c" in the name indicates
that the complement of the set is used. Functions or variables whose
names start with _ are support routines which aren't really meant for
general use. I don't know what the "p" is doing in "strpbrk", but it
is there in the S3 manual so it's here too. "istrtok" does not follow
this rule, but with 7 letters what can you do?

I have included new versions of atoi(3) and atol(3) as well.
They use a new primitive str2int, which takes a pair of bounds and a radix, and does much more thorough checking than the normal atoi and atol do. The result returned by atoi & atol is valid if and only if errno == 0. There is also an output conversion routine int2str, with itoa and ltoa as interface macros. Only after writing int2str did I notice that the str2int routine has no provision for unsigned numbers. On reflection, I don't greatly care. I'm afraid that int2str may depend on your "C" compiler in unexpected ways. Do check the code with -S. Several of these routines have "asm" inclusions conditional on the VaxAsm option. These insertions can make the routines which have them quite a bit faster, but there is a snag. The VAX architects, for some reason best known to themselves and their therapists, decided that all "strings" were shorter than 2^16 bytes. Even when the length operands are in 32-bit registers, only 16 bits count. So the "asm" versions do not work for long strings. If you can guarantee that all your strings will be short, define VaxAsm in the makefile, but in general, and when using other machines, do not define it. To use this library, you need the "strings.a" library file and the "strings.h" and "ctypes.h" header files. The other header files are for compiling the library itself, though if you are hacking extensions you may find them useful. General users really shouldn't see them. I've defined a few macros I find useful in "strings.h"; if you have no need for "index", "rindex", "streql", and "beql", just edit them out. On the 4.1bsd system I am using declaring all these functions 'extern' does not mean that they will all be loaded; but only the ones you use. When using lesser systems you may find it necessary to break strings.h up, or you could get by with just adding "extern" declarations for the functions you want as you need them. 
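
As a rough illustration of the 16-bit length limit on the VaxAsm
routines mentioned above: one way to stay inside it is to split a long
move into pieces of at most 65535 bytes. The sketch below is not part
of this package. The names chunked_move and MAX_VAX_LEN are
hypothetical, and plain memcpy stands in for a single MOVC3.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical name: the largest length a 16-bit operand can hold.
#define MAX_VAX_LEN 65535u

// Hypothetical helper, not part of the package: copy len bytes in pieces
// no longer than MAX_VAX_LEN.  memcpy stands in for one MOVC3.
// Returns the number of pieces used.
size_t chunked_move(char *dst, const char *src, size_t len)
{
    size_t pieces = 0;

    assert(dst != NULL && src != NULL);
    while (len > 0) {
        size_t n = len > MAX_VAX_LEN ? MAX_VAX_LEN : len;
        memcpy(dst, src, n);
        dst += n;
        src += n;
        len -= n;
        pieces++;
    }
    return pieces;
}
```

A 70000-byte move would go out as two pieces (65535 bytes, then 4465),
each of which a 16-bit length can describe.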
Many of these functions have the same names as functions in the "standard C library", by design as this is a replacement/reimplementation of part of that library. So you may have to talk the loader into loading this library first. Again, I've found no problems on 4.1bsd. You may wonder at my failure to provide manual pages for this code. For the things in V7, 4.?, or SIII, you should be able to use whichever manual page came with that system, and anything I might write would be so like it as to raise suspicions of violating AT&T copyrights. In the sources you will find comments which provide far more documentation for these routines than AT&T ever provided for their strings stuff, I just don't happen to have put it in nroff -man form. Had I done so, the .3 files would have outbulked the .c files! These files are in the public domain. This includes getopt.c, which is the work of Henry Spencer, University of Toronto Zoology, who says of it "None of this software is derived from Bell software. I had no access to the source for Bell's versions at the time I wrote it. This software is hereby explicitly placed in the public domain. It may be used for any purpose on any machine by anyone." I would greatly prefer it if *my* material received no military use. 'EOF' echo _c2type.c cat >_c2type.c <<'EOF' /* File : _c2type.c Author : Richard A. O'Keefe. Updated: 4 May 1984 Purpose: Map character codes to types The mapping used here is such that we can use it for converting numbers expressed in a variety of radices to binary as well as for classifying characters. 
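
A minimal sketch of how one code per character can serve both
purposes, assuming ASCII. The names digit_value and is_radix_digit are
hypothetical; the real table is indexed as _c2type[c+1] and also
distinguishes spaces and line terminators, which this sketch leaves out.

```c
#include <assert.h>

// Hypothetical stand-in for a table lookup: 0..35 for digits and letters
// (either case), 36 for anything else.  Assumes ASCII.
int digit_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return 36;
}

// A character is a digit of the given radix exactly when its code is
// below the radix, so a single comparison classifies it.
int is_radix_digit(int c, int radix)
{
    assert(radix >= 2 && radix <= 36);
    return digit_value(c) < radix;
}
```

With this encoding "is alphanumeric" is just a code below 36, and
"is a hexadecimal digit" is a code below 16.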
*/ char _c2type[129] = { 37, /* EOF == -1 */ 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 39, 39, 39, 39, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 00, 01, 02, 03, 04, 05, 06, 07, 8, 9, 36, 36, 36, 36, 36, 36, 36, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 36, 36, 36, 36, 36, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 36, 36, 36, 37 }; 'EOF' echo _str2map.c cat >_str2map.c <<'EOF' /* File : _str2map.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: _map_vec[], _str2map(). _str2map(option, from, to) constructs a translation table. If from or to is NullS, the same string is used as last time, so if you want to translate a whole lot of strings using the same mapping you don't have to reconstruct it each time. The options are 0: initialise the map to the identity function, then map each from[i] to the corresponding to[i]. If to[] is shorter than from[], its last character is repeated as often as needed. 1: as 0, but don't initialise the map. 2: initialise the map to send every character to to[0], then map each from[i] to itself. For example, to build a map which forces letters to lower case but sends everything else to blank, call _str2map(2, "abcdefghijklmnopqrstuvwxyz", " "); _str2map(1, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"); Only strtrans() and strntrans() in this package call _str2map; if you want to build your own maps this way you can "fool" them into using it, as when the two strings are NullS they don't change the map. As an extra-special dubious *hack*, _map_vec has an extra NUL character at the end, so after calling _str2map(0, "", ""), you can use _map_vec+1 as a string of the 127 non-NUL characters (or if the _AlphabetSize is 256, of the 255 non-NUL characters). 
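
A rough sketch of what option 0 amounts to, written against a local
table rather than _map_vec. The names build_map, apply_map and
ALPHABET are hypothetical, and a 128-character alphabet is assumed.

```c
#include <assert.h>
#include <stddef.h>

// Assumed alphabet size for this sketch.
#define ALPHABET 128

// Hypothetical helper mirroring option 0: start from the identity map,
// then send from[i] to to[i]; once to[] runs out, repeat its last
// character.  to may be empty only if from is empty as well.
void build_map(char map[ALPHABET], const char *from, const char *to)
{
    int i;

    assert(from != NULL && to != NULL);
    assert(*to != '\0' || *from == '\0');
    for (i = 0; i < ALPHABET; i++) map[i] = (char)i;
    for (; *from != '\0'; from++) {
        map[(unsigned char)*from & 127] = *to;
        if (to[1] != '\0') to++;    // stay on the last character once reached
    }
}

// Translate a string in place through the map.
void apply_map(const char map[ALPHABET], char *s)
{
    for (; *s != '\0'; s++) *s = map[(unsigned char)*s & 127];
}
```

For instance, mapping "abc" through "x" sends all three letters to x,
because the single character of the second string is repeated.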
*/

#include "strings.h"
#include "_str2map.h"

static _char_ *oldFrom = "?";
static char *oldTo = "?";

char _map_vec[_AlphabetSize+1];

void _str2map(option, from, to)
    int option;
    register _char_ *from;
    register char *to;
    {
        register int i, c;

        if (from == NullS && to == NullS) return;
        if (from == NullS) from = oldFrom; else oldFrom = from;
        if (to == NullS) to = oldTo; else oldTo = to;
        switch (option) {
            case 0:
                for (i = _AlphabetSize; --i >= 0; _map_vec[i] = i) ;
                /* fall through: option 0 is option 1 after initialising */
            case 1:
                while (i = *from++) {
                    _map_vec[i] = *to++;
                    if (!*to) {
                        c = *--to;
                        while (i = *from++) _map_vec[i] = c;
                        return;
                    }
                }
                return;
            case 2:
                c = *to;
                for (i = _AlphabetSize; --i >= 0; _map_vec[i] = c) ;
                while (c = *from++) _map_vec[c] = c;
                return;
        }
    }
'EOF'
echo _str2map.h
cat >_str2map.h <<'EOF'
/*  File   : _str2map.h
    Author : Richard A. O'Keefe.
    Updated: 11 April 1984
    Purpose: Definitions from _str2map.c
*/

extern char _map_vec[_AlphabetSize+1];
extern void _str2map(/*int,_char_^,char^*/);
'EOF'
echo _str2pat.c
cat >_str2pat.c <<'EOF'
/*  File   : _str2pat.c
    Author : Richard A. O'Keefe.
    Updated: 23 April 1984
    Defines: _pat_lim, _pat_vec[], _str2pat()

    Searching in this package is done by an algorithm due to R. Nigel
    Horspool, described in Software Practice & Experience 1980, p505.
    Elsewhere I have a version of it which does exact case or either
    case match, word mode or literal mode, forwards or backwards, and
    will look for the Nth instance. For most applications that is too
    much and a simple exact case forward search will do. Horspool's
    algorithm is a simplification of the Boyer-Moore algorithm which
    doesn't guarantee linear time, but in practice is very good indeed.

    _str2pat(pat) builds a search table for the string pat. As usual in
    this package, if pat == NullS, the table is not changed and the last
    search string is re-used. To support this, _str2pat returns the
    actual search string.
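
For readers who have not met it, here is a textbook sketch of a
Horspool-style search. It is not the code in this package: the name
horspool_find and the 256-character alphabet are assumptions made for
the illustration, and it builds its own shift table on every call
rather than re-using one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Assumed alphabet size for this sketch.
#define ALPHABET 256

// Hypothetical helper: return a pointer to the first occurrence of pat in
// text, or NULL if there is none.
const char *horspool_find(const char *text, const char *pat)
{
    size_t shift[ALPHABET];
    size_t n, m, i, pos;

    assert(text != NULL && pat != NULL);
    n = strlen(text);
    m = strlen(pat);
    if (m == 0) return text;
    if (m > n) return NULL;

    // A character absent from the pattern lets us skip the whole pattern;
    // otherwise skip far enough to line up its last occurrence (the final
    // pattern position is excluded).
    for (i = 0; i < ALPHABET; i++) shift[i] = m;
    for (i = 0; i + 1 < m; i++) shift[(unsigned char)pat[i]] = m - 1 - i;

    // The shift is chosen from the text character under the pattern's end.
    for (pos = 0; pos + m <= n; pos += shift[(unsigned char)text[pos + m - 1]]) {
        if (memcmp(text + pos, pat, m) == 0) return text + pos;
    }
    return NULL;
}
```

The table costs O(alphabet + pattern) to build, which is why re-using
it across calls, as the NullS convention allows, is worth having.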
*/ #include "strings.h" #include "_str2pat.h" int _pat_lim; int _pat_vec[_AlphabetSize]; static _char_ *oldPat = ""; _char_ *_str2pat(pat) register _char_ *pat; { register int L, i; if (pat == NullS) pat = oldPat; else oldPat = pat; for (L = 0; *pat++; L++) ; for (i = _AlphabetSize; --i >= 0; _pat_vec[i] = L) ; _pat_lim = --L; pat = oldPat; for (i = L; i > 0; i--) _pat_vec[*pat++] = i; return oldPat; } 'EOF' echo _str2pat.h cat >_str2pat.h <<'EOF' /* File : _str2pat.h Author : Richard A. O'Keefe. Updated: 20 April 1984 Purpose: Definitions from _str2pat.c */ extern int _pat_lim; extern int _pat_vec[]; extern _char_ *_str2pat(/*_char_^*/); 'EOF' echo _str2set.c cat >_str2set.c <<'EOF' /* File : _str2set.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: _set_ctr, _set_vec[], _str2set(). Purpose: Convert a character string to a set. */ /* The obvious way of representing a set of characters is as a vector of 0s and 1s. The snag with that is that to convert a string to such a vector, we have to clear all the elements to 0, and then set the elements corresponding to characters in the string to 1, so the cost is O(|alphabet|+|string|). This package uses another method, where there is a vector of small numbers and a counter. A character is in the current set if and only if the corresponding element of the vector is equal to the current value of the counter. Every so often the vector elements would overflow and we have to clear the vector, but the cost is reduced to O(|string|+1). Note that NUL ('\0') will never be in any set built by str2set. While this method reduces the cost of building a set, it would be useful to avoid it entirely. So when the "set" argument is NullS the set is not changed. Use NullS to mean "the same set as before." MaxPosChar is the largest integer value which can be stored in a "char". Although we might get a slightly wider range by using "unsigned char", "char" may be cheaper (as on a PDP-11). 
    By all means change the number from 127 if your C is one of those
    that treats char as unsigned, but don't change it just because
    _AlphabetSize is 256; the two are unrelated. And don't dare change
    it on a VAX: it is built into the asm code!
*/

#include "strings.h"
#include "_str2set.h"

#define MaxPosChar 127

int _set_ctr = MaxPosChar;
char _set_vec[_AlphabetSize];

void _str2set(set)
    register char *set;
    {
        if (set == NullS) return;
        if (++_set_ctr == MaxPosChar+1) {
#if     VaxAsm
            asm("movc5 $0,4(ap),$0,$128,__set_vec");
#else  ~VaxAsm
            register char *w = &_set_vec[_AlphabetSize];
            do *--w = NUL; while (w != &_set_vec[0]);
#endif  VaxAsm
            _set_ctr = 1;
        }
        while (*set) _set_vec[*set++] = _set_ctr;
    }
'EOF'
echo _str2set.h
cat >_str2set.h <<'EOF'
/*  File   : _str2set.h
    Updated: 10 April 1984
    Purpose: External declarations for strpbrk, strspn, strcspn &c
    Copyright (C) 1984 Richard A. O'Keefe.
*/

extern int _set_ctr;
extern char _set_vec[];
extern void _str2set(/*char^*/);
'EOF'
echo ascii.h
cat >ascii.h <<'EOF'
/*  File   : strings.d/ascii.h
    Author : Richard A. O'Keefe
    Updated: 28 April 1984
    Purpose: Define Ascii mnemonics.

    This file defines the ASCII control characters. Note that these
    names refer to their use in communication; it is an ERROR to use
    these names to talk about keyboard commands. For example, DO NOT
    use EOT when you mean "end of file", as many people prefer ^Z (if
    the Ascii code were taken seriously, EOT would log you off and hang
    up the line as well). Similarly, DO NOT use DEL when you mean
    "interrupt"; many people prefer ^C. When writing a screen editor,
    you should speak of tocntrl('C') rather than ETX (see the header
    file "ctypes.h").
*/ #define NUL '\000' /* null character */ #define SOH '\001' /* Start Of Heading, start of message */ #define STX '\002' /* Start Of Text, end of address */ #define ETX '\003' /* End of TeXt, end of message */ #define EOT '\004' /* End Of Transmission */ #define ENQ '\005' /* ENQuiry "who are you" */ #define ACK '\006' /* (positive) ACKnowledge */ #define BEL '\007' /* ring the BELl */ #define BS '\010' /* BackSpace */ #define HT '\011' /* Horizontal Tab */ #define TAB '\011' /* an unofficial name for HT */ #define LF '\012' /* Line Feed (does not imply cr) */ #define NL '\012' /* unix unofficial name for LF: new line */ #define VT '\013' /* Vertical Tab */ #define FF '\014' /* Form Feed (new page starts AFTER this) */ #define CR '\015' /* Carriage Return */ #define SO '\016' /* Shift Out; select alternate character set */ #define SI '\017' /* Shift In; select ASCII again */ #define DLE '\020' /* Data Link Escape */ #define DC1 '\021' /* Device Control 1 */ #define XON '\021' /* transmitter on, resume output */ #define DC2 '\022' /* Device Control 2 (auxiliary on) */ #define DC3 '\023' /* Device Control 3 */ #define XOFF '\023' /* transmitter off, suspend output */ #define DC4 '\024' /* Device Control 4 (auxiliary off) */ #define NAK '\025' /* Negative AcKnowledge (signal error) */ #define SYN '\026' /* SYNchronous idle */ #define ETB '\027' /* End of Transmission Block, logical end of medium */ #define CAN '\030' /* CANcel */ #define EM '\031' /* End of Medium */ #define SUB '\032' /* SUBstitute */ #define ESC '\033' /* ESCape */ #define FS '\034' /* File Separator */ #define GS '\035' /* Group Separator */ #define RS '\036' /* Record Separator */ #define US '\037' /* Unit Separator */ #define SP '\040' /* SPace */ #define DEL '\177' /* DELete, rubout */ 'EOF' echo bcmp.c cat >bcmp.c <<'EOF' /* File : bcmp.c Author : Richard A. O'Keefe. 
    Updated: 23 April 1984
    Defines: bcmp()

    bcmp(s1, s2, len) returns 0 if the "len" bytes starting at "s1" are
    identical to the "len" bytes starting at "s2", non-zero if they are
    different. The 4.2bsd manual page doesn't say what non-zero value
    is returned, though the BUGS note says that it takes its parameters
    backwards from strcmp. This suggests that it is something like
        for (; --len >= 0; s1++, s2++)
            if (*s1 != *s2) return *s2-*s1;
        return 0;
    There, I've told you how to do it. As the manual page doesn't come
    out and *say* that this is the result, I tried to figure out what a
    useful result might be. (I'd forgotten that strncmp stops when it
    hits a NUL, which the above does not do.) What I came up with was:
    the result is the number of bytes in the differing tails. That is,
    after you've skipped the equal parts, how many characters are left?
    To put it another way, N-bcmp(s1,s2,N) is the number of equal bytes
    (the size of the common prefix). After deciding on this definition
    I discovered that the CMPC3 instruction does exactly what I wanted.
    The code assumes that N is non-negative.

    Note: the "b" routines are there to exploit certain VAX order codes,
    but the CMPC3 instruction will only test 65535 characters. The asm
    code is presented for your interest and amusement.
*/

#include "strings.h"

#if     VaxAsm

int bcmp(s1, s2, len)
    char *s1, *s2;
    int len;
    {
        asm("cmpc3 12(ap),*4(ap),*8(ap)");
    }

#else  ~VaxAsm

int bcmp(s1, s2, len)
    register char *s1, *s2;
    register int len;
    {
        while (--len >= 0 && *s1++ == *s2++) ;
        return len+1;
    }

#endif  VaxAsm
'EOF'
echo bcopy.c
cat >bcopy.c <<'EOF'
/*  File   : bcopy.c
    Author : Richard A. O'Keefe.
    Updated: 23 April 1984
    Defines: bcopy()

    bcopy(src, dst, len) moves exactly "len" bytes from the source "src"
    to the destination "dst". It does not check for NUL characters as
    strncpy() and strnmov() do.
    Thus if your C compiler doesn't support structure assignment, you
    can simulate it with
        bcopy(&from, &to, sizeof from);
    BEWARE: the first two arguments are the other way around from almost
    everything else. I'm sorry about that, but that's the way it is in
    the 4.2bsd manual, though they list it as a bug. For a version with
    the arguments the right way around, use bmove().
    No value is returned.

    Note: the "b" routines are there to exploit certain VAX order codes,
    but the MOVC3 instruction will only move 65535 characters. The asm
    code is presented for your interest and amusement.
*/

#include "strings.h"

#if     VaxAsm

void bcopy(src, dst, len)
    char *src, *dst;
    int len;
    {
        asm("movc3 12(ap),*4(ap),*8(ap)");
    }

#else  ~VaxAsm

void bcopy(src, dst, len)
    register char *src, *dst;
    register int len;
    {
        while (--len >= 0) *dst++ = *src++;
    }

#endif  VaxAsm
'EOF'
echo bfill.c
cat >bfill.c <<'EOF'
/*  File   : bfill.c
    Author : Richard A. O'Keefe.
    Updated: 23 April 1984
    Defines: bfill()

    bfill(dst, len, fill) moves "len" fill characters to "dst".
    Thus to set a buffer to 80 spaces, do bfill(buff, 80, ' ').

    Note: the "b" routines are there to exploit certain VAX order codes,
    but the MOVC5 instruction will only move 65535 characters. The asm
    code is presented for your interest and amusement.
*/

#include "strings.h"

#if     VaxAsm

void bfill(dst, len, fill)
    register char *dst;
    int len;
    int fill;   /* actually char */
    {
        asm("movc5 $0,*4(ap),12(ap),8(ap),*4(ap)");
    }

#else  ~VaxAsm

void bfill(dst, len, fill)
    register char *dst;
    register int len;
    register int fill;  /* char */
    {
        while (--len >= 0) *dst++ = fill;
    }

#endif  VaxAsm
'EOF'
echo bmove.c
cat >bmove.c <<'EOF'
/*  File   : bmove.c
    Author : Richard A. O'Keefe.
    Updated: 23 April 1984
    Defines: bmove()

    bmove(dst, src, len) moves exactly "len" bytes from the source "src"
    to the destination "dst". It does not check for NUL characters as
    strncpy() and strnmov() do.
Thus if your C compiler doesn't support structure assignment, you can simulate it with bmove(&to, &from, sizeof from); The standard 4.2bsd routine for this purpose is bcopy. But as bcopy has its first two arguments the other way around you may find this a bit easier to get right. No value is returned. Note: the "b" routines are there to exploit certain VAX order codes, but the MOVC3 instruction will only move 65535 characters. The asm code is presented for your interest and amusement. */ #include "strings.h" #if VaxAsm void bmove(dst, src, len) char *dst, *src; int len; { asm("movc3 12(ap),*8(ap),*4(ap)"); } #else ~VaxAsm void bmove(dst, src, len) register char *dst, *src; register int len; { while (--len >= 0) *dst++ = *src++; } #endif VaxAsm 'EOF' echo bzero.c cat >bzero.c <<'EOF' /* File : bzero.c Author : Richard A. O'Keefe. Updated: 23 April 1984 Defines: bzero() bzero(dst, len) moves "len" 0 bytes to "dst". Thus to clear a disc buffer to 0s do bzero(buffer, BUFSIZ). Note: the "b" routines are there to exploit certain VAX order codes, but the MOVC5 instruction will only move 65535 characters. The asm code is presented for your interest and amusement. */ #include "strings.h" #if VaxAsm void bzero(dst, len) char *dst; int len; { asm("movc5 $0,*4(ap),$0,8(ap),*4(ap)"); } #else ~VaxAsm void bzero(dst, len) register char *dst; register int len; { while (--len >= 0) *dst++ = 0; } #endif VaxAsm 'EOF' echo ctypes.demo cat >ctypes.demo <<'EOF' EOF . . . . . . . . . # . . ch DD? OD? XD? AN? AF? LC? UC? PT? PR? CT? SP? EL? ^@ . . . . . . . . . # . . ^A . . . . . . . . . # . . ^B . . . . . . . . . # . . ^C . . . . . . . . . # . . ^D . . . . . . . . . # . . ^E . . . . . . . . . # . . ^F . . . . . . . . . # . . ^G . . . . . . . . . # . . ^H . . . . . . . . . # . . ^I . . . . . . . . . # # . ^J . . . . . . . . . # # # ^K . . . . . . . . . # # # ^L . . . . . . . . . # # # ^M . . . . . . . . . # # # ^N . . . . . . . . . # . . ^O . . . . . . . . . # . . ch DD? OD? XD? AN? AF? 
LC? UC? PT? PR? CT? SP? EL? ^P . . . . . . . . . # . . ^Q . . . . . . . . . # . . ^R . . . . . . . . . # . . ^S . . . . . . . . . # . . ^T . . . . . . . . . # . . ^U . . . . . . . . . # . . ^V . . . . . . . . . # . . ^W . . . . . . . . . # . . ^X . . . . . . . . . # . . ^Y . . . . . . . . . # . . ^Z . . . . . . . . . # . . ^[ . . . . . . . . . # . . ^\ . . . . . . . . . # . . ^] . . . . . . . . . # . . ^^ . . . . . . . . . # . . ^_ . . . . . . . . . # . . ch DD? OD? XD? AN? AF? LC? UC? PT? PR? CT? SP? EL? . . . . . . . . # . # . ! . . . . . . . # # . . . " . . . . . . . # # . . . # . . . . . . . # # . . . $ . . . . . . . # # . . . % . . . . . . . # # . . . & . . . . . . . # # . . . ' . . . . . . . # # . . . ( . . . . . . . # # . . . ) . . . . . . . # # . . . * . . . . . . . # # . . . + . . . . . . . # # . . . , . . . . . . . # # . . . - . . . . . . . # # . . . . . . . . . . . # # . . . / . . . . . . . # # . . . ch DD? OD? XD? AN? AF? LC? UC? PT? PR? CT? SP? EL? 0 # # # # . . . . # . . . 1 # # # # . . . . # . . . 2 # # # # . . . . # . . . 3 # # # # . . . . # . . . 4 # # # # . . . . # . . . 5 # # # # . . . . # . . . 6 # # # # . . . . # . . . 7 # # # # . . . . # . . . 8 # . # # . . . . # . . . 9 # . # # . . . . # . . . : . . . . . . . # # . . . ; . . . . . . . # # . . . < . . . . . . . # # . . . = . . . . . . . # # . . . > . . . . . . . # # . . . ? . . . . . . . # # . . . ch DD? OD? XD? AN? AF? LC? UC? PT? PR? CT? SP? EL? @ . . . . . . . # # . . . A . . # # # . # . # . . . B . . # # # . # . # . . . C . . # # # . # . # . . . D . . # # # . # . # . . . E . . # # # . # . # . . . F . . # # # . # . # . . . G . . . # # . # . # . . . H . . . # # . # . # . . . I . . . # # . # . # . . . J . . . # # . # . # . . . K . . . # # . # . # . . . L . . . # # . # . # . . . M . . . # # . # . # . . . N . . . # # . # . # . . . O . . . # # . # . # . . . ch DD? OD? XD? AN? AF? LC? UC? PT? PR? CT? SP? EL? P . . . # # . # . # . . . Q . . . # # . # . # . . . R . . . # # . # . # . . . S . . . 
# # . # . # . . .
T   .   .   .   #   #   .   #   .   #   .   .   .
U   .   .   .   #   #   .   #   .   #   .   .   .
V   .   .   .   #   #   .   #   .   #   .   .   .
W   .   .   .   #   #   .   #   .   #   .   .   .
X   .   .   .   #   #   .   #   .   #   .   .   .
Y   .   .   .   #   #   .   #   .   #   .   .   .
Z   .   .   .   #   #   .   #   .   #   .   .   .
[   .   .   .   .   .   .   .   #   #   .   .   .
\   .   .   .   .   .   .   .   #   #   .   .   .
]   .   .   .   .   .   .   .   #   #   .   .   .
^   .   .   .   .   .   .   .   #   #   .   .   .
_   .   .   .   .   .   .   .   #   #   .   .   .
ch  DD? OD? XD? AN? AF? LC? UC? PT? PR? CT? SP? EL?
`   .   .   .   .   .   .   .   #   #   .   .   .
a   .   .   #   #   #   #   .   .   #   .   .   .
b   .   .   #   #   #   #   .   .   #   .   .   .
c   .   .   #   #   #   #   .   .   #   .   .   .
d   .   .   #   #   #   #   .   .   #   .   .   .
e   .   .   #   #   #   #   .   .   #   .   .   .
f   .   .   #   #   #   #   .   .   #   .   .   .
g   .   .   .   #   #   #   .   .   #   .   .   .
h   .   .   .   #   #   #   .   .   #   .   .   .
i   .   .   .   #   #   #   .   .   #   .   .   .
j   .   .   .   #   #   #   .   .   #   .   .   .
k   .   .   .   #   #   #   .   .   #   .   .   .
l   .   .   .   #   #   #   .   .   #   .   .   .
m   .   .   .   #   #   #   .   .   #   .   .   .
n   .   .   .   #   #   #   .   .   #   .   .   .
o   .   .   .   #   #   #   .   .   #   .   .   .
ch  DD? OD? XD? AN? AF? LC? UC? PT? PR? CT? SP? EL?
p   .   .   .   #   #   #   .   .   #   .   .   .
q   .   .   .   #   #   #   .   .   #   .   .   .
r   .   .   .   #   #   #   .   .   #   .   .   .
s   .   .   .   #   #   #   .   .   #   .   .   .
t   .   .   .   #   #   #   .   .   #   .   .   .
u   .   .   .   #   #   #   .   .   #   .   .   .
v   .   .   .   #   #   #   .   .   #   .   .   .
w   .   .   .   #   #   #   .   .   #   .   .   .
x   .   .   .   #   #   #   .   .   #   .   .   .
y   .   .   .   #   #   #   .   .   #   .   .   .
z   .   .   .   #   #   #   .   .   #   .   .   .
{   .   .   .   .   .   .   .   #   #   .   .   .
|   .   .   .   .   .   .   .   #   #   .   .   .
}   .   .   .   .   .   .   .   #   #   .   .   .
~   .   .   .   .   .   .   .   #   #   .   .   .
DEL .   .   .   .   .   .   .   #   .   #   .   .
'EOF'
echo ctypes.h
cat >ctypes.h <<'EOF'
/*  File   : ctypes.h
    Author : Richard A. O'Keefe.
    Updated: 26 April 1984
    Purpose: Reimplement the UNIX ctype(3) library.

    isaneol(c) means that c is a line terminating character.
    isalnum, ispunct, isspace, and isaneol are defined on the range
    -1..127, i.e. on ASCII U {EOF}, while all the other macros are
    defined for any integer.

    isodigit(c) checks for Octal digits.
    isxdigit(c) checks for heXadecimal digits.
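
    The range macros defined in this header each use a single unsigned
    comparison. A minimal sketch of that trick in function form,
    assuming ASCII; the names in_range, is_octal_digit and
    is_lower_case are hypothetical and not part of this header:

```c
#include <assert.h>

// Hypothetical function form of the range test: c lies in
// lo..lo+count-1 exactly when (unsigned)(c - lo) < count, because values
// below lo wrap around to very large unsigned numbers.
int in_range(int c, int lo, unsigned count)
{
    assert(count > 0);
    return (unsigned)(c - lo) < count;
}

int is_octal_digit(int c) { return in_range(c, '0', 8); }
int is_lower_case(int c)  { return in_range(c, 'a', 26); }
```

    One comparison replaces the usual pair (lo <= c && c <= hi), and
    the test is safe for any int argument, including the EOF value -1.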
*/

#define isdigit(c)      ((unsigned)((c)-'0') < 10)
#define islower(c)      ((unsigned)((c)-'a') < 26)
#define isupper(c)      ((unsigned)((c)-'A') < 26)
#define isprint(c)      ((unsigned)((c)-' ') < 95)
#define iscntrl(c)      ((unsigned)((c)-' ') >= 95)
#define isascii(c)      ((unsigned)(c) < 128)
#define isalpha(c)      ((unsigned)(((c)|32)-'a') < 26)

extern char _c2type[];

#define isalnum(c)      (_c2type[(c)+1] < 36)
#define ispunct(c)      (_c2type[(c)+1] == 36)
#define isspace(c)      (_c2type[(c)+1] > 37)
#define isaneol(c)      (_c2type[(c)+1] > 38)
#define isxdigit(c)     (_c2type[(c)+1] < 16)
#define isodigit(c)     ((unsigned)((c)-'0') < 8)

/*  The following "conversion" macros have been in some versions of UNIX
    but are not in all. tocntrl is new. The original motivation for ^?
    being a name for DEL was that (x)^64 mapped A..Z to ^A..^Z and also
    ? to DEL. The trouble is that this trick doesn't work for lower case
    letters. The version given here is not mine. I wish it was. It has
    the nice property that DEL is mapped to itself (so does EOF).
    tolower(c) and toupper(c) are only defined when isalpha(c).
*/

#define tolower(c)      ((c)|32)
#define toupper(c)      ((c)&~32)
#define tocntrl(c)      (((((c)+1)&~96)-1)&127)
#define toascii(c)      ((c)&127)
'EOF'
echo ffs.c
cat >ffs.c <<'EOF'
/*  File   : ffs.c
    Author : Richard A. O'Keefe.
    Updated: 20 April 1984
    Defines: ffs(), ffc()

    ffs(i) returns the index of the least significant 1 bit in i, where
    1 means the least significant bit and 32 means the most significant
    bit, or returns -1 if i is zero (has no 1 bits).

    ffc(i) returns the index of the least significant 0 bit in i, where
    1 means the least significant bit and 32 means the most significant
    bit, or returns -1 if i has no 0 bits (every bit is 1).

    These functions mimic the VAX FFS and FFC instructions, except that
    the latter return much more sensible values. This file only exists
    to make it easier to move 4.2bsd programs to System III (which is
    rather like moving up from a Rolls Royce to a model T Ford), and so
    I haven't bothered with assembly code versions.
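
    A minimal sketch of the behaviour described above, written on
    unsigned values so that the shifts are well defined. The names
    lowest_one and lowest_zero are hypothetical and this is not the
    code in this file:

```c
// Hypothetical helper: 1-based index of the least significant 1 bit of u,
// or -1 if u has no 1 bits.
int lowest_one(unsigned u)
{
    int n;

    for (n = 1; u != 0; n++, u >>= 1)
        if (u & 1u) return n;
    return -1;
}

// The least significant 0 bit of u is the least significant 1 bit of ~u,
// so -1 comes back only when every bit of u is 1.
int lowest_zero(unsigned u)
{
    return lowest_one(~u);
}
```

    So lowest_one(8) is 4, lowest_zero(7) is 4, and the two failure
    cases are an argument of 0 for the first and all ones for the second.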
*/ #include "strings.h" int ffs(i) register int i; { register int N; for (N = 8*sizeof(int); --N >= 0; i >>= 1) if (i&1) return 8*sizeof(int)-N; return -1; } int ffc(i) register int i; { register int N; for (N = 8*sizeof(int); --N >= 0; i >>= 1) if (!(i&1)) return 8*sizeof(int)-N; return -1; } 'EOF' echo getopt.3 cat >getopt.3 <<'EOF' .TH GETOPT 3 local .DA 25 March 1982 .SH NAME getopt \- get option letter from argv .SH SYNOPSIS .ft B int getopt(argc, argv, optstring) .br int argc; .br char **argv; .br char *optstring; .sp extern char *optarg; .br extern int optind; .ft .SH DESCRIPTION .I Getopt returns the next option letter in .I argv that matches a letter in .IR optstring . .I Optstring is a string of recognized option letters; if a letter is followed by a colon, the option is expected to have an argument that may or may not be separated from it by white space. .I Optarg is set to point to the start of the option argument on return from .IR getopt . .PP .I Getopt places in .I optind the .I argv index of the next argument to be processed. Because .I optind is external, it is normally initialized to zero automatically before the first call to .IR getopt . .PP When all options have been processed (i.e., up to the first non-option argument), .I getopt returns .BR EOF . The special option .B \-\- may be used to delimit the end of the options; .B EOF will be returned, and .B \-\- will be skipped. .SH SEE ALSO getopt(1) .SH DIAGNOSTICS .I Getopt prints an error message on .I stderr and returns a question mark .RB ( ? ) when it encounters an option letter not included in .IR optstring . .SH EXAMPLE The following code fragment shows how one might process the arguments for a command that can take the mutually exclusive options .B a and .BR b , and the options .B f and .BR o , both of which require arguments: .PP .RS .nf main(argc, argv) int argc; char **argv; { int c; extern int optind; extern char *optarg; \&. \&. \&. 
while ((c = getopt(argc, argv, "abf:o:")) != EOF) { switch (c) { case 'a': if (bflg) errflg++; else aflg++; break; case 'b': if (aflg) errflg++; else bflg++; break; case 'f': ifile = optarg; break; case 'o': ofile = optarg; break; case '?': default: errflg++; break; } } if (errflg) { fprintf(stderr, "Usage: ..."); exit(2); } for (; optind < argc; optind++) { \&. \&. \&. } \&. \&. \&. } .RE .PP A template similar to this can be found in .IR /usr/pub/template.c . .SH HISTORY Written by Henry Spencer, working from a Bell Labs manual page. Behavior believed identical to the Bell version. .SH BUGS It is not obvious how `\-' standing alone should be treated; this version treats it as a non-option argument, which is not always right. .PP Option arguments are allowed to begin with `\-'; this is reasonable but reduces the amount of error checking possible. .PP .I Getopt is quite flexible but the obvious price must be paid: there is much it could do that it doesn't, like checking mutually exclusive options, checking type of option arguments, etc. 'EOF' echo getopt.c cat >getopt.c <<'EOF' /* File : getopt.c Author : Henry Spencer, University of Toronto Updated: 28 April 1984 Purpose: get option letter from argv. */ #include <stdio.h> #include "strings.h" char *optarg; /* Global argument pointer. */ int optind = 0; /* Global argv index. */ int getopt(argc, argv, optstring) int argc; char *argv[]; char *optstring; { register int c; register char *place; static char *scan = NullS; /* Private scan pointer. 
*/ optarg = NullS; if (scan == NullS || *scan == '\0') { if (optind == 0) optind++; if (optind >= argc) return EOF; place = argv[optind]; if (place[0] != '-' || place[1] == '\0') return EOF; optind++; if (place[1] == '-' && place[2] == '\0') return EOF; scan = place+1; } c = *scan++; place = index(optstring, c); if (place == NullS || c == ':') { fprintf(stderr, "%s: unknown option %c\n", argv[0], c); return '?'; } if (*++place == ':') { if (*scan != '\0') { optarg = scan, scan = NullS; } else { optarg = argv[optind], optind++; } } return c; } 'EOF' echo int2str.c cat >int2str.c <<'EOF' /* File : int2str.c Author : Richard A. O'Keefe Updated: 30 April 1984 Defines: int2str(), itoa(), ltoa() int2str(dst, radix, val) converts the (long) integer "val" to character form and moves it to the destination string "dst" followed by a terminating NUL. The result is normally a pointer to this NUL character, but if the radix is dud the result will be NullS and nothing will be changed. If radix is -2..-36, val is taken to be SIGNED. If radix is 2.. 36, val is taken to be UNSIGNED. That is, val is signed if and only if radix is. You will normally use radix -10 only through itoa and ltoa, for radix 2, 8, or 16 unsigned is what you generally want. _dig_vec is public just in case someone has a use for it. The definitions of itoa and ltoa are actually macros in strings.h, but this is where the code is. */ #include "strings.h" char _dig_vec[] = "0123456789abcdefghijklmnopqrstuvwxyz"; char *int2str(dst, radix, val) register char *dst; register int radix; register long val; { char buffer[33]; register char *p; if (radix < 0) { if (radix < -36 || radix > -2) return NullS; if (val < 0) { *dst++ = '-'; val = -val; } radix = -radix; } else { if (radix > 36 || radix < 2) return NullS; } /* The slightly contorted code which follows is due to the fact that few machines directly support unsigned long / and %. Certainly the VAX C compiler generates a subroutine call. 
In the interests of efficiency (hollow laugh) I let this happen for the first digit only; after that "val" will be in range so that signed integer division will do. Sorry 'bout that. CHECK THE CODE PRODUCED BY YOUR C COMPILER. The first % and / should be unsigned, the second % and / signed, but C compilers tend to be extraordinarily sensitive to minor details of style. This works on a VAX, that's all I claim for it. */ p = &buffer[32]; *p = '\0'; *--p = _dig_vec[(unsigned long)val%(unsigned long)radix]; val = (unsigned long)val/(unsigned long)radix; while (val != 0) *--p = _dig_vec[val%radix], val /= radix; while (*dst++ = *p++) ; return dst-1; } 'EOF' echo str2int.c cat >str2int.c <<'EOF' /* File : str2int.c Author : Richard A. O'Keefe Updated: 27 April 1984 Defines: str2int(), atoi(), atol() str2int(src, radix, lower, upper, &val) converts the string pointed to by src to an integer and stores it in val. It skips leading spaces and tabs (but not newlines, formfeeds, backspaces), then it accepts an optional sign and a sequence of digits in the specified radix. The result should satisfy lower <= *val <= upper. The result is a pointer to the first character after the number; trailing spaces will NOT be skipped. If an error is detected, the result will be NullS, the value put in val will be 0, and errno will be set to EDOM if there are no digits ERANGE if the result would overflow or otherwise fail to lie within the specified bounds. Check that the bounds are right for your machine. This looks amazingly complicated for what you probably thought was an easy task. Coping with integer overflow and the asymmetric range of twos complement machines is anything but easy. So that users of atoi and atol can check whether an error occurred, I have taken a wholly unprecedented step: errno is CLEARED if this call has no problems. */ #include "strings.h" #include "ctypes.h" #include <errno.h> extern int errno; /* CHECK THESE CONSTANTS FOR YOUR MACHINE!!! 
*/ #if pdp11 # define MaxInt 0x7fffL /* int = 16 bits */ # define MinInt (-0x7fffL-1) /* 0x8000L would be +32768 as a long */ # define MaxLong 0x7fffffffL /* long = 32 bits */ # define MinLong 0x80000000L #else ~pdp11 # define MaxInt 0x7fffffffL /* int = 32 bits */ # define MinInt 0x80000000L # define MaxLong 0x7fffffffL /* long = 32 bits */ # define MinLong 0x80000000L #endif pdp11 char *str2int(src, radix, lower, upper, val) register char *src; register int radix; long lower, upper, *val; { int sign; /* is number negative (+1) or positive (-1) */ int n; /* number of digits yet to be converted */ long limit; /* "largest" possible valid input */ long scale; /* the amount to multiply next digit by */ long sofar; /* the running value */ register int d; /* (negative of) next digit */ char *answer; /* Make sure *val is sensible in case of error */ *val = 0; /* Check that the radix is in the range 2..36 */ if (radix < 2 || radix > 36) { errno = EDOM; return NullS; } /* The basic problem is: how do we handle the conversion of a number without resorting to machine-specific code to check for overflow? Obviously, we have to ensure that no calculation can overflow. We are guaranteed that the "lower" and "upper" arguments are valid machine integers. On sign-and-magnitude, twos-complement, and ones-complement machines all, if +|n| is representable, so is -|n|, but on twos complement machines the converse is not true. So the "maximum" representable number has a negative representative. Limit is set to min(-|lower|,-|upper|); this is the "largest" number we are concerned with. */ /* Calculate Limit using Scale as a scratch variable */ if ((limit = lower) > 0) limit = -limit; if ((scale = upper) > 0) scale = -scale; if (scale < limit) limit = scale; /* Skip leading spaces and check for a sign. Note: because on a 2s complement machine MinLong is a valid integer but |MinLong| is not, we have to keep the current converted value (and the scale!) as *negative* numbers, so the sign is the opposite of what you might expect. 
Should the test in the loop be isspace(*src)? */ while (*src == ' ' || *src == '\t') src++; sign = -1; if (*src == '+') src++; else if (*src == '-') src++, sign = 1; /* Check that there is at least one digit */ if (_c2type[1+ *src] >= radix) { errno = EDOM; return NullS; } /* Skip leading zeros so that we never compute a power of radix in scale that we won't have a need for. Otherwise sticking enough 0s in front of a number could cause the multiplication to overflow when it needn't. */ while (*src == '0') src++; /* Move over the remaining digits. We have to convert from right to left in order to avoid overflow. Answer is after last digit. */ for (n = 0; _c2type[1+ *src++] < radix; n++) ; answer = --src; /* The invariant we want to maintain is that src is just to the right of n digits, we've converted k digits to sofar, scale = -radix**k, and scale < sofar < 0. Now if the final number is to be within the original Limit, we must have (to the left)*scale+sofar >= Limit, or (to the left)*scale >= Limit-sofar, i.e. the digits to the left of src must form an integer <= (Limit-sofar)/(scale). In particular, this is true of the next digit. In our incremental calculation of Limit, IT IS VITAL that (-|N|)/(-|D|) = |N|/|D| */ for (sofar = 0, scale = -1; --n >= 0; ) { d = _c2type[1+ *--src]; if (-d < limit) { errno = ERANGE; return NullS; } limit = (limit+d)/radix, sofar += d*scale; if (n != 0) scale *= radix; /* watch out for overflow!!! */ } /* Now it might still happen that sofar = -32768 or its equivalent, so we can't just multiply by the sign and check that the result is in the range lower..upper. All of this caution is a right pain in the neck. If only there were a standard routine which says generate thus and such a signal on integer overflow... But not enough machines can do it *SIGH*. 
*/ if (sign < 0 && sofar < -MaxLong /* twos-complement problem */ || (sofar*=sign) < lower || sofar > upper) { errno = ERANGE; return NullS; } *val = sofar; errno = 0; /* indicate that all went well */ return answer; } int atoi(src) char *src; { long val; str2int(src, 10, MinInt, MaxInt, &val); return (int)val; } long atol(src) char *src; { long val; str2int(src, 10, MinLong, MaxLong, &val); return val; } 'EOF' echo strcase.c cat >strcase.c <<'EOF' /* File : strcase.c Author : Richard A. O'Keefe. Updated: 4 May 1984 Defines: strcase() strcase(dst, src, op) copies characters from src to dst until a NUL is encountered changing the alphabetic case of letters according to the op. The operations available are 0 -> convert to lower case llllll 1 -> convert to upper case UUUUUU 2 -> capitalise each word Cccccc 3 -> change each letter to the opposite case 3 isn't particularly useful unless you know that all the letters in src are already in the same case. BEWARE: this is set up for ASCII only. You can use the same idea for EBCDIC, but the magic numbers are different. I haven't used an #ifdef because (a) I don't know what name to use (ebcdic? Ebcdic?) and (b) I don't suppose many people will want it. The result is a pointer to the NUL which now ends dst. You can use strcase(buff, buff, op) safely. */ #include "strings.h" #include "ctypes.h" #define UPPER 0 /* EBCDIC: 64 */ #define LOWER 32 /* EBCDIC: 0 */ #define OTHER 32 /* EBCDIC: 64 */ char *strcase(dst, src, op) register char *dst, *src; int op; { register int d; /* Should be char */ register int mask; /* Should be char */ char initial, rest; switch (op) { case 0: initial = LOWER, rest = LOWER; break; case 1: initial = UPPER, rest = UPPER; break; case 2: initial = UPPER, rest = LOWER; break; case 3: while (d = *src++) *dst++ = isalpha(d) ? 
d^OTHER : d; goto done; } for (mask = initial; d = *src++; *dst++ = d) if (isalpha(d)) { d = (d &~ OTHER) | mask, mask = rest; } else { mask = initial; } done: *dst = '\0'; return dst; } 'EOF' echo strcat.c cat >strcat.c <<'EOF' /* File : strcat.c Author : Richard A. O'Keefe. Updated: 10 April 1984 Defines: strcat() strcat(s, t) concatenates t on the end of s. There had better be enough room in the space s points to; strcat has no way to tell. Note that strcat has to search for the end of s, so if you are doing a lot of concatenating it may be better to use strmov, e.g. strmov(strmov(strmov(strmov(s,a),b),c),d) rather than strcat(strcat(strcat(strcpy(s,a),b),c),d). strcat returns the old value of s. */ #include "strings.h" char *strcat(s, t) register char *s, *t; { char *save; for (save = s; *s++; ) ; for (--s; *s++ = *t++; ) ; return save; } 'EOF' echo strchr.c cat >strchr.c <<'EOF' /* File : strchr.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strchr(), index() strchr(s, c) returns a pointer to the first place in s where c occurs, or NullS if c does not occur in s. This function is called index in V7 and 4.?bsd systems; while not ideal the name is clearer than strchr, so index remains in strings.h as a macro. NB: strchr looks for single characters, not for sets or strings. To find the NUL character which closes s, use strchr(s, '\0') or strend(s). The parameter 'c' is declared 'int' so it will go in a register; if your C compiler is happy with register _char_ change it to that. */ #include "strings.h" char *strchr(s, c) register _char_ *s; register int c; { for (;;) { if (*s == c) return s; if (!*s++) return NullS; } } 'EOF' echo strcmp.c cat >strcmp.c <<'EOF' /* File : strcmp.c Author : Richard A. O'Keefe. Updated: 10 April 1984 Defines: strcmp() strcmp(s, t) returns > 0, = 0, or < 0 when s > t, s = t, or s < t according to the ordinary lexicographical order. To test for equality, the macro streql(s,t) is clearer than !strcmp(s,t). 
Note that if the string contains characters outside the range 0..127 the result is machine-dependent; PDP-11s and VAXen use signed bytes, some other machines use unsigned bytes. */ #include "strings.h" int strcmp(s, t) register char *s, *t; { while (*s == *t++) if (!*s++) return 0; return s[0]-t[-1]; } 'EOF' echo strcpack.c cat >strcpack.c <<'EOF' /* File : strcpack.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strcpack() strcpack(dst, src, set, c) copies characters from src to dst, stopping when it finds a NUL. If c is NUL, characters not in the set are not copied to dst. If c is not NUL, sequences of characters not in the set are copied as a single c. strcpack is to strpack as strcspn is to strspn. If your C compiler is happy with register _char_, change the declaration of c. The result is the address of the NUL byte that now terminates "dst". Note that dst may safely be the same as src. */ #include "strings.h" #include "_str2set.h" char *strcpack(dst, src, set, c) register _char_ *dst, *src; char *set; register int c; { register int chr; _str2set(set); while (chr = *src++) { if (_set_vec[chr] != _set_ctr) { while ((chr = *src++) && _set_vec[chr] != _set_ctr) ; if (c) *dst++ = c; /* 1. If you don't want trailing */ if (!chr) break; /* 2. things turned into "c", swap */ } /* lines 1 and 2. */ *dst++ = chr; } *dst = 0; return dst; } 'EOF' echo strcpbrk.c cat >strcpbrk.c <<'EOF' /* File : strcpbrk.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strcpbrk() strcpbrk(s1, s2) returns a pointer to the first character of s1 which does not occur in s2. It is to strpbrk as strcspn is to strspn. It relies on NUL never being in a set. */ #include "strings.h" #include "_str2set.h" char *strcpbrk(str, set) register _char_ *str; char *set; { _str2set(set); while (_set_vec[*str++] == _set_ctr); return *--str ? str : NullS; } 'EOF' echo strcpy.c cat >strcpy.c <<'EOF' /* File : strcpy.c Author : Richard A. O'Keefe. 
Updated: 20 April 1984 Defines: strcpy() strcpy(dst, src) copies all the characters of src (including the closing NUL) to dst, and returns the old value of dst. Maybe this is useful for doing i = strlen(strcpy(dst, src)); I've always found strmov handier. */ #include "strings.h" char *strcpy(dst, src) register char *dst, *src; { char *save; for (save = dst; *dst++ = *src++; ) ; return save; } 'EOF' echo strcspn.c cat >strcspn.c <<'EOF' /* File : strcspn.c Author : Richard A. O'Keefe. Updated: 11 April 1984 Defines: strcspn() strcspn(s1, s2) returns the length of the longest prefix of s1 consisting entirely of characters which are NOT in s2 ("c" is "complement"). NUL is considered to be part of s2. As _str2set will never include NUL in a set, we have to check for it explicitly. */ #include "strings.h" #include "_str2set.h" int strcspn(str, set) register _char_ *str; char *set; { register int L; _str2set(set); for (L = 0; *str && _set_vec[*str++] != _set_ctr; L++) ; return L; } 'EOF' echo strctrim.c cat >strctrim.c <<'EOF' /* File : strctrim.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strctrim() strctrim(dst, src, set, ends) copies src to dst, but will skip leading characters not in set if ends <= 0 and will skip trailing characters not in set if ends >= 0. Thus there are three cases: ends < 0 : trim a prefix ends = 0 : trim a prefix and a suffix both ends > 0 : trim a suffix This is to strtrim as strcspn is to strspn. 
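Illustration only, not part of the package: a minimal sketch of the three "ends" cases, assuming an ANSI C compiler. The name ctrim_sketch is hypothetical, and set membership is tested with the standard strchr rather than with _str2set, so that the block compiles on its own.

```c
#include <string.h>

// Hypothetical stand-in for strctrim: same contract as described above,
// but membership in "set" is checked with the standard strchr.
char *ctrim_sketch(char *dst, const char *src, const char *set, int ends)
{
    char *keep = dst;                 // one past the last set member copied

    if (ends <= 0)                    // trim a prefix of non-members
        while (*src != '\0' && strchr(set, *src) == NULL)
            src++;
    for (; *src != '\0'; src++) {     // copy the rest, noting set members
        *dst++ = *src;
        if (strchr(set, *src) != NULL)
            keep = dst;
    }
    if (ends >= 0)                    // trim a suffix of non-members
        dst = keep;
    *dst = '\0';
    return dst;                       // the NUL that now ends the copy
}
```

With set holding the lower-case letters, "  hi there  " comes out as "hi there  " for ends < 0, as "hi there" for ends = 0, and as "  hi there" for ends > 0.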
*/ #include "strings.h" #include "_str2set.h" char *strctrim(dst, src, set, ends) register char *dst, *src; char *set; int ends; { _str2set(set); if (ends <= 0) { register int chr; while ((chr = *src++) && _set_vec[chr] != _set_ctr) ; --src; } if (ends >= 0) { register int chr; register char *save = dst; while (chr = *src++) { *dst++ = chr; if (_set_vec[chr] == _set_ctr) save = dst; } dst = save, *dst = NUL; } else { while (*dst++ = *src++) ; --dst; } return dst; } 'EOF' echo strend.c cat >strend.c <<'EOF' /* File : strend.c Author : Richard A. O'Keefe. Updated: 23 April 1984 Defines: strend() strend(s) returns a character pointer to the NUL which ends s. That is, strend(s)-s == strlen(s). This is useful for adding things at the end of strings. It is redundant, because strchr(s,'\0') could be used instead, but this is clearer and faster. Beware: the asm version works only if strlen(s) < 65535. */ #include "strings.h" #if VaxAsm char *strend(s) char *s; { asm("locc $0,$65535,*4(ap)"); asm("movl r1,r0"); } #else ~VaxAsm char *strend(s) register char *s; { while (*s++); return s-1; } #endif VaxAsm 'EOF' echo strfield.c cat >strfield.c <<'EOF' /* File : strfield.c Author : Richard A. O'Keefe. Updated: 21 April 1984 Defines: strfield() strfield(src, fields, chars, blanks, tabch) is based on the key specifications of the sort(1) command. tabch corresponds to 'x' in -t'x'. If it is NUL, a field is leading layout (spaces, tabs &c) followed by at least one non-layout character, and is terminated by the next layout character or NUL. If it is not NUL, a field is terminated by tabch or NUL. fields is the number of fields to skip over. It corresponds to m in -m.n or +m.n . There must be at least this many fields, and only the last may be terminated by NUL. chars is the number of characters to skip after the fields have been skipped. At least this many non-NUL characters must remain after the fields have been skipped. 
Note that it is entirely possible for this skip to cross one or more field boundaries. This corresponds to n in +m.n or -m.n . Finally, if blanks is not 0, any layout characters will be skipped. There need not be any. This corresponds to the letter b in +2.0b or -0.4b . The result is NullS if the source ran out of fields or ran out of chars. Otherwise it is a pointer to the first character of src which was not skipped. It is quite possible for this character to be the terminating NUL. Example: to skip to the user-id field of /etc/passwd: user_id = strfield(line, 2, 0, 0, ':'); to check whether "line" is at least 27 characters long: if (strfield(line, 0, 27, 0, 0)) then-it-is; to select the third blank-delimited field in a line: head = strfield(line, 2, 0, 1, 0); tail = strfield(head, 1, 0, 0, 0); (* the field is the tail-head characters starting at head *) It's not a bug, it's a feature: "layout" means any ASCII character in the range '\1' .. ' ', including '\n', '\f' and so on. */ #include "strings.h" char *strfield(src, fields, chars, blanks, tabch) register char *src; int fields, chars, blanks, tabch; { if (tabch <= 0) { while (--fields >= 0) { while (*src <= ' ') if (!*src++) return NullS; while (*++src > ' ') ; } } else if (fields > 0) { do if (!*src) return NullS; while (*src++ != tabch || --fields > 0); } while (--chars >= 0) if (!*src++) return NullS; if (blanks) while (*src && *src <= ' ') src++; return src; } 'EOF' echo strfind.c cat >strfind.c <<'EOF' /* File : strfind.c Author : Richard A. O'Keefe. Updated: 23 April 1984 Defines: strfind() strfind(src, pat) looks for an instance of pat in src. pat is not a regex(3) pattern, it is a literal string which must be matched exactly. As a special hack to prevent infinite loops, the empty string will be found just once, at the far end of src. This is hard to justify. The result is a pointer to the first character AFTER the located instance, or NullS if pat does not occur in src. 
The reason for returning the place after the instance is so that you can count the number of instances by writing _str2pat(ToBeFound); for (p = src, n = 0; p = strfind(p, NullS); n++) ; If you want a pointer to the first character of the instance, it is up to you to subtract strlen(pat). If there were a strnfind it wouldn't have to look at all the characters of src, this version does otherwise it could miss the closing NUL. */ #include "strings.h" #include "_str2pat.h" char *strfind(src, pat) char *src, *pat; { register char *s, *p; register int c, lastch; pat = _str2pat(pat); if (_pat_lim < 0) { for (s = src; *s++; ) ; return s-1; } /* The pattern is non-empty */ for (c = _pat_lim, lastch = pat[c]; ; c = _pat_vec[c]) { for (s = src; --c >= 0; ) if (!*s++) return NullS; c = *s, src = s; if (c == lastch) { for (s -= _pat_lim, p = pat; *p; ) if (*s++ != *p++) goto not_yet; return s; not_yet:; } } } 'EOF' echo strings.h cat >strings.h <<'EOF' /* File : strings.h Updated: 30 April 1984 Purpose: Header file for the "string(3C)" package. Copyright (C) 1984 Richard A. O'Keefe. All the routines in this package are the original work of R.A.O'Keefe. Any resemblance between them and any routines in licensed software is due entirely to these routines having been written using the "man 3 string" UNIX manual page, or in some cases the "man 1 sort" manual page as a specification. See the READ-ME to find the conditions under which these routines may be used & copied. */ #define NullS (char*)0 #define NUL '\0' #ifndef _AlphabetSize #define _AlphabetSize 128 #endif #if _AlphabetSize == 128 typedef char _char_; #endif #if _AlphabetSize == 256 typedef unsigned char _char_; #endif /* NullS is the "nil" character pointer. NULL would work in most cases, but in some C compilers pointers and integers may be of different sizes, so it is handy to have a nil pointer that one can pass to a function as well as compare pointers against. NUL is the "end of string character". 
Strings are deemed to end at the first NUL, or, for the routines which take an N argument, when N is exhausted. None of the routines in this package works on the length alone. (NUL is the ASCII name for this character.) The routines which move characters around don't care whether they are signed or unsigned. But the routines which compare a character in a string with an argument, or use a character from a string as an index into an array, do care. I have assumed that _AlphabetSize = 128 => only 0..127 appear in strings _AlphabetSize = 256 => only 0..255 appear in strings The files _str2set.c and _str2map.c declare character vectors using this size. If you don't have unsigned char, your machine may treat char as unsigned anyway. */ extern char *strcat(/*char^,char^*/); extern char *strncat(/*char^,char^,int*/); extern int strcmp(/*char^,char^*/); extern int strncmp(/*char^,char^,int*/); #define streql !strcmp #define strneql !strncmp extern char *strcpy(/*char^,char^*/); extern char *strncpy(/*char^,char^,int*/); extern int strlen(/*char^*/); extern int strnlen(/*char^,int*/); extern char *strchr(/*char^,_char_*/); extern char *strrchr(/*char^,_char_*/); #define index strchr #define rindex strrchr extern char *strmov(/*char^,char^*/); extern char *strnmov(/*char^,char^,int*/); extern char *strend(/*char^*/); extern char *strpbrk(/*char^,char^*/); extern char *strcpbrk(/*char^,char^*/); extern int strspn(/*char^,char^*/); extern int strcspn(/*char^,char^*/); extern char *strtok(/*char^,char^*/); extern void istrtok(/*char^,char^*/); extern char *strpack(/*_char_^,_char_^,char^,int*/); extern char *strcpack(/*_char_^,_char_^,char^,int*/); extern int strrpt(/*char^,char^,int*/); extern int strnrpt(/*char^,int,char^,int*/); extern void strtrans(/*_char_^,_char_^,_char_^,_char_^*/); extern void strntrans(/*_char_^,_char_^,int,_char_^,_char_^*/); extern char *strtrim(/*char^,char^,char^,int*/); extern char *strctrim(/*char^,char^,char^,int*/); extern char 
*strfield(/*char^,int,int,int,int*/); extern char *strkey(/*char^,char^,char^,char^*/); extern char *strfind(/*char^,char^*/); extern char *strrepl(/*char^,char^,char^,char^*/); extern void bcopy(/*char^,char^,int*/); extern void bmove(/*char^,char^,int*/); extern void bfill(/*char^,int,char*/); extern void bzero(/*char^,int*/); extern int bcmp(/*char^,char^,int*/); #define beql !bcmp extern int ffs(/*int*/); extern int ffc(/*int*/); extern char *str2int(/*char^,int,long,long,long^*/); extern int atoi(/*char^*/); extern long atol(/*char^*/); extern char *int2str(/*char^,int,long*/); #define itoa(d, n) int2str(d, -10, (long)(n)) #define ltoa(d, n) int2str(d, -10, (long)(n)) 'EOF' echo strkey.c cat >strkey.c <<'EOF' /* File : strkey.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strkey() strkey(dst, head, tail, options) copies tail-head characters from head to dst according to the options. If tail is NullS, it copies up to the terminating NUL of head. This function is meant for doing comparisons as by sort(1). The options are thus a string of characters taken from "bdfin". In case the options came from somewhere else other letters are ignored. -b: leading layout characters are not copied. -d: only letters, digits, and blanks are copied. -i: only graphic characters (32..126) are copied. -n: a numeric string is copied. These options are incompatible, and the last is taken. -f: upper case letters are copied as lower case. The question of what to do with a numeric string is an interesting one, and I don't claim that this is a brilliant answer. However, the solution used here does mean that the caller can compare two strings as strings without needing to know that they are numeric. A number is copied as <sign><9 digits>.<remaining digits>, where <sign> is '-' for a negative number and '0' for a positive number. The magic number 9 is defined to be DigitMagic. 
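Illustration only, not part of the package: a minimal sketch of the numeric key format just described, assuming a C99 compiler and a number with at most nine digits before the point. numeric_key_sketch and DIGIT_MAGIC are hypothetical names chosen for this sketch.

```c
#include <string.h>

#define DIGIT_MAGIC 9   // mirrors DigitMagic; the name is this sketch's own

// Hypothetical helper: build the sortable key for the numeric string s.
// Assumes s has at most DIGIT_MAGIC digits before the point.
char *numeric_key_sketch(char *dst, const char *s)
{
    char *start = dst;
    size_t whole;

    if (*s == '-')
        *dst++ = *s++;                       // negative: keep the '-'
    else
        *dst++ = '0';                        // positive: sign slot is '0'
    whole = strspn(s, "0123456789");         // digits before the point
    for (size_t pad = whole; pad < DIGIT_MAGIC; pad++)
        *dst++ = '0';                        // left-pad to nine digits
    memcpy(dst, s, whole);
    dst += whole;
    s += whole;
    if (*s == '.') {
        size_t frac = strspn(s + 1, "0123456789");
        while (frac > 0 && s[frac] == '0')   // drop trailing zeros
            frac--;
        if (frac > 0) {                      // keep '.' only if digits remain
            memcpy(dst, s, frac + 1);
            dst += frac + 1;
        }
    }
    *dst = '\0';
    return start;
}
```

Under these assumptions "12.50" becomes "0000000012.5" and "-7" becomes "-000000007", so the key for "9" compares below the key for "10" under an ordinary strcmp.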
The idea is that to compare two lines using the keys -tx +m1.n1<flags> -m2.n2 you do h1 = strfield(line1, m1, n1, 0, 'x'); t1 = strfield(h1, 1, 0, 0, 'x'); strkey(buff1, h1, t1, "flags"); h2 = strfield(line2, m2, n2, 0, 'x'); t2 = strfield(h2, 1, 0, 0, 'x'); strkey(buff2, h2, t2, "flags"); ... strcmp(buff1, buff2) ... The point of all this, of course, is to make it easier to write new utilities which are compatible with sort(1) than ones which are not. */ #include "strings.h" #define DigitMagic 9 char *strkey(dst, head, tail, flags) register char *dst, *head, *tail; char *flags; { register int c; int b = 0; /* b option? */ int f = 0; /* f option? */ int k = 0; /* 3->n, 2->d, 1->i, 0->none of them */ while (*flags) switch (*flags++|32) { case 'b': b++; break; case 'f': f++; break; case 'i': k = 1; break; case 'd': k = 2; break; case 'n': k = 3; break; default : /*ignore*/break; } flags = dst; /* save return value */ if (tail == NullS) for (tail = head; *tail; tail++) ; if (b) while (head != tail && *head <= ' ') head++; switch (k) { case 0: if (f) { while (head != tail) { c = *head++; if (c >= 'A' && c <= 'Z') c |= 32; *dst++ = c; } } else { while (head != tail) *dst++ = *head++; } break; case 1: if (f) { while (head != tail) { c = *head++; if (c >= 32 && c <= 126) { if (c >= 'A' && c <= 'Z') c |= 32; *dst++ = c; } } } else { while (head != tail) { c = *head++; if (c >= 32 && c <= 126) *dst++ = c; } } break; case 2: if (f) f = 32; while (head != tail) { c = *head++; if (c >= '0' && c <= '9' || c >= 'a' && c <= 'z' || c == ' ') { *dst++ = c; } else if (c >= 'A' && c <= 'Z') { *dst++ = c|f; } } break; case 3: if (head != tail && *head == '-') { *dst++ = *head++; /* copy the sign, step past it once only */ } else { *dst++ = '0'; } b = 0; while (head != tail) { c = *head; if (c < '0' || c > '9') break; b++, head++; } f = DigitMagic-b; while (--f >= 0) *dst++ = '0'; head -= b; while (--b >= 0) *dst++ = *head++; if (*head == '.' 
&& head != tail) { *dst++ = *head++; while (head != tail) { c = *head++; if (c < '0' || c > '9') break; *dst++ = c; } /* now remove trailing 0s and possibly the '.' as well */ while (*--dst == '0') ; if (*dst != '.') dst++; } break; } *dst = NUL; return flags; /* saved initial value of dst */ } 'EOF' echo strlen.c cat >strlen.c <<'EOF' /* File : strlen.c Author : Richard A. O'Keefe. Updated: 23 April 1984 Defines: strlen() strlen(s) returns the number of characters in s, that is, the number of non-NUL characters found before the closing NUL. Note: some non-standard C compilers for 32-bit machines take int to be 16 bits; either put up with short strings or change int to long throughout this package. Better yet, BOYCOTT such shoddy compilers. Beware: the asm version works only if strlen(s) < 65536. */ #include "strings.h" #if VaxAsm int strlen(s) char *s; { asm("locc $0,$65535,*4(ap)"); asm("subl3 r0,$65535,r0"); } #else ~VaxAsm int strlen(s) register char *s; { register int L; for (L = 0; *s++; L++) ; return L; } #endif VaxAsm 'EOF' echo strmov.c cat >strmov.c <<'EOF' /* File : strmov.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strmov() strmov(dst, src) moves all the characters of src (including the closing NUL) to dst, and returns a pointer to the new closing NUL in dst. The similar UNIX routine strcpy returns the old value of dst, which I have never found useful. strmov(strmov(dst,a),b) moves a//b into dst, which seems useful. */ #include "strings.h" char *strmov(dst, src) register char *dst, *src; { while (*dst++ = *src++) ; return dst-1; } 'EOF' echo strncase.c cat >strncase.c <<'EOF' /* File : strncase.c Author : Richard A. O'Keefe. Updated: 4 May 1984 Defines: strncase() strncase(dst, src, n, op) copies characters from src to dst until n runs out or a NUL is copied, whichever occurs first. It changes the alphabetic case of letters according to op. 
The options are 0 -> convert to lower case llllll 1 -> convert to upper case UUUUUU 2 -> capitalise each word Cccccc 3 -> change each letter to the opposite case This is the "n" version of strcase(). The result is a character pointer to the closing NUL if one was transferred, otherwise to the next character after the last one transferred. (The idea is that strncase(dst, src, n, op)-dst = strnlen(src, n).) You can use strncase(buff, buff, n, op) safely. */ #include "strings.h" #include "ctypes.h" #define UPPER 0 /* EBCDIC: 64 */ #define LOWER 32 /* EBCDIC: 0 */ #define OTHER 32 /* EBCDIC: 64 */ char *strncase(dst, src, n, op) register char *dst, *src; int n; int op; { register int d; /* Should be char */ register int mask; /* Should be char */ char initial, rest; switch (op) { case 0: initial = LOWER, rest = LOWER; break; case 1: initial = UPPER, rest = UPPER; break; case 2: initial = UPPER, rest = LOWER; break; case 3: while (--n >= 0 && (d = *src++)) *dst++ = isalpha(d) ? d^OTHER : d; goto done; } for (mask = initial; --n >= 0 && (d = *src++); *dst++ = d) if (isalpha(d)) { d = (d &~ OTHER) | mask, mask = rest; } else { mask = initial; } done: if (n >= 0) *dst = '\0'; return dst; } 'EOF' echo strncat.c cat >strncat.c <<'EOF' /* File : strncat.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strncat() strncat(dst, src, n) copies up to n characters of src to the end of dst. As with strcat, it has to search for the end of dst. Even if it abandons src early because n runs out it will still close dst with a NUL. See also strnmov. */ #include "strings.h" char *strncat(dst, src, n) register char *dst, *src; register int n; { char *save; for (save = dst; *dst++; ) ; for (--dst; --n >= 0; ) if (!(*dst++ = *src++)) return save; *dst = NUL; return save; } 'EOF' echo strncmp.c cat >strncmp.c <<'EOF' /* File : strncmp.c Author : Richard A. O'Keefe. Updated: 10 April 1984 Defines: strncmp() strncmp(s, t, n) compares the first n characters of s and t. 
If they are the same in the first n characters it returns 0, otherwise it returns the same value as strcmp(s, t) would. */ #include "strings.h" int strncmp(s, t, n) register char *s, *t; register int n; { while (--n >= 0) { if (*s != *t++) return s[0]-t[-1]; if (!*s++) return 0; } return 0; } 'EOF' echo strncpy.c cat >strncpy.c <<'EOF' /* File : strncpy.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strncpy() strncpy(dst, src, n) copies up to n characters of src to dst. It will pad dst on the right with NUL or truncate it as necessary to ensure that n characters exactly are transferred. It returns the old value of dst as strcpy does. */ #include "strings.h" char *strncpy(dst, src, n) register char *dst, *src; register int n; { char *save; for (save = dst; --n >= 0; ) { if (!(*dst++ = *src++)) { while (--n >= 0) *dst++ = NUL; return save; } } return save; } 'EOF' echo strnlen.c cat >strnlen.c <<'EOF' /* File : strnlen.c Author : Richard A. O'Keefe. Updated: 10 April 1984 Defines: strnlen() strnlen(s, n) returns the number of characters up to the first NUL in s, or n, whichever is smaller. */ #include "strings.h" int strnlen(s, n) register char *s; register int n; { register int L; for (L = 0; --n >= 0 && *s++; L++) ; return L; } 'EOF' echo strnmov.c cat >strnmov.c <<'EOF' /* File : strnmov.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strnmov() strnmov(dst, src, n) moves up to n characters of src to dst. It always moves exactly n characters to dst; if src is shorter than n characters dst will be extended on the right with NULs, while if src is longer than n characters dst will be a truncated version of src and will not have a closing NUL. The result is a pointer to the first NUL in dst, or is dst+n if dst was truncated. 
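Illustration only, not part of the package: a minimal sketch of the two cases just described (padding and truncation), assuming an ANSI C compiler and n >= 0. The name strnmov_sketch is hypothetical.

```c
#include <string.h>

// Hypothetical ANSI-style restatement of the strnmov contract.
// Assumes n >= 0; exactly n characters of dst are written.
char *strnmov_sketch(char *dst, const char *src, int n)
{
    char *end = dst + n;
    char *nul;

    while (dst < end && (*dst = *src++) != '\0')
        dst++;                     // copy until NUL or until n runs out
    nul = dst;                     // first NUL, or dst+n if truncated
    while (dst < end)
        *dst++ = '\0';             // pad on the right with NULs
    return nul;
}
```

Moving "ab" with n = 5 writes 'a', 'b' and three NULs and returns dst+2; moving "abcdefgh" with n = 4 writes "abcd" with no closing NUL and returns dst+4.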
*/ #include "strings.h" char *strnmov(dst, src, n) register char *dst, *src; register int n; { while (--n >= 0) { if (!(*dst++ = *src++)) { src = dst-1; while (--n >= 0) *dst++ = NUL; return src; } } return dst; } 'EOF' echo strnrpt.c cat >strnrpt.c <<'EOF' /* File : strnrpt.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strnrpt() strnrpt(dst, n, src, k) "RePeaTs" the string src into dst k times, but will truncate the result at n characters if necessary. E.g. strnrpt(dst, 7, "hack ", 2) will move "hack ha" to dst WITHOUT the closing NUL. The result is the number of characters moved, not counting the closing NUL. Equivalent to strrpt-ing into an infinite buffer and then strnmov-ing the result. */ #include "strings.h" int strnrpt(dst, n, src, k) register char *dst; register int n; char *src; int k; { char *save; for (save = dst; --k >= 0; dst--) { register char *p; for (p = src; ; ) { if (--n < 0) return dst-save; if (!(*dst++ = *p++)) { n++; /* the NUL is overwritten or dropped, so it must not use up n */ break; } } } return dst-save; } 'EOF' echo strntrans.c cat >strntrans.c <<'EOF' /* File : strntrans.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strntrans() strntrans(dst, src, n, from, to) copies exactly n characters from src to dst. It will not stop when it encounters a NUL, so you can use it with a table which maps NUL to something different. No value is returned. */ #include "strings.h" #include "_str2map.h" void strntrans(dst, src, n, from, to) register _char_ *dst, *src; register int n; _char_ *from, *to; { _str2map(0, from, to); while (--n >= 0) *dst++ = _map_vec[*src++] ; } 'EOF' echo strpack.c cat >strpack.c <<'EOF' /* File : strpack.c Author : Richard A. O'Keefe. Updated: 20 April 1984 Defines: strpack() strpack(dst, src, set, c) copies characters from src to dst, stopping when it finds a NUL. If c is NUL, characters in set are not copied to dst. If c is not NUL, sequences of characters from set are copied as a single c. 
    strpack(d, s, " \t", ' ') can be used to compress white space,
    strpack(d, s, " \t", NUL) to eliminate it.  To translate characters
    in set to c without compressing runs, see strtrans().  The result
    is the address of the NUL byte now terminating dst.  Note that dst
    may safely be the same as src.
*/

#include "strings.h"
#include "_str2set.h"

char *strpack(dst, src, set, c)
    register _char_ *dst, *src;
    char *set;
    register int c;
    {
        register int chr;

        _str2set(set);
        while (chr = *src++) {
            if (_set_vec[chr] == _set_ctr) {
                while ((chr = *src++) && _set_vec[chr] == _set_ctr) ;
                if (c) *dst++ = c;  /* 1. If you don't want trailing */
                if (!chr) break;    /* 2. things turned into "c", swap */
            }                       /* lines 1 and 2. */
            *dst++ = chr;
        }
        *dst = 0;
        return dst;
    }
'EOF'
echo strpbrk.c
cat >strpbrk.c <<'EOF'
/*  File   : strpbrk.c
    Author : Richard A. O'Keefe.
    Updated: 11 April 1984
    Defines: strpbrk()

    strpbrk(s1, s2) returns NullS if no character of s2 occurs in s1,
    or a pointer to the first character of s1 which occurs in s2 if
    there is one.  It generalises strchr (v7=index).  It wouldn't be
    useful to consider NUL as part of s2, as that would occur in every
    s1.
*/

#include "strings.h"
#include "_str2set.h"

char *strpbrk(str, set)
    register _char_ *str;
    char *set;
    {
        _str2set(set);
        while (_set_vec[*str] != _set_ctr)
            if (!*str++) return NullS;
        return str;
    }
'EOF'
echo strpref.c
cat >strpref.c <<'EOF'
/*  File   : strpref.c
    Author : Richard A. O'Keefe.
    Updated: 11 April 1984
    Defines: strpref()

    strpref(src, prefix) checks whether prefix is a prefix of src.  If
    it is not, the result is NullS.  If it is, the result is a pointer
    to the first character of src after the prefix
    (src+strlen(prefix)).
*/

#include "strings.h"

char *strpref(src, prefix)
    register char *src, *prefix;
    {
        while (*prefix)
            if (*src++ != *prefix++) return NullS;
        return src;
    }
'EOF'
echo strrchr.c
cat >strrchr.c <<'EOF'
/*  File   : strrchr.c
    Author : Richard A. O'Keefe.
    Updated: 10 April 1984
    Defines: strrchr(), rindex()

    strrchr(s, c) returns a pointer to the last place in s where c
    occurs, or NullS if c does not occur in s.  This function is called
    rindex in V7 and 4.?bsd systems; while not ideal the name is
    clearer than strrchr, so rindex remains in strings.h as a macro.
    NB: strrchr looks for single characters, not for sets or strings.
    The parameter 'c' is declared 'int' so it will go in a register; if
    your C compiler is happy with register char change it to that.
*/

#include "strings.h"

char *strrchr(s, c)
    register _char_ *s;
    register int c;
    {
        register char *t;

        t = NullS;
        do if (*s == c) t = s; while (*s++);
        return t;
    }
'EOF'
echo strrepl.c
cat >strrepl.c <<'EOF'
/*  File   : strrepl.c
    Author : Richard A. O'Keefe.
    Updated: 23 April 1984
    Defines: strrepl()

    strrepl(dst, src, pat, rep, times) copies src to dst, replacing the
    first "times" non-overlapping instances of pat by rep.  pat is not
    a regex(3) pattern, it is a literal string which must be matched
    exactly.  As a special hack, since strfind claims to find "" just
    once at the end of the src string, strrepl does a strcat when pat
    is an empty string "".  If times <= 0, it is just strmov.

    The result is a pointer to the NUL which now terminates dst.

    BEWARE: even when rep is shorter than pat it is NOT necessarily
    safe for dst to be the same as src.  ALWAYS make sure dst and src
    do not/will not overlap.  You have been warned.

    There really ought to be a strnrepl with a bound for the size of
    the destination string, but there isn't.
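
    For checking results by hand, the behaviour described above can be
    mimicked with the standard library.  This is only a sketch under
    stated assumptions: naive_strrepl is a made-up name, it uses the
    ANSI strstr() instead of the package's pattern tables, and like
    strrepl it trusts the caller to supply a big enough dst which does
    not overlap src.

```c
#include <string.h>

// Hypothetical reference version of the documented strrepl behaviour.
char *naive_strrepl(char *dst, const char *src, const char *pat,
                    const char *rep, int times)
{
    size_t patlen = strlen(pat), replen = strlen(rep);
    const char *hit;

    if (times > 0 && patlen == 0) {         // "" is found once, at the end
        size_t srclen = strlen(src);
        memcpy(dst, src, srclen);
        strcpy(dst + srclen, rep);
        return dst + srclen + replen;
    }
    while (times-- > 0 && (hit = strstr(src, pat)) != NULL) {
        size_t gap = (size_t)(hit - src);
        memcpy(dst, src, gap);              // text before the match
        memcpy(dst + gap, rep, replen);     // the replacement itself
        dst += gap + replen;
        src = hit + patlen;                 // resume after the match
    }
    strcpy(dst, src);                       // whatever is left, NUL included
    return dst + strlen(dst);
}
```

    For instance naive_strrepl(out, "a-b-c-d", "-", "::", 2) leaves
    "a::b::c-d" in out and returns out+9.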
*/

#include "strings.h"
#include "_str2pat.h"

char *strrepl(dst, src, pat, rep, times)
    char *dst, *src, *pat, *rep;
    int times;
    {
        register char *s, *p;
        register int c, lastch;

        pat = _str2pat(pat);
        if (times <= 0) {
            for (p = dst, s = src; *p++ = *s++; ) ;
            return p-1;
        }
        if (_pat_lim < 0) {
            for (p = dst, s = src; *p++ = *s++; ) ;
            for (--p, s = rep; *p++ = *s++; ) ;
            return p-1;
        }
        /*  The pattern is non-empty and times is positive  */
        c = _pat_lim, lastch = pat[c];
        for (;;) {
            for (s = src, p = dst; --c >= 0; )
                if (!(*p++ = *s++)) return p-1;
            c = *s, src = s, dst = p;
            if (c == lastch) {
                for (s -= _pat_lim, p = pat; *p; )
                    if (*s++ != *p++) goto not_yet;
                for (p = dst-_pat_lim, s = rep; *p++ = *s++; ) ;
                --p;
                if (--times == 0) {
                    for (s = src; *p++ = *++s; ) ;
                    return p-1;
                }
                dst = p, src++, c = _pat_lim;
            } else {
not_yet:        c = _pat_vec[c];
            }
        }
    }
'EOF'
echo strrpt.c
cat >strrpt.c <<'EOF'
/*  File   : strrpt.c
    Author : Richard A. O'Keefe.
    Updated: 20 April 1984
    Defines: strrpt()

    strrpt(dst, src, k) "RePeaTs" the string src into dst k times.
    E.g. strrpt(dst, "hack ", 2) will move "hack hack " to dst.  If
    k <= 0 it does nothing.  The result is the number of characters
    moved, except for the closing NUL.  src may be "" but may not of
    course be NullS.
*/

#include "strings.h"

int strrpt(dst, src, k)
    register char *dst;
    char *src;
    int k;
    {
        char *save;

        for (save = dst; --k >= 0; --dst) {
            register char *p;
            for (p = src; *dst++ = *p++; ) ;
        }
        return dst-save;
    }
'EOF'
echo strspn.c
cat >strspn.c <<'EOF'
/*  File   : strspn.c
    Author : Richard A. O'Keefe.
    Updated: 11 April 1984
    Defines: strspn()

    strspn(s1, s2) returns the length of the longest prefix of s1
    consisting entirely of characters in s2.  NUL is not considered to
    be in s2, and _str2set will not include it in the set.
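
    Since NUL is never in the set, the scan always stops at the end of
    s1 at the latest.  A small sketch of the usual use, written against
    the ANSI strspn() from <string.h>, which has the same documented
    result; the helper name skip_blanks is made up for illustration.

```c
#include <string.h>

// Hypothetical helper: step over leading blanks and tabs.
const char *skip_blanks(const char *s)
{
    return s + strspn(s, " \t");    // never moves past the closing NUL
}
```

    skip_blanks("  \tword") points at the 'w', and for a string made
    only of blanks it points at the closing NUL.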
*/

#include "strings.h"
#include "_str2set.h"

int strspn(str, set)
    register _char_ *str;
    char *set;
    {
        register int L;

        _str2set(set);
        for (L = 0; _set_vec[*str++] == _set_ctr; L++) ;
        return L;
    }
'EOF'
echo strsuff.c
cat >strsuff.c <<'EOF'
/*  File   : strsuff.c
    Author : Richard A. O'Keefe.
    Updated: 11 April 1984
    Defines: strsuff()

    strsuff(src, suffix) checks whether suffix is a suffix of src.  If
    it is not, the result is NullS.  If it is, the result is a pointer
    to the character of src where suffix starts (which is the same as
    src+strlen(src)-strlen(suffix) ).
*/

#include "strings.h"

char *strsuff(src, suffix)
    register char *src, *suffix;
    {
        register int L;     /* length of suffix */

        for (L = 0; *suffix++; L++)
            if (!*src++) return NullS;
        while (*src++) ;
        for (--src, --suffix; --L >= 0; )
            if (*--src != *--suffix) return NullS;
        return src;
    }
'EOF'
echo strtok.c
cat >strtok.c <<'EOF'
/*  File   : strtok.c
    Author : Richard A. O'Keefe.
    Updated: 11 April 1984
    Defines: istrtok(), strtok()

    strtok(src, set) skips over initial characters of src[] which occur
    in set[].  The result is a pointer to the first character of src[]
    which does not occur in set[].  It then skips until it finds a
    character which does occur in set[], and changes it to NUL.  If src
    is NullS, it is as if you had specified the place just after the
    last NUL was written.  If src[] contains no characters which are
    not in set[] (e.g. if src == "") the result is NullS.

    To read a sequence of words separated by spaces you might write
        p = strtok(sequence, " ");
        while (p) {process_word(p); p = strtok(NullS, " ");}
    This is unpleasant, so there is also a function istrtok(src, set)
    which builds the set and notes the source string for future
    reference.
    With this function, you can write
        for (istrtok(wordlist, " \t"); p = strtok(NullS, NullS); )
            process_word(p);
*/

#include "strings.h"
#include "_str2set.h"

static char *oldSrc = "";

void istrtok(src, set)
    char *src, *set;
    {
        _str2set(set);
        if (src != NullS) oldSrc = src;
    }

char *strtok(src, set)
    register char *src;
    char *set;
    {
        char *save;

        _str2set(set);
        if (src == NullS) src = oldSrc;
        while (_set_vec[*src] == _set_ctr) src++;
        if (!*src) return NullS;
        save = src;
        /*  NUL is never in the set, so the end of the string has to  */
        /*  be tested for as well, or the last word would be overrun. */
        while (*++src && _set_vec[*src] != _set_ctr) ;
        if (*src) *src++ = NUL;
        oldSrc = src;
        return save;
    }
'EOF'
echo strtrans.c
cat >strtrans.c <<'EOF'
/*  File   : strtrans.c
    Author : Richard A. O'Keefe.
    Updated: 11 April 1984
    Defines: strtrans()

    strtrans(dst, src, from, to) copies characters from src[] to dst[],
    stopping when dst gets a NUL character, translating characters in
    from[] to corresponding characters in to[].  Courtesy of _str2map,
    if from or to is null its previous value will be used, and if both
    are NullS the table will not be rebuilt.  Note that copying stops
    when a NUL is put into dst[], which can normally happen only when a
    NUL has been fetched from src[], but if you have built your own
    translation table it may be earlier (if some character is mapped to
    NUL) or later (if NUL is mapped to something else).  No value is
    returned.
*/

#include "strings.h"
#include "_str2map.h"

void strtrans(dst, src, from, to)
    register _char_ *dst, *src;
    _char_ *from, *to;
    {
        _str2map(0, from, to);
        while (*dst++ = _map_vec[*src++]) ;
    }
'EOF'
echo strtrim.c
cat >strtrim.c <<'EOF'
/*  File   : strtrim.c
    Author : Richard A. O'Keefe.
    Updated: 20 April 1984
    Defines: strtrim()

    strtrim(dst, src, set, ends) copies src to dst, but will skip
    leading characters in set if "ends" is <= 0, and will skip trailing
    characters in set if ends is >= 0.  Thus there are three cases:
        ends < 0 : trim a prefix
        ends = 0 : trim a prefix and a suffix both
        ends > 0 : trim a suffix
    To compress internal runs, see strpack.
    The normal use of this is
        strtrim(buffer, buffer, " \t", 0);
    The result is the address of the NUL which now terminates dst.
*/

#include "strings.h"
#include "_str2set.h"

char *strtrim(dst, src, set, ends)
    register char *dst, *src;
    char *set;
    int ends;
    {
        _str2set(set);
        if (ends <= 0) {
            while (_set_vec[*src] == _set_ctr) src++;
        }
        if (ends >= 0) {
            register int chr;
            register char *save = dst;

            while (chr = *src++) {
                *dst++ = chr;
                if (_set_vec[chr] != _set_ctr) save = dst;
            }
            dst = save, *dst = NUL;
        } else {
            while (*dst++ = *src++) ;
            --dst;
        }
        return dst;
    }
'EOF'