joel@gould9.UUCP (Joel West) (11/09/85)
I had to compare two versions of a file; one was mostly uppercase, while
the 2nd had a lot of changes that were only case changes. While I wish
"diff" had an ignore case option, it doesn't. So I spent 20 minutes and
wrote this. (no cracks about my slowness). It converts upper to lower or
vice versa, and should work on just about any system.

Note on names: "case" was out for obvious reasons. "uc" implies upper
more than "lc" implies lower, to my limited mind. And there's already
ld, ln, ls, and I used the alias "lf", so "u" looked better.

***FLAME SUIT ON***
If BSD 4.2 or V/2.0 already does this, MAIL but do not FOLLOW-UP.
***FLAME SUIT OFF**

As always, we welcome comments by spokespersons with opposing viewpoints.

Joel West CACI, Inc. Federal, La Jolla
{cbosgd,floyd,ihnp4,pyramid,sdcsvax,ucla-cs}!gould9!joel
gould9!joel@nosc.ARPA

------------------cut me---------------beat me-------------slice me-----------
#!/bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #!/bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#    uc.c
# This archive created: Sat Nov 9 11:24:26 1985
export PATH; PATH=/bin:$PATH
echo shar: extracting "'uc.c'" '(910 characters)'
if test -f 'uc.c'
then
    echo shar: over-writing existing file "'uc.c'"
fi
cat << \SHAR_EOF > 'uc.c'
/* uc.c: change to upper (or lower) case
   Joel West 11/9/85 <ihnp4!gould9!joel, gould9!joel@NOSC.arpa> */

#include <stdio.h>
#include <ctype.h>

int optl=0;

main(argc,argv)
int argc;
char **argv;
{
    char *p,c;
    int i, up2low, fileargs;

    up2low = 'a'-'A';
    i=1;
    for (i=1; i<argc; i++)
    {
        if (*argv[i] != '-')
            break;
        p = argv[i];
        if (*++p == 'l')
            optl++;
        else
        {
            fprintf(stderr, "usage: %s [-l] [file1 file2 ...]\n", argv[0]);
            exit (1);
        }
    }
    fileargs = i<argc;      /* there are file arguments */
    if (! fileargs)         /* minimum one trip for stdin */
        i = argc-1;
    for ( ; i<argc; i++)
    {
        if (fileargs)
        {
            if (freopen(argv[i], "r", stdin) == NULL)
            {
                perror(argv[i]);
                continue;
            }
        }
        while ((c=getchar()) != EOF)
        {
            if (optl)
            {
                if (isupper(c))
                    c += up2low;
            }
            else
            {
                if (islower(c))
                    c -= up2low;
            }
            putchar(c);
        }
    }
}
SHAR_EOF
if test 910 -ne "`wc -c 'uc.c'`"
then
    echo shar: error transmitting "'uc.c'" '(should have been 910 characters)'
fi
# End of shell archive
exit 0

hester@ICSE.UCI.EDU (Jim Hester) (11/12/85)
UNIX provides this facility with the 'tr' (translate characters) program.
To change everything to upper case, use
tr a-z A-Z
I don't know what effect this has if the letters are not contiguous (as
in an IBM character code I won't name). If that is a problem, you just
explicitly list the letters from A to Z in both upper and lower cases.
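
For anyone writing their own filter on such a system, the same
explicit-list idea can be sketched in C. This is only an illustration
and not how tr itself is written; the names build_upper_table and
translate_stream are made up for the example, and NCHARS assumes 8-bit
characters.

```c
#include <stdio.h>

#define NCHARS 256   /* assumption: 8-bit character codes */

/* Hypothetical helper: fill table[] so that each lower case letter maps
   to its upper case partner. The letters are listed explicitly, so
   nothing here assumes that they are contiguous in the character code. */
void build_upper_table(int table[NCHARS])
{
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int ch;
    size_t i;

    for (ch = 0; ch < NCHARS; ++ch)
        table[ch] = ch;                 /* every code maps to itself */
    for (i = 0; i < sizeof lower - 1; ++i)
        table[(unsigned char) lower[i]] = (unsigned char) upper[i];
}

/* Hypothetical helper: copy in to out, passing each character through
   the table. ch is an int so that EOF stays distinct from real data. */
void translate_stream(const int table[NCHARS], FILE *in, FILE *out)
{
    int ch;

    while ((ch = getc(in)) != EOF)
        putc(table[ch], out);
}
```

A main() would call build_upper_table() once and then
translate_stream(table, stdin, stdout). Swapping the two strings gives
the lower case direction.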
If the files are reasonably large, a more efficient algorithm (than
checking character types during input) is a table lookup scheme like
the following (which is the basic method used by tr):
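#include <stdio.h>
#include <ctype.h>
#define NCHARS 256
int main()
{
    int table[ NCHARS ], ch;
    /* build the translation table once, before reading any input */
    for ( ch = 0 ; ch < NCHARS ; ++ch ) {
        if ( islower(ch) )
            table[ch] = toupper(ch);
        else
            table[ch] = ch;
    }
    /* ch is an int so that EOF can be told apart from real characters */
    while ( EOF != (ch = getchar()) ) putchar( table[ch] );
    return 0;
}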
Running a few quick tests, table lookup took 3/4 of the time of
checking character types for each input character.
When alphabetic characters are contiguous (which implies a constant
difference between case of characters, which you took advantage of), as
in ASCII, the initialization loop can be sped up by eliminating the 256
calls to islower() and 26 calls to toupper(). Simply remove the first
three lines in the loop and add a new loop:
shift = 'A'-'a';
for ( ch = 'a' ; ch <= 'z' ; ++ch ) table[ch] += shift;
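
Put together, the shift version of the initialization might look like
the sketch below. The function names are invented for the example, and
the whole thing rests on the assumption stated above (contiguous
letters a constant distance apart, as in ASCII). The second function is
a cheap way to check that assumption against the ctype macros before
trusting the table.

```c
#include <ctype.h>

#define NCHARS 256   /* assumption: 8-bit character codes */

/* Fill table[] using a constant shift.
   ASSUMPTION: 'a'..'z' and 'A'..'Z' are each contiguous and a fixed
   distance apart, as in ASCII. */
void init_table_shift(int table[NCHARS])
{
    int ch, shift;

    for (ch = 0; ch < NCHARS; ++ch)
        table[ch] = ch;                 /* identity mapping first */
    shift = 'A' - 'a';
    for (ch = 'a'; ch <= 'z'; ++ch)
        table[ch] += shift;             /* move lower case onto upper */
}

/* Return 1 if the table agrees with islower()/toupper() for every
   character code, 0 otherwise. */
int shift_table_matches_ctype(const int table[NCHARS])
{
    int ch, expected;

    for (ch = 0; ch < NCHARS; ++ch) {
        expected = islower(ch) ? toupper(ch) : ch;
        if (table[ch] != expected)
            return 0;
    }
    return 1;
}
```

If the check fails on some machine, falling back to the
islower()/toupper() loop costs only the few hundred calls saved here.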
Also, if the character code uses a single bit to distinguish character
case, you can speed it up even more by just ANDing or ORing a mask to
the appropriate locations in the table:
mask = ~('a'-'A');
for ( ch = 'a' ; ch <= 'z' ; ++ch ) table[ch] &= mask;
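
A sketch of the mask version covering both directions follows: AND
with the inverted bit for upper case, OR with the bit for lower case.
The names case_is_single_bit and init_table_mask are made up for the
example. The code assumes that one bit separates the cases (true in
ASCII, where it is 040 octal) and that the letters are contiguous; the
first function tests the single-bit part of that assumption.

```c
#define NCHARS 256   /* assumption: 8-bit character codes */

/* Return 1 if every upper/lower pair differs in the same single bit. */
int case_is_single_bit(void)
{
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int bit = 'a' ^ 'A';
    int i;

    if (bit == 0 || (bit & (bit - 1)) != 0)
        return 0;                       /* zero bits or more than one */
    for (i = 0; lower[i] != '\0'; ++i)
        if ((lower[i] ^ upper[i]) != bit)
            return 0;
    return 1;
}

/* Fill table[] with a case mapping built from the case bit.
   to_lower != 0: upper case maps to lower (OR the bit in).
   to_lower == 0: lower case maps to upper (AND the bit out).
   ASSUMPTION: case_is_single_bit() holds and letters are contiguous. */
void init_table_mask(int table[NCHARS], int to_lower)
{
    int ch, bit = 'a' ^ 'A';

    for (ch = 0; ch < NCHARS; ++ch)
        table[ch] = ch;                 /* identity mapping first */
    if (to_lower) {
        for (ch = 'A'; ch <= 'Z'; ++ch)
            table[ch] |= bit;
    } else {
        for (ch = 'a'; ch <= 'z'; ++ch)
            table[ch] &= ~bit;
    }
}
```

The to_lower argument plays the same role as the -l option in the
program this thread started with.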
Either or both of these speedups has a negligible effect on the runtime
for large inputs since, being used only during a constant initialization
step, they are independent of the input size. It's probably better to
stick with something closer to the original code I gave, for reasons of
simplicity and portability.
Jim