mike@whuxl.UUCP (BALDWIN) (09/08/86)
Chris, I'm surprised.  You made some gratuitous changes to my code, but
you also broke it!  Please don't publish code unless you're reasonably
sure it works.  I've tested mine against the examples in Knuth's Vol 3.
For everyone's edification, here is the text from p. 392 for the Soundex
algorithm:

1. Retain the first letter of the name, and drop all occurrences of
   a, e, h, i, o, u, w, y in other positions.

2. Assign the following numbers to the remaining letters after the
   first:

        b, f, p, v              -> 1        l    -> 4
        c, g, j, k, q, s, x, z  -> 2        m, n -> 5
        d, t                    -> 3        r    -> 6

3. If two or more letters with the same code were adjacent in the
   original name (before step 1), omit all but the first.

4. Convert to the form ``letter, digit, digit, digit'' by adding
   trailing zeros (if there are less than three digits), or by
   dropping rightmost digits (if there are more than three).

The examples given in the book are:

        Euler, Ellery           E460
        Gauss, Ghosh            G200
        Hilbert, Heilbronn      H416
        Knuth, Kant             K530
        Lloyd, Ladd             L300
        Lukasiewicz, Lissajous  L222

Most algorithms fail in two ways:

1. they omit adjacent letters with the same code AFTER step 1, not
   before.

2. they do not omit adjacent letters with the same code at the
   beginning of the name.

I.e., most will fail on Lloyd, Lukasiewicz, and Lissajous.

Some comments on your comments on my code:

> >     register char c, lc, prev = '0';
> `register int' generates better code on my compiler, and still works.

Those variables are used as characters, not integers.  I'm sorry that
your compiler is deficient, but I like to declare variables the way
they are used.

> >         if (isalpha(*name)) {
> First you should test isascii(*name) (a nit).

Um, isalpha() returns false for all non-ASCII characters already.

> >             lc = tolower(*name);
> Watch out!  Some tolower()s fail miserably if !isupper(c).

I should have said my code conforms to the SVID; according to it and all
other stds, tolower(c) will always work correctly.
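For readers on a system whose tolower() does misbehave on characters
that are not upper case, a guard is a small change.  The following is
only a sketch: safe_lower is a hypothetical helper name, not part of
the posted code, and the cast to unsigned char is my own precaution
against negative char values.

```c
#include <ctype.h>

/* safe_lower: hypothetical helper, not from the original post.
 * Calls tolower() only when the character really is upper case, so it
 * also works with tolower() implementations that assume isupper(c). */
static int safe_lower(int c)
{
    unsigned char uc = (unsigned char) c;   /* keep ctype arguments non-negative */

    return isupper(uc) ? tolower(uc) : uc;
}
```

In the routine this would stand in for the bare tolower(*name) call.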

The only things I would change would be to add <string.h> and to cast
strcpy to (void).  Those are the only things my lint complains about.

> #ifdef lint
>     /* lint cannot tell that prev is set before used */
>     prev = 0;
> #endif

Mine can.

Here is the CORRECT code again:
-----
#include <ctype.h>
#include <string.h>

#define SDXLEN 4

char *
soundex(name)
char *name;
{
    static char buf[SDXLEN+1];
    register char c, lc, prev = '0';
    register int i;

    (void) strcpy(buf, "a000");

    for (i = 0; *name && i < SDXLEN; name++)
        if (isalpha(*name)) {
            lc = tolower(*name);
            /* code digit for each of a..z; '0' marks a dropped letter */
            c = "01230120022455012623010202" [lc-'a'];
            /* always keep the first letter; after that keep a digit
               only if it is nonzero and differs from the code of the
               letter just before it in the original name */
            if (i == 0 || (c != '0' && c != prev)) {
                buf[i] = i ? c : lc;
                i++;
            }
            prev = c;
        }

    return buf;
}
-----
With the caveat that your tolower() may need an isupper() test in front
of it, and you may have <strings.h> or none at all.  Please don't change
it unless you're sure the new code still works!
--
Michael Baldwin
(not the opinions of) AT&T Bell Laboratories
{at&t}!whuxl!mike
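
Anyone who wants to re-run the check against Knuth's twelve names could
use something like the sketch below.  It is not the posted routine:
soundex_r and knuth_mismatches are hypothetical names, the caller
supplies the buffer, tolower() is guarded by isupper(), and letters
outside a-z are skipped (which assumes an ASCII-like character set).
Like the posted code it stores the first letter in lower case, so the
expected values are written e460 rather than E460.

```c
#include <ctype.h>
#include <string.h>

#define SDXLEN 4

/* soundex_r: hypothetical variant of the posted soundex() that writes
 * into a caller-supplied buffer instead of a static one. */
static char *soundex_r(const char *name, char out[SDXLEN + 1])
{
    static const char codes[] = "01230120022455012623010202";
    char c, lc, prev = '0';
    int i = 0;

    strcpy(out, "a000");
    for (; *name && i < SDXLEN; name++) {
        unsigned char uc = (unsigned char) *name;

        if (!isalpha(uc))
            continue;
        lc = (char) (isupper(uc) ? tolower(uc) : uc);
        if (lc < 'a' || lc > 'z')       /* assumption: only a-z are coded */
            continue;
        c = codes[lc - 'a'];
        /* first letter is always kept; later letters only when their
         * code is nonzero and differs from the preceding letter's code */
        if (i == 0 || (c != '0' && c != prev)) {
            out[i] = i ? c : lc;
            i++;
        }
        prev = c;
    }
    return out;
}

/* knuth_mismatches: returns how many of the book's examples disagree. */
static int knuth_mismatches(void)
{
    static const char *const cases[][2] = {
        {"Euler", "e460"},       {"Ellery", "e460"},
        {"Gauss", "g200"},       {"Ghosh", "g200"},
        {"Hilbert", "h416"},     {"Heilbronn", "h416"},
        {"Knuth", "k530"},       {"Kant", "k530"},
        {"Lloyd", "l300"},       {"Ladd", "l300"},
        {"Lukasiewicz", "l222"}, {"Lissajous", "l222"}
    };
    char buf[SDXLEN + 1];
    size_t k;
    int bad = 0;

    for (k = 0; k < sizeof cases / sizeof cases[0]; k++)
        if (strcmp(soundex_r(cases[k][0], buf), cases[k][1]) != 0)
            bad++;
    return bad;
}
```

Calling knuth_mismatches() from a small main() and checking for zero is
enough to catch both failure modes listed above, since Lloyd exercises
the doubled first letter and Lukasiewicz the letters separated by
vowels.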