jwagner@booboo.UUCP (Joel Wagner) (08/22/86)
I would like any info pertaining to soundex search algorithms (phonetic grep). Source to a nifty, efficient algorithm would be great, but I'll take anything. Thanx in advance. Joel Wagner Gould Inc., Information Systems Computer Systems Division {pur-ee,brl-bmd,sun}!gould!jwagner
jeffm@chinet.UUCP (Jeff Marraccini) (08/26/86)
Joel, I have a routine written in BASIC on the Apple many moons ago that uses Soundex coding. I'll see if I can find it and mail it to you. A good source is your city library. There are often sample routines in articles pertaining to soundex. I found one in "Soundex Explained" which was an abstract by Joe A. Donovan. If I can find that too I'll mail it. -- Jeff Marraccini ...ihnp4!{chinet!jeffm, mb2c!edsdrd!ahxenix!jeff} 5582 Dvorak St. Clarkston, MI 48016 313/623-7115 voice 313/623-6309 modem
rlk@chinet.UUCP (Richard Klappal) (08/27/86)
See Knuth's Art of Computer Programming (Vol 1, I think) -- --- UUCP: ..!ihnp4!chinet!uklpl!rlk || MCIMail: rklappal || Compuserve: 74106,1021 ..!ihnp4!ihu1h!rlk ---
blob@bnrmtv.UUCP (Brian Bechtel) (08/27/86)
> > I would like any info pertaining to soundex search algorithms > (phonetic grep). Source to a nifty, efficient algorithm would > be great, but I'll take anything. Thanx in advance. > /*********************************************************\ * This program exemplifies the soundex algorithm. * * * * You type in a word and it spits out the soundex string * * that was produced for that word. * \*********************************************************/ #include <stdio.h> char table[] = { '0', /* A */ '1', /* B */ '2', /* C */ '3', /* D */ '0', /* E */ '1', /* F */ '2', /* G */ '0', /* H */ '0', /* I */ '2', /* J */ '2', /* K */ '4', /* L */ '5', /* M */ '5', /* N */ '0', /* O */ '1', /* P */ '2', /* Q */ '6', /* R */ '2', /* S */ '3', /* T */ '0', /* U */ '1', /* V */ '0', /* W */ '2', /* X */ '0', /* Y */ '2' /* Z */ }; main() { char line[81], reduced[25]; printf("type quit to terminate\n"); do { gets(line); if(strcmp(line,"quit")==0) break; soundex(reduced,line); printf("%s = %s\n",line,reduced); } while(1); printf("done\n"); } soundex(d,s) char *d, *s; { char last_char='#'; int c; char *dorig, *ptr, *ptr2; dorig = ptr = d; /* pick up the first char in the string */ *d++ = *s++; /* for the rest of characters in the string */ while(*s) { /* throw away nonalphabetic characters */ if(isalpha(*s)==0) continue; /* convert to upper case */ *s = toupper(*s); /* convert to group code and place into destination string */ c = (int) ( *s++ - 'A') ; *d++ = table[c]; } *d = 0; do { /* if character is '0' or character is same as last one */ if(*ptr=='0' || *ptr==last_char) { /* get rid of the character */ ptr2 = ptr; while(*ptr2=*(ptr2+1)) ptr2++; } else { /* set last character seen */ last_char = *ptr++; } } while(*ptr); /* while still a character left in the string */ /* make sure the string isn't more than 4 characters */ *(dorig+4) = 0; }
mike@whuxl.UUCP (BALDWIN) (09/04/86)
> > I would like any info pertaining to soundex search algorithms > > (phonetic grep). Source to a nifty, efficient algorithm would > > be great, but I'll take anything. Thanx in advance. > > > > /*********************************************************\ > * This program exemplifies the soundex algorithm. * > * * > * You type in a word and it spits out the soundex string * > * that was produced for that word. * > \*********************************************************/ Unfortunately, it doesn't generate correct Soundex codes. The algorithm is actually pretty tricky, and I've seen lots that don't handle names like Lloyd and Manning properly. Here's one that I believe is correct: ----- #include <ctype.h> #define SDXLEN 4 char * soundex(name) char *name; { static char buf[SDXLEN+1]; register char c, lc, prev = '0'; register int i; strcpy(buf, "a000"); for (i = 0; *name && i < SDXLEN; name++) if (isalpha(*name)) { lc = tolower(*name); c = "01230120022455012623010202" [lc-'a']; if (i == 0 || (c != '0' && c != prev)) { buf[i] = i ? c : lc; i++; } prev = c; } return buf; } ----- And a little driver for it: ----- main() { char line[64]; while (gets(line)) puts(soundex(line)); return 0; } -- Michael Baldwin (not the opinions of) AT&T Bell Laboratories {at&t}!whuxl!mike
chris@umcp-cs.UUCP (Chris Torek) (09/04/86)
In article <1239@whuxl.UUCP> mike@whuxl.UUCP (BALDWIN) writes: > register char c, lc, prev = '0'; `register int' generates better code on my compiler, and still works. > if (isalpha(*name)) { First you should test isascii(*name) (a nit). > lc = tolower(*name); Watch out! Some tolower()s fail miserably if !isupper(c). Anyway, assuming that the basic algorithm is ... sound, I would change the driver routine, so: #include <ctype.h> #define SDXLEN 4 char * soundex(name) register char *name; { static char buf[SDXLEN+1]; static char codes[] = "01230120022455012623010202"; register int c, i = 0, prev; char *strcpy(); #ifdef lint /* lint cannot tell that prev is set before used */ prev = 0; #endif (void) strcpy(buf, "a000"); while ((c = *name++) != 0 && i < SDXLEN) { /* * Throw out non-alphabetics, and convert upper case * to lower. */ if (!isascii(c) || !isalpha(c)) continue; if (isupper(c)) c = tolower(c); /* * Non-first characters must translate to non-zero codes * that are different from the previous code; throw out * those that translate to zero or to prev. */ if (i > 0 && ((c = codes[c - 'a']) == '0' || c == prev)) continue; buf[i++] = prev = c; } return (buf); } -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516) UUCP: seismo!umcp-cs!chris CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
falk@sun.uucp (Ed Falk) (09/05/86)
> > Unfortunately, it doesn't generate correct Soundex codes. > The algorithm is actually pretty tricky, and I've seen > lots that don't handle names like Lloyd and Manning > properly. Here's one that I believe is correct: > ----- Both this version and the previous version had the same bug, to wit: lc = tolower(*name); The 'tolower' function only works if the input was already upper case. (at least in bsd 4.2 that's the way it is). The line should read lc = isupper(*name) ? tolower(*name) : *name ; Also, this version left trailing "0"'s on the output. Is this right? Perhaps someone could just post the plain english description of soundex... thanx.. -- -ed falk, sun microsystems falk@sun.com sun!falk
garyp@cognos.UUCP (Gary Puckering) (09/09/86)
> > > Perhaps someone could just post the plain english description of > soundex... > Sure! Be glad too... The Soundex function creates a phoenetic code based on an input key value according to the following rules: 1) Always retain the first letter of the key. 2) Drop out A, E, I, O, U, Y, W, and H. 3) Assign the following numbers to the remaining similar-sounding sets of letters: B, F, P, V = 1 C, G, J, K, Q, S, X, Z = 2 D, T =3 L = 4 M, N = 5 R = 6 4) If there are insufficient letters in the key to complete the code, fill with zeroes. 5) Drop adjacent equivalent letters (e.g. KELLY = K400). 6) Drop out adjacent equivalents of the first letter (e.g. LLOYD = L300). -- -- Gary Puckering COGNOS Incorporated 3755 Riverside Dr. Ottawa, Ontario Canada K1G 3N3 Telephone: (613) 738-1440 Telex: 053-3341