ron@iconsys.UUCP (Ron Holt) (06/24/88)
I am considering writing an interactive spell checker/corrector for
Unix similar to that implemented in WordPerfect.  I would like to try
using Soundex for the spell corrector portion.  Does any one know
where I can get source code to any Soundex routines?
-- 
Ron Holt                        UUCP: {uunet,caeco}!iconsys!ron
Software Development Manager    ARPANET: icon%byuadam.bitnet@cunyvm.cuny.edu
Icon International, Inc.        BITNET: icon%byuadam.bitnet
Orem, Utah 84058                PHONE: (801) 225-6888
leo@philmds.UUCP (Leo de Wit) (06/29/88)
In article <250@iconsys.UUCP> ron@iconsys.UUCP (Ron Holt) writes:
>
>I am considering writing an interactive spell checker/corrector for
>Unix similar to that implemented in WordPerfect.  I would like to try
>using Soundex for the spell corrector portion.  Does any one know
>where I can get source code to any Soundex routines?

Soundex is in fact so easy you should write it yourself. Here's what I
read in an old Pascal exercise book (in Dutch, so I translated for you):

---------------- S T A R T   Q U O T A T I O N ------------------
All characters belong to a group, as follows (ignoring case):

   0: a,e,i,o,u,h,w,y, <all non-alpha characters>
   1: b,f,p,v
   2: c,g,j,k,q,s,x,z
   3: d,t
   4: l
   5: m,n
   6: r

1) First replace each character of the string to be encoded by its
   group number. So 'This is a testcase' becomes '300200200030232020'.
2) Then replace each run of repeated digits by a single occurrence.
   So the example becomes '30202030232020'.
3) Finally remove the '0's. So the example becomes '32232322'.
---------------- E N D   Q U O T A T I O N ------------------

Because there are 7 groups (of which only 6 appear in the result) you
can use a nibble (4 bits) to encode the group number. If your strcmp()
does not ignore bit 7, you can use it for comparing encoded soundex
strings, otherwise use memcmp().

Below I put an implementation that should work (haven't tested it).
The class[] char array should contain a 0 in most places, only

   class['b'] = class['f'] = class['p'] = class['v'] =
   class['B'] = class['F'] = class['P'] = class['V'] = 1;

etc. for the other non-0 classes. Two group numbers are packed into
each byte of the result, and the encoded string is null-terminated to
allow the standard str... functions.
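    /* 256 entries, all 0 by default; set the non-0 classes as described above */
    static char class[256];

    void soundex(const char *src, char *dest)
    {
        char lastclass = 0, newclass;
        int even = 1;   /* 1: the next class goes into the high nibble */

        for ( ; *src != '\0'; src++) {
            /* index as unsigned char so 8-bit characters cannot go negative */
            if ((newclass = class[(unsigned char)*src]) != lastclass) {
                lastclass = newclass;
                if (newclass != 0) {
                    if (even) {
                        *dest = newclass << 4;   /* start a new byte */
                    } else {
                        *dest++ |= newclass;     /* complete the byte */
                    }
                    even = !even;
                }
            }
        }
        if (!even) dest++;   /* keep a half-filled last byte */
        *dest = '\0';
    }

Hope this will satisfy your need --- success!

        Leo.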
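As a rough sketch of how the table and the three quoted steps could be checked, the C fragment below derives the group number from the quoted list and writes the code as plain ASCII digits instead of packed nibbles, so the result can be compared with the worked example by eye. The names soundex_group(), fill_class_table() and soundex_digits() are hypothetical helpers invented for this illustration; they are not part of the posted routine.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Group number for one character, following the quoted table.
   Anything not listed (vowels, h, w, y, non-letters) is group 0. */
static int soundex_group(int ch)
{
    static const char *const groups[] = {
        "bfpv",      /* group 1 */
        "cgjkqsxz",  /* group 2 */
        "dt",        /* group 3 */
        "l",         /* group 4 */
        "mn",        /* group 5 */
        "r"          /* group 6 */
    };
    size_t g;

    ch = tolower((unsigned char)ch);
    if (ch == '\0')              /* strchr() would match the terminator */
        return 0;
    for (g = 0; g < sizeof groups / sizeof groups[0]; g++)
        if (strchr(groups[g], ch) != NULL)
            return (int)g + 1;
    return 0;
}

/* One way to produce the 256 entries a class[] style table needs. */
static void fill_class_table(char table[256])
{
    int i;

    for (i = 0; i < 256; i++)
        table[i] = (char)soundex_group(i);
}

/* Steps 1-3 in one pass: map to groups, collapse runs, drop zeros.
   Writes at most dest_size - 1 digits plus the terminating NUL. */
static void soundex_digits(const char *src, char *dest, size_t dest_size)
{
    int last = 0;
    size_t n = 0;

    assert(dest != NULL && dest_size > 0);
    for (; *src != '\0'; src++) {
        int cur = soundex_group((unsigned char)*src);
        if (cur != last) {
            last = cur;
            if (cur != 0 && n + 1 < dest_size)
                dest[n++] = (char)('0' + cur);
        }
    }
    dest[n] = '\0';
}
```

Calling soundex_digits("This is a testcase", buf, sizeof buf) should leave '32232322' in buf, matching step 3 of the quotation. A group-0 character between two equal groups resets the run, so 'Bob' encodes as '11' rather than '1', which is what the three steps as quoted imply.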
jackm@devvax.JPL.NASA.GOV (Jack Morrison) (07/01/88)
In article <538@philmds.UUCP> leo@philmds.UUCP (L.J.M. de Wit) writes:
>In article <250@iconsys.UUCP> ron@iconsys.UUCP (Ron Holt) writes:
>>
>>I am considering writing an interactive spell checker/corrector for
>>Unix similar to that implemented in WordPerfect.  I would like to try
>>using Soundex for the spell corrector portion.  Does any one know [...]
>Soundex is in fact so easy you should write it yourself. Here's what I
>read in an old Pascal exercise book (in Dutch, so I translated for you):
> [algorithm and implementation deleted...]

I wonder, if you had space, whether a better version could be written
using basic text-to-speech pattern matching. It would try, for example,
to determine whether a 'c' meant an 'S' sound or a 'K' sound based on
letter context. Build up the 'soundex+' string based on a larger set of
classes roughly equivalent to phonemes. For example, see Ciarcia's
speech synthesizer project from BYTE magazine about two years back.

Just a thought...
-- 
Jack C. Morrison        Jet Propulsion Laboratory
(818)354-1431           jackm@jpl-devvax.jpl.nasa.gov
"The paycheck is part government property, but the opinions are all mine."
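To make the 'c' example concrete, here is a very small C sketch of a context-dependent class. The rule it uses (a 'c' followed by e, i or y counts as an 'S' sound, any other 'c' as a 'K' sound) is a common English rule of thumb assumed for this illustration; it does not come from the post or from the BYTE project, and the function name c_sound() is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed rule of thumb: 'c' before e, i or y sounds like 'S',
   every other 'c' (including a final 'c') sounds like 'K'.
   word[i] must be a 'c' or 'C'. */
static char c_sound(const char *word, size_t i)
{
    int next;

    assert(word != NULL && tolower((unsigned char)word[i]) == 'c');
    next = tolower((unsigned char)word[i + 1]);   /* may be the NUL */
    return (next == 'e' || next == 'i' || next == 'y') ? 'S' : 'K';
}
```

With this rule the 'c' in 'city' maps to 'S' and the 'c' in 'cat' to 'K'. Digraphs such as 'ch' are not handled and simply fall into the 'K' case, so a usable 'soundex+' would need a larger set of such context rules.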