dickow@ui3.UUCP (01/31/90)
>/ ui3:comp.lang.forth / ForthNet@willett.UUCP (ForthNet articles from GEnie) / 4:52 pm Jan 25, 1990 /
>W.BADEN1 [Wil] at 18:55 PST
>Forth enlightenment: The principal input method in Forth is not KEY or
>EXPECT, but INTERPRET.

Yeah, that's the ideal, but quite a few Forth systems cannot be
distributed freely without disabling the interpreter, scrambling the
dictionary names, etc. Often you then have to write a pseudo-interpreter
to call your application words.

Bob Dickow (...egg-id!ui3!dickow)
(rdickow@groucho.mrc.uidaho.edu)
(dickow@idui1.bitnet)
ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/12/90)
Date: 09-09-90 (21:56)  Number: 3743 (Echo)
To: ALL  Refer#: NONE
From: ZAFAR ESSAK  Read: (N/A)
Subj: SOUNDEX  Status: PUBLIC MESSAGE

I have been experimenting with the utility SOUNDEX described by Ron
Braithwaite in FD X/3 & 4 in 1988. I modified it slightly for use
without a string stack and to be compatible with F-PC as follows:

\ SOUNDEX.TXT  Ron Braithwaite "Using A String Stack" FD X/3 p.15 (1988)
((
  The whole idea of SOUNDEX dates back to the 1894 U.S. census when
  they wanted to be able to find names that sounded alike.
  The algorithm for $SOUNDEX came from Guy Kelly.
))
ONLY FORTH ALSO DEFINITIONS DECIMAL

: C>SNDX  ( ascii -- char2 )
   DUP 96 > IF 32 - THEN            \ convert a..z to uppercase
   65 - 0 MAX 26 MIN                \ A..Z -> 0..25; others clamp to a '0' entry
   ( ABCDEFGHIJKLMNOPQRSTUVWXYZ )
   " 012301200224550126230102020" DROP + C@ ;

CREATE sndx.buf  ( -- $adr )  ," 0000"

: >SOUNDEX  ( adr1 n -- $adr2 )   \ 0000 <= $adr2 <= Z999
   0 sndx.buf C!  sndx.buf 1+ 4 ASCII 0 FILL
   ?DUP IF
      OVER C@ DUP 96 > IF 32 - THEN     \ convert to uppercase
      DUP sndx.buf 1+ C!                \ store first character
      1 sndx.buf C+!                    \ as start of $soundex
      C>SNDX -ROT                       \ earlier character's sndx
      BOUNDS 1+ ?DO
         I C@ C>SNDX                    \ old new
         TUCK = OVER ASCII 0 = OR 0=    \ keep if new class and not a vowel
         IF  DUP sndx.buf COUNT + C!  1 sndx.buf C+!  THEN
         sndx.buf C@ 4 = ?LEAVE
      LOOP
   THEN  DROP  sndx.buf 4 OVER C! ;

: $SOUNDEX  ( $adr1 -- $adr2 )   \ 0000 <= $adr2 <= Z999
   COUNT >SOUNDEX ;

CR .( cr pad dup 20 expect cr span @ )
CR CR CR .( >SOUNDEX cr count type space ) CR
======================================================

Now I am wondering whether anyone can tell me if I have inadvertently
introduced any errors in this translation.

Assuming I have not, I have taken the above code and applied it to
2,000 names from an existing database and have been examining the
results. At the moment I am not sure exactly how this function can be
useful. It does group names which at times seem close:

   e.g. SCHMIDT, SMITH, SMYTH are all S530

But at other times names such as:

   ACTON, ASHDOWN, AUSTIN are grouped as A235.
I have wondered if the ethnic origin of names might affect the
weighting used in the definitions above. Any comments would be welcome.

Zafar.
---
 * Via Qwikmail 2.01
NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: uunet!willett!dwp or dwp@willett.pgh.pa.us
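One way to check a translation like this is to run the thread's own examples through an independent version. The block below is a minimal sketch, not Braithwaite's or Zafar's code: it restates the same rules in standard Forth-2012 (assumed to run on a system such as gforth; it is not F-PC code), replacing F-PC words like ASCII, C+!, ?LEAVE and the `"` literal with standard ones. Every word name (SX-UPC, SX-CLASS, SOUNDEX, .SX and so on) is made up for illustration. Like C>SNDX it keeps the first letter, maps vowels, H, W, Y and non-letters to 0, skips a digit equal to the previous letter's class, and lets a 0-class letter reset that "previous" class. That last point means H and W separate two letters of the same class here, whereas the rules published by the U.S. National Archives code such letters only once, so a name like ASHCRAFT comes out as A226 with this scheme rather than A261.

```forth
\ Hypothetical portable sketch of the >SOUNDEX rules (Forth-2012).
: SX-UPC ( char -- char' )        \ a..z -> A..Z, anything else unchanged
  DUP [CHAR] a [CHAR] z 1+ WITHIN IF 32 - THEN ;

: SX-CLASSES ( -- c-addr u )      \ class digits for A..Z, same table as C>SNDX
  S" 01230120022455012623010202" ;

: SX-CLASS ( char -- digit-char ) \ '0' for vowels, H, W, Y and non-letters
  SX-UPC DUP [CHAR] A [CHAR] Z 1+ WITHIN
  IF [CHAR] A - SX-CLASSES DROP + C@ ELSE DROP [CHAR] 0 THEN ;

CREATE SX-BUF 4 CHARS ALLOT       \ result: first letter + three digits
VARIABLE SX-LEN                   \ characters of SX-BUF filled so far
VARIABLE SX-PREV                  \ class digit of the previous letter

: SX-ADD ( digit-char -- )  SX-BUF SX-LEN @ CHARS + C!  1 SX-LEN +! ;

: SOUNDEX ( c-addr u -- c-addr2 4 )   \ result lives in SX-BUF
  SX-BUF 4 [CHAR] 0 FILL  0 SX-LEN !  \ start from "0000"
  DUP 0= IF 2DROP SX-BUF 4 EXIT THEN
  OVER C@ DUP SX-UPC SX-BUF C!        \ first letter, uppercased
  SX-CLASS SX-PREV !  1 SX-LEN !
  1 /STRING OVER + SWAP ?DO           \ walk the remaining letters
    I C@ SX-CLASS
    DUP SX-PREV @ <>  OVER [CHAR] 0 <> AND  \ new class and not 0?
    IF DUP SX-ADD THEN
    SX-PREV !                         \ every letter becomes "previous"
    SX-LEN @ 4 = IF LEAVE THEN
  LOOP
  SX-BUF 4 ;

: .SX ( c-addr u -- )  SOUNDEX TYPE SPACE ;
```

Typing `S" SCHMIDT" .SX  S" SMITH" .SX  S" ACTON" .SX` should print S530 S530 A235, matching the results quoted above.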
ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/14/90)
Date: 09-11-90 (13:10)  Number: 3758 (Echo)
To: ZAFAR ESSAK  Refer#: 3743
From: JACK BROWN  Read: NO
Subj: SOUNDEX  Status: PUBLIC MESSAGE

ZE>I have been experimenting with the utility SOUNDEX described by Ron
ZE>Braithwaite in FD X/3 & 4 in 1988. I modified it slightly for use
ZE>without a string stack and to be compatible with F-PC as follows:

Ralph Dean had a Forth implementation of SOUNDEX in Dr. Dobb's #50.
You can get his complete implementation in the file BSTRING.SEQ, which
can be found in L6.ZIP of Jack Brown's F-PC 3.5 Tutorial.
[ Lessons 1 - 7 are on wsmr-simtel20.army.mil and wuarchive.wustl.edu.
The file is called fpcl1-7.zip. -dwp ]

Below is the last section of this file. You could use Ralph's
implementation to check your own. You will need to get the file
BSTRING.SEQ from L6.ZIP to compile the code below.

\ Ralph Dean's FORTH implementation of the SOUNDEX program that
\ originally appeared in the May 1980 Byte Magazine.
\
\ Executing SOUNDEX will cause a prompt for the name.
\ The name is terminated after 30 characters or <enter>.
\ The soundex code is then computed and typed out.
\ The string variable S$ contains the code produced.
\ For more information on Soundex codes see the original
\ Byte article.

FORTH DEFINITIONS DECIMAL

30 STRING N$   \ Input string whose soundex code is to be found.
 4 STRING S$   \ Output string containing soundex code.
 1 STRING K$
 1 STRING L$

: NAME  ( -- )   \ Prompt for input of last name.
   CR ." Last Name? " N$ $IN ;

: FIRST1  ( -- )   \ Move first character to S$
   1 N$ LEFT$ S$ S! ;

: ITH  ( n m -- k )   N$ MID$ DROP C@ 64 - ;

: KTH  ( k -- )   DUP " 01230120022455012623010202" MID$ K$ S! ;

: BLS  ( -- )   S$ K$ S+ S$ S! ;

: TEST  ( -- flag )   K$ L$ S= K$ " 0" S= OR 0= ;

: IST  ( k -- k flag )   DUP 1 < OVER 26 > OR 0= ;

\ Compute soundex code
: COMP  ( -- )
   N$ LEN 1+ 2 DO
      I I ITH IST
      IF   KTH TEST IF BLS THEN
      ELSE DROP THEN
      K$ L$ S!
   LOOP ;

\ This is the Program.
\ BROWN, BRUN, BRAWN all give B650
: SOUNDEX  ( -- )
   NAME FIRST1
   N$ LEN 1 > IF COMP THEN     \ names of two or more letters
   S$ " 0000" S+ S$ S!
   CR ." Soundex Code = " S$ TYPE CR ;
---
 * QDeLuxe 1.01 #260s Are you a member of FIG? Why not join today!
ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/14/90)
Date: 09-12-90 (00:46)  Number: 3761 (Echo)
To: ZAFAR ESSAK  Refer#: 3743
From: KENNETH O'HESKIN  Read: NO
Subj: SOUNDEX  Status: PUBLIC MESSAGE

ZE>Assuming I have not I have taken the above code and applied it to 2,0
ZE>names from an existing database and have been examining the results.
ZE>At the moment I am not sure exactly how this function can be useful.

I haven't yet applied Soundex to any serious use, since its utility
seems to be contingent on two preconditions:

 (1) very large databases of proper nouns, and
 (2) unreliable methods of data entry, especially systems prone to
     misspellings due to operator error.

Both conditions are more likely to occur in corporate and governmental
mainframe environments than on single-user microcomputers.

Most of us have probably had the experience of our name being
misspelled, say on a magazine label; as this "sucker list" is sold to
other databases, the error gets cloned and we start getting junk mail
from all and sundry with the identical error. The data in that kind of
environment may have been gathered over the phone, or taken from forms
with little boxes far too small to print legibly in, and the operator
may often be some underpaid drudge with no motivation to do accurate
work. Since an exact match may not yield a successful search, a Soundex
type of pattern matching might get you in the ballpark.
---
 ~ EZ 1.26 ~
ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/15/90)
Date: 09-13-90 (09:45)  Number: 3768 (Echo)
To: KENNETH O'HESKIN  Refer#: 3761
From: STEVE PALINCSAR  Read: NO
Subj: SOUNDEX  Status: PUBLIC MESSAGE

If you ever come to the National Archives to do any genealogical
research, you get to use SOUNDEX when you use the indexes to the old
censuses. It's useful in consolidating the various attempts that were
made at spelling foreign names.
ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/15/90)
Date: 09-13-90 (10:57)  Number: 3769 (Echo)
To: ZAFAR ESSAK  Refer#: 3743
From: GENE LEFAVE  Read: NO
Subj: SOUNDEX  Status: PUBLIC MESSAGE

ZE>At the moment I am not sure exactly how this function can be useful.
ZE>It does group names which at times seems close:
ZE> e.g. SCHMIDT, SMITH, SMYTH are all S530

Although I don't pretend to be a SOUNDEX expert, I have some experience
using it. First, the state of Illinois uses it to generate driver's
license numbers. A license number is the SOUNDEX code for your last
name, a first-name code (I don't know where that comes from), and a
coded birth date.

I used to use SOUNDEX codes to retrieve entries in a database. Using
SOUNDEX made the program very tolerant of spelling errors. I seem to
recall that certain database programs had this function built in.
However, English has so many short words that in many cases I found I
was essentially searching on the first character. So I went to a
string search.

As to the basic algorithm, the idea is to keep the first letter, then
drop all vowels, then group the remaining consonants into six
sound-alike classes. These classes are English-specific, not
necessarily ethnic. Adjacent duplicates are dropped.

SCHMIDT = S530 because:
   S   first character
   C   dropped, same class as S and adjacent
   H   always dropped
   M   class 5
   I   dropped, vowel
   D   class 3
   T   dropped, adjacent class 3
That leaves S53, which is padded with a zero to four characters.

You can easily work out the other names. It's useful for names because
most last names are long enough to generate a meaningful code.
Assuming a list of 1,000,000 names, SOUNDEX hashes to 5616 codes, or
about 180 names per code on average, which would not be difficult to
resolve with a first name and birth date, or some other type of
qualifier. You have to remember that it was originally set up for
manual searching.
---
 ~ EZ-Reader 1.13 ~
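Gene's figures can be reconstructed, assuming his 5616 counts only codes whose three digits are all filled in: 26 possible first letters times six consonant classes for each digit. Codes for short names, padded with trailing zeros, add more, so the maximum for this scheme is somewhat higher. The average of about 180 names per code also assumes names spread evenly over the codes, which real surnames do not.

```latex
\begin{aligned}
26 \times 6^{3} &= 26 \times 216 = 5616 &&\text{codes with three non-zero digits} \\
26 \times \left(6^{3} + 6^{2} + 6 + 1\right) &= 26 \times 259 = 6734 &&\text{also counting codes padded with 0} \\
1\,000\,000 / 5616 &\approx 178 &&\text{names per code, on average}
\end{aligned}
```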
ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/18/90)
Date: 09-15-90 (12:46)  Number: 3786 (Echo)
To: STEVE PALINCSAR  Refer#: 3768
From: KENNETH O'HESKIN  Read: NO
Subj: SOUNDEX  Status: PUBLIC MESSAGE

SP>research you get to use SOUNDEX when you use the indexes to the old
SP>Censuses. It's useful in consolidating the various attempts that wer
SP>made at spelling foreign names.

Agreed, SOUNDEX is a powerful and impressive tool for uses such as
this. But like Zafar, when implementing databases I _wanted_ to find a
legitimate application for it but couldn't.

In a small operation where 3 or 4 people have a variety of tasks to do
on a computer (i.e., aren't bored to death doing mindless data entry
for 8 hours every working day), and the databases aren't likely to be
larger than a few thousand records anyway, there is not much need for
this kind of tool. Errors are less likely to occur, and are usually
caught and corrected by someone else.
---
 ~ EZ 1.26 ~
rueter_a@wums2.wustl.edu (09/24/90)
> Agreed, SOUNDEX is a powerful and impressive tool for uses
> such as this. But like Zafar, when implementing databases I
> _wanted_ to find a legitimate application for it but couldn't.
>
> In a small operation where 3 or 4 people have a variety of
> tasks to do on a computer (ie: arn't bored to death doing
> mindless data entry for 8 hours every working day), and the
> databases arn't likely to be larger than a few thousand records
> anyway, there is not much need for this kind of tool. Errors
> are less likely to occur, and are usually caught and corrected
> by someone else.

I agree, but SOUNDEX combined with a person's birthdate (in
month/day/year order) is very powerful. We keep track of 5 years' worth
of patients (400,000) and have a mismatch rate of 1/50 versus 1/10 for
the other groups at the medical school clinics.

Allen Rueter
Mallinckrodt Institute of Radiology
allen@cisco!wugate.wustl.edu   <- I think; cisco is DECnet, wugate is a Unix router
ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (10/15/90)
Date: 10-11-90 (11:10)  Number: 8 of 10
To: ZAFAR ESSAK  Refer#: NONE
From: GENE LEFAVE  Read: NO
Subj: SOUNDEX  Status: PUBLIC MESSAGE
Conf: FORTH (58)  Read Type: GENERAL (+)

I think the proper approach depends on the size of your list. On our
systems we rarely have more than 3000 names. I'm using a straight
string match on an alphabetical list; I actually use the built-in
block editor from polyFORTH, then convert the located strings back
into record numbers. I've never had a user complaint this way, and
users can usually find a name with just three of its characters.

When I was using soundex I was always answering questions about why a
totally unrelated name would come up. And users always had to get the
first letter right. Using the string search also lets them search on
first names if the first name is unusual.

I would only recommend soundex if your database is very large
(>20,000), and then I would just display the hits. It's pretty unlikely
that a name with a close spelling does not hit.

Another experiment I tried involved hashing, but the results were so
weird I'm still trying to think up a use for it.
---
 ~ EZ-Reader 1.13 ~
NET/Mail : East Coast Forth Board -- McLean, VA -- 703-442-8695
PCRelay:DCINFO -> #16 MetroLink (tm) International Network 4.10
DC Info Exchange MetroLink International Hub
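Gene's alternative, a straight string match over the name list, can be sketched in standard Forth with the String word set's SEARCH. This is a hypothetical illustration, not the polyFORTH block-editor method he describes: the names live in a made-up list of counted strings (ENTRY,, NAME-LIST, CONTAINS? and FIND-NAMES are invented names), the match is case-sensitive, and the names are assumed to be stored in uppercase. A fragment matches anywhere in a name, so a user needs only a few correct characters, not a correct first letter.

```forth
\ Hypothetical sketch: substring search over a list of names (Forth-2012).
: ENTRY, ( "name" -- )        \ compile the next word as a counted string
  PARSE-NAME DUP C, HERE OVER CHARS ALLOT SWAP CMOVE ;

CREATE NAME-LIST
  ENTRY, BAINS  ENTRY, BAUGH  ENTRY, BEGGS  ENTRY, KALE
  ENTRY, KELLY  ENTRY, KUHL   ENTRY, KYLE   ENTRY, SMITH
  ENTRY, SMYTH
  0 C,                        \ a zero length byte ends the list

: CONTAINS? ( c-addr u c-addr2 u2 -- flag )  SEARCH NIP NIP ;

: FIND-NAMES ( c-addr u -- n )  \ print names containing the text; n = count
  2>R  0 NAME-LIST
  BEGIN DUP C@ WHILE              ( n entry )
    DUP COUNT 2DUP 2R@ CONTAINS?
    IF TYPE SPACE SWAP 1+ SWAP ELSE 2DROP THEN
    COUNT +                       \ step to the next counted string
  REPEAT DROP  2R> 2DROP ;
```

With this list, `S" KEL" FIND-NAMES .` should print KELLY 1, and `S" Y" FIND-NAMES .` should print KELLY KYLE SMYTH 3.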
ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (10/16/90)
Date: 10-07-90 (23:29)  Number: 3996 (Echo)
To: GENE LEFAVE  Refer#: 3769
From: ZAFAR ESSAK  Read: 10-11-90 (11:14)
Subj: SOUNDEX  Status: PUBLIC MESSAGE

Gene, thanks for the synopsis of the SOUNDEX algorithm and your
comments.

GL> I used to use SOUNDEX code to retrieve entries in a database.
GL> Using SOUNDEX made the program very tolerant of spelling errors.
GL> I seem to recall that certain database programs had this function
GL> built in.

This is exactly what I had in mind, i.e., using the SOUNDEX code as the
index into the list of names, and then displaying the appropriate names
for selection by the user in an alphabetical listing, in the hope that
this would be tolerant of spelling errors.

However, given my experimental run and the range of names that may
share the same code, I am wondering what to do. For example:

   Code B200 includes: BAUGH, BEGGS, BOSSE, BOYCE, BUKSH
   Code B520 includes: BAINS, BANJI, BINGA
   Code K400 includes: KALE, KELLY, KUHL, KYLE

Now if every name in the range is displayed, little if anything is
gained. Should the selection display only the other names with
matching codes, ignoring others whose spellings may be alphabetically
adjacent? Or ...

I have experience using a straight string comparison but would like to
find something that might provide additional tolerance.

Thanks for your comments.
Zafar.
---
 * Via Qwikmail 2.01
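Zafar's first option, showing only the names whose codes match the entry, can be sketched as follows. This is a hypothetical illustration rather than anyone's posted code: it bundles a portable SOUNDEX word following the same rules as C>SNDX so the block runs on its own in a standard Forth-2012 system such as gforth (an assumption), stores a sample list built from names quoted in this thread, and prints every name that shares the query's code. ENTRY,, NAME-LIST, QUERY-CODE and SAME-SOUND are made-up names.

```forth
\ Hypothetical sketch: list the names that share a query's SOUNDEX code.
: SX-UPC ( char -- char' )  DUP [CHAR] a [CHAR] z 1+ WITHIN IF 32 - THEN ;
: SX-CLASS ( char -- digit-char )   \ '0' for vowels, H, W, Y, non-letters
  SX-UPC DUP [CHAR] A [CHAR] Z 1+ WITHIN
  IF [CHAR] A - S" 01230120022455012623010202" DROP + C@
  ELSE DROP [CHAR] 0 THEN ;

CREATE SX-BUF 4 CHARS ALLOT
VARIABLE SX-LEN   VARIABLE SX-PREV
: SX-ADD ( digit-char -- )  SX-BUF SX-LEN @ CHARS + C!  1 SX-LEN +! ;
: SOUNDEX ( c-addr u -- c-addr2 4 )  \ result lives in SX-BUF
  SX-BUF 4 [CHAR] 0 FILL  0 SX-LEN !
  DUP 0= IF 2DROP SX-BUF 4 EXIT THEN
  OVER C@ DUP SX-UPC SX-BUF C!  SX-CLASS SX-PREV !  1 SX-LEN !
  1 /STRING OVER + SWAP ?DO
    I C@ SX-CLASS
    DUP SX-PREV @ <>  OVER [CHAR] 0 <> AND  IF DUP SX-ADD THEN
    SX-PREV !  SX-LEN @ 4 = IF LEAVE THEN
  LOOP  SX-BUF 4 ;

\ Sample list of counted strings, ended by a zero length byte.
: ENTRY, ( "name" -- )  PARSE-NAME DUP C, HERE OVER CHARS ALLOT SWAP CMOVE ;
CREATE NAME-LIST
  ENTRY, BAINS  ENTRY, BANJI  ENTRY, BAUGH  ENTRY, BEGGS  ENTRY, BINGA
  ENTRY, BOSSE  ENTRY, BOYCE  ENTRY, BUKSH  ENTRY, KALE   ENTRY, KELLY
  ENTRY, KUHL   ENTRY, KYLE   ENTRY, SMITH  ENTRY, SMYTH
  0 C,

CREATE QUERY-CODE 4 CHARS ALLOT

: SAME-SOUND ( c-addr u -- n )  \ print names whose code matches; n = count
  SOUNDEX QUERY-CODE SWAP CMOVE       \ keep a copy of the query's code
  0 NAME-LIST
  BEGIN DUP C@ WHILE                  ( n entry )
    DUP COUNT 2DUP SOUNDEX QUERY-CODE 4 COMPARE 0=
    IF TYPE SPACE SWAP 1+ SWAP ELSE 2DROP THEN
    COUNT +                           \ step to the next counted string
  REPEAT DROP ;
```

For example, `S" KELLEY" SAME-SOUND .` should print KALE KELLY KUHL KYLE 4. In a real index the codes would more likely be computed once and stored with each record rather than recomputed on every search.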
ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (10/16/90)
Date: 10-07-90 (23:29)  Number: 3998 (Echo)
To: KENNETH O'HESKIN  Refer#: 3786
From: ZAFAR ESSAK  Read: NO
Subj: SOUNDEX  Status: PUBLIC MESSAGE

KO'H> In a small operation where 3 or 4 people have a variety of
KO'H> tasks to do on a computer (ie: arn't bored to death doing
KO'H> mindless data entry for 8 hours every working day), and the
KO'H> databases arn't likely to be larger than a few thousand records
KO'H> anyway, there is not much need for this kind of tool. Errors
KO'H> are less likely to occur, and are usually caught and corrected
KO'H> by someone else.

Sorry, Kenneth, I cannot agree with you. Life's not like that. It's
really easy to type a name incorrectly, maybe only once a day if you're
really good. So, is it possible for the machine to be forgiving and
lend a hand? That is the issue I would like to address with a routine
like SOUNDEX.
---
 * Via Qwikmail 2.01