[comp.lang.forth] other forth applications

dickow@ui3.UUCP (01/31/90)

>/ ui3:comp.lang.forth / ForthNet@willett.UUCP (ForthNet articles from GEnie) /  4:52 pm  Jan 25, 1990 /
>W.BADEN1 [Wil]               at 18:55 PST
>Forth enlightenment:  The  principal input method in Forth is not KEY or
>EXPECT, but  INTERPRET.  

   Yeah, that's the ideal, but quite a few Forth systems cannot be
distributed freely without disabling the interpreter, scrambling the
dictionary names, etc. Often you then have to write a pseudo-interpreter
to call your application words.
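
A pseudo-interpreter of that sort can be as small as a fixed command table.
The following is only a sketch in ANS Forth under assumed names (app-hello,
app-sum, find-command, run-command and command-loop are all made up for
illustration): it reads a line, matches it against the application's command
names, and runs the corresponding word, without exposing INTERPRET or the
dictionary.

```forth
\ Hypothetical sketch; none of these words come from a real product.
: app-hello ( -- )  ." hello" CR ;      \ stand-ins for application words
: app-sum   ( -- )  2 3 + . CR ;

: find-command ( c-addr u -- xt | 0 )   \ fixed table; names are case-sensitive
   2DUP S" HELLO" COMPARE 0= IF 2DROP ['] app-hello EXIT THEN
   2DUP S" SUM"   COMPARE 0= IF 2DROP ['] app-sum   EXIT THEN
   2DROP 0 ;

: run-command ( c-addr u -- )
   find-command ?DUP IF EXECUTE ELSE ." unknown command" CR THEN ;

: command-loop ( -- )                   \ an empty line ends the loop
   BEGIN
      ." > "  PAD 80 ACCEPT  CR         \ read one line into PAD
      PAD SWAP DUP                      \ c-addr u u
   WHILE run-command
   REPEAT 2DROP ;
```

Typing HELLO or SUM at the prompt of command-loop runs the matching word;
anything else is rejected, so the user never reaches the real interpreter.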

Bob Dickow (...egg-id!ui3!dickow)
           (rdickow@groucho.mrc.uidaho.edu)
           (dickow@idui1.bitnet)

ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/12/90)

 Date: 09-09-90 (21:56)              Number: 3743 (Echo)
   To: ALL                           Refer#: NONE
 From: ZAFAR ESSAK                     Read: (N/A)
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE

 I have been experimenting with the utility SOUNDEX described by Ron 
 Braithwaite in FD X/3 & 4 in 1988.  I modified it slightly for use 
 without a string stack and to be compatible with F-PC as follows: 

 \ SOUNDEX.TXT  Ron Braithwaite "Using A String Stack" FD X/3 p.15 (1988) 

 (( 
 The idea behind SOUNDEX dates back to Robert Russell's 1918 patent; it 
 was later used to index U.S. census records so that names that sounded 
 alike could be found together.  The algorithm for $SOUNDEX came from 
 Guy Kelly. 

 )) 

 ONLY FORTH ALSO DEFINITIONS 

 DECIMAL 

 : C>SNDX ( ascii--char2) 
     DUP 96 > IF 32 - THEN               \ convert to uppercase ('a' = 97) 
     65 - 0 MAX 26 MIN                   \ A..Z -> 0..25, anything else clamps 
     ( ABCDEFGHIJKLMNOPQRSTUVWXYZ ) 
     " 012301200224550126230102020" DROP + C@ ; 

 CREATE sndx.buf ( --$adr) ," 0000" 

 : >SOUNDEX ( adr1,n--$adr2)              \ 0000 <= $adr2 <= Z999 
     0 sndx.buf C!   sndx.buf 1+ 4 ASCII 0 FILL 
     ?DUP 
         IF  OVER C@ 
             DUP 96 > IF 32 - THEN       \ convert to uppercase ('a' = 97) 
             DUP sndx.buf 1+ C!          \ store first character 
                 1 sndx.buf C+!          \ as start of $soundex 
             C>SNDX -ROT                 \ earlier character's sndx 
             BOUNDS 1+ 
             ?DO I C@ C>SNDX             \ old,new 
                 TUCK = 
                 OVER ASCII 0 = OR 0= 
                     IF DUP sndx.buf COUNT + C! 1 sndx.buf C+! 
                     THEN sndx.buf C@ 4 = ?LEAVE 
             LOOP 
         THEN DROP 
     sndx.buf 4 OVER C! ; 

 : $SOUNDEX ( $adr1--$adr2)              \ 0000 <= $adr2 <= Z999 
     COUNT >SOUNDEX ; 


 CR .( cr pad dup 20 expect cr span @ ) CR CR 
 CR .( >SOUNDEX cr count type space ) CR 

 ====================================================== 

 Now I am wondering if anyone can tell me if I have inadvertently 
 introduced any errors in this translation? 

 Assuming I have not, I have taken the above code and applied it to 2,000 
 names from an existing database and have been examining the results. 
 At the moment I am not sure exactly how this function can be useful. 
 It does group names which at times seem close: 
         e.g. SCHMIDT, SMITH, SMYTH are all S530 
 But other times names such as: 
         ACTON, ASHDOWN, AUSTIN are grouped as A235. 

 I have wondered if the ethnic origin of names might affect the 
 weighting used in the definitions above.  Any comments would be 
 welcomed. 

 Zafar. 
 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: uunet!willett!dwp or dwp@willett.pgh.pa.us

ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/14/90)

 Date: 09-11-90 (13:10)              Number: 3758 (Echo)
   To: ZAFAR ESSAK                   Refer#: 3743
 From: JACK BROWN                      Read: NO
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE

 ZE>I have been experimenting with the utility SOUNDEX described by Ron
 ZE>Braithwaite in FD X/3 & 4 in 1988.  I modified it slightly for use 
 ZE>without a string stack and to be compatible with F-PC as follows: 

 Ralph Dean had a Forth implementation of SOUNDEX in Dr. Dobb's #50.
 You can get his complete implementation in the file BSTRING.SEQ that
 can be found in L6.ZIP  of Jack Brown's F-PC 3.5 Tutorial.

[ Lessons 1 - 7 are on wsmr-simtel20.army.mil and wuarchive.wustl.edu.
  The file is called fpcl1-7.zip.  -dwp ]

 Below is the last section of this file.   You could use Ralph's
 implementation to check your own.  You will need to get the
 file BSTRING.SEQ from L6.ZIP to compile the code below.

 \  Ralph Dean's FORTH implementation of SOUNDEX program that
 \  originally  appeared in the May 1980 Byte Magazine.
 \
 \  Executing SOUNDEX will cause a prompt for the name.
 \  The name is terminated after 30 characters or <enter>.
 \  The soundex code is then computed and typed out.
 \  The string variable S$ contains the code produced.
 \  For more information on Soundex codes see the original
 \  Byte article.

 FORTH DEFINITIONS DECIMAL
 30 STRING N$   \ Input string whose soundex code is to be found.
  4 STRING S$   \ Output string containing soundex code.
  1 STRING K$   1 STRING L$

 : NAME ( --  )  \ Prompt for input of last name.
         CR ." Last Name? "  N$  $IN ;

 : FIRST1 ( -- ) \ Move first character to S$
         1 N$ LEFT$ S$ S! ;

 : ITH  ( n m  --  k )
         N$  MID$ DROP C@ 64 - ;

 : KTH ( k -- )
         DUP " 01230120022455012623010202"
         MID$ K$ S! ;

 : BLS ( -- )
         S$ K$ S+ S$ S! ;

 : TEST ( -- flag )
         K$ L$ S= K$ " 0" S= OR 0= ;

 : IST  ( n -- n flag )
         DUP 1 < OVER 26 > OR 0= ;

 \ Compute soundex code
 : COMP ( -- )
         N$ LEN 1+ 2
         DO I I ITH IST
            IF   KTH TEST IF BLS THEN
            ELSE DROP
            THEN
         K$ L$ S!
         LOOP ;

 \ This is the Program.   BROWN , BRUN , BRAWN  all give B650
 : SOUNDEX ( -- )
         NAME FIRST1 N$ LEN 2 >
         IF COMP THEN S$ " 0000" S+ S$ S!
         CR ." Soundex Code =  " S$ TYPE CR ;
 ---
  * QDeLuxe 1.01 #260s  Are you a member of FIG? Why not join today!


ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/14/90)

 Date: 09-12-90 (00:46)              Number: 3761 (Echo)
   To: ZAFAR ESSAK                   Refer#: 3743
 From: KENNETH O'HESKIN                Read: NO
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE

 ZE>Assuming I have not I have taken the above code and applied it to 2,000
 ZE>names from an existing database and have been examining the results.
 ZE>At the moment I am not sure exactly how this function can be useful.

         I haven't yet applied Soundex to any serious use, since its
         utility seems to be contingent on two preconditions...
         (1) very large databases of proper nouns, and
         (2) unreliable methods of data entry, especially systems
             prone to misspellings due to operator error.

         Both conditions are more likely to occur in the corporate-
         governmental mainframe environments rather than on single-user
         microcomputers. Most of us probably have had the experience
         of our name being misspelled, say on a magazine label, and
         as this "sucker list" is sold to other databases, the error
         gets cloned and we start getting junk mail from all and sundry
         with the identical error.

         The data in that kind of environment may have been gathered
         over the phone, or taken from forms with little boxes far too
         small to print legibly in, and often the operator may be some
         underpaid drudge who has no motivation to do accurate work.
         Since an exact match may not yield a successful search, a
         Soundex type of pattern matching might get you in the ballpark.
 ---
  ~ EZ 1.26 ~ 


ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/15/90)

 Date: 09-13-90 (09:45)              Number: 3768 (Echo)
   To: KENNETH O'HESKIN              Refer#: 3761
 From: STEVE PALINCSAR                 Read: NO
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE

 If you ever come to the National Archives to do any genealogical 
 research you get to use SOUNDEX when you use the indexes to the old 
 Censuses.  It's useful in consolidating the various attempts that were 
 made at spelling foreign names.

ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/15/90)

 Date: 09-13-90 (10:57)              Number: 3769 (Echo)
   To: ZAFAR ESSAK                   Refer#: 3743
 From: GENE LEFAVE                     Read: NO
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE

 ZE>At the moment I am not sure exactly how this function can be useful. 
 ZE>It does group names which at times seems close: 
 ZE>        e.g. SCHMIDT, SMITH, SMYTH are all S530 

 Although I don't pretend to be a SOUNDEX expert, I have some experience
 using it.  First, the state of Illinois uses it to generate driver
 license numbers.  A license number is the SOUNDEX code for your last
 name, a first name code (I don't know where that comes from), and a
 coded birth date. 

 I used to use SOUNDEX code to retrieve entries in a database.  Using
 SOUNDEX made the program very tolerant of spelling errors.  I seem
 to recall that certain database programs had this function built in.
 However, English has so many short words that I found that in many cases
 I was essentially searching on the first character.   So I went to
 a string search.

 As to the basic algorithm, the idea is to keep the first letter, then
 drop all vowels, then group the remaining consonants into 6 sound-alike
 classes.  These classes are English-specific, not necessarily ethnic.
 Adjacent duplicates are dropped. 
        SCHMIDT  =  S530      because
              S    first character.
               C    dropped because it's the same class as S and adjacent.
              H    always dropped
              M    class  5
              I    dropped vowel
              D    class 3
              T    dropped, adjacent class 3

 You can easily work out the other names.   It's useful for names because
 most last names are long enough to generate a meaningful code.  Assuming
 a list of 1,000,000 names, SOUNDEX hashes to 5616 codes (26 letters x
 6 x 6 x 6), for about 180 names per code on average, which would not be
 difficult to resolve with a first name and birthdate, or some other type
 of qualifier.  You have to remember that it was originally set up for
 manual searching.
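
 The steps described above can be sketched in ANS Forth roughly as follows.
 This is a minimal sketch, not the code from Forth Dimensions or from
 BSTRING.SEQ: every word name (sx-table, sx-class, sx-add, soundex, ...) is
 made up for illustration, and, as in the description, the vowels and H, W, Y
 are simply skipped, so two consonants of the same class separated by one of
 them are both coded.

```forth
\ Hypothetical sketch of the steps described above (ANS Forth).

: sx-table ( -- c-addr )       \ class digit for each of A..Z
   (  ABCDEFGHIJKLMNOPQRSTUVWXYZ )
   S" 01230120022455012623010202" DROP ;

: sx-upper ( c -- c' )         \ fold a..z to A..Z
   DUP [CHAR] a [CHAR] z 1+ WITHIN IF 32 - THEN ;

: sx-class ( c -- digit )      \ '0' means "not coded"
   sx-upper DUP [CHAR] A [CHAR] Z 1+ WITHIN
   IF [CHAR] A - sx-table + C@ ELSE DROP [CHAR] 0 THEN ;

CREATE sx-buf 4 CHARS ALLOT    \ the four-character result
VARIABLE sx-len                \ characters stored so far
VARIABLE sx-prev               \ class of the previous character

: sx-add ( c -- )              \ append c unless the code is full
   sx-len @ 4 < IF sx-buf sx-len @ + C!  1 sx-len +! ELSE DROP THEN ;

: soundex ( c-addr u -- c-addr' 4 )
   sx-buf 4 [CHAR] 0 FILL  0 sx-len !          \ start from "0000"
   DUP 0= IF 2DROP sx-buf 4 EXIT THEN          \ empty name
   OVER C@ sx-upper DUP sx-add                 \ keep the first letter
   sx-class sx-prev !
   OVER + SWAP CHAR+                           \ end, start+1
   ?DO
      I C@ sx-class                            \ class of this character
      DUP [CHAR] 0 <>  OVER sx-prev @ <> AND   \ coded and not a repeat?
      IF DUP sx-add THEN
      sx-prev !
   LOOP
   sx-buf 4 ;

: sx-demo ( -- )  S" Schmidt" soundex TYPE ;   \ should display S530
```

 Keeping the previous class in a variable rather than on the stack costs a
 little elegance but makes the "adjacent duplicates are dropped" rule easy to
 follow: a skipped letter resets the previous class to 0.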

 ---
  ~ EZ-Reader 1.13 ~ 

ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (09/18/90)

 Date: 09-15-90 (12:46)              Number: 3786 (Echo)
   To: STEVE PALINCSAR               Refer#: 3768
 From: KENNETH O'HESKIN                Read: NO
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE

 SP>research you get to use SOUNDEX when you use the indexes to the old
 SP>Censuses.  It's useful in consolidating the various attempts that were
 SP>made at spelling foreign names.

         Agreed, SOUNDEX is a powerful and impressive tool for uses
         such as this. But like Zafar, when implementing databases I
         _wanted_ to find a legitimate application for it but couldn't.

         In a small operation where 3 or 4 people have a variety of
         tasks to do on a computer (i.e., aren't bored to death doing
         mindless data entry for 8 hours every working day), and the
         databases aren't likely to be larger than a few thousand records
         anyway, there is not much need for this kind of tool. Errors
         are less likely to occur, and are usually caught and corrected
         by someone else.
 ---
  ~ EZ 1.26 ~ 


rueter_a@wums2.wustl.edu (09/24/90)

>          Agreed, SOUNDEX is a powerful and impressive tool for uses
>          such as this. But like Zafar, when implementing databases I
>          _wanted_ to find a legitimate application for it but couldn't.
> 
>          In a small operation where 3 or 4 people have a variety of
>          tasks to do on a computer (i.e., aren't bored to death doing
>          mindless data entry for 8 hours every working day), and the
>          databases aren't likely to be larger than a few thousand records
>          anyway, there is not much need for this kind of tool. Errors
>          are less likely to occur, and are usually caught and corrected
>          by someone else.

I agree, but SOUNDEX with the birthdate (in month/day/year order) of a person
is very powerful. We keep track of 5 years' worth of patients (400,000) and have
a mismatch rate of 1/50 versus 1/10 for other groups at the medical school
clinics.
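
How the SOUNDEX code and the birthdate are combined is not stated here. One
possible layout, sketched in ANS Forth with made-up names (key-buf and
patient-key are hypothetical), is a 12-character key: the four-character code
followed by the date as MMDDYYYY. The sketch assumes the two strings have
exactly those lengths and ignores the lengths actually passed.

```forth
\ Hypothetical key layout: 4-character SOUNDEX code + MMDDYYYY.
CREATE key-buf 12 CHARS ALLOT

: patient-key ( code-addr 4 date-addr 8 -- c-addr 12 )
   DROP key-buf 4 CHARS + 8 MOVE        \ date goes in positions 4..11
   DROP key-buf 4 MOVE                  \ code goes in positions 0..3
   key-buf 12 ;

: key-demo ( -- )  S" S530" S" 03121954" patient-key TYPE ;
\ key-demo should display S53003121954
```

Two records then match only if both the sound-alike code and the birthdate
agree, which is what narrows each group of names down.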

Allen Rueter
Mallinckrodt Institute of Radiology

allen@cisco!wugate.wustl.edu <- I think, Cisco is decnet, wugate is a unix rtr

ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (10/15/90)

 Date: 10-11-90 (11:10)              Number: 8 of 10
   To: ZAFAR ESSAK                   Refer#: NONE
 From: GENE LEFAVE                     Read: NO
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE
 Conf: FORTH (58)                 Read Type: GENERAL (+)

 I think the proper approach depends on the size of your list.  On our
 systems we rarely have more than 3000 names.   I'm using a straight
 string match on an alpha list.  I actually use the built-in block
 editor from polyFORTH.  Then convert the located strings back into
 record numbers.   I've never had a user complaint this way, and they
 can find names usually with just three characters of the name.
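
 A straight string match of this kind is easy to sketch in ANS Forth.  The
 word below is hypothetical (it is not the polyFORTH editor search mentioned
 above) and only tests whether a name begins with the characters typed so
 far; the comparison is case-sensitive.

```forth
\ Hypothetical helper, not polyFORTH's editor search.
: prefix? ( c-addr1 u1 c-addr2 u2 -- flag )   \ does string1 start with string2?
   ROT OVER < IF 2DROP DROP FALSE EXIT THEN   \ string1 is shorter: no match
   TUCK COMPARE 0= ;                          \ compare the first u2 characters

: prefix-demo ( -- )  S" LEFAVE" S" LEF" prefix? . ;   \ should display -1 (true)
```

 Run over an alphabetical list, a test like this finds every name starting
 with the three characters entered, whatever the rest of the spelling is.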

 When I was using soundex I was always answering questions about why
 a totally unrelated name would come up.  And they always had to get
 the first letter right.

 Using the string search also lets them search on first names if it's
 unusual.

 I would only recommend soundex if your database is very large (>20,000)
 and then I would just display the hits.   It's pretty unlikely that
 a name with a close spelling does not hit.

 Another experiment I tried involved hashing but the results were so
 weird I'm still trying to think up a use for it.
 ---
  ~ EZ-Reader 1.13 ~

 NET/Mail : East Coast Forth Board -- McLean, VA -- 703-442-8695

 PCRelay:DCINFO -> #16 MetroLink (tm) International Network
 4.10              DC Info Exchange MetroLink International Hub

ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (10/16/90)

 Date: 10-07-90 (23:29)              Number: 3996 (Echo)
   To: GENE LEFAVE                   Refer#: 3769
 From: ZAFAR ESSAK                     Read: 10-11-90 (11:14)
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE

 Gene, thanks for the synopsis of the SOUNDEX algorithm and your 
 comments. 
 GL> I used to use SOUNDEX code to retrieve entries in a database. 
 GL> Using SOUNDEX made the program very tolerant of spelling errors. 
 GL> I seem to recall that certain database programs had this function 
 GL> built in. 

 This is exactly what I had in mind, i.e. using the SOUNDEX code as the 
 index into the list of names, and then displaying appropriate names for 
 selection by the User in an alphabetic listing, hoping that this 
 would be tolerant of spelling errors.  However, given the experimental 
 run I have done and the range of names that may share the same code, I 
 am wondering what to do. 

 For example: 
         Code B200 includes: BAUGH, BEGGS, BOSSE, BOYCE, BUKSH 
         Code B520 includes: BAINS, BANJI, BINGA 
         Code K400 includes: KALE, KELLY, KUHL, KYLE 

 Now if every name in the range is displayed, little if anything is 
 gained.  Should the selection only display the other names with 
 matching codes, ignoring others with spelling that may be 
 alphabetically adjacent?  Or ... 

 I have experience with using a straight string comparison but would 
 like to find something that might provide additional tolerance. 

 Thanks for your comments.  Zafar. 
 ---
  * Via Qwikmail 2.01


ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) (10/16/90)

 Date: 10-07-90 (23:29)              Number: 3998 (Echo)
   To: KENNETH O'HESKIN              Refer#: 3786
 From: ZAFAR ESSAK                     Read: NO
 Subj: SOUNDEX                       Status: PUBLIC MESSAGE

 KO'H>   In a small operation where 3 or 4 people have a variety of 
 KO'H>   tasks to do on a computer (i.e., aren't bored to death doing 
 KO'H>   mindless data entry for 8 hours every working day), and the 
 KO'H>   databases aren't likely to be larger than a few thousand records 
 KO'H>   anyway, there is not much need for this kind of tool. Errors 
 KO'H>   are less likely to occur, and are usually caught and corrected 
 KO'H>  by someone else. 

 Sorry, Kenneth, I cannot agree with you.  Life's not like that.  It's 
 real easy to type a name incorrectly, maybe only once in a day if 
 you're real good.  So, is it possible for the machine to be forgiving 
 and lend a hand?  That is the issue I would like to address with a 
 routine like SOUNDEX. 
 ---
  * Via Qwikmail 2.01

