ccel@community-chest.uucp (CCEL) (08/10/89)
I have two things that I need a little help on ... The first is a certain algorithm that i'm looking for, it's called fuzzy string-matching. Apparently you give it two strings and a percentage, and through various and sundry means it determines if the two strings are that percentage alike. A guy named Glenn Snow told me on CompuServe that the source was available in 8086 Assembly on a BBS local to me, but I only got through to it once in the past few weeks and didn't see the code on there. Does anyone know what i'm talking about, and does anyone have any code like this? The second is something that came up in a programming contest a while back. Write a program that will display an exact copy of its own source code. I'm not too sure but I think it had to be written in C or Pascal. Anyone? Thanks -------------------------------------- Randy Tidd MITRE-McLean CCEL Lab rtidd@mitre.arpa ccel%community-chest@gateway.mitre.org
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/12/89)
In article <63195@linus.UUCP> rtidd@mitre.arpa writes: >Does anyone know what i'm talking about, ... Seems doubtful :-) > ... and does anyone have any code like this? I can dream up implementable meanings for "fuzzy string matching", but without knowing your application I'd hesitate to make specific recommendations. Our shell has a "cd spelling corrector" that makes several passes searching for a matching subdirectory name in the directories in the CDPATH, first for exact spelling, then for a single omitted or extra character, then for transposition of two adjacent characters, in each path component. I could send you the code for that algorithm if it fits your needs. Cryptanalysts use various probability (or information) measures to determine how well patterns match. These are usually based on extremely simple probability theory and are quite obvious. If you can formulate precisely what you wish to determine, and what ancillary information (such as: assumed language) is available, then lots of people should be able to provide the formula you seek. (Ask a statistician..) >The second is something that came up in a programming contest a while >back. Write a program that will display an exact copy of its own >source code. I'm not too sure but I think it had to be written in >C or Pascal. Anyone? main(){system("cat foo.c");}
walter@hpclwjm.HP.COM (Walter Murray) (08/15/89)
Randy Tidd writes: > The first is a certain algorithm that i'm looking for, it's called > fuzzy string-matching. Apparently you give it two strings and a > percentage, and through various and sundry means it determines if > the two strings are that percentage alike. > Does anyone know what i'm talking about, and does anyone have any > code like this? The Levenstein metric could probably be used quite nicely to provide what you want. I once worked on a name lookup system for a hospital. The object was to provide a patient's chart number given a name which was likely to be misspelled. Given a name, we extracted from the data base all names with the same Soundex code, then ranked those names according to how similar they were to the requested name. Levenstein's algorithm is rather simple. It tells you how many characters have to be deleted, added, or changed to transform one character sequence to another. You can assign different weighting factors to the probabilities of a character being deleted, added, or changed. I can't provide code, unless you will take COBOL. I can provide bibliographic references, but they would be rather dated. I suspect someone else will be able to provide the exact code you want. If not, contact me and I'll see what I can dig up. I wrote a paper on name searching for an Information Theory course I took a few years ago. Walter Murray walter@hpda.HP.COM (408) 447-6129 --------------
jamespw@mqcomp.oz (Warlow) (08/15/89)
In article <63195@linus.UUCP> rtidd@mitre.arpa writes: >... > Write a program that will display an exact copy of its own >source code. I lifted the following from someone's signature a while back: char*p="char*p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);} (NB 34 == '"')
wiml@blake.acs.washington.edu (William Lewis) (08/16/89)
In article <63195@linus.UUCP> rtidd@mitre.arpa writes: >The first is a certain algorithm that i'm looking for, it's called >fuzzy string-matching. Apparently you give it two strings and a >percentage, and through various and sundry means it determines if >the two strings are that percentage alike. > >A guy named Glenn Snow told me on CompuServe that the source was >available in 8086 Assembly on a BBS local to me, but I only got >through to it once in the past few weeks and didn't see the code >on there. > >Does anyone know what i'm talking about, and does anyone have any >code like this? What you're looking for is the article in the July 1988 issue of Dr. Dobb's Journal, named "Pattern Matching by Gestalt". The function is int simil(a,b) char *a, *b; and returns the percent similarity of the two strings passed to it. The article includes a discussion of the algorithm and 80x8x assembly source code. (Which, by the way, trashes the index registers; if your compiler wants them preserved over function calls (perhaps as register variables -- both MWC and I think MSC want this) you'll have to modify it slightly. --- phelliax "Heresy is the lifeblood of religions. It is faith that begets heretics. There are no heresies in a dead religion."