[comp.lang.c] Help w/2 things ...

ccel@community-chest.uucp (CCEL) (08/10/89)

I have two things that I need a little help on ...

The first is a certain algorithm that i'm looking for, it's called
fuzzy string-matching.  Apparently you give it two strings and a
percentage, and through various and sundry means it determines if
the two strings are that percentage alike.

A guy named Glenn Snow told me on CompuServe that the source was
available in 8086 Assembly on a BBS local to me, but I only got
through to it once in the past few weeks and didn't see the code
on there.

Does anyone know what i'm talking about, and does anyone have any
code like this?

The second is something that came up in a programming contest a while
back.  Write a program that will display an exact copy of its own 
source code.  I'm not too sure but I think it had to be written in
C or Pascal.  Anyone?

Thanks


--------------------------------------
Randy Tidd                   MITRE-McLean CCEL Lab
rtidd@mitre.arpa             ccel%community-chest@gateway.mitre.org

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/12/89)

In article <63195@linus.UUCP> rtidd@mitre.arpa writes:
>Does anyone know what i'm talking about, ...

Seems doubtful :-)

> ... and does anyone have any code like this?

I can dream up implementable meanings for "fuzzy string matching",
but without knowing your application I'd hesitate to make specific
recommendations.  Our shell has a "cd spelling corrector" that
makes several passes searching for a matching subdirectory name in
the directories in the CDPATH, first for exact spelling, then for
a single omitted or extra character, then for transposition of two
adjacent characters, in each path component.  I could send you the
code for that algorithm if it fits your needs.

Cryptanalysts use various probability (or information) measures to
determine how well patterns match.  These are usually based on
extremely simple probability theory and are quite obvious.  If you
can formulate precisely what you wish to determine, and what
ancillary information (such as: assumed language) is available,
then lots of people should be able to provide the formula you seek.
(Ask a statistician..)

>The second is something that came up in a programming contest a while
>back.  Write a program that will display an exact copy of its own 
>source code.  I'm not too sure but I think it had to be written in
>C or Pascal.  Anyone?

main(){system("cat foo.c");}

walter@hpclwjm.HP.COM (Walter Murray) (08/15/89)

Randy Tidd writes:

> The first is a certain algorithm that i'm looking for, it's called
> fuzzy string-matching.  Apparently you give it two strings and a
> percentage, and through various and sundry means it determines if
> the two strings are that percentage alike.

> Does anyone know what i'm talking about, and does anyone have any
> code like this?

The Levenstein metric could probably be used quite nicely to provide
what you want.  I once worked on a name lookup system for a hospital.
The object was to provide a patient's chart number given a
name which was likely to be misspelled.  Given a name, we extracted
from the data base all names with the same Soundex code, then ranked
those names according to how similar they were to the requested name.

Levenstein's algorithm is rather simple.  It tells you how many
characters have to be deleted, added, or changed to transform one
character sequence to another.  You can assign different weighting
factors to the probabilities of a character being deleted, added,
or changed.

I can't provide code, unless you will take COBOL.

I can provide bibliographic references, but they would be rather dated.

I suspect someone else will be able to provide the exact code you want.
If not, contact me and I'll see what I can dig up.  I wrote a paper on
name searching for an Information Theory course I took a few years ago.

Walter Murray
walter@hpda.HP.COM
(408) 447-6129
--------------

jamespw@mqcomp.oz (Warlow) (08/15/89)

In article <63195@linus.UUCP> rtidd@mitre.arpa writes:
>...
>  Write a program that will display an exact copy of its own 
>source code.

I lifted the following from someone's signature a while back:

char*p="char*p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}

(NB 34 == '"')

wiml@blake.acs.washington.edu (William Lewis) (08/16/89)

In article <63195@linus.UUCP> rtidd@mitre.arpa writes:
>The first is a certain algorithm that i'm looking for, it's called
>fuzzy string-matching.  Apparently you give it two strings and a
>percentage, and through various and sundry means it determines if
>the two strings are that percentage alike.
>
>A guy named Glenn Snow told me on CompuServe that the source was
>available in 8086 Assembly on a BBS local to me, but I only got
>through to it once in the past few weeks and didn't see the code
>on there.
>
>Does anyone know what i'm talking about, and does anyone have any
>code like this?

   What you're looking for is the article in the July 1988 issue of
Dr. Dobb's Journal, named "Pattern Matching by Gestalt". The function
is

   int simil(a,b) char *a, *b;

  and returns the percent similarity of the two strings passed to it.
The article includes a discussion of the algorithm and 80x8x assembly
source code. (Which, by the way, trashes the index registers; if your
compiler wants them preserved over function calls (perhaps as 
register variables -- both MWC and I think MSC want this) you'll have
to modify it slightly.


     --- phelliax
         "Heresy is the lifeblood of religions. It is faith that begets
          heretics. There are no heresies in a dead religion."