[comp.text] Wanted: info on fuzzy text matching

erf@progress.COM (Eric Feigenson) (02/19/91)

[I hope this is an appropriate group to post this to.  If not, I apologize
 and would appreciate pointers to other, more appropriate groups.  Also, apologies
 if this topic has already been beaten to death.]

I am looking for any references to algorithms that compare two strings and
return an indication of how similar they are.  The context would be two strings
from a given language (any language will do, but English would do fine as a start).
What "similar" means is itself kind of fuzzy, but something like "Can't find file"
and "Cannot find file" would be pretty similar, "Unable to find file" is not
as similar and "File not found" is just a teeny bit similar.
I know nothing about this subject, so please excuse me if this makes no sense
(I'd be enlightened to know if there's a name for this sort of algorithm).
The algorithms need not be written in any particular computer language (an
English description would be fine).
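
To make the request concrete, here is a minimal sketch of the kind of
function being asked about: two strings in, one similarity score out.  It uses
character-level edit distance (often called Levenshtein distance) purely as an
illustrative stand-in, not as the answer to the question.  The names
`levenshtein` and `similarity`, and the choice to scale the score to the range
0..1 by dividing by the longer string's length, are assumptions made for this
example.

```python
def levenshtein(a: str, b: str) -> int:
    """Smallest number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # Keep the shorter string as the row so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,                      # delete ch_a
                current[j - 1] + 1,                   # insert ch_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical score: 1.0 for identical strings, falling toward 0.0
    as more character edits are needed (an assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    reference = "Can't find file"
    for other in ("Cannot find file", "Unable to find file", "File not found"):
        print(f"{other!r}: {similarity(reference, other):.3f}")
```

Under this measure "Cannot find file" scores higher against "Can't find file"
than "Unable to find file" does.  Because it compares characters only and
knows nothing about words or meaning, it should not be expected to capture
the sense in which "File not found" is related to the others.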

Any and all references and pointers to information in this area will be
greeted with open arms.


Thanks in advance!

-EricF
--
Eric R. Feigenson			UUCP: mit-eddie!progress!erf
Progress Software Corp.		    Internet: erf@progress.com
5 Oak Park
Bedford, MA  01730