pete@BONNIE.ICS.UCI.EDU (Peter O'Leary) (01/11/90)
I am attempting to find an efficient approximate string matching algorithm and so far have had a frustrating time. I have an paper by Landau and Vishkin from a 1986 ACM Proceedings that describes an O(kn) algorithm, however, the paper refers to other works that I have not been able to locate. Does anyone know of a good survey paper or textbook with a good treatment of this subject? Peter O'Leary.
johnk@cs.arizona.edu (John Kececioglu) (01/12/90)
In article <9001092236.aa13533@PARIS.ICS.UCI.EDU> Peter O'Leary <pete%BONNIE.ICS.UCI.EDU@VM1.NoDak.EDU> writes: > I am attempting to find an efficient approximate string matching > algorithm and so far have had a frustrating time. I have an paper > by Landau and Vishkin from a 1986 ACM Proceedings that describes an > O(kn) algorithm, however, the paper refers to other works that I have not > been able to locate. Does anyone know of a good survey paper or textbook > with a good treatment of this subject? There are myriad approximate string matching problems. The two survey papers I am familiar with are A. Apostolico and C. Guerra, "The longest common subsequence problem revisited," Algorithmica 2 (1987) 315-336. Z. Galil and R. Giancarlo, "Data structures and algorithms for approximate string matching," J. of Complexity 4 (1988) 33-72. The only book I know of is David Sankoff and Joseph Kruskal, editors, _Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison_, Addison-Wesley, Reading Massachusetts, 1983. Eugene Myers has a new practical approximate string matching algorithm with sublinear expected time, though it is unpublished. If you are interested in the more restricted problem of finding the edit distance between two strings, I recommend the following not well-known paper Eugene Myers, "An O(ND) difference algorithm and its variations," Algorithmica 1 (1986) 251-266. The algorithm described is simple, and has O(D^2 + N) expected time. (Here D is the minimum number of single-character insertions, deletions, and substitutions to convert one string into the other.) An impractical variation has O(D^2 + N log N) worst-case time. For implementation issues, see Webb Miller and Eugene Myers, "A file comparison program," Software Practice and Experience 15:11 (November 1985) 1025-1040. Hope this helps. -- John Kececioglu johnk@cs.arizona.edu or arizona!johnk.uucp Department of Computer Science, University of Arizona, Tucson, Arizona 85721 "Through small apertures we glimpse abysses whose somber depths turn us faint."