hsu_wh@jhunix.HCF.JHU.EDU (William H Hsu) (07/18/90)
I am trying to implement some good partial string matching functions for use with strictly binary files (specifically, the resource forks of Macintosh files). I have encountered several compiler and language specific problems while trying to get two algorithms (one generic dynamic pattern matching scheme and one randomized version of the Boyer-Moore/BMG algorithm) to work in C. First, I'm looking for a way to parallelize my string matching. I have seen references to a concurrent version of Boyer-Moore and dynamic algorithms that run in O(kn) time for k- approximate matches in text length n (Landau and Vishkin, 18th Annual ACM Symposium), but have been unable to get my hands on real or pseudocode. Since I am looking for 10-100 short, random strings, it would boost my program efficiency greatly to be able to search for multiple patterns at once. Has anyone implemented a concurrent version of BMG or a dynamic partial-matching algorithm in C (without parallel hardware)? Second, I'm sure that random number generation is a universally known problem, but I have had problems using srand() in the right place to get truly "random" pattern strings. rand() is used to generate the starting location for the source string. But I frequently get consecutive duplicate strings (four or five "two in a row" strings per hundred). My randomization call is: srand((unsigned)time(NULL)); How can the "randomness" be improved? I have observed that moving the line deeper into my nested while/for loops causes a decrease in randomness. Please E-mail responses to: hsu_wh@jhunix.hcf.jhu.edu (BITNET), hsu@cs.jhu.edu (ARPAnet)