plummer@hercules.cs.uregina.ca (Dave Plummer) (06/04/91)
Would anyone happen to have source code for a (very) fast search routine? Basically, what I need is something like strstr(), but I need to be able to modify it to make it non-case sensitive. I'm basically looking through a small (512-2048 bytes) buffer for an ASCII string. Your code or directions greatly appreciated! - Dave Why does this only append my sig 1/6 times? I know not why.
ryan@amix.commodore.com (Ryan Sheftel) (06/07/91)
plummer@hercules.cs.uregina.ca (Dave Plummer) writes: > Would anyone happen to have source code for a (very) fast search routine? > Basically, what I need is something like strstr(), but I need to be able > to modify it to make it non-case sensitive. I'm basically looking through > a small (512-2048 bytes) buffer for an ASCII string. > > Your code or directions greatly appreciated! > > - Dave > > Why does this only append my sig 1/6 times? I know not why. Well, the fastest search I know of for a sorted list is the binary search. But is the list is unsorted, I believe the only way to go about it is for (X = 0; X < MAX; X++) ---------- Ryan Sheftel UUCP: uunet!cbmvax!amix!undrground!ryan Internet: undrground!ryan@amix.commodore.com
UH2@psuvm.psu.edu (Lee Sailer) (06/08/91)
>> Would anyone happen to have source code for a (very) fast search routine? >> Basically, what I need is something like strstr(), but I need to be able >> to modify it to make it non-case sensitive. I'm basically looking through >> a small (512-2048 bytes) buffer for an ASCII string. >> If you mean what I think you mean, you probably want Boyer-Moore search. There is probably source for it buried deep in a copy of egrep, or somewhere, but I'm too lazy to dig it out. You could look in a good data structures book---I think Sedgewick covers it. Another likely source would be a compiler book. Look in the chapter that covers token parsing. There are techniques where you compile your search string into a FSM, and then run it using the buffer as the tape.
mjl@alison.at (Martin J. Laubach) (06/08/91)
| Would anyone happen to have source code for a (very) fast search routine? | Basically, what I need is something like strstr() You might have a look at: author = {Ricardo A. Baeza-Yates}, title = {Algorithms for String Searching: A Survey}, pages = {34 - 58}, journal = {Sigir Forum}, publisher = {acm Press}, number = {3,4}, volume = 23, year = 1989 and follow the cross-references of that article. mjl PS: Sorry about the strange format, that's how I have it online.
S_ASCHMIDT@iravcl.ira.uka.de (|S| Angela Schmidt) (06/10/91)
In <91158.180643UH2@psuvm.psu.edu> Lee writes: > >> Would anyone happen to have source code for a (very) fast search routine? > >> Basically, what I need is something like strstr(), but I need to be able > >> to modify it to make it non-case sensitive. I'm basically looking through > >> a small (512-2048 bytes) buffer for an ASCII string. > >> > > If you mean what I think you mean, you probably want Boyer-Moore search. > There is probably source for it buried deep in a copy of egrep, or > somewhere, but I'm too lazy to dig it out. You could look > in a good data structures book---I think Sedgewick covers it. > > Another likely source would be a compiler book. Look in the chapter that > covers token parsing. There are techniques where you compile your > search string into a FSM, and then run it using the buffer as the tape. I've already implemented Boyer & Moore. It works very fast with long searching-strings. Let me describe what Boyer & Moore does: "A" shall be the string you are searching for, "B" the string you are searching in. 1) Create a field (here called h) with 256 elements, all set to FALSE. 2) for (i=strlen (A)-1; i>=0; i--) h[A[i]]=TRUE; So afterwards, if there appears the character "A" in your searchingstring, the field 65 will be set to TRUE and so you'll get a table, which tells you wether there exists a special character or not. Set k to 0. 3) Now say, "A" is "a" characters long. Let c be B[a-1+k] ;;;;;;; (String A) => a=7 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,... (B) ^ ^ k c=B[a-1+k] (0) (6) Now look if the field hv[c] is FALSE. If it's FALSE, the string must not touch B[a-1+k], so let k=k+a, and continue with (3) else there might be any touch. Test this: for (i=0; i<a; i++) if (A[i]==B[i+k]) ... If the string does not match, let k=k+1 and continue with (3) I hope there were no mistakes. Sorry, my routine is at home, otherwise I would simply post it. Especially with long strings and a lot of different characters, it works very fast. I use it in my diskmonitor DisKey. If the string is long enough, the LED of my harddrive doesn't blink, it just burns! Enjoy Angela ---------------------------------------------------------+--------------------- Angela Schmidt Internet: S_ASchmidt@iravcl.ira.uka.de | // (Nessy in IRC) BITNET: UK8B@DKAUNI2.BITNET | Amiga // Phone: +49 731 712316 & 721 6904-263 | \\ // only 1000 MEZ: (10am-9pm) (12pm-10pm) | \X/ the real one! ---------------------------------------------------------+---------------------