[comp.sys.amiga.programmer] Source code for search

plummer@hercules.cs.uregina.ca (Dave Plummer) (06/04/91)

Would anyone happen to have source code for a (very) fast search routine?
Basically, what I need is something like strstr(), but I need to be able
to modify it to make it non-case sensitive.  I'm basically looking through
a small (512-2048 bytes) buffer for an ASCII string.

Your code or directions greatly appreciated!

- Dave

Why does this only append my sig 1/6 times?  I know not why.

ryan@amix.commodore.com (Ryan Sheftel) (06/07/91)

plummer@hercules.cs.uregina.ca (Dave Plummer) writes:

> Would anyone happen to have source code for a (very) fast search routine?
> Basically, what I need is something like strstr(), but I need to be able
> to modify it to make it non-case sensitive.  I'm basically looking through
> a small (512-2048 bytes) buffer for an ASCII string.
> 
> Your code or directions greatly appreciated!
> 
> - Dave
> 
> Why does this only append my sig 1/6 times?  I know not why.

Well, the fastest search I know of for a sorted list is the binary 
search. But is the list is unsorted, I believe the only way to go about 
it is for (X = 0; X < MAX; X++)

----------
Ryan Sheftel

UUCP: uunet!cbmvax!amix!undrground!ryan
Internet: undrground!ryan@amix.commodore.com

UH2@psuvm.psu.edu (Lee Sailer) (06/08/91)

>> Would anyone happen to have source code for a (very) fast search routine?
>> Basically, what I need is something like strstr(), but I need to be able
>> to modify it to make it non-case sensitive.  I'm basically looking through
>> a small (512-2048 bytes) buffer for an ASCII string.
>>

If you mean what I think you mean, you probably want Boyer-Moore search.
There is probably source for it buried deep in a copy of egrep, or
somewhere, but I'm too lazy to dig it out.  You could look
in a good data structures book---I think Sedgewick covers it.

Another likely source would be a compiler book.  Look in the chapter that
covers token parsing.  There are techniques where you compile your
search string into a FSM, and then run it using the buffer as the tape.

mjl@alison.at (Martin J. Laubach) (06/08/91)

| Would anyone happen to have source code for a (very) fast search routine?
| Basically, what I need is something like strstr()

  You might have a look at:

	author		= {Ricardo A. Baeza-Yates},
	title			= {Algorithms for String Searching: A Survey},
	pages			= {34 - 58},
	journal		= {Sigir Forum},
	publisher	= {acm Press},
	number		= {3,4},
	volume		= 23,
	year			= 1989

  and follow the cross-references of that article.

	mjl

PS: Sorry about the strange format, that's how I have it online.

S_ASCHMIDT@iravcl.ira.uka.de (|S| Angela Schmidt) (06/10/91)

In <91158.180643UH2@psuvm.psu.edu> Lee writes:

> >> Would anyone happen to have source code for a (very) fast search routine?
> >> Basically, what I need is something like strstr(), but I need to be able
> >> to modify it to make it non-case sensitive.  I'm basically looking through
> >> a small (512-2048 bytes) buffer for an ASCII string.
> >>
> 
> If you mean what I think you mean, you probably want Boyer-Moore search.
> There is probably source for it buried deep in a copy of egrep, or
> somewhere, but I'm too lazy to dig it out.  You could look
> in a good data structures book---I think Sedgewick covers it.
> 
> Another likely source would be a compiler book.  Look in the chapter that
> covers token parsing.  There are techniques where you compile your
> search string into a FSM, and then run it using the buffer as the tape.

I've already implemented Boyer & Moore. It works very fast with long
searching-strings. Let me describe what Boyer & Moore does: "A" shall be
the string you are searching for, "B" the string you are searching in.

1) Create a field (here called h) with 256 elements, all set to FALSE.
2) for (i=strlen (A)-1; i>=0; i--) h[A[i]]=TRUE;
   So afterwards, if there appears the character  "A" in your
   searchingstring, the field 65 will be set to TRUE and so you'll get
   a table, which tells you wether there exists a special character or not.
   Set k to 0.
3) Now say, "A" is "a" characters long.
   Let c be B[a-1+k]

   ;;;;;;; (String A) => a=7
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,... (B)
   ^     ^
   k     c=B[a-1+k]
   (0)   (6)

   Now look if the field hv[c] is FALSE. If it's FALSE, the string must
   not touch B[a-1+k], so let k=k+a, and continue with (3)
   else there might be any touch. Test this:
   for (i=0; i<a; i++) if (A[i]==B[i+k]) ...
   If the string does not match, let k=k+1 and continue with (3)

I hope there were no mistakes. Sorry, my routine is at home, otherwise I would
simply post it. Especially with long strings and a lot of different characters,
it works very fast. I use it in my diskmonitor DisKey. If the string is long
enough, the LED of my harddrive doesn't blink, it just burns!

Enjoy

Angela   

---------------------------------------------------------+---------------------
Angela Schmidt   Internet: S_ASchmidt@iravcl.ira.uka.de  |        //
(Nessy in IRC)   BITNET:   UK8B@DKAUNI2.BITNET           | Amiga //
                 Phone:    +49 731 712316 & 721 6904-263 |   \\ // only 1000
                           MEZ: (10am-9pm)  (12pm-10pm)  |    \X/ the real one!
---------------------------------------------------------+---------------------