[comp.theory] String prefix matching

dewo1@cbnewsd.ATT.COM (douglas.e.whitten) (02/25/90)

I am looking for an efficient algorithm for longest-prefix matching.

My application can be described as follows:
    INIT:
	1. Given a large number of variable-length binary strings.
		Call them "keys".
	2. Build a search data structure: For instance, a trie.
    RUN:
	1. Given a string.  Call it a "word".
	2. Search the data structure to find the longest "key" that is
		a prefix (aka initial substring) of "word".

The data structure, once built, is only searched: There is no add or
delete required.  I am currently planning to completely rebuild the
data structure when changes are necessary.

The "keys" and "words" are binary strings, with byte values from 0x00
to 0xff.  They range in length from one byte to 40 bytes.

It is possible for no "key" to be a prefix, as well as a "key" to
completely match a "word".

Note that I need the longest match: Some "keys" will themselves be
prefixes of other "keys".

My implementation language will be C.  

I am interested (first) in fast search time, (second) in reasonable
memory requirements, and (third) in fast data-structure construction
time.

Any suggestions?



		Doug Whitten
		AT&T Bell Labs
		dewo1@cbnewsd.ATT.COM