[comp.lang.c] Recognizing abbreviated commands using `tries'

scott@med.ge.com (Scott Woskoff) (01/10/90)

I'm posting this for a friend, so please reply directly to him at:

	uunet!crdgw1!gemed!pete

I have a need to recognize one of a set of keywords, when given as
input string either the complete keyword, the shortest prefix of the
keyword which is unique in the set, or any string in between. The
actual application is a command recognizer which allows the user to
abbreviate the command names.

Example: given the keyword set {abel, absalom}, I should get
the following (input-result) pairs:

	input		result
	abs		absalom
	absal		absalom
	absalom		absalom
	abe		abel
	ab		error: ambiguous input
	box		error: no such keyword


The appropriate data structure seems to be a slightly modified trie.

Because the set of keywords never changes at runtime, but does usually
change with each release of the program, I want to build the trie at
compile-time, using C #defines, if possible. It's not clear to me,
however, that it is possible to do so.  A less attractive alternative
is to build the table at runtime by inserting the keywords one at a
time into the trie during initialization.

Can anyone offer any help? Any alternative solutions? Good
advice?

Peter Schiltz
CSE, W-824
GE Medical Systems
3200 N. Grandview
Waukesha WI 53188
uunet!crdgw1!gemed!pete

schmidt@crimee.ics.uci.edu (Doug Schmidt) (01/10/90)

In article <SCOTT.90Jan9153157@harpo.med.ge.com>, scott@med (Scott Woskoff) writes:
>I have a need to recognize one of a set of keywords, when given as
>input string either the complete keyword, the shortest prefix of the
>keyword which is unique in the set, or any string in between. The
>actual application is a command recognizer which allows the user to
>abbreviate the command names.
>
>Example: given the keyword set {abel, absalom}, I should get
>the following (input-result) pairs:
>
>	input		result
>	abs		absalom
>	absal		absalom
>	absalom		absalom
>	abe		abel
>	ab		error: ambiguous input
>	box		error: no such keyword
>
>
>The appropriate data structure seems to be a slightly modified trie.
>
>Because the set of keywords never changes at runtime, but does usually
>change with each release of the program, I want to build the trie at
>compile-time, using C #defines, if possible. It's not clear to me,
>however, that it is possible to do so.  A less attractive alternative
>is to build the table at runtime by inserting the keywords one at a
>time into the trie during initialization.
>
>Can anyone offer any help? Any alternative solutions? Good
>advice?

If you've got access to anonymous ftp you should check out the file
trie-gen-1.0.Z on ics.uci.edu (128.195.1.1).  This tar file implements
a minimal-prefix trie generator, which should do pretty much exactly
what you are looking for here.

The program is a beta-release, but there is sufficient documentation
to make it usable!

I solicit comments and suggestions for enhancements.

Doug
--
Douglas C. Schmidt

roy@prism.gatech.EDU (Roy Mongiovi) (01/10/90)

This doesn't use tries, but I did approximately the same thing in a
text editor I wrote many years ago.  I kept a sorted table of all the
keywords I wanted to recognize.  To search the table, I did a binary
search basically using "strncmp(keyword, table[i], strlen(keyword))".
If I didn't find a match, I knew the keyword wasn't in the table.
Otherwise I probed linearly backwards and forwards in the table
looking for other matches.  If there weren't any the keyword is a
unique abbreviation, otherwise it's ambiguous.  I also augmented
this to look for an exact match (strlen(keyword) == strlen(table[i]))
which allowed me to enter very short (i.e. single character) entries
in the table as a shorthand for certain frequently used commands.

It worked really well, it was relatively fast, and it's simple.
What more can you ask for?
-- 
Roy J. Mongiovi     Systems Support Specialist     Office of Computing Services
Georgia Institute of Technology	  Atlanta, Georgia  30332-0275   (404) 894-4660
	uucp: ...!{allegra,amd,hplabs,ut-ngp}!gatech!prism!roy
	ARPA: roy@prism.gatech.edu

bill@twwells.com (T. William Wells) (01/10/90)

In article <SCOTT.90Jan9153157@harpo.med.ge.com> <scott@gemed.med.ge.com> (Scott Woskoff) writes:
: Because the set of keywords never changes at runtime, but does usually
: change with each release of the program, I want to build the trie at
: compile-time, using C #defines, if possible. It's not clear to me,
: however, that it is possible to do so.

Write a program to convert your word list to source for a C data
structure.

---
Bill                    { uunet | novavax | ankh } !twwells!bill
bill@twwells.com

brnstnd@stealth.acf.nyu.edu (01/11/90)

Speaking of tries, I've found a method of storing tries in constant
space per node with the usual operations (searching down one level,
inserting a new node) running in constant time, independent of the
alphabet size. Has this been done before? I haven't found anything
in the usual journals.

---Dan