[comp.lang.icon] partial input completion utility

goer@quads.uchicago.edu (Richard L. Goerwitz) (04/11/91)

I just wrote this as part of a larger project, and I believe it would
be of general interest.  To anyone who tries it out:  Please let me
know if you find problems or make modifications that significantly
affect performance (ahem - for the *better*).

-Richard


############################################################################
#
#	Name:	 complete.icn
#
#	Title:	 complete partial input string
#
#	Author:	 Richard L. Goerwitz
#
#	Version: 1.2
#
############################################################################
#
#  This file contains a single procedure, complete(s,st), which
#  completes a string (s) relative to a set or list of strings (st).
#  Put differently, complete() lets you supply a partial string, s,
#  and get back those strings in st that s is either equal to or a
#  substring of.
#
#  Lots of command interfaces allow completion of partial input.
#  Complete() simply represents my personal sentiments about how this
#  might best be done in Icon.  If you strip away the profuse comments
#  below, you end up with less than thirty lines of actual source
#  code.
#
#  I have arranged things so that only that portion of an automaton
#  which is needed to complete a given string is actually created and
#  stored.  Storing automata for later use naturally makes complete()
#  eat up more memory.  The performance gains make it worth the
#  trouble, though.  If, for some reason, there comes a time when it
#  is advisable to reclaim the space occupied by complete's static
#  structures, you can just call it without arguments.  This
#  "resets" complete() and forces an immediate garbage collection.
#  
#  Example code:
#
#      commands := ["run","stop","quit","save","load","continue"]
#      while line := read(&input) do {
#          cmds := list()
#          every put(cmds, complete(line, commands))
#          case *cmds of {
#              0 : input_error(line)
#              1 : do_command(cmds[1])
#              default : display_possible_completions(cmds)
#          }
#          etc...
#
#  More Iconish methods might include displaying successive
#  alternatives each time the user presses the tab key (this would,
#  however, require using the nonportable getch() routine).  Another
#  method might be to use the first string suspended by complete().
#
############################################################################
#
#  Links: none
#
############################################################################


procedure complete(s,st)

    local dfstn, c, l, old_char, newtbl, str, strset
    static t
    initial t := table()

    # No-arg invocation wipes out static structures & causes an
    # immediate garbage collection.
    if /s & /st then {
	t := table()
	collect()		# do it NOW
	fail
    }
    type(st) == ("list"|"set") |
	stop("error (complete):  list or set expected for arg2")

    # Seriously, all that's being done here is that possible states
    # are being represented by sets containing possible completions of
    # s relative to st.  Each time a character is snarfed from s, we
    # check to see what strings in st might represent possible
    # completions, and store these in a set for future use.  At some
    # point, we either run into a character in s that makes comple-
    # tion impossible (fail), or we run out of characters in s (in
    # which case we succeed, & suspend each of the possible
    # completions).

    # Store any sets we have to create in a static structure for later
    # re-use.
    /t[st] := table()

    # We'll call the table entry for the current set dfstn.  (It really
    # does enable us to do things deterministically.)
    dfstn := t[st]

    # Snarf one character at a time from s.
    every c := !s do {

	# The state we're in is represented by the set of all possible
	# completions before c was read.  If we haven't yet seen char
	# c in this state, run through the current-possible-completion
	# set, popping off the first character of each possible
	# completion, and then construct a table which uses these
	# initial as keys, and the completions that are possible for
	# each of these characters are the values for those keys.
	if /dfstn[st] then {

	    # To get strings that start with the same char together,
	    # sort the current string set (st).
	    l := sort(st)
	    newtbl := table()
	    old_chr := ""
	    every str := !l do {
		str ?:= (chr := move(1), tab(0)) | next
		if old_chr ~==:= chr then
		    strset := set()
		insert(newtbl, chr, insert(strset, str))
	    }
	    insert(dfstn, st, newtbl)
	}

	# What we've done essentially is to create a table in which
	# the keys represent labeled arcs out of the current state,
	# and the values represent possible completion sets for those
	# paths.  What we need to do now is store that table in dfstn
	# as the value of the current state-set (i.e. the current
	# range of possible completions).  Once stored, we can then
	# see if there is any arc from the current state (dfstn[st])
	# with the label c (dfstn[st][c]).  If so, its value becomes
	# the new current state (st), and we cycle around again for
	# yet another c.
	st := \(\dfstn[st])[c] | fail     # see NB below
    }

    # Eventually we run out of characters in c.  The current state
    # (i.e. the set of possible completions) can simply be suspended
    # one element at a time, with s prefixed to each element.  If, for
    # instance, st had contained ["hello","help","hear"] at the outset
    # and s was equal to "hel", we would now be suspending "hel" ||
    # !set(["lo","p"]).
    suspend s || !st

    # NB: Slight performance gains could be had in certain cases by
    # checking the size of st.  If size = 1 and s is a substring of
    # its only element (or else matches in its entirety), then we are
    # essentially finished.  No need to go on and on creating nodes
    # and arcs.  Storage space gains could be had by only inserting
    # arcs needed in order to handle the current character, c, instead
    # of creating all possible arcs out the current state whenever a
    # new arc is needed.

end
-- 

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer