[comp.lang.icon] partial input ...

nowlin@isidev.UUCP (04/11/91)

I know I should resist but it's like a challenge.  I just can't let a
verbose Icon program go by without wanting to terse it up.  I think of Icon
as making it easier on me as a programmer.  If Icon is going to do all this
work for me why not let it?  That's only partly rhetorical.

Anyway, what follows contains a terser version of the complete() procedure.
I included the original version since the program below tests both.  I
took the comments (which were useful by the way) out of the original since
it was 40 lines even without them:

procedure main()

	l := ["idaho","ohio","utah","indiana","illinois","texas","iowa"]

	write("\nProgrammer does the work:")
	every write("\t",complete("i",l))

	write("\nIcon does the work:")
	every write("\t",terscomp("i",l))
end

procedure terscomp(s,st)
	suspend match(s,p := !st) & p
end

procedure complete(s,st)

    local dfstn, c, l, old_char, newtbl, str, strset
    static t
    initial t := table()

    if /s & /st then {
	t := table()
	collect()
	fail
    }
    type(st) == ("list"|"set") |
	stop("error (complete):  list or set expected for arg2")

    /t[st] := table()

    dfstn := t[st]

    every c := !s do {

	if /dfstn[st] then {

	    l := sort(st)
	    newtbl := table()
	    old_chr := ""
	    every str := !l do {
		str ?:= (chr := move(1), tab(0)) | next
		if old_chr ~==:= chr then
		    strset := set()
		insert(newtbl, chr, insert(strset, str))
	    }
	    insert(dfstn, st, newtbl)
	}

	st := \(\dfstn[st])[c] | fail
    }

    suspend s || !st

end

--- ---
 | S | Iconic Software, Inc.  -  Jerry Nowlin  -  uunet!isidev!nowlin
--- ---

goer@quads.uchicago.edu (Richard L. Goerwitz) (04/11/91)

In article <9104111325.AA08584@relay1.UU.NET> nowlin@isidev.UUCP writes:
>I know I should resist...

Jerry, the whole reason I posted was that I had hoped people would
not be able to resist :-) !

>Anyway, what follows contains a terser version of the complete() procedure.

I fooled with my program a while, and got it to run faster than the
"let Icon do the work" approach, but let me point out that the speed
difference is less than 5% in most instances, and never more than 10%.

If anyone wants what I came up with, I'll gladly send it out.  I don't
really see any reason why anyone should want it, though, when there is
a three-line equivalent that does pretty much the same thing (and uses
less memory to boot).

-Richard
-- 

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer

alex@LAGUNA.METAPHOR.COM (Bob Alexander) (04/12/91)

I, too, was interested in seeing how the obvious terser version of
Richard G's complete() procedure would work.  My terse version was much
like Jerry N's (before I saw Jerry's).  So I did some timings, and
here's what I found:  The terse version was faster when used with a
list with a size on the order of Richard's or Jerry's sample program
(how much faster depended on the size of input string).  *However*,
when the list gets large (~100 entries), Richard's wins hands down.

I guess that makes sense -- Richard's run time will vary
(predominantly) with the size of the abbreviated string, where the
terse method varies with the size of the list.

-- Bob Alexander

Metaphor Computer Systems   (415) 961-3600 x751   alex@metaphor.com
====^=== Mountain View, CA  ...{uunet}!{decwrl,apple}!metaphor!alex

goer@ellis.uchicago.edu (Richard L. Goerwitz) (04/12/91)

alex@LAGUNA.METAPHOR.COM (Bob Alexander) writes:
>I, too, was interested in seeing how the obvious terser version of
>Richard G's complete() procedure would work.  My terse version was much
>like Jerry N's (before I saw Jerry's).  So I did some timings, and
>here's what I found:  The terse version was faster when used with a
>list with a size on the order of Richard's or Jerry's sample program
>(how much faster depended on the size of input string).  *However*,
>when the list gets large (~100 entries), Richard's wins hands down.
>
>I guess that makes sense -- Richard's run time will vary
>(predominantly) with the size of the abbreviated string, where the
>terse method varies with the size of the list.

I managed to get performance of the version I posted to a level where
it offers an acceptable increase over Jerry's version.  I confess that
one of the big reasons I wrote it was just to determine whether
Icon could offer me a fairly fast, deterministic way of handling
such problems.

At one point several months ago I actually wrote an Icon-only egrep-
type program that actually constructed a full-blown deterministic au-
tomaton, constructing only what it needed in order to accept or re-
ject a line.  Trouble is that it actually ran much slower than a simi-
lar program I wrote that goes to a nondeterministic pushdown automaton
under certain circumstances.  I would guess that for very complex fstns
the deterministic algorithm would be faster, but in practical use it
just didn't cut the mustard.  (The nondeterministic version of this
program is in one or another IPL update as findre.icn.)

Anyway, it frustrated me that Icon didn't really let me get at anything
on a low enough level to solve basic pattern-matching problems deter-
ministically and with reasonable storage requirements (I still wish
Icon had *some* form of pointers for simple data types).  This little
string completion utility was really just another attempt on my part
to solve this problem of performance and deterministic automata in
Icon.  The fact that it now runs faster than the "obvious" solution
only gratifies me in the sense that I might be able to apply the tech-
nique used here to problems where the performance gain would be a lot
more significant.

Since a couple of people asked for the "fixed" version, I'm reposting
it.

-Richard

############################################################################
#
#	Name:	 complete.icn
#
#	Title:	 complete partial input string
#
#	Author:	 Richard L. Goerwitz
#
#	Version: 1.3
#
############################################################################
#
#  This file contains a single procedure, complete(s,st), which
#  completes a string (s) relative to a set or list of strings (st).
#  Put differently, complete() lets you supply a partial string, s,
#  and get back those strings in st that s is either equal to or a
#  substring of.
#
#  Lots of command interfaces allow completion of partial input.
#  Complete() simply represents my personal sentiments about how this
#  might best be done in Icon.  If you strip away the profuse comments
#  below, you end up with less than thirty lines of actual source
#  code.
#
#  I have arranged things so that only that portion of an automaton
#  which is needed to complete a given string is actually created and
#  stored.  Storing automata for later use naturally makes complete()
#  eat up more memory.  The performance gains make it worth the
#  trouble, though.  If, for some reason, there comes a time when it
#  is advisable to reclaim the space occupied by complete's static
#  structures, you can just call it without arguments.  This
#  "resets" complete() and forces an immediate garbage collection.
#  
# Example code:
#
#      commands := ["run","stop","quit","save","load","continue"]
#      while line := read(&input) do {
#          cmds := list()
#          every put(cmds, complete(line, commands))
#          case *cmds of {
#              0 : input_error(line)
#              1 : do_command(cmds[1])
#              default : display_possible_completions(cmds)
#          }
#          etc...
#
#  More Iconish methods might include displaying successive
#  alternatives each time the user presses the tab key (this would,
#  however, require using the nonportable getch() routine).  Another
#  method might be to use the first string suspended by complete().
#
#  NOTE: This entire shebang could be replaced with a slightly slower
#  and much smaller program suggested to me by Jerry Nowlin and Bob
#  Alexander.
#
#      procedure terscompl(s, st)
#          suspend match(s, p := !st) & p
#      end
#
#  This program will work fine for lists with just a few members, and
#  will use a lot less memory.
#
############################################################################
#
#  Links: none
#
############################################################################

procedure complete(s,st)

    local dfstn, c, l, old_char, newtbl, str, strset
    static t
    initial t := table()

    # No-arg invocation wipes out static structures & causes an
    # immediate garbage collection.
    if /s & /st then {
	t := table()
	collect()		# do it NOW
	fail
    }
    type(st) == ("list"|"set") |
	stop("error (complete):  list or set expected for arg2")

    # Seriously, all that's being done here is that possible states
    # are being represented by sets containing possible completions of
    # s relative to st.  Each time a character is snarfed from s, we
    # check to see what strings in st might represent possible
    # completions, and store these in a set for future use.  At some
    # point, we either run into a character in s that makes comple-
    # tion impossible (fail), or we run out of characters in s (in
    # which case we succeed, & suspend each of the possible
    # completions).

    # Store any sets we have to create in a static structure for later
    # re-use.
    /t[st] := table()

    # We'll call the table entry for the current set dfstn.  (It really
    # does enable us to do things deterministically.)
    dfstn := t[st]

    # Snarf one character at a time from s.
    every c := !s do {

	# The state we're in is represented by the set of all possible
	# completions before c was read.  If we haven't yet seen char
	# c in this state, run through the current-possible-completion
	# set, popping off the first character of each possible
	# completion, and then construct a table which uses these
	# initial as keys, and the completions that are possible for
	# each of these characters are the values for those keys.
	if /dfstn[st] then {

	    # To get strings that start with the same char together,
	    # sort the current string set (st).
	    l := sort(st)
	    newtbl := table()
	    old_chr := ""
	    every str := !l do {
		str ? { chr := move(1) | next; str := tab(0) }
		if old_chr ~==:= chr then {
		    strset := set([str])
		    insert(newtbl, chr, strset)
		}
		else insert(strset, str)
	    }
	    insert(dfstn, st, newtbl)
	}

	# What we've done essentially is to create a table in which
	# the keys represent labeled arcs out of the current state,
	# and the values represent possible completion sets for those
	# paths.  What we need to do now is store that table in dfstn
	# as the value of the current state-set (i.e. the current
	# range of possible completions).  Once stored, we can then
	# see if there is any arc from the current state (dfstn[st])
	# with the label c (dfstn[st][c]).  If so, its value becomes
	# the new current state (st), and we cycle around again for
	# yet another c.
	st := \dfstn[st][c] | fail
	if *st = 1 & match(s,!st)
	then break
    }

    # Eventually we run out of characters in c.  The current state
    # (i.e. the set of possible completions) can simply be suspended
    # one element at a time, with s prefixed to each element.  If, for
    # instance, st had contained ["hello","help","hear"] at the outset
    # and s was equal to "hel", we would now be suspending "hel" ||
    # !set(["lo","p"]).
    suspend s || !st

end
-- 

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer