[comp.lang.icon] Comment on longstr.icn

alex@LAGUNA.METAPHOR.COM (Bob Alexander) (03/20/91)

Gee, this is fun.  I have a couple of comments to throw into the frey.

1)  Perhaps instead of the stop() if there is a problem with the
arguments, a runerr() would be more consistent with the builtin
string-analysis procedures -- "115: list, set, or table expected" would
be reasonably appropriate, since the algorithm could work with tables
too (records, too, for that matter, but that's probably a bit much).

2)  Things could be simplified quite a bit by not messing with defaults
for s, i, and j and letting Icon do the work by eliminating the string
scanning.  In this version, the m := -1 initial value can revert back
to 0, since matching an empty string returns 1.

So here's my entry in the series of suggestions.  Somehow I can't help
but wonder if I've missed something obvious -- but if not, do I get my
name in the growing list of credits? :-)


procedure longstr(l,s,i,j)

    local m

    #
    # Is l a list, set, table?
    #
    type(l) == ("list" | "set" | "table") |
        runerr(115,l)

    #
    # Find longest match()-ing string in l.
    #
    m := 0                      # Attempt to match() each member in l (=!l).
    every m <:= match(!l,s,i,j) # Produce the length of each match that suc-
                                # ceeds, and store its value in m iff it is
                                # greater than m's current value.
    #
    # Return i + the length of the longest match.  Fail if there was
    # no match (i.e. m still has its original value).
    #
    return 0 ~= m

end


-- Bob Alexander

Metaphor Computer Systems   (415) 961-3600 x751   alex@metaphor.com
====^=== Mountain View, CA  ...{uunet}!{decwrl,apple}!metaphor!alex

goer@ellis.uchicago.edu (Richard L. Goerwitz) (03/21/91)

alex@LAGUNA.METAPHOR.COM (Bob Alexander) writes:

>Gee, this is fun.

Sure is.

>So here's my entry in the series of suggestions.  Somehow I can't help
>but wonder if I've missed something obvious -- but if not, do I get my
>name in the growing list of credits? :-)

The credits don't fit onto one line anymore :-).

Jerry Nowlin gave this procedure its name, and started the whole dis-
cussion of whether backtracking though a list was not better than my
original (terrible) solution.  Steve Wampler made the code somewhat
terser and more elegant.  Ken Walker suggested turning while into
every, to make it possible to go through the list only once.  Bob
Alexander pointed out that explicit handling of offsets i/j was not
necessary, as long as we were using match().

In this most recent post, I've created a static structure to hold re-
verse-sorted versions of arg 1 (l).  Under normal circumstances, each in-
vocation will not contain a completely different first argument.  In
most cases, the same l will be used at least two or three times.  On
my machine, the magic number of re-uses for a five or six-element l,
with strings of 3-8 characters, is about 3.  With three or more invo-
cations of longstr(l, s, i, j) with the same first arg, it becomes
worthwhile to sort l in reverse order, and store this list for later
use.

The worst-case scenario for this version would be if it were called
repeatedly, with different first arguments, each of which was a list
with elements of the same length.  The performance, even in this case,
however, is close enough to Bob Alexander's version that it's worth
using it.

-Richard


############################################################################
#
#	Name:	 longstr.icn
#
#	Title:	 match longest string in a list or set of strings
#
#	Authors: Jerry Nowlin, Steve Wampler, Kenneth Walker, Bob
#                Alexander, and Richard Goerwitz
#
#	Version: 1.8
#
############################################################################
#
#  longstr(l,s,i,j) works like any(), except that instead of taking a
#  cset as its first argument, it takes instead a list or set of
#  strings (l).  Returns i + *x, where x is the longest string in l
#  for which match(x,s,i,j) succeeds.  Fails if no match occurs.
#
#  Defaults:
#      s     &subject
#      i     &pos if s is defaulted, otherwise 1
#      j     0
#
#  Errors:
#      The only manual error-checking that is done is to test l to
#      be sure it is, in fact, a list or set.  Errors such as non-
#      string members in l, and non-integer i/j parameters, are
#      caught by the normal Icon built-in string processing and sub-
#      scripting mechanisms.
#
############################################################################
#
#  Links: none
#
############################################################################


procedure longstr(l,s,i,j)

    local elem, tmp_table
    static l_table
    initial l_table := table()

    #
    # No-arg invocation wipes out all static structures, and forces an
    # immediate garbage collection.
    #
    if (/l, /s) then {
	l_table := table()
	collect()		# do it NOW
	return			# return &null
    }

    #
    # Is l a list, set, or table?
    #
    type(l) == ("list"|"set"|"table") |
	stop("longstr:  list, table, or set expected (arg 1)")

    #
    # Sort l longest-to-shortest, and keep a copy of the resulting
    # structure in l_table[l] for later use.
    #
    if /l_table[l] := [] then {

	tmp_table := table()
	# keys = lengths of elements, values = elements
	every elem := !l do {
	    /tmp_table[*elem] := []
	    put(tmp_table[*elem], elem)
	}
	# sort by key; stuff values, in reverse order, into a list
	every put(l_table[l], !sort(tmp_table,3)[*tmp_table*2 to 2 by -2])

    }

    #
    # First element in l_table[l] to match is the longest match (it's
    # sorted longest-to-shortest, remember?).
    #
    return match(!l_table[l],s,i,j)

end