[comp.lang.icon] terrible code

goer@ellis.uchicago.edu (Richard L. Goerwitz) (03/18/91)

I wrote the following code because I figured that using sets or
lists alone to accomplish what it does would be inefficient.  It
turns out that, even though the code below produces a determinis-
tic automaton (with some small cheating in one spot), it's about
a third as fast as just using a set or list all by itself to do
the same thing.

If there are any wizards online, who feel like bending their minds
over some obscure code, I wouldn't mind a bit if they'd comment
on how this might be done more efficiently.  Non-wizards beware
:-).

-Richard


------------------------------------------------------------------

procedure anystr(l,s,i,j)

    static done_tbl
    initial done_tbl := table()

    #
    # Make defaults work like those for built-in string-handling
    # functions.
    #
    /s := &subject
    if \i then {
	if i < 1 then
	    i := *s + (i+1)
    }
    else i := \&pos | 1
    if \j then {
	if j < 1 then
	    j := *s + (j+1)
    }
    else j := *s+1

    #
    # Create table to sort and hold characters for list l (if it
    # does not already exist.  Then return the position in s of
    # the longest string in l that matches at position i.
    #
    /done_tbl[l] := table(sort(set(l)))
    return (i-1) + (s[i:j] ? walk_table(done_tbl[l]))

    # NB:  longest possible match approach!

end


procedure walk_table(t)

    local val, chr, ntbl, nlst, empty_string_present

    while val := t[move(1)] do {

	case type(val) of {
	    "table" : {
		if "" == key(val) then {
		    POS := &pos
		    return (walk_table(val) | POS)
		}
		else return walk_table(val)
	    }
	    "list"  : {
		case *val of {
		    0      : fail
		    1      : if (move(-1),=val[1]) then return .&pos else fail
		    default: {
			nkey := "impossible key"
			while pop(val) ? {
			    empty_string_present := pos(0)
			    chr := move(1) | ""
			    if nkey ~==:= chr then {
				nlst := list()
				ntbl := table(nlst)
				insert(t, nkey, ntbl)
			    }
			    put(nlst, tab(0))
			}
			move(-1)
			if \empty_string_present then {
			    POS := &pos
			    return (walk_table(t) | POS)
			}
			else return walk_table(t)
		    }
		}
	    }
	}
    }

end

nowlin@isidev.UUCP (03/18/91)

You're right.  Terrible code :-)  I'm not a wizard so could you please
stick an illustrative main procedure on there so we non-wizards can figure
out what this program is supposed to do in the first place?  It looks like
it scans a string for the longest match on any one of a list of strings but
I got nowhere trying to make that work.  If that's really what is going on
I have some ideas.

Thanks.

+-------------------------------------------------------------------------+
|  --- ---                                                                |
|   | S |  Iconic Software, Inc.  -  Jerry Nowlin  - uunet!isidev!nowlin  |
|  --- ---                                                                |
+-------------------------------------------------------------------------+

nowlin@isidev.UUCP (03/18/91)

I figured out what I was doing wrong and got the program originally posted
to work.  I came up with a solution that got the same answer for my test
data.  Is this a possible solution to what you were trying to do?  Notice I
just borrowed the argument processing code from the original anystr()
procedure for my longstr() procedure:

procedure main(args)

	l := ["th","that ","not close","tha","that b","t"]
	s := "that begins this string"

	#write(anystr(l,s))
	write(longstr(l,s))

end

procedure longstr(l,s,i,j)

    ##### borrowed #####
    /s := &subject
    if \i then {
	if i < 1 then
	    i := *s + (i+1)
    }
    else i := \&pos | 1
    if \j then {
	if j < 1 then
	    j := *s + (j+1)
    }
    else j := *s+1
    ##### borrowed #####

    m := 0
    while *(p := (s ? =!l)) > m do m := *p
    return m + 1
end

I didn't include the original posted code here but you can add it to this
program and test the two solutions together.  I haven't a clue which is
faster but this one is somewhat shorter.

+-------------------------------------------------------------------------+
|  --- ---                                                                |
|   | S |  Iconic Software, Inc.  -  Jerry Nowlin  - uunet!isidev!nowlin  |
|  --- ---                                                                |
+-------------------------------------------------------------------------+

nowlin@isidev.UUCP (03/19/91)

Last time...promise.  I just forgot to use the 'i' and 'j' arguments in my
solution and thought as long as I'm posting it again why not comment it
some since it is a little cryptic:

procedure main(args)

	s := "that begins this string"

	l := ["th","that beg","not close","","tha","begins t","that b","t"]

	write(longstr(l,s)) | write("nomatch")
	write(longstr(l,s,6)) | write("nomatch")
	write(longstr(l,s,6,12)) | write("nomatch")

	l := ["th","that beg","not close","tha","begins t","that b","t"]

	write(longstr(l,s)) | write("nomatch")
	write(longstr(l,s,6)) | write("nomatch")
	write(longstr(l,s,6,12)) | write("nomatch")

end

procedure longstr(l,s,i,j)
    ##### borrowed #####
    /s := &subject
    if \i then {
	if i < 1 then
	    i := *s + (i+1)
    }
    else i := \&pos | 1
    if \j then {
	if j < 1 then
	    j := *s + (j+1)
    }
    else j := *s+1
    ##### borrowed #####

    # initialize the match length
    m := 0

    # while there is a pattern 'p' in list 'l' that matches string 's[i:j]'
    # and that pattern 'p' is longer than the current match length 'm' reset
    # the match length 'm' to the length of pattern 'p' -- goal directed
    # evaluation in the 'while' expression is the key here
    while *(p := (s[i:j] ? =!l)) > m do m := *p

    # if no pattern matched fail
    if /p then fail

    # if a pattern matched return the position in 's' past the pattern
    else return i + m

end

+-------------------------------------------------------------------------+
|  --- ---                                                                |
|   | S |  Iconic Software, Inc.  -  Jerry Nowlin  - uunet!isidev!nowlin  |
|  --- ---                                                                |
+-------------------------------------------------------------------------+

goer@ellis.uchicago.edu (Richard L. Goerwitz) (03/19/91)

In article <9103181558.AA20389@uunet.uu.net> nowlin@isidev.UUCP writes:
>    m := 0
>    while *(p := (s ? =!l)) > m do m := *p
>    return m + 1

Very clever.  Try matching each member of l, keeping a record of the
length of the match.  The longest match wins.

This sort of code does exactly what my code does.  Here's the problem.
Solution of the type exemplified above involve mindless interation
through the entire list, l.  I decided that it would be sensible to
write a little program that made this process deterministic.  I used
tables of tables to accomplish this.  Read a char, then see if the
char is in the lookup table.  Whatever strings begin with that char
become possible matches.  Then read another char.  Of the strings con-
sidered possible matches before, only those whose 2nd character matches
the one just read are possible matches, etc.

I added a cheat.  If at any time we run into "" (some string runs out
of chars), we remember that spot, and continue with the remaining
strings.  If nothing else matches beyond this point, we backtrack to
it and return the position we were at when we ran out of characters in
on of the strings.

Anyway, this is pretty much a deterministic process (with that one
cheat described immediately above).  It's slow as mud, though.  And
so your code, Jerry, while seemingly "dumb but elegant" turns out
better than mine!

There must be a way to do the kinds of things we're talking about here
in Icon, and do it with somewhat greater speed than the =!l approach.

-Richard

nowlin@isidev.UUCP (03/19/91)

>From article <1991Mar18.171946.28280@midway.uchicago.edu> (Richard L. Goerwitz)
> In article <9103181558.AA20389@uunet.uu.net> nowlin@isidev.UUCP writes:
> >
> >    m := 0
> >    while *(p := (s ? =!l)) > m do m := *p
> >    return m + 1
>
> Very clever.  Try matching each member of l, keeping a record of the
> length of the match.  The longest match wins.
> 
> This sort of code does exactly what my code does.  Here's the problem.
> Solution of the type exemplified above involve mindless interation
> through the entire list, l.  I decided that it would be sensible to
> ...
> There must be a way to do the kinds of things we're talking about here
> in Icon, and do it with somewhat greater speed than the =!l approach.

I've included the source from a subsequent reposting of the piece of code
being discussed since it was modified to follow the original program more
closely:

   m := 0
   while *(p := (s[i:j] ?  =!l)) > m do m := *p
   if /p then fail
   else return i + m

The key to this is that it's not a "mindless iteration through the entire
list".  It's an iteration, but in any language but Icon you'd have to
explicitly do a lot more to control this iteration than in the simple
expression above.  Simple is in the eyes of the beholder :-)

Any expression that follows the 'while' control structure must try to
succeed due to goal directed evaluation.  That means if the expression
contains a generator, in this case the !l, results are generated until the
generator is exhausted or the expression succeeds in the context of one of
the results.  If the expression succeeds the result is used to modify the
expression.  Eventually none of the generator's results will cause the
expression to succeed and the loop is exited.

This Icon stuff is pretty slick.  I fail to see why a few of dozen lines of
admittedly "terrible" code make a better solution than these four lines.
I could do a matching table in C that would blow this away in terms of
speed but this is Icon.  It should be done Iconishly.

+-------------------------------------------------------------------------+
|  --- ---                                                                |
|   | S |  Iconic Software, Inc.  -  Jerry Nowlin  - uunet!isidev!nowlin  |
|  --- ---                                                                |
+-------------------------------------------------------------------------+

sbw@TURING.CSE.NAU.EDU (Steve Wampler) (03/19/91)

On Mar 18 at 19:47, isidev!nowlin@uunet.uu.net writes:
} 
}    m := 0
}    while *(p := (s[i:j] ?  =!l)) > m do m := *p
}    if /p then fail
}    else return i + m
} 

Out of curiousity (and because I like bizarre code), couldn't
the above be written:

    m := 0
    while m >:= *(s[i:j] ? =!l)
    return i + (0 ~= m)

instead?  (No, I haven't tried it, but it seems the same to me...)

-- 
	Steve Wampler
	{....!arizona!naucse!sbw}
	{sbw@turing.cse.nau.edu}

sbw@TURING.CSE.NAU.EDU (Steve Wampler) (03/19/91)

On Mar 18 at 19:58, Steve Wampler writes:
} On Mar 18 at 19:47, isidev!nowlin@uunet.uu.net writes:
} } 
} }    m := 0
} }    while *(p := (s[i:j] ?  =!l)) > m do m := *p
} }    if /p then fail
} }    else return i + m
} } 
} 
} Out of curiousity (and because I like bizarre code), couldn't
} the above be written:
} 
}     m := 0
}     while m >:= *(s[i:j] ? =!l)
}     return i + (0 ~= m)
} 
} instead?  (No, I haven't tried it, but it seems the same to me...)

Sigh.  Make that a <:=, not >:=.


-- 
	Steve Wampler
	{....!arizona!naucse!sbw}
	{sbw@turing.cse.nau.edu}

goer@ellis.uchicago.edu (Richard L. Goerwitz) (03/19/91)

In article <9103190245.AA11982@uunet.uu.net> nowlin@isidev.UUCP writes:
>
>   m := 0
>   while *(p := (s[i:j] ?  =!l)) > m do m := *p
>   if /p then fail
>   else return i + m
>
>The key to this is that it's not a "mindless iteration through the entire
>list".  It's an iteration, but in any language but Icon you'd have to
>explicitly do a lot more to control this iteration than in the simple
>expression above.  Simple is in the eyes of the beholder :-)

You know, reading over my last posting, you could take the word "mindless"
to refer to the programmer, and not to the =!l method of looking for mat-
ches.  That's not what I indended, of course.  I should really try to be
a bit more gracious.  The overall solution was really very clean.

Anyway, elegance of expression is certainly a plus with Icon.  So it its
economy.  The question is whether this economy doesn't sometimes involve
some serious, serious performance penalties.  In the case of my "terrible
code" (self-admitted), I was trying to use two Icon features (hash tables
and sets) to coax more performance out of a routine than the backtrack-
ing mechanisms, unhindered, allowed.

>This Icon stuff is pretty slick.  I fail to see why a few of dozen lines of
>admittedly "terrible" code make a better solution than these four lines.
>I could do a matching table in C that would blow this away in terms of
>speed but this is Icon.  It should be done Iconishly.

Well, that again is in the eyes of the beholder.  Does "Iconishly" mean
"in an extremely compact and elegant, but computationally clumsy and inef-
ficient, manner"?  Perhaps in many cases, yes.  In this case, I had hoped
no.  My attempt at doing things more deterministically, though, failed.
So in the end I was left with something both slow and inelegant.

I guess one could argue that this sort of thing should be done in C, and
that the easiest way to do things would be just to use the regexp rou-
tines.  Often, though, I work on larger-scale projects mainly in Icon,
and am very, very reluctant to hack in C code, either through extcall/
callout, or, worse yet, via shell scripts and pipes.  Unix isn't the only
platform I operate on, and there is something esthetically displeasing
about having to hack every interpreter I want to run software on.

I guess the big question is this:  How easy will inline C code be able
to be incorporated into compiled Icon code??

(Jerry, thanks for responding; I'll probably end up using your code, a
la Steve Wampler's modifications.)

-Richard (goer@sophist.uchicago.edu)

nowlin@isidev.UUCP (03/19/91)

In message <9103190300.AA10894@turing.cse.nau.edu> From: (Steve Wampler)
> } Out of curiousity (and because I like bizarre code), couldn't
> } the above be written:
> } 
> }     m := 0
> }     while m >:= *(s[i:j] ? =!l)
> }     return i + (0 ~= m)
> } 
> } instead?  (No, I haven't tried it, but it seems the same to me...)
> 
> Sigh.  Make that a <:=, not >:=.

Good reduction.  The only problem with this is that if someone were to
include an empty string in the list they're matching this solution would
fail even if it matched the empty string.  (I know...who cares!)  A simple
modification fixes that though.  I added enough to test this.  Look at an
earlier posting to see comments:

procedure main(args)
	s := "that begins this string"
	l := ["th","that beg","not close","","tha","begins t","that b","t"]
	write(longstr(l,s,6,12)) | write("nomatch")
	l := ["th","that beg","not close","tha","begins t","that b","t"]
	write(longstr(l,s,6,12)) | write("nomatch")
end

procedure longstr(l,s,i,j)
    ##### borrowed #####
    /s := &subject
    if \i then {
	if i < 1 then
	    i := *s + (i+1)
    }
    else i := \&pos | 1
    if \j then {
	if j < 1 then
	    j := *s + (j+1)
    }
    else j := *s+1
    ##### borrowed #####

    m := 0
    while m <:= *(p := (s[i:j] ? =!l))
    if /p then fail
    else return i + m
end

+-------------------------------------------------------------------------+
|  --- ---                                                                |
|   | S |  Iconic Software, Inc.  -  Jerry Nowlin  - uunet!isidev!nowlin  |
|  --- ---                                                                |
+-------------------------------------------------------------------------+