goer@ellis.uchicago.edu (Richard L. Goerwitz) (03/18/91)
I wrote the following code because I figured that using sets or lists alone to accomplish what it does would be inefficient. It turns out that, even though the code below produces a determinis- tic automaton (with some small cheating in one spot), it's about a third as fast as just using a set or list all by itself to do the same thing. If there are any wizards online, who feel like bending their minds over some obscure code, I wouldn't mind a bit if they'd comment on how this might be done more efficiently. Non-wizards beware :-). -Richard ------------------------------------------------------------------ procedure anystr(l,s,i,j) static done_tbl initial done_tbl := table() # # Make defaults work like those for built-in string-handling # functions. # /s := &subject if \i then { if i < 1 then i := *s + (i+1) } else i := \&pos | 1 if \j then { if j < 1 then j := *s + (j+1) } else j := *s+1 # # Create table to sort and hold characters for list l (if it # does not already exist. Then return the position in s of # the longest string in l that matches at position i. # /done_tbl[l] := table(sort(set(l))) return (i-1) + (s[i:j] ? walk_table(done_tbl[l])) # NB: longest possible match approach! end procedure walk_table(t) local val, chr, ntbl, nlst, empty_string_present while val := t[move(1)] do { case type(val) of { "table" : { if "" == key(val) then { POS := &pos return (walk_table(val) | POS) } else return walk_table(val) } "list" : { case *val of { 0 : fail 1 : if (move(-1),=val[1]) then return .&pos else fail default: { nkey := "impossible key" while pop(val) ? { empty_string_present := pos(0) chr := move(1) | "" if nkey ~==:= chr then { nlst := list() ntbl := table(nlst) insert(t, nkey, ntbl) } put(nlst, tab(0)) } move(-1) if \empty_string_present then { POS := &pos return (walk_table(t) | POS) } else return walk_table(t) } } } } } end
nowlin@isidev.UUCP (03/18/91)
You're right. Terrible code :-) I'm not a wizard so could you please stick an illustrative main procedure on there so we non-wizards can figure out what this program is supposed to do in the first place? It looks like it scans a string for the longest match on any one of a list of strings but I got nowhere trying to make that work. If that's really what is going on I have some ideas. Thanks. +-------------------------------------------------------------------------+ | --- --- | | | S | Iconic Software, Inc. - Jerry Nowlin - uunet!isidev!nowlin | | --- --- | +-------------------------------------------------------------------------+
nowlin@isidev.UUCP (03/18/91)
I figured out what I was doing wrong and got the program originally posted to work. I came up with a solution that got the same answer for my test data. Is this a possible solution to what you were trying to do? Notice I just borrowed the argument processing code from the original anystr() procedure for my longstr() procedure: procedure main(args) l := ["th","that ","not close","tha","that b","t"] s := "that begins this string" #write(anystr(l,s)) write(longstr(l,s)) end procedure longstr(l,s,i,j) ##### borrowed ##### /s := &subject if \i then { if i < 1 then i := *s + (i+1) } else i := \&pos | 1 if \j then { if j < 1 then j := *s + (j+1) } else j := *s+1 ##### borrowed ##### m := 0 while *(p := (s ? =!l)) > m do m := *p return m + 1 end I didn't include the original posted code here but you can add it to this program and test the two solutions together. I haven't a clue which is faster but this one is somewhat shorter. +-------------------------------------------------------------------------+ | --- --- | | | S | Iconic Software, Inc. - Jerry Nowlin - uunet!isidev!nowlin | | --- --- | +-------------------------------------------------------------------------+
nowlin@isidev.UUCP (03/19/91)
Last time...promise. I just forgot to use the 'i' and 'j' arguments in my solution and thought as long as I'm posting it again why not comment it some since it is a little cryptic: procedure main(args) s := "that begins this string" l := ["th","that beg","not close","","tha","begins t","that b","t"] write(longstr(l,s)) | write("nomatch") write(longstr(l,s,6)) | write("nomatch") write(longstr(l,s,6,12)) | write("nomatch") l := ["th","that beg","not close","tha","begins t","that b","t"] write(longstr(l,s)) | write("nomatch") write(longstr(l,s,6)) | write("nomatch") write(longstr(l,s,6,12)) | write("nomatch") end procedure longstr(l,s,i,j) ##### borrowed ##### /s := &subject if \i then { if i < 1 then i := *s + (i+1) } else i := \&pos | 1 if \j then { if j < 1 then j := *s + (j+1) } else j := *s+1 ##### borrowed ##### # initialize the match length m := 0 # while there is a pattern 'p' in list 'l' that matches string 's[i:j]' # and that pattern 'p' is longer than the current match length 'm' reset # the match length 'm' to the length of pattern 'p' -- goal directed # evaluation in the 'while' expression is the key here while *(p := (s[i:j] ? =!l)) > m do m := *p # if no pattern matched fail if /p then fail # if a pattern matched return the position in 's' past the pattern else return i + m end +-------------------------------------------------------------------------+ | --- --- | | | S | Iconic Software, Inc. - Jerry Nowlin - uunet!isidev!nowlin | | --- --- | +-------------------------------------------------------------------------+
goer@ellis.uchicago.edu (Richard L. Goerwitz) (03/19/91)
In article <9103181558.AA20389@uunet.uu.net> nowlin@isidev.UUCP writes: > m := 0 > while *(p := (s ? =!l)) > m do m := *p > return m + 1 Very clever. Try matching each member of l, keeping a record of the length of the match. The longest match wins. This sort of code does exactly what my code does. Here's the problem. Solution of the type exemplified above involve mindless interation through the entire list, l. I decided that it would be sensible to write a little program that made this process deterministic. I used tables of tables to accomplish this. Read a char, then see if the char is in the lookup table. Whatever strings begin with that char become possible matches. Then read another char. Of the strings con- sidered possible matches before, only those whose 2nd character matches the one just read are possible matches, etc. I added a cheat. If at any time we run into "" (some string runs out of chars), we remember that spot, and continue with the remaining strings. If nothing else matches beyond this point, we backtrack to it and return the position we were at when we ran out of characters in on of the strings. Anyway, this is pretty much a deterministic process (with that one cheat described immediately above). It's slow as mud, though. And so your code, Jerry, while seemingly "dumb but elegant" turns out better than mine! There must be a way to do the kinds of things we're talking about here in Icon, and do it with somewhat greater speed than the =!l approach. -Richard
nowlin@isidev.UUCP (03/19/91)
>From article <1991Mar18.171946.28280@midway.uchicago.edu> (Richard L. Goerwitz) > In article <9103181558.AA20389@uunet.uu.net> nowlin@isidev.UUCP writes: > > > > m := 0 > > while *(p := (s ? =!l)) > m do m := *p > > return m + 1 > > Very clever. Try matching each member of l, keeping a record of the > length of the match. The longest match wins. > > This sort of code does exactly what my code does. Here's the problem. > Solution of the type exemplified above involve mindless interation > through the entire list, l. I decided that it would be sensible to > ... > There must be a way to do the kinds of things we're talking about here > in Icon, and do it with somewhat greater speed than the =!l approach. I've included the source from a subsequent reposting of the piece of code being discussed since it was modified to follow the original program more closely: m := 0 while *(p := (s[i:j] ? =!l)) > m do m := *p if /p then fail else return i + m The key to this is that it's not a "mindless iteration through the entire list". It's an iteration, but in any language but Icon you'd have to explicitly do a lot more to control this iteration than in the simple expression above. Simple is in the eyes of the beholder :-) Any expression that follows the 'while' control structure must try to succeed due to goal directed evaluation. That means if the expression contains a generator, in this case the !l, results are generated until the generator is exhausted or the expression succeeds in the context of one of the results. If the expression succeeds the result is used to modify the expression. Eventually none of the generator's results will cause the expression to succeed and the loop is exited. This Icon stuff is pretty slick. I fail to see why a few of dozen lines of admittedly "terrible" code make a better solution than these four lines. I could do a matching table in C that would blow this away in terms of speed but this is Icon. It should be done Iconishly. +-------------------------------------------------------------------------+ | --- --- | | | S | Iconic Software, Inc. - Jerry Nowlin - uunet!isidev!nowlin | | --- --- | +-------------------------------------------------------------------------+
sbw@TURING.CSE.NAU.EDU (Steve Wampler) (03/19/91)
On Mar 18 at 19:47, isidev!nowlin@uunet.uu.net writes: } } m := 0 } while *(p := (s[i:j] ? =!l)) > m do m := *p } if /p then fail } else return i + m } Out of curiousity (and because I like bizarre code), couldn't the above be written: m := 0 while m >:= *(s[i:j] ? =!l) return i + (0 ~= m) instead? (No, I haven't tried it, but it seems the same to me...) -- Steve Wampler {....!arizona!naucse!sbw} {sbw@turing.cse.nau.edu}
sbw@TURING.CSE.NAU.EDU (Steve Wampler) (03/19/91)
On Mar 18 at 19:58, Steve Wampler writes: } On Mar 18 at 19:47, isidev!nowlin@uunet.uu.net writes: } } } } m := 0 } } while *(p := (s[i:j] ? =!l)) > m do m := *p } } if /p then fail } } else return i + m } } } } Out of curiousity (and because I like bizarre code), couldn't } the above be written: } } m := 0 } while m >:= *(s[i:j] ? =!l) } return i + (0 ~= m) } } instead? (No, I haven't tried it, but it seems the same to me...) Sigh. Make that a <:=, not >:=. -- Steve Wampler {....!arizona!naucse!sbw} {sbw@turing.cse.nau.edu}
goer@ellis.uchicago.edu (Richard L. Goerwitz) (03/19/91)
In article <9103190245.AA11982@uunet.uu.net> nowlin@isidev.UUCP writes: > > m := 0 > while *(p := (s[i:j] ? =!l)) > m do m := *p > if /p then fail > else return i + m > >The key to this is that it's not a "mindless iteration through the entire >list". It's an iteration, but in any language but Icon you'd have to >explicitly do a lot more to control this iteration than in the simple >expression above. Simple is in the eyes of the beholder :-) You know, reading over my last posting, you could take the word "mindless" to refer to the programmer, and not to the =!l method of looking for mat- ches. That's not what I indended, of course. I should really try to be a bit more gracious. The overall solution was really very clean. Anyway, elegance of expression is certainly a plus with Icon. So it its economy. The question is whether this economy doesn't sometimes involve some serious, serious performance penalties. In the case of my "terrible code" (self-admitted), I was trying to use two Icon features (hash tables and sets) to coax more performance out of a routine than the backtrack- ing mechanisms, unhindered, allowed. >This Icon stuff is pretty slick. I fail to see why a few of dozen lines of >admittedly "terrible" code make a better solution than these four lines. >I could do a matching table in C that would blow this away in terms of >speed but this is Icon. It should be done Iconishly. Well, that again is in the eyes of the beholder. Does "Iconishly" mean "in an extremely compact and elegant, but computationally clumsy and inef- ficient, manner"? Perhaps in many cases, yes. In this case, I had hoped no. My attempt at doing things more deterministically, though, failed. So in the end I was left with something both slow and inelegant. I guess one could argue that this sort of thing should be done in C, and that the easiest way to do things would be just to use the regexp rou- tines. Often, though, I work on larger-scale projects mainly in Icon, and am very, very reluctant to hack in C code, either through extcall/ callout, or, worse yet, via shell scripts and pipes. Unix isn't the only platform I operate on, and there is something esthetically displeasing about having to hack every interpreter I want to run software on. I guess the big question is this: How easy will inline C code be able to be incorporated into compiled Icon code?? (Jerry, thanks for responding; I'll probably end up using your code, a la Steve Wampler's modifications.) -Richard (goer@sophist.uchicago.edu)
nowlin@isidev.UUCP (03/19/91)
In message <9103190300.AA10894@turing.cse.nau.edu> From: (Steve Wampler) > } Out of curiousity (and because I like bizarre code), couldn't > } the above be written: > } > } m := 0 > } while m >:= *(s[i:j] ? =!l) > } return i + (0 ~= m) > } > } instead? (No, I haven't tried it, but it seems the same to me...) > > Sigh. Make that a <:=, not >:=. Good reduction. The only problem with this is that if someone were to include an empty string in the list they're matching this solution would fail even if it matched the empty string. (I know...who cares!) A simple modification fixes that though. I added enough to test this. Look at an earlier posting to see comments: procedure main(args) s := "that begins this string" l := ["th","that beg","not close","","tha","begins t","that b","t"] write(longstr(l,s,6,12)) | write("nomatch") l := ["th","that beg","not close","tha","begins t","that b","t"] write(longstr(l,s,6,12)) | write("nomatch") end procedure longstr(l,s,i,j) ##### borrowed ##### /s := &subject if \i then { if i < 1 then i := *s + (i+1) } else i := \&pos | 1 if \j then { if j < 1 then j := *s + (j+1) } else j := *s+1 ##### borrowed ##### m := 0 while m <:= *(p := (s[i:j] ? =!l)) if /p then fail else return i + m end +-------------------------------------------------------------------------+ | --- --- | | | S | Iconic Software, Inc. - Jerry Nowlin - uunet!isidev!nowlin | | --- --- | +-------------------------------------------------------------------------+