goer@sophist.uucp (Richard Goerwitz) (08/03/90)
I'm having fits posting here because of mail reorganizations at the U of Chicago. I assume nothing I've posted in the last week has gotten through. Anyway, I've just been trying to post a newer version of the list scanning routines I posted a couple of weeks back. This new ver- sion fixes a couple of bugs and provides the "missing" l_bal() function. I want very much to prototype this new scanning method using user- defined control structures, but it's proving very hard. Let's say you want to write - repeat { do something l_scan(l) { expresion1, expression2 & break } } Okay, the l_scan(l) routine sets pos and subject for the list, and then returns a procedure geared for {}-style invocation. Expres- sion1 and expression2 & break are the arguments to this procedure. The trouble is that, even if we could say "create break" (which we can't), we end up with a break in an invalid context. The user-defined control structures seem not to allow real Iconish control. Is this the case? If so, no matter. If not (i.e. if I am overlooking something), could someone kindly enlighten me? Alter- natively, if someone has an idea about how to prototype new control structures, could they perhaps post a followup? I dearly hope that our news software & mail router doesn't barf on this one. Anyway, here's the code: ############################################################################ # # Name: lscan.icn # # Title: Quasi ? scanning routines for lists. # # Author: Richard L. Goerwitz # # Version: 1.16 # ############################################################################ # # Copyright (c) 1990, Richard L. Goerwitz, III # # This software is intended for free and unrestricted distribution. # I place only two conditions on its use: 1) That you clearly mark # any additions or changes you make to the source code, and 2) that # you do not delete this message from it. In order to protect # myself from spurious litigation, it must also be stated here that, # because this is free software, I, Richard Goerwitz, make no claim # about the applicability or fitness of this software for any # purpose, and disclaim any responsibility for any damages that # might be incurred in conjunction with its use. # ############################################################################ # # PURPOSE: String scanning is terrific, but often I am forced to # tokenize and work with lists. So as to make operations on these # lists as close to corresponding string operations as possible, I've # implemented a series of list analogues to any(), bal(), find(), # many(), match(), move(), pos(), tab(), and upto(). Their names are # just like corresponding string functions, except with a prepended # "l_" (e.g. l_any()). Functionally, the list routines parallel the # string ones closely, except that in place of strings, l_find and # l_match accept lists as their first argument. L_any(), l_many(), # and l_upto() all take either sets or lists of lists (e.g. # l_tab(l_upto([["a"],["b"],["j","u","n","k"]])). Note that l_bal(), # unlike the builtin bal(), has no defaults for the first four # arguments. This just seemed appropriate, given that no precise # list analogue to &cset, etc. occurs. # # Since Icon does not permit me to define new control structures via # infix operators, list scanning must be initialized via a procedure, # namely l_scan(l), where l is the list to be scanned. Scanning is # ended by calling end_l_scan(l). Nesting is permitted, but you'd # better make sure your indentation is correct. Since l_scan() does # not represent a real control structure, you must call end_l_scan() # before "breaking" from an l_scan. Otherwise l_POS and l_SUBJ will # not be handled properly. # # If someone can think of a way to do this more elegantly let me # know. Personally, I'd like p{a,b,c} to be extended to allow p{a S # b S c} (where S ::= ; | \n). I could then do something cute, like # have l_scan initialize l_POS and l_SUBJ for list l, and then return # a control procedure geared for {}-style invocation: # # l_scan(l) { # a # b # c # } # # I just think this *looks* like a control structure (much more so # than the kludge I give below). I'd just set up the control pro- # cedure to handle variable-length arguments. # # I envision l_scan being used (and in fact use it myself) to parse # trees and compare complicated list structures. One big time-saver # in parsing natural languages, for instance, is to store partial # trees. The l_scan routines enable you to take your list structures # and say, "Give me that portion of list 2 which corresponds # structurally to list 1." Or you can say, "Tell me if the first # portion of x list corresponds structurally to y." In effect, the # structure of the tree can be checked without necessitating a # reparse. # # BUGS: Can you say slow? I thought you could. # ############################################################################ # # Here's a trivial example of how one might utilize the lscan routines: # # procedure main() # # l := ["h","e","l","l","o"," ","t","t","t","h","e","r","e"] # # l_scan(l) # hello_list := l_tab(l_match(["h","e","l","l","o"])) # every writes(!hello_list) # write() # # l_scan(l_tab(0)) # l_tab(l_many([[" "],["t"]]) - 1) # every writes(!l_tab(0)) # write() # end_l_scan() # # end_l_scan() # # end # # The above program simply writes "hello" and "there" on successive # lines to the standard output. # # PITFALLS: In general, note that we are comparing lists here instead # of strings, so l_find("h", l), for instance, will yield an error # message (use l_find(["h"], l) instead). The point at which I # expect this nuance will be most confusing will be in cases where # one is looking for lists within lists. Suppose we have a list, # # l1 := ["junk",[["hello"]," ",["there"]],"!","m","o","r","e","junk"] # # and suppose, moreover, that we wish to find the position in l1 at # which the list # # [["hello"]," ",["there"]] # # occurs. If, say, we assign [["hello"]," ",["there"]] to the # variable l2, then our l_find() expression will need to look like # # l_scan(l1) # l_tab(l_find([l2])) # etc. # # The reason l2 must be enclosed within a list is that we are # searching for a single element within l1, not a series of three # elements. Just as l_find("h") would not work correctly above, so # l_find(l2) will not work here (although in this latter case, no # error message is generated; the expression merely fails when it # finds no sequence of elements in l1 corresponding to the sequence # of elements occurring in l2). # ############################################################################ global l_POS global l_SUBJ procedure l_scan(l) if /l_POS then { l_POS := [] l_SUBJ := [] } push(l_POS, 1) push(l_SUBJ, l) return end procedure end_l_scan() every pop(l_POS|l_SUBJ) if *l_POS = 0 then { l_POS := &null l_SUBJ := &null } return end procedure l_move(i) /i & stop("l_move: Null argument.") if /l_POS | /l_SUBJ then stop("l_move: Call l_scan first.") suspend l_SUBJ[1][.l_POS[1]:l_POS[1] <- (0 < (*l_SUBJ[1]+1 >= l_POS[1]+i))] end procedure l_tab(i) /i & stop("l_tab: Null argument.") if /l_POS | /l_SUBJ then stop("l_tab: Call l_scan first.") if i = 0 then suspend l_SUBJ[1][.l_POS[1]:l_POS[1] <- *l_SUBJ[1]+1] else { if i < 0 then suspend l_SUBJ[1][.l_POS[1]:l_POS[1] <- (0 < (*l_SUBJ[1]+1) + i)] else suspend l_SUBJ[1][.l_POS[1]:l_POS[1] <- (*l_SUBJ[1]+1 >= i)] } end procedure l_any(l1,l2,i,j) # Like any(c,s2,i,j) except that the string & cset arguments are # replaced by list arguments. l1 must be a list of one-element # lists, while l2 can be any list (l_SUBJ[1] by default). local sub_l /l1 & stop("l_any: Null first argument!") if type(l1) == "set" then l1 := sort(l1) /l2 := l_SUBJ[1] /i := \l_POS[1] | 1 /j := *l_SUBJ[1]+1 (i+1) > j & fail every sub_l := !l1 do { if not (type(sub_l) == "list", *sub_l = 1) then stop("l_any: Elements of l1 must be lists of length 1.") if x := l_match(sub_l,l2,i,i+1) then return x } end procedure l_match(l1,l2,i,j) # Analogous to match(s1,s2,i,j), except that s1 and s2 are lists, # and l_match returns the next position in l2 after that portion # (if any) which is structurally identical to l1. If a match is not # found, l_match fails. if /l1 then stop("l_match: Null first argument!") if type(l1) ~== "list" then stop("l_match: Call me with a list as the first arg.") /l2 := l_SUBJ[1] /i := \l_POS[1] | 1 /j := *l_SUBJ[1]+1 i + *l1 > j & fail if l_comp(l1,l2[i+:*l1]) then return i + *l1 end procedure l_comp(l1,l2) # List comparison routine basically taken from Griswold & Griswold # (1st ed.), p. 174. local i /l1 | /l2 & stop("l_comp: Null argument!") l1 === l2 & (return l2) if type(l1) == type(l2) == "list" then { *l1 ~= *l2 & fail every i := 1 to *l1 do l_comp(l1[i],l2[i]) | fail return l2 } end procedure l_find(l1,l2,i,j) local x /l1 & stop("l_find: Null first argument!") /l2 := l_SUBJ[1] /i := \l_POS[1] | 1 /j := *l_SUBJ[1]+1 every x := i to ((*l2+1) - *l1) do { if l_match(l1,l2,x,j) then suspend x } end procedure l_upto(l1,l2,i,j) local x /l1 & stop("l_upto: Null first argument!") if type(l1) == "set" then l1 := sort(l1) /l2 := l_SUBJ[1] /i := \l_POS[1] | 1 /j := *l_SUBJ[1]+1 every x := i to ((*l2+1) - *l1) do { if l_any(l1,l2,x,j) then suspend x } end procedure l_many(l1,l2,i,j) /l1 & stop("l_many: Null first argument!") if type(l1) == "set" then l1 := sort(l1) /l2 := l_SUBJ[1] /i := \l_POS[1] | 1 /j := *l_SUBJ[1]+1 l_scan(l2) while x := l_any(l1,,,j) do l_tab(x) end_l_scan(l2) return \x end procedure l_pos(i) local x if /l_POS | /l_SUBJ then stop("l_move: Call l_scan first.") if i <= 0 then x := 0 < (*l_SUBJ[1]+1 >= (*l_SUBJ[1]+1)+i) | fail else x := 0 < (*l_SUBJ[1]+1 >= i) | fail if x = l_POS[1] then return x else fail end procedure l_bal(l1,l2,l3,l,i,j) local l2_count, l3_count, x, position /l1 & stop("l_bal: Null first argument!") if type(l1) == "set" then l1 := sort(l1) # convert to a list if type(l2) == "set" then l1 := sort(l2) if type(l3) == "set" then l1 := sort(l3) /l := l_SUBJ[1] /i := \l_POS[1] | 1 /j := *l_SUBJ[1]+1 l2_count := l3_count := 0 every x := i to j-1 do { if l_any(l2, l, x, x+1) then { l2_count +:= 1 } if l_any(l3, l, x, x+1) then { l3_count +:= 1 } if l2_count = l3_count then { if l_any(l1,l,x,x+1) then suspend x } } end -Richard L. Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer
kwalker@CS.ARIZONA.EDU (Kenneth Walker) (08/04/90)
> Date: 3 Aug 90 16:19:17 GMT > From: ux1.cso.uiuc.edu!midway!sophist!goer@iuvax.cs.indiana.edu > I want very much to prototype this new scanning method using user- > defined control structures, but it's proving very hard. Emulating regular string scanning using user-defined control structures is also awkward. There is a way to emulate string scanning using nested procedure calls. It does not correctly handle breaking out of string scanning (but then neither did the Icon intepreter until V7!). The idea is to translate expr1 ? expr2 into escan(bscan(expr1), expr2) (Of course, expr2 can be a compound expression, if you want it to be.) This allows the setup routine, bscan, to comunicate with the termination routine, escan. It allows nesting of scanning along with full backtracking into and out of scanning. In addition, escan returns the value of the scanning expression. Saved scanning environments are stored in records record scan_env(subject, pos) procedure bscan(expr1) local outer_env outer_env := scan_env(&subject, &pos) &subject := expr1 &pos := 1 suspend outer_env # pass saved environment to escan &subject = outer_env.subject &pos = outer_env.pos fail # resume expression for expr1 end procedure escan(outer_env, expr2) local inner_env inner_env := scan_env(&subject, &pos) &subject = outer_env.subject &pos = outer_env.pos suspend expr2 outer_env.subject = &subject outer_env.pos = &pos &subject = inner_env.subject &pos = inner_env.pos fail # resume expression for expr2 end (disclaimer: these routines are based on code that works, but I typed them in without trying to translate or run them) This bscan-escan expression works well if you are using a preprocess (for example, one bassed on the variant translator system) to translate from a notation that looks like a control structure, but looks rather odd if you code it directly. However, it may give you some ideas. The only way I know of to solve the "break" problem is to use something more primative than suspend (break just wipes out suspened generators without considering that something more might need to be done), but you can only do this in the run-time system routines written in C, not within Icon itself. Ken Walker / Computer Science Dept / Univ of Arizona / Tucson, AZ 85721 +1 602 621-4324 kwalker@cs.arizona.edu {uunet|allegra|noao}!arizona!kwalker