[comp.lang.icon] fits posting; new l_scan

goer@sophist.uucp (Richard Goerwitz) (08/03/90)

I'm having fits posting here because of mail reorganizations at
the U of Chicago.  I assume nothing I've posted in the last week
has gotten through.

Anyway, I've just been trying to post a newer version of the list
scanning routines I posted a couple of weeks back.  This new ver-
sion fixes a couple of bugs and provides the "missing" l_bal()
function.

I want very much to prototype this new scanning method using user-
defined control structures, but it's proving very hard.  Let's say
you want to write -

    repeat {
      do something
      l_scan(l) {
          expresion1,
          expression2 & break
      }
    }

Okay, the l_scan(l) routine sets pos and subject for the list, and
then returns a procedure geared for {}-style invocation.  Expres-
sion1 and expression2 & break are the arguments to this procedure.
The trouble is that, even if we could say "create break" (which we
can't), we end up with a break in an invalid context.

The user-defined control structures seem not to allow real Iconish
control.  Is this the case?  If so, no matter.  If not (i.e. if I
am overlooking something), could someone kindly enlighten me?  Alter-
natively, if someone has an idea about how to prototype new control
structures, could they perhaps post a followup?

I dearly hope that our news software & mail router doesn't barf on
this one.

Anyway, here's the code:



############################################################################
#
#	Name:	lscan.icn
#
#	Title:	Quasi ? scanning routines for lists.
#
#	Author:	Richard L. Goerwitz
#
#	Version: 1.16
#
############################################################################
#
#  Copyright (c) 1990, Richard L. Goerwitz, III
#
#  This software is intended for free and unrestricted distribution.
#  I place only two conditions on its use:  1) That you clearly mark
#  any additions or changes you make to the source code, and 2) that
#  you do not delete this message from it.  In order to protect
#  myself from spurious litigation, it must also be stated here that,
#  because this is free software, I, Richard Goerwitz, make no claim
#  about the applicability or fitness of this software for any
#  purpose, and disclaim any responsibility for any damages that
#  might be incurred in conjunction with its use.
#
############################################################################
#
#  PURPOSE: String scanning is terrific, but often I am forced to
#  tokenize and work with lists.  So as to make operations on these
#  lists as close to corresponding string operations as possible, I've
#  implemented a series of list analogues to any(), bal(), find(),
#  many(), match(), move(), pos(), tab(), and upto().  Their names are
#  just like corresponding string functions, except with a prepended
#  "l_" (e.g. l_any()).  Functionally, the list routines parallel the
#  string ones closely, except that in place of strings, l_find and
#  l_match accept lists as their first argument.  L_any(), l_many(),
#  and l_upto() all take either sets or lists of lists (e.g.
#  l_tab(l_upto([["a"],["b"],["j","u","n","k"]])).  Note that l_bal(),
#  unlike the builtin bal(), has no defaults for the first four
#  arguments.  This just seemed appropriate, given that no precise
#  list analogue to &cset, etc. occurs.
#
#  Since Icon does not permit me to define new control structures via
#  infix operators, list scanning must be initialized via a procedure,
#  namely l_scan(l), where l is the list to be scanned.  Scanning is
#  ended by calling end_l_scan(l).  Nesting is permitted, but you'd
#  better make sure your indentation is correct.  Since l_scan() does
#  not represent a real control structure, you must call end_l_scan()
#  before "breaking" from an l_scan.  Otherwise l_POS and l_SUBJ will
#  not be handled properly.
#
#  If someone can think of a way to do this more elegantly let me
#  know.  Personally, I'd like p{a,b,c} to be extended to allow p{a S
#  b S c} (where S ::= ; | \n).  I could then do something cute, like
#  have l_scan initialize l_POS and l_SUBJ for list l, and then return
#  a control procedure geared for {}-style invocation:
#
#      l_scan(l) {
#          a
#          b
#          c
#      }
#
#  I just think this *looks* like a control structure (much more so
#  than the kludge I give below).  I'd just set up the control pro-
#  cedure to handle variable-length arguments.
#
#  I envision l_scan being used (and in fact use it myself) to parse
#  trees and compare complicated list structures.  One big time-saver
#  in parsing natural languages, for instance, is to store partial
#  trees.  The l_scan routines enable you to take your list structures
#  and say, "Give me that portion of list 2 which corresponds
#  structurally to list 1."  Or you can say, "Tell me if the first
#  portion of x list corresponds structurally to y."  In effect, the
#  structure of the tree can be checked without necessitating a
#  reparse.
#
#  BUGS:  Can you say slow?  I thought you could.
#
############################################################################
#
#  Here's a trivial example of how one might utilize the lscan routines:
#
#  procedure main()
#
#      l := ["h","e","l","l","o"," ","t","t","t","h","e","r","e"]
#
#      l_scan(l)
#          hello_list := l_tab(l_match(["h","e","l","l","o"]))
#          every writes(!hello_list)
#          write()
#
#	   l_scan(l_tab(0))
#	       l_tab(l_many([[" "],["t"]]) - 1)
#              every writes(!l_tab(0))
#	       write()
#          end_l_scan()
#  
#      end_l_scan()
#  
#  end
#
#  The above program simply writes "hello" and "there" on successive
#  lines to the standard output.
#
#  PITFALLS: In general, note that we are comparing lists here instead
#  of strings, so l_find("h", l), for instance, will yield an error
#  message (use l_find(["h"], l) instead).  The point at which I
#  expect this nuance will be most confusing will be in cases where
#  one is looking for lists within lists.  Suppose we have a list,
#
#      l1 := ["junk",[["hello"]," ",["there"]],"!","m","o","r","e","junk"]
#
#  and suppose, moreover, that we wish to find the position in l1 at
#  which the list
#
#      [["hello"]," ",["there"]]
#
#  occurs.  If, say, we assign [["hello"]," ",["there"]] to the
#  variable l2, then our l_find() expression will need to look like
#
#      l_scan(l1)
#          l_tab(l_find([l2]))
#      etc.
#
#  The reason l2 must be enclosed within a list is that we are
#  searching for a single element within l1, not a series of three
#  elements.  Just as l_find("h") would not work correctly above, so
#  l_find(l2) will not work here (although in this latter case, no
#  error message is generated; the expression merely fails when it
#  finds no sequence of elements in l1 corresponding to the sequence
#  of elements occurring in l2).
#
############################################################################

global l_POS
global l_SUBJ



procedure l_scan(l)

    if /l_POS then {
	l_POS  := []
	l_SUBJ := []
    }

    push(l_POS, 1)
    push(l_SUBJ, l)
    return

end



procedure end_l_scan()

    every pop(l_POS|l_SUBJ)
    if *l_POS = 0 then {
	l_POS  := &null
	l_SUBJ := &null
    }
    return

end



procedure l_move(i)

    /i & stop("l_move:  Null argument.")
    if /l_POS | /l_SUBJ
    then stop("l_move:  Call l_scan first.")
    suspend l_SUBJ[1][.l_POS[1]:l_POS[1] <- (0 < (*l_SUBJ[1]+1 >= l_POS[1]+i))]

end



procedure l_tab(i)

    /i & stop("l_tab:  Null argument.")
    if /l_POS | /l_SUBJ
    then stop("l_tab:  Call l_scan first.")

    if i = 0
    then suspend l_SUBJ[1][.l_POS[1]:l_POS[1] <- *l_SUBJ[1]+1]
    else {
	if i < 0
	then suspend l_SUBJ[1][.l_POS[1]:l_POS[1] <- (0 < (*l_SUBJ[1]+1) + i)]
	else suspend l_SUBJ[1][.l_POS[1]:l_POS[1] <- (*l_SUBJ[1]+1 >= i)]
    }

end



procedure l_any(l1,l2,i,j)

    # Like any(c,s2,i,j) except that the string & cset arguments are
    # replaced by list arguments.  l1 must be a list of one-element
    # lists, while l2 can be any list (l_SUBJ[1] by default).

    local sub_l

    /l1 & stop("l_any:  Null first argument!")
    if type(l1) == "set" then l1 := sort(l1)

    /l2 := l_SUBJ[1]
    /i  := \l_POS[1] | 1
    /j  := *l_SUBJ[1]+1

    (i+1) > j & fail
    every sub_l := !l1 do {
	if not (type(sub_l) == "list", *sub_l = 1) then
	    stop("l_any:  Elements of l1 must be lists of length 1.")
	if x := l_match(sub_l,l2,i,i+1)
	then return x
    }
    
end



procedure l_match(l1,l2,i,j)

    # Analogous to match(s1,s2,i,j), except that s1 and s2 are lists,
    # and l_match returns the next position in l2 after that portion
    # (if any) which is structurally identical to l1.  If a match is not
    # found, l_match fails.

    if /l1
    then stop("l_match:  Null first argument!")
    if type(l1) ~== "list"
    then stop("l_match:  Call me with a list as the first arg.")

    /l2 := l_SUBJ[1]
    /i  := \l_POS[1] | 1
    /j  := *l_SUBJ[1]+1

    i + *l1 > j & fail
    if l_comp(l1,l2[i+:*l1])
    then return i + *l1

end

    

procedure l_comp(l1,l2)

    # List comparison routine basically taken from Griswold & Griswold
    # (1st ed.), p. 174.

    local i

    /l1 | /l2 & stop("l_comp:  Null argument!")
    l1 === l2 & (return l2)

    if type(l1) == type(l2) == "list" then {
	*l1 ~= *l2 & fail
	every i := 1 to *l1
	do l_comp(l1[i],l2[i]) | fail
	return l2
    }

end



procedure l_find(l1,l2,i,j)

    local x

    /l1 & stop("l_find:  Null first argument!")

    /l2 := l_SUBJ[1]
    /i  := \l_POS[1] | 1
    /j  := *l_SUBJ[1]+1

    every x := i to ((*l2+1) - *l1) do {
	if l_match(l1,l2,x,j)
	then suspend x
    }
    
end



procedure l_upto(l1,l2,i,j)

    local x

    /l1 & stop("l_upto:  Null first argument!")
    if type(l1) == "set" then l1 := sort(l1)

    /l2 := l_SUBJ[1]
    /i  := \l_POS[1] | 1
    /j  := *l_SUBJ[1]+1

    every x := i to ((*l2+1) - *l1) do {
	if l_any(l1,l2,x,j)
	then suspend x
    }
    
end



procedure l_many(l1,l2,i,j)

    /l1 & stop("l_many:  Null first argument!")
    if type(l1) == "set" then l1 := sort(l1)

    /l2 := l_SUBJ[1]
    /i  := \l_POS[1] | 1
    /j  := *l_SUBJ[1]+1

    l_scan(l2)
    while x := l_any(l1,,,j)
    do l_tab(x)
    end_l_scan(l2)

    return \x

end



procedure l_pos(i)

    local x

    if /l_POS | /l_SUBJ
    then stop("l_move:  Call l_scan first.")

    if i <= 0
    then x := 0 < (*l_SUBJ[1]+1 >= (*l_SUBJ[1]+1)+i) | fail
    else x := 0 < (*l_SUBJ[1]+1 >= i) | fail

    if x = l_POS[1]
    then return x
    else fail

end



procedure l_bal(l1,l2,l3,l,i,j)

    local l2_count, l3_count, x, position

    /l1 & stop("l_bal:  Null first argument!")
    if type(l1) == "set" then l1 := sort(l1)  # convert to a list
    if type(l2) == "set" then l1 := sort(l2)
    if type(l3) == "set" then l1 := sort(l3)

    /l := l_SUBJ[1]
    /i  := \l_POS[1] | 1
    /j  := *l_SUBJ[1]+1

    l2_count := l3_count := 0

    every x := i to j-1 do {

	if l_any(l2, l, x, x+1) then {
	    l2_count +:= 1
	}
	if l_any(l3, l, x, x+1) then {
	    l3_count +:= 1
	}
	if l2_count = l3_count then {
	    if l_any(l1,l,x,x+1)
	    then suspend x
	}
    }

end

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer

kwalker@CS.ARIZONA.EDU (Kenneth Walker) (08/04/90)

> Date: 3 Aug 90 16:19:17 GMT
> From: ux1.cso.uiuc.edu!midway!sophist!goer@iuvax.cs.indiana.edu

> I want very much to prototype this new scanning method using user-
> defined control structures, but it's proving very hard.

Emulating regular string scanning using user-defined control structures
is also awkward. There is a way to emulate string scanning using nested
procedure calls. It does not correctly handle breaking out of string
scanning (but then neither did the Icon intepreter until V7!).

The idea is to translate

   expr1 ? expr2

into

   escan(bscan(expr1), expr2)

(Of course, expr2 can be a compound expression, if you want it to be.)
This allows the setup routine, bscan, to comunicate with the termination
routine, escan. It allows nesting of scanning along with full backtracking
into and out of scanning. In addition, escan returns the value of the
scanning expression.

Saved scanning environments are stored in records

   record scan_env(subject, pos)

   procedure bscan(expr1)
      local outer_env

      outer_env := scan_env(&subject, &pos)
      &subject := expr1
      &pos := 1
      suspend outer_env   # pass saved environment to escan
      &subject = outer_env.subject
      &pos = outer_env.pos
      fail  # resume expression for expr1
   end

   procedure escan(outer_env, expr2)
      local inner_env

      inner_env := scan_env(&subject, &pos)
      &subject = outer_env.subject
      &pos = outer_env.pos
      suspend expr2
      outer_env.subject = &subject
      outer_env.pos = &pos
      &subject = inner_env.subject
      &pos = inner_env.pos
      fail   # resume expression for expr2
   end

(disclaimer: these routines are based on code that works, but I
typed them in without trying to translate or run them)

This bscan-escan expression works well if you are using a preprocess
(for example, one bassed on the variant translator system) to translate
from a notation that looks like a control structure, but looks
rather odd if you code it directly. However, it may give you some
ideas.

The only way I know of to solve the "break" problem is to use something
more primative than suspend (break just wipes out suspened generators
without considering that something more might need to be done), but you
can only do this in the run-time system routines written in C, not within
Icon itself.

  Ken Walker / Computer Science Dept / Univ of Arizona / Tucson, AZ 85721
  +1 602 621-4324  kwalker@cs.arizona.edu {uunet|allegra|noao}!arizona!kwalker