[comp.lang.icon] complicated sorting

day@grand.UUCP (Dave Yost) (01/01/88)

I need to sort a list of records
according to complicated sort criteria,
The standard icon sort() routine is
not making this task easy.  What I
need is something more like the C
library qsort routine that lets me
provide my own comparison routine.

Is there any hope of my finding something
like this in icon, or am I missing some
elegant way to do this with the existing
sort() routine?

  --dave yost

kwalker@ARIZONA.EDU ("Kenneth Walker") (01/04/88)

	Date: 1 Jan 88 05:33:51 GMT
	From: grand!day@uunet.uu.net  (Dave Yost)
	
	I need to sort a list of records
	according to complicated sort criteria,
	The standard icon sort() routine is
	not making this task easy.  What I
	need is something more like the C
	library qsort routine that lets me
	provide my own comparison routine.
	
	Is there any hope of my finding something
	like this in icon, or am I missing some
	elegant way to do this with the existing
	sort() routine?
	
The comparison routine used by the Icon sort function is built
into the run-time system and cannot be overriden without
modifying Icon.

Sometimes a comlicated sort criterion can be handled by constructing
artifical keys for records such that a key sorts the corrosponding
record where it needs to go in the sequence. The records can then
be put in a table using the keys and the table can be sorted to
produce the desired sequence of records. The artifical keys would
typically be strings. If such keys are not easy to produce, your
best bet is probably to write a sort procedure in Icon specific to
your application.

day@grand.UUCP (01/09/88)

    Date: Sun, 3 Jan 88 13:38:47 MST
    From: "Kenneth Walker" <arizona.edu!kwalker@uunet.UUCP>
    In-Reply-To: <397@grand.UUCP>
    To: day@grand.UUCP, icon-group@arizona.edu
    Subject: Re:  complicated sorting

    The comparison routine used by the Icon sort function is built
    into the run-time system and cannot be overriden without
    modifying Icon.

Right.

    Sometimes a comlicated sort criterion can be handled by constructing
    artifical keys for records such that a key sorts the corrosponding
    record where it needs to go in the sequence. The records can then
    be put in a table using the keys and the table can be sorted to
    produce the desired sequence of records. The artifical keys would
    typically be strings.

This is exactly what I am doing, and it's cumbersome.

    If such keys are not easy to produce, your best bet is probably
    to write a sort procedure in Icon specific to your application.

I believe the best solution is a more powerful sort function
to which the user supplies the comparison procedure, like qsort
in the C library.

The problem I am solving in icon (thank God and you people I'm
not trying to do it in C!) is the preparation of a book index.
The index is a list of entries consisting of a list of subentries
consisting of a list of page numbers.

A built-in function a la the C library qsort, would help.
It might look like this:

sortx(a, p)

    Sortlist is a stable sort which sorts list a and
    produces a new, sorted list.  Procedure p is called each
    time sortx needs to compare two list entries.  It takes
    two arguments and returns -1, 0, or 1 when the first
    argument is less than, equal to, or greater than the
    second argument, respectively.  If p is omitted, the
    standard icon sort order is used.

Since I wrote my first message on this subject, I now see
that even this proposed sortx doesn't really cut it for my
present problem.  It turns out to be the trivial case of what
I really need:

sortlist(a, p, depth)

    Sortlist is a stable sort which sorts list a and
    produces a new, sorted tree of lists of the specified
    depth.  In the simplest case, when the depth argument is
    1 or not specified, sortlist returns a simple sorted
    list.  If depth is 2, then a list of lists is returned,
    and so on for greater values of depth.

    Procedure p is called each time sortlist needs to
    compare two list entries.  If p is not specified, the
    standard icon sort order is used.

    When depth is not specified or is 1, procedure p must be
    in this form:

	procedure p(x1, x2)
	   return relation_result
	end

    where relation_result return value is -1, 0, or 1 when
    x1 is less than, equal to, or greater than x2,
    respectively.

    When depth is specified and is > 1, procedure p must be
    in this form:

	record sortcompare(relation, depth)

	procedure p(x1, x2)
	   return sortcompare(relation_result, depth_result)
	end

    The relation_result return value is -1, 0, or 1 when x1
    is less than, equal to, or greater than x2, respectively.

    The depth_result return value specifies how deep p had
    to go in making the comparison before deciding on the
    value for the relation field.  For efficiency, it is
    advisable to write procedure p so that it never bothers
    to compare to a depth greater than needed by its caller.

    For example, the list [ "aa", "ba", "bb", "c" ] sorted to
    a depth of 2 with the aid of a suitable comparison routine
    could yield: [ [ "aa" ], [ "ba", "bb" ], [ "c" ] ]

Such a function would be perfect for my application, and I
bet it would be a useful and powerful tool for other sorting
problems.

I don't have an algorithm to implement this.
I wish I did.  Any takers?

To be sure what I'm getting at is clear, here's an example:

procedure
cmp(x1, x2)
    if x2[1] ~== x1[1] then return sortcompare(x2[1] - x2[1], 1)
    if x2[2] ~== x1[2] then return sortcompare(x2[2] - x2[2], 2)
    return sortcompare(x2[3] - x2[3], 3)
end

Input:  ["aaa", "baa", "bba", "caa", "cab" ]

Call:   sortlist(Input, p, 1)
Output: [ "aaa", "baa", "bba", "caa", "cab" ]

Call:   sortlist(Input, cmp, 2)
Output: [
	  [ "aaa" ],
	  [ "baa", "bba" ],
	  [ "caa", "cab" ]
	]

Call:   sortlist(Input, cmp, 3)
Output: [
	  [
	    [ "aaa" ]
	  ],
	  [
	    [ "baa" ],
	    [ "bba" ]
	  ],
	  [
	    [ "caa", "cab" ]
	  ]
	]

Call:   sortlist(Input, cmp, 4)
Output: [
	  [
	    [
	      [ "aaa" ]
	    ]
	  ],
	  [
	    [
	      [ "baa" ]
	    ],
	    [
	      [ "bba" ]
	    ]
	  ],
	  [
	    [
	      [ "caa" ],
	      [ "cab" ]
	    ]
	  ]
	]

etc.

 --dave yost

kwalker@ARIZONA.EDU ("Kenneth Walker") (01/13/88)

	From: grand!day@uunet.UU.NET
	Date: Fri, 08 Jan 88 17:33:06 PST

        ...
	
	I believe the best solution is a more powerful sort function
	to which the user supplies the comparison procedure, like qsort
	in the C library.

This suggestion sounds to me like something worth putting on the list
of possible future enhancements to Icon. There is an actual list.
Inclusion in the list doesn't insure a feature will every get put in
the language, but it does mean that it will get consideration when (and if)
the next round of enhancements are implemented.
	
	Since I wrote my first message on this subject, I now see
	that even this proposed sortx doesn't really cut it for my
	present problem.  It turns out to be the trivial case of what
	I really need:
	
        [details omitted - essentially a routine to sort on multiple
         keys producing a tree structure]

I can beleive that such a routine is usefull occationally, but its need
doesn't seem to me to be wide spread enough to burden the language
with such a complex feature.

When I need to sort on multiple keys, I sometimes use multi-level tables.
Perhaps a modification of the following routine could be used for your
purposes.

#
# Sort records on 3 keys. The result is a list of lists of lists (that is,
#  a 3 level tree - one level for each key). The intermediate data structure
#  is a table of tables of tables; each level of tables is keyed on the
#  corrosponding record key.
#
record info(key1, key2, key3)

procedure main()
   local s, w_sp
   local rec
   local tbl, x
   local lst

   #
   # 1st level table
   #
   tbl := table()

   #
   # read records and store in intermediate table structure
   #
   while s := read() do {
      s ? rec := info(tab(many(&lcase)),
                      tab(upto(&lcase)) & tab(many(&lcase)),
                      tab(upto(&lcase)) & tab(many(&lcase)))
      #
      # put the record in the correct table, making sure the required 2nd
      # and 3rd level tables exist
      #
      /tbl[rec.key1] := table()
      x := tbl[rec.key1]
      /x[rec.key2] := table()
      x := x[rec.key2]
      x[rec.key3] := rec
      }

   #
   # convert each level of table into a sorted list
   #
   lst := sort_tbl(tbl)

   do_whatever(lst)
end

procedure sort_tbl(tbl)
   local l1, l2

   l1 := sort(tbl, 3)

   #
   # remove keys from list, sorting entries if they are tables
   #
   l2 := list()
   get(l1)
   if type(l1[1]) == "table" then
      while put(l2, sort_tbl(get(l1))) do
         get(l1)
   else
      while put(l2, get(l1)) do
         get(l1)
   return l2
end