[comp.lang.perl] default associative array sort function

bjaspan@athena.mit.edu (Barr3y Jaspan) (05/14/91)

I frequently have associative arrays that map strings to integers;
say, $users{$username} = # of pages printed on a printer.  Sorting
such an associative array requires writing a special-purpose function

sub sort_users {
	$users{$a} <=> $users{$b};
}

and then call sort sort_users %users.  This works fine but is a pain;
I have to create a separate sort function for each assoc array, when
really I shouldn't have to create one at all (this is a frequent
enough operation, I think...)

How about adding a default sort-assoc-array-{numerically,
lexigraphically} function, so that (sort MAGIC_FUNC %users) would
work?  Even better would be if the sort function recognized that it
was dealing with an assoc array, figured out whether the values were
numbers or strings, and choose the right value, so that (sort %users)
would do the right thing.  ("Figuring out whether the values are
numbers or strings" is, of course, not necessarily trivial.  You could
implement some specific heuristic and document it, given the
programmer the choice of overriding the heuristic when necessary.)

print((sort (' another', ' Just', ' perl ', 'hacker,')), "\n");

Barr3y

tchrist@convex.COM (Tom Christiansen) (05/14/91)

From the keyboard of bjaspan@athena.mit.edu:
:I frequently have associative arrays that map strings to integers;
:say, $users{$username} = # of pages printed on a printer.  Sorting
:such an associative array requires writing a special-purpose function
:
:sub sort_users {
:	$users{$a} <=> $users{$b};
:}
:
:and then call sort sort_users %users.  This works fine but is a pain;
:I have to create a separate sort function for each assoc array, when
:really I shouldn't have to create one at all (this is a frequent
:enough operation, I think...)

Well, not really.... 

:How about adding a default sort-assoc-array-{numerically,
:lexigraphically} function, so that (sort MAGIC_FUNC %users) would
:work?  Even better would be if the sort function recognized that it
:was dealing with an assoc array, figured out whether the values were
:numbers or strings, and choose the right value, so that (sort %users)
:would do the right thing.  ("Figuring out whether the values are
:numbers or strings" is, of course, not necessarily trivial.  You could
:implement some specific heuristic and document it, given the
:programmer the choice of overriding the heuristic when necessary.)

Since perl already makes you declare your numeric sorts, I don't expect it
to go automatically determining a numeric versus lexical collating
sequence.  It makes me a little nervous to try, and you'd never make the
Europeans happy no matter collating sequence you chose. :-)

I have in the past suggested that sort functions be allowed to be
anonymous functions declared in-line, like this:

    @x = sort { $b <=> $a; } @y;

or 

    @keys = sort { $ary{$a} cmp $ary{$b}; } keys %ary;


Larry has expressed reticence to this idea in the past for several reasons:
difficulty of implementation, possible obscurity for long sort functions,
and finally because he thinks it ok to make people name their abstractions.

Still, I do like my idea anyway.  Would its implementation satisfy your
complaints?  Maybe we could start a petition. :-)


New language features aside, you don't have to declare a separate function
for each object you need to sort.  Here's a generic assoc array sorter:

    sub assoc_sort {
	local(*ary) = @_;
	sub _assoc_sort { $ary{$a} cmp $ary{$b}; }
	sort _assoc_sort keys %ary;
    } 

With invocation like this: 

    for $elt (&assoc_sort(*users)) {
	print "user ", $elt, " is ", $users{$elt}, "\n";
    } 

Here's another idea:

    $sortsub = &gensort('$user{$b} <=> $users{$a}');
    for $elt (sort $sortsub keys %user) {
	print "user ", $elt, " is ", $users{$elt}, "\n";
    } 

where &gensort looks like this:

    $gensort = "gensort000000000000000";
    sub gensort {
	local($code) = @_;
	eval 'sub '.++$gensort.'{'.$code.';}';
	die "compilation error: $code: $@" if $@;
	$gensort;
    } 

You could even combine these two ideas to pass in a reference to the
array to make it more generic.  

I tend to use the first method.

Someday if you'd like, I'll show you my routines to generically and
efficiently sort N-key multi-dimensional arrays in arbitrary key orders.
I could easily write a book (or at least a chapter) on sorting.  I guess
this may argue that it means sorting is too hard. :-(

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
		"So much mail, so little time." 

rbj@uunet.uu.net (Root Boy Jim) (05/16/91)

tchrist@convex.COM (Tom Christiansen) writes:

?I have in the past suggested that sort functions be allowed to be
?anonymous functions declared in-line, like this:
?
?    @x = sort { $b <=> $a; } @y;
?
?Larry has expressed reticence to this idea in the past for several reasons:
?difficulty of implementation, possible obscurity for long sort functions,
?and finally because he thinks it ok to make people name their abstractions.

I can see him now: "Nah, too much like LISP" :-)

?Someday if you'd like, I'll show you my routines ...

I show you mine if you show me yours :-)
-- 
		[rbj@uunet 1] stty sane
		unknown mode: sane