[comp.sources.d] Perl help requested

jimmy@pyramid.pyramid.com (Jimmy Aitken) (05/09/89)

Since perl was posted to comp.sources*, I've posted this request here.
I couldn't think of a more appropriate newsgroup so here goes.

I've got a perl program that requires an array to be sorted ignoring
case and non alpha-numeric characters. i.e. The equivalent of
'sort -fd'. Currently I do it by:

@sorted=sort fieldsort @list;
	|
	|
sub fieldsort {
    local($x, $y) = ($a, $b);
    $x =~ tr/A-Z/a-z/;
    $x =~ tr/ -@{-}//d;
    $y =~ tr/A-Z/a-z/;
    $y =~ tr/ -@{-}//d;
    $x lt $y ? -1 : $x gt $y ? 1 : 0;
}

To my mind this is ugly and uses about twice the user time compared to
a simple 'sort @list'.  Can anyone come up with a cleaner/faster way
of doing this operation?  Thanks for any help.

jimmy
-- 
      -m------   Jimmy Aitken
    ---mmm-----  On Loan from: Pyramid Technology Ltd., U.K.
  -----mmmmm---  To:           Pyramid Technology Corp, U.S.A
-------mmmmmmm-  {uunet, decwrl}!pyramid!jimmy

jgreely@previous.cis.ohio-state.edu (J Greely) (05/09/89)

In article <69461@pyramid.pyramid.com> jimmy@pyramid.pyramid.com
 (Jimmy Aitken) writes:
>I've got a perl program that requires an array to be sorted ignoring
>case and non alpha-numeric characters. i.e. The equivalent of
>'sort -fd'.

The *fastest* way may be to actually use "sort -fd", but that violates
the kitchen sink principle of perl, so here goes.

>sub fieldsort {
>    local($x, $y) = ($a, $b);
>    $x =~ tr/A-Z/a-z/;
>    $x =~ tr/ -@{-}//d;
>    $y =~ tr/A-Z/a-z/;
>    $y =~ tr/ -@{-}//d;
>    $x lt $y ? -1 : $x gt $y ? 1 : 0;
>}

The biggest problem with this is that you're munging both strings for
each comparison, which is where your time is wasted.  One
ugly-but-faster way to do the job would be to pre-chew the array, like
this:

# create sorting array, appending position in original array
# (so we can later move the real array into matching order)
#
@sort_list = ();
for ($line=0;$line <= $#list; $line++) {
  $temp = $list[$line];
  $temp =~ tr/A-Z/a-z/;
  $temp =~ tr/ -@{-}//d;
  $temp .= "A$line"; # choose something not in any string.  We *know*
                     # there are no capital A's anymore.  You might prefer
                     # something like DEL, which would sort after anything
  push(@sort_list,$temp);
}
@sort_list = sort(@sort_list);
#
# loop through the sorted copy, replacing the hacked version with
# the original
#
for ($line=0;$line <= $#sort_list;$line++) {
  ($temp,$num) = split(/A/,$sort_list[$line]);
  $sort_list[$line] = $list[$num];
}
# sort_list now contains the original array, correctly sorted

This isn't perfect, and it *is* ugly, but it basically works.  Having
typed this, I would probably just break down and call sort, but it's
a nice exercise.
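
For comparison, here is a minimal sketch of the same "munge each string
once" idea, assuming a modern Perl 5 interpreter (it relies on `my`,
array references, and `cmp`). It uses the decorate-sort-undecorate
idiom, often called the Schwartzian transform, rather than the
appended-index trick above. The names sort_key, sort_in_comparator,
sort_precomputed, and $key_calls are made up for illustration. Note
one deliberate difference: the key here keeps digits (tr/a-z0-9//cd),
whereas the ' -@' range in the code above strips them as well.

```perl
use strict;
use warnings;

my $key_calls = 0;    # counts how often a sort key is computed

# Build the comparison key: fold case, then delete every character
# that is not a lowercase letter or a digit.
sub sort_key {
    my ($s) = @_;
    $key_calls++;
    $s =~ tr/A-Z/a-z/;
    $s =~ tr/a-z0-9//cd;
    return $s;
}

# Approach 1: compute both keys inside every comparison.
sub sort_in_comparator {
    my @list = @_;
    return sort { sort_key($a) cmp sort_key($b) } @list;
}

# Approach 2: decorate each element with its key, sort on the key,
# then strip the key again.  Each key is computed exactly once.
sub sort_precomputed {
    my @list = @_;
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ sort_key($_), $_ ] } @list;
}

my @list = ('b-eta', 'Alpha', 'GAMMA!', 'alpha 2', 'Beta1', 'delta');

$key_calls = 0;
my @slow = sort_in_comparator(@list);
my $slow_calls = $key_calls;

$key_calls = 0;
my @fast = sort_precomputed(@list);
my $fast_calls = $key_calls;

print join(', ', @fast), "\n";
print "keys computed: $slow_calls in comparator, $fast_calls precomputed\n";
```

Both approaches should return the list in the order Alpha, alpha 2,
b-eta, Beta1, delta, GAMMA!. The precomputed version calls sort_key
once per element (6 times here), while the in-comparator version calls
it twice per comparison, so the gap widens as the list grows.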

-=-
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)