[comp.lang.perl] How to sort on right most column

rock@warp.Eng.Sun.COM (Bill Petro) (05/15/91)

How could I do this using either the sort command, or perl or awk?

I have the following data in this format - I want to sort it on the
right most column.  The columns are not tab delimited, and some of the
columns have arbitrary numbers of words (specifically the third column
from the left, column c).  Sort assumes that you start numbering on the
left, and count from 0.  How would you start with the right most
column, column e?

  0    1   2                              3                     4

  a    b   c                              d                     e

 FOO  BAR ACE CORPORATION                SUNNYVALE              2.00
 FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00
 FOO  BAR APOLLO COMPUTER, INC.          CHELMSFORD             1.00
 FOO  BAR APPLE COMPUTER, INC.           CUPERTINO              8.00
 FOO  BAR BOEING                         TUKWILA               53.00
 FOO  BAR BOEING COMPUTER SERVICES       EDDYSTONE              2.00
 FOO  BAR CITIBANK N. A.                 ANDOVER                4.00
 FOO  BAR CITIBANK NORTH AMERICA         LONG ISLAND CITY      26.00


Thanks!


--
     Bill Petro  {decwrl,hplabs,ucbvax}!sun!Eng!rock
"UNIX for the sake of the kingdom of heaven"  Matthew 19:12

merlyn@iwarp.intel.com (Randal L. Schwartz) (05/15/91)

In article <13320@exodus.Eng.Sun.COM>, rock@warp (Bill Petro) writes:
| I have the following data in this format - I want to sort it on the
| right most column.  The columns are not tab delimited, and some of the
| columns have arbitrary numbers of words (specifically the third column
| from the left, column c).  Sort assumes that you start numbering on the
| left, and count from 0.  How would you start with the right most
| column, column e?
| 
|   0    1   2                              3                     4
| 
|   a    b   c                              d                     e
| 
|  FOO  BAR ACE CORPORATION                SUNNYVALE              2.00
|  FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00
|  FOO  BAR APOLLO COMPUTER, INC.          CHELMSFORD             1.00
|  FOO  BAR APPLE COMPUTER, INC.           CUPERTINO              8.00
|  FOO  BAR BOEING                         TUKWILA               53.00
|  FOO  BAR BOEING COMPUTER SERVICES       EDDYSTONE              2.00
|  FOO  BAR CITIBANK N. A.                 ANDOVER                4.00
|  FOO  BAR CITIBANK NORTH AMERICA         LONG ISLAND CITY      26.00

Well, here's a terribly inefficient one, good enough for small amounts
of data:

##################################################
#!/usr/bin/perl

sub bylastcol {
	@a = split(/\s+/, $a);
	@b = split(/\s+/, $b);
	pop(@a) <=> pop(@b);
}

print sort bylastcol <>;
##################################################

This is inefficient because the "last column" is computed and
recomputed over and over again on each compare.  A better way would be
to compute it once and cache it:

##################################################
#!/usr/bin/perl

sub once {
	return $n if defined($n = $once{$_[0]});
	@a = split(/\s+/, $_[0]);
	$once{$_[0]} = pop(@a);
}


sub bylastcol {
	&once($a) <=> &once($b);
}

print sort bylastcol <>;
##################################################

An even better way requires the cooperation of the calling routine: it
computes the keys once into a normal array (indexed in parallel with the
data) rather than an associative array, then sorts the indices, as in:

##################################################
#!/usr/bin/perl

sub byaux {
	$aux[$a] <=> $aux[$b];
}

@data = <>;
for (@data) {
	@a = split(/\s+/, $_);
	push(@aux, pop(@a));
}

print @data[sort byaux $[..$#data];
##################################################
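
For readers on Perl 5, the same compute-each-key-once idea can be
sketched as a single map/sort/map pipeline.  This is only a sketch under
stated assumptions, not part of the post above: the name
sort_by_last_field is hypothetical, blank lines are dropped, and every
remaining line is assumed to end in a numeric field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort lines numerically on their last
# whitespace-separated field, extracting each key only once.
sub sort_by_last_field {
    return map  { $_->[1] }                      # 4. keep only the line
           sort { $a->[0] <=> $b->[0] }          # 3. compare the cached keys
           map  { [ (split ' ', $_)[-1], $_ ] }  # 2. pair each line with its key
           grep { /\S/ }                         # 1. skip blank lines
           @_;
}

# Sample rows taken from the question.
my @sample = (
    " FOO  BAR BOEING                         TUKWILA               53.00\n",
    " FOO  BAR ACE CORPORATION                SUNNYVALE              2.00\n",
    " FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00\n",
);
print sort_by_last_field(@sample);
```

To sort files or standard input instead of the sample rows, the last
line could be replaced with: print sort_by_last_field(<>);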

print "Just another Perl hacker,"
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/

bjaspan@ATHENA.MIT.EDU (Barr3y Jaspan) (05/16/91)

|> I have the following data in this format - I want to sort it on the
|> right most column.
|> 
|>   0    1   2                              3                     4
|> 
|>   a    b   c                              d                     e
|> 
|>  FOO  BAR ACE CORPORATION                SUNNYVALE              2.00
|>  FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00

(Introduction: I'm not sure what the first two lines ("0  1   2 ...")
are for, so I'm ignoring them. :-)

Well, since you know the sort key is the last number on each line, you
could do something like this (untested):

while (<>) {
	/([\d.]+)$/;
	$lines{$1} = $_;
}

for (sort numerically keys %lines) {
	print $lines{$_};
}

sub numerically {
	$a <=> $b;
}

The idea is to use the number at the end of a line as a key in an
associative array, storing each entire line of text keyed on its final
number.  (You could do this more "efficiently" with a normal array if
you knew (1) that all your numbers were integers and (2) that all the
numbers were "small".)  Then you just sort the keys and print the
respective lines of text, in order.

(You could also split the entire line on whitespace and use the last
element in the returned array as your key; I suspect (but I'm not
sure) the first way is faster.)
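
The two ways of extracting the key mentioned above can be put side by
side.  This is a minimal sketch, not a benchmark: the helper names
key_by_regex and key_by_split are hypothetical, and no claim is made
here about which one is faster.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Key as in the loop above: the run of digits and dots at end of line.
# Returns undef when the line does not end in such a run.
sub key_by_regex {
    my ($line) = @_;
    return $line =~ /([\d.]+)$/ ? $1 : undef;
}

# Alternative: split on whitespace and take the last field.
# Returns undef for a blank line.
sub key_by_split {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return @fields ? $fields[-1] : undef;
}

my $sample = " FOO  BAR CITIBANK NORTH AMERICA         LONG ISLAND CITY      26.00\n";
printf "regex: %s  split: %s\n", key_by_regex($sample), key_by_split($sample);
```

The two agree on the data rows, but not on every line: for a header
such as "a b c d e" the regex version finds no key at all, while the
split version returns "e", which would then be compared as a number.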

Barr3y Jaspan, bjaspan@mit.edu

arielf@tasu8c.UUCP (Ariel Faigon) (05/16/91)

+--- In the referenced article, Barr3y Jaspan wrote:
||> I have the following data in this format - I want to sort it on the
||> right most column.
||> 
||>   0    1   2                              3                     4
||> 
||>   a    b   c                              d                     e
||> 
||>  FOO  BAR ACE CORPORATION                SUNNYVALE              2.00
||>  FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00
|
| [ ... ]
|Well, since you know the sort key is the last number on each line, you
|could do something like this (untested):
|
|while (<>) {
|	/([\d.]+)$/;
|	$lines{$1} = $_;
|}
|
|for (sort numerically keys %lines) {
|	print $lines{$_};
|}
|
|sub numerically {
|	$a <=> $b;
|}
|
|The idea is to use the number at the end of a line as a key in an
|associative array, storing each entire line of text keyed on its final
|number. [more deleted ...]
+------

Barr3y's solution is of course the most straightforward.
One problem remains, though: since associative array keys are unique,
lines with the same key (same last column) will overwrite each other.
The solution is to concatenate lines with the same key (each line keeps
its trailing '\n', because Perl does not strip it on input), i.e. to
replace the line:
>       $lines{$1} = $_;
with
	$lines{$1} .= $_;

This should do what the original poster wanted.
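
Put together, the append-instead-of-assign fix might look like the
sketch below.  It is written in Perl 5 style, the helper name
sort_keeping_duplicates is hypothetical, and lines that do not end in a
number are skipped, which is an assumption beyond what the posts say.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group lines by their trailing number, appending
# so that lines sharing a key are all kept, then emit the groups in
# numeric key order.
sub sort_keeping_duplicates {
    my %lines;
    for my $line (@_) {
        next unless $line =~ /([\d.]+)$/;   # assumption: skip lines with no trailing number
        $lines{$1} .= $line;                # .= appends; plain = would overwrite
    }
    return map { $lines{$_} } sort { $a <=> $b } keys %lines;
}

# Two of these sample rows share the key 2.00.
my @sample = (
    " FOO  BAR ACE CORPORATION                SUNNYVALE              2.00\n",
    " FOO  BAR BOEING COMPUTER SERVICES       EDDYSTONE              2.00\n",
    " FOO  BAR APOLLO COMPUTER, INC.          CHELMSFORD             1.00\n",
);
print sort_keeping_duplicates(@sample);
```

Lines that share a key come out in their input order.  Grouping is by
the key's text, so "2.0" and "2.00" would form separate groups even
though they compare equal numerically.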
Peace, Ariel


Ariel Faigon
National Semiconductor Corp. (NSTA Design Center)
6 Maskit St.  P.O.B. 3007, Herzlia 46104, Israel   Tel. +972 52-522272
arielf@taux01.nsc.com