[comp.unix.shell] How to sort on right most column

rock@warp.Eng.Sun.COM (Bill Petro) (05/15/91)

How could I do this using either the sort command, or perl or awk?

I have the following data in this format - I want to sort it on the
right most column.  The columns are not tab delimited, and some of the
columns have arbitrary numbers of words (specifically the third column
from the left, column c).  Sort assumes that you start numbering on the
left, and count from 0.  How would you start with the right most
column, column e?

  0    1   2                              3                     4

  a    b   c                              c                     e

 FOO  BAR ACE CORPORATION                SUNNYVALE              2.00
 FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00
 FOO  BAR APOLLO COMPUTER, INC.          CHELMSFORD             1.00
 FOO  BAR APPLE COMPUTER, INC.           CUPERTINO              8.00
 FOO  BAR BOEING                         TUKWILA               53.00
 FOO  BAR BOEING COMPUTER SERVICES       EDDYSTONE              2.00
 FOO  BAR CITIBANK N. A.                 ANDOVER                4.00
 FOO  BAR CITIBANK NORTH AMERICA         LONG ISLAND CITY      26.00


Thanks!


--
     Bill Petro  {decwrl,hplabs,ucbvax}!sun!Eng!rock
"UNIX for the sake of the kingdom of heaven"  Matthew 19:12

tchrist@convex.COM (Tom Christiansen) (05/15/91)

From the keyboard of rock@warp.Eng.Sun.COM (Bill Petro):
:How could I do this using either the sort command, or perl or awk?
:
:I have the following data in this format - I want to sort it on the
:right most column.  The columns are not tab delimited, and some of the
:columns have arbitrary numbers of words (specifically the third column
:from the left, column c).  Sort assumes that you start numbering on the
:left, and count from 0.  How would you start with the right most
:column, column e?
:
: FOO  BAR ACE CORPORATION                SUNNYVALE              2.00
: FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00

How about?
    sort -t^A +0.60n

Assumptions:
    no control-A's in data
    the sort should be numeric
    the numbers don't begin until column 61

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
		"So much mail, so little time." 

felps@convex.com (Robert Felps) (05/15/91)

In <13320@exodus.Eng.Sun.COM> rock@warp.Eng.Sun.COM (Bill Petro) writes:


>How could I do this using either the sort command, or perl or awk?

>I have the following data in this format - I want to sort it on the
>right most column.  The columns are not tab delimited, and some of the
>columns have arbitrary numbers of words (specifically the third column
>from the left, column c).  Sort assumes that you start numbering on the
>left, and count from 0.  How would you start with the right most
>column, column e?

>  0    1   2                              3                     4

>  a    b   c                              c                     e

> FOO  BAR ACE CORPORATION                SUNNYVALE              2.00
> FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00
> FOO  BAR APOLLO COMPUTER, INC.          CHELMSFORD             1.00
> FOO  BAR APPLE COMPUTER, INC.           CUPERTINO              8.00
> FOO  BAR BOEING                         TUKWILA               53.00
> FOO  BAR BOEING COMPUTER SERVICES       EDDYSTONE              2.00
> FOO  BAR CITIBANK N. A.                 ANDOVER                4.00
> FOO  BAR CITIBANK NORTH AMERICA         LONG ISLAND CITY      26.00


I don't know of any way to sort based on right to left. But you might try

    $ sort +0.60bn -0.70bn data

Which sorts columns 60 thru 70 ignoring blanks and using numeric values to
sort on.

>Thanks!


>--
>     Bill Petro  {decwrl,hplabs,ucbvax}!sun!Eng!rock
>"UNIX for the sake of the kingdom of heaven"  Matthew 19:12

Hope it helps,
Robert Felps            I do not speak for  felps@convex.com
Convex Computer Corp    Convex and I seldom Product Specialist
3000 Waterview Parkway  speak for myself.   Tech. Assistant Ctr
Richardson, Tx.  75080  VMS? What's that?   1(800) 952-0379

tchrist@convex.COM (Tom Christiansen) (05/15/91)

From the keyboard of felps@convex.com (Robert Felps):
:I don't know of any way to sort based on right to left. But you might try

You reverse all the fields, sort on first field, then reverse them again.
This might screw up fixed column stuff though.

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
		"So much mail, so little time." 

ziegast@eng.umd.edu (Eric W. Ziegast) (05/15/91)

Off the top of my head I did:

	#!/usr/bin/perl
	
	# Read in data from STDIN (or specified files) in to @lines
	@lines = (<>);
	
	# Subroutine to numerically compare last space-separated fields.
	sub rightmost {
		local(@a) = split(" ",$a);
		local(@b) = split(" ",$b);
		$a[$#a] <=> $b[$#b];
	}
	
	foreach (sort rightmost @lines) {
		print;
	}

After feeding it the following as standard input:

 FOO  BAR ACE CORPORATION                SUNNYVALE              2.00
 FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00
 FOO  BAR APOLLO COMPUTER, INC.          CHELMSFORD             1.00
 FOO  BAR APPLE COMPUTER, INC.           CUPERTINO              8.00
 FOO  BAR BOEING                         TUKWILA               53.00
 FOO  BAR BOEING COMPUTER SERVICES       EDDYSTONE              2.00
 FOO  BAR CITIBANK N. A.                 ANDOVER                4.00
 FOO  BAR CITIBANK NORTH AMERICA         LONG ISLAND CITY      26.00

I got back:

 FOO  BAR APOLLO COMPUTER, INC.          CHELMSFORD             1.00
 FOO  BAR ACE CORPORATION                SUNNYVALE              2.00
 FOO  BAR BOEING COMPUTER SERVICES       EDDYSTONE              2.00
 FOO  BAR CITIBANK N. A.                 ANDOVER                4.00
 FOO  BAR APPLE COMPUTER, INC.           CUPERTINO              8.00
 FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00
 FOO  BAR CITIBANK NORTH AMERICA         LONG ISLAND CITY      26.00
 FOO  BAR BOEING                         TUKWILA               53.00


There's probably a better way of doing this (isn't there always?), but
this is probably the most direct.

-- 
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Eric W. Ziegast      Internet: ziegast@eng.umd.edu |
| Univ. of Merryland   Phonenet: Eric@[301.405.3689] |
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

merlyn@iwarp.intel.com (Randal L. Schwartz) (05/15/91)

In article <13320@exodus.Eng.Sun.COM>, rock@warp (Bill Petro) writes:
| I have the following data in this format - I want to sort it on the
| right most column.  The columns are not tab delimited, and some of the
| columns have arbitrary numbers of words (specifically the third column
| from the left, column c).  Sort assumes that you start numbering on the
| left, and count from 0.  How would you start with the right most
| column, column e?
| 
|   0    1   2                              3                     4
| 
|   a    b   c                              c                     e
| 
|  FOO  BAR ACE CORPORATION                SUNNYVALE              2.00
|  FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00
|  FOO  BAR APOLLO COMPUTER, INC.          CHELMSFORD             1.00
|  FOO  BAR APPLE COMPUTER, INC.           CUPERTINO              8.00
|  FOO  BAR BOEING                         TUKWILA               53.00
|  FOO  BAR BOEING COMPUTER SERVICES       EDDYSTONE              2.00
|  FOO  BAR CITIBANK N. A.                 ANDOVER                4.00
|  FOO  BAR CITIBANK NORTH AMERICA         LONG ISLAND CITY      26.00

Well, here's a terribly inefficient one, good enough for small amounts
of data:

##################################################
#!/usr/bin/perl

sub bylastcol {
	@a = split(/\s+/, $a);
	@b = split(/\s+/, $b);
	pop(@a) <=> pop(@b);
}

print sort bylastcol <>;
##################################################

This is inefficient because the "last column" is computed and
recomputed over and over again on each compare.  A better way would be
to compute it once and cache it:

##################################################
#!/usr/bin/perl

sub once {
	return $n if defined($n = $once{$_[0]});
	@a = split(/\s+/, $_[0]);
	$once{$_[0]} = pop(@a);
}


sub bylastcol {
	&once($a) <=> once($b);
}

print sort bylastcol <>;
##################################################

An even better way requires the cooperation of the calling routine to
make a normal array, rather than an associative array, as in:

##################################################
#!/usr/bin/perl

sub byaux {
	$aux[$a] <=> $aux[$b];
}

@data = <>;
for (@data) {
	@a = split(/\s+/, $_);
	push(@aux, pop(@a));
}

print @data[sort byaux $[..$#data];
##################################################

print "Just another Perl hacker,"
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/

suhre@ondine.dsd.trw.com (Maurice E. Suhre) (05/17/91)

In article <13320@exodus.Eng.Sun.COM> rock@warp.Eng.Sun.COM (Bill Petro) writes:
>
>How could I do this using either the sort command, or perl or awk?
>
>I have the following data in this format - I want to sort it on the
>right most column.  
	[Most of data file deleted for brevity]
> FOO  BAR CITIBANK N. A.                 ANDOVER                4.00
> FOO  BAR CITIBANK NORTH AMERICA         LONG ISLAND CITY      26.00
>
I haven't seen an awk solution yet, so here's one.  The strategy is
to generate a line which has the last field duplicated in front and
uses a different field separator.  I selected the "plus" sign.  There
needs to be some character which is not found in the data file.

#! /bin/sh
# 
awk '{print $NF "+" $0}' | sort -n -t+ +0 -1 ,t1 | awk -F+ '{ print $2}' 
-- 
Maurice Suhre
suhre@trwrb.dsd.trw.com

sherman@unx.sas.com (Chris Sherman) (05/18/91)

In <13320@exodus.Eng.Sun.COM> rock@warp.Eng.Sun.COM (Bill Petro) writes:


>How could I do this using either the sort command, or perl or awk?

>I have the following data in this format - I want to sort it on the
>right most column.  

>  0    1   2                              3                     4

>  a    b   c                              c                     e

> FOO  BAR ACE CORPORATION                SUNNYVALE              2.00
> FOO  BAR ACER COMPUTED COMPANY          MILPITAS              20.00

I hate to be a spoil sport, but how about from the unix prompt:

% sort -n +.60b

Maybe you could do something like this inside perl...  I'm not sure
how though, and still keep it really fast.

--
Chris Sherman .................... sherman@unx.sas.com   |
              ,-----------------------------------------'
             /  Q:  How many IBM CPU's does it take to execute a job?
            |   A:  Four; three to hold it down, and one to rip its head off.

jpr@jpradley.jpr.com (Jean-Pierre Radley) (05/23/91)

>From the keyboard of felps@convex.com (Robert Felps):
>:I don't know of any way to sort based on right to left. But you might try
>

With awk, $NF is the last field. So pipe the file through awk, prepending
the last field ($NF) to the whole line ($0), separated by something wierd,
say ^B. Pipe this to sort, then pipe to a sed to ditch characters up to and
including the ^B.
As I think of it, the first awk could be a sed filter too.

Jean-Pierre Radley   Unix in NYC   jpr@jpr.com   jpradley!jpr   CIS: 72160,1341

dattier@vpnet.chi.il.us (David W. Tamkin) (05/24/91)

jpr@jpradley.jpr.com (Jean-Pierre Radley) wrote in
<1991May22.191522.19709@jpradley.jpr.com>:

| >From the keyboard of felps@convex.com (Robert Felps):
| >:I don't know of any way to sort based on right to left. But you might try


| With awk, $NF is the last field. So pipe the file through awk, prepending
| the last field ($NF) to the whole line ($0), separated by something wierd,
| say ^B. Pipe this to sort, then pipe to a sed to ditch characters up to and
| including the ^B.
| As I think of it, the first awk could be a sed filter too.

You know, if you're looking to replace the original file with a sorted one,
you could do the whole thing in ex.  Let's say your field separator is a
space:

ex - file << \,,
%s/.* \(.*\)/\1 &/
%!sort
%s/[^ ]* //
wq
,,

David Tamkin  PO Box 7002  Des Plaines IL  60018-7002  dattier@vpnet.chi.il.us
GEnie:D.W.TAMKIN  CIS:73720,1570  MCIMail:426-1818  708 518 6769  312 693 0591

"Parker Lewis Can't Lose" mailing list:
 flamingo-request@esd.sgi.com (relay)  flamingo-request@ddsw1.mcs.com (digest)