rock@warp.Eng.Sun.COM (Bill Petro) (05/15/91)
How could I do this using either the sort command, or perl or awk?

I have the following data in this format - I want to sort it on the
right most column. The columns are not tab delimited, and some of the
columns have arbitrary numbers of words (specifically the third column
from the left, column c). Sort assumes that you start numbering on the
left, and count from 0. How would you start with the right most
column, column e?

0 1 2 3 4

a b c c e

FOO BAR ACE CORPORATION SUNNYVALE 2.00
FOO BAR ACER COMPUTED COMPANY MILPITAS 20.00
FOO BAR APOLLO COMPUTER, INC. CHELMSFORD 1.00
FOO BAR APPLE COMPUTER, INC. CUPERTINO 8.00
FOO BAR BOEING TUKWILA 53.00
FOO BAR BOEING COMPUTER SERVICES EDDYSTONE 2.00
FOO BAR CITIBANK N. A. ANDOVER 4.00
FOO BAR CITIBANK NORTH AMERICA LONG ISLAND CITY 26.00

Thanks!

--
Bill Petro  {decwrl,hplabs,ucbvax}!sun!Eng!rock
"UNIX for the sake of the kingdom of heaven"  Matthew 19:12
merlyn@iwarp.intel.com (Randal L. Schwartz) (05/15/91)
In article <13320@exodus.Eng.Sun.COM>, rock@warp (Bill Petro) writes:
| I have the following data in this format - I want to sort it on the
| right most column. The columns are not tab delimited, and some of the
| columns have arbitrary numbers of words (specifically the third column
| from the left, column c). Sort assumes that you start numbering on the
| left, and count from 0. How would you start with the right most
| column, column e?
|
| 0 1 2 3 4
|
| a b c c e
|
| FOO BAR ACE CORPORATION SUNNYVALE 2.00
| FOO BAR ACER COMPUTED COMPANY MILPITAS 20.00
| FOO BAR APOLLO COMPUTER, INC. CHELMSFORD 1.00
| FOO BAR APPLE COMPUTER, INC. CUPERTINO 8.00
| FOO BAR BOEING TUKWILA 53.00
| FOO BAR BOEING COMPUTER SERVICES EDDYSTONE 2.00
| FOO BAR CITIBANK N. A. ANDOVER 4.00
| FOO BAR CITIBANK NORTH AMERICA LONG ISLAND CITY 26.00

Well, here's a terribly inefficient one, good enough for small amounts
of data:

##################################################
#!/usr/bin/perl

sub bylastcol {
    @a = split(/\s+/, $a);
    @b = split(/\s+/, $b);
    pop(@a) <=> pop(@b);
}

print sort bylastcol <>;
##################################################

This is inefficient because the "last column" is computed and
recomputed over and over again on each compare.
A better way would be to compute it once and cache it:

##################################################
#!/usr/bin/perl

sub once {
    # return the cached last column if this line was seen before
    return $n if defined($n = $once{$_[0]});
    @a = split(/\s+/, $_[0]);
    $once{$_[0]} = pop(@a);
}

sub bylastcol {
    &once($a) <=> &once($b);
}

print sort bylastcol <>;
##################################################

An even better way requires the cooperation of the calling routine to
make a normal array, rather than an associative array, as in:

##################################################
#!/usr/bin/perl

sub byaux {
    $aux[$a] <=> $aux[$b];
}

@data = <>;
for (@data) {
    @a = split(/\s+/, $_);
    push(@aux, pop(@a));
}

# sort the indices by key, then print the lines in that order
print @data[sort byaux $[..$#data];
##################################################

print "Just another Perl hacker,"
--
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/
bjaspan@ATHENA.MIT.EDU (Barr3y Jaspan) (05/16/91)
|> I have the following data in this format - I want to sort it on the
|> right most column.
|>
|> 0 1 2 3 4
|>
|> a b c c e
|>
|> FOO BAR ACE CORPORATION SUNNYVALE 2.00
|> FOO BAR ACER COMPUTED COMPANY MILPITAS 20.00

(Introduction: I'm not sure what the first two lines ("0 1 2 ...") are
for, so I'm ignoring them. :-)

Well, since you know the sort key is the last number on each line, you
could do something like this (untested):

while (<>) {
    /([\d.]+)$/;
    $lines{$1} = $_;
}

for (sort numerically keys %lines) {
    print $lines{$_};
}

sub numerically {
    $a <=> $b;
}

The idea is to use the number at the end of a line as a key in an
associative array, storing each entire line of text keyed on its final
number. (You could do this more "efficiently" with a normal array if
you knew (1) that all your numbers were integers and (2) that all the
numbers were "small".) Then you just sort the keys and print the
respective lines of text, in order.

(You could also split the entire line on whitespace and use the last
element in the returned array as your key; I suspect (but I'm not sure)
the first way is faster.)

Barr3y Jaspan, bjaspan@mit.edu
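
The split-based alternative mentioned in the last parenthetical above
could be sketched roughly as follows. This is only an illustration, not
code from the thread: the helper name last_field is made up, the sample
lines are copied from the original post, and it assumes a modern Perl 5
with strict and warnings rather than the Perl of the time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the last
# whitespace-separated field of a line, or undef for a blank line.
sub last_field {
    my ($line) = @_;
    my @fields = split ' ', $line;   # split ' ' ignores leading whitespace
    return $fields[-1];
}

# Sample lines taken from the original post.
my @sample = (
    "FOO BAR ACE CORPORATION SUNNYVALE 2.00\n",
    "FOO BAR CITIBANK NORTH AMERICA LONG ISLAND CITY 26.00\n",
    "FOO BAR APOLLO COMPUTER, INC. CHELMSFORD 1.00\n",
);

# Print the key that would be used for each line.
print "$_\n" for map { last_field($_) } @sample;
```

Inside the while loop above, last_field($_) would take the place of $1
as the hash key. Whether it is faster or slower than the regex is left
open here, as it was in the post.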
arielf@tasu8c.UUCP (Ariel Faigon) (05/16/91)
+--- In the referenced article, Barr3y Jaspan wrote:
||> I have the following data in this format - I want to sort it on the
||> right most column.
||>
||> 0 1 2 3 4
||>
||> a b c c e
||>
||> FOO BAR ACE CORPORATION SUNNYVALE 2.00
||> FOO BAR ACER COMPUTED COMPANY MILPITAS 20.00
|
| [ ... ]
|Well, since you know the sort key is the last number on each line, you
|could do something like this (untested):
|
|while (<>) {
|    /([\d.]+)$/;
|    $lines{$1} = $_;
|}
|
|for (sort numerically keys %lines) {
|    print $lines{$_};
|}
|
|sub numerically {
|    $a <=> $b;
|}
|
|The idea is to use the number at the end of a line as a key in an
|associative array, storing each entire line of text keyed on its final
|number. [more deleted ...]
+------

Barr3y's solution is of course the most straightforward. One problem
remains, though: since associative array keys are unique, lines with
the same key (same last column) will overwrite each other.

The solution is to concatenate lines with the same key (the '\n' is
left intact thanks to Perl), i.e. to replace the line:

>    $lines{$1} = $_;

with

     $lines{$1} .= $_;

This should do what the original poster wanted.

Peace,
Ariel

Ariel Faigon
National Semiconductor Corp. (NSTA Design Center)
6 Maskit St.  P.O.B. 3007, Herzlia 46104, Israel
Tel. +972 52-522272
arielf@taux01.nsc.com
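
To see what the append-instead-of-assign change does for lines that
share a final number, here is a minimal sketch in modern Perl 5. It is
not code from the thread: the sub name sort_by_last_number is invented
for illustration, it works on an in-memory list rather than <>, and it
skips lines that have no trailing number, which is an assumption that
neither post spells out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper (not from the thread): group lines by their
# trailing number, then return the groups in ascending numeric order.
sub sort_by_last_number {
    my @input = @_;
    my %lines;
    for my $line (@input) {
        # Assumption: lines without a trailing number are skipped.
        next unless $line =~ /([\d.]+)$/;
        $lines{$1} .= $line;    # append, so equal keys do not overwrite
    }
    return map { $lines{$_} } sort { $a <=> $b } keys %lines;
}

# Sample data from the original post; two of the lines end in 2.00.
my @data = (
    "FOO BAR ACE CORPORATION SUNNYVALE 2.00\n",
    "FOO BAR BOEING TUKWILA 53.00\n",
    "FOO BAR BOEING COMPUTER SERVICES EDDYSTONE 2.00\n",
    "FOO BAR APOLLO COMPUTER, INC. CHELMSFORD 1.00\n",
);

print sort_by_last_number(@data);
```

Both 2.00 lines come out, in their input order, between the 1.00 line
and the 53.00 line. With plain assignment only the EDDYSTONE line would
survive for that key.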