[comp.unix.questions] How to sort on exactly one field only

ARaman@massey.ac.nz (Anand Venkata Raman) (10/24/90)

I want to sort a file exclusively on field #2.
For example,
A portion of the file looks like
appl1  3  Submitted
appl2  4  Submitted
appl1  3  Started
appl3  5  Submitted
appl2  4  Started
appl1  3  Finished(0)
appl2  4  Finished(7)
appl3  5  Started
appl3  5  Finished(3)

I want all applications with the same jobID (Second field) grouped together
and the whole file sorted by jobID.  Sort does that fine.  But within a single
jobID, it sorts using the status as the secondary field which puts the Finished
record before the Started and Submitted.

I tried using sort +1 -2, but that doesn't seem to deter sort from looking
at field #3.

The man page says sort +pos1 -pos2 means the key field begins at pos1 and ends
before pos2.  But that doesn't seem to be the case.

Thanks for any help. (email)

- & (Anand)

quan@sol.surv.utas.oz (Stephen Quan) (10/24/90)

ARaman@massey.ac.nz (Anand Venkata Raman) writes:

>I want to sort a file exclusively on field #2.
>For example,
>A portion of the file looks like
>appl1  3  Submitted
>appl2  4  Submitted
>appl1  3  Started
>appl3  5  Submitted
>appl2  4  Started
>appl1  3  Finished(0)
>appl2  4  Finished(7)
>appl3  5  Started
>appl3  5  Finished(3)

In the Bourne shell (sh):

cat -n file | awk "{print \$3,\$1,\$2,\$4}" | sort | awk "{print \$3,\$1,\$4}"
- line numbers are added
- the fields are rearranged so that the key comes first
- the sort is then done on the key first and the line numbers second
- the line numbers are removed and the fields are rearranged back

I think I am missing some argument to 'sort' to make it interpret the line
numbers as numbers - not just text.
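
A minimal sketch of the missing piece, assuming a sort that accepts POSIX
-k key definitions rather than the older +pos form used in this thread:
giving each key the n modifier makes both the jobID and the line number
compare as numbers.  Here awk's NR supplies the line number instead of
cat -n, and the variable names are made up for illustration.

```shell
# Sample data from the original post, held in a shell variable.
data='appl1  3  Submitted
appl2  4  Submitted
appl1  3  Started
appl3  5  Submitted
appl2  4  Started
appl1  3  Finished(0)
appl2  4  Finished(7)
appl3  5  Started
appl3  5  Finished(3)'

# Decorate each line as "jobID lineno appl status", sort both leading
# fields numerically, then put the fields back in their original order.
sorted=$(printf '%s\n' "$data" |
    awk '{print $2, NR, $1, $3}' |
    sort -k1,1n -k2,2n |
    awk '{print $3, $1, $4}')
printf '%s\n' "$sorted"
```

Without the n modifier, line 10 would sort before line 2 once the file
grows past nine lines.  Note that awk rejoins the fields with single
spaces, so the original spacing is not preserved.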

Stephen Quan,
University of Tasmania.

rob@b15.INGR.COM (Rob Lemley) (10/25/90)

In <1094@massey.ac.nz> ARaman@massey.ac.nz (Anand Venkata Raman) writes:

>I want to sort a file exclusively on field #2.
 . . . 
>I tried using sort +1 -2, but that doesn't seem to deter sort from looking
>at field #3.

Anand, you cannot deter sort from looking at field three.  Try this:

cat <<EOF | sort +1 -2 +2r
appl1  3  Submitted
appl2  4  Submitted
appl1  3  Started
appl3  5  Submitted
appl2  4  Started
appl1  3  Finished(0)
appl2  4  Finished(7)
appl3  5  Started
appl3  5  Finished(3)
EOF

The +2r sorts on the third field in reverse order, so within each jobID
Submitted comes first, then Started, then Finished.

The man page clearly states that you cannot cause sort to ignore a field
or preserve relative line ordering.  From the sort(1) man page
(sys V release 3), under DESCRIPTION:

       When there are multiple sort keys, later keys are compared only
       after all earlier keys compare equal.  Lines that otherwise
       compare equal are ordered with all bytes significant.
                                      ^^^^^^^^^^^^^^^^^^^^^
and (from WARNINGS):

       sort does not guarantee preservation of relative line ordering
       on equal keys.
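
A sketch of both behaviours in -k notation, for sorts that no longer
accept the +pos form; it assumes -k2,2 corresponds to +1 -2 and -k3r to
+2r.  LC_ALL=C is set only to make the byte ordering predictable, and
the variable names are illustrative.

```shell
data='appl1  3  Submitted
appl2  4  Submitted
appl1  3  Started
appl3  5  Submitted
appl2  4  Started
appl1  3  Finished(0)
appl2  4  Finished(7)
appl3  5  Started
appl3  5  Finished(3)'

# Key on field 2 only: lines that tie on the jobID are ordered by
# comparing the whole line, so Finished lands before Started.
by_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

# Add a second key: field 3 to end of line, reversed.
by_key_then_status=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2 -k3r)

printf '%s\n\n%s\n' "$by_key" "$by_key_then_status"
```

The reversed third key gives the wanted order here only because
Submitted, Started and Finished happen to be in reverse alphabetical
order.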

--
-Rob Lemley, System Consultant, Scanning Software, Intergraph, Huntsville, AL
 ...!uunet!ingr!b15!rob   OR   b15!rob@INGR.COM
 205-730-1546

jbw@bucsf.bu.edu (Joe Wells) (10/25/90)

In article <1757@b15.INGR.COM> rob@b15.INGR.COM (Rob Lemley) writes:

   In <1094@massey.ac.nz> ARaman@massey.ac.nz (Anand Venkata Raman) writes:
   >I want to sort a file exclusively on field #2.
    . . . 
   >I tried using sort +1 -2, but that doesn't seem to deter sort from looking
   >at field #3.

   Anand, you cannot deter sort from looking at field three.  Try this:
   [stuff deleted]
   The man page clearly states that you cannot cause sort to ignore a field
   or preserve relative line ordering.  From the sort(1) man page
   (sys V release 3), under DESCRIPTION:

	  When there are multiple sort keys, later keys are compared only
	  after all earlier keys compare equal.  Lines that otherwise
	  compare equal are ordered with all bytes significant.
					 ^^^^^^^^^^^^^^^^^^^^^
   and (from WARNINGS):

	  sort does not guarantee preservation of relative line ordering
	  on equal keys.

I'm including a fix that someone posted a while back to make sort preserve
relative line ordering on equal keys.

Enjoy,

-- 
Joe Wells <jbw@bu.edu>
----------------------------------------------------------------------
One of our users needed a version of sort (we're running
4.3bsd) that is stable; i.e., lines considered equal are
grouped together but not otherwise rearranged.  For example,
	0 1
	0 0
sorted on the 0th field would not be rearranged.

As distributed, sort's behavior is (from the man page)
"Lines that otherwise compare equal are ordered with all
bytes significant".
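
The two-line example can be reproduced as follows.  This is only a
sketch: it assumes a sort with the -s (stable) option, an extension
found in GNU and BSD sort that may not be available everywhere, and
which the unpatched sort discussed here does not have.

```shell
pair='0 1
0 0'

# Default: the lines tie on the first field, so the whole line is
# compared and "0 0" moves ahead of "0 1".
default_order=$(printf '%s\n' "$pair" | LC_ALL=C sort -k1,1)

# With -s the last-resort comparison is skipped and input order is kept.
stable_order=$(printf '%s\n' "$pair" | LC_ALL=C sort -s -k1,1)

printf 'default:\n%s\nstable:\n%s\n' "$default_order" "$stable_order"
```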

We puzzled through sort and came up with a "fix".  One
question about sort, though: any idea why ibuf[256] rather
than ibuf[7], since it never seems to merge more than
7 files at a time?

Thanks,

Ron Stanonik
stanonik@nprdc.arpa

Diffs to make sort stable
*** /usr/src/usr.bin/sort.c	Tue Jun  3 16:59:43 1986
--- sort.c	Tue Dec 20 09:00:45 1988
***************
*** 35,40 ****
--- 35,41 ----
  int 	mflg;
  int	cflg;
  int	uflg;
+ int	Sflg;
  char	*outfil;
  int unsafeout;	/*kludge to assure -m -o works*/
  char	tabchar;
***************
*** 206,211 ****
--- 207,216 ----
  					dirtry[0] = *++argv;
  				continue;
  
+ 			case 'S':
+ 				Sflg++;
+ 				continue;
+ 
  			default:
  				field(++*argv,nfields>0);
  				break;
***************
*** 673,678 ****
--- 678,685 ----
  	}
  	if(uflg)
  		return(0);
+ 	if(Sflg)
+ 		return(i < j ? 1 : -1);
  	return(cmpa(i, j));
  }
  

gwyn@smoke.brl.mil (Doug Gwyn) (10/26/90)

In article <1757@b15.INGR.COM> rob@b15.INGR.COM (Rob Lemley) writes:
>The man page clearly states that you cannot cause sort to ignore a field
>or preserve relative line ordering.

"sort" does ignore non-key fields except when keys compare equal.
To obtain a stable sort, add line numbers, sort on the keys you
wanted followed by the line numbers as an additional key, then
strip off the line numbers.
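
One way to sketch that recipe while keeping each line's original
spacing intact, assuming a sort with -k keys and input lines that
contain no tab characters before the key field.  The function name
stable_sort_field2 is hypothetical.

```shell
# Stable sort of standard input on field 2, compared numerically:
#   1. awk prefixes each line with its line number and a tab
#   2. sort uses the jobID (now field 3), then the line number (field 1)
#   3. cut drops the line-number column again
stable_sort_field2() {
    awk '{ print NR "\t" $0 }' |
        sort -k3,3n -k1,1n |
        cut -f2-
}

result=$(printf '%s\n' \
    'appl2  4  Submitted' \
    'appl1  3  Submitted' \
    'appl2  4  Finished(7)' \
    'appl1  3  Started' | stable_sort_field2)
printf '%s\n' "$result"
```

Because the line number is a unique final key, the whole-line
comparison never gets a chance to reorder lines with equal jobIDs.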