[net.bugs.usg] sort

rick@ariel.UUCP (R.MAUS) (01/10/85)

The following report was recently submitted to the UNIX* Support
Group.  I am reposting it for the benefit of the network community.

PROBLEM:
  The "-f" option of "sort(1)" does not perform as advertised
when used in conjunction with the "-u" option.  In the following
three examples, the line that is output depends on which spelling
occurs first.  The word is folded to lower case only for comparison
purposes, not in the output as described.

REPEAT-BY:
        $ sort -fu <<!  $ sort -fu <<!  $ sort -fu <<!
        JUNK            Junk            junk
        Junk            junk            JUNK
        junk            JUNK            Junk
        !               !               !

PRODUCES:
        JUNK            Junk            junk
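For anyone who wants to replay the three REPEAT-BY cases on a present-day system, here is a minimal sketch. It assumes a POSIX shell and sort, and sets LC_ALL=C so collation is plain byte order. POSIX's description of -u, like the manual quoted later in this thread, does not say which line of an equal set is kept, so the check only confirms that exactly one spelling of "junk" survives per case. (GNU sort, for one, keeps the first line of an equal run, which reproduces the PRODUCES line above.)

```shell
#!/bin/sh
# Replay the three REPEAT-BY cases with a modern POSIX sort.
# LC_ALL=C pins collation to byte order so results are comparable.
export LC_ALL=C

result=$(
    for order in 'JUNK Junk junk' 'Junk junk JUNK' 'junk JUNK Junk'; do
        # $order is left unquoted on purpose: each word becomes one input line.
        printf '%s\n' $order | sort -fu
    done
)

# One line per case; which spelling survives is up to the implementation.
printf '%s\n' "$result"
```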

CONCLUSION:
  The output should always be in lower case.  Changing the manual
page to indicate the problem will only serve to confuse the
situation.

*  UNIX is a trademark of AT&T Bell Laboratories.

				Richard L. Maus, Jr. (Rick)
				AT&T-ISL HO 1K313 201-834-4532
				...!ho???!ariel!rick

eli@ahuta.UUCP (e.mantel) (01/11/85)

REFERENCES:  <826@ariel.UUCP>

With regard to "f" and several other options, the sort(1) manual page
I'm looking at says:

	"The *ordering* is affected by the following options..."

It is an idiosyncrasy of the "u" option that the user cannot predict
which line of a set of equal lines will be output.

The behavior of the "f" option is probably preferable to what ariel!rick
suggests it should be.  If I want the output folded to one case, I'll
just use tr(1) before I ever turn it over to sort.

	Eli Mantel, AT&T Information Systems, Holmdel, NJ 07733 (ahuta!eli)

mike@enmasse.UUCP (Mike Schloss) (01/13/85)

> CONCLUSION:
>   The output should always be in lower case.  Changing the manual
> page  to  indicate  the  problem  will  only serve to confuse the
> situation.
> 

WRONG

CONCLUSION:
	Extraneous flags should not be added to utilities when there
already exists a perfectly good utility to perform that function.

	"sort -fu" caused some confusion.
	The output from "sort -f | uniq" is self-explanatory.

	What you probably want is:

	... | tr "[A-Z]" "[a-z]" | sort -f | uniq | ...
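To see the difference between the two pipelines, the sketch below runs both on the same three lines. It assumes a POSIX shell in the C locale and uses plain A-Z ranges for tr; the bracketed System V form in the post also works with a POSIX tr, since it merely maps "[" and "]" to themselves.

```shell
#!/bin/sh
# Compare "sort -f | uniq" with folding the case first via tr.
export LC_ALL=C
data='JUNK
Junk
junk'

# sort -f only brings the case variants together; uniq still compares
# exact bytes, so all three spellings survive.
kept=$(printf '%s\n' "$data" | sort -f | uniq)

# Folding to lower case first makes the lines identical, so one survives.
folded=$(printf '%s\n' "$data" | tr 'A-Z' 'a-z' | sort | uniq)

printf 'sort -f | uniq:\n%s\n\ntr | sort | uniq:\n%s\n' "$kept" "$folded"
```

Once tr has folded the case, the -f in the suggested pipeline is harmless but redundant.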

kpmartin@watmath.UUCP (Kevin Martin) (01/14/85)

>	Extraneous flags should not be added to utilities when there
>already exists a perfectly good utility to perform that function.
>	"sort -fu" caused some confusion.
>	The output from "sort -f | uniq" is self-explanatory.
Of course, this only works if you want to sort and uniq by entire lines.
If only part of the line is the sort key, you're stuck with the -u flag.
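The point about partial keys can be checked directly. This sketch assumes a POSIX shell and sort, where "-k 1,1" restricts the key to the first blank-separated field; the four input lines are invented for illustration. Whole-line uniq merges nothing, while -u with a key keeps one line per case-folded first word.

```shell
#!/bin/sh
# Whole-line deduplication versus key-only deduplication.
export LC_ALL=C
# Sample data, invented for illustration: a key word, then other text.
data='JUNK    fizz
Junk    frap
junk    fart
stuff   here'

# uniq compares whole lines; the trailing words differ, so all four survive.
whole=$(printf '%s\n' "$data" | sort -f | uniq)

# -k 1,1 limits the key to the first field, so -u merges the three JUNK lines.
bykey=$(printf '%s\n' "$data" | sort -f -u -k 1,1)

printf 'whole line:\n%s\n\nfirst field only:\n%s\n' "$whole" "$bykey"
```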

bruce@ISM780.UUCP (01/14/85)

>       CONCLUSION:
>         The output should always be in lower case.  Changing the manual
>       page  to  indicate  the  problem  will  only serve to confuse the
>       situation.

I disagree; you've ignored the fact that there may be other stuff on the
line that isn't part of the key. For example if my input were:

	sort -fu +0 -1 <<!
	JUNK    fizz
	Junk    frap
	junk    fart
	!

Your fix would produce:

	junk    fizz

This doesn't correspond to an actual input line.
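The same example can be rerun on a current system, where the obsolete "+0 -1" key is written "-k 1,1" (fields numbered from 1, key running from field 1 to the end of field 1). This minimal sketch assumes a POSIX shell and sort, and checks that whichever line -u keeps appears verbatim in the input, while the folded "junk    fizz" does not.

```shell
#!/bin/sh
# The example above with the modern key syntax: "+0 -1" becomes "-k 1,1".
export LC_ALL=C
data='JUNK    fizz
Junk    frap
junk    fart'

survivor=$(printf '%s\n' "$data" | sort -fu -k 1,1)
printf 'kept: %s\n' "$survivor"

# grep -Fx matches a fixed string against whole lines only.
if printf '%s\n' "$data" | grep -Fxq "$survivor"; then
    echo 'kept line is an actual input line'
fi
if ! printf '%s\n' "$data" | grep -Fxq 'junk    fizz'; then
    echo '"junk    fizz" is not an input line'
fi
```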

jim@ISM780B.UUCP (01/14/85)

The manual says "u -- Suppress all but one in each set of equal lines".
It doesn't say which one.

Now a *real* bug with sort -f is that it folds lower case onto upper case
rather than vice versa as advertised.
This means that [\]^_` (the ASCII characters between Z and a)
sort after the letters instead of before.
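The effect of the fold direction is easy to see in the C locale. This sketch assumes a POSIX sort, whose -f is specified to treat lower-case letters as their upper-case equivalents for comparison; the tr line only imitates a fold-to-lower ordering for contrast, and unlike -f it also changes the printed text.

```shell
#!/bin/sh
# How the fold direction of -f moves the characters between Z and a.
export LC_ALL=C
data='b
_
A'

# -f compares 'b' as 'B' (0x42), which is below '_' (0x5F): A, b, _
upper=$(printf '%s\n' "$data" | sort -f)

# Folding to lower case instead compares 'a' (0x61) and 'b' (0x62),
# both above '_' (0x5F): _, a, b
lower=$(printf '%s\n' "$data" | tr 'A-Z' 'a-z' | sort)

printf 'fold to upper:\n%s\n\nfold to lower:\n%s\n' "$upper" "$lower"
```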

preece@ccvaxa.UUCP (01/16/85)

The use of tr with sort is unlikely to do the desired thing.
When I sort things I usually want the output to look like the
input, including use of upper and lower case.  It's only the
ordering mechanism that should ignore case.  You could write a
similar sort using simple tools to (1) extract key fields into
a file, (2) tr them, (3) attach the key fields on to the front
of the corresponding data lines, (4) sort, (5) strip off the
key fields.  This seems like an awful lot of effort to do the
natural kind of sort for text fields.  The use of the -f flag
seems like a perfectly natural way to tell the sort program to
use special conventions for a text field, just as the -n flag
tells sort to use special conventions for numbers.
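The five-step workaround can be sketched with cut, tr, paste, sort and cut again. This is a minimal sketch under several assumptions: a POSIX shell with mktemp available, the key being the first space-separated word, no tab characters in the input (a tab joins each folded key to its line), and invented sample data. On this data the single command "sort -f -k 1,1" yields the same order, which is the point being made; for keys containing characters between Z and a the two can differ, since tr here folds to lower case.

```shell
#!/bin/sh
# Decorate-sort-undecorate, following the five steps described above.
# Assumptions: the key is the first space-separated word of each line,
# and no line contains a tab, which separates the folded key from the line.
export LC_ALL=C
tab=$(printf '\t')

# Sample data, invented for illustration.
data='Junk    frap
apple   pie
JUNK    fizz
Zebra   crossing'

input=$(mktemp) || exit 1
keys=$(mktemp) || exit 1
printf '%s\n' "$data" > "$input"

cut -d ' ' -f 1 "$input" |        # step 1: extract the key field into a file
    tr 'A-Z' 'a-z' > "$keys"      # step 2: fold it to lower case
result=$(
    paste "$keys" "$input" |      # step 3: attach each key to its line
        sort -t "$tab" -k 1,1 |   # step 4: sort on the folded key
        cut -f 2-                 # step 5: strip the key back off
)
rm -f "$input" "$keys"

printf '%s\n' "$result"
```

By contrast, plain "LC_ALL=C sort" would place "Zebra" before "apple", since every upper-case letter sorts before every lower-case one in byte order.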

scott preece
ihnp4!uiucdcs!ccvaxa!preece