[net.unix-wizards] /bin/sort bug

tgb@ccieng5.UUCP (Timothy G. Becker) (08/30/83)

I don't think that /bin/sort works right.
(We are using 4.1bsd /bin/sort, if it makes any difference).

If I type "sort +0 -1 input >out", I expect the input file to be sorted
beginning with the first field and ending before the second field.

The left hand column is the input file contents.  The middle column
contains what I think the output should be.  The right most column is
what /bin/sort actually produces.

=========================================================
input		       expected			actual
=========================================================
a a a			a a a			a a a
b a a			a c a			a b a
a c a			a b a			a c a
b c a			b a a			b a a
a b a			b c a			b b a
b b a			b b a			b c a

The bottom line is that sort continues to look at the remaining fields
in the line even though the "-pos" argument tells him not to.  Does
anyone disagree that above sort command should do what I expected?  Any
ideas?  Any fixes?

Tim Becker
Computer Consoles, Inc.
Rochester, NY
...!seismo!rochester!ritcv!ccieng5!tgb

tag@tty3b.UUCP (08/31/83)

*  I don't think that /bin/sort works right.
*  (We are using 4.1bsd /bin/sort, if it makes any difference).
*
*  If I type "sort +0 -1 input >out", I expect the input file to be sorted
*  beginning with the first field and ending before the second field.

This is yet another bug that is really a feature.  or vice-versa...

In any case, it's probably in your documentation.  In the version we're
running (5.0.3) there is a paragraph in the manual for sort(1) which reads:

 >  When there are multiple sort keys, later keys are compared
 >  only after all earlier keys compare equal.  *Lines that
 >  otherwise compare equal are ordered with all bytes
 >  significant.* (Emphasis mine)

In other words: the whole line is used for sorting, you only get to specify
what is of major importance.  We learned this the hard way when trying 
to do multiple sorts on a database.

                        Tom Gloger
                        Teletype Corporation
                        (ihnp4|we13) otuxa!tty3b!tag

dougc@tekecs.UUCP (Doug Campbell) (08/31/83)

If you specify the sort to be performed on only one field
you cannot specify a specific ordering for the other unsorted
fields.  The ordering for the unsorted fields is nondeterministic.
Therefore, I do not agree that sort should be producing the
"expected result" you posted.  (Various sorting algorithms will
produce diferent results in the unsorted fields.)


I do not feel this is a bug in /bin/sort.  If you really need
the "expected result" you posted, you should not be using
/bin/sort.

Doug Campbell

johnl@ima.UUCP (09/01/83)

#R:ccieng5:-13300:ima:35700002:000:815
ima!johnl    Aug 31 18:02:00 1983

    I don't think that /bin/sort works right.  ... The bottom line is
    that sort continues to look at the remaining fields in the line even
    though the "-pos" argument tells him not to.

The sort page in my Unix manual says:

    When there are multiple sort keys, later keys are compared only after
    all earlier keys compare equal.   Lines that otherwise compare equal
    are ordered with all bytes significant.

so sort behaves exactly the way the documentation says it does.  I think
this language in the manual page dates back at least to V7.  If you want
a stable sort, you can number the lines, sort, and un-number them.  On
S/3, you could say:

	pr -t -n file | sort +1 -2 +0n | sed "s/^.....//"

to do that.  A "stable" option to sort would have to do the same internally.

John Levine, ima!johnl

mark@umcp-cs.UUCP (09/01/83)

I think the problem is that /bin/sort doesn't guarantee anything
about the order of records which are "equal" on the specified
sorting fields.  This is typical of sort programs, actually,
since they can then run faster.  Sort does a lot of sorting
of subsets into temp files and then merging the results--retaining
equality ordering would be difficult.
-- 
spoken:	mark weiser
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!mark
CSNet:	mark@umcp-cs
ARPA:	mark.umcp-cs@UDel-Relay

hansen@pegasus.UUCP (09/01/83)

I just looked at my 4.1 Berkeley manual, my System III manual, and my System
V manual, and they all had the same comment:

    Lines that otherwise compare equal are ordered with all bytes
    significant.

It looks like that "bug"/"feature" has been around for a LONG time.

					Tony Hansen
					pegasus!hansen

grahamr@bronze.UUCP (Graham Ross) (09/01/83)

/bin/sort's quick sort algorithm is unstable (in the sense of Knuth, ACP,
Vol. 3, p. 4).  To achieve stability, do the following:

cat -n <input | sort ...your_flags... +0n | sed 's/^[^\t]\t//' >output

This prepends an identifying number to each line in the input, which stays
with it until the sorting is done.  The +0n key in the sort gives a numeric
comparison on this first field.  Remember to bump the column numbers in
you original sort flags.  The "\t"s in the sed command probably should be
real tabs (I forget).  If you don't have cat's -n option (another UCB
monstrosity, I think) you can easily write a separate program to do it.

	-Graham Ross
	teklabs!tekmdp!grahamr

preece@uicsl.UUCP (09/02/83)

#R:ccieng5:-13300:uicsl:12500007:000:621
uicsl!preece    Sep  1 11:58:00 1983

>From the man page for sort:

	"Lines that otherwise compare equal are ordered
	with all bytes significant."

There is, in any case, no rule that says a sort has to be stable.
Most of the system sort packages I've used are not stable: items
with the same key are arbitrarily ordered, not in the original
order. The authors of this sort have kindly told us what that
arbitrary ordering will be (a sort on the non-key parts of the line).
Lines which are identical will presumably be in an unspecified
order, but since they're identical, you won't be able to tell
the difference...

scott preece
pur-ee!uiucdcs!uicsl!preece

gwyn@brl-vld@sri-unix.UUCP (09/02/83)

From:      Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

"Lines that otherwise compare equal are ordered with all bytes
significant."
		- from SORT(1)

Nowhere is "sort" advertised as being a stable sort, and it isn't.

craig@cfib.UUCP (09/03/83)

#R:ccieng5:-13300:cfib:11100005:000:205
cfib!craig    Sep  2 11:52:00 1983


	This discussion raises an interesting question --- does
anyone know of a sort program for Unix that does "stable" sorts?
(Unlike /bin/sort).  Replies by mail please...

Craig Partridge
...ima!cfib!craig

jim@mcvax.UUCP (Jim McKie) (09/04/83)

It HAS been around a long time, the same comment appears
in my Fifth Edition (Version 5) manual, dated 6/11/74.

Jim McKie  Mathematisch Centrum, Amsterdam  ..{philabs|decvax}!mcvax!jim