tgb@ccieng5.UUCP (Timothy G. Becker) (08/30/83)
I don't think that /bin/sort works right. (We are using 4.1bsd /bin/sort, if it makes any difference). If I type "sort +0 -1 input >out", I expect the input file to be sorted beginning with the first field and ending before the second field. The left hand column is the input file contents. The middle column contains what I think the output should be. The right most column is what /bin/sort actually produces. ========================================================= input expected actual ========================================================= a a a a a a a a a b a a a c a a b a a c a a b a a c a b c a b a a b a a a b a b c a b b a b b a b b a b c a The bottom line is that sort continues to look at the remaining fields in the line even though the "-pos" argument tells him not to. Does anyone disagree that above sort command should do what I expected? Any ideas? Any fixes? Tim Becker Computer Consoles, Inc. Rochester, NY ...!seismo!rochester!ritcv!ccieng5!tgb
tag@tty3b.UUCP (08/31/83)
* I don't think that /bin/sort works right. * (We are using 4.1bsd /bin/sort, if it makes any difference). * * If I type "sort +0 -1 input >out", I expect the input file to be sorted * beginning with the first field and ending before the second field. This is yet another bug that is really a feature. or vice-versa... In any case, it's probably in your documentation. In the version we're running (5.0.3) there is a paragraph in the manual for sort(1) which reads: > When there are multiple sort keys, later keys are compared > only after all earlier keys compare equal. *Lines that > otherwise compare equal are ordered with all bytes > significant.* (Emphasis mine) In other words: the whole line is used for sorting, you only get to specify what is of major importance. We learned this the hard way when trying to do multiple sorts on a database. Tom Gloger Teletype Corporation (ihnp4|we13) otuxa!tty3b!tag
dougc@tekecs.UUCP (Doug Campbell) (08/31/83)
If you specify the sort to be performed on only one field you cannot specify a specific ordering for the other unsorted fields. The ordering for the unsorted fields is nondeterministic. Therefore, I do not agree that sort should be producing the "expected result" you posted. (Various sorting algorithms will produce diferent results in the unsorted fields.) I do not feel this is a bug in /bin/sort. If you really need the "expected result" you posted, you should not be using /bin/sort. Doug Campbell
johnl@ima.UUCP (09/01/83)
#R:ccieng5:-13300:ima:35700002:000:815 ima!johnl Aug 31 18:02:00 1983 I don't think that /bin/sort works right. ... The bottom line is that sort continues to look at the remaining fields in the line even though the "-pos" argument tells him not to. The sort page in my Unix manual says: When there are multiple sort keys, later keys are compared only after all earlier keys compare equal. Lines that otherwise compare equal are ordered with all bytes significant. so sort behaves exactly the way the documentation says it does. I think this language in the manual page dates back at least to V7. If you want a stable sort, you can number the lines, sort, and un-number them. On S/3, you could say: pr -t -n file | sort +1 -2 +0n | sed "s/^.....//" to do that. A "stable" option to sort would have to do the same internally. John Levine, ima!johnl
mark@umcp-cs.UUCP (09/01/83)
I think the problem is that /bin/sort doesn't guarantee anything about the order of records which are "equal" on the specified sorting fields. This is typical of sort programs, actually, since they can then run faster. Sort does a lot of sorting of subsets into temp files and then merging the results--retaining equality ordering would be difficult. -- spoken: mark weiser UUCP: {seismo,allegra,brl-bmd}!umcp-cs!mark CSNet: mark@umcp-cs ARPA: mark.umcp-cs@UDel-Relay
hansen@pegasus.UUCP (09/01/83)
I just looked at my 4.1 Berkeley manual, my System III manual, and my System V manual, and they all had the same comment: Lines that otherwise compare equal are ordered with all bytes significant. It looks like that "bug"/"feature" has been around for a LONG time. Tony Hansen pegasus!hansen
grahamr@bronze.UUCP (Graham Ross) (09/01/83)
/bin/sort's quick sort algorithm is unstable (in the sense of Knuth, ACP, Vol. 3, p. 4). To achieve stability, do the following: cat -n <input | sort ...your_flags... +0n | sed 's/^[^\t]\t//' >output This prepends an identifying number to each line in the input, which stays with it until the sorting is done. The +0n key in the sort gives a numeric comparison on this first field. Remember to bump the column numbers in you original sort flags. The "\t"s in the sed command probably should be real tabs (I forget). If you don't have cat's -n option (another UCB monstrosity, I think) you can easily write a separate program to do it. -Graham Ross teklabs!tekmdp!grahamr
preece@uicsl.UUCP (09/02/83)
#R:ccieng5:-13300:uicsl:12500007:000:621
uicsl!preece Sep 1 11:58:00 1983
>From the man page for sort:
"Lines that otherwise compare equal are ordered
with all bytes significant."
There is, in any case, no rule that says a sort has to be stable.
Most of the system sort packages I've used are not stable: items
with the same key are arbitrarily ordered, not in the original
order. The authors of this sort have kindly told us what that
arbitrary ordering will be (a sort on the non-key parts of the line).
Lines which are identical will presumably be in an unspecified
order, but since they're identical, you won't be able to tell
the difference...
scott preece
pur-ee!uiucdcs!uicsl!preece
gwyn@brl-vld@sri-unix.UUCP (09/02/83)
From: Doug Gwyn (VLD/VMB) <gwyn@brl-vld> "Lines that otherwise compare equal are ordered with all bytes significant." - from SORT(1) Nowhere is "sort" advertised as being a stable sort, and it isn't.
craig@cfib.UUCP (09/03/83)
#R:ccieng5:-13300:cfib:11100005:000:205 cfib!craig Sep 2 11:52:00 1983 This discussion raises an interesting question --- does anyone know of a sort program for Unix that does "stable" sorts? (Unlike /bin/sort). Replies by mail please... Craig Partridge ...ima!cfib!craig
jim@mcvax.UUCP (Jim McKie) (09/04/83)
It HAS been around a long time, the same comment appears in my Fifth Edition (Version 5) manual, dated 6/11/74. Jim McKie Mathematisch Centrum, Amsterdam ..{philabs|decvax}!mcvax!jim