david@cs.uow.edu.au (David E A Wilson) (04/11/91)
I have run into a bug in the sort(1) command on a Sun4 running SunOS 4.1.1.
This occurs both with /usr/bin/sort and /usr/5bin/sort. The problem also
appears on a Sequent Symmetry running Dynix 2.0v2 (both ucb sort and att sort)
and finally with the sort command compiled using the 4.3-bsd tahoe sources.

The bug is as follows. I am trying to sort on the 2nd character of the second
field of a colon separated record. If this character compares equal I then
sort on the 1st character of the 2nd field.

/usr/bin/sort -t: +1.1 -1.2 +1.0 -1.1 <<!
abc:Ab:xyz
def:zA:pqr
!

This results in:

Script started on Thu Apr 11 13:16:57 1991
$ /usr/bin/sort -t: +1.1 -1.2 +1.0 -1.1 <<!
>abc:Ab:xyz
>def:zA:pqr
>!
abc:Ab:xyz
def:zA:pqr
$
script done on Thu Apr 11 13:17:11 1991

Which is incorrect. The same input to "sort +0.5 -0.6 +0.4 -0.5" will however
work (but is useless if the first field is not fixed length).

The problem appears to be in the skip function used to find the start and end
of sort keys. Patching it to add one to the end pointer fixes my problem but
may introduce other problems.
--
David Wilson	Dept Comp Sci, Uni of Wollongong	david@cs.uow.edu.au
paul@unhtel.unh.edu (Paul S. Sawyer) (04/11/91)
In article <1991Apr11.031926.19901@cs.uow.edu.au> david@cs.uow.edu.au (David E A Wilson) writes:
>I have run into a bug in the sort(1) command on a Sun4 running SunOS 4.1.1.
>This occurs both with /usr/bin/sort and /usr/5bin/sort. The problem also
>appears on a Sequent Symmetry running Dynix 2.0v2 (both ucb sort and att sort)
>and finally with the sort command compiled using the 4.3-bsd tahoe sources.
>
>The bug is as follows. I am trying to sort on the 2nd character of the second
>field of a colon separated record. If this character compares equal I then
>sort on the 1st character of the 2nd field.
>
>/usr/bin/sort -t: +1.1 -1.2 +1.0 -1.1 <<!
>abc:Ab:xyz
>def:zA:pqr
>!
>
>This results in:
> ...
>abc:Ab:xyz
>def:zA:pqr
> ...
>Which is incorrect. ...
>
>The problem appears to be in the skip function used to find the start and end
>of sort keys. Patching it to add one to the end pointer fixes my problem but
>may introduce other problems.

That is what you have to do. If you RTFM very carefully, the bug is in the
IMPLEMENTATION, such that +m.n does not mean the same as -m.n !!
It seems perverse, but you need:

	/usr/bin/sort -t: +1.1 -1.3 +1.0 -1.2
--
Paul S. Sawyer              {uunet,attmail}!unhtel!paul    paul@unhtel.unh.edu
UNH CIS - - Telecommunications and Network Services    VOX: +1 603 862 3262
Durham, New Hampshire  03824-3523                      FAX: +1 603 862 2030
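To make the asymmetry concrete, here is a minimal Python sketch that models how the old-style key positions might be turned into key boundaries when -t is given: a start position +m.n counts n from the first character of field m, while an end position -m.n counts n from the delimiter just before field m (position 0). Field 0 has no delimiter in front of it, so, consistent with David's observation that "+0.5 -0.6 +0.4 -0.5" works, its end offset is assumed to count from the start of the line. This is a simplified model, not the actual sort source: the names field_start, key_bounds and old_sort are hypothetical, the whole-line comparison stands in for sort's last-resort ordering when all keys compare equal, and every line is assumed to contain the fields named.

```python
# Simplified model (an assumption, not the real sort(1) source) of how
# old-style "+m.n -em.en" key positions map to character ranges with -t.

def field_start(line: str, field: int, delim: str) -> int:
    """Index of the first character of `field` (0-based), or len(line)."""
    pos = 0
    for _ in range(field):
        nxt = line.find(delim, pos)
        if nxt == -1:
            return len(line)
        pos = nxt + 1
    return pos


def key_bounds(line: str, spec, delim: str):
    """Return (begin, end) for spec = (m, n, em, en), i.e. '+m.n -em.en'."""
    m, n, em, en = spec
    # +m.n: offset n counts from the first character of field m.
    begin = min(field_start(line, m, delim) + n, len(line))
    # -em.en: offset en counts from the delimiter before field em
    # (position 0); field 0 has no such delimiter, so count from line start.
    origin = field_start(line, em, delim)
    if em > 0:
        origin -= 1
    end = min(origin + en, len(line))
    return begin, max(begin, end)  # an "end before begin" key is empty


def old_sort(lines, specs, delim=":"):
    """Compare keys in order; if all are equal, compare whole lines."""
    def sort_key(line):
        keys = []
        for spec in specs:
            b, e = key_bounds(line, spec, delim)
            keys.append(line[b:e])
        return (*keys, line)
    return sorted(lines, key=sort_key)


lines = ["abc:Ab:xyz", "def:zA:pqr"]
# David's arguments: both keys come out empty, so only the whole-line
# comparison is left and "abc..." sorts first.
print(old_sort(lines, [(1, 1, 1, 2), (1, 0, 1, 1)]))
# ['abc:Ab:xyz', 'def:zA:pqr']
# The corrected arguments: keys are "b"/"A" for abc and "A"/"z" for def.
print(old_sort(lines, [(1, 1, 1, 3), (1, 0, 1, 2)]))
# ['def:zA:pqr', 'abc:Ab:xyz']
```

In this model, "-1.2" lands exactly where "+1.1" starts, which is why the original keys were empty. The fixed-width workaround "+0.5 -0.6 +0.4 -0.5" gives the same order as the corrected command, because for field 0 the start and end offsets count from the same place.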
gwyn@smoke.brl.mil (Doug Gwyn) (04/13/91)
In article <1991Apr11.031926.19901@cs.uow.edu.au> david@cs.uow.edu.au (David E A Wilson) writes:
>I have run into a bug in the sort(1) command on a Sun4 running SunOS 4.1.1.

No, it's just a feature.

>/usr/5bin/sort -t: +1.1 -1.2 +1.0 -1.1 <<!

That should have been
	sort -t: +1.1 -1.3 +1.0 -1.2
as documented and shown in an example in the SVR2 manual entry SORT(1).

This is admittedly a strange design that is hard to use correctly.
My guess is that it may have actually been meant to work differently
when it was first implemented, but now that the "sort" utility is in
wide use the specifications cannot be changed without breaking many
applications.
david@cs.uow.edu.au (David E A Wilson) (04/15/91)
gwyn@smoke.brl.mil (Doug Gwyn) writes:
>No, it's just a feature.

>>/usr/5bin/sort -t: +1.1 -1.2 +1.0 -1.1 <<!

>That should have been
>	sort -t: +1.1 -1.3 +1.0 -1.2
>as documented and shown in an example in the SVR2 manual entry SORT(1).

It looks like the SunOS 4.1.1 manual is in error then:
---------------------------------------------------------------------------
     Field Specification Options
     -tc       Use c as the word delimiter character; unlike white-space
               characters, adjacent delimiters indicate word breaks; if :
               is the delimiter character, :: delimits an empty word.

     sort-field
               This is a combination of options that specifies a field,
               within each line, to sort on. A sort-field specification
               can take either of the following forms:

                    +sw[cf]
                    +sw -ew[cf]

               where sw is the number of the starting word (beginning
               with `0') to include in the field, ew is the number of the
               word before which to end the field, and cf is a string
               containing collating flags (without a leading `-'.) When
               included in a sort-field specification, these flags apply
               only to the field being specified, and when given,
               override other collating flags given in separate arguments
               (which otherwise apply to an entire line).

               If the -ew option is omitted, the field continues to the
               end of a line.

               You can apply a character offset to sw and ew to indicate
               that a field is to start or end a given number of
               characters within a word, using the notation: `w.c'. A
               starting position specified in the form: `+w.c' indicates
               the character in position c (beginning with 0 for the
               first character), within word w (1 and 1.0 are
               equivalent). An ending position specified in the form:
               `-w.c' indicates that the field ends at the character just
               prior to position c (beginning with 0 for the delimiter
               just prior to the first character), within word w. If the
               -b flag is in effect, c is counted from the first
               non-white-space or non-delimiter character in the field,
               otherwise, delimiter characters are counted.

EXAMPLES
     Sort, in reverse order, the contents of input-file1 and
     input-file2, placing the output in output-file and using the
     first character of the second field as the sort key:

          sort -r -o output-file +1.0 -1.1 input-file1 input-file2

     Sort, in reverse order, the contents of input-file1 and
     input-file2 using the first non-blank character of the second
     field as the sort key:

          sort -r +1.0b -1.1b input-file1 input-file2
---------------------------------------------------------------------------
--
David Wilson	Dept Comp Sci, Uni of Wollongong	david@cs.uow.edu.au
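Applying the manual's own wording to its first example shows the inconsistency David points out. If the end offset counts 0 at the delimiter before the word, then the k-th character (0-based) of a later word sits at end position k + 1, and a key that includes it must end before position k + 2. The helper below is a hypothetical illustration of that arithmetic, not part of any sort implementation. It assumes a single-character -t delimiter and no -b flag (the default blank-separated case is not modelled), and it treats word 0 as having no delimiter before it, following David's "+0.5 -0.6" observation.

```python
# Hypothetical helper: build old-style "+pos -pos" arguments selecting
# characters first..last (0-based, inclusive) of one word, following the
# SunOS 4.1.1 man page wording. Assumes -t is in effect and no -b flag.

def old_style_args(field: int, first: int, last: int) -> str:
    if first < 0 or last < first:
        raise ValueError("need 0 <= first <= last")
    # Start: position 0 is the word's first character.
    start = f"+{field}.{first}"
    # End: the key stops just before position c. After word 0, position 0
    # is the delimiter, so character k sits at position k + 1.
    end_pos = last + 1 if field == 0 else last + 2
    return f"{start} -{field}.{end_pos}"


# First character of the second word (the manual's example shows -1.1):
print(old_style_args(1, 0, 0))
# David's two keys: 2nd, then 1st character of the second word.
print(old_style_args(1, 1, 1), old_style_args(1, 0, 0))
```

Under these assumptions the first call gives "+1.0 -1.2" and the second line gives "+1.1 -1.3 +1.0 -1.2", the form Paul and Doug recommend, rather than the "-1.1" shown in the SunOS example.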