[comp.bugs.sys5] Bug in sort

david@cs.uow.edu.au (David E A Wilson) (04/11/91)

I have run into a bug in the sort(1) command on a Sun4 running SunOS 4.1.1.
This occurs both with /usr/bin/sort and /usr/5bin/sort. The problem also
appears on a Sequent Symmetry running Dynix 2.0v2 (both ucb sort and att sort)
and finally with the sort command compiled using the 4.3-bsd tahoe sources.

The bug is as follows. I am trying to sort on the 2nd character of the second
field of a colon-separated record. If this character compares equal, I then
sort on the 1st character of the 2nd field.

/usr/bin/sort -t: +1.1 -1.2 +1.0 -1.1 <<!
abc:Ab:xyz
def:zA:pqr
!
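The ordering David expects can be written out directly. The following is a minimal Python sketch, not any sort(1) code. It assumes byte-wise (ASCII) comparison as in the C locale, where upper-case letters sort before lower-case ones. The helper name `intended_order` is hypothetical.

```python
def intended_order(lines, sep=":"):
    """Order records by the 2nd character of field 2, then by its 1st.

    Hypothetical helper; assumes every record has at least two fields.
    Python compares str by code point, which for ASCII matches the
    byte-wise order of the C locale ('A' < 'b').
    """
    def key(line):
        field2 = line.split(sep)[1]
        # Slices rather than indexing, so a short field gives "" instead of an error.
        return (field2[1:2], field2[0:1])
    return sorted(lines, key=key)


print(intended_order(["abc:Ab:xyz", "def:zA:pqr"]))  # ['def:zA:pqr', 'abc:Ab:xyz']
```

The primary keys are 'b' (from "Ab") and 'A' (from "zA"), so def:zA:pqr should be output first.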

This results in:

Script started on Thu Apr 11 13:16:57 1991
$ /usr/bin/sort -t: +1.1 -1.2 +1.0 -1.1 <<!
>abc:Ab:xyz
>def:zA:pqr
>!
abc:Ab:xyz
def:zA:pqr
$ 
script done on Thu Apr 11 13:17:11 1991

This is incorrect: 'A' sorts before 'b', so def:zA:pqr should come first. The
same input to "sort +0.5 -0.6 +0.4 -0.5" does work, however (but that is
useless if the first field is not of fixed length).
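The fixed-offset workaround only succeeds because these lines contain no blanks. Assuming default whitespace field splitting, the whole line is field 0, and +0.5 -0.6 and +0.4 -0.5 then pick plain character positions 5 and 4. The small Python comparison below is a sketch, and both function names are hypothetical. It shows the offsets drifting as soon as the first field changes length.

```python
def fixed_offset_key(line):
    # Emulates +0.5 -0.6 +0.4 -0.5 on a line with no blanks: field 0 is
    # the whole line, so the keys are simply characters 5 and 4.
    return (line[5:6], line[4:5])


def field_key(line, sep=":"):
    # The key actually wanted: the 2nd, then the 1st character of field 2.
    field2 = line.split(sep)[1]
    return (field2[1:2], field2[0:1])


for line in ["abc:Ab:xyz", "abcd:Ab:xyz"]:
    print(line, fixed_offset_key(line), field_key(line))
```

With a three-character first field the two keys agree, ('b', 'A'). With a four-character first field the fixed offsets land on 'A' and the colon instead.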

The problem appears to be in the skip function used to find the start and end
of sort keys. Patching it to add one to the end pointer fixes my problem but
may introduce other problems.
-- 
David Wilson	Dept Comp Sci, Uni of Wollongong	david@cs.uow.edu.au

paul@unhtel.unh.edu (Paul S. Sawyer) (04/11/91)

In article <1991Apr11.031926.19901@cs.uow.edu.au> david@cs.uow.edu.au (David E A Wilson) writes:
>I have run into a bug in the sort(1) command on a Sun4 running SunOS 4.1.1.
>This occurs both with /usr/bin/sort and /usr/5bin/sort. The problem also
>appears on a Sequent Symmetry running Dynix 2.0v2 (both ucb sort and att sort)
>and finally with the sort command compiled using the 4.3-bsd tahoe sources.
>
>The bug is as follows. I am trying to sort on the 2nd character of the second
>field of a colon separated record. If this character compares equal I then
>sort on the 1st character of the 2nd field.
>
>/usr/bin/sort -t: +1.1 -1.2 +1.0 -1.1 <<!
>abc:Ab:xyz
>def:zA:pqr
>!
>
>This results in:
> ...
>abc:Ab:xyz
>def:zA:pqr
> ...
>Which is incorrect. ...
>
>The problem appears to be in the skip function used to find the start and end
>of sort keys. Patching it to add one to the end pointer fixes my problem but
>may introduce other problems.

That is, in effect, what you have to do: add one to each end offset.  If you
RTFM very carefully, the bug is in the IMPLEMENTATION: the n in +m.n is not
counted from the same place as the n in -m.n !!  It seems perverse, but you
need:

	/usr/bin/sort -t: +1.1 -1.3 +1.0 -1.2
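Paul's point can be checked with a small model of how key boundaries are located. This Python sketch rests on an assumption and is not the 4.3BSD skip() code itself. When stepping over fields, the start position (+m.n) moves past the delimiter that precedes field m. The end position (-m.n) stops on that delimiter, so its offset n is counted from the delimiter. That matches the SunOS manual's wording, "beginning with 0 for the delimiter just prior to the first character". Keys are half-open slices. Lines that tie on every key are assumed to fall back to a whole-line comparison, as traditional sort does. Only the -t case is modelled. The names `skip` and `old_sort` here are hypothetical.

```python
def skip(line, sep, field, chars, is_end):
    """Return the offset in `line` for an old-style position field.chars.

    Assumed model: each field step advances to the next `sep`; the
    start position also steps over it, but the end position stays on
    the last delimiter it reaches, so `chars` counts from there.
    """
    p = 0
    for remaining in range(field - 1, -1, -1):
        q = line.find(sep, p)
        if q == -1:            # ran out of fields: position is end of line
            return len(line)
        p = q
        if remaining > 0 or not is_end:
            p += 1             # step over the delimiter
    return min(p + chars, len(line))


def old_sort(lines, sep, keys):
    """Sort with keys given as [((m, n), (m2, n2)), ...], i.e. +m.n -m2.n2."""
    def sort_key(line):
        parts = []
        for (m, n), (m2, n2) in keys:
            lo = skip(line, sep, m, n, is_end=False)
            hi = skip(line, sep, m2, n2, is_end=True)
            parts.append(line[lo:hi])   # empty string if hi <= lo
        parts.append(line)              # last resort: whole-line comparison
        return parts
    return sorted(lines, key=sort_key)


lines = ["abc:Ab:xyz", "def:zA:pqr"]
print(old_sort(lines, ":", [((1, 1), (1, 2)), ((1, 0), (1, 1))]))  # +1.1 -1.2 +1.0 -1.1
print(old_sort(lines, ":", [((1, 1), (1, 3)), ((1, 0), (1, 2))]))  # +1.1 -1.3 +1.0 -1.2
```

Under this model the original command yields empty keys for both lines. The whole-line comparison therefore decides, and abc:Ab:xyz comes first, which is the output David saw. The corrected command compares 'b' with 'A' and puts def:zA:pqr first.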

-- 
Paul S. Sawyer             {uunet,attmail}!unhtel!paul    paul@unhtel.unh.edu
UNH CIS - - Telecommunications and Network Services      VOX: +1 603 862 3262
Durham, New Hampshire  03824-3523                        FAX: +1 603 862 2030

gwyn@smoke.brl.mil (Doug Gwyn) (04/13/91)

In article <1991Apr11.031926.19901@cs.uow.edu.au> david@cs.uow.edu.au (David E A Wilson) writes:
>I have run into a bug in the sort(1) command on a Sun4 running SunOS 4.1.1.

No, it's just a feature.

>/usr/5bin/sort -t: +1.1 -1.2 +1.0 -1.1 <<!

That should have been
	sort -t: +1.1 -1.3 +1.0 -1.2
as documented and shown in an example in the SVR2 manual entry SORT(1).

This is admittedly a strange design that is hard to use correctly.
My guess is that it may have actually been meant to work differently
when it was first implemented, but now that the "sort" utility is in
wide use the specifications cannot be changed without breaking many
applications.

david@cs.uow.edu.au (David E A Wilson) (04/15/91)

gwyn@smoke.brl.mil (Doug Gwyn) writes:
>No, it's just a feature.
>>/usr/5bin/sort -t: +1.1 -1.2 +1.0 -1.1 <<!
>That should have been
>	sort -t: +1.1 -1.3 +1.0 -1.2
>as documented and shown in an example in the SVR2 manual entry SORT(1).

It looks like the SunOS 4.1.1 manual is in error, then. Its examples use -1.1 to
end a key on the first character of the second field:
---------------------------------------------------------------------------
  Field Specification Options
     -tc  Use c as the word delimiter  character;  unlike  white-
          space  characters,  adjacent  delimiters  indicate word
          breaks; if : is the delimiter character, :: delimits an
          empty word.

     sort-field
          This is a  combination  of  options  that  specifies  a
          field,  within  each  line,  to  sort on.  A sort-field
          specification can take either of the following forms:
               +sw[cf]
               +sw -ew[cf]

          where sw is the number of the starting word  (beginning
          with  `0') to include in the field, ew is the number of
          the word before which to end the field,  and  cf  is  a
          string  containing  collating  flags (without a leading
          `-'.) When  included  in  a  sort-field  specification,
          these  flags  apply  only to the field being specified,
          and when given, override other collating flags given in
          separate  arguments (which otherwise apply to an entire
          line).

          If the -ew option is omitted, the  field  continues  to
          the end of a line.

          You can apply a character offset to sw and ew to indicate
          that a field is to start or end a given number of characters
          within a word, using the notation: `w.c'. A starting position
          specified in the form: `+w.c' indicates the character in
          position c (beginning with 0 for the first character), within
          word w (1 and 1.0 are equivalent). An ending position
          specified in the form: `-w.c' indicates that the field ends
          at the character just prior to position c (beginning with 0
          for the delimiter just prior to the first character), within
          word w. If the -b flag is in effect, c is counted from the
          first non-white-space or non-delimiter character in the
          field, otherwise, delimiter characters are counted.
EXAMPLES
     Sort, in reverse order,  the  contents  of  input-file1  and
     input-file2, placing the output in output-file and using the
     first character of the second field as the sort key:

          sort -r -o output-file +1.0 -1.1 input-file1 input-file2

     Sort, in reverse order,  the  contents  of  input-file1  and
     input-file2  using  the  first  non-blank  character  of the
     second field as the sort key:

          sort -r +1.0b -1.1b input-file1 input-file2
---------------------------------------------------------------------------
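Read literally, the quoted rule for -w.c agrees with Doug's form rather than with the manual's own examples. The Python sketch below applies the wording as quoted. Under that wording, +w.c counts from the field's first character, and -w.c counts from the preceding delimiter and stops just before position c. The function `manual_key` is hypothetical. It handles only fields after the first, where a delimiter precedes the field, and it ignores the -b flag.

```python
def manual_key(line, sep, start, end):
    """Key text selected by +sw.sc -ew.ec, following the quoted rule.

    Only fields >= 1 are handled, since the -w.c rule counts from the
    delimiter just before the field; -b is not modelled.
    """
    (sw, sc), (ew, ec) = start, end
    if sw < 1 or ew < 1:
        raise ValueError("this sketch only handles fields after the first")

    def delim_before(field):
        pos = -1
        for _ in range(field):
            pos = line.index(sep, pos + 1)   # ValueError if the field is missing
        return pos

    lo = delim_before(sw) + 1 + sc   # +w.c: position 0 is the first character
    hi = delim_before(ew) + ec       # -w.c: position 0 is the delimiter; stop before c
    return line[lo:hi]


line = "abc:Ab:xyz"
print(repr(manual_key(line, ":", (1, 0), (1, 1))))  # the manual's example: +1.0 -1.1
print(repr(manual_key(line, ":", (1, 0), (1, 2))))  # Doug's form: +1.0 -1.2
```

By the manual's own description, the example's +1.0 -1.1 selects an empty key, while +1.0 -1.2 selects the first character, 'A'.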
-- 
David Wilson	Dept Comp Sci, Uni of Wollongong	david@cs.uow.edu.au