[comp.unix.xenix.sco] Sort bug causes data loss

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (09/17/90)

  I have discovered what appears to be a serious bug in the sort
routine used in several SysV variants including Stellar. Since it
causes silent loss of data I am cross posting a bit more than I usually
do.

  The problem occurs when the options -n (numeric) and -u (discard
duplicates) are used together to sort data which has a fixed-width
numeric value as the first key field. The result is output of only one line,
regardless of the input data. I found this by losing 15 months of data
(yes it was backed up). Since sort is often in shell scripts run from
cron to do system things, this problem might not be instantly noticed.

  I have generated the following shell script to test for the problem.

#!/bin/sh
# this tests sort for bugs in -nu option
# as found in SCO Xenix and UNIX

echo ""
echo "Start test of sort error"
echo ""

sort -nu <<XX >x$$.tmp
  1: a
  3: b
  2: c
  1: a
 10: x
XX

sort -n <<XX | uniq >y$$.tmp
  1: a
  3: b
  2: c
  1: a
 10: x
XX

echo "Starting check"
if [ `wc -l <x$$.tmp` -ne 4 ]
then	echo "Error in sort with -nu option."
	echo "Output was:"; cat x$$.tmp
	echo "Should be:"; cat y$$.tmp
else	echo "Output appears okay unless diff reported below:"
	diff x$$.tmp y$$.tmp
fi	#

rm [xy]$$.tmp
echo "Test ends"
# ================================================================

  Of course someone may tell me it's supposed to work that way, and that
the BSD version is broken.

  Suggested workaround is to pipe sort through uniq rather than use the
-u option. The form "sort -n +0nu" also worked. If I can find disk
space to load the source tape I'll check it further.
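
  A minimal sketch of the uniq workaround, wrapped in a shell function so
that scripts do not have to call "sort -nu" directly. The function name
sort_numeric_unique is made up for illustration and is not part of any
system. Note that uniq drops only lines which are identical in full, while
-u is defined in terms of equal keys, so the two can differ when lines
share a numeric key but have different text.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): numeric sort with
# duplicate lines removed, without relying on the -u option of sort.
sort_numeric_unique() {
    # sort -n orders the lines numerically; uniq then drops any line
    # that is identical to the line immediately before it.
    sort -n "$@" | uniq
}

# Example use, reading from a file or from standard input:
#   sort_numeric_unique datafile >datafile.sorted
#   some_command | sort_numeric_unique
```

  Run against the five-line test data above, this should leave four lines,
the same result the test script builds in y$$.tmp.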
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.