[net.unix-wizards] sort bug needs wizard

joec@u1100a.UUCP (Joe Carfagno) (08/07/85)

{this line not shown by sort -u ... only kidding}
>Running 4.2 on a 750, I have encountered a bug in sort, such that the output
>of "sort -u", when run against a particular file, is missing one line.
>
>Unfortunately, the file that produces this error is 136550 bytes (45516
>lines), and I have been unable to extract a reasonably small subset of
>the file which will demonstrate the bug.  Nevertheless, I consider this
>extremely serious.  I'll never know what erroneous reports my application
>has turned out;  I use sort a lot, and it was only by accident that I
>noticed the problem with a file this large.  I would be glad to send any
>interested party a copy of the file, but I think that it is too big to
>send over the net.
>
>Everything seems to go ok until the final call to the "merge" routine,
>and although I have come up with a patch that fixes this instance of
>the problem, the fact that I can't replicate the bug with a smaller file
>verifies that my understanding of sort is inadequate.
>
>Does any of this sound familiar?
>Any help or suggestions would be greatly appreciated.
>
>    Carl Shapiro
>    ...sdcrdcf!otto!carl
 
Here's the problem (and the fix) with sort -u missing one line:
 
$ diff /usr/src/cmd/sort.c /usr/src/cmd/OLDsort.c
399c399
< 		if(!cflg && (uflg == 0 || muflg || i <= 1 ||
---
> 		if(!cflg && (uflg == 0 || muflg ||
 
It seems that i can be set to 1 and the lines which follow
 
line 399	if(!cflg && (uflg == 0 || muflg ||
			(*compare)(ibuf[i-1]->l,ibuf[i-2]->l)))
			do
				putc(*cp, os);
			while(*cp++ != '\n');
 
specifically the ibuf[i-2]->l reference, don't make much sense for
i == 1, since that indexes ibuf[-1].
 
So, add the extra "|| i <= 1" condition.
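
To show why the order of the tests matters, here is a minimal sketch of
the guard in isolation.  It is not the real sort.c: struct merg is cut
down to the one field used here, and should_emit() and cmp_lines() are
names made up for the illustration.  The cflg and muflg tests are left
out for brevity.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the per-input merge buffer in sort.c. */
struct merg {
    const char *l;   /* current line of this input */
};

/* Nonzero when the two lines differ, in the spirit of (*compare). */
static int cmp_lines(const char *a, const char *b)
{
    return strcmp(a, b) != 0;
}

/*
 * Decide whether the line held in ibuf[i-1] should be written.
 * Because || short-circuits, the "i <= 1" test keeps ibuf[i-2]
 * from ever being read when only one input remains.
 */
static int should_emit(struct merg **ibuf, int i, int uflg)
{
    assert(i >= 1);  /* there must be at least one input to look at */
    return uflg == 0 || i <= 1 ||
           cmp_lines(ibuf[i-1]->l, ibuf[i-2]->l);
}
```

With two inputs holding the same line and uflg set, should_emit()
returns 0 and the duplicate is suppressed.  With i == 1 it returns 1
without touching ibuf[-1], which is the whole point of the patch.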
 
The problem not occurring with small files is explainable, or at least
I hope I can explain it.  I don't remember all the details, but small
files can be merged in core (see what the MEM tag gets used for; that
rings a bell).  I think that when you merge, one of the files can end up
with just one line in it (thus i == 1), and that is where it goes wrong.
 
I found the problem on our version of the UNIX system which runs on
the Sperry 1100 processor line.  The (*compare) function was called
with a strange second argument which worked on one type of processor
but not on another.  It turns out that the eol() function was scanning
the register set (which starts at address 0 on an 1100) and finding
a new-line character in one processor type's register set, but not
in the other.  This was an interesting one to find, and I think it'll
solve your problem.
 
Joe Carfagno
{...!u1100a!joec}