[comp.bugs.sys5] Sort bug causes data loss

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (09/17/90)

  I have discovered what appears to be a serious bug in the sort
routine used in several SysV variants including Stellar. Since it
causes silent loss of data I am cross posting a bit more than I usually
do.

  The problem occurs when the options -n (numeric) and -u (discard
duplicates) are used together sorting data which has a fixed width
numeric as the first key field. The results is output of only one line,
regardless of the input data. I found this by losing 15 months of data
(yes it was backed up). Since sort is often in shell scripts run from
cron to do system things, this problem might not be instantly noticed.

  I have generated the following shell script to test for the problem.

#!/bin/sh
# this tests sort for bugs in -nu option
# as found in SCO Xenix and UNIX

echo ""
echo "Start test of sort error"
echo ""

sort -nu <<XX >x$$.tmp
  1: a
  3: b
  2: c
  1: a
 10: x
XX

sort -n <<XX | uniq >y$$.tmp
  1: a
  3: b
  2: c
  1: a
 10: x
XX

echo "Starting check"
if [ `cat x$$.tmp | wc -l` -ne 4 ]
then	echo "Error in sort with -nu option."
	echo "Output was:"; cat x$$.tmp
	echo "Should be:"; cat y$$.tmp
else	echo "Output appears okay unless diff reported below:"
	diff x$$.tmp y$$.tmp
fi	#

rm [xy]$$.tmp
echo "Test ends"
# ================================================================

  Of course someone may tell me it's supposed to work that way, and that
the BSD version is broken.

  Suggested workaround is to pipe sort through uniq rather than use the
-u option. The form "sort -n +0nu" also worked. If I can find disk
space to load the source tape I'll check it further.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

terryl@sail.LABS.TEK.COM (09/18/90)

In article <2675@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
+
+  I have discovered what appears to be a serious bug in the sort
+routine used in several SysV variants including Stellar. Since it
+causes silent loss of data I am cross posting a bit more than I usually
+do.
+
+  The problem occurs when the options -n (numeric) and -u (discard
+duplicates) are used together sorting data which has a fixed width
+numeric as the first key field. The results is output of only one line,
+regardless of the input data. I found this by losing 15 months of data
+(yes it was backed up). Since sort is often in shell scripts run from
+cron to do system things, this problem might not be instantly noticed.


     Interesting. Right now I'm running on a Solbourne, which is a Sun
clone, and it has the dual BSD/SYSV environment.

     The bug shows up using the SYSV sort, but not the BSD sort.....

rr@mips.COM (Robert "Bob" Rodriguez) (09/18/90)

I verified  the SYS5 sort bug on MIPS RISC/os 4.51.
It works fine with our BSD sort but is broken with
the SYS5 sort. Oh well...
-- 
Robert Rodriguez
rr@mips.com
hi-ho, hi-ho, it's off to workstations I go.....

davidsen@crdgw1.crd.ge.com (William E. Davidsen Jr) (09/18/90)

  Several people have told me to use the -b option on the sort. That
doesn't work. I thought I posted the test script with this option set,
but I didn't. Adding -b to the two sort commands does not change the
behavior on any machines to which I have access.

  Sure saves on disk space, though ;-)

-- 
	- bill davidsen (davidsen@crdgw1.crd.ge.com)
	GE Corp. R&D Center; Box 8, KW-C206; Schenectady NY 12345

ggs@ulysses.att.com (Griff Smith) (09/19/90)

In article <2675@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
> 
>   I have discovered what appears to be a serious bug in the sort
> routine used in several SysV variants including Stellar. Since it
> causes silent loss of data I am cross posting a bit more than I usually
> do.
> 
[deleted some details to save space, followed by test script...]
> 
> sort -nu <<XX >x$$.tmp
>   1: a
>   3: b
>   2: c
>   1: a
>  10: x
> XX
> 
>   Of course someone may tell me it's supposed to work that way, and that
> the BSD version is broken.

I suspect this may be the case.  The system V manual page says this about
the -u option:

     -u	  Unique: suppress all but one in each set of lines hav-
	  ing equal keys.

This doesn't agree with the code, though.  The real behavior matches what
I find in the BSD manual page:

     u	  Suppress all but  one	 in  each  set	of  equal  lines.
	  Ignored bytes	and bytes outside keys do not participate
	  in this comparison.

The next clue is from the System V manual page again:

     -n	  An initial numeric string, consisting	of optional
	  blanks, optional minus sign, and zero	or more	digits
	  with optional	decimal	point, is sorted by arithmetic
	  value.  The -n option	implies	the -b option (see
	  below).  Note	that the -b option is only effective when
	  restricted sort key specifications are in effect.

The tricky point is that a numeric comparison stops as soon as it finds
a non-numeric character.  Since your test file has leading blanks, and
you didn't specify a sort key, the numeric comparison stops when it
sees the leading blank in each record; the test file appears to
contain five empty records as seen by the numeric comparison code.
Furthermore, the -u option suppresses the following escape clause in
the manual page:

     When there	are multiple sort keys,	later keys  are	 compared
     only  after all earlier keys compare equal.  Lines	that oth-
     erwise compare equal are ordered with all bytes significant.

Translation: if a numeric comparison, or a set of keyed comparisons,
shows that two records match, `sort' then compares both records as
simple text to determine whether the records are really identical.  
This `tie breaking' test is suppressed if the -u option is enabled.

Since all five of your test lines appear to be identical, the -u
option deletes all but one of them.  I think the command you want
to use is

	sort -nu +0

This forces a trip through the key finder, which activates the code
that strips leading blanks.

> -- 
> bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
>     VMS is a text-only adventure game. If you win you can use unix.

Flames, counter arguments, cheerfully accepted.  I didn't write the
rules, I just work here.
-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		1-201-582-7736
UUCP:		{most AT&T sites}!ulysses!ggs
Internet:	ggs@ulysses.att.com

guy@auspex.auspex.com (Guy Harris) (09/20/90)

>The next clue is from the System V manual page again:
>
>     -n	  An initial numeric string, consisting	of optional
>	  blanks, optional minus sign, and zero	or more	digits
>	  with optional	decimal	point, is sorted by arithmetic
>	  value.  The -n option	implies	the -b option (see
>	  below).  Note	that the -b option is only effective when
>	  restricted sort key specifications are in effect.

And a further clue comes from the SunOS 4.0.3 manual page:

  SYSTEM V DESCRIPTION
       When no fields are specified in the command line, the System
       V version of sort treats leading blanks as significant, even
       with the -n (numeric collating sequence) option; the lines:
            123
              23

       are collated as:
              23
            123

I.e., this is one of the places where the SunOS "/usr/bin/sort" and
"/usr/5bin/sort" differ in their behavior.

(They're both built from the same source code, BTW, which is basically
the S5R2 "sort", that being significantly faster than the V7-vintage
sort that comes with 4.3BSD; see J. P. Linderman's paper in the
October 1984 AT&T Bell Laboratories Technical Journal, October 1984,
Vol. 63 No. 8 Part 2 - the second special UNIX issue - which discusses
work on sorting that, I think, got into the S5 sort in S5R2.  There's an
"#ifdef S5EMUL" that governs which of the behaviors is selected.)

bdb@becker.UUCP (Bruce D. Becker) (09/21/90)

In article <2675@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
|
|  I have discovered what appears to be a serious bug in the sort
|routine used in several SysV variants including Stellar. Since it
|causes silent loss of data I am cross posting a bit more than I usually
|do.
|
|  The problem occurs when the options -n (numeric) and -u (discard
|duplicates) are used together sorting data which has a fixed width
|numeric as the first key field. The results is output of only one line,
|regardless of the input data. I found this by losing 15 months of data
|(yes it was backed up). Since sort is often in shell scripts run from
|cron to do system things, this problem might not be instantly noticed.
|
|  I have generated the following shell script to test for the problem.
| [...]

	This isn't a problem on the AT&T 3B1, although
	Convergent may have used the BSD version...

-- 
  ,u,	 Bruce Becker	Toronto, Ontario
a /i/	 Internet: bdb@becker.UUCP, bruce@gpu.utcs.toronto.edu
 `\o\-e	 UUCP: ...!uunet!mnetor!becker!bdb
 _< /_	 "I still have my phil-os-o-phy" - Meredith Monk

ache@hq.demos.su (Andrew A. Chernov) (09/23/90)

In article <2675@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>
>  I have discovered what appears to be a serious bug in the sort
>routine used in several SysV variants including Stellar. Since it
>causes silent loss of data I am cross posting a bit more than I usually
>do.
>
>  The problem occurs when the options -n (numeric) and -u (discard
>duplicates) are used together sorting data which has a fixed width
>numeric as the first key field. The results is output of only one line,
>regardless of the input data. I found this by losing 15 months of data
>(yes it was backed up). Since sort is often in shell scripts run from
>cron to do system things, this problem might not be instantly noticed.
>
>  I have generated the following shell script to test for the problem.


I have & send my version of sort under SCO XENIX: this test run correctly!

-----------------------------------------------------------------------
Start test of sort error

Starting check
Output appears okay unless diff reported below:
Test ends


=== CUT HERE =============================================================
/*
 * It is version of SORT, maybe old, but BUG FREE!
 */
#include <stdio.h>
#include <ctype.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/stat.h>

#define min(a,b) ((a)<(b)?(a):(b))
#define lcmp(a,b) ((a&0377)-(b&0377))   /* My RUSSIAN version of lcmp was here */

#define	L	512
#define	C	20
#define N       15  /* Max. files number for merge */
#define MEM     (((unsigned)56)*1024) /* 56K max */
#define NF      10 /* Max. fields number */

FILE *is, *os;
char isbuf[BUFSIZ], osbuf[BUFSIZ];

char   *dirtry[] = {
    "/usr/tmp", "/tmp", NULL
};
char  **dirs;
char    file1[30];
char   *file = file1;
char   *filep;
int     nfiles;
unsigned    nlines;
unsigned    ntext;
int    *lspace;
char   *tspace;
int     cmp (), cmpa ();
int     (*compare) () = cmpa;
char   *eol ();
int     term ();
int     mflg;
int     cflg;
int     uflg;
char   *outfil;
int     unsafeout;		/* kludge to assure -m -o works
				   */
char    tabchar;
int     eargc;
char  **eargv;
char   *progname;

char    zero[256];      /* All arrays extended to 256 for my RUSSIAN version */
			/* You may reduce it to 128 */

char    fold[256] = {
    0200, 0201, 0202, 0203, 0204, 0205, 0206, 0207,
    0210, 0211, 0212, 0213, 0214, 0215, 0216, 0217,
    0220, 0221, 0222, 0223, 0224, 0225, 0226, 0227,
    0230, 0231, 0232, 0233, 0234, 0235, 0236, 0237,
    0240, 0241, 0242, 0243, 0244, 0245, 0246, 0247,
    0250, 0251, 0252, 0253, 0254, 0255, 0256, 0257,
    0260, 0261, 0262, 0263, 0264, 0265, 0266, 0267,
    0270, 0271, 0272, 0273, 0274, 0275, 0276, 0277,
    0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347,
    0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357,
    0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367,
    0370, 0371, 0372, 0373, 0374, 0375, 0376, 0337,
    0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347,
    0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357,
    0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367,
    0370, 0371, 0372, 0373, 0374, 0375, 0376, 0337,
    0000, 0001, 0002, 0003, 0004, 0005, 0006, 0007,
    0010, 0011, 0012, 0013, 0014, 0015, 0016, 0017,
    0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027,
    0030, 0031, 0032, 0033, 0034, 0035, 0036, 0037,
    0040, 0041, 0042, 0043, 0044, 0045, 0046, 0047,
    0050, 0051, 0052, 0053, 0054, 0055, 0056, 0057,
    0060, 0061, 0062, 0063, 0064, 0065, 0066, 0067,
    0070, 0071, 0072, 0073, 0074, 0075, 0076, 0077,
    0100, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
    0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
    0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
    0130, 0131, 0132, 0133, 0134, 0135, 0136, 0137,
    0140, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
    0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
    0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
    0130, 0131, 0132, 0173, 0174, 0175, 0176, 0177
};
char    nofold[256] = {
    0200, 0201, 0202, 0203, 0204, 0205, 0206, 0207,
    0210, 0211, 0212, 0213, 0214, 0215, 0216, 0217,
    0220, 0221, 0222, 0223, 0224, 0225, 0226, 0227,
    0230, 0231, 0232, 0233, 0234, 0235, 0236, 0237,
    0240, 0241, 0242, 0243, 0244, 0245, 0246, 0247,
    0250, 0251, 0252, 0253, 0254, 0255, 0256, 0257,
    0260, 0261, 0262, 0263, 0264, 0265, 0266, 0267,
    0270, 0271, 0272, 0273, 0274, 0275, 0276, 0277,
    0300, 0301, 0302, 0303, 0304, 0305, 0306, 0307,
    0310, 0311, 0312, 0313, 0314, 0315, 0316, 0317,
    0320, 0321, 0322, 0323, 0324, 0325, 0326, 0327,
    0330, 0331, 0332, 0333, 0334, 0335, 0336, 0337,
    0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347,
    0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357,
    0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367,
    0370, 0371, 0372, 0373, 0374, 0375, 0376, 0377,
    0000, 0001, 0002, 0003, 0004, 0005, 0006, 0007,
    0010, 0011, 0012, 0013, 0014, 0015, 0016, 0017,
    0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027,
    0030, 0031, 0032, 0033, 0034, 0035, 0036, 0037,
    0040, 0041, 0042, 0043, 0044, 0045, 0046, 0047,
    0050, 0051, 0052, 0053, 0054, 0055, 0056, 0057,
    0060, 0061, 0062, 0063, 0064, 0065, 0066, 0067,
    0070, 0071, 0072, 0073, 0074, 0075, 0076, 0077,
    0100, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
    0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
    0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
    0130, 0131, 0132, 0133, 0134, 0135, 0136, 0137,
    0140, 0141, 0142, 0143, 0144, 0145, 0146, 0147,
    0150, 0151, 0152, 0153, 0154, 0155, 0156, 0157,
    0160, 0161, 0162, 0163, 0164, 0165, 0166, 0167,
    0170, 0171, 0172, 0173, 0174, 0175, 0176, 0177
};

char    nonprint[256] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1
};

char    dict[256] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1
};

struct field {
    char   *code;
    char   *ignore;
    char    nflg;
    char    rflg;
    char    zflg;
    char    bflg[2];
    int     m[2];
    int     n[2];
}               fields[NF];

struct field    proto = {
    nofold + 128,
    zero + 128,
    0,
    1,
    0,
    {0, 0},
    {0, -1},
    {0, 0}
};
int     nfields;
int     error = 1;
char   *setfil ();

main (argc, argv)
char  **argv;
{
    register    a;
    unsigned ep, curr, fp, maxmem;
    char   *arg;
    register struct field   *p,
			    *q;
    int     i;

    for (arg = progname = argv[0]; *arg; arg++)
	if (*arg == '/')
	    progname = arg + 1;
    copyproto ();
    eargv = argv;
    while (--argc > 0) {
	if (**++argv == '-')
	    for (arg = *argv;;) {
		switch (*++arg) {
		    case '\0':
			if (arg[-1] == '-')
			    eargv[eargc++] = "-";
			break;

		    case 'o':
			if (--argc > 0)
			    outfil = *++argv;
			continue;

		    case 'T':
			if (--argc > 0)
			    dirtry[0] = *++argv;
			continue;

		    default:
			field (++*argv, nfields > 0);
			break;
		}
		break;
	    }
	else
	    if (**argv == '+') {
		if (++nfields >= NF) {
		    diag ("too many fields", "");
		    exit (1);
		}
		copyproto ();
		field (++*argv, 0);
	    }
	    else
		eargv[eargc++] = *argv;
    }
    q = &fields[0];
    for (a = 1; a <= nfields; a++) {
	p = &fields[a];
	if (p -> code != proto.code)
	    continue;
	if (p -> ignore != proto.ignore)
	    continue;
	if (p -> nflg != proto.nflg)
	    continue;
	if (p -> rflg != proto.rflg)
	    continue;
	if (p -> zflg != proto.zflg)
	    continue;
	if (p -> bflg[0] != proto.bflg[0])
	    continue;
	if (p -> bflg[1] != proto.bflg[1])
	    continue;
	p -> code = q -> code;
	p -> ignore = q -> ignore;
	p -> nflg = q -> nflg;
	p -> rflg = q -> rflg;
	p -> zflg = q -> zflg;
	p -> bflg[0] = p -> bflg[1] = q -> bflg[0];
    }
    if (eargc == 0)
	eargv[eargc++] = "-";
    if (cflg && eargc > 1) {
	diag ("can check only 1 file", "");
	exit (1);
    }
    safeoutfil ();

    fp = sbrk (0);
    lspace = (int *) fp;
    ep = min (MEM, ((unsigned) ~0) - 512 - fp) &~ 0777;
    maxmem = fp + ep;
    for (curr = fp; ep >= 512 && curr < maxmem; ep /= 2) {
	while ((long) curr + ep < (unsigned) ~0 - 512 && sbrk (ep) != -1) {
	    if ((curr += ep) >= maxmem)
		goto Out;
	}
    }
Out:
    ep = ((unsigned) sbrk (0)) - fp;
    nlines = ep - L;
    nlines /= (5 * (sizeof (char *) / sizeof (char)));
    ntext = nlines * 8;
    tspace = (char *) (lspace + nlines);
    a = -1;
    for (dirs = dirtry; *dirs; dirs++) {
	sprintf (filep = file1, "%s/stm%05uaa", *dirs, getpid ());
	while (*filep)
	    filep++;
	filep -= 2;
	if ((a = creat (file, 0600)) >= 0)
	    break;
    }
    if (a < 0) {
	diag ("can't locate temp: ", file);
	exit (1);
    }
    close (a);
    signal (SIGHUP, term);
    if (signal (SIGINT, SIG_IGN) != SIG_IGN)
	signal (SIGINT, term);
    if (signal (SIGQUIT, SIG_IGN) != SIG_IGN)
	signal (SIGQUIT, term);
    signal (SIGPIPE, term);
    signal (SIGTERM, term);
    nfiles = eargc;
    if (!mflg && !cflg) {
	sort ();
	fclose (stdin);
    }
    for (a = mflg | cflg ? 0 : eargc;
	    a + N < nfiles
	    || unsafeout
	    && a < eargc;
	    a = i) {
	i = a + N;
	if (i >= nfiles)
	    i = nfiles;
	newfile ();
	merge (a, i);
    }
    if (a != nfiles) {
	oldfile ();
	merge (a, nfiles);
    }
    error = 0;
    term ();
}

sort () {
    register char  *cp;
    register char **lp;
    register    c;
    int     done;
    int     i;
    char   *f;

    done = 0;
    i = 0;
    c = EOF;
    do {
	cp = tspace;
	lp = (char **) lspace;
	while (lp < (char **) lspace + nlines
		&& cp < tspace + ntext) {
	    *lp++ = cp;
	    while (c != '\n') {
		if (c != EOF) {
		    *cp++ = c;
		    c = getc (is);
		    continue;
		}
		else
		    if (is)
			fclose (is);
		if (i < eargc) {
		    if ((f = setfil (i++)) == 0)
			is = stdin;
		    else {
			if ((is = fopen (f, "r")) == NULL)
			    cant (f);
			setbuf (is, isbuf);
		    }
		    c = getc (is);
		}
		else
		    break;
	    }
	    *cp++ = '\n';
	    if (c == EOF) {
		done++;
		lp--;
		break;
	    }
	    c = getc (is);
	}
	qsort ((char **) lspace, lp);
	if (done == 0 || nfiles != eargc)
	    newfile ();
	else
	    oldfile ();
	while (lp > (char **) lspace) {
	    cp = *--lp;
	    if (*cp)
		do
		    putc (*cp, os);
		while (*cp++ != '\n');
	}
	fclose (os);
    } while (done == 0);
}

struct merg {
    char    l[L];
            FILE * b;
}          *ibuf[256];

merge (a, b) {
    static char merbufs[N][BUFSIZ];
    struct merg *p;
    register char  *cp,
                   *dp;
    register    i;
    struct merg **ip,
               *jp;
    char   *f;
    int     j;
    int     k,
            l;
    int     muflg;

    p = (struct merg   *) lspace;
    j = 0;
    for (i = a; i < b; i++) {
	f = setfil (i);
	if (f == 0)
	    p -> b = stdin;
	else {
	    if ((p -> b = fopen (f, "r")) == NULL)
		cant (f);
	    setbuf (p -> b, merbufs[i - a]);
	}
	ibuf[j] = p;
	if (!rline (p))
	    j++;
	p++;
    }

    do {
	i = j;
	qsort ((char **) ibuf, (char **) (ibuf + i));
	l = 0;
	while (i--) {
	    cp = ibuf[i] -> l;
	    if (*cp == '\0') {
		l = 1;
		if (rline (ibuf[i])) {
		    k = i;
		    while (++k < j)
			ibuf[k - 1] = ibuf[k];
		    j--;
		}
	    }
	}
    } while (l);

    muflg = mflg & uflg | cflg;
    i = j;
    while (i > 0) {
	cp = ibuf[i - 1] -> l;
	if (!cflg && (uflg == 0 || muflg ||
		    (*compare) (ibuf[i - 1] -> l, ibuf[i - 2] -> l)))
	    do
		putc (*cp, os);
	    while (*cp++ != '\n');
	if (muflg) {
	    cp = ibuf[i - 1] -> l;
	    dp = p -> l;
	    do {
	    } while ((*dp++ = *cp++) != '\n');
	}
	for (;;) {
	    if (rline (ibuf[i - 1])) {
		i--;
		if (i == 0)
		    break;
		if (i == 1)
		    muflg = uflg;
	    }
	    ip = &ibuf[i];
	    while (--ip > ibuf
		    && (*compare) (ip[0] -> l, ip[-1] -> l) < 0) {
		jp = *ip;
		*ip = *(ip - 1);
		*(ip - 1) = jp;
	    }
	    if (!muflg)
		break;
	    j = (*compare) (ibuf[i - 1] -> l, p -> l);
	    if (cflg) {
		if (j > 0)
		    disorder ("disorder: ", ibuf[i - 1] -> l);
		else
		    if (uflg && j == 0)
			disorder ("nonunique: ", ibuf[i - 1] -> l);
	    }
	    else
		if (j == 0)
		    continue;
	    break;
	}
    }
    p = (struct merg   *) lspace;
    for (i = a; i < b; i++) {
	fclose (p -> b);
	p++;
	if (i >= eargc)
	    unlink (setfil (i));
    }
    fclose (os);
}

rline (mp)
struct merg *mp;
{
    register char  *cp;
    register char  *ce;
    FILE * bp;
    register    c;

    bp = mp -> b;
    cp = mp -> l;
    ce = cp + L;
    do {
	c = getc (bp);
	if (c == EOF)
	    return (1);
	if (cp >= ce)
	    cp--;
	*cp++ = c;
    } while (c != '\n');
    return (0);
}

disorder (s, t)
char   *s,
       *t;
{
    register char  *u;
    for (u = t; *u != '\n'; u++);
    *u = 0;
    diag (s, t);
    term ();
}

newfile () {
    register char  *f;

    f = setfil (nfiles);
    if ((os = fopen (f, "w")) == NULL) {
	diag ("can't create ", f);
	term ();
    }
    setbuf (os, osbuf);
    nfiles++;
}

char   *
        setfil (i) {

    if      (i < eargc)
	if      (eargv[i][0] == '-' && eargv[i][1] == '\0')
	            return (0);
	else
	    return (eargv[i]);
    i -= eargc;
    filep[0] = i / 26 + 'a';
    filep[1] = i % 26 + 'a';
    return (file);
}

oldfile () {

    if (outfil) {
	if ((os = fopen (outfil, "w")) == NULL) {
	    diag ("can't create ", outfil);
	    term ();
	}
	setbuf (os, osbuf);
    }
    else
	os = stdout;
}

safeoutfil () {
    register int    i;
    struct stat obuf,
                ibuf;

    if (!mflg || outfil == 0)
	return;
    if (stat (outfil, &obuf) == -1)
	return;
    for (i = eargc - N; i < eargc; i++) {
    /*-N is suff., not nec.*/
	if (stat (eargv[i], &ibuf) == -1)
	    continue;
	if (obuf.st_dev == ibuf.st_dev &&
		obuf.st_ino == ibuf.st_ino)
	    unsafeout++;
    }
}

cant (f)
char   *f;
{

    diag ("can't open ", f);
    term ();
}

diag (s, t)
char   *s,
       *t;
{
    fputs (progname, stderr);
    fputs (": ", stderr);
    fputs (s, stderr);
    fputs (t, stderr);
    putc ('\n', stderr);
}

term () {
    register    i;

    signal (SIGQUIT, SIG_IGN);
    signal (SIGINT, SIG_IGN);
    signal (SIGHUP, SIG_IGN);
    signal (SIGTERM, SIG_IGN);
    if (nfiles == eargc)
	nfiles++;
    for (i = eargc; i <= nfiles; i++) {
    /* <= in case of interrupt */
	unlink (setfil (i));	/* with nfiles not updated */
    }
    exit (error);
}

cmp (i, j)
char   *i,
       *j;
{
    register char  *pa,
                   *pb;
    char   *skip ();
    char   *code,
           *ignore;
    int     a,
            b;
    int     k;
    char   *la,
           *lb;
    register int    sa;
    int     sb;
    char   *ipa,
	   *ipb,
           *jpa,
           *jpb;
    struct field   *fp;

    for (k = nfields > 0; k <= nfields; k++) {
	fp = &fields[k];
	pa = i;
	pb = j;
	if (k) {
	    la = skip (pa, fp, 1);
	    pa = skip (pa, fp, 0);
	    lb = skip (pb, fp, 1);
	    pb = skip (pb, fp, 0);
	}
	else {
	    la = eol (pa);
	    lb = eol (pb);
	}
	if (fp -> nflg || k == 0 && fp -> bflg[0]) {
	    while (blank (*pa))
		pa++;
	    while (blank (*pb))
		pb++;
	}
	if (fp -> nflg) {
	    sa = sb = fp -> rflg;
	    if (*pa == '-') {
		pa++;
		sa = -sa;
	    }
	    if (*pb == '-') {
		pb++;
		sb = -sb;
	    }
	    for (ipa = pa; ipa < la && isdigit (*ipa); ipa++);
	    for (ipb = pb; ipb < lb && isdigit (*ipb); ipb++);
	    jpa = ipa;
	    jpb = ipb;
	    a = 0;
	    if (sa == sb)
		while (ipa > pa && ipb > pb)
		    if (b = *--ipb - *--ipa)
			a = b;
	    while (ipa > pa)
		if (*--ipa != '0')
		    return (-sa);
	    while (ipb > pb)
		if (*--ipb != '0')
		    return (sb);
	    if (a)
		return (a * sa);
	    if (*(pa = jpa) == '.')
		pa++;
	    if (*(pb = jpb) == '.')
		pb++;
	    if (sa == sb)
		while (pa < la && isdigit (*pa)
			&& pb < lb && isdigit (*pb))
		    if (a = *pb++ - *pa++)
			return (a * sa);
	    while (pa < la && isdigit (*pa))
		if (*pa++ != '0')
		    return (-sa);
	    while (pb < lb && isdigit (*pb))
		if (*pb++ != '0')
		    return (sb);
	    continue;
	}
	code = fp -> code;
	ignore = fp -> ignore;
loop:
	while (ignore[*pa])
	    pa++;
	while (ignore[*pb])
	    pb++;
	if (pa >= la || *pa == '\n')
	    if (pb < lb && *pb != '\n')
		return (fp -> rflg);
	    else
		continue;
	if (pb >= lb || *pb == '\n')
	    return (-fp -> rflg);
	/* STUPID Ritchie compiler can't swallow whole expression */
	{ register c1 = code[*pb++], c2 = code[*pa++];
	sa = fp -> zflg ? (c1&0377) - (c2&0377) : lcmp (c1, c2);
	}
	if (sa == 0)
		goto loop;
	return (sa * fp -> rflg);
    }
    if (uflg)
	return (0);
    return (cmpa (i, j));
}

cmpa (pa, pb)
register char  *pa,
               *pb;
{
    while (*pa == *pb) {
	if (*pa++ == '\n')
	    return (0);
	pb++;
    }

    if (*pa == '\n' ) return fields[0].rflg;
    else if (*pb == '\n') return  -fields[0].rflg;
    else if (fields[0].zflg && (*pb&0377) > (*pa&0377)) return fields[0].rflg;
    else if (!fields[0].zflg && lcmp (*pb, *pa) > 0) return fields[0].rflg;
    else return -fields[0].rflg;
}

char   *
        skip (pp, fp, j)
struct field   *fp;
char   *pp;
{
    register    i;
    register char  *p;

    p = pp;
    if ((i = fp -> m[j]) < 0)
	return (eol (p));
    while (i-- > 0) {
	if (tabchar != 0) {
	    while (*p != tabchar)
		if (*p != '\n')
		    p++;
		else
		    goto ret;
	    p++;
	}
	else {
	    while (blank (*p))
		p++;
	    while (!blank (*p))
		if (*p != '\n')
		    p++;
		else
		    goto ret;
	}
    }
    if (fp -> bflg[j])
	while (blank (*p))
	    p++;
    i = fp -> n[j];
    while (i-- > 0) {
	if (*p != '\n')
	    p++;
	else
	    goto ret;
    }
ret:
    return (p);
}

char   *
        eol (p)
register char  *p;
{
    while (*p != '\n')
	p++;
    return (p);
}

copyproto () {
    register    i;
    register int   *p,
                   *q;

    p = (int *) & proto;
    q = (int *) & fields[nfields];
    for (i = 0; i < sizeof (proto) / sizeof (*p); i++)
	*q++ = *p++;
}

field (s, k)
char   *s;
{
    register struct field  *p;
    register    d;
    p = &fields[nfields];
    d = 0;
    for (; *s != 0; s++) {
	switch (*s) {
	    case '\0':
		return;

	    case 'b':
		p -> bflg[k]++;
		break;

	    case 'd':
		p -> ignore = dict + 128;
		break;

	    case 'f':
		p -> code = fold + 128;
		break;

	    case 'i':
		p -> ignore = nonprint + 128;
		break;

	    case 'c':
		cflg = 1;
		continue;

	    case 'm':
		mflg = 1;
		continue;

	    case 'n':
		p -> nflg++;
		break;

	    case 't':
		tabchar = *++s;
		if (tabchar == 0)
		    s--;
		continue;

	    case 'r':
		p -> rflg = -1;
		continue;

	    case 'u':
		uflg = 1;
		break;

	    case 'z':
		p -> zflg++;
		break;

	    case '.':
		if (p -> m[k] == -1)/* -m.n with m missing */
		    p -> m[k] = 0;
		d = &fields[0].n[0] - &fields[0].m[0];
		s++;

	    default:
		if (!isdigit (*s)) {
		    diag (
"usage: [-mucbdfinrz] [-tc] [+p1 [-p2]]... [-o file] [-T dir] [file]...",
"");
		    exit (1);
		}
		p -> m[k + d] = number (&s);
	}
	compare = cmp;
    }
}

number (ppa)
char  **ppa;
{
    int     n;
    register char  *pa;
    pa = *ppa;
    n = 0;
    while (isdigit (*pa)) {
	n = n * 10 + *pa - '0';
	*ppa = pa++;
    }
    return (n);
}

blank (c) {
    if (c == ' ' || c == '\t')
	return (1);
    return (0);
}

#define qsexc(p,q) t= *p;*p= *q;*q=t
#define qstexc(p,q,r) t= *p;*p= *r;*r= *q;*q=t

qsort (a, l)
char  **a,
      **l;
{
    register char **i,
                  **j;
    char  **k;
    char  **lp,
          **hp;
    int     c;
    char   *t;
    unsigned    n;



start:
    if ((n = l - a) <= 1)
	return;


    n /= 2;
    hp = lp = a + n;
    i = a;
    j = l - 1;


    for (;;) {
	if (i < lp) {
	    if ((c = (*compare) (*i, *lp)) == 0) {
		--lp;
		qsexc (i, lp);
		continue;
	    }
	    if (c < 0) {
		++i;
		continue;
	    }
	}

loop:
	if (j > hp) {
	    if ((c = (*compare) (*hp, *j)) == 0) {
		++hp;
		qsexc (hp, j);
		goto loop;
	    }
	    if (c > 0) {
		if (i == lp) {
		    ++hp;
		    qstexc (i, hp, j);
		    i = ++lp;
		    goto loop;
		}
		qsexc (i, j);
		--j;
		++i;
		continue;
	    }
	    --j;
	    goto loop;
	}


	if (i == lp) {
	    if (uflg)
		for (k = lp + 1; k <= hp;)
		    **k++ = '\0';
	    if (lp - a >= l - hp) {
		qsort (hp + 1, l);
		l = lp;
	    }
	    else {
		qsort (a, lp);
		a = hp + 1;
	    }
	    goto start;
	}


	--lp;
	qstexc (j, lp, i);
	j = --hp;
    }
}
=== CUT HERE =============================================================
__________________________________________________________________________
In-Real-Life: Andrew A. Chernov   |  Domain: ache@hq.demos.su,
Zodiac-Sign:  Virgo               |          ache%hq.demos.su@relay.eu.net
Organization: DEMOS Cooperative,  |  Phone:  +7 095 2312129
	      Moscow, USSR        |  Fax:    +7 095 2335016

--
__________________________________________________________________________
In-Real-Life: Andrew A. Chernov   |  Domain: ache@hq.demos.su,
Zodiac-Sign:  Virgo               |          ache%hq.demos.su@relay.eu.net
Organization: DEMOS Cooperative,  |  Phone:  +7 095 2312129

ache@hq.demos.su (Andrew A. Chernov) (09/28/90)

The previously posted source of sort contains parts of V7 source code.
I've been told, that posting this version is against USA copyright laws.
Please, get rid of your copies of this posting. Sorry for inconvenience.

-- 
In-Real-Life: Andrew A. Chernov   |  Domain: ache@hq.demos.su,
Zodiac-Sign:  Virgo               |          ache%hq.demos.su@relay.eu.net
Organization: DEMOS Cooperative,  |  Phone:  +7 095 2312129
	      Moscow, USSR        |  Fax:    +7 095 2335016