davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (09/17/90)
I have discovered what appears to be a serious bug in the sort routine used in several SysV variants including Stellar. Since it causes silent loss of data I am cross posting a bit more than I usually do. The problem occurs when the options -n (numeric) and -u (discard duplicates) are used together sorting data which has a fixed width numeric as the first key field. The results is output of only one line, regardless of the input data. I found this by losing 15 months of data (yes it was backed up). Since sort is often in shell scripts run from cron to do system things, this problem might not be instantly noticed. I have generated the following shell script to test for the problem. #!/bin/sh # this tests sort for bugs in -nu option # as found in SCO Xenix and UNIX echo "" echo "Start test of sort error" echo "" sort -nu <<XX >x$$.tmp 1: a 3: b 2: c 1: a 10: x XX sort -n <<XX | uniq >y$$.tmp 1: a 3: b 2: c 1: a 10: x XX echo "Starting check" if [ `cat x$$.tmp | wc -l` -ne 4 ] then echo "Error in sort with -nu option." echo "Output was:"; cat x$$.tmp echo "Should be:"; cat y$$.tmp else echo "Output appears okay unless diff reported below:" diff x$$.tmp y$$.tmp fi # rm [xy]$$.tmp echo "Test ends" # ================================================================ Of course someone may tell me it's supposed to work that way, and that the BSD version is broken. Suggested workaround is to pipe sort through uniq rather than use the -u option. The form "sort -n +0nu" also worked. If I can find disk space to load the source tape I'll check it further. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) VMS is a text-only adventure game. If you win you can use unix.
terryl@sail.LABS.TEK.COM (09/18/90)
In article <2675@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
+
+ I have discovered what appears to be a serious bug in the sort
+routine used in several SysV variants including Stellar. Since it
+causes silent loss of data I am cross posting a bit more than I usually
+do.
+
+ The problem occurs when the options -n (numeric) and -u (discard
+duplicates) are used together sorting data which has a fixed width
+numeric as the first key field. The results is output of only one line,
+regardless of the input data. I found this by losing 15 months of data
+(yes it was backed up). Since sort is often in shell scripts run from
+cron to do system things, this problem might not be instantly noticed.
Interesting. Right now I'm running on a Solbourne, which is a Sun
clone, and it has the dual BSD/SYSV environment.
The bug shows up using the SYSV sort, but not the BSD sort.....
rr@mips.COM (Robert "Bob" Rodriguez) (09/18/90)
I verified the SYS5 sort bug on MIPS RISC/os 4.51. It works fine with our BSD sort but is broken with the SYS5 sort. Oh well... -- Robert Rodriguez rr@mips.com hi-ho, hi-ho, it's off to workstations I go.....
davidsen@crdgw1.crd.ge.com (William E. Davidsen Jr) (09/18/90)
Several people have told me to use the -b option on the sort. That doesn't work. I thought I posted the test script with this option set, but I didn't. Adding -b to the two sort commands does not change the behavior on any machines to which I have access. Sure saves on disk space, though ;-) -- - bill davidsen (davidsen@crdgw1.crd.ge.com) GE Corp. R&D Center; Box 8, KW-C206; Schenectady NY 12345
ggs@ulysses.att.com (Griff Smith) (09/19/90)
In article <2675@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes: > > I have discovered what appears to be a serious bug in the sort > routine used in several SysV variants including Stellar. Since it > causes silent loss of data I am cross posting a bit more than I usually > do. > [deleted some details to save space, followed by test script...] > > sort -nu <<XX >x$$.tmp > 1: a > 3: b > 2: c > 1: a > 10: x > XX > > Of course someone may tell me it's supposed to work that way, and that > the BSD version is broken. I suspect this may be the case. The system V manual page says this about the -u option: -u Unique: suppress all but one in each set of lines hav- ing equal keys. This doesn't agree with the code, though. The real behavior matches what I find in the BSD manual page: u Suppress all but one in each set of equal lines. Ignored bytes and bytes outside keys do not participate in this comparison. The next clue is from the System V manual page again: -n An initial numeric string, consisting of optional blanks, optional minus sign, and zero or more digits with optional decimal point, is sorted by arithmetic value. The -n option implies the -b option (see below). Note that the -b option is only effective when restricted sort key specifications are in effect. The tricky point is that a numeric comparison stops as soon as it finds a non-numeric character. Since your test file has leading blanks, and you didn't specify a sort key, the numeric comparison stops when it sees the leading blank in each record; the test file appears to contain five empty records as seen by the numeric comparison code. Furthermore, the -u option suppresses the following escape clause in the manual page: When there are multiple sort keys, later keys are compared only after all earlier keys compare equal. Lines that oth- erwise compare equal are ordered with all bytes significant. Translation: if a numeric comparison, or a set of keyed comparisons, shows that two records match, `sort' then compares both records as simple text to determine whether the records are really identical. This `tie breaking' test is suppressed if the -u option is enabled. Since all five of your test lines appear to be identical, the -u option deletes all but one of them. I think the command you want to use is sort -nu +0 This forces a trip through the key finder, which activates the code that strips leading blanks. > -- > bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) > VMS is a text-only adventure game. If you win you can use unix. Flames, counter arguments, cheerfully accepted. I didn't write the rules, I just work here. -- Griff Smith AT&T (Bell Laboratories), Murray Hill Phone: 1-201-582-7736 UUCP: {most AT&T sites}!ulysses!ggs Internet: ggs@ulysses.att.com
guy@auspex.auspex.com (Guy Harris) (09/20/90)
>The next clue is from the System V manual page again: > > -n An initial numeric string, consisting of optional > blanks, optional minus sign, and zero or more digits > with optional decimal point, is sorted by arithmetic > value. The -n option implies the -b option (see > below). Note that the -b option is only effective when > restricted sort key specifications are in effect. And a further clue comes from the SunOS 4.0.3 manual page: SYSTEM V DESCRIPTION When no fields are specified in the command line, the System V version of sort treats leading blanks as significant, even with the -n (numeric collating sequence) option; the lines: 123 23 are collated as: 23 123 I.e., this is one of the places where the SunOS "/usr/bin/sort" and "/usr/5bin/sort" differ in their behavior. (They're both built from the same source code, BTW, which is basically the S5R2 "sort", that being significantly faster than the V7-vintage sort that comes with 4.3BSD; see J. P. Linderman's paper in the October 1984 AT&T Bell Laboratories Technical Journal, October 1984, Vol. 63 No. 8 Part 2 - the second special UNIX issue - which discusses work on sorting that, I think, got into the S5 sort in S5R2. There's an "#ifdef S5EMUL" that governs which of the behaviors is selected.)
bdb@becker.UUCP (Bruce D. Becker) (09/21/90)
In article <2675@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes: | | I have discovered what appears to be a serious bug in the sort |routine used in several SysV variants including Stellar. Since it |causes silent loss of data I am cross posting a bit more than I usually |do. | | The problem occurs when the options -n (numeric) and -u (discard |duplicates) are used together sorting data which has a fixed width |numeric as the first key field. The results is output of only one line, |regardless of the input data. I found this by losing 15 months of data |(yes it was backed up). Since sort is often in shell scripts run from |cron to do system things, this problem might not be instantly noticed. | | I have generated the following shell script to test for the problem. | [...] This isn't a problem on the AT&T 3B1, although Convergent may have used the BSD version... -- ,u, Bruce Becker Toronto, Ontario a /i/ Internet: bdb@becker.UUCP, bruce@gpu.utcs.toronto.edu `\o\-e UUCP: ...!uunet!mnetor!becker!bdb _< /_ "I still have my phil-os-o-phy" - Meredith Monk
ache@hq.demos.su (Andrew A. Chernov) (09/23/90)
In article <2675@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes: > > I have discovered what appears to be a serious bug in the sort >routine used in several SysV variants including Stellar. Since it >causes silent loss of data I am cross posting a bit more than I usually >do. > > The problem occurs when the options -n (numeric) and -u (discard >duplicates) are used together sorting data which has a fixed width >numeric as the first key field. The results is output of only one line, >regardless of the input data. I found this by losing 15 months of data >(yes it was backed up). Since sort is often in shell scripts run from >cron to do system things, this problem might not be instantly noticed. > > I have generated the following shell script to test for the problem. I have & send my version of sort under SCO XENIX: this test run correctly! ----------------------------------------------------------------------- Start test of sort error Starting check Output appears okay unless diff reported below: Test ends === CUT HERE ============================================================= /* * It is version of SORT, maybe old, but BUG FREE! */ #include <stdio.h> #include <ctype.h> #include <signal.h> #include <sys/types.h> #include <sys/stat.h> #define min(a,b) ((a)<(b)?(a):(b)) #define lcmp(a,b) ((a&0377)-(b&0377)) /* My RUSSIAN version of lcmp was here */ #define L 512 #define C 20 #define N 15 /* Max. files number for merge */ #define MEM (((unsigned)56)*1024) /* 56K max */ #define NF 10 /* Max. fields number */ FILE *is, *os; char isbuf[BUFSIZ], osbuf[BUFSIZ]; char *dirtry[] = { "/usr/tmp", "/tmp", NULL }; char **dirs; char file1[30]; char *file = file1; char *filep; int nfiles; unsigned nlines; unsigned ntext; int *lspace; char *tspace; int cmp (), cmpa (); int (*compare) () = cmpa; char *eol (); int term (); int mflg; int cflg; int uflg; char *outfil; int unsafeout; /* kludge to assure -m -o works */ char tabchar; int eargc; char **eargv; char *progname; char zero[256]; /* All arrays extended to 256 for my RUSSIAN version */ /* You may reduce it to 128 */ char fold[256] = { 0200, 0201, 0202, 0203, 0204, 0205, 0206, 0207, 0210, 0211, 0212, 0213, 0214, 0215, 0216, 0217, 0220, 0221, 0222, 0223, 0224, 0225, 0226, 0227, 0230, 0231, 0232, 0233, 0234, 0235, 0236, 0237, 0240, 0241, 0242, 0243, 0244, 0245, 0246, 0247, 0250, 0251, 0252, 0253, 0254, 0255, 0256, 0257, 0260, 0261, 0262, 0263, 0264, 0265, 0266, 0267, 0270, 0271, 0272, 0273, 0274, 0275, 0276, 0277, 0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347, 0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357, 0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367, 0370, 0371, 0372, 0373, 0374, 0375, 0376, 0337, 0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347, 0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357, 0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367, 0370, 0371, 0372, 0373, 0374, 0375, 0376, 0337, 0000, 0001, 0002, 0003, 0004, 0005, 0006, 0007, 0010, 0011, 0012, 0013, 0014, 0015, 0016, 0017, 0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027, 0030, 0031, 0032, 0033, 0034, 0035, 0036, 0037, 0040, 0041, 0042, 0043, 0044, 0045, 0046, 0047, 0050, 0051, 0052, 0053, 0054, 0055, 0056, 0057, 0060, 0061, 0062, 0063, 0064, 0065, 0066, 0067, 0070, 0071, 0072, 0073, 0074, 0075, 0076, 0077, 0100, 0101, 0102, 0103, 0104, 0105, 0106, 0107, 0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117, 0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127, 0130, 0131, 0132, 0133, 0134, 0135, 0136, 0137, 0140, 0101, 0102, 0103, 0104, 0105, 0106, 0107, 0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117, 0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127, 0130, 0131, 0132, 0173, 0174, 0175, 0176, 0177 }; char nofold[256] = { 0200, 0201, 0202, 0203, 0204, 0205, 0206, 0207, 0210, 0211, 0212, 0213, 0214, 0215, 0216, 0217, 0220, 0221, 0222, 0223, 0224, 0225, 0226, 0227, 0230, 0231, 0232, 0233, 0234, 0235, 0236, 0237, 0240, 0241, 0242, 0243, 0244, 0245, 0246, 0247, 0250, 0251, 0252, 0253, 0254, 0255, 0256, 0257, 0260, 0261, 0262, 0263, 0264, 0265, 0266, 0267, 0270, 0271, 0272, 0273, 0274, 0275, 0276, 0277, 0300, 0301, 0302, 0303, 0304, 0305, 0306, 0307, 0310, 0311, 0312, 0313, 0314, 0315, 0316, 0317, 0320, 0321, 0322, 0323, 0324, 0325, 0326, 0327, 0330, 0331, 0332, 0333, 0334, 0335, 0336, 0337, 0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347, 0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357, 0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367, 0370, 0371, 0372, 0373, 0374, 0375, 0376, 0377, 0000, 0001, 0002, 0003, 0004, 0005, 0006, 0007, 0010, 0011, 0012, 0013, 0014, 0015, 0016, 0017, 0020, 0021, 0022, 0023, 0024, 0025, 0026, 0027, 0030, 0031, 0032, 0033, 0034, 0035, 0036, 0037, 0040, 0041, 0042, 0043, 0044, 0045, 0046, 0047, 0050, 0051, 0052, 0053, 0054, 0055, 0056, 0057, 0060, 0061, 0062, 0063, 0064, 0065, 0066, 0067, 0070, 0071, 0072, 0073, 0074, 0075, 0076, 0077, 0100, 0101, 0102, 0103, 0104, 0105, 0106, 0107, 0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117, 0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127, 0130, 0131, 0132, 0133, 0134, 0135, 0136, 0137, 0140, 0141, 0142, 0143, 0144, 0145, 0146, 0147, 0150, 0151, 0152, 0153, 0154, 0155, 0156, 0157, 0160, 0161, 0162, 0163, 0164, 0165, 0166, 0167, 0170, 0171, 0172, 0173, 0174, 0175, 0176, 0177 }; char nonprint[256] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 }; char dict[256] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1 }; struct field { char *code; char *ignore; char nflg; char rflg; char zflg; char bflg[2]; int m[2]; int n[2]; } fields[NF]; struct field proto = { nofold + 128, zero + 128, 0, 1, 0, {0, 0}, {0, -1}, {0, 0} }; int nfields; int error = 1; char *setfil (); main (argc, argv) char **argv; { register a; unsigned ep, curr, fp, maxmem; char *arg; register struct field *p, *q; int i; for (arg = progname = argv[0]; *arg; arg++) if (*arg == '/') progname = arg + 1; copyproto (); eargv = argv; while (--argc > 0) { if (**++argv == '-') for (arg = *argv;;) { switch (*++arg) { case '\0': if (arg[-1] == '-') eargv[eargc++] = "-"; break; case 'o': if (--argc > 0) outfil = *++argv; continue; case 'T': if (--argc > 0) dirtry[0] = *++argv; continue; default: field (++*argv, nfields > 0); break; } break; } else if (**argv == '+') { if (++nfields >= NF) { diag ("too many fields", ""); exit (1); } copyproto (); field (++*argv, 0); } else eargv[eargc++] = *argv; } q = &fields[0]; for (a = 1; a <= nfields; a++) { p = &fields[a]; if (p -> code != proto.code) continue; if (p -> ignore != proto.ignore) continue; if (p -> nflg != proto.nflg) continue; if (p -> rflg != proto.rflg) continue; if (p -> zflg != proto.zflg) continue; if (p -> bflg[0] != proto.bflg[0]) continue; if (p -> bflg[1] != proto.bflg[1]) continue; p -> code = q -> code; p -> ignore = q -> ignore; p -> nflg = q -> nflg; p -> rflg = q -> rflg; p -> zflg = q -> zflg; p -> bflg[0] = p -> bflg[1] = q -> bflg[0]; } if (eargc == 0) eargv[eargc++] = "-"; if (cflg && eargc > 1) { diag ("can check only 1 file", ""); exit (1); } safeoutfil (); fp = sbrk (0); lspace = (int *) fp; ep = min (MEM, ((unsigned) ~0) - 512 - fp) &~ 0777; maxmem = fp + ep; for (curr = fp; ep >= 512 && curr < maxmem; ep /= 2) { while ((long) curr + ep < (unsigned) ~0 - 512 && sbrk (ep) != -1) { if ((curr += ep) >= maxmem) goto Out; } } Out: ep = ((unsigned) sbrk (0)) - fp; nlines = ep - L; nlines /= (5 * (sizeof (char *) / sizeof (char))); ntext = nlines * 8; tspace = (char *) (lspace + nlines); a = -1; for (dirs = dirtry; *dirs; dirs++) { sprintf (filep = file1, "%s/stm%05uaa", *dirs, getpid ()); while (*filep) filep++; filep -= 2; if ((a = creat (file, 0600)) >= 0) break; } if (a < 0) { diag ("can't locate temp: ", file); exit (1); } close (a); signal (SIGHUP, term); if (signal (SIGINT, SIG_IGN) != SIG_IGN) signal (SIGINT, term); if (signal (SIGQUIT, SIG_IGN) != SIG_IGN) signal (SIGQUIT, term); signal (SIGPIPE, term); signal (SIGTERM, term); nfiles = eargc; if (!mflg && !cflg) { sort (); fclose (stdin); } for (a = mflg | cflg ? 0 : eargc; a + N < nfiles || unsafeout && a < eargc; a = i) { i = a + N; if (i >= nfiles) i = nfiles; newfile (); merge (a, i); } if (a != nfiles) { oldfile (); merge (a, nfiles); } error = 0; term (); } sort () { register char *cp; register char **lp; register c; int done; int i; char *f; done = 0; i = 0; c = EOF; do { cp = tspace; lp = (char **) lspace; while (lp < (char **) lspace + nlines && cp < tspace + ntext) { *lp++ = cp; while (c != '\n') { if (c != EOF) { *cp++ = c; c = getc (is); continue; } else if (is) fclose (is); if (i < eargc) { if ((f = setfil (i++)) == 0) is = stdin; else { if ((is = fopen (f, "r")) == NULL) cant (f); setbuf (is, isbuf); } c = getc (is); } else break; } *cp++ = '\n'; if (c == EOF) { done++; lp--; break; } c = getc (is); } qsort ((char **) lspace, lp); if (done == 0 || nfiles != eargc) newfile (); else oldfile (); while (lp > (char **) lspace) { cp = *--lp; if (*cp) do putc (*cp, os); while (*cp++ != '\n'); } fclose (os); } while (done == 0); } struct merg { char l[L]; FILE * b; } *ibuf[256]; merge (a, b) { static char merbufs[N][BUFSIZ]; struct merg *p; register char *cp, *dp; register i; struct merg **ip, *jp; char *f; int j; int k, l; int muflg; p = (struct merg *) lspace; j = 0; for (i = a; i < b; i++) { f = setfil (i); if (f == 0) p -> b = stdin; else { if ((p -> b = fopen (f, "r")) == NULL) cant (f); setbuf (p -> b, merbufs[i - a]); } ibuf[j] = p; if (!rline (p)) j++; p++; } do { i = j; qsort ((char **) ibuf, (char **) (ibuf + i)); l = 0; while (i--) { cp = ibuf[i] -> l; if (*cp == '\0') { l = 1; if (rline (ibuf[i])) { k = i; while (++k < j) ibuf[k - 1] = ibuf[k]; j--; } } } } while (l); muflg = mflg & uflg | cflg; i = j; while (i > 0) { cp = ibuf[i - 1] -> l; if (!cflg && (uflg == 0 || muflg || (*compare) (ibuf[i - 1] -> l, ibuf[i - 2] -> l))) do putc (*cp, os); while (*cp++ != '\n'); if (muflg) { cp = ibuf[i - 1] -> l; dp = p -> l; do { } while ((*dp++ = *cp++) != '\n'); } for (;;) { if (rline (ibuf[i - 1])) { i--; if (i == 0) break; if (i == 1) muflg = uflg; } ip = &ibuf[i]; while (--ip > ibuf && (*compare) (ip[0] -> l, ip[-1] -> l) < 0) { jp = *ip; *ip = *(ip - 1); *(ip - 1) = jp; } if (!muflg) break; j = (*compare) (ibuf[i - 1] -> l, p -> l); if (cflg) { if (j > 0) disorder ("disorder: ", ibuf[i - 1] -> l); else if (uflg && j == 0) disorder ("nonunique: ", ibuf[i - 1] -> l); } else if (j == 0) continue; break; } } p = (struct merg *) lspace; for (i = a; i < b; i++) { fclose (p -> b); p++; if (i >= eargc) unlink (setfil (i)); } fclose (os); } rline (mp) struct merg *mp; { register char *cp; register char *ce; FILE * bp; register c; bp = mp -> b; cp = mp -> l; ce = cp + L; do { c = getc (bp); if (c == EOF) return (1); if (cp >= ce) cp--; *cp++ = c; } while (c != '\n'); return (0); } disorder (s, t) char *s, *t; { register char *u; for (u = t; *u != '\n'; u++); *u = 0; diag (s, t); term (); } newfile () { register char *f; f = setfil (nfiles); if ((os = fopen (f, "w")) == NULL) { diag ("can't create ", f); term (); } setbuf (os, osbuf); nfiles++; } char * setfil (i) { if (i < eargc) if (eargv[i][0] == '-' && eargv[i][1] == '\0') return (0); else return (eargv[i]); i -= eargc; filep[0] = i / 26 + 'a'; filep[1] = i % 26 + 'a'; return (file); } oldfile () { if (outfil) { if ((os = fopen (outfil, "w")) == NULL) { diag ("can't create ", outfil); term (); } setbuf (os, osbuf); } else os = stdout; } safeoutfil () { register int i; struct stat obuf, ibuf; if (!mflg || outfil == 0) return; if (stat (outfil, &obuf) == -1) return; for (i = eargc - N; i < eargc; i++) { /*-N is suff., not nec.*/ if (stat (eargv[i], &ibuf) == -1) continue; if (obuf.st_dev == ibuf.st_dev && obuf.st_ino == ibuf.st_ino) unsafeout++; } } cant (f) char *f; { diag ("can't open ", f); term (); } diag (s, t) char *s, *t; { fputs (progname, stderr); fputs (": ", stderr); fputs (s, stderr); fputs (t, stderr); putc ('\n', stderr); } term () { register i; signal (SIGQUIT, SIG_IGN); signal (SIGINT, SIG_IGN); signal (SIGHUP, SIG_IGN); signal (SIGTERM, SIG_IGN); if (nfiles == eargc) nfiles++; for (i = eargc; i <= nfiles; i++) { /* <= in case of interrupt */ unlink (setfil (i)); /* with nfiles not updated */ } exit (error); } cmp (i, j) char *i, *j; { register char *pa, *pb; char *skip (); char *code, *ignore; int a, b; int k; char *la, *lb; register int sa; int sb; char *ipa, *ipb, *jpa, *jpb; struct field *fp; for (k = nfields > 0; k <= nfields; k++) { fp = &fields[k]; pa = i; pb = j; if (k) { la = skip (pa, fp, 1); pa = skip (pa, fp, 0); lb = skip (pb, fp, 1); pb = skip (pb, fp, 0); } else { la = eol (pa); lb = eol (pb); } if (fp -> nflg || k == 0 && fp -> bflg[0]) { while (blank (*pa)) pa++; while (blank (*pb)) pb++; } if (fp -> nflg) { sa = sb = fp -> rflg; if (*pa == '-') { pa++; sa = -sa; } if (*pb == '-') { pb++; sb = -sb; } for (ipa = pa; ipa < la && isdigit (*ipa); ipa++); for (ipb = pb; ipb < lb && isdigit (*ipb); ipb++); jpa = ipa; jpb = ipb; a = 0; if (sa == sb) while (ipa > pa && ipb > pb) if (b = *--ipb - *--ipa) a = b; while (ipa > pa) if (*--ipa != '0') return (-sa); while (ipb > pb) if (*--ipb != '0') return (sb); if (a) return (a * sa); if (*(pa = jpa) == '.') pa++; if (*(pb = jpb) == '.') pb++; if (sa == sb) while (pa < la && isdigit (*pa) && pb < lb && isdigit (*pb)) if (a = *pb++ - *pa++) return (a * sa); while (pa < la && isdigit (*pa)) if (*pa++ != '0') return (-sa); while (pb < lb && isdigit (*pb)) if (*pb++ != '0') return (sb); continue; } code = fp -> code; ignore = fp -> ignore; loop: while (ignore[*pa]) pa++; while (ignore[*pb]) pb++; if (pa >= la || *pa == '\n') if (pb < lb && *pb != '\n') return (fp -> rflg); else continue; if (pb >= lb || *pb == '\n') return (-fp -> rflg); /* STUPID Ritchie compiler can't swallow whole expression */ { register c1 = code[*pb++], c2 = code[*pa++]; sa = fp -> zflg ? (c1&0377) - (c2&0377) : lcmp (c1, c2); } if (sa == 0) goto loop; return (sa * fp -> rflg); } if (uflg) return (0); return (cmpa (i, j)); } cmpa (pa, pb) register char *pa, *pb; { while (*pa == *pb) { if (*pa++ == '\n') return (0); pb++; } if (*pa == '\n' ) return fields[0].rflg; else if (*pb == '\n') return -fields[0].rflg; else if (fields[0].zflg && (*pb&0377) > (*pa&0377)) return fields[0].rflg; else if (!fields[0].zflg && lcmp (*pb, *pa) > 0) return fields[0].rflg; else return -fields[0].rflg; } char * skip (pp, fp, j) struct field *fp; char *pp; { register i; register char *p; p = pp; if ((i = fp -> m[j]) < 0) return (eol (p)); while (i-- > 0) { if (tabchar != 0) { while (*p != tabchar) if (*p != '\n') p++; else goto ret; p++; } else { while (blank (*p)) p++; while (!blank (*p)) if (*p != '\n') p++; else goto ret; } } if (fp -> bflg[j]) while (blank (*p)) p++; i = fp -> n[j]; while (i-- > 0) { if (*p != '\n') p++; else goto ret; } ret: return (p); } char * eol (p) register char *p; { while (*p != '\n') p++; return (p); } copyproto () { register i; register int *p, *q; p = (int *) & proto; q = (int *) & fields[nfields]; for (i = 0; i < sizeof (proto) / sizeof (*p); i++) *q++ = *p++; } field (s, k) char *s; { register struct field *p; register d; p = &fields[nfields]; d = 0; for (; *s != 0; s++) { switch (*s) { case '\0': return; case 'b': p -> bflg[k]++; break; case 'd': p -> ignore = dict + 128; break; case 'f': p -> code = fold + 128; break; case 'i': p -> ignore = nonprint + 128; break; case 'c': cflg = 1; continue; case 'm': mflg = 1; continue; case 'n': p -> nflg++; break; case 't': tabchar = *++s; if (tabchar == 0) s--; continue; case 'r': p -> rflg = -1; continue; case 'u': uflg = 1; break; case 'z': p -> zflg++; break; case '.': if (p -> m[k] == -1)/* -m.n with m missing */ p -> m[k] = 0; d = &fields[0].n[0] - &fields[0].m[0]; s++; default: if (!isdigit (*s)) { diag ( "usage: [-mucbdfinrz] [-tc] [+p1 [-p2]]... [-o file] [-T dir] [file]...", ""); exit (1); } p -> m[k + d] = number (&s); } compare = cmp; } } number (ppa) char **ppa; { int n; register char *pa; pa = *ppa; n = 0; while (isdigit (*pa)) { n = n * 10 + *pa - '0'; *ppa = pa++; } return (n); } blank (c) { if (c == ' ' || c == '\t') return (1); return (0); } #define qsexc(p,q) t= *p;*p= *q;*q=t #define qstexc(p,q,r) t= *p;*p= *r;*r= *q;*q=t qsort (a, l) char **a, **l; { register char **i, **j; char **k; char **lp, **hp; int c; char *t; unsigned n; start: if ((n = l - a) <= 1) return; n /= 2; hp = lp = a + n; i = a; j = l - 1; for (;;) { if (i < lp) { if ((c = (*compare) (*i, *lp)) == 0) { --lp; qsexc (i, lp); continue; } if (c < 0) { ++i; continue; } } loop: if (j > hp) { if ((c = (*compare) (*hp, *j)) == 0) { ++hp; qsexc (hp, j); goto loop; } if (c > 0) { if (i == lp) { ++hp; qstexc (i, hp, j); i = ++lp; goto loop; } qsexc (i, j); --j; ++i; continue; } --j; goto loop; } if (i == lp) { if (uflg) for (k = lp + 1; k <= hp;) **k++ = '\0'; if (lp - a >= l - hp) { qsort (hp + 1, l); l = lp; } else { qsort (a, lp); a = hp + 1; } goto start; } --lp; qstexc (j, lp, i); j = --hp; } } === CUT HERE ============================================================= __________________________________________________________________________ In-Real-Life: Andrew A. Chernov | Domain: ache@hq.demos.su, Zodiac-Sign: Virgo | ache%hq.demos.su@relay.eu.net Organization: DEMOS Cooperative, | Phone: +7 095 2312129 Moscow, USSR | Fax: +7 095 2335016 -- __________________________________________________________________________ In-Real-Life: Andrew A. Chernov | Domain: ache@hq.demos.su, Zodiac-Sign: Virgo | ache%hq.demos.su@relay.eu.net Organization: DEMOS Cooperative, | Phone: +7 095 2312129
ache@hq.demos.su (Andrew A. Chernov) (09/28/90)
The previously posted source of sort contains parts of V7 source code. I've been told, that posting this version is against USA copyright laws. Please, get rid of your copies of this posting. Sorry for inconvenience. -- In-Real-Life: Andrew A. Chernov | Domain: ache@hq.demos.su, Zodiac-Sign: Virgo | ache%hq.demos.su@relay.eu.net Organization: DEMOS Cooperative, | Phone: +7 095 2312129 Moscow, USSR | Fax: +7 095 2335016