[comp.unix.xenix] PD diff for RCS

jennings@anchor.colorado.edu (Jeff Jennings) (02/18/89)
here's a diff program (cdiff) recently posted to the net that i hacked up to
do RCS style diffs.  i just looked at the RCS code and found that the RCS
programs called diff with the -n option, so i ran diff -n on a machine with
RCS and studied the output.  then i hacked up the code in cdiff to produce
the same output.  i think i got all the cases, in any event, i currently use
this diff with RCS on XENIX and on DOS.  you can get RCS from many places 
on the net, try arthur.cs.purdue.edu.  anyway, for those of us who have had
problems getting GNU diff running on a 286, this will do RCS diffs.  have fun.

-------------------------- CUT HERE ------------------------------------------
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	diff.1
#	diff.c
#	diff.doc
#	makefile
#	readme
# This archive created: Fri Feb 17 19:47:12 1989
export PATH; PATH=/bin:$PATH
if test -f 'diff.1'
then
	echo shar: will not over-write existing file "'diff.1'"
else
cat << \SHAR_EOF > 'diff.1'
.TH DIFF 1
.SH NAME
diff \- Public Domain diff (context diff) program
.SH SYNOPSIS
.B diff
[
.B \-b \-c \-i \-e \-n \-o 
file
] file1 file2
.SH DESCRIPTION
.I Diff\^
compares two files, showing what must be changed to make them
identical. Either file1 or file2 (but not both) may refer to
directories. If that is the case, a file in the directory whose
name is the same as the other file argument will be used. The
standard input may be used for one of the files by replacing the
argument by "-". Except for the standard input, both files must
be on disk devices.
.SH OPTIONS
.HP
.B \-b  
Remove trailing whitespace (blanks and tabs)
and compress all other strings of whitespace to a single blank.
.HP
.B \-c  
Print some context -- matching lines before
and after the non-match section.  Mark non-matched sections
with "|".
.HP
.B \-e  
Output is in an "editor script" format which
is compatible with the Unix 'ed' editor.
.HP
.B \-i  
Ignore lower/upper case distinctions.
.HP
.B \-n  
Output is in an RCS style format for use with RCS tools.
.HP
.B \-o outfile  
Place output in file "outfile" rather than on stdout.
.PP
All information needed to compare the files is maintained in main
memory. This means that very large files (or fairly large files
with many differences) will cause the program to abort with an
"out of space" message. Main memory requirements (in words) are
approximately:
.br

.ce
2 * (length of file1 + length of file2)
.br
.ce
+ 3 * (number of changes)
.br
.PP
(Where "length" is the number of lines of data in each file.)
.PP
The algorithm reads each file twice, once to build hash tables
and once to check for fortuitous matches (two lines that are in
fact different, but which have the same hash value). CPU time
requirements include sorting the hash tables and randomly
searching memory tables for equivalence classes. For example, on
a time-shared VAX-11/780, comparing two 1000 line files required
about 30 seconds (elapsed clock time) and about 10,000 bytes of
working storage. About 90 per-cent of the time was taken up by
file I/O.
.SH DIAGNOSTICS
.HP
Warning, bad option 'x'
.br
The option is ignored.
.HP
Usage ...
.br
Two input files were not specified.
.HP
Can't open input file "filename".
.br
Can't continue.
.HP
Out of space
.br
The program ran out of memory while comparing the two files.
.HP
Can't read line nnn at xxx in file[A/B]
.br
This indicates an I/O error when seeking to the specific line.
It should not happen.
.HP
Spurious match, output is not optimal.
.br
Two lines that were different yielded the same hash value.  This is
harmless except that the difference output is not the minimum set of
differences between the two files.  For example, instead of the output:
.ce
lines 1 to 5 were changed to ...
.br
the program will print
.ce
lines 1 to 3 were changed to ...
.br
.ce
lines 4 to 5 were changed to ...
.br
.HP
The program uses a CRC16 hash code.
.br
The likelihood of this error is
quite small.
.SH AUTHOR
The diff algorithm was developed by J. W. Hunt and M. D. McIlroy,
using a central algorithm defined by H. S. Stone.
It was published in:
.in +5
.nf
Hunt, J. W., and McIlroy, M. D.,
An Algorithm for Differential File Comparison,
Computing Science Technical Report #41,
Bell Laboratories, Murray Hill, NJ  07974
.fi
.in -5
.SH BUGS
On RSX and DECUS C on VMS systems, diff may fail if the both
files are not "variable-length, implied carriage control" format.
The scopy program can be used to convert files to this format if
problems arise.
.PP
When compiled under VAX C, diff handles STREAM_LF files properly
(in addition to the canonical variable-length implied carriage
control files). Other variations should work, but have not been
tested.
.PP
When compiled under VAX C, diff is quite slow for unknown reasons
which ought to be investigated. On the other hand, it has access
to effectively unlimited memory.
.PP
Output in a form suitable for ed - the -e option - seems rather
pointless; the analogue on DEC systems is SLP (SUMSLP on VMS). It
would be simple to provide SLP-compatible output. The question
is, why bother - since the various DEC file comparison utilities
already produce it.
SHAR_EOF
fi # end of overwriting check
if test -f 'diff.c'
then
	echo shar: will not over-write existing file "'diff.c'"
else
cat << \SHAR_EOF > 'diff.c'
/*
 * diff.c - public domain context diff program
 *
 */

#ifdef TURBO
#include <alloc.h>
#include <errno.h>
#include <process.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#else /* !TURBO */
#include <stdio.h>
#include <ctype.h>
#endif /* TURBO */

#ifdef vms
#include  	<ssdef.h>
#include  	<stsdef.h>
#define  IO_SUCCESS  (SS | STS)
#define  IO_ERROR  SS
#endif /* vms */

/*
 * Note: IO_SUCCESS and IO_ERROR are defined in the Decus C stdio.h file
 */

#ifndef  IO_SUCCESS
#define  IO_SUCCESS  0
#endif /* IO_SUCCESS */
#ifndef  IO_ERROR
#define  IO_ERROR  1
#endif /* IO_ERROR */

#define  EOS  	0
#ifdef unix
char            temfile[L_tmpnam];
char           *tmpnam();
#define  TEMPFILE  (temfile[0]? temfile: (tmpnam(temfile), temfile))
#else /* unix */
#define  TEMPFILE  "diff.tmp"
#endif /* unix */
#define  TRUE  	1
#define  FALSE  	0

#ifdef  pdp11
#define  short  int
#endif /* pdp11 */

typedef struct candidate {
	int             b;			/* Line in fileB  	 */
	int             a;			/* Line in fileA  	 */
	int             link;		/* Previous candidate  	 */
} CANDIDATE;

typedef struct line {
	unsigned short  hash;		/* Hash value etc.  	 */
	short           serial;		/* Line number  	  */
} LINE;

LINE           *file[2];		/* Hash/line for total file  */
#define  fileA  file[0]
#define  fileB  file[1]

LINE           *sfile[2];		/* Hash/line after prefix  */
#define  sfileA  sfile[0]
#define  sfileB  sfile[1]

int             len[2];			/* Actual lines in each file  */
#define  lenA  len[0]
#define  lenB  len[1]

int             slen[2];		/* Squished lengths  	 */
#define  slenA  slen[0]
#define  slenB  slen[1]

int             prefix;			/* Identical lines at start  */
int             suffix;			/* Identical lenes at end  */

FILE           *infd[2] = {NULL, NULL};	/* Input file identifiers  */
FILE           *tempfd;			/* Temp for input redirection  */

extern long     ftell();
extern FILE    *fopen();

#ifdef TURBO
extern void    *malloc();
#else /* !TURBO */
extern char    *malloc();
#endif /* TURBO */

char           *fgetss();
unsigned short  hash();

#ifdef	AMIGA
/* Define these types for Amiga C */
char           *savptr;
int             savsiz;
char           *wrk;
char           *wrk2;
int             cpysiz;
#endif /* AMIGA */

/*
 * The following vectors overlay the area defined by fileA
 */

short          *class;			/* Unsorted line numbers  */
int            *klist;			/* Index of element in clist  */
CANDIDATE      *clist;			/* Storage pool for candidates  */
int             clength = 0;	/* Number of active candidates  */
#define	CSIZE_INC 50			/* How many to allocate each time we have to */
int             csize = CSIZE_INC;		/* Current size of storage pool */

int            *match;			/* Longest subsequence  	 */
long           *oldseek;		/* Seek position in file A  */

/*
 * The following vectors overlay the area defined by fileB
 */

short          *member;			/* Concatenated equiv. classes  */
long           *newseek;		/* Seek position in file B  */
char           *textb;			/* Input from file2 for check  */

/*
 * Global variables
 */

int             eflag = FALSE;	/* Edit script requested  */
int             nflag = FALSE;	/* Reverse Edit script requested  */
int             bflag = FALSE;	/* Blank supress requested  */
int             cflag = FALSE;	/* Context printout  	 */
int             iflag = FALSE;	/* Ignore case requested  */
char            text[257];		/* Input line from file1  */
extern char    *myalloc();		/* Storage allocator  	 */

extern char    *compact();		/* Storage compactor  	 */

#ifdef  DEBUG
#ifndef OSK
#define  TIMING
#endif /* OSK */
#endif /* DEBUG */
#ifdef  TIMING
extern long     time();
extern char    *4532mend;
long            totaltime;
long            sectiontime;
char           *mstart;
#endif /* TIMING */
void			free();
void			exit();
#ifndef OSK
void			perror();
#endif /* OSK */

FILE *outfile = stdout;
extern int optind;
extern char * optarg;
extern int opterr;

/*
 * Diff main program
 */

main(argc, argv)
	int             argc;
	char          **argv;
{
	register int    i,j;
	int opt;

#ifdef OSK
	extern int      _memmins;
	_memmins = 16 * 1024;		/* tell OSK we will malloc a lot */
#endif /* OSK */
#ifdef  TIMING
	sectiontime = time(&totaltime);
#endif /* TIMING */
#ifdef vms
	argc = getredirection(argc, argv);
#endif /* vms */
	opterr = 0;
	
	while ((opt = getopt(argc,argv,"bc:eino:-")) != EOF)
		switch (opt) {
			case 'b':
				bflag++;
				break;

			case 'c':
				if (*optarg > '0' && *optarg <= '9')
					cflag = *optarg - '0';
				else
					cflag = 3;
				break;

			case 'e':
				eflag++;
				break;
			
			case 'n':
				nflag++;
				break;

			case 'i':
				iflag++;
				break;

			case 'o':
				/* output file next */
				if ((outfile = fopen(optarg,"w")) == NULL)
					cant(optarg,"output",1);
				break;
			case '?':
				fprintf(stderr,"diff: unknown option \n");
				fprintf(stderr,"Usage: diff [-options] file1 file2");
				break;
		}
	if (argc - optind != 2)
		error("Usage: diff [-options] file1 file2");
	if (nflag && (cflag || eflag)) {
		fprintf(stderr,
				"Warning, -n incompatible with -c and -e, -c and -e supressed.\n");
		cflag = FALSE;
		eflag = FALSE;
	}
	if (cflag && eflag) {
		fprintf(stderr,
				"Warning, -c and -e are incompatible, -c supressed.\n");
		cflag = FALSE;
	}
	for (i = optind,j=0; i <= optind+1; i++,j++) {
		if (argv[i][0] == '-' && argv[i][1] == EOS) {
			infd[j] = stdin;
			if ((tempfd = fopen(TEMPFILE, "w")) == NULL)
				cant(TEMPFILE, "work", 1);
		} else {
			infd[j] = fopen(argv[i], "r");
			if (!infd[j])
				cant(argv[i], "input", 2);		/* Fatal error */
		}
	}

	if (infd[0] == stdin && infd[1] == stdin)
		error("Can't diff two things both on standard input.");

	if (infd[0] == NULL && infd[1] == NULL) {
		cant(argv[optind], "input", 0);
		cant(argv[optind+1], "input", 1);
	}
#ifdef vms
	else if (infd[1] == NULL)
		opendir(1, &argv[1], infd[0]);
	else if (infd[0] == NULL)
		opendir(0, &argv[0], infd[1]);
#endif /* vms */

	/*
	 * Read input, building hash tables. 
	 */
	input(0);
	input(1);
	squish();
#ifdef  DEBUG
	fprintf(outfile,"before sort\n");
	for (i = 1; i <= slenA; i++)
		fprintf(outfile,"sfileA[%d] = %6d %06o\n",
			   i, sfileA[i].serial, sfileA[i].hash);
	for (i = 1; i <= slenB; i++)
		fprintf(outfile,"sfileB[%d] = %6d %06o\n",
			   i, sfileB[i].serial, sfileB[i].hash);
#endif /* DEBUG */
	sort(sfileA, slenA);
	sort(sfileB, slenB);
#ifdef  TIMING
	ptime("input");
#endif /* TIMING */
#ifdef  DEBUG
	fprintf(outfile,"after sort\n");
	for (i = 1; i <= slenA; i++)
		fprintf(outfile,"sfileA[%d] = %6d %06o\n",
			   i, sfileA[i].serial, sfileB[i].hash);
	for (i = 1; i <= slenB; i++)
		fprintf(outfile,"sfileB[%d] = %6d %06o\n",
			   i, sfileB[i].serial, sfileB[i].hash);
#endif /* DEBUG */

	/*
	 * Build equivalence classes. 
	 */
	member = (short *) fileB;
	equiv();
	member = (short *) compact((char *) member, (slenB + 2) * sizeof(int),
							   "squeezing member vector");

	/*
	 * Reorder equivalence classes into array class[] 
	 */
	class = (short *) fileA;
	unsort();
	class = (short *) compact((char *) class, (slenA + 2) * sizeof(int),
							  "compacting class vector");
#ifdef  TIMING
	ptime("equiv/unsort");
#endif /* TIMING */

	/*
	 * Find longest subsequences 
	 */
	klist = (int *) myalloc((slenA + 2) * sizeof(int), "klist");
	clist = (CANDIDATE *) myalloc(csize * sizeof(CANDIDATE), "clist");
	i = subseq();
#ifndef OSK
	free((char *) member);
	free((char *) class);
#else /* OSK */
	free((char *) member - sizeof(int));
	free((char *) class - sizeof(int));
#endif /* OSK */
	match = (int *) myalloc((lenA + 2) * sizeof(int), "match");
	unravel(klist[i]);
#ifndef OSK
	free((char *) clist);
	free((char *) klist);
#else /* OSK */
	free((char *) clist - sizeof(int));
	free((char *) klist - sizeof(int));
#endif /* OSK */
#ifdef  TIMING
	ptime("subsequence/unravel");
#endif /* TIMING */

	/*
	 * Check for fortuitous matches and output differences 
	 */
	oldseek = (long *) myalloc((lenA + 2) * sizeof(*oldseek), "oldseek");
	newseek = (long *) myalloc((lenB + 2) * sizeof(*newseek), "newseek");
	textb = myalloc(sizeof text, "textbuffer");
	if (check(argv[0], argv[1]))
		fprintf(stderr, "Spurious match, output is not optimal\n");
#ifdef  TIMING
	ptime("check");
#endif /* TIMING */
	output(argv[0], argv[1]);
#ifdef  TIMING
	ptime("output");
	fprintf(stderr,"%ld seconds required\n", sectiontime - totaltime);
#endif /* TIMING */
	if (tempfd != NULL) {
		fclose(tempfd);
#ifdef unix
		unlink(TEMPFILE);
#else /* !unix */
#ifdef OSK
		unlink(TEMPFILE);
#else /* OSK */
#ifdef MSC						/* MSC 4.0 does not understand disjunctive
								 * #if's.  */
		unlink(TEMPFILE);
#else /* MSC */
		remove(TEMPFILE);
#endif /* MSC */
#endif /* OSK */
#endif /* unxi */
	}
	return(0);
}


/*
 * Read the file, building hash table
 */

input(which)
	int             which;		/* 0 or 1 to redefine infd[]  */
{
	register LINE  *lentry;
	register int    linect = 0;
	FILE           *fd;
#define	LSIZE_INC 200			/* # of line entries to alloc at once */
	int             lsize = LSIZE_INC;

	lentry = (LINE *) myalloc(sizeof(LINE) * (lsize + 3), "line");
	fd = infd[which];
	while (!getline(fd, text)) {
		if (++linect >= lsize) {
			lsize += 200;
			lentry = (LINE *) compact((char *) lentry,
									  (lsize + 3) * sizeof(LINE),
									  "extending line vector");
		}
		lentry[linect].hash = hash(text);
	}

	/*
	 * If input was from stdin ("-" command), finish off the temp file. 
	 */
	if (fd == stdin) {
		fclose(tempfd);
		tempfd = infd[which] = fopen(TEMPFILE, "r");
	}

	/*
	 * If we wanted to be stingy with memory, we could realloc lentry down to
	 * its exact size (+3 for some odd reason) here.  No need?  
	 */
	len[which] = linect;
	file[which] = lentry;
}


/*
 * Look for initial and trailing sequences that have identical hash values.
 * Don't bother building them into the candidate vector.
 */

squish()
{
	register int    i;
	register LINE  *ap;
	register LINE  *bp;
	int             j;
	int             k;

	/*
	 * prefix -> first line (from start) that doesn't hash identically 
	 */
	i = 0;
	ap = &fileA[1];
	bp = &fileB[1];
	while (i < lenA && i < lenB && ap->hash == bp->hash) {
		i++;
		ap++;
		bp++;
	}
	prefix = i;

	/*
	 * suffix -> first line (from end) that doesn't hash identically 
	 */
	j = lenA - i;
	k = lenB - i;
	ap = &fileA[lenA];
	bp = &fileB[lenB];
	i = 0;
	while (i < j && i < k && ap->hash == bp->hash) {
		i++;
		ap--;
		bp--;
	}
	suffix = i;

	/*
	 * Tuck the counts away 
	 */
	for (k = 0; k <= 1; k++) {
		sfile[k] = file[k] + prefix;
		j = slen[k] = len[k] - prefix - suffix;

		for (i = 0, ap = sfile[k]; i <= slen[k]; i++, ap++) {
			ap->serial = i;
		}
	}
}


/*
 * Sort hash entries
 */

sort(vector, vecsize)
	LINE           *vector;		/* What to sort  	  */
	int             vecsize;	/* How many to sort  	 */
{
	register int    j;
	register LINE  *aim;
	register LINE  *ai;
	int             mid;
	int             k;
	LINE            work;

	for (j = 1; j <= vecsize; j *= 2);
	mid = (j - 1);
	while ((mid /= 2) != 0) {
		k = vecsize - mid;
		for (j = 1; j <= k; j++) {
			for (ai = &vector[j]; ai > vector; ai -= mid) {
				aim = &ai[mid];
				if (aim < ai)
					break;		/* ?? Why ??  	 */
				if (aim->hash > ai->hash ||
					aim->hash == ai->hash &&
					aim->serial > ai->serial)
					break;
				work.hash = ai->hash;
				ai->hash = aim->hash;
				aim->hash = work.hash;
				work.serial = ai->serial;
				ai->serial = aim->serial;
				aim->serial = work.serial;
			}
		}
	}
}


/*
 * Build equivalence class vector
 */

equiv()
{
	register LINE  *ap;
	union {
		LINE           *bp;
		short          *mp;
	}               r;
	register int    j;
	LINE           *atop;

#ifdef  DEBUG
	fprintf(outfile,"equiv entry\n");
	for (j = 1; j <= slenA; j++)
		fprintf(outfile,"sfileA[%d] = %6d %06o\n",
			   j, sfileA[j].serial, sfileA[j].hash);
	for (j = 1; j <= slenB; j++)
		fprintf(outfile,"sfileB[%d] = %6d %06o\n",
			   j, sfileB[j].serial, sfileB[j].hash);
#endif /* DEBUG */
	j = 1;
	ap = &sfileA[1];
	r.bp = &sfileB[1];
	atop = &sfileA[slenA];
	while (ap <= atop && j <= slenB) {
		if (ap->hash < r.bp->hash) {
			ap->hash = 0;
			ap++;
		} else if (ap->hash == r.bp->hash) {
			ap->hash = j;
			ap++;
		} else {
			r.bp++;
			j++;
		}
	}
	while (ap <= atop) {
		ap->hash = 0;
		ap++;
	}
	sfileB[slenB + 1].hash = 0;
#ifdef  DEBUG
	fprintf(outfile,"equiv exit\n");
	for (j = 1; j <= slenA; j++)
		fprintf(outfile,"sfileA[%d] = %6d %06o\n",
			   j, sfileA[j].serial, sfileA[j].hash);
	for (j = 1; j <= slenB; j++)
		fprintf(outfile,"sfileB[%d] = %6d %06o\n",
			   j, sfileB[j].serial, sfileB[j].hash);
#endif /* DEBUG */
	ap = &sfileB[0];
	atop = &sfileB[slenB];
	r.mp = &member[0];
	while (++ap <= atop) {
		r.mp++;
		*r.mp = -(ap->serial);
		while (ap[1].hash == ap->hash) {
			ap++;
			r.mp++;
			*r.mp = ap->serial;
		}
	}
	r.mp[1] = -1;
#ifdef  DEBUG
	for (j = 0; j <= slenB; j++)
		fprintf(outfile,"member[%d] = %d\n", j, member[j]);
#endif /* DEBUG */
}


/*
 * Build class vector
 */

unsort()
{
	register int   *temp;
	register int   *tp;
	union {
		LINE           *ap;
		short          *cp;
	}               u;
	LINE           *evec;
	short          *eclass;
#ifdef  DEBUG
	int             i;
#endif /* DEBUG */

	temp = (int *) myalloc((slenA + 1) * sizeof(int), "unsort scratch");
	u.ap = &sfileA[1];
	evec = &sfileA[slenA];
	while (u.ap <= evec) {
#ifdef  DEBUG
		fprintf(outfile,"temp[%2d] := %06o\n", u.ap->serial, u.ap->hash);
#endif /* DEBUG */
		temp[u.ap->serial] = u.ap->hash;
		u.ap++;
	}

	/*
	 * Copy into class vector and free work space 
	 */
	u.cp = &class[1];
	eclass = &class[slenA];
	tp = &temp[1];
	while (u.cp <= eclass)
		*u.cp++ = *tp++;
#ifndef OSK
	free((char *) temp);
#else /* OSK */
	free((char *) temp - sizeof(int));
#endif /* OSK */
#ifdef  DEBUG
	fprintf(outfile,"unsort exit\n");
	for (i = 1; i <= slenA; i++)
		fprintf(outfile,"class[%d] = %d %06o\n", i, class[i], class[i]);
#endif /* DEBUG */
}


/*
 * Generate maximum common subsequence chain in clist[]
 */

subseq()
{
	int             a;
	register unsigned ktop;
	register int    b;
	register int    s;
	unsigned        r;
	int             i;
	int             cand;

	klist[0] = newcand(0, 0, -1);
	klist[1] = newcand(slenA + 1, slenB + 1, -1);
	ktop = 1;					/* -> guard entry  */
	for (a = 1; a <= slenA; a++) {

		/*
		 * For each non-zero element in fileA ... 
		 */
		if ((i = class[a]) == 0)
			continue;
		cand = klist[0];		/* No candidate now  */
		r = 0;					/* Current r-candidate  */
		do {
#ifdef  DEBUG
			fprintf(outfile,"a = %d, i = %d, b = %d\n", a, i, member[i]);
#endif /* DEBUG */

			/*
			 * Perform the merge algorithm 
			 */
			if ((b = member[i]) < 0)
				b = -b;
#ifdef  DEBUG
			fprintf(outfile,"search(%d, %d, %d) -> %d\n",
				   r, ktop, b, search(r, ktop, b));
#endif /* DEBUG */
			if ((s = search(r, ktop, b)) != 0) {
				if (clist[klist[s]].b > b) {
					klist[r] = cand;
					r = s;
					cand = newcand(a, b, klist[s - 1]);
#ifdef  DEBUG
					dumpklist(ktop, "klist[s-1]->b > b");
#endif /* DEBUG */
				}
				if (s >= ktop) {
					klist[ktop + 1] = klist[ktop];
					ktop++;
#ifdef  DEBUG
					klist[r] = cand;
					dumpklist(ktop, "extend");
#endif /* DEBUG */
					break;
				}
			}
		} while (member[++i] > 0);
		klist[r] = cand;
	}
#ifdef  DEBUG
	fprintf(outfile,"last entry = %d\n", ktop - 1);
#endif /* DEBUG */
	return (ktop - 1);			/* Last entry found  */
}


int
newcand(a, b, pred)
	int             a;			/* Line in fileA  	  */
	int             b;			/* Line in fileB  	  */
	int             pred;		/* Link to predecessor, index in cand[]  */
{
	register CANDIDATE *new;

	clength++;
	if (++clength >= csize) {
		csize += CSIZE_INC;
		clist = (CANDIDATE *) compact((char *) clist,
									  csize * sizeof(CANDIDATE),
									  "extending clist");
	}
	new = &clist[clength - 1];
	new->a = a;
	new->b = b;
	new->link = pred;
	return (clength - 1);
}


/*
 * Search klist[low..top] (inclusive) for b.  If klist[low]->b >= b,
 * return zero.  Else return s such that klist[s-1]->b < b and
 * klist[s]->b >= b.  Note that the algorithm presupposes the two
 * preset "fence" elements, (0, 0) and (slenA, slenB).
 */

search(low, high, b)
	register unsigned low;
	register unsigned high;
	register int    b;
{
	register int    temp;
	register unsigned mid;

	if (clist[klist[low]].b >= b)
		return (0);
	while ((mid = (low + high) / 2) > low) {
		if ((temp = clist[klist[mid]].b) > b)
			high = mid;
		else if (temp < b)
			low = mid;
		else {
			return (mid);
		}
	}
	return (mid + 1);
}


unravel(k)
	register int    k;
{
	register int    i;
	register CANDIDATE *cp;
	int             first_trailer;
	int             difference;

	first_trailer = lenA - suffix;
	difference = lenB - lenA;
#ifdef  DEBUG
	fprintf(outfile,"first trailer = %d, difference = %d\n",
		   first_trailer, difference);
#endif /* DEBUG */
	for (i = 0; i <= lenA; i++) {
		match[i] = (i <= prefix) ? i
			: (i > first_trailer) ? i + difference
			: 0;
	}
#ifdef  DEBUG
	fprintf(outfile,"unravel\n");
#endif /* DEBUG */
	while (k != -1) {
		cp = &clist[k];
#ifdef  DEBUG
		if (k < 0 || k >= clength)
			error("Illegal link -> %d", k);
		fprintf(outfile,"match[%d] := %d\n", cp->a + prefix, cp->b + prefix);
#endif /* DEBUG */
		match[cp->a + prefix] = cp->b + prefix;
		k = cp->link;
	}
}


/*
 * Check for hash matches (jackpots) and collect random access indices to
 * the two files.
 *
 * It should be possible to avoid doing most of the ftell's by noticing
 * that we are not doing a context diff and noticing that if a line
 * compares equal to the other file, we will not ever need to know its
 * file position.  FIXME.
 */

check(fileAname, fileBname)
	char           *fileAname;
	char           *fileBname;
{
	register int    a;			/* Current line in file A  */
	register int    b;			/* Current line in file B  */
	int             jackpot;

/*
 * The VAX C ftell() returns the address of the CURRENT record, not the
 * next one (as in DECUS C or, effectively, other C's).  Hence, the values
 * are "off by one" in the array.  OFFSET compensates for this.
 */

#ifdef vms
#define OFFSET (-1)
#else /* !vms */
#define OFFSET 0
#endif /* vms */

	b = 1;
	rewind(infd[0]);
	rewind(infd[1]);
/*
 * See above; these would be over-written on VMS anyway.
 */

#ifndef vms
	oldseek[0] = ftell(infd[0]);
	newseek[0] = ftell(infd[1]);
#endif /* vms */

	jackpot = 0;
#ifdef  DEBUG
	fprintf(outfile,"match vector\n");
	for (a = 0; a <= lenA; a++)
		fprintf(outfile,"match[%d] = %d\n", a, match[a]);
#endif /* DEBUG */
	for (a = 1; a <= lenA; a++) {
		if (match[a] == 0) {
			/* Unique line in A */
			oldseek[a + OFFSET] = ftell(infd[0]);
			getline(infd[0], text);
			continue;
		}
		while (b < match[a]) {
			/* Skip over unique lines in B */
			newseek[b + OFFSET] = ftell(infd[1]);
			getline(infd[1], textb);
			b++;
		}

		/*
		 * Compare the two, supposedly matching, lines. Unless we are going
		 * to print these lines, don't bother to remember where they are.  We
		 * only print matching lines if a context diff is happening, or if a
		 * jackpot occurs. 
		 */
		if (cflag) {
			oldseek[a + OFFSET] = ftell(infd[0]);
			newseek[b + OFFSET] = ftell(infd[1]);
		}
		getline(infd[0], text);
		getline(infd[1], textb);
		if (!streq(text, textb)) {
			fprintf(stderr, "Spurious match:\n");
			fprintf(stderr, "line %d in %s, \"%s\"\n",
					a, fileAname, text);
			fprintf(stderr, "line %d in %s, \"%s\"\n",
					b, fileBname, textb);
			match[a] = 0;
			jackpot++;
		}
		b++;
	}
	for (; b <= lenB; b++) {
		newseek[b + OFFSET] = ftell(infd[1]);
		getline(infd[1], textb);
	}
/*
 * The logical converse to the code up above, for NON-VMS systems, to
 * store away an fseek() pointer at the beginning of the file.  For VMS,
 * we need one at EOF...
 */

#ifdef vms
	oldseek[lenA] = ftell(infd[0]);
	getline(infd[0], text);		/* Will hit EOF...  */
	newseek[lenB] = ftell(infd[1]);
	getline(infd[1], textb);	/* Will hit EOF...  */
#endif /* vms */

	return (jackpot);
}


output(fileAname, fileBname)
	char           *fileAname, *fileBname;
{
	register int    astart;
	register int    aend = 0;
	int             bstart;
	register int    bend;

	rewind(infd[0]);
	rewind(infd[1]);
	match[0] = 0;
	match[lenA + 1] = lenB + 1;
	if (!eflag) {
		if (cflag) {

			/*
			 * Should include ctime style dates after the file names, but
			 * this would be non-trivial on OSK.  Perhaps there should be a
			 * special case for stdin. 
			 */
			fprintf(outfile,"*** %s\n--- %s\n", fileAname, fileBname);
		}

		/*
		 * Normal printout 
		 */
		for (astart = 1; astart <= lenA; astart = aend + 1) {

			/*
			 * New subsequence, skip over matching stuff 
			 */
			while (astart <= lenA
				   && match[astart] == (match[astart - 1] + 1))
				astart++;

			/*
			 * Found a difference, setup range and print it 
			 */
			bstart = match[astart - 1] + 1;
			aend = astart - 1;
			while (aend < lenA && match[aend + 1] == 0)
				aend++;
			bend = match[aend + 1] - 1;
			match[aend] = bend;
			change(astart, aend, bstart, bend);
		}
	} else {

		/*
		 * Edit script output -- differences are output "backwards" for the
		 * benefit of a line-oriented editor. 
		 */
		for (aend = lenA; aend >= 1; aend = astart - 1) {
			while (aend >= 1
				   && match[aend] == (match[aend + 1] - 1)
				   && match[aend] != 0)
				aend--;
			bend = match[aend + 1] - 1;
			astart = aend + 1;
			while (astart > 1 && match[astart - 1] == 0)
				astart--;
			bstart = match[astart - 1] + 1;
			match[astart] = bstart;
			change(astart, aend, bstart, bend);
		}
	}
	if (lenA == 0)
		change(1, 0, 1, lenB);
}


/*
 * Output a change entry: fileA[astart..aend] changed to fileB[bstart..bend]
 */

change(astart, aend, bstart, bend)
	int             astart;
	int             aend;
	int             bstart;
	int             bend;
{
	char            c;

	/*
	 * This catches a "dummy" last entry 
	 */
	if (astart > aend && bstart > bend)
		return;
	c = (astart > aend) ? 'a' : (bstart > bend) ? 'd' : 'c';
	if (cflag)
		fputs("**************\n*** ", outfile);

	if (nflag && c == 'c') {
		change(astart,aend,bstart+(bend-bstart+1),bend);
		change(astart+(aend-astart+1),aend,bstart,bend);
		return;
	}
	if (nflag)
		putc(c,outfile);
	if (c == 'a' && !cflag)
		range(astart - 1, astart - 1, 0);		/* Addition: just print one
												 * odd # */
	else 
		range(astart, aend, 0);	/* Print both, if different */
	if (!cflag) {
		if (!nflag)
			putc(c,outfile);
		else
			putc(' ',outfile);
		if (!eflag) {
			if (c == 'd') {
				if (nflag)
					range(astart,aend,1);
				else
					range(bstart - 1, bstart - 1, 1);	/* Deletion: just print
														 * one odd # */
			}
			else
				range(bstart, bend, 1);	/* Print both, if different */
		}
	}
	putc('\n',outfile);
	if (nflag && c == 'd')
		return;
	if ((!eflag && c != 'a') || cflag) {
		fetch(oldseek, astart, aend, lenA, infd[0],
			  cflag ? (c == 'd' ? "- " : "! ") : "< ");
		if (cflag) {
			fputs("--- ", outfile);
			range(bstart, bend, 1);
			fputs(" -----\n", outfile);
		} else if (astart <= aend && bstart <= bend)
			fprintf(outfile,"---\n");
	}
	fetch(newseek, bstart, bend, lenB, infd[1],
		  cflag ? (c == 'a' ? "+ " : "! ") : ((eflag || nflag) ? "" : "> "));
	if (eflag && bstart <= bend)
		fprintf(outfile,".\n");
}


/*
 * Print a range
 */

range(from, to, w)
	int             from;
	int             to;
	int             w;
{
	if (nflag) {
		if (w) {
			fprintf(outfile,"%d",to-from+1);
		} else {
			fprintf(outfile,"%d",from);
		}
		return;
	}
	if (cflag) {
		if ((from -= cflag) <= 0)
			from = 1;
		if ((to += cflag) > len[w])
			to = len[w];
	}
	if (to > from) {
		fprintf(outfile,"%d,%d", from, to);
	} else if (to < from) {
		fprintf(outfile,"%d,%d", to, from);
	} else {
		fprintf(outfile,"%d", from);
	}
}


/*
 * Print the appropriate text
 */

fetch(seekvec, start, end, trueend, fd, pfx)
	long           *seekvec;
	register int    start;
	register int    end;
	int             trueend;
	FILE           *fd;
	char           *pfx;
{
	register int    i;
	register int    first;
	register int    last;

	if (cflag) {
		if ((first = start - cflag) <= 0)
			first = 1;
		if ((last = end + cflag) > trueend)
			last = trueend;
	} else {
		first = start;
		last = end;
	}
	if (fseek(fd, seekvec[first], 0) != 0) {
		fprintf(stderr,"?Can't read line %d at %08lx (hex) in file%c\n",
			   start, seekvec[first],
			   (fd == infd[0]) ? 'A' : 'B');
	} else {
		for (i = first; i <= last; i++) {
			if (fgetss(text, sizeof text, fd) == NULL) {
				fprintf(stderr,"** Unexpected end of file\n");
				break;
			}
#ifdef DEBUG
			fprintf(stderr,"%5d: %s%s\n", i, pfx, text);
#else /* !DEBUG */
			fputs((cflag && (i < start || i > end)) ? "  " : pfx, outfile);
			fputs(text, outfile);
			putc('\n',outfile);
#endif /* DEBUG */
		}
	}
}


/*
 * Input routine, read one line to buffer[], return TRUE on eof, else FALSE.
 * The terminating newline is always removed.  If "-b" was given, trailing
 * whitespace (blanks and tabs) are removed and strings of blanks and
 * tabs are replaced by a single blank.  Getline() does all hacking for
 * redirected input files.
 */

int
getline(fd, buffer)
	FILE           *fd;
	char           *buffer;
{
	register char  *top;
	register char  *fromp;
	register char   c;

	if (fgetss(buffer, sizeof text, fd) == NULL) {
		*buffer = EOS;
		return (TRUE);
	}
	if (fd == stdin)
		fputss(buffer, tempfd);
	if (bflag || iflag) {
		top = buffer;
		fromp = buffer;
		while ((c = *fromp++) != EOS) {
			if (bflag && (c == ' ' || c == '\t')) {
				c = ' ';
				while (*fromp == ' ' || *fromp == '\t')
					fromp++;
			}
			if (iflag)
				c = tolower(c);
			*top++ = c;
		}
		if (bflag && top[-1] == ' ')
			top--;
		*top = EOS;
	}
	return (FALSE);
}


static unsigned short crc16a[] = {
								  0000000, 0140301, 0140601, 0000500,
								  0141401, 0001700, 0001200, 0141101,
								  0143001, 0003300, 0003600, 0143501,
								  0002400, 0142701, 0142201, 0002100,
};

static unsigned short crc16b[] = {
								  0000000, 0146001, 0154001, 0012000,
								  0170001, 0036000, 0024000, 0162001,
								  0120001, 0066000, 0074000, 0132001,
								  0050000, 0116001, 0104001, 0043000,
};


/*
 * Return the CRC16 hash code for the buffer
 * Algorithm from Stu Wecker (Digital memo 130-959-002-00).
 */

unsigned short
hash(buffer)
	char           *buffer;
{
	register unsigned short crc;
	register char  *tp;
	register short  temp;

	crc = 0;
	for (tp = buffer; *tp != EOS;) {
		temp = *tp++ ^ crc;		/* XOR crc with new char  */
		crc = (crc >> 8)
			^ crc16a[(temp & 0017)]
			^ crc16b[(temp & 0360) >> 4];
	}
#ifdef  DEBUG_ALL
	fprintf(outfile,"%06o: %s\n", (crc == 0) ? 1 : crc, buffer);
#endif /* DEBUG_ALL */
	return ((crc == 0) ? (unsigned short) 1 : crc);
}


#ifdef vms
opendir(which, arg, okfd)
	int             which;		/* Which file to open (0 or 1)  	 */
	char          **arg;		/* File name argument, &argv[which]  */
	FILE           *okfd;		/* File name (already open)  	 */
{
	register char  *tp;
	register int    c;
	register char  *newname;

	fgetname(okfd, text);

	/*
	 * Skip over device name 
	 */
	for (tp = text; (c = *tp) && c != ':'; tp++);
	if (c)
		tp++;
	else
		tp = text;

	/*
	 * Skip over [UIC] or [PPN] if present 
	 */
	if (*tp == '[' || *tp == '(') {
		while ((c = *tp++) && c != ']' && c != ')');
		if (c == 0) {
			fprintf(stderr, "?Bug: bad file name \"%s\"\n",
					text);
			tp--;
		}
	}
	strcpy(text, tp);

	/*
	 * Don't include version 
	 */
	for (tp = text; (c = *tp) && c != ';'; tp++);
	*tp = 0;

	/*
	 * Now, text has the file name, tp - text, its length, and *arg the
	 * (possible) directory name.  Create a new file name for opening. 
	 */
#ifndef	OSK
	if ((newname = malloc(tp - text + strlen(*arg) + 1)) == NULL)
		error("Out of space at start");
#ifdef	AMIGA
	savsiz = tp - text + strlen(*arg) + 1;
	savptr = newname;
#endif /* AMIGA */
#else /* OSK */
	newname = myalloc(tp - text + strlen(*arg) + 1, "Out of space at start");
#endif /* OSK */
	concat(newname, *arg, text, NULL);
	if ((infd[which] = fopen(newname, "r")) == NULL)
		cant(*arg, "constructed input", 1);
	else
		*arg = newname;
}
/* Amiga C doesn't have all these extensions for directory... */
#endif /* vms */


/*
 * Allocate or crash.
 */

char           *
myalloc(amount, why)
	unsigned        amount;
	char           *why;
{
	register char  *pointer;

#ifdef OSK
	amount += sizeof(int);
#endif /* OSK */
	if ((pointer = malloc((unsigned) amount)) == NULL)
		noroom(why);
#ifdef OSK
	*((int *) pointer) = amount;
	pointer += sizeof(int);
#ifdef DEBUG
	fprintf(stderr, "Myalloc: %d at %06o\n", amount, pointer);
#endif /* DEBUG */
#endif /* OSK */
#ifdef	AMIGA
	savsiz = amount;
	savptr = pointer;
#endif /* AMIGA */

	return (pointer);
}


/*
 * Reallocate pointer, compacting storage
 *
 * The "compacting storage" part is probably not relevant any more.
 * There used to be horrid code here that malloc'd one byte and freed
 * it at magic times, to cause garbage collection of the freespace
 * or something.  It's safely gone now, you didn't have to see it.
 *	-- John Gilmore, Nebula Consultants, Sept 26, 1986
 */

char           *
compact(pointer, new_amount, why)
	char           *pointer;
	unsigned        new_amount;
	char           *why;
{
	register char  *new_pointer;

#ifndef AMIGA
#ifndef OSK
#ifdef TURBO
	extern void    *realloc();
#else /* !TURBO */
	extern char    *realloc();
#endif /* TURBO */

	if ((new_pointer = realloc(pointer, (unsigned) new_amount)) == NULL) {
		noroom(why);
	}
#else /* OSK */
	register int    old_amount;
	new_amount += sizeof(int);
	if ((new_pointer = malloc(new_amount)) == NULL)
		noroom(why);
	*(int *) new_pointer = new_amount;
	new_pointer += sizeof(int);
	old_amount = *(((int *) pointer) - 1);
	/* _strass is like bcopy with the first two arguments reversted */
	_strass(new_pointer, pointer, (new_amount <= old_amount ?
								   new_amount : old_amount) - sizeof(int));
#ifdef DEBUG
	fprintf(stderr, "compact %d to %d from %06o to %06o\n",
			old_amount, new_amount, pointer, new_pointer);
#endif /* DEBUG */
	free(pointer - sizeof(int));
#endif /* OSK */
#else /* AMIGA */

	/*
	 * This routine is heavily dependent on C storage allocator hacks For
	 * Amiga, we can't rely on storage being left alone "up to" the boundary
	 * of allocation as in VMS or RSX. Therefore we have to be different here
	 * and allocate a new larger segment, then free the old one. Messy but
	 * hopefully it will work. 
	 */
	extern char    *malloc();

	/* No realloc().  Do a malloc and copy it.  */
	if ((new_pointer = malloc((unsigned) new_amount)) == NULL) {
		noroom(why);
	}
  /* This MUST assume the program calls compact using the old pointer as the
  last call of malloc... Reason is that RSX version is really simpleminded */
	cpysiz = savsiz;
  /* Complain if deallocate area not same as last allocate area */
	if (savptr != pointer)
		bogus(why);
	wrk2 = new_pointer;
	for (wrk = pointer; cpysiz > 0; cpysiz--) {
  /* copy data to new area */
		*wrk2++ = *wrk++;
	}
  /* when done, free old memory area. */
	free(pointer);
#endif /* AMIGA */

#ifndef OSK
#ifdef  DEBUG
	if (new_pointer != pointer) {
		fprintf(stderr, "moved from %06o to %06o\n",
				pointer, new_pointer);
	}
/*  rdump(new_pointer, why);
*/
#endif /* DEBUG */
#endif /* OSK */
	return (new_pointer);
}


noroom(why)
	char           *why;
{
	fprintf(stderr, "?DIFF-F-out of room when %s\n", why);
	exit(IO_ERROR);
}


#ifdef	AMIGA
bogus(why)
	char           *why;
{
	fprintf(stderr, "?DIFF-F-invalid compaction when %s\n", why);
	exit(IO_ERROR);
}
#endif	/* AMIGA */


#ifdef  DEBUG
/*
 * Dump memory block
 */

rdump(pointer, why)
	int            *pointer;
	char           *why;
{
	int            *last;
	int             count;

	last = ((int **) pointer)[-1];
	fprintf(stderr, "dump %s of %06o -> %06o, %d words",
			why, pointer, last, last - pointer);
	last = (int *) (((int) last) & ~1);
	for (count = 0; pointer < last; ++count) {
		if ((count & 07) == 0) {
			fprintf(stderr, "\n%06o", pointer);
		}
		fprintf(stderr, "\t%06o", *pointer);
		pointer++;
	}
	fprintf(stderr, "\n");
}
#endif /* DEBUG */


/*
 * Can't open file message
 */

cant(filename, what, fatalflag)
	char           *filename;
	char           *what;
	int             fatalflag;
{
	fprintf(stderr, "Can't open %s file \"%s\": ", what, filename);
#ifndef	OSK
	perror((char *) NULL);
#else
	prerr(0, errno);
#endif
	if (fatalflag) {
		exit(fatalflag);
	}
}


#ifdef  DEBUG
dump(d_linep, d_len, d_which)
	LINE           *d_linep;
	int				d_len;
	int				d_which;
{
	register int    i;

	fprintf(outfile,"Dump of file%c, %d elements\n", "AB"[d_which], d_len);
	fprintf(outfile,"linep @ %06o\n", d_linep);
	for (i = 0; i <= d_len; i++) {
		fprintf(outfile,"%3d  %6d  %06o\n", i,
			   d_linep[i].serial, d_linep[i].hash);
	}
}


/*
 * Dump klist
 */

dumpklist(kmax, why)
	int             kmax;
	char           *why;
{
	register int    i;
	register CANDIDATE *cp;
	register int    count;

	fprintf(outfile,"\nklist[0..%d] %s, clength = %d\n", kmax, why, clength);
	for (i = 0; i <= kmax; i++) {
		cp = &clist[klist[i]];
		fprintf(outfile,"%2d %2d", i, klist[i]);
		if (cp >= &clist[0] && cp < &clist[clength])
			fprintf(outfile," (%2d %2d -> %2d)\n", cp->a, cp->b, cp->link);
		else if (klist[i] == -1)
			fprintf(outfile," End of chain\n");
		else
			fprintf(outfile," illegal klist element\n");
	}
	for (i = 0; i <= kmax; i++) {
		count = -1;
		for (cp = (CANDIDATE *) klist[i]; cp > &clist[0];
			 cp = (CANDIDATE *) & cp->link) {
			if (++count >= 6) {
				fprintf(outfile,"\n    ");
				count = 0;
			}
			fprintf(outfile," (%2d: %2d,%2d -> %d)",
				   cp - clist, cp->a, cp->b, cp->link);
		}
		fprintf(outfile,"\n");
	}
	fprintf(outfile,"*\n");
}
#endif	/* DEBUG */



#ifdef  TIMING

/*
 * Dump time buffer
 */

ptime(why)
	char           *why;
{
	long            ttemp;

	ttemp = time(NULL);
	fprintf(outfile,"%ld seconds for %s\n",
		   ttemp - sectiontime, why);
	sectiontime = ttemp;
}
#endif	/* TIMING */


/*
 * TRUE if strings are identical
 */

int
streq(s1, s2)
	register char  *s1;
	register char  *s2;
{
	while (*s1++ == *s2) {
		if (*s2++ == EOS)
			return (TRUE);
	}
	return (FALSE);
}


/*
 * Error message before retiring.
 */

/* VARARGS */
error(format, args)
	char           *format;
{
	fprintf(stderr, format, &args);
	putc('\n', stderr);
	_error();
}


_error()
{
	exit(1);
}


/*
 * Like fput() except that it puts a newline at the end of the line.
 */

fputss(s, iop)
	register char  *s;
	register FILE  *iop;
{
	fputs(s, iop);
	putc('\n', iop);
}


/*
 * Fgetss() is like fgets() except that the terminating newline
 * is removed. 
 */

char           *
fgetss(s, n, iop)
	char           *s;
	register FILE  *iop;
{
	register char  *cs;

	if (fgets(s, n, iop) == NULL)
		return ((char *) NULL);
	cs = s + strlen(s) - 1;
	if (*cs == '\n')
		*cs = '\0';
	return (s);
}
SHAR_EOF
fi # end of overwriting check
if test -f 'diff.doc'
then
	echo shar: will not over-write existing file "'diff.doc'"
else
cat << \SHAR_EOF > 'diff.doc'



DIFF(1)                  USER COMMANDS                    DIFF(1)



NAME
     diff - Public Domain diff (context diff) program

SYNOPSIS
     diff [ -b -c -i -e -n -o file ] file1 file2

DESCRIPTION
     _D_i_f_f compares two files, showing what  must  be  changed  to
     make  them  identical.  Either file1 or file2 (but not both)
     may refer to directories. If that is the case, a file in the
     directory  whose name is the same as the other file argument
     will be used. The standard input may be used for one of  the
     files by replacing the argument by "-". Except for the stan-
     dard input, both files must be on disk devices.

OPTIONS
     -b Remove trailing whitespace (blanks and tabs) and compress
          all other strings of whitespace to a single blank.

     -c Print some context -- matching lines before and after the
          non-match section.  Mark non-matched sections with "|".

     -e Output is in an "editor script" format which is  compati-
          ble with the Unix 'ed' editor.

     -i Ignore lower/upper case distinctions.

     -n Output is in an RCS style format for use with RCS tools.

     -o outfile Place output in file  "outfile"  rather  than  on
          stdout.

     All information needed to compare the files is maintained in
     main  memory.  This  means  that very large files (or fairly
     large files with many differences) will cause the program to
     abort  with  an "out of space" message. Main memory require-
     ments (in words) are approximately:

               2 * (length of file1 + length of file2)
                      + 3 * (number of changes)

     (Where "length" is the number  of  lines  of  data  in  each
     file.)

     The algorithm reads each file  twice,  once  to  build  hash
     tables  and  once to check for fortuitous matches (two lines
     that are in fact different, but which  have  the  same  hash
     value).  CPU  time  requirements  include  sorting  the hash
     tables and randomly searching memory tables for  equivalence
     classes. For example, on a time-shared VAX-11/780, comparing
     two 1000 line files required about 30 seconds (elapsed clock
     time)  and  about  10,000 bytes of working storage. About 90



Sun Release 4.0           Last change:                          1






DIFF(1)                  USER COMMANDS                    DIFF(1)



     per-cent of the time was taken up by file I/O.

DIAGNOSTICS
     Warning, bad option 'x'
          The option is ignored.

     Usage ...
          Two input files were not specified.

     Can't open input file "filename".
          Can't continue.

     Out of space
          The program ran out of memory while comparing  the  two
          files.

     Can't read line nnn at xxx in file[A/B]
          This  indicates  an  I/O  error  when  seeking  to  the
          specific line.  It should not happen.

     Spurious match, output is not optimal.
          Two lines that were different  yielded  the  same  hash
          value.   This  is  harmless  except that the difference
          output is not the minimum set  of  differences  between
          the two files.  For example, instead of the output:
                     lines 1 to 5 were changed to ...
          the program will print
                     lines 1 to 3 were changed to ...
                     lines 4 to 5 were changed to ...

     The program uses a CRC16 hash code.
          The likelihood of this error is quite small.

AUTHOR
     The diff algorithm was developed by J. W.  Hunt  and  M.  D.
     McIlroy,  using  a central algorithm defined by H. S. Stone.
     It was published in:
          Hunt, J. W., and McIlroy, M. D.,
          An Algorithm for Differential File Comparison,
          Computing Science Technical Report #41,
          Bell Laboratories, Murray Hill, NJ  07974

BUGS
     On RSX and DECUS C on VMS systems, diff may fail if the both
     files  are  not  "variable-length, implied carriage control"
     format.  The scopy program can be used to convert  files  to
     this format if problems arise.

     When compiled under VAX  C,  diff  handles  STREAM_LF  files
     properly  (in  addition  to  the  canonical  variable-length
     implied carriage control  files).  Other  variations  should
     work, but have not been tested.



Sun Release 4.0           Last change:                          2






DIFF(1)                  USER COMMANDS                    DIFF(1)



     When compiled under VAX C, diff is quite  slow  for  unknown
     reasons  which  ought to be investigated. On the other hand,
     it has access to effectively unlimited memory.

     Output in a form suitable for ed - the  -e  option  -  seems
     rather pointless; the analogue on DEC systems is SLP (SUMSLP
     on VMS). It would be simple to provide  SLP-compatible  out-
     put.  The  question  is,  why bother - since the various DEC
     file comparison utilities already produce it.














































Sun Release 4.0           Last change:                          3



SHAR_EOF
fi # end of overwriting check
if test -f 'makefile'
then
	echo shar: will not over-write existing file "'makefile'"
else
cat << \SHAR_EOF > 'makefile'
#
# Makefile for Public Domain Diff
#
# the two 'include' directive are for these USG 5.3 machines (or 3B1) which
# have shared libraries.  Comment them out if you do not have shared 
# libraries on your system.

#include $(MAKEINC)/Makepre.h

#
# You may want to use one or more of the following definitions when compiling,
# depending on what your target environment is, and whether you want debugging
# turned on or off.
#
#	TURBO   - compile for TURBO C target (See TC-READ.ME for details)
#	AMIGA   - compile for AMIGA target
#	OSK     - compile for OSK
#	DEBUG   - turn on debugging (implies TIMING) 
#	TIMING  - turn on timing code
#	MSC     - Microsoft C 4.0 (small model support)
#
# additional support for vax, or pdp11 are invoked by defining 'vax'
# or 'pdp11'.  These should be defined by your preprocessor if you are
# on one of these machines.
#
# CFLAGS = -O -DTURBOC
CFLAGS = -O
LDFLAGS = -s

# The first line ('cc ...') is for standard C compilers, the second line
# ('$(LD) ...') is for shared library machines.  Use whichever line is 
# appropriate for your system and comment the other one out.

diff: diff.o
#	cc $(LDFLAGS) diff.o -o diff
	$(LD) $(SHAREDLIB) $(LDFLAGS) diff.o -o diff

clean:
	rm -f *.o a.out core diff

#include $(MAKEINC)/Makepost.h
SHAR_EOF
fi # end of overwriting check
if test -f 'readme'
then
	echo shar: will not over-write existing file "'readme'"
else
cat << \SHAR_EOF > 'readme'
Comp.sources.misc: Volume 2, Issue 1
Archive-Name: pd-diff
Submitted-By: blarson@skat.usc.edu (Bob Larson)

Here's a public domain diff with the -b and -c options.  (4.2bsd style
contex diffs.)  I wasn't aware that these wern't present in all unix
versions of diff, so I didn't think posting it was a priority.  It's
large, slow, and many of the comments are no longer true, but it does
work.  (except when it runs out of memory)  The one case I know of
where its output is incompatable with patch does seem to be pretty
rare.  No makefile is included, the 4.2bsd diff is better on the unix
system I use.  If you don't know how to compile and load a single C
program, this probably isn't the tool for you anyway.  I'd be grateful
to anyone who cleans this up and documents it properly.  It does
appear to have been separate files at some point, I'm presenting it in
a form similar to how I got it: mail headers and outdated documentation
in comments and all.  I just banged on it enough to get it doing what
I wanted.

---
Comp.Sources.Misc: Volume 2, Issue 8
Submitted-By: <mike2@lcuxa.UUCP>
Archive-Name: cdiff-v2

After receiving Bob Larson's sources for the PD context diff program,
I decided to accept his challenge to rewrite the documentation.  In
the process, I also ported it to TURBOC version 1.5.  It probably will
also compile in TURBOC 1.0, but since getting the update I dispensed
with the previous version and did not try it.  

The code has been reorganized to strip it of the documentation that
was built into it; that has been moved to the file cdiff.mem.  Thus,
the following shar file includes cdiff.c, cdiff.1 (man source), cdiff.mem
(the previously built-in documentation), cdiff.doc (cdiff.1 passed
through nroff -man for those who do not have nroff available), the
original README, and a new TC-READ.ME.  Follow the notes in TC-READ.ME
or it will run even slower!  

Of course, no warranties whatsoever go with this.  I merely hacked the
code minimally.  I didn't write it.
---

Comp.sources.misc: Volume 2, Issue 59
Submitted-By: <mike2@lcuxa.UUCP>
Archive-Name: pd-cdiff-patch

Neil Dixon uncovered a flaw in the logic of the cdiff program that
was distributed early in January, and which was redistributed with
changes to make it compilable in Turbo C.  I've tested his patch
both on the Unix SysVr2 version and on the PC, and have not found
any errors.  Conversely, the earlier version when compiled in MSC
4.0 (but, for some reason, not when compiled in TC 1.5) would
sporadically come up with "read" errors.  Since it now works in MSC as
well as TC, I've included the appropriate ifdefs for both compilers,
and have incorporated Neil's patch.  (This was for clarity.  The line
numbers in his patch did not correspond precisely to the line numbers
in the distributed code.)  Both the patch as sent to me and the
revised code are contained below.

As before, I did not write this code.  I merely ported it, and of
course make no warranties whatsoever.

---

Ok, I guess that I will add my two cents worth.  Here is yet another
repost of the public domain diff program.  

I have integrated some changes into the i/o portion of the code, providing 
some significant speedups.  These changes were made after spending two 
evenings playging around with the profiler, attempting various fixes to
make this beast a little faster.  I completed this prior to the latest release 
of the code (the version listed immediately above).  

I have attempted to merge the changes provided by Mike above, but, since I do 
not have any other machines close by, I could not test them.

The changes which I made are in the following areas:

	* modified the fgetss() and fputss() routines.  These were the primary
	  areas of intense activity on the system.  From the source that I
	  could see, these changes should be portable.  After timing this 
	  on my 3b1, the changes make this diff run at about the same speed
	  as the system diff for the files that I was using (amazing isn't it?).

	* Moved the defines from within the source code to within the Makefile.

	* Ran the code through indent.  Sorry about that, but it was the only
	  way that I could make sure that I got all the other patches integrated 
	  properly.

	* Cleaned up some of the comments and added a few of my own.

	* Made a few tweaks to make lint happier.

	* Modified the Makefile to allow use of shared libraries.  Included
	 instructions for all the defines in the system as well.

Mark H. Colburn (mark@jhereg.mn.org)
SHAR_EOF
fi # end of overwriting check
#	End of shell archive
exit 0
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Jeff Jennings
University of Colorado, Boulder
{ ut-sally | ihnp4 } !nbires!boulder!jennings