[comp.sources.bugs] bug in public domain diff?

hugh@dgp.toronto.edu ("D. Hugh Redelmeier") (01/19/89)

Various versions of a public domain diff have been broadcast across
the net over the years (including comp.sources.misc 2.1, 2.8, and
2.59).  The original version was on the DECUS C tape, author unknown
(Conroy?).  This program is based on the same algorithm as UNIX diff.

Anyway, all the versions I have checked seem to produce suboptimal
output when diffing the following two files.  UNIX diff does not
have this problem.  Am I right, is this a bug?  Does anyone know a
fix?  The code is currently beyond my comprehension.

file a:
	9
	1
	2
	3
	4
	5
	6
	9

file b:
	8
	1
	2
	x
	x
	4
	3
	4
	5
	6
	8

pd-diff says:
	1c1
	< 9
	---
	> 8
	4c4,5
	< 3
	---
	> x
	> x
	5a7,8
	> 3
	> 4
	8c11
	< 9
	---
	> 8

SunOS3.5 diff says:
	1c1
	< 9
	---
	> 8
	3a4,6
	> x
	> x
	> 4
	8c11
	< 9
	---
	> 8

Notice that pd-diff uselessly deletes and re-inserts 3.  This is not
wrong, just suboptimal.  Perhaps there is a simple off-by-one error
in the code.
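
To put a number on "suboptimal": a diff that only inserts and deletes
lines is shortest when it keeps a longest common subsequence (LCS) of
the two files.  The sketch below is a textbook dynamic-programming LCS
in Python, not code taken from pd-diff or UNIX diff.  The names
lcs_length, file_a and file_b are made up for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists of lines."""
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table[0][0]


# The two files from this post, one list element per line.
file_a = ["9", "1", "2", "3", "4", "5", "6", "9"]
file_b = ["8", "1", "2", "x", "x", "4", "3", "4", "5", "6", "8"]

matched = lcs_length(file_a, file_b)
# Every line that is not kept must be deleted (from a) or inserted (from b).
edits = len(file_a) + len(file_b) - 2 * matched
print(matched, edits)  # prints: 6 7
```

The pd-diff output above keeps only 5 lines (1, 2, 4, 5, 6) and so has
9 inserted or deleted lines, where the SunOS output keeps all 6 and
has 7.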

Hugh Redelmeier
{utcsri, yunexus, uunet!attcan!utzoo, hcr}!redvax!hugh
When all else fails: hugh@csri.toronto.edu
+1 416 482-8253

alanf%smile@Sun.COM (Alan Fargusson @ peace with the world) (01/21/89)

In article <8901191545.AA18697@explorer.dgp.toronto.edu>, hugh@dgp.toronto.edu ("D. Hugh Redelmeier") writes:
> 
> Anyway, all the versions I have checked seem to produce suboptimal
> output when diffing the following two files.  UNIX diff does not
> have this problem.  Am I right, is this a bug?  Does anyone know a
> fix?  The code is currently beyond my comprehension.
> 
> Notice that pd-diff uselessly deletes and re-inserts 3.  This is not
> wrong, just suboptimal.  Perhaps there is a simple off-by-one error
> in the code.

It looks like the code that tries to find the longest match is doing something
wrong.  This should be nearly the last thing done by diff.  I don't have source
for this.

GNU diff gets this right, so you may want to get that.  I have a version of
diff that I wrote that also gets it right.  I may try and post it after all.
I had decided not to since there are so many versions floating around these
days.
- - - - - - - - - - - - - - - - - - - - -
Alan Fargusson		Sun Microsystems
alanf@sun.com		..!sun!alanf

jdc@naucse.UUCP (John Campbell) (01/23/89)

From article <86247@sun.uucp>, by alanf%smile@Sun.COM (Alan Fargusson @ peace with the world):
> 
> GNU diff gets this right, so you may want to get that.  I have a version of
> diff that I wrote that also gets it right.  I may try and post it after all.
> I had decided not to since there are so many versions floating around these
> days.
> - - - - - - - - - - - - - - - - - - - - -
> Alan Fargusson		Sun Microsystems
> alanf@sun.com		..!sun!alanf

(I also wrote a "diff" type program...)

What we need to know is how "compatible" the various public domain diffs
are.  Can GNU diff be used with RCS, for instance?  I find the public
domain diff works well enough (diff -cb) for Larry Wall's patch program,
but I've heard complaints that there are command options used by RCS that
don't exist in the public domain version.

Can anyone summarize the compatibility questions or point to a diff that
RCS can use (public, of course)?
-- 
	John Campbell               ...!arizona!naucse!jdc
                                    CAMPBELL@NAUVAX.bitnet
	unix?  Sure send me a dozen, all different colors.

karl@mstar.UUCP (Karl Fox) (01/24/89)

Well, I *too* have also written a pd diff that *also* gives the same
output as the 'real' diff on these two files.  It uses a different (and
much faster) algorithm, one described by Webb Miller and Gene Myers in
"A File Comparison Program", _Software - Practice and Experience_,
November 1985 (with permission of the author, even).  This is the same
basic algorithm used by GNU diff, except that mine doesn't have the
fancy all-different-line-removal stuff that GNU diff does (which speeds
it up a lot on very different files), nor does it have the stuff that
massages the results to "look better" but also makes them differ from
regular diff.
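
For readers who have not seen this family of algorithms, here is a
minimal Python sketch of the greedy "furthest-reaching diagonal" search
usually credited to Myers.  Assuming that this is the idea meant here is
a guess: it is not the code described in this post, and the
formulation in the cited paper may differ in detail.  The function name
shortest_edit_distance is made up for illustration.  It returns only the
number of inserted plus deleted lines, not the edit script itself.

```python
def shortest_edit_distance(a, b):
    """Minimum number of line insertions plus deletions turning a into b."""
    n, m = len(a), len(b)
    # furthest[k] = largest x reached so far on diagonal k, where k = x - y,
    # x counts lines consumed from a and y counts lines consumed from b.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]        # insert a line from b
            else:
                x = furthest[k - 1] + 1    # delete a line from a
            y = x - k
            # Follow matching lines for free (the "snake").
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


file_a = ["9", "1", "2", "3", "4", "5", "6", "9"]
file_b = ["8", "1", "2", "x", "x", "4", "3", "4", "5", "6", "8"]
print(shortest_edit_distance(file_a, file_b))  # prints: 7
```

The outer loop runs once per edit, so the work grows with the number of
differences rather than with the product of the file sizes.  That is the
usual explanation for why this approach is quick on large files that are
mostly alike.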

Mine supports the -e, -f, -b, -h, -s, -r, -l, -S, -D and -c options,
which I think is the same as BSD diff.  I wrote it because the old diff
was eating up our machine when we used SCCS on very large files.  It
used to be, "type delta and go to lunch"; now it never takes more than
several seconds.

Maybe we should assign a central Diff Naming Authority to hand out
names like ndiff, gdiff, diff1, diff2, or maybe we should all post at
once.  Let the customer choose, just like "Sun Memory at Bargain Prices".
-- 
Karl Fox, Morning Star Technologies
UUCP:     osu-cis!mstar!karl -or- pyramid!mstar!karl -or- sequent!mstar!karl
Internet: osu-cis!mstar!karl@tut.cis.ohio-state.edu

alanf%smile@Sun.COM (Alan Fargusson @ peace with the world) (01/26/89)

In article <1133@naucse.UUCP>, jdc@naucse.UUCP (John Campbell) writes:
> 
> What we need to know is how "compatible" the various public domain diffs
> are.  Can GNU diff be used with RCS, for instance?  I find the public
> domain diff works well enough (diff -cb) for Larry Wall's patch program,
> but I've heard complaints that there are command options used by RCS that
> don't exist in the public domain version.
> 
> Can anyone summarize the compatibility questions or point to a diff that
> RCS can use (public, of course)?
> -- 
> 	John Campbell               ...!arizona!naucse!jdc
>                                     CAMPBELL@NAUVAX.bitnet
> 	unix?  Sure send me a dozen, all different colors.

GNU diff looks to be exactly compatible with UNIX diff.  My diff program is
not and I intended it that way.  I do intend to add an output format that does
look like UNIX for Larry Wall's patch program.

The biggest advantage of my diff is that it is easy to add output options.
Currently it outputs a human-readable format, an edlin-compatible format
(for MS-DOS), and a format for the CP/V editor (you don't want to know).
I guess it is an advantage that mine works on UNIX, MS-DOS (MSC 4.0), and
CP/V with no special care (not even one #ifdef).
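
One way such a design could look, sketched in Python: the comparison
produces a list of hunks, and each output format is just a function
over that list.  This is an illustration only, not the program
described in this post.  The hunk tuple layout, the function names and
the "summary" format are all hypothetical.  The "unix" formatter
follows the conventions visible in the diff output quoted earlier in
the thread.

```python
def line_range(lo, hi):
    """Render a 1-based inclusive range as UNIX diff does: 4 or 4,6."""
    return str(lo) if lo == hi else f"{lo},{hi}"


def format_unix(hunks):
    """Normal UNIX diff style: a header such as 3a4,6, then < and > lines."""
    lines = []
    for kind, a_lo, a_hi, b_lo, b_hi, old, new in hunks:
        lines.append(line_range(a_lo, a_hi) + kind + line_range(b_lo, b_hi))
        lines.extend("< " + text for text in old)
        if old and new:
            lines.append("---")
        lines.extend("> " + text for text in new)
    return lines


def format_summary(hunks):
    """A made-up terse format: one line per hunk."""
    verbs = {"a": "add", "c": "change", "d": "delete"}
    return [
        f"{verbs[kind]}: {len(old)} old line(s), {len(new)} new line(s)"
        for kind, _, _, _, _, old, new in hunks
    ]


# Adding an output option means adding one entry here.
FORMATTERS = {"unix": format_unix, "summary": format_summary}

# Hypothetical hunk layout: (kind, a_lo, a_hi, b_lo, b_hi, old, new).
# These three hunks describe the SunOS result for the files in this thread.
hunks = [
    ("c", 1, 1, 1, 1, ["9"], ["8"]),
    ("a", 3, 3, 4, 6, [], ["x", "x", "4"]),
    ("c", 8, 8, 11, 11, ["9"], ["8"]),
]

print("\n".join(FORMATTERS["unix"](hunks)))
```

Run as is, the "unix" formatter reproduces the SunOS3.5 output quoted
at the start of the thread.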
- - - - - - - - - - - - - - - - - - - - -
Alan Fargusson		Sun Microsystems
alanf@sun.com		..!sun!alanf