[comp.binaries.ibm.pc.d] Problem with GNU DIFF; need new diff?

jrv@grad1.cis.upenn.edu (JR VanMechelen) (08/27/90)

I recently downloaded the DOS RCS, and when I discovered that it
needed diff, I also got gnudiff from wuarchive.wustl.edu.  gnudiff
seems to have a bug in it, however.  Whenever RCS calls it, as in:

     diff -a -n c:\tmp\t1010101 .\test.c

diff goes off.  It once reported that it was out of virtual memory;
a ^C will sometimes bring it back, but frequently I have to reboot.

As best I can figure it, diff is trying to construct a diff between
test.c and all of memory, a challenging task.  That, at least, is my
hypothesis.

The problem seems to have something to do with the '-n' flag and the
fact that the files are not in the current directory, for I can do:

     diff c:\tmp\t1010101 .\test.c

and

     diff -n t1010101 test.c

without any problems.

I am running DOS 4.0, if that matters to anyone.


So, I have three questions: first, is there someplace I can get a diff
that will work with RCS (all the diffs on simtel20 seem to use
different meanings for the switches)?  Second, where can I get the
source to the PC version of gnudiff?  Third, is there someone I can
notify so that gnudiff, which seems to run just fine otherwise, can be
tweaked to perfection?

JR VanMechelen
jrv@grad1.cis.upenn.edu

NU013809@NDSUVM1.BITNET (Greg Wettstein) (08/28/90)

There was a version of GNU DIFF recently posted to comp.binaries.ibm.pc
which Mr. Davidsen tested so I assume that it works correctly.  Perhaps this
version will work successfully for you.

I had submitted a competing port of GNU DIFF but unfortunately Mr. Davidsen
had to select the one that showed up first.  The port which I did was based
on Version 1.7 of GNU DIFF and works extremely well for me.  I use it under
both DOS and UNIX(c) with very little difficulty.  I have begun using RCS
under DOS and have not had any trouble generating the differential comparisons
using my port.

This is probably common knowledge to anyone who has monkeyed with porting
GNU DIFF to MS-DOS, but the FSF's algorithm is extremely memory-hungry.
A note repeated in their source code and documentation says that a
memory/disk version would be possible by building the intermediate hash
tables on disk rather than in memory.  As far as I know no one has tackled
that problem yet; it is a non-trivial exercise, I am sure.
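
To illustrate why memory use grows with file size, here is a rough sketch
of per-line hashing, assuming the general approach of holding a whole file
in one buffer and keeping one hash value per line.  The function name and
the hash itself are hypothetical and are not taken from the FSF sources.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: compute one hash value per line of an in-memory
 * text buffer.  Lines with equal hashes are candidates for being equal,
 * so later comparison can work on integers instead of text.
 * Returns the number of lines hashed (at most max_lines).
 */
size_t hash_lines(const char *buf, size_t len,
                  unsigned long *hashes, size_t max_lines)
{
    size_t n = 0;   /* lines hashed so far */
    size_t i = 0;   /* current position in buf */

    assert(buf != NULL && hashes != NULL);

    while (i < len && n < max_lines) {
        unsigned long h = 5381UL;   /* illustrative string hash seed */

        /* Fold every byte of the line, up to the newline, into h. */
        while (i < len && buf[i] != '\n') {
            h = h * 33UL + (unsigned char)buf[i];
            i++;
        }
        if (i < len)
            i++;                    /* step over the newline */

        hashes[n++] = h;
    }
    return n;
}
```

Both the text buffer and the per-line table have to be resident at once in
this scheme, which is what makes a small DOS address space a tight fit.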

Another problem that typically bites these GNU DIFF ports involves
segments.  The FSF code uses low-level I/O (i.e. the read function) to
spool the two text files into memory.  This works well as long as each file
is less than one segment (65536 bytes) in size.  The way around this size
limitation is of course to move to the huge memory model, which allows a
single data object to be larger than one segment.  A subtle bug occurs
however because the low-level read function does not properly deal with huge
pointers.  Things work well until the point where segment wrap occurs, at
which time the pointer is not interpreted properly and the incoming data
gets jammed into somewhat arbitrary positions in the input buffer.
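
One common workaround for this kind of wrap is to split a large read into
pieces so that no single call crosses a segment boundary.  The sketch below
only does the arithmetic for choosing the piece sizes; it is an
illustration under that assumption, not necessarily the fix used in the
port described here, and all of the names in it are hypothetical.

```c
#include <assert.h>

#define SEGMENT_SIZE 65536UL   /* bytes reachable through one 16-bit offset */

/* Bytes that can be stored starting at `offset` before the offset wraps. */
unsigned long bytes_before_wrap(unsigned long offset)
{
    assert(offset < SEGMENT_SIZE);
    return SEGMENT_SIZE - offset;
}

/*
 * Size of the next piece to request: never more than what is left,
 * never more than max_chunk, and never past the end of the segment.
 */
unsigned long next_chunk(unsigned long offset, unsigned long remaining,
                         unsigned long max_chunk)
{
    unsigned long n = remaining;
    unsigned long room = bytes_before_wrap(offset);

    assert(max_chunk > 0);
    if (n > max_chunk)
        n = max_chunk;
    if (n > room)
        n = room;
    return n;
}

/* Number of separate read calls needed to load `total` bytes. */
unsigned long count_chunks(unsigned long start_offset, unsigned long total,
                           unsigned long max_chunk)
{
    unsigned long calls = 0;
    unsigned long offset = start_offset;

    while (total > 0) {
        unsigned long n = next_chunk(offset, total, max_chunk);

        /* The offset part of the address wraps at the segment size. */
        offset = (offset + n) % SEGMENT_SIZE;
        total -= n;
        calls++;
    }
    return calls;
}
```

In a real port each piece would be handed to read in turn, with the huge
pointer advanced by the caller between calls so that the compiler, rather
than the library routine, takes care of the segment adjustment.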

The behavior of the read function in huge model is documented in the MSC
5.1 manuals but I didn't find it until about 03:00 one night during a
raging snowstorm when there wasn't anything else to do but work on a
cranky program.  I had spent a fair amount of time chasing this problem
until I found the reference in the manual.

If anyone is interested, my port of GNU DIFF consists of three sequential
context differences which, when applied to the version 1.7 FSF sources, yield
a clean compile under MSC 5.1 (-W3).  I have the version 1.14 sources but I
haven't had the time to go back and retrofit the patches.  I suppose I
should do that one day but 1.7 has worked well enough that I couldn't justify
the time expenditure required.  Everybody knows the adage: When you are up
to your --hole in alligators ....

If anyone would like to tinker I would be glad to make arrangements for
making the patches available.  If anyone is interested in contacting me
regarding this please use the address in my sig below rather than the
address associated with the news article.  Mail to my office machine gets
to me faster and is easier for me to deal with.


                             As always,
                             Dr. G.W. Wettstein
                             Roger Maris Cancer Center Computing Facility

                             UUCP: uunet!plains!wind!greg
                             INTERNET: greg%wind.uucp@plains.nodak.edu
                             Phone: 701-234-2833

`The truest mark of a man's wisdom is his ability to listen to other
 men expound their wisdom.'