davison@drivax.UUCP (Wayne Davison) (05/05/90)
I was just recently contemplating context diffs (I was mailing a 140k context
diff and had applied a 50k patch to rn), when I thought that while new-style
context diffs are much nicer than the old, we could save even more space if
we optimized the change-bar case. And thus was born the "protext diff."
Briefly, a protext diff is a context diff with the old lines, the new lines,
and their shared context merged into a single hunk. The two line-number
headers are combined onto one line that gives the old ('-') and new ('+')
starting line and section length. The leading '+', '-', or ' ' field is
shortened to one character, with the option of using a '.' instead of a ' '
so that context lines survive the trip around the net better. I am also
advocating the use of patch's Index: line to indicate the file name, rather
than the ***/--- comments.
For comparison, here's a simple context diff:
*** orig/file Wed May 4 22:19:48 1990
--- file Wed May 4 22:19:54 1990
***************
*** 15,22 ****
  one
  two
  three
! OLD VERSION
  four
  five
  six
  seven
--- 15,23 ----
  one
  two
  three
! NEW VERSION
  four
+ EXTRA LINE
  five
  six
  seven
which looks like this in protext diff format:
Index: file
@@-15,8+15,9@@
 one
 two
 three
-OLD VERSION
+NEW VERSION
 four
+EXTRA LINE
 five
 six
 seven
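To make the combined header concrete, here is a minimal Python sketch of reading it. It assumes the grammar is exactly what the one example above shows (old start, old length, new start, new length, no spaces). The names parse_header and hunk_lengths are hypothetical and do not come from frob or patch.

```python
import re

# Assumed grammar, taken from the single example header above:
#   @@-<old start>,<old length>+<new start>,<new length>@@
HEADER = re.compile(r"^@@-(\d+),(\d+)\+(\d+),(\d+)@@$")

def parse_header(line):
    """Return (old_start, old_len, new_start, new_len), or None if no match."""
    m = HEADER.match(line.strip())
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())

def hunk_lengths(body):
    """Count old-side and new-side lines from the one-character prefixes.

    Context lines (' ' or the optional '.') count on both sides,
    '-' lines only on the old side, '+' lines only on the new side.
    """
    old_len = sum(1 for l in body if l[:1] in (" ", ".", "-"))
    new_len = sum(1 for l in body if l[:1] in (" ", ".", "+"))
    return old_len, new_len
```

For the example hunk the header parses to (15, 8, 15, 9), and counting the prefixes in the body gives the same two lengths, 8 and 9, which makes a cheap consistency check.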
I've created a program (currently called "frob") that takes new- or old-style
context diffs, or protext diffs, as input and generates either a protext or a
new-style context diff as output (the default is to toggle the diff's format
unless you override it).
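As a rough illustration of the context-to-protext direction, here is a Python sketch of folding one new-style context hunk into a protext hunk. This is not frob's code, which has not been posted. The names merge_hunk and protext_header are made up for the example, and it assumes well-formed input whose lines carry the usual two-character prefixes.

```python
def protext_header(old_range, new_range):
    """Build the one-line header from (first, last) line-number pairs."""
    (o1, o2), (n1, n2) = old_range, new_range
    return "@@-%d,%d+%d,%d@@" % (o1, o2 - o1 + 1, n1, n2 - n1 + 1)

def merge_hunk(old, new):
    """Merge the two halves of a new-style context hunk into one body.

    old and new are lists of lines with the prefixes "  ", "! ", "- ", "+ ".
    New-style context diffs omit a half that contains no changes, so an
    empty half is rebuilt from the context lines of the other one.
    """
    if not old:
        old = [l for l in new if l.startswith("  ")]
    if not new:
        new = [l for l in old if l.startswith("  ")]
    out, i, j = [], 0, 0
    while i < len(old) or j < len(new):
        # Removed or changed lines from the old half come first.
        while i < len(old) and old[i][:1] in ("-", "!"):
            out.append("-" + old[i][2:])
            i += 1
        # Then added or changed lines from the new half.
        while j < len(new) and new[j][:1] in ("+", "!"):
            out.append("+" + new[j][2:])
            j += 1
        # Both halves should now sit on the same context line, or be done.
        if i < len(old) and j < len(new):
            if old[i][2:] != new[j][2:]:
                raise ValueError("context lines of the two halves differ")
            out.append(" " + old[i][2:])
            i += 1
            j += 1
        elif i < len(old) or j < len(new):
            raise ValueError("one half has unmatched context lines")
    return out
```

Fed the two halves of the sample hunk (lines 15,22 and 15,23), this yields the header @@-15,8+15,9@@ and the same ten body lines as the protext example.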
In addition, I've also extended Larry Wall's patch program to scan for and
parse the protext diff format.
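For a feel of what applying such a hunk involves, here is a deliberately strict Python sketch. It is not the change made to patch: the real program searches for the right spot when line numbers are off, and this sketch simply fails. The function name apply_hunk is hypothetical, and treating '.' as a context marker follows the option described above.

```python
def apply_hunk(lines, old_start, body):
    """Apply one protext hunk to lines (a list of strings without newlines).

    old_start is the 1-based old starting line from the "@@" header.
    Raises ValueError if the file does not match the hunk exactly.
    """
    pos = old_start - 1
    result = lines[:pos]
    for entry in body:
        tag, text = entry[:1], entry[1:]
        if tag in (" ", ".", "-"):
            # Context and removed lines must both match the old file.
            if pos >= len(lines) or lines[pos] != text:
                raise ValueError("hunk does not match at line %d" % (pos + 1))
            if tag != "-":
                result.append(text)   # context survives, removals do not
            pos += 1
        elif tag == "+":
            result.append(text)       # added line, consumes nothing
        else:
            raise ValueError("unknown line prefix %r" % tag)
    result.extend(lines[pos:])
    return result
```

Applied to a file whose lines 15 through 22 are the old text of the sample hunk, this produces the nine new lines in their place and leaves everything before and after untouched.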
If people like the protext diff concept, I'll post the patch to "patch" and
the code for "frob", and then people could start using the new patch format.
Then, after a few months of confusion and getting everyone up to speed, we
could actually start saving some net bandwidth. Later, the protext diff
format could be added to diff programs and the need for frob would eventually
die out.
Comments? Do you think it's worth pursuing? If so, any design issues we
should consider?
Here are a few real-world examples of protext diff savings in action. All the
patches have been trimmed of comments (automatically by frob's -s option) and
checked for accuracy. Patches marked with an asterisk (*) were distributed as
old-style context diffs, and thus the savings are quite a bit more than those
distributed as new-style context diffs. Since frob can generate new from old,
I've included the size the patch could have been if it had been a new-style
context diff, just in case you wanted to know.
Patch           context  protext  Saves     (new-style)
==============  =======  =======  =====     ===========
rn patch 41       61050    32952  46.0%  *  (45995)
rn patch 42       63262    35087  44.5%  *  (48879)
rn patch 43       62212    34917  43.9%  *  (41469)
rn patch 44        1854     1002  46.0%  *  (1548)
rn patch 45       61732    44136  28.5%
rn patch 46       50830    26077  48.7%  *  (38367)
C news 24Aug89    50211    34325  31.6%
C news 14Sep89    46093    34232  25.7%
C news 13Nov89    44530    31313  29.7%
C news 10Jan90    53315    40228  24.5%
C news 16Jan90    49912    39926  20.0%
C news 17Jan90    52755    39526  25.0%
perl patch 10     42482    30047  29.3%
perl patch 11     47175    32951  30.2%
perl patch 12     31363    22684  27.7%
perl patch 13     31799    23096  27.4%
perl patch 14     32109    23850  25.7%
gcc1.36to1.37    400085   313776  21.6%
My own patch     144046   106639  26.0%
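The "Saves" column appears to be the size reduction relative to the distributed context diff, in bytes. A small Python sketch of that arithmetic follows. The function name savings is hypothetical, and rounding to one decimal place is an assumption.

```python
def savings(context_size, protext_size):
    """Percentage of bytes saved by the protext form, to one decimal place."""
    return round(100.0 * (context_size - protext_size) / context_size, 1)

# Sizes taken from the table above.
print(savings(61050, 32952))    # rn patch 41, as distributed (old-style)
print(savings(400085, 313776))  # gcc1.36to1.37
```

The same function can be pointed at the parenthesized new-style size to see what an old-style patch would have saved had it been distributed as a new-style context diff. For rn patch 41, savings(45995, 32952) comes to roughly 28%, in line with the patches that were distributed new-style.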
--
Wayne Davison \ /| / /| \/ /| /(_) davison%drivax@uts.amdahl.com
davison@drivax.UUCP (_)/ |/ /\| / / |/ \ ...!amdahl!drivax!davison