davison@drivax.UUCP (Wayne Davison) (05/05/90)
I was recently contemplating context diffs (I was mailing a 140k
context diff and had just applied a 50k patch to rn) when it occurred
to me that, while new-style context diffs are much nicer than the old,
we could save even more space by optimizing the change-bar case. And
thus was born the "protext diff."

Briefly, a protext diff is a context diff with all the changes and
lines of context merged into one hunk. It takes the two line-number
headers and puts them on one line, giving the starting line and
section length for the old ('-') and new ('+') file. It also shortens
the initial '+', '-', ' ' field to one character, and offers the option
of using a '.' instead of a ' ' so that context lines survive the trip
around the net better. I am also advocating the use of patch's Index:
line to indicate the file name, rather than the ***/--- comments.

For comparison, here's a simple context diff:

*** orig/file Wed May 4 22:19:48 1990
--- file Wed May 4 22:19:54 1990
***************
*** 15,22 ****
  one
  two
  three
! OLD VERSION
  four
  five
  six
  seven
--- 15,23 ----
  one
  two
  three
! NEW VERSION
  four
+ EXTRA LINE
  five
  six
  seven

which looks like this in protext diff format:

Index: file
@@-15,8+15,9@@
 one
 two
 three
-OLD VERSION
+NEW VERSION
 four
+EXTRA LINE
 five
 six
 seven

I've created a program (currently called "frob") that takes as input
new- or old-style context diffs, or the new protext diff format, and
generates a protext or new-style context diff as output (by default it
toggles the diff's format unless you override it). In addition, I've
extended Larry Wall's patch program to scan for and parse the protext
diff format.

If people like the protext diff concept, I'll post the patch to "patch"
and the code for "frob", and then people could start using the new
patch format. Then, after a few months of confusion and getting
everyone up to speed, we could actually start saving some net
bandwidth. Later, the protext diff format could be added to diff
programs, and the need for frob would eventually die out.

Comments?
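
To make the header arithmetic concrete, here is a rough sketch in
Python. It is not frob and not the patch extension; the function names
are made up for this illustration, and the header pattern is assumed
from the single example above (@@-15,8+15,9@@). It parses the combined
header, checks the lengths against the hunk body, and converts the
start/length pairs back to the start,end pairs a context diff prints.

```python
import re

# Header shape assumed from the one example in the post: @@-15,8+15,9@@
HEADER_RE = re.compile(r"^@@-(\d+),(\d+)\+(\d+),(\d+)@@$")


def parse_header(line):
    """Return (old_start, old_len, new_start, new_len) from a hunk header."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("not a protext hunk header: %r" % line)
    return tuple(int(field) for field in match.groups())


def count_lines(body):
    """Count how many body lines belong to the old file and to the new file."""
    old_count = new_count = 0
    for line in body:
        marker = line[:1]
        if marker in (" ", "."):      # context line, present in both files
            old_count += 1
            new_count += 1
        elif marker == "-":           # line only in the old file
            old_count += 1
        elif marker == "+":           # line only in the new file
            new_count += 1
        else:
            raise ValueError("unexpected line marker: %r" % line)
    return old_count, new_count


def context_ranges(header):
    """Convert start/length pairs to context-diff start,end pairs.

    Assumes both lengths are nonzero; empty sections are not handled here.
    """
    old_start, old_len, new_start, new_len = header
    return (old_start, old_start + old_len - 1), (new_start, new_start + new_len - 1)


hunk = """@@-15,8+15,9@@
 one
 two
 three
-OLD VERSION
+NEW VERSION
 four
+EXTRA LINE
 five
 six
 seven""".splitlines()

header = parse_header(hunk[0])
print(header)                   # (15, 8, 15, 9)
print(count_lines(hunk[1:]))    # (8, 9)
print(context_ranges(header))   # ((15, 22), (15, 23))
```

The ranges it prints, 15,22 and 15,23, match the "*** 15,22 ****" and
"--- 15,23 ----" headers of the context diff example.
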
Do you think it's worth pursuing? If so, are there any design issues we
should consider?

Here are a few real-world examples of protext diff savings in action.
All the patches have been trimmed of comments (automatically, by frob's
-s option) and checked for accuracy. Patches marked with an asterisk
(*) were distributed as old-style context diffs, so their savings are
quite a bit larger than for those distributed as new-style context
diffs. Since frob can generate new from old, I've included the size
each such patch would have been as a new-style context diff, just in
case you wanted to know.

Patch            context  protext  Saves     (new-style)
============     =======  =======  =====     ===========
rn patch 41        61050    32952  46.0% *   (45995)
rn patch 42        63262    35087  44.5% *   (48879)
rn patch 43        62212    34917  43.9% *   (41469)
rn patch 44         1854     1002  46.0% *   (1548)
rn patch 45        61732    44136  28.5%
rn patch 46        50830    26077  48.7% *   (38367)
C news 24Aug89     50211    34325  31.6%
C news 14Sep89     46093    34232  25.7%
C news 13Nov89     44530    31313  29.7%
C news 10Jan90     53315    40228  24.5%
C news 16Jan90     49912    39926  20.0%
C news 17Jan90     52755    39526  25.0%
perl patch 10      42482    30047  29.3%
perl patch 11      47175    32951  30.2%
perl patch 12      31363    22684  27.7%
perl patch 13      31799    23096  27.4%
perl patch 14      32109    23850  25.7%
gcc1.36to1.37     400085   313776  21.6%
My own patch      144046   106639  26.0%

-- 
Wayne Davison        \ /| / /| \/ /| /(_)   davison%drivax@uts.amdahl.com
davison@drivax.UUCP  (_)/ |/ /\| / / |/ \   ...!amdahl!drivax!davison