gudeman@cs.arizona.edu (David Gudeman) (04/25/91)
Some programs, such as SCCS and RCS, store multiple versions of a source file in a single version file. They work by storing edit scripts like those produced by diff. SCCS, for example, stores only the first version in its entirety; every other version is stored as an edit script that transforms the original file into the new one.

These aggregate files have several characteristics that make them good candidates for compression: they are not accessed frequently, they already have special formats (so compression would not make them any less accessible), and they can get very large.

Would this sort of compression benefit much from special compression algorithms? For example, do diffs have any special properties that could be exploited? The process of making an edit script seems similar to some compression algorithms; could compression and diff'ing be combined? Could you use a "mutual-dictionary" compression method (one where you build the dictionary over all versions of the file) to get an effect similar to edit scripts, since it would tend to put common strings in the dictionary? As far as I know, edit scripts are line-based (or otherwise coarse-grained); would a mutual-dictionary method be finer-grained?

Is this already being done, and is my software tools book simply outdated? Enquiring minds want to know.

--
David Gudeman
gudeman@cs.arizona.edu
noao!arizona!gudeman
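P.S. To make the "mutual-dictionary" question concrete, here is a rough sketch in modern terms. It uses Python's zlib and its preset-dictionary (`zdict`) feature, purely as an illustration of the idea: compress a new version of a file using an earlier version as the dictionary, so that strings shared between versions become short back-references. The version data and function names are invented for the example; this is not how SCCS or RCS actually store anything.

```python
import zlib

def compress_against(data: bytes, dictionary: bytes) -> bytes:
    # Deflate `data`, letting matches reference `dictionary` (the old
    # version) as if it preceded the input. This captures shared
    # substrings at byte granularity, finer than line-based diffs.
    c = zlib.compressobj(level=9, zdict=dictionary)
    return c.compress(data) + c.flush()

def decompress_against(blob: bytes, dictionary: bytes) -> bytes:
    # Decompression needs the same dictionary the compressor used.
    d = zlib.decompressobj(zdict=dictionary)
    return d.decompress(blob) + d.flush()

# Two invented "versions" of a file that share most of their text.
v1 = b"\n".join(b"line %3d: some moderately distinctive text" % i
                for i in range(200))
v2 = v1.replace(b"line 100", b"LINE 100 (edited)")

plain = len(zlib.compress(v2, 9))          # v2 compressed on its own
shared = len(compress_against(v2, v1))     # v2 compressed against v1
restored = decompress_against(compress_against(v2, v1), v1)
```

With heavily overlapping versions, `shared` comes out much smaller than `plain`, which is essentially the edit-script effect, but at a finer (substring) grain. Note one limitation of this particular sketch: deflate's match window is 32 KB, so only the most recent 32 KB of the dictionary is reachable.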