jbrown@jato.Jpl.Nasa.Gov (Jordan Brown) (10/06/88)
(gotta avoid the A word... :-)

I'm considering building a PUBLIC DOMAIN (that means *no* restrictions on anything) file packaging and compression program.  I would attempt to maintain portability across a wide variety of environments (obviously MS-DOS and UNIX; others as appropriate) and would distribute the source code.

I wouldn't promise that this would be the most featureful or fastest such program ever built, but it would be PUBLIC DOMAIN.  And since I'd be distributing source code, if somebody else figured out a way to be a little faster or better, we could arrange to work TOGETHER to build a better program.  (I anticipate compression ratios comparable to the existing A-word programs, because everybody really uses compress.)

I don't have any arguments with SEA and PK.  I'm not sure who is in the wrong, but it's clear we're all suffering.  I agree completely with somebody who said that we (USENET, BBSes, etc) simply should not be depending on a commercial product.

The initial interesting-feature list would include hierarchy support, compression, and multivolume archive support.

So, what do people think?  Would anybody be interested in working on such a project?  Would anybody support (as in use) such a program?

Jordan Brown
jbrown@jato.jpl.nasa.gov
bobmon@iuvax.cs.indiana.edu (RAMontante) (10/06/88)
jbrown@jato.UUCP (Jordan Brown) writes:
}
}I'm considering building a PUBLIC DOMAIN (that means *no* restrictions on
}anything) file packaging and compression program.  I would attempt to
}maintain portability across a wide variety of environments (obviously
}MS-DOS and UNIX; others as appropriate) and would distribute the source
}code.

How about basing your program on the Zoo archive format?  You get to step into an existing format which already has supporters.  As I said elsewhere, the format IS public-domain; one of Rahul's specific programs (the only one that can generate zoo archives, I think) is the only part that has restrictions.
--
-- bob,mon  (bobmon@iuvax.cs.indiana.edu)
-- "Aristotle was not Belgian..."  - Wanda
dhesi@bsu-cs.UUCP (Rahul Dhesi) (10/07/88)
In article <259@jato.Jpl.Nasa.Gov> jbrown@jato.UUCP (Jordan Brown) writes:
>I'm considering building a PUBLIC DOMAIN (that means *no* restrictions on
>anything) file packaging and compression program.

Think about the following issue carefully.

If the archive is a concatenation of files like the cpio, tar, and arc formats, then updating it requires copying the whole archive.

If the archive contains more structure, e.g. a linked list of directory entries like the zoo format, then updates need direct-access writes but allow you to avoid copying the whole archive.

Also, if the compressed file is preceded by length information, as in cpio, tar, and arc, then you can't easily add a compressed file to the archive without knowing the compressed size *first*, which means compressing to a temporary file, which I don't like.  Take a look at the way the zmodem protocol works: it does not precede file data with length information.  Instead, it uses an escape sequence of bytes to denote the end of a file.  This may need some tricky programming, and will slow down the speed with which archive contents are listed, but it will let you add a compressed file directly to an archive without creating a temporary file first.

The first has the advantage that archives can be read from and written to standard input/output, allowing easy use of pipes in UNIX.  The second has the advantage that users with limited disk space can still create and update large archives, and updating a large archive by adding a tiny file does not need much overhead in CPU or I/O time.  (The tar format allows appending a file to a tar archive, but then you can get two instances of the same file in the archive, and to extract the file you extract both and let the second one overwrite the first -- not very elegant.)

If you can combine the advantages of both in an easy way, you will have achieved something very useful.
--
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi
les@chinet.UUCP (Leslie Mikesell) (10/08/88)
In article <4225@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>If the archive is a concatenation of files like the cpio, tar, and arc
>formats, then updating it requires copying the whole archive.
>
>If the archive contains more structure, e.g. a linked list of directory
>entries like the zoo format, then updates need direct access writes but
>allow you to avoid copying the whole archive.
>
>Also, if the compressed file is preceded by length information, as in
>cpio, tar, and arc, then you can't easily add a compressed file to the
>archive without knowing the compressed size *first*, which means
>compressing to a temporary file, which I don't like.

There is also a problem with cpio even without compression: if the length of the file changes between writing the cpio header and reading the end of the file, the rest of the archive is corrupted.  I think the archiver should work in a streaming mode if necessary, so that it can handle tape drives that don't seek, but there should be a length field that can be filled in if you can seek on the media.

Your idea of a magic escape sequence to mark the end of an entry solves two problems: the file length where you can't seek on the device, and also the problem of re-syncing on an archive with a corrupted entry or part of a multi-volume set.

The program could also keep a separate directory (optional) in another file or tacked on to the end of the archive.  This could be used for several purposes, with obvious advantages when the archive spans volumes.  A minor extension would be to allow the directory portion to contain entries for files that are not contained in the archive, which would allow (a) preserving links that would otherwise not be possible, and (b) restoring a directory tree to exactly the condition it was in when the last incremental backup was done (i.e. deleting extraneous files that had been removed before the incremental but still existed on the last full backup or intermediate incrementals).
A fairly simple program could manipulate the information from the directory files to determine where to find archived copies (disk n of set xxx) and also determine exactly which files need to be copied in an incremental backup.

Les Mikesell