sam@delftcc.UUCP (Sam Kendall) (04/02/86)
I've had some thoughts recently about features that cpio(1) needs. Some of these apply to tar(1) also.

(1) Optional error recovery. If the header of just one file in a cpio archive is munged, cpio will issue the pitiful message "Out of phase--get help" and terminate. This message is confusing to ordinary users, and it then takes a guru to recover the files in the archive past the garbled point. This is a bit ridiculous. There should be some optional error recovery, like the ability to retrieve the file following the garbled header (even if its name is unknown), and then to recognize the next file header in the garbled archive and proceed from there. This might break down if another cpio archive were one of the files in the garbled archive, but no big deal.

(2) Automatic recognition of -c vs. non-"-c" formats. The -c option could be ignored with -i (copy in); cpio should recognize which format the archive is in. This is easy to implement. It complicates error recovery, though, in the case that the beginning of the file is munged.

(3) Fix the bug that -m (restore file modification times) is ineffective on directories that are being copied. This is vital for the next feature:

(4) Optional save and restore of directory contents, with file deletion. The purpose of this feature is to correctly handle full and incremental backups with cpio; specifically, to correctly restore a directory in which files have been removed after the full backup was made, but before the incremental backup was made.

Currently, when -o (copy out) gets the name of a directory, it outputs a header for that directory, but no contents. My proposal is for an option "-D" which would work with both -o and -i. With -o, a list of files in a directory is saved along with the directory. With -i, when a directory is being restored and is "replacing" an already existing directory on disk, all files that are in the existing directory but NOT in the archived directory are REMOVED.

Another way to look at it: with a cpio -i, the action of a file replacing an already existing file means, of course, that the archived contents replace the contents on disk. But there is no corresponding action for directories. -D adds such an action. N.B.: as with files, the archived directory will replace the existing directory only if it is newer or the -u option is given; this is why (3) above is necessary. -D would also work with -p (pass), of course.

Example: a directory "d" contains files "a" and "b". A full backup (using cpio) is made including "d" and its contents. The file "b" is deleted. Now an incremental backup of files that have changed since the full backup is made using cpio -D. "d" is on the incremental backup, because it has changed since the full backup was made. (It changed when "b" was deleted.) Now suppose "d" is lost on disk, and we try to restore it to disk from backup. We first restore the full backup; "d" contains "a" and "b" again. We next restore the incremental backup. On the incremental backup, "d" contains "a" but not "b". So "b" is deleted from disk. The restore has worked correctly. With the current cpio, "b" would still exist, incorrectly, after the incremental backup was restored.

This is extremely useful for backup purposes. It sounds complicated, but it fits in beautifully.

(5) Preservation of printable ASCII + short lines. It is too late for this, since the format is already frozen, but it would have been good. The idea here is that an archive of mailable files should be itself mailable, except perhaps for its size. A file that is mailable has only printable ASCII characters, and has no lines longer than some length, maybe 80 characters (I'm not sure). A cpio -c archive has headers which are about 80 characters plus the length of the pathname; this can get too long. Also, the header includes a NUL character or two. I wish someone had thought about this a little bit more before designing the format. It is so close to preserving mailability! Of course, "shar", and also Martin Minow's (decvax!minow; I think it's his) "arch" programs do preserve mailability in almost all cases.

(6) Should be public domain. This would avoid the annoying scenario where people get cpio archives but cannot unpack them.

I haven't recommended that checksums be introduced into cpio, because I think this can be handled by some other filter. (There are some tools to package software for transmission, available through the AT&T Toolchest, that probably do what I want here.) One could argue that mailability can also be handled by other filters; but I would rather keep things simple for unpacking mailed archives.

Comments?

----
Sam Kendall                     { ihnp4 | seismo!cmcl2 }!delftcc!sam
Delft Consulting Corp.          ARPA: delftcc!sam@NYU.ARPA
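
The deletion step of the -D restore proposed in (4) might look roughly like the sketch below. It is an illustration only, not cpio code: `prune_directory` and `archived_names` are hypothetical names, the sketch assumes the archive recorded the plain entry names of each directory, and it leaves out the "newer, or -u given" check that decides whether the archived directory replaces the one on disk at all.

```python
import os
import shutil


def prune_directory(disk_dir, archived_names):
    """Remove entries of disk_dir that the archived directory did not list.

    Hypothetical helper: archived_names stands in for the file list that
    the proposed -D option would save alongside the directory header.
    Returns the sorted names that were removed.
    """
    keep = set(archived_names)
    removed = []
    for name in sorted(os.listdir(disk_dir)):
        if name in keep:
            continue
        path = os.path.join(disk_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)   # subdirectory absent from the archive
        else:
            os.remove(path)       # plain file, symlink, etc.
        removed.append(name)
    return removed
```

In terms of the example in the post: after the full backup is restored, "d" holds "a" and "b"; the incremental archive lists only "a", so calling `prune_directory("d", ["a"])` would remove "b" and leave "a" in place.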
dricej@drilex.UUCP (Craig Jackson) (04/05/86)
Sam @ Delft Consulting Corporation recently proposed several enhancements to cpio(1). I think that is a very interesting area of discussion. I'm not sure where it leads, but it can at least be useful for persons modifying a system or doing a port.

Sam left out the one change that we have found we needed the most: byte swapping the headers. We have gotten cpio tapes from VAXes that we could not read on our big-endian 68000 and Z8000 machines. We ended up adding a -h option to cpio, but ideally it would be done automatically, upon detecting a swapped magic number. The various byteswapping options which are present today are of limited utility if you can't read the header. The -c option solves this problem, but only if the person who made the tape thought to use it.

--
Craig
UUCP: {harvard,linus}!axiom!drilex!dricej
BIX: cjackson
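
Detecting a swapped magic number, together with telling -c archives from binary ones, could be sketched along these lines. This is a rough illustration, not the -h change described above: `classify_header` and its `native` parameter are hypothetical, and the sketch relies only on the magic number being octal 070707, stored as a 16-bit value in binary headers and as six ASCII digits in -c headers.

```python
import struct

BINARY_MAGIC = 0o070707          # magic number of the binary header
ASCII_MAGIC = b"070707"          # same number as six octal digits (-c)


def classify_header(first_bytes, native="big"):
    """Guess the archive format from its first bytes.

    native is the byte order of the reading machine ("big" or "little").
    Returns "ascii", "binary", "binary-swapped" or "unknown".
    """
    if first_bytes[:6] == ASCII_MAGIC:
        return "ascii"
    if len(first_bytes) < 2:
        return "unknown"
    fmt = ">H" if native == "big" else "<H"
    (magic,) = struct.unpack(fmt, first_bytes[:2])
    if magic == BINARY_MAGIC:
        return "binary"
    # Exchange the two bytes and test again: a hit means the archive
    # was written on a machine with the opposite byte order.
    swapped = ((magic & 0xFF) << 8) | (magic >> 8)
    if swapped == BINARY_MAGIC:
        return "binary-swapped"
    return "unknown"
```

A tape written on a little-endian machine starts with the bytes C7 71 (hex), so a big-endian reader would get "binary-swapped" from `classify_header(b"\xc7\x71", native="big")`. A real reader would then also have to swap the remaining header fields before using them.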
allyn@sdcsvax.UUCP (Allyn Fratkin) (04/06/86)
I don't see why the "-c" option isn't the default in the first place. What advantages are there in having a binary header over an ASCII header? ASCII headers are portable (that's the point), no byteswapping, no int/long size problems, and are easier to recover when cpio barfs on a bad block.

I definitely think cpio needs to recover from errors.

--
From the virtual mind of Allyn Fratkin
UCSD EMU/Pascal Project, U.C. San Diego
allyn@sdcsvax.ucsd.edu or {ucbvax, decvax, ihnp4}!sdcsvax!allyn
"Generally you don't see that kind of behavior in a major appliance."
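
One way to see why ASCII headers are easier to recover: after a bad block, a reader can scan forward for the next run of bytes that looks like a header. The sketch below is a guess at how that might be done, not how any cpio does it. It assumes a 76-byte header made up only of octal digits (eight 6-digit fields starting with the magic 070707, an 11-digit mtime, a 6-digit name size and an 11-digit file size), which is the layout later documented for the portable ASCII format; the -c format on a particular system may differ. `find_next_header` and `parse_sizes` are hypothetical names.

```python
import string

HEADER_LEN = 76                  # assumed length of a -c header
OCTAL = set(string.octdigits.encode())


def find_next_header(data, start=0):
    """Return the offset of the next plausible ASCII header, or -1."""
    pos = data.find(b"070707", start)
    while pos != -1:
        candidate = data[pos:pos + HEADER_LEN]
        # Accept only a full-length run consisting purely of octal digits.
        if len(candidate) == HEADER_LEN and all(b in OCTAL for b in candidate):
            return pos
        pos = data.find(b"070707", pos + 1)
    return -1


def parse_sizes(data, pos):
    """Read the name size and file size from the header at pos."""
    namesize = int(data[pos + 59:pos + 65], 8)
    filesize = int(data[pos + 65:pos + 76], 8)
    return namesize, filesize
```

Once a header is found, the two sizes say how far to skip to reach the following header, so normal extraction can resume. As Sam noted, a cpio archive stored as a member of the damaged archive could produce a false match.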
ed@mtxinu.UUCP (Ed Gould) (04/07/86)
In article <109@drilex.UUCP> dricej@drilex.UUCP (Craig Jackson) writes:
>
>Sam @ Delft Consulting Corporation recently proposed several enhancements
>to cpio(1). I think that is a very interesting area of discussion.
>I'm not sure where it leads, but it can at least be useful for persons
>modifying a system or doing a port.

One of the places it leads is backwards in time, or perhaps just sideways to another stream. The dump and restor programs have always had the facility to remove files that disappeared between the full dump and a later incremental.

Does anyone *know* why USG decided to drop dump/restor and, for file-transfer functions, tar in favor of cpio? This decision was made fairly early. PWB 1.0 had dump/restor, I don't know about 2.0. They were gone in 3.0, which was released external to the (then) Bell System as "System III".

--
Ed Gould
mt Xinu, 2910 Seventh St., Berkeley, CA 94710 USA
{ucbvax,decvax}!mtxinu!ed
+1 415 644 0146

"A man of quality is not threatened by a woman of equality."
ka@hropus.UUCP (Kenneth Almquist) (04/12/86)
> Does anyone *know* why USG decided to drop dump/restor and, for
> file-transfer functions, tar in favor of cpio? This decision
> was made fairly early. PWB 1.0 had dump/restor, I don't know
> about 2.0. They were gone in 3.0, which was released external
> to the (then) Bell System as "System III".

I don't know, but I can guess. They probably dropped dump/restore because no one wanted them. Volcopy was faster. I expect that dump and restore would have been retained if anybody had had a use for them.

They didn't drop tar in favor of cpio; they dropped tp. There were a number of deficiencies with tp, including:

1) It could read archives from a disk file rather than tape, but it could not write them.

2) It could handle only a limited number of files specified by name because the names had to be passed as arguments (exec used to limit argument lists to 512 bytes).

3) It didn't understand about multiple links to a file.

So USG released cpio and announced that they would drop tp eventually. I doubt that tar existed at this point. (If it did, USG might have reasonably rejected it on the grounds that it solved the first problem with tp, but not the latter two.) Some time later USG picked up tar, and a while after that they dropped tp, as promised. Tar was not dropped; it is still in System V today. It was not widely used because there was no good reason for users not to continue to use cpio.

Kenneth Almquist
ihnp4!houxm!hropus!ka   (official name)
ihnp4!opus!ka           (shorter path)