samperi@mancol.UUCP (Dominick Samperi) (02/01/88)
I've heard that cpio will be used as the unix standard archiver, yet many people seem to prefer tar. While implementing these programs on a PC I noticed several advantages/disadvantages of each.

In a tar archive, file headers and file data always begin on a block (512 byte) boundary, thus making it easier to seek to the beginning of a particular file, or to append files to a tar archive. On the other hand, files in a cpio archive can begin at any byte (character format), so a file header could even span two volumes (floppies), making it difficult to append files to a cpio archive.

It seems that directories and special device files cannot be written to a tar archive (on the unix systems that I checked), while they can be written to a cpio archive. This means that more information is stored in a cpio archive, thus facilitating file restores after a crash. Another disadvantage of tar archives is the fact that they tend to waste space, since every file must occupy at least 1K bytes (512 for a header, and 512 for data).

I'd be interested to hear about any published standards for tar and/or cpio (AT&T, POSIX, etc.), especially standards that define how to deal with multi-volume archives (e.g., how do you start reading at volume N?). Perhaps people can add to the list of advantages/disadvantages of tar and cpio. Differences in the user interface (command syntax) are not really important, since tar can be used like cpio, and vice versa, via shell scripts.

--
Dominick Samperi, Manhattan College, NYC
manhat!samperi@NYU.EDU ihnp4!cmcl2!manhat!samperi (cmcL2) ihnp4!cmcl2!phri!dasys1!samperi
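The space arithmetic in the post above can be sketched in a few lines. This is only an illustration under the post's own model (one 512-byte header block per file, data padded up to the next 512-byte boundary); the function names are made up for the example, and end-of-archive blocks and blocking-factor padding are ignored.

```python
BLOCK = 512  # tar block size in bytes, as stated in the post


def tar_member_size(data_len: int) -> int:
    """Bytes one regular file occupies: one header block plus padded data."""
    data_blocks = (data_len + BLOCK - 1) // BLOCK  # round up to whole blocks
    return BLOCK + data_blocks * BLOCK


def wasted_bytes(data_len: int) -> int:
    """Padding added after the file data to reach the next block boundary."""
    return tar_member_size(data_len) - BLOCK - data_len


if __name__ == "__main__":
    for n in (1, 512, 513, 5000):
        print(n, tar_member_size(n), wasted_bytes(n))
```

Under this model a 1-byte file takes 1024 bytes, which is the "at least 1K" figure for any non-empty file, and a 513-byte file takes 1536 bytes with 511 bytes of padding.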
al@gtx.com (0732) (02/02/88)
In article <246@mancol.UUCP> samperi@mancol.UUCP (Dominick Samperi) writes:
>
>I'd be interested to hear about any published standards for tar and/or
>cpio (AT&T, POSIX, etc.), especially standards that define how to deal
>with multi-volume archives (e.g., how do you start reading starting at
>volume N?). Perhaps people can add to the list of advantages/disadvantages

I too would like to know if such information is available. I once did a cpio backup onto multiple floppies (AT&T 3B1), and when I tried to restore, I had a hard read error on one of the floppies, early on in the sequence. Luckily, I had the information elsewhere. Would there have been any way to bypass the bad floppy and continue the restore? Where can one get a document describing cpio format?

----------------------------------------------------------------------
| Alan Filipski, GTX Corp, 2501 W. Dunlap, Phoenix, Arizona 85021, USA |
| {ihnp4,cbosgd,decvax,hplabs,amdahl}!sun!sunburn!gtx!al (602)870-1696 |
----------------------------------------------------------------------
mmengel@cuuxb.ATT.COM (Marc W. Mengel) (02/03/88)
In article <246@mancol.UUCP> samperi@mancol.UUCP (Dominick Samperi) writes:
>I've heard that cpio will be used as the unix standard archiver, yet
>many people seem to prefer tar.
>...
>I'd be interested to hear about any published standards for tar and/or
>cpio (AT&T, POSIX, etc.), especially standards that define how to deal
>with multi-volume archives (e.g., how do you start reading starting at
>volume N?). Perhaps people can add to the list of advantages/disadvantages
>of tar and cpio. Differences in the user interface (command syntax) is not
>really important, since tar can be used like cpio, and vis versa, via
>shell scripts.

Well, you missed (about 1 month ago) a LONG discussion (TAR WARS (-:) in comp.std.unix, which can be summarized (this is off the top of my head, so I won't try to credit the appropriate folks) as follows (tar and cpio here refer to their respective archive formats):

1) There is much confusion as to whether tar or cpio is older.

2) tar implementations are more prevalent (almost every release has some version of tar; many (i.e. the BSD releases and v7 derivatives) do not have any version of cpio).

3) tar format is easily extensible to handle special files such as device nodes, named pipes, etc. and has been so extended in the public domain version of tar (posted many months ago in comp.sources and a PC version about 2 months ago..)

4) cpio assumes too many things about inode numbers (limiting their range, etc.)

5) non-character format cpio archives are not easily moveable to machines with different byte ordering.

6) cpio builds in information to handle file links properly regardless of file extraction order. (However, it uses inode numbers to do this; see (4).)

As to the command format:

1) taking files on stdin is more convenient for backups (used with find(1))

2) taking files as arguments is more convenient for archives constructed "by hand"

3) cpio will copy directory trees with an option; tar needs 2 tar's in a pipeline to do this.

4) points 1 and 2 are resolved in the public domain tar (it has an option to read filenames from stdin.)

These were the points discussed, and the tar format has been chosen (as of the last I heard) for the POSIX (a.k.a IEEE 1003) standard.

>Dominick Samperi, Manhattan College, NYC

--
Marc Mengel
attmail!mmengel
...!{moss|lll-crg|mtune|ihnp4}!cuuxb!mmengel
snoopy@doghouse.gwd.tek.com (Snoopy) (02/05/88)
In article <246@mancol.UUCP> samperi@mancol.UUCP (Dominick Samperi) writes:
>I've heard that cpio will be used as the unix standard archiver, yet
>many people seem to prefer tar.

- Tar needs fewer options to do what I want it to do.

- Tar handles symbolic links. Most implementations of cpio don't. (I added this to UTek's cpio. Great fun.)

- The code for tar is nice and clean, easy to figure out, return codes are checked for errors, etc. The code for cpio is a mess. => I trust tar farther than I trust cpio. (If you are writing your own from scratch this isn't a consideration.)

- Most implementations of tar don't handle multiple volumes. (I haven't checked John's PD tar, perhaps it does?) If it doesn't fit on one volume, you're stuck with cpio or using one of those multivolume programs.

Snoopy
tektronix!doghouse.gwd!snoopy
snoopy@doghouse.gwd.tek.com

NFS: No Frigging Security
lenny@icus.UUCP (Lenny Tropiano) (02/06/88)
In article <556@gtx.com> al@gtx.UUCP (Al Filipski 839-0732) writes:
|> [... reply to a question on the POSIX standards of cpio or tar ...]
|>
|>I too would like to know if such information is available. I once did
|>a cpio backup onto multiple floppies (AT&T 3B1), and when I tried to
|>restore, I had a hard read error on one of the floppies, early on in
|>the sequence. Luckily, I had the information elsewhere. Would there
|>have been any way to bypass the bad floppy and continue the restore?
|>Where can one get a document describing cpio format?
|>
|>

What you need to get is a program that was posted to the net a while back; it was called "afio". I think it was in comp.sources.unix (check with your local archive site). It was a program that acted just like cpio, but nicely skipped bad records, jumped to the next ASCII header record (as long as you used the "-c" option to cpio or afio), and continued with that file. This means that if you have a bad floppy, you might lose one file if there is some bad data. It also allows for starting at backup disk #50 if you like (any disk) and skipping to the first good header.

Nice program! Very useful, especially if you back up on floppy. The program has an option to compile it with -DCTC3B2 (for 3B2 cartridge tape).

-Lenny
--
============================ US MAIL: Lenny Tropiano, ICUS Computer Group
 IIIII  CCC  U   U  SSSS              PO Box 1
   I   C   C U   U S                  Islip Terrace, New York 11752
   I   C     U   U  SSS      PHONE:   (516) 968-8576 [H] (516) 582-5525 [W]
   I   C   C U   U     S     AT&T MAIL: ...attmail!icus!lenny TELEX: 154232428
 IIIII  CCC   UUU  SSSS      UUCP:
============================ ...{uunet!godfre, harvard!talcott}!\
...{ihnp4, boulder, mtune, bc-cis, ptsfa, sbcs}! >icus!lenny
"Usenet the final frontier" ...{cmcl2!phri, hoptoad}!dasys1!/
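The "skip to the next ASCII header" recovery described above can be sketched roughly as follows. This is not afio's actual code, just an illustration of the idea: it assumes the character ("-c") cpio header is the magic string "070707" followed by 70 more octal digits (76 bytes in all), and the helper names `find_next_header` and `make_header` are invented for the example.

```python
import re

HEADER_LEN = 76  # assumed length of a "-c" (ASCII) cpio header
_HEADER_RE = re.compile(rb"070707[0-7]{70}")  # magic + 70 octal digits


def find_next_header(buf: bytes, start: int = 0) -> int:
    """Return the offset of the next plausible ASCII header, or -1."""
    match = _HEADER_RE.search(buf, start)
    return match.start() if match else -1


def make_header(name: bytes, filesize: int, ino: int = 1) -> bytes:
    """Build a test header (hypothetical helper, only for the demo below)."""
    # Fields: dev, ino, mode, uid, gid, nlink, rdev, mtime, namesize, filesize
    return b"070707" + b"%06o%06o%06o%06o%06o%06o%06o%011o%06o%011o" % (
        0, ino, 0o100644, 0, 0, 1, 0, 0, len(name) + 1, filesize)


if __name__ == "__main__":
    damaged = b"\x00garbage" + make_header(b"a.txt", 3) + b"a.txt\x00abc"
    print(find_next_header(damaged))  # offset where reading can resume
```

File data that happens to contain a matching run of digits would fool this scan, so a real tool would presumably also sanity-check the decoded fields before trusting a candidate header.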
twh@mibte.UUCP (Tim Hitchcock) (02/10/88)
> >I've heard that cpio will be used as the unix standard archiver, yet
> >many people seem to prefer tar.
> >...
> Well, you missed (about 1 month ago) a LONG discussion (TAR WARS (-:) in
> comp.std.unix, which can be summarized (this off the top of my head, so
> I won't try to credit the appropriate folks) as follows (tar and cpio
> here refer to their respective archive formats):
>
> 3) tar format is easily extensible to handle special files such as
> device nodes, named pipes, etc. and has been so extended
> in the public domain version of tar (posted many months
> ago in comp.sources and a PC version about 2 months ago..)
>

"cpio -u" will copy special files.

>
> 5) non-character format cpio archives are not easily moveable to
> machines with different byte ordering.
>

The "DD" command will swap bytes. In many cases find, cpio & dd are used.

> As to the command format
>
> 1) taking files on stdin is more convenient for backups (used
> with find(1))
>
> 2) taking files as arguments is more convenient for archives
> constructed "by hand"

There is a limit to how many args are allowed on a command line. There are many UNIX tools one can use to manipulate pathnames. This seems to be resolved in the public domain tar (4).

>
> 3) cpio will copy directory trees with an option, tar needs
> 2 tar's in a pipeline to do this.
>
> 4) points 1 and 2 are resolved in the public domain tar (it
> has an option to read filenames from stdin.)
>
> These were the points discussed, and the tar format has been chosen (as
> of the last I heard) for the POSIX (a.k.a IEEE 1003) standard.
>
guy@gorodish.Sun.COM (Guy Harris) (02/10/88)
> > 5) non-character format cpio archives are not easily moveable to
> > machines with different byte ordering.
>
> The "DD" command will swap bytes. In many cases find, cpio & dd are used.

Unfortunately, "dd" will swap the bytes in every 16-bit quantity written to the tape (I don't know how any of this works with non-8-bit-"char" machines). This is not useful under these circumstances.

Equally unfortunately, the byte-swapping options of "cpio" will swap only the bytes in the data blocks. This is also not useful, under almost *any* circumstances.

What you *want* to do is swap only the bytes in the headers, *not* the bytes in the data blocks and *not* the bytes in the pathnames. Unfortunately, as the astute reader will note, the combination of the byte-swapping of "dd" and the byte-swapping of "cpio" results in the data blocks being unswapped and the headers being swapped - BUT it also results in the pathnames being swapped! The net result is precisely what you think it is.

The System III "cpio"'s byte-swapping option swapped bytes in the data blocks *and* in the pathname; combining this with "dd" provided a stupid and inefficient way of swapping only the bytes in the headers. The System V "cpio"'s options cannot be combined with "dd" in this fashion to yield something useful.

(Please do not tell me that this works, or can be made to work. I have tried it, when attempting to use the System V "cpio" on a big-endian Sun-3 to read a binary "cpio" tape made with the System V "cpio" on a little-endian VAX; it does not work, and cannot be made to work without hacking up "cpio". This is what I eventually did.)

The *correct* thing to do is to make "cpio" detect that the magic number in the "cpio" header is byte-swapped from its proper value when reading a tape, and automatically decide to swap the bytes in the headers, and *only* the headers, as it reads the data. After trying to read the aforementioned "cpio" tape, I fixed the 3.2 SunOS "cpio" to do exactly that.

This is, of course, useful only when you have a binary "cpio" archive. Everybody now should be using "cpio -c" to make "cpio" archives. They should also, if using "find" without "cpio" to make "cpio" archives, be using the "-ncpio" option, which produces ASCII "cpio" headers, rather than the "-cpio" option. Unfortunately, the implementor of this option didn't see fit to document it.
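The magic-number check described above might look something like the sketch below. It is not the SunOS fix itself, only an illustration: it assumes the binary header is 13 16-bit fields (magic, dev, ino, mode, uid, gid, nlink, rdev, two for mtime, namesize, two for filesize, with the high half of each 32-bit value first), and the function names are invented for the example. Reading the header fields in the detected byte order has the same effect as swapping the header bytes while leaving pathnames and file data alone.

```python
import struct

MAGIC = 0o070707                                # cpio magic number
SWAPPED = ((MAGIC & 0xFF) << 8) | (MAGIC >> 8)  # same value, bytes exchanged
HEADER_LEN = 26                                 # 13 shorts (assumed layout)


def header_byte_order(header: bytes) -> str:
    """Return the struct prefix ('<' or '>') matching the writer's byte order."""
    (magic,) = struct.unpack_from("<H", header, 0)
    if magic == MAGIC:
        return "<"
    if magic == SWAPPED:
        return ">"
    raise ValueError("not a binary cpio header")


def parse_header(header: bytes) -> dict:
    """Decode header fields only; pathname and data bytes are never touched."""
    order = header_byte_order(header)
    (_magic, _dev, ino, mode, _uid, _gid, _nlink, _rdev,
     _mt_hi, _mt_lo, namesize, fs_hi, fs_lo) = struct.unpack_from(
        order + "13H", header, 0)
    return {"ino": ino, "mode": mode, "namesize": namesize,
            "filesize": (fs_hi << 16) | fs_lo}
```

With this approach a header written on a VAX and one written on a Sun-3 decode to the same field values, and the reader never needs to be told which machine produced the tape.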
dhesi@bsu-cs.UUCP (Rahul Dhesi) (02/11/88)
In article <41499@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>The *correct* thing to do is to make "cpio" detect that the magic number in the
>"cpio" header is byte-swapped from its proper value when reading a tape, and
>automatically decide to swap the bytes in the headers, and *only* the headers,
>as it reads the data.

An even more correct thing to do is for cpio to always write archive headers in a canonical format that is not dependent on the byte-ordering of the hardware. E.g., all header data written least significant byte first.

In other words, portability ought to be achieved by making the cpio *format* portable, not just by compensating for nonportability in the format (in this case, ambiguity in byte ordering).

This could well be a matter of religion. Follow-ups to talk.religion.misc.

--
Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
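As a rough illustration of the canonical-format suggestion, the sketch below always writes header fields least significant byte first, whatever the host's native order. The field layout here (magic, inode, mode, name size, file size) is hypothetical and much simpler than any real cpio header; only the fixed byte order is the point.

```python
import struct

# "<" forces little-endian, unpadded encoding regardless of the host CPU.
CANONICAL = "<HHHHI"  # hypothetical fields: magic, ino, mode, namesize, filesize
MAGIC = 0o070707


def pack_canonical(ino: int, mode: int, namesize: int, filesize: int) -> bytes:
    """Encode a header that reads the same on any machine."""
    return struct.pack(CANONICAL, MAGIC, ino, mode, namesize, filesize)


def unpack_canonical(raw: bytes) -> tuple:
    """Decode a header written by pack_canonical, checking the magic number."""
    magic, ino, mode, namesize, filesize = struct.unpack(CANONICAL, raw)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    return ino, mode, namesize, filesize
```

Because the writer fixes the order, the reader needs no detection step at all; the cost is that one class of machines always converts on both write and read.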
gst@gnosys.UUCP (Gary S. Trujillo) (02/16/88)
In article <1629@cuuxb.ATT.COM> mmengel@cuuxb.UUCP (Marc W. Mengel) writes:
| In article <246@mancol.UUCP> samperi@mancol.UUCP (Dominick Samperi) writes:
| | I've heard that cpio will be used as the unix standard archiver, yet
| | many people seem to prefer tar.
| | ...
| | I'd be interested to hear about any published standards for tar and/or
| | cpio (AT&T, POSIX, etc.)...
|
| Well, you missed (about 1 month ago) a LONG discussion (TAR WARS (-:) in
| comp.std.unix, which can be summarized (this off the top of my head, so
| I won't try to credit the appropriate folks) as follows (tar and cpio
| here refer to their respective archive formats):
|
| (deleted Marc's summary)
|
| These were the points discussed, and the tar format has been chosen (as
| of the last I heard) for the POSIX (a.k.a IEEE 1003) standard.
|
| | Dominick Samperi, Manhattan College, NYC
|
| --
| Marc Mengel
|
| attmail!mmengel
| ...!{moss|lll-crg|mtune|ihnp4}!cuuxb!mmengel

In reviewing my archives, I came across a copy of a message from the Usenix Association's representatives to the committee responsible for deciding on a standard for file interchange via magnetic tape. I thought readers of this discussion might find it interesting:

| From husc6!ut-sally!std-unix Wed Aug 26 17:14:10 EDT 1987
| Article 114 of comp.std.unix:
| Path: husc6!ut-sally!std-unix
| From: jsq@usenix.uucp (John Quarterman)
| Newsgroups: comp.std.unix
| Subject: cpio format objections
| Message-ID: <8832@ut-sally.UUCP>
| Date: 24 Aug 87 23:24:22 GMT
| Sender: std-unix@ut-sally.UUCP
| Reply-To: jsq@usenix.uucp (John Quarterman)
| Lines: 128
| Approved: fletcher@sally.utexas.edu (Guest Moderator, Fletcher Mattox)
|
| From: jsq@usenix.uucp (John Quarterman)
|
| cpio format objections        Page 1 of 2        IEEE P1003.1 N.117
|
| 24 August 1987
|
| John S. Quarterman
|
| Institutional Representative from USENIX
| usenix!jsq
|
| Secretary, IEEE Standards Board
| Attention: P1003 Working Group
| 345 East 47th St.
| New York, NY 10017
|
| Cc: 1003.1 Technical Reviewers
| for Section 10:   for Rationale:
| Stephen Dum               Lorraine Kevra   Hal Jespersen
| tektronix!athena!steved   attunix!kevra    ucbvax!unisoft!hlj
|
| The USENIX Association ballots no on the test balloting of
| IEEE 1003.1 Draft 11, objecting to the proposed inclusion of
| cpio format, for the following reasons:
|
| 1. The need for extensions for symbolic links and
|    contiguous files has not been properly addressed.
|    Although three type codes are reserved, no indication
|    is given of what they should be used for. This does
|    not promote the need for those who implement such
|    extensions to implement them the same way. It is true
|    that the text of the standard cannot refer to symbolic
|    links or high performance files, because they are not
|    defined in the standard. But the USTAR format
|    indicates the use of its codes for those extensions
|    both by the name of the code given in the standard,
|    and by explicit recommendations in the Rationale. The
|    cpio proposal does neither.
|
| 2. The need for implementation-specific extensions that
|    do not conflict with present or future standard file
|    types has not been addressed. The USTAR format
|    addresses the problem by reserving 26 codes for
|    implementations to use as they see fit. The cpio
|    proposal does not address the problem at all.
|
| 3. The c_ino field of the cpio format is derived from the
|    UNIX inode number. Many implementations of cpio use
|    only 16 bits for this number, and thus cannot properly
|    resolve links noted in cpio archives that use more
|    bits for this number. Tar and USTAR formats do not
|    have this problem, because they do not use a number
|    like this to resolve links. While some USTAR file
|    types cannot be read by historical tar
|    implementations, an error will usually be produced.
|    This cpio problem will cause silent creation of
|    erroneous links, which is worse.
|
| 4. There are few, if any, distributions of UNIX systems
|    that do not include the tar program, which is
|    compatible with the POSIX USTAR format. There are
|    many UNIX systems that do not include cpio.
|
| 5. There is a public domain implementation of USTAR
|    format. There is no public domain implementation of
|    cpio format, with or without extensions.
|
| There should be one data interchange/archive format in IEEE
| 1003.1.
|
| + The proposed cpio format is technically inferior to
|   USTAR format.
|
| + The program that cpio format is based on is not as
|   widely available as the one that USTAR format is based
|   on, and the same is true of the proposed cpio format
|   and of USTAR format, respectively.
|
| Therefore, the one format in the standard should be USTAR.
|
| Specific action: deny the cpio format proposal, and do not
| include in the standard any references to that format or to
| cpio.
|
| Thank you,
|
| John S. Quarterman
| Texas Internet Consulting
| 701 Brazos, Suite 500
| Austin, TX 78701-3243
| 512-320-9031
|
| Volume-Number: Volume 12, Number 21

--
Gary S. Trujillo {ihnp4,harvard,husc6,linus,ima,bbn,m2c}!spdcc!gnosys!gst
Somerville, Massachusetts
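Objection 3 in the letter (silent creation of erroneous links when c_ino is held in only 16 bits) can be shown with a small sketch. This is a simplified model, not any real cpio's link logic: it groups paths by inode number alone (a real implementation would presumably also compare device numbers), and `link_groups` is a name invented for the example.

```python
from collections import defaultdict


def link_groups(files, ino_bits):
    """Group paths that an extractor keeping only `ino_bits` of the inode
    number would treat as hard links to one another."""
    mask = (1 << ino_bits) - 1
    groups = defaultdict(list)
    for path, ino in files:
        groups[ino & mask].append(path)  # truncation happens here
    return [paths for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    # "a" and "b" really are links (same inode); "c" is an unrelated file
    # whose inode number differs from theirs by exactly 65536.
    archive = [("a", 5), ("b", 5), ("c", 65541)]
    print(link_groups(archive, 32))  # only a and b are grouped
    print(link_groups(archive, 16))  # c is wrongly grouped with a and b
```

With the full inode number only the genuine link pair is found; with 16 bits the unrelated file collides with it and no error is raised, which is the failure mode the letter describes.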