tut@ucbopal.CC.Berkeley.ARPA (08/07/84)
Could someone justify the existence of cpio? What's wrong with tar? As the saying goes, "if it ain't broke, don't fix it." The only portability problems I've ever encountered with tar were, I believe, caused by 1) a strange Plexus tape drive, and 2) the unavailability of tar on bare System V. Tar descends directory hierarchies, while cpio requires the aid of find. Tar always uses character at a time I/O, while cpio must be passed the -c flag to do this. So what have I overlooked? Bill Tuthill ucbvax!opal.tut
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (08/10/84)
I changed from tar to cpio for several reasons:
- tar would overflow its link table (running on a PDP-11) frequently and produce random behavior
- cpio by default will not overwrite files during extraction if the archive copy is older than the current one
- cpio will match files using general patterns whereas tar has no such feature
- cpio can create a copy of a hierarchy using links rather than copies of the files

I don't understand two of your comments, Bill. "cpio -c" makes the headers ASCII instead of binary; I don't know what "character at a time I/O" is supposed to mean, but this isn't it. Also, tar is supplied with UNIX System V as it comes from AT&T.

I move archives, and especially partially-modified archives, around a lot and find cpio to be just what I need for this task. I think the choice between the two depends on:
- whether one is exporting to a non-cpio site
- whether the above differences are important
- personal preference
greg@sdcsvax.UUCP (Greg Noel) (08/10/84)
In article <198@ucbopal.CC.Berkeley.ARPA> tut@ucbopal.CC.Berkeley.ARPA writes:
>Could someone justify the existence of cpio? What's wrong with tar?

Actually, it's easy to justify the existence of cpio. There are at least three reasons:

(1) History. They grew up at different organizations. Tar comes from Berkeley and cpio from Bell (now AT&T). (This may not be completely accurate -- I first saw cpio in the PWB release and I first saw tar in a Berkeley system; for all I know, tar may have come with Version 7. The point remains the same, even if the different organizations were within Bell. Tar has changed (grown?) (groan?) at Berkeley (witness the recent complaints about the incompatible tar formats); they picked tar and ran with it, while AT&T went with cpio. I don't justify it, or point fingers at either organization; I just report it.)

(2) Problem. They grew up to solve different problems. Tar is a "tape archiver" and its major function is to produce backups of filesystems. (This was in the days when a filesystem would fit on a single tape.) (All right, that's oversimplified.) Its functionality is based upon a program called tp from Version 6 (did tp make it into Version 7?). On the other hand, cpio was designed from the ground up to solve a very different problem -- selectively copying lists of files (actually, filesystem elements). Thus, it is useful for distributions, or for copying recently-changed files for backup, or for copying a selected part of a directory tree somewhere else, or ..... Tar takes its list of files from the command line, effectively limiting the number of arguments, while cpio takes them from the standard input, giving no such limitation (this is why tar copies directory trees -- otherwise you couldn't get enough on the tape to make it useful). (I actually consider it a flaw of tar that you MUST copy ALL of a directory tree; there is no way to make the choice of files conditional.)

(3) Philosophy.
Cpio is more in keeping with the Unix (tm) philosophy, since it separates the job of SELECTING the files from the job of COPYING the files. ANY algorithm can be used to select the files to be copied, but cpio can still be used to copy them. In fact, I have an application that tries to keep two sets of files in sync on different computers -- it does it by running a shell script that scans a set of files, determines which have changed since the last run, and then passes the names to cpio to be copied. There are about six thousand files to select from; on a given day, anywhere from a hundred to several thousand will be selected for transfer. I don't think tar could do that as well.

In case you hadn't noticed, I prefer cpio. There are times when tar is better (if what you really want to do is copy all of a directory, it's just fine, and the interface is simpler), but I find that if the job is complicated enough to need a shell script, then cpio is usually the program of choice. Don't get me wrong -- cpio isn't perfect. Internally, it's a nightmare, and AT&T would be better off to rewrite the whole thing. But it works just fine, and it does what I want it to.

BTW, the -c option of cpio does not cause it to write one character at a time; it causes the headers for each file to be in ASCII characters instead of binary. The output is still blocked. Now if the null at the end of the header could be changed into a carriage return, we could use cpio instead of shar format...

(tm) Unix is a footnote of AT&T Bell Laboratories
--
Greg Noel, NCR Torrey Pines   Greg@sdcsvax.UUCP or Greg@nosc.ARPA
wescott@ncrcae.UUCP (Mike Wescott) (08/11/84)
There are several reasons why I prefer cpio to tar, the biggest
one being that I used cpio first and got used to its peculiarities
rather than tar's. But prejudice aside:
1. cpio handles special files (device nodes)
2. the pass option (-p), in combination with find, lets me
	move entire subtrees around easily
	(I realize it can be done with a tar-to-tar
	pipeline, but cpio is more straightforward)
3. the rename option when de-archiving allows greater flexibility
	in where to put/name things
4. for archiving, find has -cpio and -ncpio options which
	do not require the pipe
Drawbacks in cpio:
1. No easy way to override the full pathname in the archive
2. Loses phase easily; a bad spot on some records makes the rest of
	the archive unsalvageable
3. File extraction does not include the directory and
	(recursively) everything in it if a directory
	is specified as the file to be extracted; it's
	annoying to remember to specify xyz* to get the
	xyz directory and its contents
4. To be portable to the V7-based UNIXes I still need
	to use tar
Mike Wescott
NCR Corp.
mcnc!ncsu!\
>ncrcae!wescott
akgua!usceast!/
johnl@haddock.UUCP (08/11/84)
#R:ucbopal:-19800:haddock:16700033:000:1189
haddock!johnl Aug 10 12:45:00 1984
>Could someone justify the existence of cpio? What's wrong with tar?
There's nothing wrong with tar, but I like cpio better because it is a
lot more powerful than tar:
- reading file names from stdin is a feature, not a bug. You can use
find to enumerate just the files you want rather than having to dump
everything in a directory tree, e.g.
$ find somedir -mtime -14 -print | cpio -oB >/dev/rmt0
(dump only files modified within the last two weeks.) Doing this
with tar is pretty hard. For the most common cases of cpio, we
usually have little shell scripts. Also (hack) the find command has
a little of cpio built into it so the above example could be:
$ find somedir -mtime -14 -cpio /dev/rmt0
- cpio knows about special files and FIFOs. Most versions of tar
don't. Could be fixed, of course.
- cpio -p lets you copy a directory tree by linking (so that you
have new names but the same files.) Tar can't do that.
Basically, anything you can do with tar, you can do with cpio but not the
converse.
John Levine, ima!johnl
PS: I offer no defense of the internal coding style of cpio, which
still has innumerable MERT-isms. Ugh.
henry@utzoo.UUCP (Henry Spencer) (08/12/84)
Cpio pre-dates tar, I believe, and the USG/USDL/whatever-it's-called-this-week turkeys didn't have the brains to drop cpio when tar arrived. The last /usr/group standards meeting tentatively recommended using tar as the standard tape format for interchange work, due to much wider availability and better standardization ("cpio" is not a single format, it's at least three different ones).
--
Henry Spencer @ U of Toronto Zoology   {allegra,ihnp4,linus,decvax}!utzoo!henry
wjb@ariel.UUCP (W.BOGSTAD) (08/12/84)
With the versions of tar and cpio I have used, there is a big difference: "tar" does not back up special files correctly. The "dump" and "restor" programs available under Version 7 are a real pain to use, so cpio was used instead. I don't know if current versions of tar have this problem, but it would be one reason to justify cpio. Bill Bogstad
henry@utzoo.UUCP (Henry Spencer) (08/13/84)
It's worth noting that it is quite possible to write "tpio", i.e. a program with the cpio user interface but the tar data format. There is no doubt that some of cpio's facilities are useful, but this is a case where it's quite possible to have your cake and eat it too, by combining some of the neat bits of user interface in cpio with the much-more-widely-standard tar data format. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
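[Henry's hypothetical "tpio" is nearly a one-liner on a tar that can read a file list. The -T option below is a later GNU extension, not anything the tars of 1984 offered, so treat this strictly as a sketch of the idea:]

```shell
#!/bin/sh
# Sketch: cpio's names-from-stdin interface, tar's data format.
# Assumes a tar with the (later, GNU) -T flag for reading a file list.
set -e
cd "$(mktemp -d)"
mkdir -p proj && echo hi > proj/a.c && echo no > proj/b.o

# Select with any algorithm you like, archive in tar format:
find proj -name '*.c' -print | tar cf out.tar -T -
```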
barmar@mit-eddie.UUCP (Barry Margolin) (08/13/84)
In article <226@haddock.UUCP> johnl@haddock.UUCP writes:
> - reading file names from stdin is a feature, not a bug. You can use
>   find to enumerate just the files you want rather than having to dump
>   everything in a directory tree, e.g.
>
>	$ find somedir -mtime -14 -print | cpio -oB >/dev/rmt0
>
>   (dump only files modified within the last two weeks.) Doing this
>   with tar is pretty hard.

It isn't really very hard:
	tar <options> `find ...`
Accepting file names on the command line is the Unix convention.

Note that I have no real opinion on the debate. I have only used tar so far. In response to someone's mention of "tp", the predecessor to "tar": it is still available in 4.2BSD.
--
Barry Margolin   ARPA: barmar@MIT-Multics   UUCP: ..!genrad!mit-eddie!barmar
dan@idis.UUCP (08/13/84)
I believe that the history of the "tar" program reported during the "tar vs. cpio" debate in net.unix is a bit confused. The "tar" program is from unix V7 and was intended to provide a convenient and machine-independent way of sending collections of files (e.g. a software distribution) to other machines. The most immediate ancestor of "tar" is probably "ar". Before Version 7 (i.e. Version 6), we used ad hoc combinations of "tp", "ar", and "dd". I suspect that "tp" was originally designed to work with DECtapes (small replaceable blocks) and extended to handle 9-track mag tapes (potentially large but nonreplaceable blocks) as an afterthought. I doubt that "tar" was originally intended to be used for system backup. This is why most versions of "tar" cannot handle special files or multiple volumes. The "cpio" program comes from PWB unix (which was developed at about the same time as V7 but by a different group of people). I imagine that "cpio" was developed for some of the same reasons as "tar" ("tp", "ar", and "dd" being inadequate). Dan Strick University of Pittsburgh [decvax|mcnc]!idis!dan
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (08/14/84)
There is a (relatively small) limit on the total length of command line options, typically 5120 bytes. I have lots of archives whose component file names far exceed this limit.
wls@astrovax.UUCP (William L. Sebok) (08/14/84)
>In article <226@haddock.UUCP> johnl@haddock.UUCP writes:
>> - reading file names from stdin is a feature, not a bug. You can use
>>   find to enumerate just the files you want rather than having to dump
>>   everything in a directory tree, e.g.
>>
>>	$ find somedir -mtime -14 -print | cpio -oB >/dev/rmt0
>>
>>   (dump only files modified within the last two weeks.) Doing this
>>   with tar is pretty hard.
>
>It isn't really very hard:
>	tar <options> `find ...`
>
>Accepting file names on the command line is the Unix convention.
>	Barry Margolin

No. It may be the Unix convention but it is not useful for dumping large numbers of files (like when doing a backup). There is a limit on how large an argument list can be passed to a program. On the Vax under 4.2 BSD as distributed this is 10240 characters. This can be easily exceeded in a medium large file system. Heck, it is sometimes exceeded in a run-away uucp spool directory.
--
Bill Sebok   Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,noao,princeton,vax135}!astrovax!wls
jhall@ihuxu.UUCP (John R. Hall-"the samurai MTS") (08/14/84)
If Berkeley UNIX has the xargs command, I believe you could use the following technique to avoid exceeding the MAXARGS parameter (usually 10 blocks by default):
	find ... | xargs tar <options>
xargs reads an argument list from standard input, and repeatedly builds up and executes a command line for the specified command. xargs is part of System V; it's not in my very old Berkeley UNIX manuals...
--
--John R. Hall, ...ihnp4!ihuxu!jhall   "And may your days be celebrations"
adm@cbneb.UUCP (08/14/84)
#R:ucbopal:-19800:cbnap:27300004:000:579 cbnap!whp Aug 14 09:36:00 1984

>>It isn't really very hard:
>>	tar <options> `find ...`
>>
>>Accepting file names on the command line is the Unix convention.
>>	Barry Margolin
>
>No. It may be the Unix convention but it is not useful for dumping large
>numbers of files (like when doing a backup). There is a limit on how large
>an argument list can be passed to a program.

I don't know about BSD UNIX, but in Sys V you can always use:
	find <options> | xargs tar <tar options>
(xargs is a program that reads stdin, constructs and executes a proper command line, and repeats until eof is found on stdin)
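[One hedge on the xargs approach: xargs may run tar several times, and `tar c` would truncate the archive on each run, so the repeated invocations have to append with `tar r` instead. A sketch on a disk file; the empty-archive trick with -T /dev/null is a GNU tar idiom, not universal:]

```shell
#!/bin/sh
# Sketch: batching file names through xargs into an appending tar.
set -e
cd "$(mktemp -d)"
mkdir d
for i in 1 2 3; do echo $i > d/f$i; done

tar cf all.tar -T /dev/null       # start an empty archive (GNU tar idiom)
# -n 2 forces xargs to batch, so tar runs more than once:
find d -type f -print | xargs -n 2 tar rf all.tar
```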
marcus@pyuxt.UUCP (M. G. Hand) (08/15/84)
I've always found cpio quite adequate and usable, except for one thing: I want a -U option which would UNCONDITIONALLY copy files over existing ones whose permissions were, e.g., 444. Of course, it should still obey the other rules about ownership (i.e. it would overwrite those files which would be deleted by rm -f). This would save a lot of hassle. Marcus Hand (pyuxt!marcus)
bsa@ncoast.UUCP (The WITNESS) (08/15/84)
First, both tar and tp were in V7 Unix. Second, V7-flavored Unixes (including 99% of Xenix) lack cpio; -c format with ^J at the end of a header would not be readable by us. I find the #!/bin/sh stuff bad enough as it is; PLEASE don't do THAT to me! :-) --bsa -- Brandon Allbery: decvax!cwruecmp{!atvax}!bsa: R0176@CSUOHIO.BITNET ^ Note name change! 6504 Chestnut Road, Independence, OH 44131 <> (216) 524-1416 "The more they overthink the plumbin', the easier 'tis tae stop up the drain."
geoff@callan.UUCP (Geoff Kuenning) (08/17/84)
Nobody seems to have mentioned special files. Cpio correctly saves and restores the files in /dev and FIFOs. Most tars break on this (although, again, this can be fixed easily). -- Geoff Kuenning Callan Data Systems ...!ihnp4!wlbr!callan!geoff
jab@uokvax.UUCP (08/23/84)
#R:ihuxu:-37200:uokvax:6100040:000:824 uokvax!jab Aug 22 22:18:00 1984

/***** uokvax:net.unix / ihuxu!jhall / 12:56 pm Aug 14, 1984 */
If Berkeley UNIX has the xargs command, I believe you could use the following technique to avoid exceeding the MAXARGS parameter (usually 10 blocks default):
--John R. Hall, ...ihnp4!ihuxu!jhall "And may your days be celebrations"
/* ---------- */

Umm, I had the understanding that MAXARGS/NCARGS was the number of bytes passed as arguments from one program to the program it was exec'ing. The code in the exec(2) system call allocates only so much space for the arguments when it begins to fabricate the running program, and it's quite unwilling to let you pass MORE than those numbers. xargs(1) is only a program --- it still runs the command in question using the exec(2) system call, and is still stuck with those constraints. Jeff Bowles Lisle, IL
barmar@mit-eddie.UUCP (Barry Margolin) (08/23/84)
In article <3948@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes: >There is a (relatively small) limit on the total length of >command line options, typically 5120 bytes. I have lots of >archives whose component file names far exceed this limit. I guess I'm completely spoiled by Multics. I'm used to command lines that can take up a 1 megabyte segment and stack frames that can handle argument lists that have up to 16K parameters (64Kwords/max-stack-frame, two pointers/parameter, 2 words/pointer). -- Barry Margolin ARPA: barmar@MIT-Multics UUCP: ..!genrad!mit-eddie!barmar