std-unix@ut-sally.UUCP (Moderator, John Quarterman) (05/09/87)
Section 10 of the POSIX Trial Use Standard (and of the current draft) describes a data interchange format based on the tar program. The P1003.1 Working Group has recently received two related proposals regarding that section: one to add cpio format (including the old-style, non-"-c" option format); the other to replace the tar format with cpio format. It was also proposed at the latest Working Group meeting to drop section 10 altogether and let P1003.2 handle the issue.

As the moderator of this newsgroup, I solicit comments about what should be done with section 10. As a Working Group member, I will take such comments into account when I submit a proposal in a few weeks. If there is sufficient interest, I will post an outline of that proposal in the newsgroup (as myself, rather than as the moderator). I can also post the already-submitted proposals.

Volume-Number: Volume 11
guy@sun.com (Guy Harris) (05/10/87)
From guy@sun.com Sun May 10 02:30:14 1987
From: guy@sun.com (Guy Harris)

> As the moderator of this newsgroup, I solicit comments about what should
> be done with section 10.

One thing that should not be done, under any circumstances, is to replace "tar" with "cpio" - *especially* if it includes the old non-"-c" form. The non-portable form is completely useless for moving data between systems with different byte orders unless you have a clever "cpio" that figures out that the byte order is backwards and undoes the damage. I discovered this when trying to read a "cpio" tape made on a VAX in the old format; no combination of "cpio" byte-swapping options and "dd conv=swab" would help. I finally ended up fixing our "cpio" to do the aforementioned look-at-the-header-and-undo-the-damage stuff.

The X/OPEN standard uses "cpio". The rationale given exhibits a distressing degree of incompetence:

	If an exchange medium is to be read on a target machine that is architecturally different from the source machine, problems may arise concerning the ordering of bytes within a word and words within a long word (see the portability guides in Part III). These can easily be handled when using "cpio" as an exchange utility, while with "tar" it may be a little more difficult.

Now, I will first note here that the *only* time I had a problem moving "tar" tapes between machines was when I had to move things to a Plexus. The problem was *not* that the machines had different byte orders; the problem was that the Plexus had a typical brain-damaged Multibus tape controller that swapped bytes when it transferred data to and from memory.

"cpio" would not have made this any easier; the System III byte-swapping option did not swap the bytes on *all* blocks read, but just swapped the bytes on data blocks and in file names.
The intent here was clearly that you would read a tape written on a machine with a different byte order by doing something like

	dd if=/dev/rmt0 conv=swab | cpio -ids

"dd" would swap everything; "cpio -s" would un-swap everything but the binary data in the header. (We pause to note that merely swapping the binary data in the header would be much more efficient, especially given that "dd" is somewhat of a pig.) This works, but is less than wonderful. (And it doesn't solve the problem with the Plexus; to solve that you just stick the "dd" in front of "cpio" and don't bother with "-s" at all.)

The System V "cpio" byte-swapping and word-swapping options work *only* on data blocks; they have no effect whatsoever on binary data in the header or on file names. This means that the trick that worked with the System III "cpio" wouldn't work at all - and the problem with the Plexus still isn't fixed, if that was the intent.

The S5 options are useless for old-style non-"-c" tapes. They are of some use with "-c" tapes - but only if all the files on the tape consist solely of "short"s or "long"s, since the data in the data blocks are all byte-swapped or word-swapped in the same fashion. Most files I tend to put on or extract from "cpio" tapes are text files, which obviously need no swapping.

In short, the arguments offered by X/OPEN in favor of "cpio" are completely bogus. Now for the arguments against "cpio" format:

1) It is somewhat more UNIX-specific, in that the "mode" field of the "stat" structure is written out numerically. POSIX does not specify required numeric values for this field. "tar" indicates the file type with a standard symbolic code, so you can read "tar" tapes even if the machine on which the tape was written and the machine on which it is being read do not have the same values for this field.

2) It does not handle hard links particularly elegantly. "cpio" knows nothing of files with multiple hard links when it writes a tape; if it is told to write "foo" and "bar" to the tape, and they are both hard links to the same file, it writes two copies of this file to the tape. The hard links are established when the tape is read. If the files appear on the tape in the order "foo" and then "bar", "foo" will be read in first. Once "bar" has been read in, "cpio" will check to see if it has already read in a file with the same dev/inumber value. If so, it will delete "bar" and make a hard link to "foo" called "bar".

3) It is less common. Almost all UNIX systems that support "cpio" also support "tar"; many UNIX systems that support "tar" do not support "cpio".

4) POSIX has already chosen "tar" format; why should it change horses in midstream, especially given that the new horse is lame and, despite the claims made by the person selling the horse, is not capable of pulling any heavier loads than the existing one?

Anyway, I'll have to dig up the proposal made to POSIX that "cpio" supplement or replace "tar" and cast a very strong "no" vote citing the above.

Now, as for the proposal for handing the whole thing off to P1003.2 - I have some inclination to support this. It could, in some ways, be considered neither part of the scope of P1003.1 nor of P1003.2, but to be a separate standardization topic entirely. However, if I had to choose which of the two items - C-language binding to OS system call and library functions, or command-language functions - the data interchange standard belonged to, I'd vote in favor of the latter. There is no library of functions for reading or writing "tar" tapes, but there is a command (namely, "tar") for reading and writing them, so I think it belongs in that category - especially given that Section 10 currently says "A conforming system shall implement a user utility..." which really sounds a lot more like a P1003.2 requirement than a P1003.1 requirement.

Volume-Number: Volume 11, Number 9
caf@omen.uucp (Chuck Forsberg) (05/10/87)
From: caf@omen.uucp (Chuck Forsberg)

I have played with a program "afio" which is what cpio should/might have been. Its main features: it is much faster than cpio, and it reads all sorts of cpio archives, most notably damaged archives. Out of Sync --- It gets its own help!

This could be the basis of a POSIX cpio program.

Chuck Forsberg WA7KGX Author of Pro-YAM communications Tools for PCDOS and Unix
...!tektronix!reed!omen!caf Omen Technology Inc "The High Reliability Software"
17505-V Northwest Sauvie Island Road Portland OR 97231 Voice: 503-621-3406
TeleGodzilla BBS: 621-3746 2400/1200 CIS:70007,2304 Genie:CAF Source:TCE022
omen Any ACU 1200 1-503-621-3746 se:--se: link ord: Giznoid in:--in: uucp
omen!/usr/spool/uucppublic/FILES lists all uucp-able files, updated hourly

Volume-Number: Volume 11, Number 7
ralph@ralmar.uucp (Ralph Barker) (05/11/87)
From: ralph@ralmar.uucp (Ralph Barker) Relative to posting proposals already received and the proposal which you will make to the working group: I, for one, would be most interested in seeing the proposals you have already received (assuming that the writers have included both their suggestions and the underlying reasoning). [ They're the articles just before yours. -mod ] I suspect that such interim postings might stir additional discussion, as well. As an aside, THANKS for your efforts within the working group. The results of your efforts (and the efforts of all members of the committees) are of great importance to all of us in the UNIX community. [ You're welcome. -mod ] --- Ralph Barker, RALMAR Business Systems, 640 So Winchester Blvd, San Jose,CA 95128 uucp: ...{ucbvax,hplabs}!sun!idi---\!ralmar!ralph ...pyramid!amdahl!unixprt----/ Voice: (408) 559-6202 Volume-Number: Volume 11, Number 16
hedrick@topaz.rutgers.edu (Charles Hedrick) (05/11/87)
From: hedrick@topaz.rutgers.edu (Charles Hedrick) Since tar exists (as far as I know) on all Unix systems, and cpio only on ATT ones, tar seems like the best choice for portable use. Obviously any real POSIX will have both tar and cpio, but why not leave the standard at tar? (Note that I haven't read the POSIX standard, so all I know about the question is what you mentioned in your note. I'm responding as if the choice is between the existing tar or cpio programs. If this is a new facility that will require a new program to be written, then I have no comment.) [ The format in POSIX is upwardly compatible with existing tar programs. The format proposed for cpio is that of the current cpio program. -mod ] By the way, what ever happened about job control? I recall some discussion, but not the final resolution. I had hoped that POSIX would manage to get a few BSD features into general circulation. Clearly from the end user's point of view, job control is the one most important thing missing from System V. (Actually networking is more important, but it's not clear whether that is the sort of thing POSIX should be concerned about.) [ Job control (the HP proposal) is in the current draft. Networking is outside the scope of P1003.1, but there is a /usr/group Technical Committee addressing the subject with the intention of eventually providing input to an appropriate IEEE Working Group. -mod ] Volume-Number: Volume 11, Number 10
gwyn@brl.arpa (Doug Gwyn) (05/11/87)
From: gwyn@brl.arpa (Doug Gwyn)

Let's get the 1003.1 standard adopted and worry about perfection later.

In the real world one HAS to have a working "tar" if one exchanges files with many random UNIX sites, even if "cpio" might be better technically. Any proposal for CPIO format that is system-dependent ("old cpio") rather than portable ("new cpio") should be rejected out of hand.

1003.2 should probably include the "cpio" utility, which has many uses besides tape archives. 1003.1 should stick to "tar" for tape archives, or remove that section altogether.

I would prefer to remove tape archive format altogether from what is supposed to be a program/system interface specification (1003.1). There simply isn't a single universal interchange medium anyway (not every system has 1/2" magtape, for example).

Volume-Number: Volume 11
std-unix@ut-sally.UUCP (Moderator, John Quarterman) (05/11/87)
There have been three documents submitted to the IEEE P1003.1 Working Group recently regarding section 10:

N.043	April 22 1987	``X/OPEN Proposals to IEEE P1003.1,'' X/OPEN Group, Section 3.25, ``Data Interchange format.''
N.048	April 15, 1987	``a proposal for a cpio format to be added to Chapter 10,'' Lorraine C. Kevra, AT&T.
N.064	April 23, 1987	``Comments on 1003.1 N.048,'' Dominic Dunlop.

They're about a page apiece. I may feel energetic enough to type them in.

Volume-Number: Volume 11, Number 11
jsdy@hadron.uucp (Joseph S. D. Yao) (05/12/87)
In article <8006@ut-sally.UUCP> guy@sun.com (Guy Harris) writes:
> 3) It is less common. Almost all UNIX systems that support
> "cpio" also support "tar"; many UNIX systems that support
> "tar" do not support "cpio".

Guy's arguments are mostly good, especially when reasoning about the byte-order problem. It should perhaps be noted, though, that cpio pre-dates tar, and that there are probably numerous systems "out there" that have cpio but not tar. This, at least, seems to be one of the arguments used by X/OPEN. Of course, terms like "numerous", "almost all", and "many" are hard to argue against, because they're so fuzzy.

Personally, I have found good use for both (cpio -p is rather more elegant than the 2-tar equivalent kludge). However, I have had minutely more foul-ups with cpio than with tar. (At least, with current versions of tar.)

Joe Yao		jsdy@hadron.COM (not yet domainised)
hadron!jsdy@{seismo.CSS.GOV,dtix.ARPA,decuac.DEC.COM}
{arinc,att,avatar,cos,decuac,dtix,ecogong,kcwc}!hadron!jsdy
{netex,netxcom,rlgvax,seismo,smsdpg,sundc}!hadron!jsdy

Volume-Number: Volume 11, Number 22
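[ Editorial sketch, not part of the original article. ] For readers who have not met the two idioms compared above, they look roughly like this. The function names are hypothetical, both functions assume that the destination is an existing directory given as an absolute path, and the option letters accepted can differ between implementations.

```shell
#!/bin/sh
# Copy the tree rooted at $1 into the existing directory $2.
# $2 is assumed to be an absolute path, because both functions
# change directory before the copy starts.

# Pass-through mode: a single cpio reads file names on stdin and
# copies each file (-d makes directories, -m keeps modification times).
copy_tree_cpio() {
    (cd "$1" && find . -depth -print | cpio -pdm "$2")
}

# The "2-tar" equivalent: one tar writes an archive to a pipe and a
# second tar, started in the destination, unpacks it.
copy_tree_tar() {
    (cd "$1" && tar cf - .) | (cd "$2" && tar xf -)
}
```

The cpio form needs no intermediate archive stream, which is presumably the elegance being referred to; the tar form works on systems that have no cpio at all.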
trb@ima.ISC.COM (Andrew Tannenbaum) (05/20/87)
From: trb@ima.ISC.COM (Andrew Tannenbaum)

I don't have Section 10 of the POSIX Trial Use Standard, but I am interested in what happens to tar and cpio in POSIX.

I see that the netnews discussion of this has been partly a popularity contest between tar and cpio. There are more important issues to discuss than people's provincial biases. If you come from BSD land, you probably like tar. If you come from AT&T land, you probably like cpio.

I have some comments about cpio, since it is my personal favorite. They apply to both the file format and to the program function. Some comments apply to tar as well.

I like the idea of cpio taking a list of files on stdin. I wish tar had this option.

	tar cv `find / -print | fgrep -v -f except.file`

doesn't cut it.

[ Evidently John Gilmore's public domain implementation that he posted to comp.sources has this. I know of no proprietary version that does. -mod ]

cpio's binary format should have been killed off long ago. cpio has a 'portable' format, which still has several problems:

- Byte swapping and its friends. There are systems which swap bytes and/or halfwords. There are even systems which xor 0 and 1 bits on tapes. If CPIO wrote a magic number 0x12345678 in the header, it could resolve these problems painlessly.

- I agree that the binary cpio header is silly. The portable header is all printable ASCII data, but the filename is terminated by a null, which makes it harder to play with the archive. Here is a shar-like program which makes a cpio archive which can unpack itself.

<<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>>
#! /bin/sh
# take a list of files on stdin and make them into a bundle which
# can be passed through sh to extract them.
cat << \!
#!/bin/sh
# cpio archive
(read a; read a; read a; read a; cat) < $0 | cpio -icdm
exit
!
cpio -oac
<<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>><<>>

  As I recall, it can have problems because of the fact that the filename is null-terminated, like if you try to read its output into a mail message with an editor. It would also be neighborly if the ASCII header was more human readable; a space or carriage return here or there wouldn't hurt at all.

I realize that this is a standardization effort, but if you are going to enhance the format for some portability reason, you might want to consider my enhancement suggestions.

- The familiar problems with damaged archives should be fixed (Out of phase--get help).

- There are systems which need to extend the archive formats in local ways, for instance, to add extra mode information for a secure UNIX implementation, or file type information when the UNIX system deals with other types of file systems. It would be very useful to have a compatible way to extend the header such that any system could check the local field and either use or ignore the information. Right now, there is little hope for compatibility; the only solutions I can think of (various kinds of shadow files which contain mode info) are quite kludgy.

- These programs should deal with multiple tape archives in a standard way. I have seen many local hacks to do this.

- Blocksizes for speed, space, and streaming efficiency are best handled by blocking filters rather than by hacks like -B. I have heard of ctccpio, but can only worry about what it actually is. How many programs are going to have to have knowledge about how many goofy devices? I don't understand why cpio has to know anything about a device. This is UNIX, isn't it? Which brings us back to the question of multi-tape archives: maybe the blocking filter should also handle the multi-tape problem? This means lots of data travels over a pipe. Modern OS's should be able to do something smart here. (The multi-tape question leaves me with a queasy feeling.)
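[ Editorial sketch, not part of the original article. ] The magic-number suggestion in the first item above could work along the following lines. No existing cpio header carries such a field; the function name swap_kind is hypothetical, and the sketch assumes the writer stores 0x12345678 most significant byte first as the first four bytes of the file named by $1.

```shell
#!/bin/sh
# Hypothetical probe: report how a tape path has rearranged the
# four bytes of an agreed magic number 0x12345678 (written as the
# bytes 12 34 56 78) at the start of the file named by $1.
swap_kind() {
    probe=$(dd if="$1" bs=4 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    case "$probe" in
        12345678) echo "none" ;;
        34127856) echo "bytes" ;;      # bytes swapped within each halfword
        56781234) echo "halfwords" ;;  # halfwords swapped within the word
        78563412) echo "both" ;;       # bytes and halfwords swapped
        edcba987) echo "inverted" ;;   # every bit complemented
        *)        echo "unknown" ;;
    esac
}
```

Because the four bytes are all different, each rearrangement yields a distinct pattern, so a reader could pick the matching correction before touching the rest of the header.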
I would like to see a discussion about tar and cpio rather than opinions about which is better. I am particularly concerned about extending the header format to deal with atypical file types. Andrew Tannenbaum Interactive Boston, MA +1 617 247 1155 Volume-Number: Volume 11, Number 34
henry@utzoo.uucp (Henry Spencer) (05/28/87)
From: henry@utzoo.uucp (Henry Spencer)

Andy's comments about the facilities offered by tar and cpio are worthy of note, but irrelevant to the P1003.1 issues. This was actually raised at the original /usr/group standards meeting when the question of a standard interchange format came up: the facilities offered by the current programs are quite irrelevant to the choice of format, since the format does not dictate the user interface. It is not especially difficult to write "cpar" or "tpio", to get one user interface with the other format.

I thought the choice of tar by /usr/group was a huge win, and still think so; the extensions added in the Trial Use Standard strengthen this view. The cpio binary format is a travesty: unportable, non-extensible (for example, it assumes that inode numbers are only 16 bits, often not true today), and generally a mess. Cpio ASCII format is better, but it still shares some of these problems, since its field widths are sized to fit old systems (for example, it can't deal with 32-bit inode numbers either).

Furthermore, I would note that at least the cpio binary format, possibly the ASCII one as well, has existed in two different versions. People who claim that cpio is older than tar are half-lying: the current version of cpio is not. I submit that the mere existence of multiple incompatible versions of the cpio format is a major black mark against it. Tar format is virtually universal, with only minor (compatible!) extensions having been made here and there.

Andy makes a good point about extensibility. The tar format extends gracefully because it has extra room in its header (which the existing programs helpfully zero rather than filling with random trash) and in its file-type code space. (Note that the Trial Use Standard explicitly reserves certain type codes for local extensions, and others for future standards. Note also that the Trial Use Standard's own extensions are upward-compatible with the existing format and existing programs.)
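[ Editorial sketch, not part of the original article. ] The field-width point above is easy to check with a little arithmetic. The sketch assumes that the cpio ASCII header stores the inode number in six octal digits; the function name fits_octal_field is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: does the decimal value $1 fit in an octal
# field that is $2 digits wide?
fits_octal_field() {
    digits=$(printf '%o' "$1")
    [ "${#digits}" -le "$2" ]
}

# A 16-bit inode number always fits in six octal digits ...
fits_octal_field 65535 6 && echo "65535 fits"
# ... and octal 777777 is the largest value that does ...
fits_octal_field 262143 6 && echo "262143 fits"
# ... but a full 32-bit inode number needs eleven digits.
fits_octal_field 4294967295 6 || echo "4294967295 does not fit"
```

Under that assumption a six-digit field holds 18 bits, slightly more than the binary header's 16, yet far short of what a 32-bit inode number needs.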
Chapter 10 of the Trial Use Standard is a valuable part of the standard, it is not broken, and it does not need fixing. Leave it there. Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,decvax,pyramid}!utzoo!henry Volume-Number: Volume 11, Number 39
std-unix@ut-sally.UUCP (06/02/87)
From: seismo!scgvacd!stb!michael (Michael Gersten)

I have some comments on Tar/Cpio.

First, Radio Shack does sell a tar that has an argument F for "here is a file with the files you should read". Works with - for stdin. However, you can't read the files from stdin and write the output to stdout, although you can write to a named pipe (does mostly the same thing).

Secondly, whatever you use for backups should know about blocksizes. In particular, if you lose one floppy, users should be able to restore all the information on the other floppies. Tar does not do this--linking information gets lost if this occurs. In particular,

	Floppy A has file X on it
	Floppy B has "Link X to Y"

If you lose floppy A, you've got garbage for Y. Worse, if you restore out of order, no warning is given other than "Cannot link".

Finally, I feel that tar, in order to be usable as a backup facility, should be required to unlink a file before it restores it. Otherwise, consider this:

	Customer uses initialization floppy to initialize hard disk, which puts basic commands (ls, tar, cp, etc) on disk, then restores the entire system from tarred floppies.
	Initialization system had /bin/l linked to /bin/ls (AT&T versions)
	Customer had /bin/ls linked to lc, lf, lx (Berkeley versions), and the AT&T as ls.old
	After untarring, the Berkeley version was lost, and the AT&T version was under all the names.

Took me a while to figure this one out. Guess who the customer was.

I do not consider backup/restore usable as they take 5 minutes per file to recover individual files. I am not kidding; maybe R/S mucked something, but that is ridiculous. Sure, you can get faster, but only if you first format the disk, which takes 2 hours, and also do an incremental dump first.

[ Do you mean dump and restor? (Or dump and restore?) -mod ]

---
: Michael Gersten seismo!scgvaxd!stb!michael
: The above is the result of being educated at a school that discriminates
: against roosters.

Volume-Number: Volume 11, Number 48
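[ Editorial sketch, not part of the original article. ] The hazard behind the unlink-before-restore request above can be shown with two small functions. Their names are hypothetical, and the sketch illustrates the general effect of writing through an existing hard link; it does not reproduce the exact sequence of events in the /bin/ls story.

```shell
#!/bin/sh
# Two hypothetical ways a restore program might put archived data
# (file $1) back under the name $2.

# Overwrite in place: the existing inode is reused, so every other
# name hard-linked to $2 silently receives the new contents too.
restore_in_place() {
    cat "$1" > "$2"
}

# Unlink first: the old name is removed and a fresh file is created,
# so other links to the old file keep the old contents.
restore_unlink_first() {
    rm -f "$2" && cat "$1" > "$2"
}
```

With the first function, restoring over a name that the initialization floppy had linked elsewhere changes both names at once; with the second, only the restored name changes.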
std-unix@ut-sally.UUCP (06/02/87)
From: tony@uqcspe.oz.au (Tony O'Hagan)

Last year I wrote an off-line tape archive system for use on our UNIX machines. Tar and cpio were carefully compared to decide the appropriate format for the tape archives. Eventually we chose cpio because it permitted retrieval of any pattern of files.

About 120+ cpio files are stored on each archive tape and from bitter experience I know that sometimes when appending/retrieving files to/from tape an insufficient number of files are skipped. (We count them using taprd now.) It would have been useful to write a "file label" included in the header of each cpio file and be able to check this when reading. I would have used it to check the file number but I'm sure it would have other uses. (I would skip to the file before and check its label when appending.)

I recently adapted the archive system from V7 to BSD 4.2 and added the facility to drive remote tape drives using the blocking filters which fitted in well with cpio.

[ These are useful points, though they are not problems with the data interchange format, rather with the program that uses it. -mod ]

P.S. There are a few other bugs in cpio which I had to fix for the local version.
* creating new otherwise unstored directories with the current mask (not mode 777). (with -d switch)
* not changing the ownership/group/permissions of existing directories back to their values at the time of archiving.

==============================================================================
Tony O'Hagan			Australia: (07) 3774125  International: +61 7 3774125
University of Queensland	CSNET: tony@uqcspe.oz  ACSnet: tony@uqcspe.oz
Dept. of Computer Science	UUCP: ...!seismo!munnari!uqcspe.oz!tony
St. Lucia, Brisbane,		ARPA: tony%uqcspe.oz@seismo.css.gov
AUSTRALIA 4067			JANET: uqcspe.oz!tony@ukc

Volume-Number: Volume 11, Number 42