[comp.std.unix] Benefits of CPIO over TAR

std-unix@uunet.UUCP (08/13/87)

Phone: +44 1 251 2128
Telex: 295467 inset g
From: Jim R Oldroyd <mcvax!inset!jr@seismo.css.gov>

[ This has been a bit delayed due to miscommunication.  -mod ]

I would like to present a number of points regarding CPIO which I
feel are relevant to the ongoing discussion concerning a Data
Interchange Format for the POSIX 1003.1 standard.

I shall correct a number of important points regarding the CPIO
format; points which have been incorrectly stated in recent articles.

1.  At no time has a proposal been made to standardise the binary
    cpio format.  Only the `cpio -c' format is under consideration.

2.  The `cpio -c' format is widely in use in Europe for both Data
    Interchange and Archival purposes.  Its widespread use can
    be attributed, in part, to its endorsement by the X/OPEN Group.

3.  Only one version of the `cpio -c' format is currently in use.
    It is this format that is being proposed for standardisation.

4.  The `cpio -c' header is written entirely in character form.
    No numerical information is stored in machine-dependent binary
    form.

5.  The `cpio -c' format is capable of archiving and restoring
    all POSIX file types: directories, block special files,
    character special files, regular files and fifos.

6.  The `cpio -c' format can handle pathnames up to 256 bytes.
    This is the length guaranteed on all POSIX systems.

7.  The `cpio -c' format is in the public domain.  (See X/OPEN
    Portability Guide, Volume 2, cpio(4)).

8.  Inode numbers are not recorded.  Symbolic values (derived from a 
    file's inode and device numbers) are stored in the header
    block.  These values are used solely for hard link resolution.

9.  File types are stored in symbolic form.  Symbols are derived from
    historical UNIX file type values.  There is room for 64
    file types; currently only 5 are supported.
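
The character-form header described in points 4, 8 and 9 can be
sketched as follows.  This is only an illustration, not text from
any proposal: the field order and the widths (6 or 11 octal digits),
the magic value 070707, and every h_* name not mentioned in this
article are assumptions, as are the helper names pack_header and
unpack_header.

```python
# Sketch of a character-form cpio header: every numeric field is
# written as zero-padded octal ASCII digits, so nothing depends on
# the byte order or word size of the writing machine.
# Field order and widths are assumptions made for this example.
FIELDS = [
    ("h_magic", 6), ("h_dev", 6), ("h_ino", 6), ("h_mode", 6),
    ("h_uid", 6), ("h_gid", 6), ("h_nlink", 6), ("h_rdev", 6),
    ("h_mtime", 11), ("h_namesize", 6), ("h_filesize", 11),
]
HEADER_LEN = sum(width for _, width in FIELDS)


def pack_header(name, **values):
    """Return header digits followed by the NUL-terminated name."""
    raw_name = name.encode("ascii") + b"\0"
    values = dict(values, h_magic=0o070707, h_namesize=len(raw_name))
    digits = ""
    for field, width in FIELDS:
        value = values.get(field, 0)
        if not 0 <= value < 8 ** width:
            raise ValueError("%s does not fit in %d octal digits"
                             % (field, width))
        digits += format(value, "0%do" % width)
    return digits.encode("ascii") + raw_name


def unpack_header(data):
    """Parse bytes produced by pack_header into (fields, name)."""
    fields, pos = {}, 0
    for field, width in FIELDS:
        fields[field] = int(data[pos:pos + width].decode("ascii"), 8)
        pos += width
    # h_namesize counts the trailing NUL, so drop one byte.
    name = data[pos:pos + fields["h_namesize"] - 1].decode("ascii")
    return fields, name
```

A header written this way can be inspected with any text tool, and
a value too large for its field is rejected rather than silently
truncated.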

A number of points have recently been raised as drawbacks of CPIO.
These points seem to be problems with a particular implementation
of a cpio utility.  As the characteristics of the utility are not
relevant for 1003.1, I present only a short summary of points:

	- file names are terminated by '\0'
		This is normal UNIX practice for string termination
		and applies to TAR (and USTAR) equally.  In CPIO,
		the '\0' is redundant and need not be interpreted,
		as the file name length is also provided.

	- the user interface is less convenient
		This is subjective; many people feel that the
		opposite is true.  The user interface is easily
		alterable (discuss with 1003.2).

	- file name size is 128 bytes
		Wrong!  It is 256; see above.

	- cpio header is full of OS dependent information
		Wrong!  All information describes file
		characteristics.  There is no OS dependent
		information.  See point 9, above.

	- header must start on a word boundary
		Wrong!  The header is character oriented and
		can be read as individual bytes from the archive.

	- format cannot be extended to meet future requirements
		Wrong!  Implementations already exist which can
		archive symbolic links and contiguous files.
		There is far more scope for future extension
		than available in the proposed USTAR format.
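
Two of the points above (the redundant '\0' and the absence of any
word-boundary requirement) can be illustrated with a short sketch.
It assumes the same hypothetical layout as before: eight 6-digit
octal fields, an 11-digit mtime, a 6-digit name size and an 11-digit
file size.  The function name read_member is invented for the
example.

```python
import io

# Assumed octal field widths: magic, dev, ino, mode, uid, gid,
# nlink, rdev (6 each), then mtime (11), namesize (6), filesize (11).
WIDTHS = (6,) * 8 + (11, 6, 11)


def read_member(stream):
    """Read one header, name and file body from a byte stream.

    The stream may be positioned at any byte offset; no alignment
    is assumed.  The name is taken by length, so the trailing NUL
    is dropped without ever being searched for.
    """
    digits = stream.read(sum(WIDTHS)).decode("ascii")
    values, pos = [], 0
    for width in WIDTHS:
        values.append(int(digits[pos:pos + width], 8))
        pos += width
    namesize, filesize = values[9], values[10]
    name = stream.read(namesize)[:-1]  # count includes the NUL
    body = stream.read(filesize)
    return name.decode("ascii"), body
```

Reading proceeds purely by byte counts taken from the header, which
is why the header can begin at an odd offset in the archive.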


Independent of the archive format used, some guidelines must
be followed to ensure that an archive can be extracted on ANY
POSIX system.  Note that the following are NOT rules for using
cpio; they apply equally well to other interchange formats
if portability across ALL systems is to be achieved:

	- only POSIX defined file types should be archived
	- headers should be written in US ASCII character set
	- minimum values in section 2.9 for h_uid, h_gid,
	  h_nlinks, etc. should not be exceeded
	- no portion of any filename should exceed 14 characters
	- one cpio archive should fit on a single medium
	- only one archive should exist per medium
	- relative pathnames (i.e., no leading /) should be used
	- tapes should be written in `raw' mode
	- tapes should be written with 5120 byte blocks

Any archive intended for use only between systems supporting
more capabilities than the minimum required by POSIX need
not be so restrictive.
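
The pathname guidelines lend themselves to a mechanical check
before an archive is written.  The sketch below is one possible
reading of them; the function name portability_problems and its
parameters are invented for the example, and the limits (14
characters per component, 256 bytes per pathname) are the figures
quoted in this article.

```python
def portability_problems(path, max_component=14, max_path=256):
    """Return a list of reasons why path may not extract everywhere."""
    problems = []
    if path.startswith("/"):
        problems.append("absolute pathname")
    try:
        encoded = path.encode("ascii")
    except UnicodeEncodeError:
        problems.append("non-ASCII character in name")
        encoded = path.encode("utf-8")
    if len(encoded) > max_path:
        problems.append("pathname longer than %d bytes" % max_path)
    for component in path.split("/"):
        if len(component) > max_component:
            problems.append("component %r longer than %d characters"
                            % (component, max_component))
    return problems
```

For example, portability_problems("src/main.c") returns an empty
list, while "/etc/passwd" is flagged as an absolute pathname.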



I believe that the `cpio -c' tape format has a number of strong
advantages over both the existing tar and the POSIX extended tar
formats.  The `cpio -c' format handles all POSIX file types
correctly, it has been extended to handle other known file types
and there is adequate opportunity for further extension.

Thank you,
	Jim R Oldroyd.

Volume-Number: Volume 12, Number 10

guy@Sun.COM (Guy Harris) (08/13/87)

From: uunet!Sun.COM!guy (Guy Harris)

> 1.  At no time has a proposal been made to standardise the binary
>     cpio format.  Only the `cpio -c' format is under consideration.

A proposal was made by Lorraine Kevra of AT&T to standardize both the binary
and character "cpio" formats.  Fortunately, this proposal appears to have been
dropped in favor of a character-format-only proposal.

> 2.  The `cpio -c' format is widely in use in Europe for both Data
>     Interchange and Archival purposes.  It's widespread use can
>     be attributed, in part, to its endorsement by the X/OPEN Group.

Both the "tar" and "cpio -c" formats are in wide use throughout the world.  The
major commonly available UNIX source distributions all include implementations
of "tar", but not all of them include implementations of "cpio".

> 8.  Inode numbers are not recorded.  Symbolic values (derived from a 
>     file's inode and device numbers) are stored in the header
>     block.  These values are used solely for hard link resolution.

> 9.  File types are stored in symbolic form.  Symbols are derived from
>     historical UNIX file type values.  There is room for 64
>     file types; currently only 5 are supported.

The proposal does not match what is stated in the X/OPEN Portability Guide
(January 1987).  In Volume 2, section "File Formats", under CPIO(4), it states:

	When the "-c" option of "cpio(1)" is used, the header information is
	described by:

	printf or scanf(<big string>, <list of arguments>);

	...The meanings of the items "h_dev" through "h_mtime" are explained in
	"stat(2)".

The items in question include "h_dev", "h_ino", and "h_mode".

Under "stat(2)", those fields are given their customary UNIX meanings.
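
To make the "customary UNIX meanings" concrete, here is a minimal sketch of
reading "h_mode" as if it were "st_mode", using the traditional file type bits
as exposed by Python's "stat" module.  The function name file_type is invented
for the example, and treating the header field as a plain "st_mode" value is
the reading described above, not necessarily what the proposal intends.

```python
import stat


def file_type(h_mode):
    """Classify an h_mode value using the traditional st_mode bits."""
    checks = [
        (stat.S_ISDIR, "directory"),
        (stat.S_ISBLK, "block special"),
        (stat.S_ISCHR, "character special"),
        (stat.S_ISREG, "regular file"),
        (stat.S_ISFIFO, "fifo"),
    ]
    for predicate, label in checks:
        if predicate(h_mode):
            return label
    return "unknown"


# The header stores the value as octal digits, e.g. "100644".
print(file_type(int("100644", 8)))
```

Any type outside the five listed, such as a symbolic link, falls through to
"unknown" in this sketch.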

I presume X/OPEN plans to alter CPIO(4) to reflect the fact that "h_dev",
"h_ino", and "h_mode" are no longer directly connected to the "st_dev",
"st_ino", and "st_mode" fields of the "stat" structure.

> 	- format cannot be extended to meet future requirements
> 		Wrong!  Implementations already exist which can
> 		archive symbolic links and contiguous files.
> 		There is far more scope for future extension
> 		than available in the proposed USTAR format.

Could you please indicate how this is the case?

Volume-Number: Volume 12, Number 11

std-unix@uunet.UUCP (08/13/87)

From: gwyn@BRL-SMOKE.ARPA (Doug Gwyn)

In article <890@uunet.UU.NET> Jim R Oldroyd <mcvax!inset!jr@seismo.css.gov> writes:
>3.  Only one version of the `cpio -c' format is currently in use.

Ahem.  Cray changed theirs.  Admittedly that was very short-sighted!

>8.  Inode numbers are not recorded.  Symbolic values (derived from a 
>    file's inode and device numbers) are stored in the header
>    block.  These values are used solely for hard link resolution.

Unfortunately, on systems where the cpio fields for this information
are not big enough, one can find that the wrong links are planted
when files are de-archived.  This has actually happened to me.
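
The failure mode can be sketched as follows.  The sketch assumes a
hypothetical writer that keeps only the low 18 bits (six octal digits)
of the device and inode numbers; real implementations may behave
differently.  The names link_key and find_false_links are invented
for the example.

```python
FIELD_MAX = 8 ** 6  # six octal digits hold values 0..262143


def link_key(st_dev, st_ino):
    """Key a reader would use for hard link matching (assumed truncation)."""
    return (st_dev % FIELD_MAX, st_ino % FIELD_MAX)


def find_false_links(files):
    """Given (name, st_dev, st_ino) tuples, return pairs of names that
    are different files but would be restored as links to each other."""
    seen = {}
    clashes = []
    for name, dev, ino in files:
        key = link_key(dev, ino)
        if key in seen and seen[key][1:] != (dev, ino):
            clashes.append((seen[key][0], name))
        else:
            seen.setdefault(key, (name, dev, ino))
    return clashes
```

With inode numbers 5 and 262149 on the same device, both files yield
the key (dev, 5), so a reader matching on that key alone would link
the second file to the first.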

I never did understand what inter-system archive interchange formats
had to do with specification of a portable environment for
applications.  You probably couldn't read my 1/4" tape cartridge no
matter what archive format I used on it.  This issue seems to be a
waste of time for 1003.1 and I recommend that it be delegated to
another subgroup, preferably 1003.2 which needs to specify the utility
to cope with such archives anyway.

Volume-Number: Volume 12, Number 13