[comp.unix.admin] Summary on TAR and CPIO formats....

dmdata@login.dkuug.dk (Bent Bertelsen) (05/08/91)

Summary of the differences between TAR and CPIO.

I got several good and informative replies to my request for information
on the differences between the TAR and CPIO programs.

I thank all who replied; I got the information I wanted, and some
more.
The replies came from both users and programmers, so points from both
sides are covered.
				Kristen Nielsen.
                                E-mail: dmdata@login.dkuug.dk


Here are the replies.

From: guy@auspex.com (Guy Harris)

>I am looking for the advantages and disadvantages of each of the two.

"tar" handles symbolic links in the form in which it comes in BSD;
"cpio" doesn't handle symbolic links in the form in which it comes in
System V prior to S5R4, and some vendors may have added symlinks to
their system without enhancing "cpio" to know about them.  Others may
have enhanced it differently from the way I did it at Sun, which was
adopted by AT&T (and which is, I think, also present in the "cpio"
that Berkeley picked up from AT&T and put into a later BSD release; I
think I gave them my changes).

(S5R4 does some funny stuff with "tar"; basically, its "cpio" can handle
"tar" format input, and write it on output, and it probably handles
symbolic links.  They may not have bothered doing anything to enhance
"tar" as a result.)

"cpio" handles special files; "tar", unless you're talking about a
POSIXish version, doesn't.

"tar" comes with V7, System III, System V, and BSD source; "cpio" comes
only with System III, System V, and later BSD (4.3-tahoe and later).

"tar"'s way of handling multiple hard links to a file can handle file
systems that support 32-bit i-numbers (e.g., the BSD file system);
"cpio"'s way requires you to play some games (in its "binary" format,
i-numbers are only 16 bits, and in its "portable ASCII" format, they're
18 bits; it would have to play games with the "file system ID" field of
the header to make sure that the file system ID/i-number pairs of
different files were always different), and I don't know which "cpio"s,
if any, play those games.  Those that don't might get confused, think
two files are the same file when they're not, and make hard links
between them.
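The following small C sketch shows how truncated i-numbers can produce bogus hard links. It assumes a naive cpio that simply masks the i-number to the width of its header field and does none of the file-system-ID remapping mentioned above. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* i-number field widths: a 16-bit short in the binary header,
 * six octal digits (18 bits) in the "portable ASCII" header. */
#define CPIO_BIN_INO_MASK   0xFFFFu
#define CPIO_ASCII_INO_MASK 0777777u

/* The i-number a naive cpio would actually store: the low bits only. */
uint32_t cpio_stored_ino(uint32_t ino, uint32_t mask)
{
    assert(mask == CPIO_BIN_INO_MASK || mask == CPIO_ASCII_INO_MASK);
    return ino & mask;
}

/* On extraction, two entries with the same (device, stored i-number)
 * pair are taken to be links to one file, even if they were different. */
int looks_like_same_file(uint32_t dev_a, uint32_t ino_a,
                         uint32_t dev_b, uint32_t ino_b, uint32_t mask)
{
    return dev_a == dev_b &&
           cpio_stored_ino(ino_a, mask) == cpio_stored_ino(ino_b, mask);
}
```

With the binary format, i-numbers 70000 and 4464 on the same device both come out as 4464, so the reader would link one file to the other. The 18-bit ASCII format only postpones the problem to file systems with i-numbers above 262143.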

"tar"'s way of handling multiple hard links to a file places only one
copy of the file's contents on the tape, but the name attached to that
copy is the *only* one you can use to retrieve the file; "cpio"'s way
puts one copy for every link, but you can retrieve it using any of the
names.
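The tar side of this can be sketched in C as a table of (device, i-number) pairs already written. The fixed-size table and the function name are hypothetical. The first name seen for a file gets the data, and every later name becomes a linkflag '1' entry pointing back at it, which is why only the first name retrieves the contents.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table a tar-like writer might keep: the first name
 * archived for each (device, i-number) pair. */
struct link_seen {
    unsigned long dev, ino;
    const char *first_name;
};

#define MAX_LINKS 64

struct link_table {
    struct link_seen entry[MAX_LINKS];
    size_t count;
};

/* Returns the name an earlier entry used for this file, or NULL if this
 * is the first time it is seen (it is then remembered).  A non-NULL
 * result means: write a linkflag '1' header with that linkname, no data. */
const char *tar_link_target(struct link_table *t, unsigned long dev,
                            unsigned long ino, const char *name)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entry[i].dev == dev && t->entry[i].ino == ino)
            return t->entry[i].first_name;

    assert(t->count < MAX_LINKS);   /* sketch: fixed-size table */
    t->entry[t->count].dev = dev;
    t->entry[t->count].ino = ino;
    t->entry[t->count].first_name = name;
    t->count++;
    return NULL;
}
```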

>What type of checksum (if any) is used, and how is it calculated?

See the attached manual pages for "tar" and "cpio" format.  "tar" uses a
checksum which is the sum of all the bytes in the "tar" header for a
file; "cpio" uses no checksum.
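The following C sketch computes that checksum as tar(5) describes it. Every byte of the 512-byte header is added up, with the eight-byte chksum field (at offset 148) counted as blanks. The function name is hypothetical.

Two caveats apply. First, although the tar(5) page below calls the stored value decimal, V7-derived tars and POSIX ustar write it as octal digits, like the other numeric fields. Second, some historical implementations summed the bytes as signed chars, so a careful reader may want to accept either sum.

```c
#include <assert.h>
#include <stddef.h>

#define TBLOCK        512
#define CHKSUM_OFFSET 148  /* name[100] + mode[8] + uid[8] + gid[8] + size[12] + mtime[12] */
#define CHKSUM_WIDTH  8

/* Sum of all bytes in a tar header block, treating the chksum field
 * itself as eight ASCII blanks. */
unsigned long tar_header_checksum(const unsigned char *block)
{
    unsigned long sum = 0;

    assert(block != NULL);
    for (size_t i = 0; i < TBLOCK; i++) {
        if (i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_WIDTH)
            sum += ' ';
        else
            sum += block[i];
    }
    return sum;
}
```

An all-zero header sums to 256 (eight blanks at 32 each). Bytes inside the chksum field never change the result.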

>If anyone knows why cpio was made when tar was present on the unix scene,

It wasn't.  "cpio" first showed up in PWB/UNIX 1.0; no
generally-available version of UNIX had "tar" at the time.  I don't know
whether any version that was generally available *within AT&T* had
"tar", or, if so, whether the people within AT&T who did "cpio" knew
about it.

Manual pages:

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g.  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  tar.5 cpio.5
# Wrapped by guy@auspex on Tue Feb 26 10:57:37 1991
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'tar.5' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'tar.5'\"
else
echo shar: Extracting \"'tar.5'\" \(3801 characters\)
sed "s/^X//" >'tar.5' <<'END_OF_FILE'
X.\" @(#)tar.5 1.8 89/03/27 SMI; from UCB 4.2
X.TH TAR 5  "19 October 1987"
X.SH NAME
Xtar \- tape archive file format
X.SH DESCRIPTION
X.IX  "tar file"  ""  "\fLtar\fP \(em tape archive file format"
X.LP
X.BR tar ,
X(the tape archive command)
Xdumps several files into one, in a medium suitable for transportation.
X.LP
XA ``tar tape'' or file is a series of
Xblocks.  Each block is of size
X.SM TBLOCK\s0.
XA file on the tape is represented by a
Xheader block which describes
Xthe file, followed by zero or more blocks
Xwhich give the contents of the
Xfile.  At the end of the tape are two blocks
Xfilled with binary zeros, as an
X.SM EOF
Xindicator.
X.LP
XThe blocks are grouped for physical I/O
Xoperations.  Each group of
X.I n
Xblocks (where
X.I n
Xis set by the
X.B b
Xkeyletter on the
X.BR tar (1)
Xcommand line \(em default is 20 blocks) is
Xwritten with a single system call; on nine-track
Xtapes, the result of this write is a single tape
Xrecord.  The last group is always written
Xat the full size, so blocks after
Xthe two zero blocks contain random data. 
XOn reading, the specified or
Xdefault group size is used for the
Xfirst read, but if that read returns less than
Xa full tape block, the reduced
Xblock size is used for further reads, unless the
X.B B
Xkeyletter is used.
X.LP
XThe header block looks like:
X.RS
X.LP
X.ft B
X.nf
X#define \s-1TBLOCK\s0	512
X#define \s-1NAMSIZ\s0	100
Xunion hblock {
X	char dummy[\s-1TBLOCK\s0];
X	struct header {
X		char name[\s-1NAMSIZ\s0];
X		char mode[8];
X		char uid[8];
X		char gid[8];
X		char size[12];
X		char mtime[12];
X		char chksum[8];
X		char linkflag;
X		char linkname[\s-1NAMSIZ\s0];
X	} dbuf;
X};
X.ft R
X.fi
X.RE
X.LP
X.IR name
Xis a
X.SM NULL\s0-terminated
Xstring.  The other fields are zero-filled
Xoctal numbers in
X.SM ASCII\s0.
XEach field (of width
X.IR w )
Xcontains w-2 digits, a
X.SM SPACE\s0,
Xand a
X.SM NULL\s0,
Xexcept
X.IR size
Xand
X.IR mtime ,
Xwhich do not contain the trailing
X.SM NULL\s0.
X.IR name
Xis the name of the file, as specified on the
X.B tar
Xcommand line.  Files dumped because they were
Xin a directory which was named in the command
Xline have the directory name as prefix and
X.I /filename
Xas suffix.
X.  \"Whatever format was used in the command line
X.  \"will appear here, such as
X.  \".I \&./yellow
X.  \"or
X.  \".IR \&../../brick/./road/.. .
X.  \"To retrieve a file from a tar tape, an exact prefix match must be specified,
X.  \"including all of the directory prefix information used on the command line
X.  \"that dumped the file (if any).
X.IR mode
Xis the file mode, with the top bit masked off.
X.IR uid
Xand
X.IR gid
Xare the user and group numbers which own the file.
X.IR size
Xis the size of the file in bytes.
XLinks and symbolic links are dumped
Xwith this field specified as zero.
X.I mtime
Xis the modification time of the file at
Xthe time it was dumped.
X.I chksum
Xis a decimal
X.SM ASCII
Xvalue which represents the sum of all the bytes in the
Xheader block.  When calculating the checksum, the
X.IR chksum
Xfield is treated as if it were all blanks.
X.IR linkflag
Xis
X.SM ASCII
X`0' if the file is ``normal'' or a special file,
X.SM ASCII
X`1' if it is an hard link, and
X.SM ASCII
X`2' if it is a symbolic link.
XThe name linked-to, if any, is in
X.IR linkname ,
Xwith a trailing
X.SM NULL\s0.
XUnused fields of the header are binary
Xzeros (and are included in the checksum).
X.LP
XThe first time a given inode number is dumped,
Xit is dumped as a regular file.  The second and
Xsubsequent times, it is dumped as a link instead.
XUpon retrieval, if a link entry is retrieved,
Xbut not the file it was linked to, an error message
Xis printed and the tape must be manually
Xre-scanned to retrieve the linked-to file.
X.LP
XThe encoding of the header is designed to be
Xportable across machines.
X.SH "SEE ALSO"
X.BR tar (1)
X.SH BUGS
XNames or linknames longer than
X.SM NAMSIZ
Xproduce error reports and cannot be dumped.
END_OF_FILE
if test 3801 -ne `wc -c <'tar.5'`; then
    echo shar: \"'tar.5'\" unpacked with wrong size!
fi
chmod +x 'tar.5'
# end of 'tar.5'
fi
if test -f 'cpio.5' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'cpio.5'\"
else
echo shar: Extracting \"'cpio.5'\" \(2473 characters\)
sed "s/^X//" >'cpio.5' <<'END_OF_FILE'
X.\" @(#)cpio.5 1.12 89/03/27 SMI; from UCB 4.2
X.TH CPIO 5  "22 March 1989"
X.SH NAME
Xcpio \- format of cpio archive
X.SH DESCRIPTION
X.IX  "cpio file"  ""  "\fLcpio\fP \(em cpio archive format"
X.LP
XThe old format
X.I header
Xstructure, when the
X.B \-c
Xoption of
X.B cpio
Xis not used, is:
X.LP
X.RS
X.nf
X.ft B
Xstruct {
X	short	h_magic,
X		h_dev;
X	ushort	h_ino,
X		h_mode,
X		h_uid,
X		h_gid;
X	short	h_nlink,
X		h_rdev,
X		h_mtime[2],
X		h_namesize,
X		h_filesize[2];
X	char	h_name[h_namesize rounded to a word];
X} Hdr;
X.fi
X.ft R
X.RE
X.LP
XThe byte order here is that of the machine on which the tape was written.
XIf the tape is being read on a machine with a different byte order, you have
Xto use
X.BR swab (3)
Xafter reading the header.  You can determine what byte order the tape was
Xwritten with by examining the
X.I h_magic
Xfield; if it is equal to
X0143561 (octal), which is the standard magic number 070707 (octal) with the
Xbytes swapped, the tape was written in a byte order opposite to that of the
Xmachine on which it is being read.  If you are producing a tape to be read
Xon a machine with the opposite byte order to that of the machine on which it
Xis being produced, you can use
X.B swab
Xbefore writing the header.
X.LP
XWhen the
X.B \-c
Xoption is used, the
X.I header
Xinformation is
Xdescribed by the statement below:
X.LP
X.ft B
X.nf
X	sscanf(Chdr, "%6o%6o%6o%6o%6o%6o%6o%6o%11lo%6o%11lo%s",
X		&Hdr.h_magic, &Hdr.h_dev, &Hdr.h_ino, &Hdr.h_mode,
X		&Hdr.h_uid, &Hdr.h_gid, &Hdr.h_nlink, &Hdr.h_rdev,
X		&Hdr.h_mtime, &Hdr.h_namesize, &Hdr.h_filesize, &Hdr.h_name);
X.fi
X.ft R
X.LP
X.I Longtime
Xand
X.I Longfile
Xare equivalent to
X.I Hdr.h_mtime
Xand
X.IR Hdr.h_filesize ,
Xrespectively.  The contents of each file is
Xrecorded in an element of the array of varying length structures,
X.IR archive ,
Xtogether with other items describing the
Xfile.
XEvery instance of
X.I h_magic
Xcontains the constant 070707 (octal).
XThe items
X.I h_dev
Xthrough
X.I h_mtime
Xhave meanings explained in
X.BR stat (2).
XThe length of the
X.SM NULL\s0-terminated
Xpath name
X.IR h_name ,
Xincluding the
X.SM NULL
Xbyte, is given by
X.IR h_namesize .
X.LP
XThe last record of the
X.I archive
Xalways contains the name
X.BR \s-1TRAILER\s0!!! .
XSpecial files, directories, and the trailer, are recorded
Xwith
X.I h_filesize
Xequal to zero.  Symbolic links are recorded similarly
Xto regular files, with the ``contents'' of the file being the name of the
Xfile the symbolic link points to.
X.SH "SEE ALSO"
X.BR cpio (1),
X.BR find (1),
X.BR stat (2),
X.BR swab (3)
END_OF_FILE
if test 2473 -ne `wc -c <'cpio.5'`; then
    echo shar: \"'cpio.5'\" unpacked with wrong size!
fi
chmod +x 'cpio.5'
# end of 'cpio.5'
fi
echo shar: End of shell archive.
exit 0
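
The numeric fields in the tar header are octal ASCII, so a reader has to convert them. The following C sketch, with hypothetical function names, parses one such field using the offset and width from the struct header in tar(5) above. It also tolerates leading blanks, which some writers use instead of zeros; that goes beyond what the page says and is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Offset and width of "size" in struct header (tar(5) above):
 * name[100] + mode[8] + uid[8] + gid[8] precede it. */
#define TAR_SIZE_OFFSET 124
#define TAR_SIZE_WIDTH  12

/* Parse one octal ASCII header field of the given width.  Leading blanks
 * are skipped; the digits end at the first SPACE or NUL.  Returns -1 if a
 * non-octal character is found. */
long long tar_octal_field(const char *field, size_t width)
{
    size_t i = 0;
    long long value = 0;

    assert(field != NULL && width > 0);
    while (i < width && field[i] == ' ')
        i++;
    for (; i < width && field[i] != '\0' && field[i] != ' '; i++) {
        if (field[i] < '0' || field[i] > '7')
            return -1;
        value = value * 8 + (field[i] - '0');
    }
    return value;
}

/* File size recorded in a header block; the data that follows occupies
 * this many bytes, rounded up to whole 512-byte blocks. */
long long tar_entry_size(const char *block)
{
    return tar_octal_field(block + TAR_SIZE_OFFSET, TAR_SIZE_WIDTH);
}
```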

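The byte-order test described in cpio(5) above can be sketched in C as follows. It assumes a 16-bit h_magic, as in the old binary header, and the function name is hypothetical. A -c (ASCII) archive instead begins with the six characters "070707", which this check rejects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CPIO_MAGIC         070707   /* h_magic in the writer's byte order */
#define CPIO_MAGIC_SWAPPED 0143561  /* 070707 with its two bytes swapped */

/* Look at the first two bytes of an old binary cpio header.
 * Returns 1 if it was written in this machine's byte order,
 * 0 if in the opposite order (swab(3) the header before use),
 * -1 if it does not start with a binary cpio magic number. */
int cpio_binary_byte_order(const unsigned char *first2)
{
    uint16_t magic;

    assert(first2 != NULL);
    memcpy(&magic, first2, sizeof magic);  /* read as a native short */
    if (magic == CPIO_MAGIC)
        return 1;
    if (magic == CPIO_MAGIC_SWAPPED)
        return 0;
    return -1;
}
```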

----------------------------- oooOOOooo ------------------------------------

From: talgras!david@uunet.UU.NET (David Hoopes)
Organization: Tallgrass Technologies Inc.

In article <dmdata.667565216@dkuugin> you write:
>
>Hello out there.
>
>I am going to do some lessons in the area of unix sysadm, history etc.
>When I was running through the present material, I wanted to get
>some answers about the differences between tar and cpio.
>
>Could anyone please tell me the differences between tar and cpio. 
>
>I am looking for the advantages and disadvantages of each of the two.
>

tar does not back up special files.  I got bitten by this once.  After a
system crash I did a total restore, and the tty ports for my multi-port
serial card did not get restored.  Cpio does restore special files (I
checked).

On restore, if there is corruption on the tape, tar will stop at that
point, while cpio will skip over it and try to restore the rest of the files.

Cpio seems to do a better job of restoring links.

Please post the results that you get.

-- 
---------------------------------------------------------------------
David Hoopes                              Tallgrass Technologies Inc. 
uunet!talgras!david                       11100 W 82nd St.          
Voice: (913) 492-6002 x323                Lenexa, Ks  66214        

----------------------------- oooOOOooo ----------------------------


From: Leslie Mikesell <les@chinet.chi.il.us>
Organization: Chinet - Chicago Public Access UNIX


>Could anyone please tell me the differences between tar and cpio. 

The main difference is just in the command syntax and header format.

>I am looking for advantages and dis-advantages for both of the two.

Tar is a little more tape-oriented in that everything is blocked to
start on a block boundary.  Cpio knows about special files (devices
and FIFOs) and is thus more suitable for complete backups on systems
that don't have dump.
 
>Are there any differences between the two of them in the ability to
>recover crashed archives? (Is there any chance of recovering crashed
>archives at all?)

Theoretically, recovery should be easier under tar, since the blocking
lets you find a header with some variation of "dd skip=nn".  However,
modern cpios and their variants have an option to simply search for the
next file header after an error, with a reasonable chance of
resynchronizing.  In practice, though, lots of tape driver software
won't let you continue past a media error, which should be the only
reason for getting out of sync unless a file changed size while you
were writing the archive.
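The difference in how the two formats can be resynchronized is easy to sketch in C. The helper names are hypothetical. The cpio scanner assumes the -c format, whose headers start with the ASCII characters "070707" at no particular alignment. A real tool would also sanity-check the rest of the header, since those six characters can occur inside file data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offset of the next "070707" at or after `from` in a damaged -c cpio
 * archive, or -1 if none.  cpio headers are not block aligned, so every
 * byte offset has to be tried. */
long cpio_next_header(const char *buf, size_t len, size_t from)
{
    static const char magic[] = "070707";
    const size_t mlen = sizeof magic - 1;

    assert(buf != NULL);
    for (size_t i = from; i + mlen <= len; i++)
        if (memcmp(buf + i, magic, mlen) == 0)
            return (long)i;
    return -1;
}

/* tar headers always start on a 512-byte boundary, so a scanner only
 * needs to try the next multiple of 512 at or after `from`. */
size_t tar_next_block_offset(size_t from)
{
    return (from + 511) / 512 * 512;
}
```

With dd bs=512, skip=N starts reading at byte offset N*512, which is exactly the set of offsets tar_next_block_offset produces.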


>If anyone knows why cpio was made when tar was present on the unix scene, 
>please tell me about this too.

Probably because it is more media-efficient (it doesn't block
everything, and it uses only the space the headers need, whereas tar
always uses 512 bytes per file header), and because it knows how to
archive special files.
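To put numbers on the space argument, the following C sketch computes per-file overhead for a regular file. It assumes the -c cpio format described in cpio(5) above, whose header is 76 characters: the field widths in that page's sscanf format add up to 8*6 + 11 + 6 + 11. It ignores the end-of-archive padding both programs typically add to fill out their final output block. The function names are hypothetical.

```c
#include <assert.h>

#define TBLOCK       512
#define CPIO_C_HDR   76   /* 8 six-digit fields + 11 + 6 + 11 digits */

/* Bytes one regular file occupies in a tar archive: a 512-byte header
 * plus the data rounded up to whole 512-byte blocks. */
unsigned long tar_entry_bytes(unsigned long filesize)
{
    return TBLOCK + (filesize + TBLOCK - 1) / TBLOCK * TBLOCK;
}

/* Bytes the same file occupies in a -c cpio archive: the ASCII header,
 * the NUL-terminated name (h_namesize bytes), then the data, unpadded. */
unsigned long cpio_c_entry_bytes(unsigned long namesize, unsigned long filesize)
{
    assert(namesize >= 1);   /* h_namesize counts the trailing NUL */
    return CPIO_C_HDR + namesize + filesize;
}
```

For a 100-byte file named hello.c (h_namesize 8), that comes to 1024 bytes in tar against 184 in cpio -c.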

You might want to look at the freely available alternatives.  The major
ones are afio, GNU tar, and PAX, each of which has its own extensions
with some backward compatibility.

Les Mikesell
  les@chinet.chi.il.us


----------------------------------- oooOOOooo -------------------------

Thanks to all who replied.
             Kristen Nielsen. 
             E-mail: dmdata@login.dkuug.dk