[net.unix] tar format

barmar@mit-eddie.UUCP (Barry Margolin) (12/03/83)

Thanks to those who responded to my tar format query.  (The Basic-Plus
program that someone sent me was not really useful, though, as I could
barely recognize it as the Basic I remember: Dartmouth standard,
TSS-8, and Microsoft.)  Here is what seems to be the best answer, a
manual page for tar(5) (one of our administrators put it online today,
presumably due to my posting).  This is the nroff source for the manual
page for 4.2:

.TH TAR 5  "15 January 1983"
.SH NAME
tar \- tape archive file format
.SH DESCRIPTION
.IR Tar ,
(the tape archive command)
dumps several files into one, in a medium suitable for transportation.
.PP
A ``tar tape'' or file is a series of blocks.  Each block is of size TBLOCK.
A file on the tape is represented by a header block which describes
the file, followed by zero or more blocks which give the contents of the
file.  At the end of the tape are two blocks filled with binary
zeros, as an end-of-file indicator.  
.PP
The blocks are grouped for physical I/O operations.  Each group of
.I n
blocks (where
.I n
is set by the 
.B b
keyletter on the 
.IR tar (1)
command line \(em default is 20 blocks) is written with a single system
call; on nine-track tapes, the result of this write is a single tape
record.  The last group is always written at the full size, so blocks after
the two zero blocks contain random data.  On reading, the specified or
default group size is used for the
first read, but if that read returns less than a full group, the reduced
group size is used for further reads.
.PP
The header block looks like:
.RS
.PP
.nf
#define TBLOCK	512
#define NAMSIZ	100

union hblock {
	char dummy[TBLOCK];
	struct header {
		char name[NAMSIZ];
		char mode[8];
		char uid[8];
		char gid[8];
		char size[12];
		char mtime[12];
		char chksum[8];
		char linkflag;
		char linkname[NAMSIZ];
	} dbuf;
};
.ta \w'#define 'u +\w'SARMAG 'u
.fi
.RE
.LP
.IR Name
is a null-terminated string.
The other fields are zero-filled octal numbers in ASCII.  Each field
(of width w) contains w-2 digits, a space, and a null, except
.IR size
and
.IR mtime ,
which do not contain the trailing null.
.IR Name
is the name of the file, as specified on the 
.I tar
command line.  Files dumped because they were in a directory which
was named in the command line have the directory name as prefix and
.I /filename
as suffix.
.  \"Whatever format was used in the command line
.  \"will appear here, such as
.  \".I \&./yellow
.  \"or
.  \".IR \&../../brick/./road/.. .
.  \"To retrieve a file from a tar tape, an exact prefix match must be specified,
.  \"including all of the directory prefix information used on the command line
.  \"that dumped the file (if any).
.IR Mode
is the file mode, with the top bit masked off.
.IR Uid
and
.IR gid
are the user and group numbers which own the file.
.IR Size
is the size of the file in bytes.  Links and symbolic links are dumped
with this field specified as zero.
.IR Mtime
is the modification time of the file at the time it was dumped.
.IR Chksum
is an octal ASCII value which represents the sum of all the bytes in the
header block.  When calculating the checksum, the 
.IR chksum
field is treated as if it were all blanks.
.IR Linkflag
is ASCII `0' if the file is ``normal'' or a special file, ASCII `1'
if it is a hard link, and ASCII `2'
if it is a symbolic link.  The name linked-to, if any, is in
.IR linkname ,
with a trailing null.
Unused fields of the header are binary zeros (and are included in the
checksum).
.PP
The first time a given i-node number is dumped, it is dumped as a regular
file.  The second and subsequent times, it is dumped as a link instead.
Upon retrieval, if a link entry is retrieved, but not the file it was
linked to, an error message is printed and the tape must be manually
re-scanned to retrieve the linked-to file.
.PP
The encoding of the header is designed to be portable across machines.
.SH "SEE ALSO"
tar(1)
.SH BUGS
Names or linknames longer than NAMSIZ produce error reports and cannot be
dumped.
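For anyone writing a reader from the description above, here is a minimal C sketch of decoding and checking one header block.  It reuses the union from the manual page; the helper names (octal_field, header_sum, header_ok, is_zero_block, data_blocks) are hypothetical, and it assumes the checksum is a plain sum of the 512 header bytes taken as unsigned values, with the eight chksum bytes counted as blanks.  A reader would stop at the first all-zero block and otherwise skip the computed number of content blocks after each header.

```c
#include <stddef.h>

#define TBLOCK	512
#define NAMSIZ	100

/* Same layout as the header block in the manual page. */
union hblock {
	char dummy[TBLOCK];
	struct header {
		char name[NAMSIZ];
		char mode[8];
		char uid[8];
		char gid[8];
		char size[12];
		char mtime[12];
		char chksum[8];
		char linkflag;
		char linkname[NAMSIZ];
	} dbuf;
};

/* Decode a zero-filled octal ASCII field; stops at the first non-digit. */
static long octal_field(const char *p, size_t width)
{
	long v = 0;
	size_t i = 0;

	while (i < width && p[i] == ' ')	/* tolerate leading blanks */
		i++;
	for (; i < width && p[i] >= '0' && p[i] <= '7'; i++)
		v = v * 8 + (p[i] - '0');
	return v;
}

/* Sum all header bytes as unsigned values, counting chksum as blanks. */
static long header_sum(const union hblock *hb)
{
	const unsigned char *b = (const unsigned char *)hb->dummy;
	size_t lo = offsetof(struct header, chksum);
	size_t hi = lo + sizeof hb->dbuf.chksum;
	long sum = 0;
	size_t i;

	for (i = 0; i < TBLOCK; i++)
		sum += (i >= lo && i < hi) ? ' ' : b[i];
	return sum;
}

/* Nonzero if the stored checksum matches the recomputed one. */
static int header_ok(const union hblock *hb)
{
	return octal_field(hb->dbuf.chksum, sizeof hb->dbuf.chksum)
	    == header_sum(hb);
}

/* Nonzero for a block of binary zeros (the end-of-archive marker). */
static int is_zero_block(const union hblock *hb)
{
	size_t i;

	for (i = 0; i < TBLOCK; i++)
		if (hb->dummy[i] != 0)
			return 0;
	return 1;
}

/* Number of TBLOCK-sized content blocks that follow this header. */
static long data_blocks(const union hblock *hb)
{
	long size = octal_field(hb->dbuf.size, sizeof hb->dbuf.size);

	return (size + TBLOCK - 1) / TBLOCK;
}
```

Summing as unsigned is a choice made for this sketch; a writer and reader only agree if both sum the bytes the same way.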
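To make the blocking rules concrete, a small C sketch of how much space an archive takes on the medium.  The function names are hypothetical; it assumes each file costs one header block plus its rounded-up content blocks, that the archive ends with the two zero blocks, and that the last group of n blocks is written at full size.  For example, a single 1000-byte file needs 1 + 2 + 2 = 5 blocks, which with the default of 20 still goes out as one full 10240-byte record.

```c
#define TBLOCK	512

/*
 * Blocks an archive occupies before padding: one header plus the content
 * blocks for each file, then the two zero blocks at the end.
 * sizes[] holds each file's size in bytes (0 for links).
 */
static long archive_blocks(const long *sizes, int nfiles)
{
	long blocks = 2;	/* the two trailing zero blocks */
	int i;

	for (i = 0; i < nfiles; i++)
		blocks += 1 + (sizes[i] + TBLOCK - 1) / TBLOCK;
	return blocks;
}

/*
 * Bytes actually written when each group of nblock blocks goes out in a
 * single write and the last group is padded to the full group size.
 */
static long archive_bytes(long blocks, int nblock)
{
	long groups = (blocks + nblock - 1) / nblock;

	return groups * nblock * TBLOCK;
}
```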
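The hard-link rule (first occurrence of an i-node dumped as a file, later ones as links) can be sketched in C as a table lookup.  This is only an illustration: the fixed-size table, MAXLINKS, and seen_before() are hypothetical, and it keys on device and i-node number together, since i-node numbers alone repeat across file systems.  Only files whose link count is above one can be met twice, so only those need to be looked up.

```c
#include <string.h>

#define NAMSIZ		100
#define MAXLINKS	64	/* hypothetical table size for this sketch */

struct linkent {
	unsigned long dev, ino;	/* identifies the i-node */
	char name[NAMSIZ];	/* name under which it was first dumped */
};

static struct linkent links[MAXLINKS];
static int nlinks;

/*
 * Returns the name the i-node was first dumped under, so the caller can
 * write a link entry (linkflag '1') pointing at it.  Returns NULL the
 * first time an i-node is seen; it is then remembered and the caller
 * dumps it as a regular file.
 */
static const char *seen_before(unsigned long dev, unsigned long ino,
    const char *name)
{
	int i;

	for (i = 0; i < nlinks; i++)
		if (links[i].dev == dev && links[i].ino == ino)
			return links[i].name;
	if (nlinks < MAXLINKS) {
		links[nlinks].dev = dev;
		links[nlinks].ino = ino;
		strncpy(links[nlinks].name, name, NAMSIZ - 1);
		links[nlinks].name[NAMSIZ - 1] = '\0';
		nlinks++;
	}
	return NULL;
}
```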
-- 
			Barry Margolin
			ARPA: barmar@MIT-Multics
			UUCP: ..!genrad!mit-eddie!barmar

jsdy@hadron.UUCP (Joseph S. D. Yao) (12/27/85)

In article <358@ukecc.UUCP> edward@ukecc.UUCP (Edward C. Bennett) writes:
>	I'm sure this has been beaten to death before so please MAIL
>your responses.
Strangely, it hasn't been, so I'll post my response.

>	When BSD tar creates an archive of a directory, it (by default)
>writes a tar header block for the directory itself. SysV tar doesn't do
>this. Is one way more 'correct' than the other? Is there an actual "standard"
>for tar written down somewhere? Should I bother to hack my SysV tar to
>write directory blocks?
>UUCP: ihnp4!cbosgd!ukma!ukecc!edward

On 4BSD, the 'o' flag specifies that directory blocks not be written
"for compatibility with previous versions."  This is not to be confused
with s5's 'o' flag which specifies on input that the files' ownership
not be changed to match the tape.  Personally, I prefer that the
directory information be passed, but so far I haven't bothered to hack
s5 tar (and make it non-SV-standard).

I'm not sure whether SVID addresses it (I'm pretty sure it doesn't).
However, the IEEE OS standard has a tape information interchange
standard that looks remarkably, I am told, like tar.	;-)
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}

edward@ukecc.UUCP (Edward C. Bennett) (01/02/86)

	I'm sure this has been beaten to death before so please MAIL
your responses.

	When BSD tar creates an archive of a directory, it (by default)
writes a tar header block for the directory itself. SysV tar doesn't do
this. Is one way more 'correct' than the other? Is there an actual "standard"
for tar written down somewhere? Should I bother to hack my SysV tar to
write directory blocks?

-- 
Edward C. Bennett

UUCP: ihnp4!cbosgd!ukma!ukecc!edward

/* A charter member of the Scooter bunch */

"Goodnight M.A."

jsq@im4u.UUCP (John Quarterman) (01/05/86)

In article <165@hadron.UUCP> jsdy@hadron.UUCP (Joseph S. D. Yao) writes:
>I'm not sure whether SVID addresses it (I'm pretty sure it doesn't).
>However, the IEEE OS standard has a tape information interchange
>standard that looks remarkably, I am told, like tar.	;-)

The current IEEE P1003 draft standard (Draft 6) includes a data interchange
format which is modeled after tar.  It has some extensions to the V7 one,
including a format for directory entries on the tape which is the same
as the one 4BSD uses:  like plain files, but with a slash on the end of
the filename.
-- 
John Quarterman, UUCP:  {gatech,harvard,ihnp4,pyramid,seismo}!ut-sally!im4u!jsq
ARPA Internet and CSNET:  jsq@im4u.UTEXAS.EDU, jsq@sally.UTEXAS.EDU

gnu@hoptoad.uucp (John Gilmore) (01/10/86)

In article <714@im4u.UUCP>, jsq@im4u.UUCP (John Quarterman) writes:
> The current IEEE P1003 draft standard (Draft 6) includes a data interchange
> format which is modeled after tar.  It has some extensions to the V7 one,
> including a format for directory entries on the tape which is the same
> as the one 4BSD uses:  like plain files, but with a slash on the end of
> the filename.

Hmm...my copy of Draft 6 defines a new "file type" 5 for directories,
which is incompatible with both Berkeley and V7 Unixes.  It mentions
that some systems put out tapes with these trailing-slash files, but
doesn't say whether a standard system is required to do anything
about that.  (From P1003.D6.doc:)

> Implementors should be aware that the previous file format did not include
> a mechanism to archive directory type files.  For this reason, the
> convention of using a file name which ended with a slash (/) was adopted
> to specify the archiving of a directory.

Maybe jsq is referring to a draft later than D6?
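A reader that wants to accept both conventions discussed here could test for either one.  This C sketch assumes the Draft 6 type is stored as the ASCII character '5' in the linkflag byte (as in the later ustar format); is_directory_entry() is a hypothetical name, and the name field is treated as possibly lacking a terminating null when it is exactly NAMSIZ bytes long.

```c
#include <string.h>

#define NAMSIZ	100

/*
 * Nonzero if a header describes a directory under either convention:
 * a type byte of '5' (P1003 Draft 6) or a name ending in '/' (4BSD).
 */
static int is_directory_entry(const char name[NAMSIZ], char linkflag)
{
	const char *end = memchr(name, '\0', NAMSIZ);
	size_t len = end ? (size_t)(end - name) : NAMSIZ;

	if (linkflag == '5')
		return 1;
	return len > 0 && name[len - 1] == '/';
}
```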

jsq@im4u.UUCP (John Quarterman) (01/11/86)

In article <420@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>Maybe jsq is referring to a draft later than D6?

No, it changed between D5 and D6 and I didn't notice.

There's a P1003 committee meeting next week, Monday through Wednesday,
13-15 January at the Denver Marriott City Center Hotel, just before USENIX.
That will be nearly the last chance for input before the Trial Use standard.
-- 
John Quarterman, UUCP:  {gatech,harvard,ihnp4,pyramid,seismo}!ut-sally!im4u!jsq
ARPA Internet and CSNET:  jsq@im4u.UTEXAS.EDU, jsq@sally.UTEXAS.EDU