[comp.sys.amiga] PD tar, part 2

dpz@klinzhai.RUTGERS.EDU (David P. Zimmerman) (01/16/87)

Hello all,

	Here is shar #2 of 2.  Enjoy!

						dpz

#	This is a shell archive.
#	Remove everything above and including the cut line.
#	Then run the rest of the file through sh.
#----cut here-----cut here-----cut here-----cut here----#
#!/bin/sh
# shar:    Shell Archiver
#	Run the following text with /bin/sh to create:
#	Makefile
#	PORTING
#	README
#	TODO
#	tar.1
#	tar.5
# This archive created: Thu Jan 15 21:01:40 1987
cat << \SHAR_EOF > Makefile
# Makefile for public domain tar program.
# @(#)Makefile 1.13	86/10/29

# Berserkeley version
DEFS = -DBSD42
LIBS = 
LINTFLAGS = -abchx
DEF_AR_FILE = \"/dev/rmt8\"
DEFBLOCKING = 20

# USG version
#DEFS = -DUSG
#LIBS = -lndir
#LINTFLAGS = -bx
#DEF_AR_FILE = \"/dev/rmt8\"
#DEFBLOCKING = 20

# UniSoft's Uniplus SVR2 with NFS
#DEFS = -DUSG -DUNIPLUS -DNFS -DSVR2
#LIBS = -lndir
#LINTFLAGS = -bx
#DEF_AR_FILE = \"/dev/rmt8\"
#DEFBLOCKING = 20

# V7 version
#DEFS = -DV7 -Dvoid=int
#LIBS = -lndir
#LINTFLAGS = -abchx
#DEF_AR_FILE = \"/dev/rmt8\"
#DEFBLOCKING = 20

CFLAGS = $(COPTS) $(DEFS) \
	-DDEF_AR_FILE=$(DEF_AR_FILE) \
	-DDEFBLOCKING=$(DEFBLOCKING)
# next line for Debugging
#COPTS = -g
# next line for Production
COPTS = -O

# Add things here like getopt, readdir, etc that aren't in your
# standard libraries.
SUBSRC=	
SUBOBJ=	

SRCS =	tar.c create.c extract.c buffer.c getoldopt.c list.c names.c \
	port.c $(SUBSRC)
OBJS = 	tar.o create.o extract.o buffer.o getoldopt.o list.o names.o \
	port.o $(SUBOBJ)
AUX =	README PORTING Makefile TODO tar.1 tar.5 tar.h port.h

tar:	$(OBJS)
	cc -o tar $(COPTS) $(OBJS) $(LIBS)

lint:	$(SRCS)
	lint $(LINTFLAGS) $(CFLAGS) $(SRCS)

clean:
	rm -f errs *.o tar

tar.shar: $(SRCS) $(AUX)
	shar >tar.shar $(AUX) $(SRCS)

tar.tar.Z: $(SRCS) $(AUX)
	/bin/tar cf - $(AUX) $(SRCS) | compress -v >tar.tar.Z

$(OBJS): tar.h port.h
SHAR_EOF
cat << \SHAR_EOF > PORTING
		Porting hints for public domain tar
		  John Gilmore, ihnp4!hoptoad!gnu
		     @(#)PORTING 1.5	86/10/29

The Makefile should be edited to comment out all the undesired
versions, and create the following configuration lines for the system
you are compiling it on:

DEFS = the proper #define's to conditionally compile for
your system.
LIBS = the system libraries and/or object modules to link
with the program.
LINTFLAGS = a good strong way to invoke 'lint' on your system.
DEF_AR_FILE = the name of the default archive file on your system.
It should be enclosed in quoted quotes, e.g. \"/dev/foo\" .
DEFBLOCKING = the default blocking factor on your system.

A copy of "getopt", the standard argument parser, is required.  It's in
libc on Missed'em V systems and 4.3BSD; on most other systems, you'll
need a copy of a public domain getopt, available through mod.sources
or the AT&T Toolchest if you can't find it elsewhere.

A copy of the Berkeley directory access routines is also required.
These are in libc and <sys/dir.h> on Berkeley systems.  A public domain
version is available through mod.sources.  There is an #include you
have to change in create.c for this, to set the name of the include
file you have.  Some systems have the include file in <sys/ndir.h>.
You'll have to find it on your system, or get the public domain one
and place it somewhere.

Grep for FIXME to find places that aren't finished or which have
portability problems.  Also see the file TODO.
SHAR_EOF
cat << \SHAR_EOF > README
This is a public domain tar(1) replacement.  It implements the 'c',
'x', and 't' commands of Unix tar, and many of the options.  It creates
P1003 "Unix Standard" [draft 6] tapes by default, and can read and
write both old and new formats.  It can decompress tar archives when
reading them from disk files (using the 'z' option), but cannot do so
when writing, or when reading from a tape drive.  Its verbose output
looks more like "ls -l" than the Unix tar, and even lines up the
columns.  It is a little better at reading damaged tapes than Unix tar.

It is designed to be a lot more efficient than the standard Unix tar;
it does as little bcopy-ing as possible, and does file I/O in large
blocks.  On the other hand, it has not been timed or performance-tuned;
it's just *designed* to be faster.

On the Sun, the tar archives it creates under the 'old' option are
byte-for-byte the same as those created by /bin/tar, except the trash
at the end of each file and at the end of the archive.

It was written and initially debugged on a Sun Workstation running
4.2BSD.  It has been run on Xenix, Unisoft, Vax 4.2BSD, V7, and USG
systems.  I'm interested in finding people who will port it to other
types of (Unix and non-Unix) systems, use it, and send back the
changes; and people who will add the obscure tar options that they
happen to use and I don't.  In particular, VMS, MSDOS, Mac, Atari and
Amiga versions would be handy.

It still has a number of loose ends, marked by "FIXME" comments in the
source.  For example, it does not chown() extracted files.  Fixes to
these things are also welcome.

I am the author of all the code in this program.  I hereby place it in
the public domain.  If you modify it, or port it to another system,
please send me back a copy, so I can keep a master source.

	John Gilmore
	Nebula Consultants
	1504 Golden Gate Ave.
	San Francisco, California, USA  94115
	+1 415 931 4667		voice
	hoptoad!gnu		data
	jgilmore@lll-crg.arpa	data
Hoptoad talks to sun, ptsfa, well, lll-crg, ihnp4, cbosgd, ucsfcgl, pyramid.

@(#)README 1.5	86/10/29
SHAR_EOF
cat << \SHAR_EOF > TODO
@(#) TODO 1.6 86/10/29

Install new mkdir from the net for non-Berkeley systems.

Look at SUID, SGID; look at -p and -m options.  (test them).

Handle owner/group on extraction.

creation of links and symlinks doesn't follow the -k (f_keep) guidelines;
if the file already exists, it is not replaced, even though no -k.

Check stderr and stdout for errors after writing, and quit if so.

Compression option to automatically pipe thru compress (both input&output).
(Need a 3rd process to reblock compress's output for output case, and when
reading from tape drives.)

Preliminary design of Multifile option to handle EOFs on input and output.
Multifile can just write EOF when it hits end of archive, and ask for
archive to be changed.  Start off 2nd archive medium with odd header
block, duplicating original, but with offset to start of data spec'd.
Reading such a header causes tar non-'M' to complain while extracting
(but to seek there and do it anyway!)  Big win -- this works on
cartridge tapes, should work on floppies, might work on magtape.
It would encourage the *&%#$ systems programmers to fix their drivers, too!

Profile it and see where the time, call counts, etc are going.

Test reading compressed tapes with odd blocksizes.
(real tape drives, that is...)
(may need buffer proc no matter what.)

Fix directory timestamps after inserting files into them.  Wait til next
file that's not in the directory.  Need a stack of them.

Add option to delete N matching(?) chars from the front of a file to
be extracted/listed.  Great for reading tapes written with names starting
from "/"...

Option to seek the input file (in skip_file) rather than reading
and tossing it?  (Could just jump in buffer if stuff is in core.)
Could misalign archive reads versus filesys and slow it down, who knows?

Add -C option for creating from odd directories a la 4.2BSD?

Break out odd bits of code into separate support modules.

Add the r, u, X, l, F, C, and digit options of Unix tar.
SHAR_EOF
cat << \SHAR_EOF > tar.1
.TH TAR 1 "31 October 1986"
.SH NAME
tar \- tape (or other media) file archiver
.SH SYNOPSIS
\fBtar\fP \-[\fBBcDhikmopstvxzZ\fP]
[\fB\-b\fP \fIN\fP]
[\fB\-f\fP \fIF\fP]
[\fB\-T\fP \fIF\fP]
[ \fIfilename\fP\| .\|.\|.  ]
.SH DESCRIPTION
\fItar\fP provides a way to store many files into a single archive,
which can be kept in another Unix file, stored on an I/O device
such as tape, floppy, cartridge, or disk, or piped to another program.
It is useful for making backup copies, or for packaging up a set of
files to move them to another system.
.LP
\fItar\fP has existed since Version 7 Unix with very little change.
It has been proposed as the standard format for interchange of files
among systems that conform to the P1003 ``Portable Operating System''
standard.
.LP
This version of \fItar\fP supports the extensions which
were proposed in the P1003 draft standards, including owner and group
names, and support for named pipes, fifos, and block and character devices.
.LP
When reading an archive, this version of \fItar\fP continues after
finding an error.  Previous versions required the `i' option to ignore
checksum errors.
.SH OPTIONS
\fItar\fP options can be specified in either of two ways.  The usual
Unix conventions can be used: each option is preceded by `\-'; arguments
directly follow each option; multiple options can be combined behind one `\-'
as long as they take no arguments.  For compatability with the Unix
\fItar\fP program, the options may also be specified as ``keyletters,''
wherein all the option letters occur in the first argument to \fItar\fP,
with no `\-', and their arguments, if any, occur in the second, third, ...
arguments.  Examples:
.LP
Normal:  tar -f arcname -cv file1 file2
.LP
Old:  tar fcv arcname file1 file2
.LP
At least one of the \fB\-c\fP, \fB\-t\fP, or \fB\-x\fP options
must be included.  The rest are optional.
.LP
Files to be operated upon are specified by a list of file names, which
follows the option specifications (or can be read from a file by the
\fB\-T\fP option).  Specifying a directory name causes that directory
and all the files it contains to be (recursively) processed.  In general,
specifying full path names when creating an archive is a bad idea,
since when the files are
extracted, they will have to be extracted into exactly where they were
dumped from.  Instead, \fIcd\fP to the
root directory and use relative file names.
.IP "\fB\-b\fP \fIN\fP"
Specify a blocking factor for the archive.  The block size will be
\fIN\fP x 512 bytes.  Larger blocks typically run faster and let you
fit more data on a tape.  The default blocking factor is set when
\fItar\fP is compiled, and is typically 20.  There is no limit to the
maximum block size, as long as enough memory can be allocated for it,
and as long as the device containing the archive can read or write
that block size.
.IP \fB\-B\fP
When reading an archive, reblock it as we read it.
Normally, \fItar\fP reads each
block with a single \fIread(2)\fP system call.  This does not work
when reading from a pipe or network socket under Berkeley Unix.
With this option, it
will do multiple \fIread(2)\fPs until it gets enough data to fill 
the specified block size.  \fB\-B\fP can also be used to speed up
the reading of tapes that were written with small blocking factors,
by specifying a large blocking factor with \fB\-b\fP and having \fItar\fP
read many small blocks into memory before it tries to process them.
.IP \fB\-c\fP
Create an archive from a list of files.
.IP \fB\-D\fP
With each message that \fItar\fP produces, print the record number
within the archive where the message occurred.  This option is especially
useful when reading damaged archives, since it helps to pinpoint the damaged
section.
.IP "\fB\-f\fP \fIF\fP"
Specify the filename of the archive.  If the specified filename is ``\-'',
the archive is read from the standard input or written to the standard output.
If this option is not used,
a default archive name (which was picked when tar was compiled) is used.
The default is normally set to the ``first'' tape drive or other transportable
I/O medium on the system.
.IP \fB\-h\fP
When creating an archive, if a symbolic link is encountered, dump
the file or directory to which it points, rather than
dumping it as a symbolic link.
.IP \fB\-i\fP
When reading an archive, ignore blocks of zeros in the archive.  Normally
a block of zeros indicates the end of the archive,
but in a damaged archive, or one which was
created by appending several archives, this option allows \fItar\fP to 
continue.  It is not on by default because there is garbage written after the
zeroed blocks by the Unix \fItar\fP program.
.IP \fB\-k\fP
When extracting files from an archive, keep existing files, rather than
overwriting them with the version from the archive.
.IP \fB\-m\fP
When extracting files from an archive, set each file's modified timestamp
to the current time, rather than extracting each file's modified
timestamp from the archive.
.IP \fB\-o\fP
When creating an archive, write an old format archive, which does not
include information about directories, pipes, or device files, and 
specifies file ownership by uid's and gid's rather than by
user names and group names.  In most cases, a ``new'' format archive
can be read by an ``old'' tar program without serious trouble, so this
option should seldom be needed.
.IP \fB\-p\fP
When extracting files from an archive, restore them to the same permissions
that they had in the archive.  If \fB\-p\fP is not specified, the current
umask limits the permissions of the extracted files.  See \fIumask(2)\fP.
.IP \fB\-t\fP
List a table of contents of an existing archive.  If file names are
specified, just list files matching the specified names.
.IP \fB\-s\fP
When specifying a list of filenames to be listed
or extracted from an archive,
the \fB\-s\fP flag specifies that the list
is sorted into the same order as the tape.  This allows a large list
to be used, even on small machines, because
the entire list need not be read into memory at once.  Such a sorted
list can easily be created by running ``tar \-t'' on the archive and
editing its output.
.IP "\fB\-T\fP \fIF\fP"
Rather than specifying the file names to operate on as arguments to
the \fItar\fP command, this option specifies that the file names should
be read from the file \fIF\fP, one per line.
If the file name specified is ``\-'',
the list is read from the standard input.
This option, in conjunction with the \fB\-s\fP option,
allows an arbitrarily large list of files to be processed, 
and allows the list to be piped to \fItar\fP.
.IP \fB\-v\fP
Be verbose about the files that are being processed or listed.  Normally,
archive creation or file extraction are silent, and archive listing just
gives file names.  The \fB\-v\fP option causes an ``ls \-l''\-like listing
to be produced.
.IP \fB\-x\fP
Extract files from an existing archive.  If file names are
specified, just extract files matching the specified names, otherwise extract
all the files in the archive.
.IP "\fB\-z\fP or \fB\-Z\fP"
When extracting or listing an archive,
these options specify
that the archive should be decompressed while it is read, using the \-d
option of the \fIcompress(1)\fP program.  The archive itself is not
modified.
.SH "SEE ALSO"
shar(1), tar(5), compress(1), ar(1), arc(1), cpio(1), dump(8), restore(8),
restor(8)
.SH BUGS
The \fBr, u, w, X, l, F, C\fP, and \fIdigit\fP options of Unix \fItar\fP
are not supported.
.LP
It should be possible to create a compressed archive with the \fB\-z\fP option.
.LP
SHAR_EOF
cat << \SHAR_EOF > tar.5
.TH TAR 5 "31 October 1986"
.SH NAME
tar \- tape (or other media) archive file format
.SH DESCRIPTION
A ``tar tape'' or file contains a series of records.  Each record contains
TRECORDSIZE bytes (see below).  Although this format may be thought of as
being on magnetic tape, other media are often used.
Each file archived is represented by a header record
which describes the file, followed by zero or more records which give the
contents of the file.  At the end of the archive file there may be a record
filled with binary zeros as an end-of-file indicator.  A conforming
system must write a record of zeros at the end, but must not assume that
an end-of-file record exists when reading an archive.

The records may be blocked for physical I/O operations.  Each block of
\fIN\fP records (where \fIN\fP is set by the \fB\-b\fP option to \fItar\fP)
is written with a single write() operation.  On
magnetic tapes, the result of such a write is a single tape record.
When writing an archive, the last block of records shall be written
at the full size, with records after the zero record containing
undefined data.  When reading an archive, a confirming system shall
properly handle an archive whose last block is shorter than the rest.

The header record is defined in the header file <tar.h> as follows:
.nf
.sp .5v
.DT
/*
 * Standard Archive Format - Standard TAR - USTAR
 */
#define	RECORDSIZE	512
#define	NAMSIZ	100
#define	TUNMLEN	32
#define	TGNMLEN	32

union record {
	char		charptr[RECORDSIZE];
	struct header {
		char	name[NAMSIZ];
		char	mode[8];
		char	uid[8];
		char	gid[8];
		char	size[12];
		char	mtime[12];
		char	chksum[8];
		char	linkflag;
		char	linkname[NAMSIZ];
		char	magic[8];
		char	uname[TUNMLEN];
		char	gname[TGNMLEN];
		char	devmajor[8];
		char	devminor[8];
	} header;
};

/* The checksum field is filled with this while the checksum is computed. */
#define	CHKBLANKS	"        "		/* 8 blanks, no null */

/* The magic field is filled with this if uname and gname are valid. */
#define	TMAGIC	"ustar  "		/* 7 chars and a null */

/* The linkflag defines the type of file */
#define	LF_OLDNORMAL '\\0'		/* Normal disk file, Unix compatible */
#define	LF_NORMAL	'0'		/* Normal disk file */
#define	LF_LINK	'1'		/* Link to previously dumped file */
#define	LF_SYMLINK	'2'		/* Symbolic link */
#define	LF_CHR	'3'		/* Character special file */
#define	LF_BLK	'4'		/* Block special file */
#define	LF_DIR		'5'		/* Directory */
#define	LF_FIFO	'6'		/* FIFO special file */
#define	LF_CONTIG	'7'		/* Contiguous file */
/* Further link types may be defined later. */

/* Bits used in the mode field - values in octal */
#define	TSUID		04000		/* Set UID on execution */
#define	TSGID		02000		/* Set GID on execution */
#define	TSVTX		01000		/* Save text (sticky bit) */

/* File permissions */
#define	TUREAD	00400		/* read by owner */
#define	TUWRITE	00200		/* write by owner */
#define	TUEXEC	00100		/* execute/search by owner */
#define	TGREAD	00040		/* read by group */
#define	TGWRITE	00020		/* write by group */
#define	TGEXEC	00010		/* execute/search by group */
#define	TOREAD	00004		/* read by other */
#define	TOWRITE	00002		/* write by other */
#define	TOEXEC	00001		/* execute/search by other */
.fi
.LP
All characters in header records
are represented using 8-bit characters in the local
variant of ASCII.
Each field within the structure is contiguous; that is, there is
no padding used within the structure.  Each character on the archive medium
is stored contiguously.

Bytes representing the contents of files (after the header record
of each file) are not translated in any way and
are not constrained to represent characters or to be in any character set.
The \fItar\fP(5) format does not distinguish text files from binary
files, and no translation of file contents should be performed.

The fields \fIname, linkname, magic, uname\fP, and \fIgname\fP are
null-terminated
character strings.  All other fields are zero-filled octal numbers in
ASCII.  Each numeric field (of width \fIw\fP) contains \fIw\fP-2 digits, a space, and
a null, except \fIsize\fP and \fImtime\fP,
which do not contain the trailing null.

The \fIname\fP field is the pathname of the file, with directory names
(if any) preceding the file name, separated by slashes.

The \fImode\fP field provides nine bits specifying file permissions and three
bits to specify the Set UID, Set GID and Save Text (TSVTX) modes.  Values
for these bits are defined above.  When special permissions are required
to create a file with a given mode, and the user restoring files from the
archive does not hold such permissions, the mode bit(s) specifying those
special permissions are ignored.  Modes which are not supported by the
operating system restoring files from the archive will be ignored.
Unsupported modes should be faked up when creating an archive; e.g.
the group permission could be copied from the `other' permission.

The \fIuid\fP and \fIgid\fP fields are the user and group ID of the file owners,
respectively.

The \fIsize\fP field is the size of the file in bytes; linked files are archived
with this field specified as zero.

The \fImtime\fP field is the modification time of the file at the time it was
archived.  It is the ASCII representation of the octal value of the
last time the file was modified, represented as in integer number of
seconds since January 1, 1970, 00:00 Coordinated Universal Time.

The \fIchksum\fP field is the ASCII representaion of the octal value of the
simple sum of all bytes in the header record.  Each 8-bit byte in the
header is treated as an unsigned value.  These values are added to an
unsigned integer, initialized to zero, the precision of which shall be no
less than seventeen bits.  When calculating the checksum, the \fIchksum\fP
field is treated as if it were all blanks.

The \fItypeflag\fP field specifies the type of file archived.  If a particular
implementation does not recognize or permit the specified type, the file
will be extracted as if it were a regular file.  As this action occurs,
\fItar\fP issues a warning to the standard error.
.IP "LF_NORMAL or LF_OLDNORMAL"
represents a regular file.
For backward compatibility, a \fItypeflag\fP value of LF_OLDNORMAL
should be silently recognized as a regular file.  New archives should
be created using LF_NORMAL.
Also, for backward
compatability, \fItar\fP treats a regular file whose name ends
with a slash as a directory.
.IP LF_LINK
represents a file linked to another file, of any type,
previously archived.  Such files are identified (in Unix) by each file
having the same device and inode number.  The linked-to
name is specified in the \fIlinkname\fP field with a trailing null.
.IP LF_SYMLINK
represents a symbolic link to another file.  The linked-to
name is specified in the \fIlinkname\fP field with a trailing null.
.IP "LF_CHR or LF_BLK"
represent character special files and block
special files respectively.
In this case the \fIdevmajor\fP and \fIdevminor\fP
fields will contain the
major and minor device numbers respectively.  Operating
systems may map the device specifications to their own local
specification, or may ignore the entry.
.IP LF_DIR
specifies a directory or sub-directory.  The directory name
in the \fIname\fP field should end with a slash.
On systems where
disk allocation is performed on a directory basis the \fIsize\fP
field will contain the maximum number of bytes (which may be
rounded to the nearest disk block allocation unit) which the
directory may hold.  A \fIsize\fP field of zero indicates no such
limiting.  Systems which do not support limiting in this
manner should ignore the \fIsize\fP field.
.IP LF_FIFO
specifies a FIFO special file.  Note that the archiving of
a FIFO file archives the existence of this file and not its
contents.
.IP LF_CONTIG
specifies a contiguous file, which is the same as a normal
file except that, in operating systems which support it,
all its space is allocated contiguously on the disk.  Operating
systems which do not allow contiguous allocation should silently treat
this type as a normal file.
.IP "`A' \- `Z'"
are reserved for custom implementations.  None are used by this
version of the \fItar\fP program.
.IP \fIother\fP
values are reserved for specification in future revisions of the
P1003 standard, and should not be used by any \fItar\fP program.
.LP
The \fImagic\fP field indicates that this archive was output in the P1003
archive format.  If this field contains TMAGIC, then the
\fIuname\fP and \fIgname\fP
fields will contain the ASCII representation of the owner and group of the
file respectively.  If found, the user and group ID represented by these
names
will be used rather than the values contained
within the \fIuid\fP and \fIgid\fP fields.
User names longer than TUNMLEN-1 or group
names longer than TGNMLEN-1 characters will be truncated.
.SH "SEE ALSO"
tar(1), ar(5), cpio(5)
.SH BUGS
Names or link names longer than NAMSIZ-1 characters cannot be archived.

This format does not address multi-volume archives.
.SH NOTES
This manual page was adapted by John Gilmore
from Draft 6 of the P1003 specification
SHAR_EOF
#	End of shell archive
exit 0
-- 
David P. Zimmerman	"When I'm having fun, the world doesn't exist."

Arpa: dpz@rutgers.rutgers.edu
Uucp: ...{harvard | seismo | pyramid}!rutgers!dpz