[net.sources] Unix pgm to read DECSYSTEM-20 Dumper/Archive Tapes

guyton@randvax.ARPA (Jim Guyton) (01/19/84)

This has been of some value to me; I hope others can make
use of it.

-- Jim Guyton ( ...decvax!randvax!guyton)
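
Editor's note: the heart of dump.h and read20.c is pulling bit fields out of
36-bit PDP-10 words that arrive as 5 tape bytes each (32 bits in the first
4 bytes, the last 4 bits in the low nibble of the 5th byte). The following is
a minimal C99 sketch of that idea for readers on a modern compiler. It is not
the read20 code itself: the names word36, field36 and record_type36 are
hypothetical, and it assumes 64-bit integers are available so that a whole
36-bit word fits in one variable.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble one 36-bit word from 5 tape bytes: the first 4 bytes hold the
   high 32 bits, the low 4 bits of the 5th byte hold the remaining 4. */
uint64_t word36(const unsigned char *block, int wordoff)
{
    const unsigned char *p = block + 5 * wordoff;
    uint64_t w = 0;
    int i;

    for (i = 0; i < 4; i++)
        w = (w << 8) | p[i];
    return (w << 4) | (p[4] & 017);
}

/* Extract bitlen bits, starting bitoff bits from the most significant
   end of the word, and return them right justified. */
uint64_t field36(const unsigned char *block, int wordoff,
                 int bitoff, int bitlen)
{
    assert(bitoff >= 0 && bitlen >= 1 && bitoff + bitlen <= 36);
    uint64_t w = word36(block, wordoff);
    uint64_t mask = (UINT64_C(1) << bitlen) - 1;

    return (w >> (36 - bitoff - bitlen)) & mask;
}

/* The record type lives in word 4 and is stored negated, so sign-extend
   the 36-bit two's complement value and negate it. */
int record_type36(const unsigned char *block)
{
    int64_t v = (int64_t)field36(block, 4, 0, 36);

    if (v & (INT64_C(1) << 35))
        v -= INT64_C(1) << 36;
    return (int)-v;
}
```

With 64-bit arithmetic a full 36-bit field needs no special case, and a
7-bit text character is simply field36(block, wordoff, 7*i, 7) for i = 0..4.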
----------------------------------------------------------
#
#  Sources to "read20", a utility to read Dumper-format tapes
#  under Unix (4.1bsd is the only version tested).
#
#  Run this as a C shell script to extract the three
#  files, read20.1, dump.h and read20.c
#
#               -- Jim Guyton           8/83
#
echo "Read20 sources and manual page"
#
echo "Making read20.1"
cat > read20.1 << //EOF
.TH READ20 1 RAND
.SH NAME
read20 \- read a tape produced by the TOPS20 Dumper Program
.SH SYNOPSIS
.B read20
\%[\-f tapefile] \%[\-t] \%[\-n number] \%[\fIstring\fR]
.SH DESCRIPTION
.I Read20
reads tapes produced by the TOPS20 backup and archival program
\fIDumper20\fR, producing directory listings and
optionally extracting files.

If no \fIstring\fR is specified, just a directory of the tape
is produced.  Otherwise, every file that contains \fIstring\fR
in its filename
will be extracted to the current working directory.
The UNIX filenames of extracted files are generated from the
filename on the tape by stripping off the device name, directory
names and the version number.

Note that \fIstring\fR is
.B not
a generalized
pattern, just a string that is matched against all the characters
in the filenames.  Special characters (such as '\<' and '\>') must
be quoted to get past the shell.

For every file, the directory listing prints
.IP
the size in pages of the file (on the twenty)
.br
the number of bytes in the file
.br
the \fIbytesize\fR (or number of bits per byte) of the file
.br
the time and date the file was last modified
.br
the full pathname of the file.

.LP
This program currently only extracts text files.  It decides
whether or not a file is text by examining the
\fIbytesize\fR of the file.  If it has 7-bit bytes, it is assumed to
be a text file.  If it is anything else, it is assumed to be
a binary file and requests to extract it are ignored.

Occasionally, text files have a \fIbytesize\fR of 36 instead of the proper
\fIbytesize\fR of 7.
The \-t flag forces these files to be considered as
text files and extracted (but only if their names contain the \fIstring\fR).

Alternate tape devices may be specified with the \-f flag.

Occasionally, the first 14 characters of several extracted filenames
are the same.  If this is the case, use the \-n flag.  This will cause
the Unix filenames to be numeric (starting with the number following
the \-n).  The mapping between the numeric Unix filenames and the
original Tops-20 names is appended to the file "Logfile".

.SH DEFAULTS
/dev/rmt0 is the default tapefile.
.SH AUTHOR
Jim Guyton, The Rand Corporation
.SH BUGS
The
.I string
must be upper case to match the filename on the tape.

There is no option yet for discarding the carriage-return
from the newline sequence.  This can be done with most
editors or with the command
.IP
tr \-d '\\\\015' < file > newfile
.LP

TOPS20 allows longer filenames than the current version of UNIX.
Very long filenames are truncated as they are extracted, which
opens up the possibility of multiple tape files being
written to the same UNIX file.  There is currently no checking
done to prevent this.

Files which span tape boundaries are handled poorly.  To extract
such a file, extract each piece and then combine the files under
UNIX.

Files which have been
.I archived
can show up in directory listings with a
.I bytesize
of zero.

When extracting files, you still get a full directory listing;
this has led some folks to believe they were extracting the
entire tape.
//EOF
#
echo "Making dump.h"
cat > dump.h << //EOF
#ifdef COMMENT

	F O R M A T   O F   D U M P E R   T A P E S
	===========================================


EACH PHYSICAL RECORD WRITTEN BY DUMPER CONTAINS ONE OR MORE
LOGICAL RECORDS, EACH OF WHICH IS 518 (1006 OCTAL) WORDS LONG.

EACH LOGICAL RECORD HAS THE FOLLOWING FORMAT:

	!=======================================================!
CHKSUM  !          CHECKSUM OF ENTIRE 518-WORD RECORD           !  +0
	!-------------------------------------------------------!
ACCESS  !         PAGE ACCESS BITS (CURRENTLY NOT USED)         !  +1
	!-------------------------------------------------------!
TAPNO   !SCD!    SAVESET NUMBER     !        TAPE NUMBER        !  +2
	!-------------------------------------------------------!
PAGNO   !F1!F2!    FILE # IN SET    !      PAGE # IN FILE       !  +3
	!-------------------------------------------------------!
TYP     !              RECORD TYPE CODE (NEGATED)               !  +4
	!-------------------------------------------------------!
SEQ     !        RECORD SEQUENCE NUMBER (INCREASES BY 1)        !  +5
	!=======================================================!
	!                                                       !
	!         CONTENTS OF FILE PAGE IF DATA RECORD          !
	!        OTHER TYPES HAVE OTHER INFORMATION HERE        !
	!                                                       !
	!=======================================================!


TYPE	VALUE	MEANING
----	-----	-------
DATA	  0	CONTENTS OF FILE PAGE
TPHD	  1	NON-CONTINUED SAVESET HEADER
FLHD	  2	FILE HEADER (CONTAINS FILESPEC, FDB)
FLTR	  3	FILE TRAILER
TPTR	  4	TAPE TRAILER (OCCURS ONLY AFTER LAST SAVESET)
USR	  5	USER DIRECTORY INFORMATION
CTPH	  6	CONTINUED SAVESET HEADER
FILL	  7	NO MEANING, USED FOR PADDING


SCD (3 BITS) - 0=NORMAL SAVE, 1=COLLECTION, 2=ARCHIVE, 3=MIGRATION

F1 F2	MEANING
-- --	-------
 0  0	OLD-FORMAT TAPE (NO FILE # IN PAGNO BITS 2-17)
 1  1	OLD-FORMAT TAPE, CONTINUED FILE
 0  1	NEW-FORMAT TAPE (FILE # IN PAGNO BITS 2-17)
 1  0	NEW-FORMAT TAPE, CONTINUED FILE

A DUMPER TAPE IS A COLLECTION OF RECORDS ORGANIZED IN THE
FOLLOWING FASHION:


!=======================================================!
!            HEADER FOR FIRST SAVESET (TPHD)            !
!-------------------------------------------------------!
!          USER INFO (USR) OR FILE (SEE BELOW)          !
!-------------------------------------------------------!
!                   USER INFO OR FILE                   !
!-------------------------------------------------------!
!                           .                           !
!                           .                           !
!                           .                           !
!=======================================================!
!            HEADER FOR SECOND SAVESET (TPHD)           !
!-------------------------------------------------------!
!          USER INFO (USR) OR FILE (SEE BELOW)          !
!-------------------------------------------------------!
!                   USER INFO OR FILE                   !
!-------------------------------------------------------!
!                           .                           !
!                           .                           !
!                           .                           !
!=======================================================!
!                                                       !
!                  SUBSEQUENT SAVESETS                  !
!                                                       !
!=======================================================!
!                                                       !
!                     LAST SAVESET                      !
!                                                       !
!=======================================================!
!                  TAPE TRAILER (TPTR)                  !
!=======================================================!


NOTES:

1.  ON LABELED TAPES, THE TPTR RECORD APPEARS ONLY IF
    THE SAVESET IS CONTINUED ON ANOTHER TAPE.

2.  SOLITARY TAPE MARKS (EOF'S) ARE IGNORED ON INPUT.
    TWO CONSECUTIVE TAPE MARKS ARE INTERPRETED AS TPTR.

3.  ON LABELED TAPES, EACH SAVESET OCCUPIES EXACTLY ONE FILE.

4.  THE FIRST RECORD OF A CONTINUED SAVESET IS CTPH
    INSTEAD OF TPHD.

A DISK FILE SAVED ON A DUMPER TAPE ALWAYS HAS THIS
SEQUENCE OF RECORDS:

!=======================================================!
!                  FILE HEADER (FLHD)                   !
!-------------------------------------------------------!
!          DATA RECORD: 1 PAGE OF FILE (DATA)           !
!-------------------------------------------------------!
!          DATA RECORD: 1 PAGE OF FILE (DATA)           !
!-------------------------------------------------------!
!                           .                           !
!                           .                           !
!                           .                           !
!-------------------------------------------------------!
!                  FILE TRAILER (FLTR)                  !
!=======================================================!

#endif


				/* 5 bytes per 36-bit word */
				/* 518 word logical blocks */
#define TAPEBLK (518*5)

				/* Checksum is first word */
#define WdoffChecksum      0
#define BtoffChecksum      0
#define BtlenChecksum     36
				/* Page access bits is second word */
#define WdoffAccess        1
#define BtoffAccess        0
#define BtlenAccess       36
				/* SCD, first 3 bits in next word */
#define WdoffSCD           2
#define BtoffSCD           0
#define BtlenSCD           3
				/* Number of saveset on tape */
#define WdoffSaveSetNum    2
#define BtoffSaveSetNum    3
#define BtlenSaveSetNum   15
				/* Tape number of dump */
#define WdoffTapeNum       2
#define BtoffTapeNum      18
#define BtlenTapeNum      18
				/* F1, F2 Flag bits */
#define WdoffF1F2          3
#define BtoffF1F2          0
#define BtlenF1F2          2
				/* File Number in Set (new format only) */
#define WdoffFileNum       3
#define BtoffFileNum       2
#define BtlenFileNum      16
				/* Page Number in file */
#define WdoffPageNum       3
#define BtoffPageNum      18
#define BtlenPageNum      18
				/* Record type (2's complement) */
#define WdoffRectype       4
#define BtoffRectype       0
#define BtlenRectype      36
				/* Record sequence number */
#define WdoffRecseq        5
#define BtoffRecseq        0
#define BtlenRecseq       36


				/* SCD Values */
#define SCDNormal       0
#define SCDCollection   1
#define SCDArchive      2
#define SCDMigration    3

				/* F1, F2 Values */
#define F1F2Old            0
#define F1F2OldContinue    3
#define F1F2New            1
#define F1F2NewContinue    2

				/* Record type values */
#define RectypeData     0
#define RectypeTphd     1
#define RectypeFlhd     2
#define RectypeFltr     3
#define RectypeTptr     4
#define RectypeUsr      5
#define RectypeCtph     6
#define RectypeFill     7

#define WdoffSSDate        8            /* Saveset date offset (type 1, 6) */
#define WdoffSSName        9            /* Saveset name offset (type 1, 6) */
#define WdoffFLName        6            /* Filename offset (type 2) */
#define WdoffFDB         134            /* FDB offset (type 2) */

					/* Number of bits per byte */
#define WdoffFDB_BSZ     (011+WdoffFDB)
#define BtoffFDB_BSZ       6
#define BtlenFDB_BSZ       6

					/* Number of pages in the file */
#define WdoffFDB_PGC     (011+WdoffFDB)
#define BtoffFDB_PGC      18
#define BtlenFDB_PGC      18

					/* Number of bytes in the file */
#define WdoffFDB_Size    (012+WdoffFDB)
#define BtoffFDB_Size      0
#define BtlenFDB_Size     36

					/* Date of last write to file */
#define WdoffFDB_Wrt     (014+WdoffFDB)
//EOF
#
echo "Making read20.c"
cat > read20.c << //EOF
/*
 *  Program to try and read Tops-20 Dumper format tapes
 *
 *                   Jim Guyton,  Rand Corp.
 *                   Version 1 (10/20/82)
 *                   jdg:   -n added 6/11/83
 */

#include <stdio.h>
#include <ctype.h>
#include "dump.h"
				/* logfile should be changeable */
#define LOGFILE "Logfile"

char *ctime(), *index(), *rindex();
char *unixname();
long getfield(), unixtime();    /* Return long; declare before first use */

int  fdTape;                    /* File handle for Dumper-20 format tape */
char tapeblock[TAPEBLK];        /* One logical record from tape */
FILE *fpFile;                   /* Output file handle on extracts */
int debug = 0;
int textflg = 0;                /* Non-zero if retr binary files as text */
int numflg = 0;                 /* Non-zero if using numeric filenames */
int number;                     /* Current output file "number" */

#define TAPE "/dev/rmt0"        /* Default input tape */

int  bytesize;          /* Number of bits/byte in current file */
int  numbytes;          /* Number of bytes in current file */
int  pgcount;           /* Number of twenex pages in file */

char *pattern = 0;      /* Filename match pattern */

/*
	pgm  [-f tapefile] [-t] [-n number] pattern

	no pattern == directory only
	no tapefile == /dev/rmt0
	-t == try to pretend files are 7-bit ascii
	-n == use numeric filenames in extracts, number is 1st name
*/

main(argc, argv)
int argc;
char *argv[];
{
	char *tape = TAPE;              /* Pathname for tape device/file */
	int rc;

	/* Do switch parsing */

	while(argc>1 && argv[1][0] == '-'){
		switch(argv[1][1]){
		case 'f':
			if (argc <= 2) punt("Need filename after -f\n");
			tape = argv[2];
			argc--; argv++;
			break;
		case 't':             /* Force text mode on "binary" files */
			textflg = 1;
			break;
		case 'd':
			debug = atoi(&argv[1][2]);
			printf("Debug value set to %d\n", debug);
			break;
		case 'n':               /* numeric output filenames */
			if (argc <= 2) punt("Need number after -n\n");
			number = atoi(argv[2]);         /* First file name */
			numflg = 1;
			argc--; argv++;
			break;
		default:
			printf("unknown flag %s\n", argv[1]);
			exit(1);
			break;
		}
		argc--;  argv++;
	}


	if (argc > 1)
		pattern = argv[1];

	fdTape = open(tape, 0);         /* Open tape for read */
	if (fdTape == -1)
		punt("Couldn't open 'tape' file %s\n", tape);

	for ( ; ; )             /* Loop till end of tape */
	{
					 /*** Read a block ***/
		rc = read(fdTape, tapeblock, TAPEBLK);
		if (rc != TAPEBLK)
		{       if (rc != 0)
			   punt("Oops.  Read block len=%d\n", rc);

			printf("\nEnd of tape.\n");
			exit(0);        /* Normal exit */
		}

					/*** Do something with it ***/
		switch(getrecordtype(tapeblock))
		{
		  case RectypeData:             /* Data block */
					doDatablock(tapeblock);
					break;

		  case RectypeTphd:             /* Saveset header */
					doSaveset(tapeblock, 0);
					break;

		  case RectypeFlhd:             /* File header */
					doFileHeader(tapeblock);
					break;

		  case RectypeFltr:             /* File trailer */
					doFileTrailer(tapeblock);
					break;

		  case RectypeTptr:             /* Tape trailer */
					doTapeTrailer(tapeblock);
					break;

		  case RectypeUsr:              /* User directory info ? */
					printf("User info record skipped\n");
					break;

		  case RectypeCtph:             /* Continued saveset hdr */
					doSaveset(tapeblock, 1);
					break;

		  case RectypeFill:             /* Fill record */
					printf("Fill record skipped\n");
					break;

		  default:
					punt("Unknown record type 0x%x\n",
						  getrecordtype(tapeblock));
					break;
		}
	}
}

/* Get the "record type" from the tape block header.  Since it */
/* is stored in 2's complement form, negate it before returning */

getrecordtype(block)
char *block;
{
	long int tl;
	tl = getfield(block, WdoffRectype, BtoffRectype, BtlenRectype);
	return( (int) -tl);
}

int   masks[32] =       /* bitmasks for different length fields */
{              0x00000001, 0x00000003, 0x00000007,
   0x0000000f, 0x0000001f, 0x0000003f, 0x0000007f,
   0x000000ff, 0x000001ff, 0x000003ff, 0x000007ff,
   0x00000fff, 0x00001fff, 0x00003fff, 0x00007fff,
   0x0000ffff, 0x0001ffff, 0x0003ffff, 0x0007ffff,
   0x000fffff, 0x001fffff, 0x003fffff, 0x007fffff,
   0x00ffffff, 0x01ffffff, 0x03ffffff, 0x07ffffff,
   0x0fffffff, 0x1fffffff, 0x3fffffff, 0x7fffffff,
   0xffffffff
};


long
getfield(block, wordoff, bitoff, bitlen)
char *block;            /* Tape block record */
int wordoff;            /* 36-bit word offset */
int bitoff;             /* Bit offset of field (from msb) */
int bitlen;             /* Bit length of field */
{
	char *p;                /* Used to point into record */
	long int w32;           /* First 32 bits of the 36 bit word */
	int   w4;               /* Last 4 bits of the 36 bit word */
	long  w = 0;            /* the word to return */


				/* First, the "illegal" kludge: a full 36-bit */
				/* word will not fit in 32 bits, so return */
				/* only its low-order 32 bits */
	if (bitoff == 0 && bitlen == 36)
	{       bitoff = 4;
		bitlen = 32;
	}
	if (bitlen > 32) punt("I can't get that large a field!\n");

	/* A PDP-10 (or 20) 36-bit word is laid out with the first 32 bits
	   as the first 4 bytes and the last 4 bits are the low order 4 bits
	   of the 5th byte.   The high 4 bits of that byte should be zero */

	p = block + (5*wordoff);        /* Get ptr to word of interest */
	w32 = *p++ & 0377;                      /* First byte */
	w32 = (w32 << 8) | (*p++ & 0377);       /* 2nd */
	w32 = (w32 << 8) | (*p++ & 0377);       /* 3rd */
	w32 = (w32 << 8) | (*p++ & 0377);       /* 4th */
	w4  = *p & 0377;                        /* 5th (mask off sign extension) */
	if (w4 > 017) punt("Not a PDP-10 tape!  w4=%o\n", w4);


	/* Get the field right justified in the word "w".
	   There are three cases that I have to handle:
	      [1] field is contained in w32
	      [2] field crosses w32 and w4
	      [3] field is contained in w4
	*/

	if (bitoff+bitlen <= 32)        /* [1] field is contained in w32 */
	{
		w = w32 >> (32 - (bitoff+bitlen));
	}
	else if (bitoff <= 32)          /* [2] field crosses boundary */
	{
		w =  (w32 << (bitoff+bitlen-32))
		   | (w4  >> (36 - (bitoff+bitlen)));
	}
	else                            /* [3] field is contained in w4 */
	{
		w = w4 >> (36 - (bitoff+bitlen));
	}
	w = w & masks[bitlen-1];          /* Trim to proper size */
	return(w);
}


doDatablock(block)
char *block;
{
	static char buf[(512*5)+1];         /* A page of characters */
	int ct;
	if (debug > 10) printf("*");
	if (fpFile == NULL) return;
					    /* 7 bit ascii only for now */
	if (numbytes > 512*5) ct = 512*5;
	else ct = numbytes;

	getstring(block, buf, 6, ct);
	buf[ct] = 0;
	fprintf(fpFile, "%s", buf);
	numbytes -= ct;
}

doSaveset(block, contflag)
char *block;
int  contflag;
{
	static char name[100];
	long t;

	if (debug > 10) printf("\nSaveset header:");
	getstring(block, name, WdoffSSName, sizeof(name));

	t = unixtime(block, WdoffSSDate);
	printf("Saveset '%s', %s\n", name, ctime(&t));

}

doFileHeader(block)
char *block;
{
	static char name[100];
	long t;                 /* The time in unix format */
	char *ts;

	if (debug > 5) printf("File Header block:\n");

	getstring(block, name, WdoffFLName, sizeof(name));
	ts = index(name, ';');          /* Chop off ;Pprotection;Aacct */
	if (ts != NULL) *ts = 0;        /* (there may be no attributes) */

	t = unixtime(block, WdoffFDB_Wrt);
	ts = ctime(&t) + 4;             /* Skip over day-name field */
	ts[strlen(ts)-1] = 0;             /* Chop off \n at end */

	bytesize = getfield(block, WdoffFDB_BSZ, BtoffFDB_BSZ, BtlenFDB_BSZ);
	numbytes = getfield(block, WdoffFDB_Size, BtoffFDB_Size, BtlenFDB_Size);
	pgcount  = getfield(block, WdoffFDB_PGC, BtoffFDB_PGC, BtlenFDB_PGC);

	printf("%6d %11d(%2d) %s %s\n",
		pgcount, numbytes, bytesize, ts, name);

	if (pattern && match(name, pattern))
	{
					      /* Special hack for bad files */
		if (bytesize != 7 && textflg)
		{
			if (bytesize == 0 || bytesize == 36)     /* Sigh */
			{       bytesize = 7;
				numbytes = numbytes * 5;
			}
		}
		if (bytesize != 7)
			fprintf(stderr, "Skipping -- binary file.\n");
		else
		{
		    char *uname = unixname(name);   /* Call once: -n logs each call */

		    fpFile = fopen(uname, "w");
		    if (fpFile == NULL)
			    punt("Can't open %s for write!\n", uname);
		    printf("Extracting\n");
		}
	}
	else
		fpFile = NULL;
}

doFileTrailer(block)
char *block;
{
	if (debug > 10) printf(" File trailer\n");
	if (fpFile != NULL)
	{
		fclose(fpFile);
		fpFile = NULL;
	}
}

doTapeTrailer(block)
char *block;
{
	if (debug > 10) printf("Tape Trailer");
}

punt(s, arg)
char *s;
int  arg;
{
	fprintf(stderr, s, arg);
	exit(1);
}


getstring(block, s, wordoff, max)
char *block;            /* Tape block */
char *s;                /* Destination string buffer */
int  wordoff;           /* 36-bit offset from start of tape block */
int  max;               /* Max number of characters to xfer into s */
{
	register int i;         /* Counter for five characters per word */
	int ct = 0;             /* Number of characters loaded so far */
	char *orig = s;         /* Save for debugging */

	while (ct < max)
	{
		for (i = 0; i < 5; i++)
		{
			*s = getfield(block, wordoff, i*7, 7);
			if (*s == 0) return;
			s++;
		}
		wordoff++;
		ct += 5;
	}
   /**     punt("String greater than %d characters.\n", max);   **/
}

#define SecPerTick  ((24.*60.*60.)/01000000)    /* Tick is 2**-18 of a day */
#define DayBaseDelta 0117213            /* Unix day 0 in Tenex format */

long
unixtime(block, wordoff)
char *block;
int  wordoff;
{
	long int t, s;

	t = getfield(block, wordoff, 0, 18);    /* First half is day */
	t -= DayBaseDelta;                      /* Switch to unix base */
						/* Now has # days since */
						/* Jan 1, 1970 */

	s = getfield(block, wordoff, 18, 18);   /* 2nd half is fraction day */
	s = s * SecPerTick;                     /* Turn into seconds */

	s += t*24*60*60;                        /* Add day base */
	return(s);
}


/* See if pattern is in name (very simple) */
match(name, pattern)
char *name, *pattern;
{
	int  plen = strlen(pattern);

	while ((name=index(name, *pattern)))
	{
		if (strncmp(name, pattern, plen)==0) return(1);
		name++;
		if (*name == 0) return(0);         /* May not need */
	}
	return(0);
}

char *
unixname(name)
char *name;
{
	static char newname[200];
	static FILE *log = NULL;
	char *t;

	if (numflg)             /* If numeric filenames */
	{
		if (log == NULL) log = fopen(LOGFILE, "a");
		if (log == NULL) punt("Can't open %s for append!\n", LOGFILE);
		fprintf(log, "%d is %s\n", number, name);
		sprintf(newname, "%d", number++);
		return(newname);
	}

	t = index(name, '<');           /* Trim off device */
	if (t != NULL) name = t;
	t = rindex(name, '>');          /* find end of directory */
	if (t != NULL) name = t + 1;    /* Skip over the > */

	/* eventually make subdirectories */
	/* eventually optionally lowify filename */

	strcpy(newname, name);
	t = rindex(newname, '.');       /* find last . */
	if (t != NULL) *t = 0;          /* zap off the version number */
	return(newname);
}
//EOF
#
echo "All done."