[comp.sources.d] UUCPing entire file hierarchies.

jc@cdx39.UUCP (John Chambers) (12/24/86)

Well, so many people said "No, I don't know how, but I want the code when
you get it working" that I did it and I'm posting it.  This is a first
try, and has only been tested on some SYS/V machines, so it probably won't
quite work everywhere.

The problem is:  Copy a whole hierarchy of files from one machine to another,
using limited-capacity file-transfer utilities like UUCP.  Simple variants
could handle other file-transfer utilities like kermit or xmodem.  

The hard part is arranging for all the multiply-linked files to get there
correctly, multiply linked in the same way.  UUCP doesn't like to do this.
What I've done is written something that creates a lot of file lists (called
"list_*"), such that multiply-linked files are all in the same list, and
the total size of the files in each list is less than the environment
variable UUCPMAX.  The lists are used to generate cpio archives ("cpio_*"),
which are then uucp'd to the destination.
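
The grouping rule described above can be sketched in a few lines of sh and
awk.  This is only an illustration, not the posted code: the function name
split_lists is made up, it assumes an awk that accepts -v, it assumes file
names contain no white space, and it ignores the per-file and per-archive
cpio overhead that filelists.c adds.  Input is "dev ino size name" lines
already sorted by device and inode; output is "listnumber name".

```shell
# split_lists LIMIT  (hypothetical helper, not part of the posted archive)
# Reads "dev ino size name" lines, sorted by dev and inode, on stdin and
# prints "listnumber name" for each file.
split_lists() {
	awk -v limit="$1" '
	BEGIN { n = 1; total = 0; count = 0 }
	{
		# A line with the same dev/inode as the previous one is a link
		link = ($1 == dev && $2 == ino)
		# Never split between links; never start a new list while empty
		if (!link && count > 0 && total + $3 > limit) {
			n++; total = 0; count = 0
		}
		total += $3; count++
		dev = $1; ino = $2
		print n, $4
	}'
}

# Two links to one 400-byte file, then a 700-byte and a 200-byte file.
printf '%s\n' '1 10 400 a' '1 10 400 b' '1 11 700 c' '1 12 200 d' |
	split_lists 1000
```

With a limit of 1000, names a and b land in list 1 together, and c and d
go to list 2.  As in the posted code, a single file bigger than the limit
still gets a list of its own rather than producing empty lists forever.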

Try it out, and tell me what's wrong with it.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
: This is a shar archive. Extract with sh, not csh
echo file: Makefile
cat > Makefile << '\!Funky\!Stuff\!'
D=/usr/lib/uucp
L=ln

all:	filefacts filelists

install:$D/filefacts $D/filelists $D/uucptree $D/uucptree.sed
$D/filefacts:	filefacts;	$L filefacts $D
$D/filelists:	filelists;	$L filelists $D
$D/uucptree:	uucptree;	$L uucptree $D
$D/uucptree.sed:uucptree.sed;	$L uucptree.sed $D

filefacts:	filefacts.c;	cc -o filefacts filefacts.c
filelists:	filelists.c;	cc -o filelists filelists.c

S=Makefile README RUN_ME filefacts.c filelists.c uucptree uucptree.sed uucptree.1
uucptree.shar:	$S;	shar $S >uucptree.shar

clean:	;	rm filefacts filelists
\!Funky\!Stuff\!
echo file: README
cat > README << '\!Funky\!Stuff\!'

This directory contains the 'uucptree' script and associated
programs.  The purpose of this script is to generate a set of
reasonably small cpio archives, and uucp them somewhere.  They
may then be unpacked at the receiving end, to reconstruct a set
of file trees.  

The uucptree script is called as:
	uucptree directory... destination
where one or more directories (or files) may be listed, and
the destination is a uucp path to a directory.  The result 
will be a set of cpio archives at the destination which, 
when unpacked, will reconstruct the original directories
and all their contents.

There are two C programs used: filefacts and filelists.  The
first takes a list of files (generated by 'find') and produces
the same list with information about their device, inode, and
size.  This list is then sorted, resulting in multiply-linked
files ending up together.  The sorted list is fed to filelists,
which produces a series of lists: list_1, list_2, ....  Each
one is just big enough to total $UUCPMAX, an environment variable
that defaults to '1M', or 1 Megabyte.  The resulting lists are
fed to cpio, to produce the files: cpio_1, cpio_2, ....  Each
of these is then uucp'd to the specified destination.

Note that this script leaves behind a set of files in the 
current directory named "list_*" and "cpio_*".  You might
wish to delete them when you have verified that the uucps
have completed, since they will occupy a fair amount of
space.

The Makefile is set up so that you can just type:
	make install
and everything will be compiled and installed in a default
directory (/usr/lib/uucp).  You might want to examine the
Makefile and the *.c files first, to see if there's anything
you want to change for your system.  To test it, try typing
a command like:
	setenv UUCPMAX 50000
	sh -x uucptree p sh csh i somewhere!~someone >& audit &
When this terminates, you should have a lot of "cpio_*" files
that are mostly around 50K bytes, and a lot of uucp copies in
the hopper for somewhere!~someone;  log into somewhere and see
if they all get there OK, then unpack them with cpio.  Go back
to the first system and type:
	rm list_* cpio_*

This code was developed and tested on a reasonably generic
Unix SYS/V system.  If you have problems with non-portable
code, you might send patches to the author:

	John M Chambers			Phone: 617/364-2000x7304
Email: ...{adelie,bu-cs,harvax,inmet,mcsbos,mit-eddie,mot[bos]}!cdx39!{jc,news,root,usenet,uucp}
Smail: Codex Corporation; Mailstop C1-30; 20 Cabot Blvd; Mansfield MA 02048-1193
Clever-Saying: If we can't fix it, it ain't broke.
\!Funky\!Stuff\!
echo file: RUN_ME
cat > RUN_ME << '\!Funky\!Stuff\!'

UL=/usr/lib/uucp
make all
make install
\!Funky\!Stuff\!
echo file: filefacts.c
cat > filefacts.c << '\!Funky\!Stuff\!'
/*	filefacts <filelist [-option]...
**
** The standard input should be a list of file names.  
** For each file, a line of output is produced in the form:
** DDDD IIII size filename
** where DDDD is the device number and IIII is the inode number.
**
** This data is intended to be used with filelists(1), to
** chop a single list of files into a lot of little lists,
** each of which totals less than N bytes.
**
** BUGS: files which have disappeared are ignored, and
** their names are not written to the output file.   
**
** Directories are not treated specially; perhaps they should be.
*/
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>


#define D1 if(debug>=1)pmsg
#define D2 if(debug>=2)pmsg
#define D3 if(debug>=3)pmsg
#define D4 if(debug>=4)pmsg
#define D5 if(debug>=5)pmsg
#define D6 if(debug>=6)pmsg
#define D7 if(debug>=7)pmsg
#define D8 if(debug>=8)pmsg
#define D9 if(debug>=9)pmsg
#define E pmsg

#define NMAX 1000	/* Longest filename we can handle */

int    debug = 1;
extern errno;
char   nbuf[NMAX+1];	/* Place to build file names */
char  *na = nbuf;	/* Start of name buffer */
char  *np = nbuf;	/* Next char in name buffer */
char  *nz = nbuf+NMAX;	/* End of name buffer */
int    outf = -1;	/* File number of output file */
char  *progname = "?";	/* This program's name */
long   total = 0;	/* Number of blocks so far */

main(ac,av)
  int    ac;
  char **av;
{ int    a, args, c, i;
  char  *cp;

  progname = av[0];
  args = 0;
  for (a=1; a<ac; a++) {
    switch (c = av[a][0]) {
    case '-':			/* -option */
      D4("main:option \"%s\"",av[a]);
      switch (av[a][1]) {
      case 'v': case 'V':
      case 'd': case 'D':
	i = sscanf(av[a]+2,"%d",&debug);
	if (i < 1) debug = 2;
	break;
      default:
        E("Unknown option \"%s\" ignored.",av[a]);
      }
      break;
    default:	
        E("Extra arg \"%s\" ignored.",av[a]);
    }
  }
  np = na; 		/* Start of name buffer */
  c = ' ';
  while (c != EOF) {
    c = getchar();
    switch(c) {		/* What sort of char is it? */
    default:		/* Most are just part of filename */
      if (np < nz) {
	*np++ = c;
	*np   = 0;
	D9("main: name=\"%s\"",na);
      } else {
	fprintf(stderr,"Name too long: \"%s%c",na,c);
	while ((c = getchar()) != EOF && c != '\n')
	  putc(c,stderr);
	putc('\n',stderr);
	fflush(stderr);
	np = na;		/* Discard the oversized name */
	continue;
      }
      break;
    case EOF:			/* List of possible filename terminators */
    case ' ':
    case '\t':
    case '\n':
    case '\r':
    case '\0':
       if (np <= na) {
	 D3("Null name ignored.");
	 break;
       }
       *np = 0;
       D6("before onefile(\"%s\")",na);
       onefile(na);
       D6("after  onefile(\"%s\")",na);
       np = na;
    }
  }
  exit(0);		
}
help()
{
  fprintf(stderr,"Usage: %s <filelist\n",progname);
}
/* Given one file name, stat it and write its device number, inode
** number, size, and name as one line on the standard output.
*/
onefile(name)
  char *name;
{ struct stat status;
  long  size;
  int   i;
  int   dev, ino;

  D5("onefile(\"%s\")",name);
  if (stat(name,&status) < 0) {
    E(" Can't access \"%s\" [errno=%d]",name,errno);
    return 0;
  }
  size = status.st_size;
  dev  = status.st_dev ;
  ino  = status.st_ino ;
  printf("%5d %5d %8ld %s\n",dev,ino,size,name);
}
/* Messages go to stderr; stdout carries the file list itself. */
pmsg(fmt,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9) char *fmt;
{
  fprintf(stderr,"%s:",progname);
  fprintf(stderr,fmt,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9);
  fprintf(stderr,"\n");
  fflush( stderr);
}
\!Funky\!Stuff\!
echo file: filelists.c
cat > filelists.c << '\!Funky\!Stuff\!'
/*	filelists <filelist [prefix] [limit] [-option]...
**
** The standard input should be the output of filefacts(1),
** sorted so that links to the same file are adjacent.
** The list is divided up into sublists, each of less
** than 'limit' bytes total ($UUCPMAX, else 1Mbyte), and
** written to list_1, list_2, ....  The return value is
** the number of list_* files written.
**
** Names are written to the lists exactly as read; stripping
** of any initial '/' is left to the uucptree.sed script.
**
** BUGS: files which have disappeared are ignored, and
** their names are not written to the output file. 
*/
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

#define FUDGEA 512	/* Extra space needed per archive by cpio */
#define FUDGEF 128	/* Extra space needed per file by cpio */
#define NMAX   128	/* Longest filename cpio can handle */

#define D1 if(debug>=1)pmsg
#define D2 if(debug>=2)pmsg
#define D3 if(debug>=3)pmsg
#define D4 if(debug>=4)pmsg
#define D5 if(debug>=5)pmsg
#define D6 if(debug>=6)pmsg
#define D7 if(debug>=7)pmsg
#define D8 if(debug>=8)pmsg
#define D9 if(debug>=9)pmsg
#define E pmsg

long  limit = 1000000;	/* Max chars defaults to 1Mbytes */
int   debug = 1;
int   dev = -1;		/* Device number of current file */
extern errno;		/* Unix error status */
char  fbuf[NMAX+1];	/* Place to build output file name */
int   files  = 0;	/* Count of files in the current list */
int   filnum = 0;	/* Current output file number */
extern char*getenv();	/* For extracting UUCPMAX from environment */
int   ino = -1;		/* Inode number of current file */
char  nbuf[NMAX+1];	/* Place to build file names */
char *na = nbuf;	/* Start of name buffer */
char *np = nbuf;	/* Next char in name buffer */
char *nz = nbuf+NMAX;	/* End of name buffer */
int   olddev = -1;	/* Device number of previous file */
int   oldino = -1;	/* Inode number of previous file */
int   outf = -1;	/* File number of output file */
char *prefix = "list_";	/* Output file names start with this */
char *progname = "?";	/* This program's name */
long  total = 0;	/* Number of blocks so far */

main(ac,av)
	int    ac;
	char **av;
{	int    a, args, c, i;
	char  *cp;
	long   siz;

	progname = av[0];
	if (cp = getenv("UUCPMAX")) {
		getlimit(cp);
		D1("UUCPMAX=%ld",limit);
	}
	args = 0;
	for (a=1; a<ac; a++) {
		switch (c = av[a][0]) {
		case '-':			/* -option */
			D4("main:option \"%s\"",av[a]);
			switch (av[a][1]) {
			case 'v': case 'V':
			case 'd': case 'D':
				i = sscanf(av[a]+2,"%d",&debug);
				if (i < 1) debug = 2;
				break;
			default:
				E("Unknown option \"%s\" ignored.",av[a]);
				break;
			}
			break;
		case '0':
		case '1': case '2': case '3':
		case '4': case '5': case '6':
		case '7': case '8': case '9':
			cp = av[a];
			getlimit(av[a]);
			break;
		default:			/* Arg without '-' or digit is prefix */
			switch(args++) {	/* We only want one of them */
			case 0:
				prefix = av[a];
				D3("main:prefix=\"%s\"",prefix);
				break;
			default:
				E("Extra arg \"%s\" ignored.",av[a]);
			}
		}
	}
	D2("limit=%ld prefix=\"%s\"",limit,prefix);
	D6("main:before newoutfile()");
	newoutfile();
	D6("main: after newoutfile()");
	np = na; 
	c = ' ';
	olddev = -1;
	oldino = -1;
	while ((i = scanf("%d %d %ld %128s",&dev,&ino,&siz,nbuf)) > 0) {	/* 128 = NMAX */
		if (i == 4) {
			 D5("dev=%d ino=%5d size=%6ld '%s'",dev,ino,siz,nbuf);
			 D6("before onefile(%ld,\"%s\") total=%ld",siz,na,total);
			 onefile(siz,na);
			 D6("after  onefile(%ld,\"%s\") total=%ld",siz,na,total);
			 np = na;
		} else {
			E("Invalid line in input, only %d fields.",i);
		}
	}
	D1("main: end of input; %d list(s) written.",filnum);
	exit(filnum);		/* Return the number of tapes required */
}
help()
{
	fprintf(stderr,"Usage: %s <filelist [prefix] [limit] [-dN]\n",progname);
}
/* Close the current output file and start a new one.
*/
newoutfile()
{	int i;

	D5("newoutfile()");
	++filnum;
	files = 0;
	sprintf(fbuf,"%s%d\0",prefix,filnum);
	D2("New output file %d = \"%s\"",filnum,fbuf);
	D6("newoutfile:before close(%d)",outf);
	i = close(outf);
	D6("newoutfile: after close(%d)=%d\t[errno=%d]",outf,i,errno);
	D6("newoutfile:before creat(\"%s\",0%o)",fbuf,0666);
	outf = creat(fbuf,0666);
	D6("newoutfile: after creat(\"%s\",0%o)=%d\t[errno=%d]",fbuf,0666,outf,errno);
	total = FUDGEA;
	return outf;
}
/* Given one file name, figure out whether it will fit onto the current
** dump tape.  If not, go on to the output file for the next tape.  Note
** the 'files' variable, to get around a logical problem:  if a file is
** listed which is bigger than limit, we would produce an infinite number
** of empty lists.  If such a file occurs, it is allowed as the first
** name in a list.  The eventual result will be a 1-file cpio archive.
*/
onefile(siz,name)
	long  siz;
	char *name;
{	struct stat status;
	long  size;
	int   i;
	char  newfl;

	D5("onefile(\"%s\")",name);
	if (stat(name,&status) < 0) {		/* Paranoia: validate the size */
		E(" Can't access \"%s\" [errno=%d]",name,errno);
		return 0;
	}
	size = status.st_size;
	if (siz != size)
		E("Size changed from %ld to %ld for '%s'",siz,size,name);
	newfl = 1;
	if (dev == olddev &&  ino == oldino) {
		newfl = 0;
		D2("Link: dev=%4d ino=%5d '%s'",dev,ino,name);
	/*
	** There are versions of cpio that don't fully understand
	** multiply-linked files.  These versions will include many
	** copies of the linked file in the archive, although only
	** one copy is necessary.  If your cpio behaves this way
	** (which may be determined by making a toy archive from
	** two linked files and examining a dump of the result),
	** you should leave the following line commented out.  If
	** your cpio produces only the link names, enable it.
	*/
	/*	size = FUDGEF;		/* Treat links as special */
	}
	size += strlen(name) + FUDGEF;
	total += size;
	D2("size=%5ld total=%8ld limit=%8ld name='%s'",size,total,limit,name);
	if (newfl && total > limit && files > 0) {
		D1("onefile: Total=%ld > limit=%ld; finishing list %d.",total,limit,filnum);
		D6("onefile:before newoutfile()");
		i = newoutfile();
		D6("onefile: after newoutfile()=%d",i);
		total += size;		/* Charge this file to the new list */
		D2("size=%5ld total=%8ld limit=%8ld name=%s",size,total,limit,name);
	}
	write(outf,name,strlen(name));
	write(outf,"\n",1);
	++files;			/* File counter to prevent infinite loops */
	olddev = dev;
	oldino = ino;
}
pmsg(fmt,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9) char *fmt;
{
	fprintf(stdout,"%s:",progname);
	fprintf(stdout,fmt,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9);
	fprintf(stdout,"\n");
	fflush( stdout);
}
getlimit(cp)
	char *cp;
{	int   c;

	limit = 0L;
	while (c = *cp++) {
		switch(c) {
		case '0':
		case '1': case '2': case '3':
		case '4': case '5': case '6':
		case '7': case '8': case '9':
			limit = (limit * 10) + (c - '0');
			break;
		case 'k': case 'K':
			limit *= 1000;
			break;
		case 'm': case 'M':
			limit *= 1000000;
			break;
		default:
			E("Invalid char '%c' in limit",c);
			help();
			break;
		case ',': case '_':
			break;
		}
	}
}
\!Funky\!Stuff\!
echo file: uucptree
cat > uucptree << '\!Funky\!Stuff\!'
:
#	uucptree dir... dest
#
# This script generates a list of all the files in the
# named directories, runs them through some filtering
# programs to divide them up into lists, each of whose
# total size is less than a threshold, creates a set
# of cpio archives, and uucps them to the destination.
#
# When this script ends, the current directory will contain
# files called "list_*" and "cpio_*", which are the file
# lists and cpio archives; uucp commands will have been
# submitted to copy the archives to 'dest'.
#
# The 'uucptree.sed' script is used to edit the list;
# a default is kept in /usr/lib/uucp, but one in the
# current directory or $HOME will be used first.
#
# The destination must be a valid uucp path to a directory;
# all the cpio archives will be put into that directory,
# which must be writable by uucp.
#
CD=`pwd`
UL=/usr/lib/uucp
T=/tmp
if [ $UUCPMAX'.' = '.' ] ; then UUCPMAX=1M ; fi
echo "|-------UUCPMAX:" $UUCPMAX
#
# Generate set of files list_* that contain the names
# of all the files under the named directories, divided
# up so that the total sizes of the files in each list
# is less than $UUCPMAX [default = 1Mbyte].  The files
# will be sorted by device and inode number.
#
# The uucptree.sed script may be used to delete unwanted
# files, such as *.bak, core, etc.
if	[ -f $CD/uucptree.sed ] ;	then S=$CD/uucptree.sed
elif	[ -f $HOME/uucptree.sed ] ;	then S=$HOME/uucptree.sed
elif	[ -f /usr/local/uucptree.sed ] ;then S=/usr/local/uucptree.sed
elif	[ -f $UL/uucptree.sed ] ;	then S=$UL/uucptree.sed
else	S=/dev/null			# No sed script.
fi
echo "|-------------S:" $S
if [ $# -lt 2 ] 
then echo "Usage:" $0 "directory... destination"
     exit 1
fi
echo "|----------Args:" $*
echo "|-----------Env:" ;env
echo Creating the list...
> $T/$$_A
while [ $# -gt 1 ]
do	find  $1 -print | filefacts >> $T/$$_A
	shift
done
echo Editing the list...
sed   < $T/$$_A  -f $S \
	| sort \
	| uniq > $T/$$_D
rm	$T/$$_A
#
# Finally, we can chop the list up into bite-sized portions.
echo "Dividing into "$UUCPMAX"-byte lists..."
filelists <$T/$$_D -d2 $UUCPMAX
N=$?
echo "Note: " $N "file list(s)."
rm	$T/$$_D
#
# Given a set of file lists "list_*", this
# script packages each with 'cpio'.
#
echo Building cpio archives...
I=0
while [ $I -lt $N ]
do	I=`expr $I + 1`
	cpio -oa <list_$I >cpio_$I
done
#
# Send our cpio files to specified user on another system.
echo Sending cpio archives...
I=0
while [ $I -lt $N ]
do	I=`expr $I + 1`
	uucp cpio_$I $1
done
echo $0 done.
exit 0
\!Funky\!Stuff\!
echo file: uucptree.sed
cat > uucptree.sed << '\!Funky\!Stuff\!'
\|/core$|d
\|\.bak$|d
\|\.ckp$|d
\|-$|d
\|/dev/|d
\|\#|d
\|^/*|s///
\!Funky\!Stuff\!
echo file: uucptree.1
cat > uucptree.1 << '\!Funky\!Stuff\!'
.TH UUCPTREE 1
.VE 0
.SH NAME
uucptree \- copy directories via uucp
.SH SYNOPSIS
.B uucptree
.I dir... uucppath
.SH DESCRIPTION
.PP
.I Uucptree
copies the named directories 
with all their files and subdirectories to the 
.I uucppath ,
which should be a directory.
.PP
The file list is sorted and broken into sublists
such that multiply-linked files are in the same
sublist.
The sublists are then used to generate a set of
cpio archives,
each of which is sent to the destination with a
separate uucp command.
.SH FILES
This script creates files list_1, list_2, ...,
which contain the file names for each uucp.
Then it creates cpio_1, cpio_2, ...,
which are the cpio archives.
When the uucps have finished, you should remove these files.
.PP
The upper bound on the size of each cpio archive is limited
by the environment variable UUCPMAX, which defaults to 1 Mbyte.
.SH SEE ALSO
tar(1), cpio(1), cptree(1)
.SH BUGS
Multiply-linked files may result in a cpio archive that
is bigger than the UUCPMAX limit.
Furthermore, some versions of cpio put multiple copies
of linked files into the archives.
.PP
File ownership on the receiving end is a difficult question.
\!Funky\!Stuff\!
-- 
	John M Chambers			Phone: 617/364-2000x7304
Email: ...{adelie,bu-cs,harvax,inmet,mcsbos,mit-eddie,mot[bos]}!cdx39!{jc,news,root,usenet,uucp}
Smail: Codex Corporation; Mailstop C1-30; 20 Cabot Blvd; Mansfield MA 02048-1193
Clever-Saying: If we can't fix it, it ain't broke.

ksh@scampi.UUCP (Kent S. Harris) (01/07/87)

In article <532@cdx39.UUCP>, jc@cdx39.UUCP (John Chambers) writes:
> Well, so many people said "No, I don't know how, but I want the code when
> you get it working" that I did it and I'm posting it.  This is a first
> try, and has only been tested on some SYS/V machines, so it probably won't
> quite work everywhere.

what about (for example):
	tar cf ~uucp/foo ./*
	uucp ~uucp/foo target!~uucp

If you need to uuencode the file, fine.

At the other end you use tar again for the extract.
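
A quick local check of the tar suggestion, as far as links go, might look
like the sketch below.  It is an illustration only: the function name
roundtrip and the scratch layout are invented here, and it assumes mktemp -d
and a tar that records hard links within one archive.  It says nothing
about the size limit that the original posting works around.

```shell
# roundtrip  (hypothetical helper): pack two hard-linked names with tar,
# unpack them elsewhere, and report whether they still share an inode.
roundtrip() {
	work=`mktemp -d` || return 1
	mkdir "$work/src" "$work/dst"
	echo hello > "$work/src/a"
	ln "$work/src/a" "$work/src/b"                  # b is a second link to a
	(cd "$work/src" && tar cf "$work/foo" ./a ./b)  # sending side
	(cd "$work/dst" && tar xf "$work/foo")          # receiving side
	ia=`ls -i "$work/dst/a" | awk '{print $1}'`     # inode of extracted a
	ib=`ls -i "$work/dst/b" | awk '{print $1}'`     # inode of extracted b
	rm -rf "$work"
	if [ "$ia" = "$ib" ]; then echo linked; else echo separate; fi
}

roundtrip
```

If the two names come out sharing an inode, the links survived the trip;
the same test applies to a cpio archive built from one of the list_* files.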