[comp.sys.att] cpiosize

jdc@naucse.UUCP (John Campbell) (12/20/88)

A while back I decided I wanted more control over backup on my 3b1.  In 
particular I wanted to avoid backing up certain directories and to break
the whole task into a reasonable number of groups.  I found a major
obstacle was predicting the size of a given cpio save set.  

Out of this need I wrote ``cpiosize'', which reports the number of
floppies a given cpio command would write.  As far as I know the calculations
are exact--in other words, the block count that cpio -o >/dev/null would
report is the same as the one cpiosize reports.  Cpiosize, however, is 9 to 10
times as fast and also reports the number of floppies required.  Options exist
to report each file, the total number of bytes, and the time to write the
floppies (in addition to the default output: the number of blocks and the
number of floppies).

There are no header files, just a man page and a 'C' program (cpiosize.1
and cpiosize.c):

------------------------cut here for cpiosize.1----------------------
.deTH
.PD
.nrIN \\n()Mu
.ift .ds ]H \\$1\^(\^\\$2\^)
.ifn .ds ]H \\$1(\\$2)
.if\\n()s .ds ]D
.ds]L
.if!\\$3 .ds ]L (\^\\$3\^)
.if!\\$4 .ds ]D \\$4
.wh0 }H
.wh-\\n(:mu }F
.em}M
.if\\n(nl .bp
.nr)I \\n()Mu
.nr)R 0
.}E
.DT
.ifn \{.na
.nh\}
.ift \{.bd S 3 3
.hy14 \}
..
.TH CPIOSIZE 1
.SH NAME
cpiosize \- determine size (in floppies) prior to using cpio
.SH SYNOPSIS
.B cpiosize
.B 
[
.B \-cBtv
]
.B
[
.B \-n
label
]
.B
[
.B \-s
size
]
.SH DESCRIPTION
.B Cpiosize
reads the standard input to obtain a list of path names
and determines what size (in blocks and number of floppies) the
resulting cpio -o archive would be.  The same effect could be
accomplished with cpio -o >/dev/null, but cpiosize is 9 to 10 times
as fast.
.PP
The meanings of the available options are:
.PP
.PD 0
.TP
.B c
Cpio, when given the c option, will write
.I header\^
information in
.SM ASCII
character form for portability.  This changes the size of the header line
for each file to be stored in the archive.  Cpiosize will take this into
account if given the c option.

.PP
.TP
.B B
When cpio is given the B option, the archive will be blocked with 5120
byte records.  This corresponds to 1 track on some floppies and is faster
than writing a block at a time.  Cpiosize will take into account the
different time factor (about 18% faster) and will report the same total
as cpio (rounded up to a multiple of 10 blocks).

.PP
.TP
.B t
Print out each file as it is checked.  If -v is also active then the
size will be printed, except for special files where <d>-directory,
<c>-character, <b>-block, <f>-fifo or <?>-unknown will be printed.

.PP
.TP
.B v
Verbose, report number of bytes, number of files, and approximate time
for the data to be written (assuming that floppies can be changed 
instantaneously).  If -t is active then the size of each file is printed.

.PP
.TP
.B n
.I label\^
Although cpiosize is fairly fast, it is often convenient to run it in the
background.  If one cpiosize can run then so can another--with no idea 
which will finish first.  Adding -n foo will result in the output line
starting with ``cpiosize(foo):'', allowing you to tell which job has completed.

.PP
.TP
.B s
.I size\^
The number of floppies it takes to store a cpio archive depends, in part,
upon the size of the floppy disk.  The default size is 830 blocks; if you wish
to store data on a different size floppy you can override the default size
with this option.

.PD
.SH EXAMPLES
The first example determines how many floppies will be required to store
the files in /lib using block mode:
.PP
.RS
find /lib \-depth \-print \|\(bv \|cpiosize \|\-B
.PP
 cpiosize(1): 1720 blocks ( 2.07, or 3 floppies)
.RE
.PP
The second shows how many floppies, and how much time, it would take to
back up the files in the current directory:
.PP
.RS
ls \|.\| \|\(bv \|cpiosize \|\-v
.PP
cpiosize(1): bytes = 1040657, nfiles = 36
.br
cpiosize(1): Estimated time to write 0:02:57 (except changing floppies)
.PP
 cpiosize(1): 2033 blocks ( 2.45, or 3 floppies)
.RE
.PP
.PP
.SH SEE ALSO
cpio(1), find(1), ls(1).
.br
cpio(4) in the
\f2\s-1UNIX\s+1 System Programmer Reference Manual\fR.
.PP
.SH DIAGNOSTICS
Files that cannot be examined with stat, due to a protection violation or
because they are missing, are reported on stderr.
.PP
.SH BUGS
Path names are restricted to 512 characters.  
Files may disappear or change size between the time cpiosize is run and
the time the actual cpio job is started.  Time reporting is based on the 830 block
(415K) floppy,
and scaled to match other sizes.  Cpiosize, which only uses stat, may
succeed on files that cpio will not copy. 

.\"	@(#)cpiosize.1	1.0 of 11/19/88
------------------------cut here for cpiosize.c----------------------
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>

extern errno, sys_nerr;
extern char *sys_errlist[];

/* externs needed for getopt */
extern char *optarg;
extern int optind;

#define DEFAULT_FLOPPY_SIZE 830  /* 2 heads * 42 cyls * 10 sectors, less 10 for partition 0 */
/*
   These times were done using cpio -o >/dev/rfp021 and
   cpio -oB >/dev/rfp021 on a file that was 830 blocks (just fits on
   one floppy).  (average for N = 4, standard error < 0.7)  
*/
#define TIME_FOR_NORMAL  72.28   /* Avg seconds for 1 floppy without -B */
#define TIME_FOR_BLOCKED 58.47   /* Avg seconds for 1 floppy with -B */

/*
   Author: John Campbell
           ...!arizona!naucse!jdc
           (602) 523-6259 (work)
*/

main(argc, argv)
int argc;
char *argv[];
/*-
   Routine to tell you the size (in cpio disk blocks) and number of
   floppies it will take to back up a set of files.  The file names are
   read from standard input, one per line (for example from find or ls);
   non-option arguments are rejected with a usage message.

   Options:  -c : consider that cpio will use the character type header
                  (cpio -c).  Default is the binary header.
             -B : consider cpio will use 5120 byte blocking.  In this
                  case the blocks reported/used are multiples of 10.
             -t : print each file name as it is processed.
          -snnn : size of each floppy in 512-byte blocks, default is 830.
          -nlabel: label to associate with this run.
          -v    : verbose--tell number files, bytes, estimated time.
-*/
/*
   Notes:  cpio has a header constructed out of a fixed size + length
           of the name (including room for '\0').  The data for a given 
	   file follows this header and the whole cpio archive is terminated 
	   with a 38 byte trailer record.  The end of the archive is then 
	   padded out to the block size (either 512 or 5120).

	   If cpio -oc is used then the header fixed size is 76 bytes +
	   the length+1, otherwise the fixed size constant is 26 bytes +
 	   the length+1 rounded up to an even byte boundary.

           This utility defaults to 3b1 floppies formatted for 830 blocks
           (iv -tv DEFAULT_FLOPPY_SIZE):
           
        Floppy disk
        Volume Name:   420k
        42 Cylinders. 2 Heads per Cylinder.
        There are 10 Physical Sectors (of 512 bytes) per Track.
           20 Physical Sectors per Cylinder, 840 Physical Sectors per Disk.
        There are 5 Logical Blocks (of 1024 bytes) per Track.
           10 Logical Blocks per Cylinder, 420 Logical Blocks per Disk.
        The Floppy is Single density
        The Step Rate supplied to the Controller is 0.
        Partition 0: start Track=0, size (in Blocks)=5
        Partition 1: start Track=1, size (in Blocks)=415

           It seems that cpio cannot write to Partition 0, hence 415 logical
           blocks (830 disk blocks) is the maximum archive that can be stored
	   on a floppy formatted as above.  Cpio reports disk blocks (physical
           sectors) when it terminates so the units used in here are in disk
           blocks.  Use -s to change cpiosize's idea of how much can be 
	   stored on a floppy.

   	   If the command cpio -oB is used then the writes are performed in
           5120 byte chunks (1 physical track).  Experiments indicate that
           this is approx. 18% faster than writing 512 bytes at a time.  
	   If -B is used then cpiosize will report the number of blocks
           rounded up to a multiple of 10 blocks (as cpio itself does).

	   cpiosize -v will also report the total number of bytes the archive
	   will have, the number of files in the archive, and an estimate
           of the amount of time it will take to write the data to the
	   floppy (not counting time to change floppies).  The time estimate
	   takes into account the -c and -B flags.
*/
{
#define STRSIZE 512
   static char version[]="cpiosize ver 1.0, 11/19/88 jdc";

/* Fifo, character, directory, and block special files. */
   static char *file_type[] = {"?","f", "c", "?", "d", "?", "b", "?"};

   unsigned long bytes;   /* Can account for at least 2048 Mb drive */
   int fblocks, floppies, slen;
   int c, tmp, nfiles, verbose;
   int cflag, Bflag, tflag;
   int hfixed, fbsize, filesize;
   char name[STRSIZE], label[STRSIZE];
   struct stat sbuf;

/* Initialize defaults. */
   label[0] = '1';
   label[1] = '\0';
   fbsize   = DEFAULT_FLOPPY_SIZE;  
   tflag = Bflag = cflag = verbose = nfiles = bytes = 0;
   hfixed   = 26;       /* Structure size without name in cpio header */

/* Get command options. */
   while ((c = getopt (argc, argv, "cvBtn:s:")) != EOF) {
      switch (c) {
      case 'c':         /* Portable 'c' type cpio header. */
         cflag = 1; 
         hfixed = 76;
      break;
      case 'v':         /* Print bytes, files, and times. */
         verbose = 1; 
      break;
      case 'B':         /* Block I/O (cpio -oB to be used). */
	 Bflag = 1;      
      break;
      case 't':         /* Tell all the files as they are processed. */
         tflag = 1;
      break;
      case 'n':         /* Label, useful when run in background. */
         label[0] = '\0';  /* Allow multiple, take last. */
         strncat (label, optarg, STRSIZE-1);  /* Leave room for '\0'. */
      break;
      case 's': 
         if ((fbsize = atoi (optarg)) <= 0) usage();
      break;
      case '?': usage();
      }
   }
   if (argc != optind) usage();

/* Get the names from stdin. */
   while (fgets(name,sizeof name, stdin) != NULL) {
      slen = strlen(name);
      if (slen > 0 && name[slen-1] == '\n')
         name[--slen] = '\0';    /* Get rid of \n, if there is one */
      if (stat(name, &sbuf) != 0) {
         fprintf(stderr, "%s: %s\n", name, sys_errlist[errno]);
         continue;
      }
   /* 
      Cpio headers (binary) have 26 bytes and then the length of the
      file + the null byte rounded up to an even number if odd.
      Portable headers are 76 bytes + strlen + 1.
   */
      if (!cflag)
         tmp = ((slen+2)>>1)<<1;
      else
	 tmp = slen+1;
   /* Only regular files have their data stored, all files have a header. */
      if ((sbuf.st_mode & S_IFMT) == S_IFREG) {
      /* File sizes are written to even byte boundaries (like names). */
         filesize = ((sbuf.st_size+1)>>1)<<1;
         bytes += (filesize + hfixed + tmp);
      }
      else {
         bytes += (hfixed + tmp);
      }
      ++nfiles;
#ifdef DEBUG
	printf ("bytes: %lu \n", bytes);
#endif
      if (tflag) {
         if (verbose) {
         /* Print name and size. */
            if ((sbuf.st_mode & S_IFMT) == S_IFREG)
	       printf ("%-15s\t%ld\n", name, (long)sbuf.st_size);
            else {
	       /* Index the type table by the top 3 bits of the file type. */
	       printf ("%-15s\t<%s>\n", name, 
                       file_type[(sbuf.st_mode>>12)&7]);
            }
         }
         else {
         /* Just print the name. */
            printf ("%s\n", name);
         }
      } 
   }
/*
   At the end cpio adds a "trailer" record (38 bytes).  Then the last
   block is filled to the block boundary.
*/
   bytes += 38;
   fblocks = (bytes + 511)/512;  /* Things seem 512 oriented. */
   if (Bflag)
      fblocks = 10*((fblocks + 9)/10);  /* round up to 10 blocks */

   floppies = (fblocks + fbsize - 1)/(fbsize);

   if (verbose) {
      int hours, minutes, seconds;
      float time_constant;

      if (Bflag)
         time_constant = TIME_FOR_BLOCKED;        
      else
         time_constant = TIME_FOR_NORMAL;       
   /* 
      Compute amount of time based upon past experience with 830 block
      (415K) floppies.  Do a simple scaling for other sizes--probably not
      valid, though.
   */
      seconds = time_constant*((float )fblocks/fbsize)+0.5;
      seconds = seconds * fbsize/DEFAULT_FLOPPY_SIZE;  
      hours = seconds/3600;
      minutes = (seconds/60)%60;     /* Keep minutes in the range 0-59. */
      seconds %= 60;
      printf ("\ncpiosize(%s): bytes = %ld, nfiles = %d\n",label,bytes,nfiles);
      printf ("cpiosize(%s): Estimated time to write %d:%2.2d:%2.2d%s\n",
		label, hours, minutes, seconds,
		" (except changing floppies)");
   }
   printf ("\ncpiosize(%s): %d blocks ( %.2f, or %d floppies)\n\n", 
	   label, fblocks, (float )fblocks/fbsize, floppies);
   exit(0);             /* Return a defined (success) status. */
}

usage()
{
   fprintf (stderr,"usage: cpiosize [-cBtv] [-n label] [-s size]\n");
   exit(1);             /* Bad usage is an error, so exit nonzero. */
}
------------------cut here for signature :-) -----------------
-- 
	John Campbell               ...!arizona!naucse!jdc
                                    CAMPBELL@NAUVAX.bitnet
	unix?  Sure send me a dozen, all different colors.

tanya@adds.newyork.NCR.COM (Tanya Katz) (03/03/89)

=-=-=-= help =-=-=-= help =-=-=-= help !!

Would someone please email me a copy of cpiosize.c, the extremely
useful utility that tells how many floppies are required to back up
with cpio archives.  I have lost my copy!

Thanks!

	Tanya.Katz@adds.newyork.ncr.com

------------------------------------------------------------------------------
      ###   ######  ######   #####       Tanya Katz  
     #   #  #     # #     # #            
    #     # #     # #     #  #####       UUCP : ncrlnk!adds!tanya      
    ####### #     # #     #       #             tanya.katz@adds.newyork.ncr.com
    #     # ######  ######   #####       PHONE: (516) 231-5400 X430 

    Applied Digital Data Systems, Inc.   100 Marcus Blvd., Hauppauge, NY 11788 
------------------------------------------------------------------------------