steve@nuchat.UUCP (Steve Nuchia) (12/09/89)
The following is an experimental set of backup software I've been playing with for several months. It was inspired by the formidable problem of backing up a mid-sized unix system on floppies, and goes far toward making that prospect tolerable. It is running on two 386/ix machines, an A/UX box, and a 3b1. There are differences between the versions, this is my 386 version. It is not yet suitable for use by someone who won't or can't read and understand the code -- my purpose in posting it at this stage is to publish the basic algorithm and to get some input on which way to go with it. If anyone hacks on it I'd like to get a copy of your results. -- Steve Nuchia South Coast Computing Services (713) 964-2462 "Man is still the best computer that we can put aboard a spacecraft -- and the only one that can be mass produced with unskilled labor." - Wernher von Braun #! /bin/sh # This is a shell archive. Remove anything before this line, then unpack # it by saving it into a file and typing "sh file". To overwrite existing # files, type "sh file -c". You can also feed this as standard input via # unshar, or by typing "sh <file", e.g.. If this archive is complete, you # will see the following message at the end: # "End of shell archive." # Contents: README all_kill bkscan fileinfo.c fit.awk killcrunch.c # mcheck minit mrecycle script # Wrapped by steve@nuchat on Fri Dec 8 20:50:34 1989 PATH=/bin:/usr/bin:/usr/ucb ; export PATH if test -f README -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"README\" else echo shar: Extracting \"README\" \(4007 characters\) sed "s/^X//" >README <<'END_OF_README' XThis is a backup package based on an algorithm distinct from Xthe usual full/incremental method. It maintains a database Xof initialized media and the files backed up on each volume. XIt cycles through the set of volumes over time, filling each Xvolume with the files "most in need" of backing up, with the Xresult that the files backed up to the next volume to be written Xare always saved somewhere else before the volume is scratched. X XIt uses afio as its copy-out program, and maintains a simple Xmachine readable lable on each volume. It needs some smarts Xfor bulk restores, as it stands you read in all the volumes X(without the -u cpio option) and try to decide what to delete. X XThe system in use on three machines I am responsible for, and Xworks reasonably well. There is some customization required, Xnotably in minit, mcheck, and script. Script is the main Xperiodic backup entry point. X XThe kill files mechanism has worked but is currently in Xdissaray. The idea is to take .nobackup files scattered Xaround the filesystem containing patterns specifying what Xfiles from there on down don't need backing up. The X.nobackup files need to be observerd by a re-written Xdisk scan process -- as it is the kill_all file, a sed Xscript, is being maintained manually and should be checked. X XThe main things that need doing are to keep the databases Xunder some kind of dbms or at least indexes file structure Xand to make the disk scan a C (or perl?) program. And Xdevise an algorithm for restore scheduling. X X X[the rest of this file is my design notes scratch pad.] X XGoals: X X efficiency when filesystem is reorganized X efficiency in face of many links X work well with both small and large media available X positive id of volumes X support both backup and archiving X provide flexible user preference facility X media compatible with standard utility(s) X database human-readable and recoverable from media set X able to handle files larger than medium X use "cycle" algorithm X XEntities: X X Files: keyed by: mount-point, dev/inode, ctime, & mtime X contains: size, usr/grp, mode, volume assignment(s) X X Directories: X keep track of pathnames and deletions X X Volumes: X keep id, type, class, name + sequence X type = {tape,floppy...} X class = {offsite bkup, regular bkup, archive...} X time of creation X X Hint files: X user can specify patterns, subdirs, etc X to not backup per directory. X X Delta files: database update files. X X volume header: copy of volume database record. X XCommands: X X bkscan -- scan a directory recursively (normally /) and X write a delta file reflecting current reality. X takes .bkup-hint files into account. apply the X delta to the master by default if all goes well. X X bkinit -- initialize or reinitialize a medium. If reinitializing X it will delete references in the file table, use this X when a medium is lost or damaged. X X bkup -- examines the files list and copies files not yet backed up X to media of specified class. For media using the cycle X algorithm, consider the oldest N volumes lost when determining X what to back up. Verify each medium as it is brought online X and perform all I/O for the underlying program (cpio). Writes X a delta file and if all goes well applies it to the database. X X bkupdate -- apply a delta to a master database file X X bkreport -- format a report at various levels of detail from X a master or delta file. X X bkgrok -- scan a bk backup medium and produce a delta for it. X X bkselect -- interactive restore (or archive) file selector. X writes a delta file suitable for feeding to bkup X or bkrestore. Runs in batch mode to drive bkup... X X bkrestore -- from a delta file describing the files to be X restored and optional renaming parameters schedule X and drive restoration from specified media class. X XNotes: must update master before running bkup when scratching volumes. X what happens when the mount arrangement changes? X Xscan stage should be integrated with .nobackup facility Xto save time and prevent timing windows unpleasantries. END_OF_README if test 4007 -ne `wc -c <README`; then echo shar: \"README\" unpacked with wrong size! fi chmod +x README # end of overwriting check fi if test -f all_kill -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"all_kill\" else echo shar: Extracting \"all_kill\" \(199 characters\) sed "s/^X//" >all_kill <<'END_OF_all_kill' X/\/core$/d X/^\/tmp\/.*$/d X/^\/usr\/tmp\/.*$/d X/^\/files\/news\/.*\//d X/^\/files\/extra\/.*\//d X/^\/usr\/backup\/files$/d X/^\/usr\/backup\/saved$/d X/^\/usr\/backup\/tlist$/d X/^\/usr\/backup\/wlist$/d END_OF_all_kill if test 199 -ne `wc -c <all_kill`; then echo shar: \"all_kill\" unpacked with wrong size! fi chmod +x all_kill # end of overwriting check fi if test -f bkscan -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"bkscan\" else echo shar: Extracting \"bkscan\" \(235 characters\) sed "s/^X//" >bkscan <<'END_OF_bkscan' XPATH=/u/steve/bkup:$PATH; export PATH Xif [ $# = 0 ] Xthen X set `pwd` Xfi Xfor x Xdo X d=`cd $x; pwd` X find $d -name '.nobackup' -print | bk_buildkill > bk_sed$$ X find $d -print | sed -f bk_sed$$ Xdone | bk_fileprops > bk_delta$$ Xrm bk_sed$$ END_OF_bkscan if test 235 -ne `wc -c <bkscan`; then echo shar: \"bkscan\" unpacked with wrong size! fi chmod +x bkscan # end of overwriting check fi if test -f fileinfo.c -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"fileinfo.c\" else echo shar: Extracting \"fileinfo.c\" \(883 characters\) sed "s/^X//" >fileinfo.c <<'END_OF_fileinfo.c' X/* X * fileinfo -- print simplified info record for each filename X * on stdin or argv X */ X X#include <sys/types.h> X#include <sys/stat.h> X#include <stdio.h> X Xchar fnbuf[512]; X Xmain ( argc, argv ) X int argc; X char *argv[]; X{ X int i; X X if ( argc > 1 ) for ( i = 1; i < argc; i++ ) X fmtrec(argv[i]); X else while ( fgets ( fnbuf, sizeof(fnbuf), stdin ) ) X { X fnbuf[strlen(fnbuf)-1] = 0; X fmtrec(fnbuf); X } X} X X Xstruct stat st; X Xfmtrec(f) X char *f; X{ X long age; X X if ( stat ( f, &st ) ) X perror ( f ); X else X { X age = st.st_mtime; X if ( st.st_ctime > age ) age = st.st_ctime; X switch ( st.st_mode & S_IFMT ) X { X case S_IFIFO: /* fifo */ X case S_IFCHR: /* char */ X case S_IFBLK: /* blok */ X case S_IFDIR: /* diry */ X printf ( "0 0 %ld %s\n", age, f ); X break; X case S_IFREG: /* file */ X printf ( "0 %ld %ld %s\n", st.st_size, age, f ); X break; X } X } X} END_OF_fileinfo.c if test 883 -ne `wc -c <fileinfo.c`; then echo shar: \"fileinfo.c\" unpacked with wrong size! fi chmod +x fileinfo.c # end of overwriting check fi if test -f fit.awk -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"fit.awk\" else echo shar: Extracting \"fit.awk\" \(635 characters\) sed "s/^X//" >fit.awk <<'END_OF_fit.awk' XBEGIN { state = 1; nfile=0; bot=1 } Xstate == 3 { X print X exit X} Xstate == 2 { X size = int($3 / 512) + 3 X if ( size > maxcap ) X { X print $5 ": too big for any volume" > "assign.errors" X next X } X for ( i = bot; i <= nfile; i++ ) X { X if ( size <= cap[i] ) X { X cap[i] -= size; X break; X } X if ( cap[i] < 3 && i == bot && ++bot > nfile ) state = 3; X } X fname = "flist." X if ( i > nfile ) print X else print volno[i], $3, $4, $5 > vol[i] X} Xstate == 1 && NF == 3 { X nfile++; X cap[nfile] = $2 X volno[nfile] = $1 X vol[nfile] = "flist." $1 X if ( $2 > maxcap ) maxcap = $2 X next X} X$0 == "end-of-media-list" { X state = 2 X next X} END_OF_fit.awk if test 635 -ne `wc -c <fit.awk`; then echo shar: \"fit.awk\" unpacked with wrong size! fi chmod +x fit.awk # end of overwriting check fi if test -f killcrunch.c -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"killcrunch.c\" else echo shar: Extracting \"killcrunch.c\" \(1207 characters\) sed "s/^X//" >killcrunch.c <<'END_OF_killcrunch.c' X/* X * read .nobackup files names from stdin or argv and X * emit sed commands to be applied to find -print output X * implementing the requests found in those files. X */ X X#include <stdio.h> X#include <string.h> X Xchar fnbuf[512]; X Xmain ( argc, argv ) X int argc; X char *argv[]; X{ X int i; X X if ( argc > 1 ) for ( i = 1; i < argc; i++ ) X fmtkill(argv[i]); X else while ( fgets ( fnbuf, sizeof(fnbuf), stdin ) ) X { X fnbuf[strlen(fnbuf)-1] = 0; X fmtkill(fnbuf); X } X} X Xchar rebuf[512]; X Xfmtkill(fn) X char *fn; X{ X FILE *kf; X char *p; X X if ( ! (kf = fopen ( fn, "r" )) ) return perror(fn); X if ( p = strrchr(fn,'/') ) *p = 0; X else fn = "."; X while ( fgets ( rebuf, sizeof(rebuf), kf ) ) X { X rebuf[strlen(rebuf)-1] = 0; X fputs ( "/^", stdout ); X re_fmt(fn); X fputs ( "\\/", stdout ); X if ( rebuf[0] == '/' ) /* regexp */ X fputs ( rebuf+1, stdout ); X else X re_fmt(rebuf); X fputs ( "$/d\n", stdout ); X } X fclose(kf); X} X Xre_fmt(s) X char *s; X{ X while ( *s ) X { X switch ( *s ) X { X case '\\': X case '.': X case '/': X putc('\\',stdout); X break; X case '*': X fputs("[^/]",stdout); X break; X case '?': X fputs("[^/]",stdout); X s++; X continue; X } X putc(*s++,stdout); X } X} END_OF_killcrunch.c if test 1207 -ne `wc -c <killcrunch.c`; then echo shar: \"killcrunch.c\" unpacked with wrong size! fi chmod +x killcrunch.c # end of overwriting check fi if test -f mcheck -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"mcheck\" else echo shar: Extracting \"mcheck\" \(254 characters\) sed "s/^X//" >mcheck <<'END_OF_mcheck' Xvol=$1 Xwhile true Xdo X echo load volume $vol and press return\\c X line > /dev/null X dd if=/dev/rmt0 bs=512 count=1 of=tmp_header 2>/dev/null X vid=`head -1 tmp_header` X if [ $vid != $vol ] X then X echo "hey, that's volume " $vid X else X exit 0; X fi Xdone END_OF_mcheck if test 254 -ne `wc -c <mcheck`; then echo shar: \"mcheck\" unpacked with wrong size! fi chmod +x mcheck # end of overwriting check fi if test -f minit -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"minit\" else echo shar: Extracting \"minit\" \(320 characters\) sed "s/^X//" >minit <<'END_OF_minit' X# X# initialize a backup volume X# Xwhile true Xdo X vnum=`awk '$1 > top { top = $1 } END { print top + 1 }' media.db` X echo "insert volume " $vnum " and press return" X line > /dev/null X mt ret X echo $vnum > tmp_label X dd if=tmp_label count=1 conv=sync of=/dev/rmt0 X echo $vnum 110000 0 >> media.db Xdone END_OF_minit if test 320 -ne `wc -c <minit`; then echo shar: \"minit\" unpacked with wrong size! fi chmod +x minit # end of overwriting check fi if test -f mrecycle -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"mrecycle\" else echo shar: Extracting \"mrecycle\" \(78 characters\) sed "s/^X//" >mrecycle <<'END_OF_mrecycle' Xfor x Xdo X echo "recycling volume " $x X ed - saved <<foo Xg/^$x /d Xw Xq Xfoo Xdone END_OF_mrecycle if test 78 -ne `wc -c <mrecycle`; then echo shar: \"mrecycle\" unpacked with wrong size! fi chmod +x mrecycle # end of overwriting check fi if test -f script -a "${1}" != "-c" ; then echo shar: Will not over-write existing file \"script\" else echo shar: Extracting \"script\" \(3473 characters\) sed "s/^X//" >script <<'END_OF_script' X# backup script XPATH=/usr/backup:/usr/lbin:$PATH; export PATH Xcd /usr/backup X Xcompress < files > files.Z Xcompress < saved > saved.Z X X# N is number of media to use per run, max 9 XN=1 X X# DEV is the name of the backup device XDEV=/dev/rmt0 X X# select N oldest media into mlist Xsort -n +2 +0 media.db | head -$N > mlist X X# remember the age A of youngest XA=`tail -1 mlist | cut -d' ' -f3` X X# now sort them for user convenience Xsort -n -o mlist mlist X X# for unattended (tape, N=1) backup get the volume mounted, X# otherwise let the operator gather the volumes Xif [ $N = 1 ] Xthen X vol=`cut -d' ' -f1 mlist` X echo Will need volume $vol X mrecycle $vol X mcheck $vol Xelse X echo Will need volumes `cut -d' ' -f1 mlist` X mrecycle `cut -d' ' -f1 mlist` Xfi X X# now scan the disk Xecho Scanning the disk. Xfind / -print | sed -f all_kill | fileinfo > files X Xecho Assigning files to media. X X# format of files and saved is {vol, size, tstamp, name}. vol is X# zero in files. format of media.db is {vol, size, btime}. We X X# files and saved are of the form (vol, size, tstamp, name), X# with vol degenerate (0) in files. media.db is of the X# form (vol, size, btime). We join them into X# tlist: {vol, btime, size, tstamp, name}. X X# join { files U saved } X { media.db U (0,0,0) } -> tlist Xecho 0 0 0 | sort - media.db > mdb Xsort +0 -1 files saved | join -j 1 -o 2.1 2.3 1.2 1.3 1.4 - mdb > tlist X X# format of wlist (and tlist) is (vol, btime, size, tstamp, name) X X# for each group differing only in volume,btime we want to keep X# the youngest btime found, but only if the file was found in X# the scan -- in which case there will be a vol=0 record also. X X# so we sort tlist such that records differing only in vol,btime are X# grouped together in ascending btime order. the result is piped X# into an awk script that prints the last (most recently saved) X# line for each group having a 0 (freshly scanned) element. X# The sentinel is provided to help flush the last group. X X(sort +4 +2 -4 +1n tlist ; echo sentinel ) | Xawk '$3 != s || $4 != t || $5 != n { if (z) print v,b,s,t,n; z=0 } X { if(!$1) z=1; v = $1; b = $2; s = $3; t = $4; n = $5}' > wlist X X# Now wlist has one record for each record in files. We sort X# them into ascending btime order so that the files most in need X# of saving will be assigned first. X X# now sort by btime, oldest first -- X# the zeros will sort out first. Xsort +1n -2 +4 -o wlist wlist X X# now assign from wlist to N flists X# flist.x, flist.y ... where x,y... are the volumes selected. X# flist.0 gets any we can't assign. In the process the btime gets stripped X# from flist.i but not from flist.0. X Xrm -f flist.* Xcp /dev/null assign.errors Xecho end-of-media-list | awk -f fit.awk mlist - wlist > flist.0 X X# report any errors from assignment Xcat assign.errors X X# remember age T of oldest file not backed up Xif [ -s flist.0 ] Xthen X T=`head -1 flist.0 | cut -d' ' -f4` Xelse X T=`rawdate` Xfi X X# report overlap percentage = 100(A-T)/A Xnow=`rawdate` Xecho "Overlap =" `expr 100 \* \( $T - $now \) / \( $A - $now \)`"%" X X# for each flist.i (i != 0) copy to medium and update data bases Xrm flist.0 Xls flist.* | cut -d. -f2 | sort -n | Xwhile read vol Xdo X if [ $N != 1 ] X then X mcheck $vol < /dev/tty X fi X dd if=$DEV count=1 of=tmp_label 2>/dev/null X sed -e 's/[^ ]* [^ ]* [^ ]* //' flist.$vol | sort | X (cat tmp_label; afio -of -b 8k -c 16 -) > $DEV X cat flist.$vol >> saved X rm flist.$vol X ed - media.db <<foobar X/^$vol /s/[0-9]*\$/$now/ Xw Xq Xfoobar X Xdone X Xecho Backup complete. END_OF_script if test 3473 -ne `wc -c <script`; then echo shar: \"script\" unpacked with wrong size! fi chmod +x script # end of overwriting check fi echo shar: End of shell archive. exit 0 -- Steve Nuchia South Coast Computing Services (713) 964-2462 "Man is still the best computer that we can put aboard a spacecraft -- and the only one that can be mass produced with unskilled labor." - Wernher von Braun