rjk@sawmill.uucp (Richard Kuhns) (07/28/89)
First, some background. Backups on 3b1s can be a royal pain on floppy disks, and we can't afford to buy tape backups for all of our machines (40+ in 4 states). Our solution: a new backup program. We're currently hacking on `afio', which was posted to the net some time ago. In the process we've come across several bugs that we'd like to bring to the attention of the net at large, to see if anyone has solutions.

Our mods to afio: first, in order to speed up and simplify verification of the floppy, we keep one floppy disk's worth of data in-core at all times. Second, we've added an option to automatically compress(1) the data before writing (which necessitates a fork(2)). For 360k floppies, this is no big deal. However, on those machines where we've installed 3 1/2 inch drives, the 795k buffer required causes forks to fail with great regularity, presumably because we're out of swap space. We have at least 2 1/2 M RAM on each machine, with 5000 blocks of swap space.

Our solution: use enough shared memory segments to hold the disk image. This works perfectly as long as the output of afio is directed to a regular file -- forking is MUCH faster. (By the way, the maximum size of a shared memory segment on the 3b1 running 3.51 is 262144 bytes == 2^18.) However, if we attempt to write a block of data directly from the shared memory segment to /dev/{r}fp021, the write(2) hangs and the process is unkillable -- we have to reboot to get rid of it. Our current work-around is to malloc(3) an additional buffer the size of a disk block, memcpy(3) from the shared memory segment to the malloc()ed buffer, and write the malloc()ed buffer (ICK!!).

The next problem, which we DON'T have a workable solution for yet, is difficulty opening /dev/tty. I know that there have been discussions of this on the net in the past, but I don't recall ever seeing a definitive statement of the cause/cure (possibly because there isn't one :-)). Running afio directly or 1 shell `deep' works fine (so far, anyway).
Running a shell script which runs another shell script which runs afio causes all open(2)s of /dev/tty to fail. This makes it extremely difficult to prompt for additional disks. Any suggestions (other than "don't do that")?

We can mail a small (98 line) C program which demonstrates the `hanging write(2) from shared memory' to anyone who would like an immortal process of his/her very own. We will also make the patches to afio available as soon as we get it working.

WARNING: We're well aware that compress(1)ing the data before backing it up makes the backup much more fragile, but we have so much data that needs to be backed up on a daily basis that anything else (short of tape drives, which we can't afford right now) is unworkable.

Finally, there are apparently problems with the GDGETA and GDSETA ioctl(2)s as documented in gd(7). While GDGETA will return a struct gdctl, GDSETA apparently only sets the in-core copy of this structure. Dismounting the floppy, closing it and re-opening it reveals that the struct gdctl on the disk is unchanged. We also can't figure out how to do the checksum mentioned in <sys/gdisk.h>.

We have open tickets at the AT&T Hotline for all the problems mentioned above, but no answers yet. RE the hanging write problem: one of the technicians at the hotline told us "we don't support C programming...". Interesting...

Thanks in advance for any help. Mail can be addressed to me, Rich Kuhns (rjk), or Jeff Buhrt (buhrt), at
{the_known_world}!newton.physics.purdue.edu!sawmill!{rjk, buhrt}

PS An additional feature we've added to afio which makes users very happy is the ability, when a verify on the (for example) 3rd disk fails, to reformat the disk (or a new one) directly from afio and continue with the backup from the point where it failed.
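
For anyone who wants to picture the shared memory work-around described above, here is a rough sketch. It is NOT the afio patch and NOT the 98 line demo program. It only illustrates the idea: split the disk image across System V shared memory segments of at most 262144 bytes, and copy each block through a malloc()ed bounce buffer before the write(2). Every name in it (struct image, segs_needed, image_alloc, image_at, image_free, write_block, MAX_SEGS) is made up for illustration. It is written in ANSI C rather than what the 3.51 compiler accepts, and it has not been tried on a 3b1. Taking 795k to mean 795 * 1024 bytes (an assumption), the image needs four segments.

```c
/* Sketch only: SEG_MAX is the figure quoted in the post; every other
 * name here is hypothetical and invented for illustration. */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SEG_MAX  262144L  /* largest shm segment on 3.51, per the post */
#define MAX_SEGS 8        /* arbitrary cap chosen for this sketch */

struct image {
    int   nsegs;           /* segments currently attached */
    long  size;            /* total bytes in the disk image */
    int   id[MAX_SEGS];    /* shmget() ids */
    char *addr[MAX_SEGS];  /* shmat() addresses */
};

/* Number of segments needed to hold `size' bytes (rounds up). */
int segs_needed(long size)
{
    return (int)((size + SEG_MAX - 1) / SEG_MAX);
}

/* Detach and remove every segment attached so far. */
void image_free(struct image *im)
{
    int i;
    for (i = 0; i < im->nsegs; i++) {
        shmdt(im->addr[i]);
        shmctl(im->id[i], IPC_RMID, NULL);
    }
    im->nsegs = 0;
}

/* Allocate enough private segments to hold `size' bytes; 0 on success. */
int image_alloc(struct image *im, long size)
{
    int i, want = segs_needed(size);

    im->nsegs = 0;
    im->size = size;
    if (size <= 0 || want > MAX_SEGS)
        return -1;
    for (i = 0; i < want; i++) {
        long left = size - (long)i * SEG_MAX;
        size_t len = (size_t)(left < SEG_MAX ? left : SEG_MAX);

        im->id[i] = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
        if (im->id[i] < 0) {
            image_free(im);
            return -1;
        }
        im->addr[i] = (char *)shmat(im->id[i], NULL, 0);
        if (im->addr[i] == (char *)-1) {
            shmctl(im->id[i], IPC_RMID, NULL);
            image_free(im);
            return -1;
        }
        im->nsegs++;
    }
    return 0;
}

/* Address of byte `off' of the image (caller keeps off in range). */
char *image_at(struct image *im, long off)
{
    return im->addr[off / SEG_MAX] + off % SEG_MAX;
}

/* Copy `blk' bytes starting at `off' into a malloc()ed bounce buffer,
 * then write(2) the bounce buffer, never the shared memory itself.
 * Returns what write(2) returned, or -1 on a bad range or no memory. */
long write_block(int fd, struct image *im, long off, size_t blk)
{
    char *bounce;
    size_t done = 0;
    long written;

    if (blk == 0 || off < 0 || off + (long)blk > im->size)
        return -1;
    bounce = malloc(blk);
    if (bounce == NULL)
        return -1;
    while (done < blk) {   /* a block may straddle two segments */
        long pos = off + (long)done;
        size_t room = (size_t)(SEG_MAX - pos % SEG_MAX);
        size_t n = blk - done < room ? blk - done : room;

        memcpy(bounce + done, image_at(im, pos), n);
        done += n;
    }
    written = (long)write(fd, bounce, blk);
    free(bounce);
    return written;
}
```

A caller would image_alloc() once, fill the image through image_at(), call write_block() once per disk block with the floppy's file descriptor, and image_free() at exit. A real program would malloc the bounce buffer once rather than per block; it is done per call here only to keep the sketch short.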