shawn@mit-eddie.MIT.EDU (Shawn F. Mckay) (03/06/87)
Howdy,

I have a few ideas I wonder if people would be willing to ponder, and
perhaps lend a hand with. I'm very open to totally new ideas as well.
The basic idea is to make backups at our site easy, simple, and
reliable. Thanks for any help.

Ideas for system backups;

The problem --

We have 1 main system (a vax w/ra's), and two clusters of little
systems. (Different types, but they look like workstations.) Within
these clusters, there are critical machines and non-critical machines.
The critical machines need to be backed up on a nightly basis, and the
non-critical machines need to be backed up on a weekly basis, or on
user request. (Being development machines, it would be nice to have
them backed up when people do some amount of work, rather than all the
time.)

Resources --

We have two operations people, who have to make sure other things keep
working as well, which limits the time spent on backups to something
less than 50%. (Would be nice to chop that number down below 10%.) We
have as many mag tapes as it takes, but it would be nice to cut this
number down as well, so it takes less time.

---- The solutions so far ----

Backup type (a)

Procedure --
  o Do a full dump of the main system once a week, with incrementals
    done each day. (Difference being mainly physical/method)
  o Do a full backup of all remotes each week, some each day, and
    incrementals each night.

Pros --
  o Given the systems listed below, I can't really see any.

Cons --
  o Great in number and size. I think they are obvious, the main ones
    being number of tapes, man-hours, and cpu usage.

Backup type (b)

Procedure --
  o The main system has two sets of file systems (primary/all). The
    primary set is backed up weekly, and the whole system (i.e. 'all')
    is backed up monthly. Incrementals are done on all file systems at
    level 9 daily. If they expand to more than 1 tape, a full dump of
    all file systems becomes needed.
  o The remote systems each have their own cron script to initiate
    backups to their individual tape drives, and do so on a regular
    basis as is needed by that particular system, reporting errors to a
    human, but otherwise being quiet. (This is for incrementals; we
    still need to save full dumps for more than a night.)
  o Critical remotes may optionally send some data to our main system,
    or perhaps shadow something in compressed format to another remote.

Pros --
  o We cut out a great deal of human intervention.
  o We gain reliable, tested backups.
  o Done at night, so minimal cpu loss.
  o Done with a tape for remotes, so minimal tape use, except for full
    dumps, which still have to get their own tape.

Cons --
  o The tape drive(*s*) must work.
  o Each machine MUST have its own drive.
  o The need for high quality tapes comes up fast, since they will be
    left sitting in the drive all day in most cases.
  o The potential exists to write over a good tape which someone
    didn't remember to swap out of the drive.
  o The potential exists for someone to forget to put a tape in the
    drive on a critical system.
  (I'm sure if I wanted to nitpick, I could add more lines to this.)

Backup type (c)

Procedure --
  o Procedure is complex, I'll explain by players.

Main system will be called master, and remotes slaves. (Original, eh?)
Master will query slaves each night to ask them to give it an idea of
how much data has been modified that day, and what total bytage it
needs to have saved. When the slave replies with a number, the master
then decides if this is a full or incremental save time, based on
knowing how much data is reasonable to save with an incremental. I have
always felt that if you have to save more than a third of the disk with
an incremental, it's time for a full dump. (This makes it easier to
restore.) The master then has several options, based on how each system
was backed up last.

  a) Save the incremental data to its drive somewhere, or to a
     designated host on the cluster to store such information. I'll
     call this type of host a 'buddy', since it would be saving
     information for its buddy. Every system on the cluster is a buddy
     for at most 2 systems, but it could be any 2 systems, based on how
     much space that buddy has left.
  b) (B was in a, wasn't it? Oh well.)
  c) The master could also decide to save data to its own tape drive,
     which works well as an option, but would probably be an 'overflow'
     option more than a regular option.

A lot of what will make this system better than most is the
master/slave relationship. For example, if master tells slave 'save
level 9 to /dev/mt0' (for its tape drive), and slave says
'cant-offline', then the master can fall back to the next way to save
the level 9 by saying something like 'save level 9 remote host', where
host is the name of a buddy that master has checked has the space for
it, and then the slave sends its level 9 dump image.

Pros --
  o As stated above, it's got a stronger will to work.
  o Less human intervention, although to make sure people know where
    everything is, it must have strong/clear event logging.
  o Fewer tapes, since it only uses tapes as part of a cycle of data
    preservation, making it very hard to lose a great deal of anything,
    since if the tape dies, it might be on a buddy, or the master may
    have a copy.
  o Automation: just ask the master to get you the latest copy of file
    'x' from system 'y', and let it deal with where it put the file.
  o Room for expansion: if you add a new remote, or new type of remote,
    all you have to do is define it to the master, in what should be a
    simple text database, and write the interface for the remote.

Cons --
  o A system this nice has bugs, and takes a while to write, if you
    want to do a good job.
  o Once up, it would require people to read the manual.
  o If the master is down, all hell breaks loose, right? Wrong.
    As I didn't remember to mention (and at 300 baud, will mention
    right here), if the master goes down, 1 node from each cluster
    should have a copy of the main system's database, and a program to
    allow it to become an emergency master (but not a long term master,
    because this would lead to chaos).
  o I'm sure there are more, but that's why I'm asking for comments.

Backup type (d)

------------------------------------------------------------
** This space left blank for your very welcomed ideas **
------------------------------------------------------------

Final comments;

I would also like to use data compression in some step before data gets
written out to a real tape. It's unclear what the tradeoffs are. I
would expect that losing a bit on a highly compressed tape would be a
problem, while using a low compression method would be useless, so
comments here are welcome also.

Thanks in advance,
-- Shawn

Reply paths;
----------------------------------------
Usenet:   mit-eddie!shawn, think!ima!haddock!shawnm
Arpanet:  Shawn at Mit-Mc, Shawn at Mit-Ai
Internet: shawn@eddie.mit.edu, shawn@borax.lcs.mit.edu
Chaosnet: Shawn@Mit-eecs, Shawn@Mit-eddie
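
For anyone who wants something concrete to poke at, here is a rough
Python sketch of the nightly decision the master makes in backup type
(c): pick full vs. incremental using the "more than a third of the
disk" rule of thumb, then try destinations in order until the slave
accepts one. Only the command strings and the 'cant-offline' reply come
from the description above. Everything else is an assumption made for
illustration: the names SlaveReport, Buddy, choose_level,
candidate_commands and run_backup, the "ok" reply, the host name
"master", and the choice to offer buddies only for incrementals.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MAX_CLIENTS_PER_BUDDY = 2  # "a buddy for at most 2 systems"


@dataclass
class SlaveReport:
    host: str
    disk_bytes: int     # total disk size on the slave
    changed_bytes: int  # data modified since the last dump


@dataclass
class Buddy:
    host: str
    free_bytes: int
    clients: List[str] = field(default_factory=list)


def choose_level(report: SlaveReport) -> int:
    """Return 0 (full dump) if more than a third of the disk changed, else 9."""
    if 3 * report.changed_bytes > report.disk_bytes:
        return 0
    return 9


def candidate_commands(report: SlaveReport, level: int,
                       buddies: List[Buddy], master_host: str) -> List[str]:
    """Ordered list of commands the master would try for this slave."""
    # First choice: the slave's own tape drive.
    cmds = [f"save level {level} to /dev/mt0"]
    # Assumption: buddies only hold incrementals, never full dumps.
    if level == 9:
        for b in buddies:
            has_slot = (report.host in b.clients
                        or len(b.clients) < MAX_CLIENTS_PER_BUDDY)
            has_room = b.free_bytes >= report.changed_bytes
            if b.host != report.host and has_slot and has_room:
                cmds.append(f"save level {level} remote {b.host}")
    # Last resort ("overflow"): the master's own tape drive.
    cmds.append(f"save level {level} remote {master_host}")
    return cmds


def run_backup(report: SlaveReport, buddies: List[Buddy],
               send: Callable[[str, str], str],
               master_host: str = "master") -> Optional[str]:
    """Issue commands until the slave replies "ok"; return the one that worked."""
    level = choose_level(report)
    for cmd in candidate_commands(report, level, buddies, master_host):
        if send(report.host, cmd) == "ok":
            return cmd
    return None  # nothing worked; this is where a human gets told
```

Here send() stands in for whatever transport the master uses to talk to
a slave. A real version would also log each attempt, since the scheme
depends on clear event logging to know where everything ended up.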