wayne@dsndata.uucp (Wayne Schlitt) (06/08/91)
well, it's a long story, but as a result of trying to install hp-ux 7.05, i had to re-initialize my main disk and restore everything from backups. that's when i found out that for at least the last six months, all of the backups we have been doing were corrupted!

DO NOT, i repeat DO NOT use ftio or cpio for backups. there are SERIOUS problems with them. for those who do not know, the /etc/backup script uses cpio. remove /etc/backup from your disk. it is an accident waiting to happen.

basically, there are three problems with ftio, and two of the three problems also happen with cpio because ftio and cpio write compatible files.

problem #1:

the first problem with ftio is that it uses shared memory and it allocates the shared memory segment at the default location. this default location only leaves 64k to be used by malloc. ftio needs to malloc memory to keep track of the links on the files. 64k doesnt go very far. when the memory runs out, ftio stops linking files. this means that you get duplicate files and that can cause your disk space usage to go way up. (in our case, it filled up the disc.)

you can change a kernel parameter (shmbrk) to "fix" this problem, but i dont know what formula to use to calculate what this value needs to be, and if you set it too large, then programs that allocate large amounts of shared memory will start to fail because you are reserving too much memory for malloc. *sigh*. we are currently using a shmbrk value of 1024 (4MB for malloc), but i dont know if this will restore our entire system since i didnt try restoring /usr/spool/news.

this problem also happens with cpio. cpio documents the problem in the bugs section of the man page; ftio makes no such warning. i (wrongly) assumed that this meant that ftio didnt have this problem. unfortunately, i dont know of any way to get around this problem when using cpio.

problem #2:

the second problem is much more serious and it affects both cpio and ftio.
the problem is that the inode numbers that are written to tape are limited to 64k. under the old AT&T file system, you could not have more than 64k inodes on the disk. using the berkeley fast file system (like hp and sun have used for years) you can have more than 64k inodes. this means that if you have a file at inode 65556 (64k+20), cpio cant tell the difference between that file and the file at inode 20. since the size of the inode field is defined to be this way for compatibility reasons, i will probably not see any change in cpio.

so, if you have a file with inode 65556 (64k+20) that is linked to another file, when you try to restore that file, it will actually link it to the file at inode 20. this means that when you try to restore your system, you could end up with, say, an article out of talk.bizarre linked to /dev/kmem. in fact, when we restored our disk, we had things that were just as bad, if not worse than this.

**** THE RESULT ****

IF you have usenet news, OR you have a hp cluster (lots of links in the /dev/pty directories), OR you have a large hard disk, OR you use ACLs, you could very easily run into this problem. to the best of my knowledge, there is no way around the problem because the information is lost when you do the backup. you are screwed.

problem #3:

this one i didnt personally run into, but i guess it is a fundamental problem with cpio, ftio, tar, and fbackup. only dump and dd from the raw device do not have this problem.

basically, when cpio et al read a file to back it up, it causes the system to change the "access time" of the file. if the access time isnt reset, then you cant do incremental backups 'cause time stamp info has been lost. so, cpio et al will reset the access time via the utime(2) system call. but this still leaves the "inode changed time" set. so, cpio et al cant depend on the inode changed time to tell them if the file needs to be backed up.
other commands, such as chmod, chown and such will change the inode change time, but not the access time. this means that if the only thing you have done is change the permissions or ownership of a file since the last backup, then the incremental backup will not notice and the new file stats will not be backed up. this is only a problem if you are doing incremental backups.

in the HP manuals, they list 4 different ways of doing backups. the first two are cpio and ftio. i cant believe that with such fundamental problems with cpio/ftio that they would even suggest using them. the other two methods that they suggest are dd and fbackup. dd doesnt work if the media that you are backing up to is smaller than the disk you are backing up, it doesnt backup a live file system very well, and it is hard to restore from. fbackup is non-standard, and still has one of the three problems.

it kind of looks like the one command that hp doesnt mention is dump, and it is the only way to get really reliable backups. (actually, they dont mention tar either, but from what i understand, tar has its own set of major problems if you try to use it for backups.)

we had our computer down for 2 1/2 days during the week, and it took me and another sysadmin the better part of a week to finish cleaning up from this mess. i really expected to be able to format the disk and restore from backups and have a working system in a few hours. instead me and the other sysadmin worked, literally, night and day for over 2 days just to get things working again.

we are a VAR, and the thought of this kind of thing happening to one of our customers really scares me. i couldnt imagine trying to walk someone through all of this over the phone. we would probably have ended up flying out to the customer site or something.

the only thing i can think of that is more important than good backups is being able to install the system in the first place.
does hp really expect people with less than 5 years of unix experience and good software development skills to be able to do backups? how were we supposed to know that the backups we were doing were corrupt? scratch the disk and do a restore, see if it works, if it doesnt, go off and find another way? how many weeks worth of work does hp expect every customer to do in order to find a workable backup system? why do they even mention cpio and ftio in the manuals?

in case you cant tell, i am more than a little upset about all this. because of the backup being bad, we werent able to get some important things done for a trade show. working 18 hours a day for 3 days didnt do much for my disposition.

anyway, i guess fbackup, dd or dump are the _only_ commands you should be using for backups.

-wayne
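
The arithmetic behind problem #2, as the post above describes it, can be sketched in a few lines of Python. This assumes, as the poster does, that only the low 16 bits of an inode number reach the tape. The helper name `tape_ino` is made up for illustration, and a later reply in this thread disputes that cpio and ftio actually wrap inode numbers this way.

```python
# Assumption (taken from the post, not checked against cpio or ftio source):
# the archive header keeps only the low 16 bits of each inode number.
INO_FIELD_BITS = 16


def tape_ino(ino):
    """Hypothetical helper: an inode number squeezed into a 16-bit field."""
    return ino & ((1 << INO_FIELD_BITS) - 1)


# 65556 is 64k + 20, the example used in the post.
for ino in (20, 65556):
    print(ino, "->", tape_ino(ino))
```

Under that assumption the two inode numbers become indistinguishable on tape, which is the collision the poster is worried about.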
tm@othello.altair.fr (Timothy Mullins) (06/10/91)
>anyway, i guess fbackup, dd or dump are the _only_ commands you
>should be using for backups.

That is very bad news. I cannot use dump either because it cannot handle Context Dependent Files.
Don't we just love HP.

Tim Mullins
kgj@hpcndxyz.CND.HP.COM (~Karl Jensen) (06/10/91)
> this means, that if the only thing you have done is changed
> permissions or ownership of a file since the last backup, then the
> incremental backup will not notice and the new file stats will not
> be backed up.

You could alleviate this problem by modifying the backup script to record the list of files on the backup disk along with their modes and ownership. This is also useful to detect files which have been deleted after the base backup and before the incremental so that you don't restore deleted files.
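
A minimal sketch of that suggestion, in Python rather than as a change to any real backup script: record a manifest of modes and ownership at backup time, then compare it with a later one. The function names `build_manifest` and `diff_manifests` and the temporary-directory demo are assumptions made for illustration; nothing here comes from /etc/backup.

```python
import os
import stat
import tempfile


def build_manifest(root):
    """Map each path under root (relative to root) to (mode, uid, gid)."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks, not their targets
            rel = os.path.relpath(full, root)
            manifest[rel] = (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)
    return manifest


def diff_manifests(old, new):
    """Return (paths whose mode or owner changed, paths that disappeared)."""
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    deleted = sorted(p for p in old if p not in new)
    return changed, deleted


# Demo in a scratch directory: one chmod and one deletion after the "base backup".
with tempfile.TemporaryDirectory() as root:
    a = os.path.join(root, "a")
    b = os.path.join(root, "b")
    for path in (a, b):
        with open(path, "w") as f:
            f.write("data")
        os.chmod(path, 0o644)
    base = build_manifest(root)
    os.chmod(a, 0o600)  # permission change only; contents untouched
    os.remove(b)        # file deleted after the base backup
    changed, deleted = diff_manifests(base, build_manifest(root))

print("changed:", changed)
print("deleted:", deleted)
```

Storing the manifest alongside each backup would let an incremental run pick up permission-only changes and skip files that no longer exist, without relying on the inode change time.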
rsh@hpfcdc.HP.COM (Scott Holbrook) (06/11/91)
> problem #2:
>
> the second problem is much more serious and it affects both cpio
> and ftio. the problem is that the inode numbers that are written
> to tape are limited to 64k. under the old AT&T file system, you
> could not have more than 64k inodes on the disk. using the
> berkeley fast file system (like hp and sun have used for years) you
> can have more than 64k inodes. this means that if you have a file
> at inode 65556 (64k+20), cpio cant tell the difference between
> that file and the file at inode 20. since the size of the inode
> field is defined to be this way for compatibility reasons, i will
> probably not see any change in cpio.

This problem has been fixed in cpio (6.5 and later releases on the s300, 7.0 and later releases on the s800). The problem is handled in a way that does not change the format, but still preserves all link information.

> so, if you have a file with inode 65556 (64k+20) that is linked to
> another file, when you try to restore that file, it will actually
> link it to the file at inode 20. this means that when you try to
> restore your system, you could end up with, say, an article out of
> talk.bizarre linked to /dev/kmem. in fact, when we restored our
> disk, we had things that were just as bad, if not worse than this.

This is incorrect. Ftio still does (and cpio did before the 6.5 release) detect inode numbers that are > 65535 and map them to 65535. Upon restore, both cpio and ftio detect a file with inode 65535 and restore the file as a plain file, losing the link information. It does not, however, link the file to the "wrong" file. The only bad behavior is that you end up with multiple copies of your data. While this isn't great, at least you aren't losing your data.

I don't know why ftio hasn't been fixed, but cpio does work properly and will not lose any information. I will submit a defect report against ftio so that it can get fixed for a future release.
Scott Holbrook
HP-UX kernel

The opinions expressed here are mine and mine only, they do not represent an official or un-official statement of the Hewlett-Packard Company.
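
The behavior described in the reply above (clamp on write, treat the clamped value as "no link information" on restore) can be modeled roughly as follows. This is a sketch of the description only, not of the ftio or cpio source; the names `SENTINEL`, `archive_ino`, and `restore_link_groups` are invented, and treating an inode number of exactly 65535 like the larger ones is an assumption.

```python
SENTINEL = 0xFFFF  # 65535, the value large inode numbers are said to map to


def archive_ino(ino):
    """Clamp an inode number into the 16-bit header field."""
    return min(ino, SENTINEL)


def restore_link_groups(entries):
    """entries: (path, archived inode) pairs in tape order.

    Returns groups of paths that end up sharing one file after restore.
    Entries carrying the sentinel are restored as separate plain files.
    """
    groups = {}
    plain = []
    for path, ino in entries:
        if ino == SENTINEL:
            plain.append([path])
        else:
            groups.setdefault(ino, []).append(path)
    return sorted(plain + list(groups.values()))


# Two news articles hard-linked at inode 65556, plus an unrelated inode 20.
source = [("/dev/kmem", 20), ("news/1", 65556), ("news/2", 65556)]
tape = [(path, archive_ino(ino)) for path, ino in source]
print(restore_link_groups(tape))
```

In this model the two linked articles come back as two independent copies, and neither is linked to the file at inode 20, which matches the "duplicate data, but no wrong links" outcome the reply describes.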
gabby@gabs.lbl.gov (Gabby Obegi) (06/11/91)
In article <2268@seti.inria.fr> tm@othello.altair.fr (Timothy Mullins) writes:
>>anyway, i guess fbackup, dd or dump are the _only_ commands you
>>should be using for backups.
>
>That is very bad news. I cannot use dump either because it cannot handle
>Context Dependent Files.
>Don't we just love HP.

We've been using dump for over a year now and have had great luck with it. As a matter of fact, six months ago we lost over 500Mb of data when the inode for / got corrupted. Even though this was the root disk with CDFs on it, we were able to recover the disk entirely!

--
-Gab
franks@hpuamsa.neth.hp.com (Frank Slootweg CRC) (06/11/91)
Just some quick notes :

problem #1: memory usage by ftio and cpio

I don't think that cpio uses shared memory. It uses normal (i.e. virtual) memory. As long as you have enough virtual memory there is no problem.

problem #2: multiply linked files with inodes >64k

For ftio : I *think* you are right (SR# 1650147959). Can investigate if needed.

For cpio : This was fixed with some clever algorithms in 6.5 (300) and 7.0 (800). The tape format is still compatible. Are you sure you have a new and stock (and HP :-)) cpio? At some release (can find out number if needed) there was some problem with getting the wrong cpio. If I remember correctly a "what" on the wrong cpio said something like "update version". Customers were informed at the time about this wrong version and how to get a good one.

problem #3: can not (re)set the change-of-inode time

You are right. This is a fundamental one. Any backup utility can only be as good as the OS on which it runs. In this respect UNIX (not just HP-UX) is just not good.

> in the HP manuals, they list 4 different ways of doing backups. the
> first two are cpio and ftio. i cant believe that with such
> fundamental problems with cpio/ftio that they would even suggest using
> them. the other two methods that they suggest are dd and fbackup.

Can you please be specific (i.e. which manual, edition and page)? The 7.0 HP-UX System Administration Tasks manual for the Series 300 (you said 7.05, so I assume 300/400) recommends (on page 11-5) to use "fbackup".

Hope this helps somewhat. If you want me to investigate then mail me and I will see what I can do.

Frank Slootweg, HP, Dutch Customer Response Center.
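
Problem #3 is easy to observe on a POSIX system. The sketch below is a Python stand-in for what a file-level backup tool does, not code from any of the tools discussed: it reads a file, puts the access time back with `os.utime`, and compares the inode change time before and after. The scratch file and the short sleep are assumptions of the demo, and on Windows `st_ctime` means something else (creation time), so the demo only applies to UNIX-like systems.

```python
import os
import tempfile
import time

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "f")
    with open(path, "w") as f:
        f.write("data")

    before = os.stat(path)
    time.sleep(0.05)  # let the clock move past the timestamps just recorded

    # What a file-level backup tool does: read the file ...
    with open(path) as f:
        f.read()
    # ... then put the access (and modification) time back where it was.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))

    after = os.stat(path)

atime_restored = after.st_atime_ns == before.st_atime_ns
ctime_moved = after.st_ctime_ns > before.st_ctime_ns
print("atime restored:", atime_restored)
print("ctime moved:   ", ctime_moved)
```

Resetting the access time is itself an inode change, so the change time advances, and there is no ordinary call to set it back. That is why a tool that resets access times cannot also use the change time to decide what an incremental backup should pick up.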
lanzo@wgate.UUCP (Mark Lanzo) (06/12/91)
In a prior article franks@hpuamsa.neth.hp.com (Frank Slootweg CRC) wrote:
> Just some quick notes :
>
> problem #2: multiply linked files with inodes >64k
>
> For cpio : This was fixed with some clever algorithms in 6.5 (300) and
> 7.0 (800). The tape format is still compatible.

How is this done? I did "man 4 cpio". It clearly shows that the header structure stores the inode as a "ushort" value. No other field in the header structure seems to represent the upper word of the inode value. Therefore either

  (1) the inode number must still be only 16 bits; or
  (2) the header has been changed & the man pages are now incorrect.

If (1) is true, then the only choices I can see are that:

  (a) inodes are mod 64K, just like the original article suggested (with the attendant problems of bogus links being made), or
  (b) there's some magic value used for the inode which never gets overlaid and link information is discarded after a certain point.

Choice (b) is at least a little better than (a) in that it means that you can at least restore the files without corrupting something else, but it implies that links won't be made in those cases where they *should* be, so you'll end up with duplicate copies of things.

Neither of these qualifies as an acceptable solution. Is there some other option which I've overlooked?

I for one am not going to rest comfortably until I get some positive info from HP sounding the "all's clear" signal!

Are there any other PD/Freeware type backup utilities out there that I should consider (perhaps the GNU folks have something)?

-- Mark --

P.S.: The cpio/ftio/etc stuff has been reported to the HP response center (probably multiple times!). One ref number is A1784569.
franks@hpuamsa.neth.hp.com (Frank Slootweg CRC) (06/14/91)
Mark Lanzo writes :

> > For cpio : This was fixed with some clever algorithms in 6.5 (300) and
> > 7.0 (800). The tape format is still compatible.
>
> How is this done? I did "man 4 cpio" ...

Since I do not have the algorithms and formats handy (it is "ages" ago that this was done) *and* I do not know if I can make this public, I leave this one to others to answer.

> I for one am not going to rest comfortably until I get some positive info
> from HP sounding the "all's clear" signal!

You are of course entitled to your position, but I think (for cpio) all *is* clear. The basenote poster said something is seriously wrong. Scott Holbrook and I just wanted to put people at ease. HP has fixed a *design limitation* (not a *bug*) in cpio without compromising compatibility. If we deserve anything for that, I think we deserve a compliment more than anything else.

Frank Slootweg, HP, Dutch CRC.
franks@hpuamsa.neth.hp.com (Frank Slootweg CRC) (06/14/91)
Mark Lanzo wrote :

> Is there some other option which I've overlooked?

and I wrote :

> Since I do not have the algorithms and formats handy (it is "ages" ago
> that this was done) *and* I do not know if I can make this public, I
> leave this one to others to answer.

I can no longer control myself. :-) Without giving away any (possible) HP private information, I can probably safely say the following :

- The *absolute* value of the inode number on tape has no meaning for the target system. I.e. when restoring/overwriting a file, you do not create/open an *inode*, you create/open a *file* (which just happens to have an inode).

- The inode number on the source (cpio -o) system *can* be the same as on the target (cpio -i), but this will normally not be the case and it is surely not a requirement.

- The inode number on tape is only *needed* for files with more than one link.

Now you go figure how we did it! :-)

Hint: You wrote :

> How is this done? I did "man 4 cpio".

Sure you did, but did you *read* it? :-)

Frank Slootweg, HP, Dutch CRC.
vandys@sequent.com (Andrew Valencia) (06/15/91)
franks@hpuamsa.neth.hp.com (Frank Slootweg CRC) writes:
> Since I do not have the algorithms and formats handy (it is "ages" ago
>that this was done) *and* I do not know if I can make this public, I
>leave this one to others to answer.

I don't remember how HP fixed it. I seem to remember that when it bit me I came up with the idea that when you emitted an entry <h_ino, h_nlink>, after you had written the h_nlink guys for this h_ino, the inode was now unused for the rest of the cpio dump. Thus you could map one of your > 64K inumbers down into it and reuse it that way. Since inumbers > 64K are utterly broken for cpio anyway, there shouldn't be much of a compatibility issue.

Andy Valencia
vandys@sequent.com
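
A rough Python model of the slot-reuse idea in the post above. It is a variation on that suggestion, not HP's actual fix, which nobody in the thread spells out. Assumptions: every file gets a fresh 16-bit number from a pool instead of keeping its real inode number, the reader ignores the number for single-link files, and all links of a file appear in the same archive. The function names and the sample paths are invented.

```python
def assign_archive_inos(entries, limit=0x10000):
    """entries: (path, source inode, nlink) in archive order.

    Returns (path, archive number, nlink), with every number below limit.
    A number is held only while links of its file are still to be written.
    """
    open_groups = {}  # source inode -> [archive number, links still to come]
    in_use = set()    # archive numbers currently held by an open link group
    out = []
    for path, src_ino, nlink in entries:
        if src_ino in open_groups:
            slot = open_groups[src_ino]
            number = slot[0]
            slot[1] -= 1
            if slot[1] == 0:  # last link written: the number is free again
                del open_groups[src_ino]
                in_use.discard(number)
        else:
            number = 1
            while number in in_use:  # lowest number not currently held
                number += 1
            if number >= limit:
                raise OverflowError("too many link groups open at once")
            if nlink > 1:
                open_groups[src_ino] = [number, nlink - 1]
                in_use.add(number)
        out.append((path, number, nlink))
    return out


def restore_groups(archived):
    """Rebuild link groups on the reading side, retiring numbers the same way."""
    open_groups = {}  # archive number -> (group list, links still expected)
    groups = []
    for path, number, nlink in archived:
        if nlink == 1:
            groups.append([path])  # the number is irrelevant for plain files
        elif number not in open_groups:
            group = [path]
            groups.append(group)
            open_groups[number] = (group, nlink - 1)
        else:
            group, left = open_groups[number]
            group.append(path)
            if left == 1:
                del open_groups[number]
            else:
                open_groups[number] = (group, left - 1)
    return groups


entries = [("a", 70000, 2), ("b", 20, 1), ("c", 70000, 2),
           ("d", 131092, 2), ("e", 131092, 2)]
archived = assign_archive_inos(entries)
print(archived)
print(restore_groups(archived))
```

The model only runs out of numbers when more than 65535 link groups are open at the same moment. Its weak spot is a partial backup that never sees the last link of a file, because that group's number is then never released; a reader that does not retire numbers in the same way would also misread such a tape.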