[comp.sys.hp] Do NOT use ftio or cpio for backups

wayne@dsndata.uucp (Wayne Schlitt) (06/08/91)

 
well, it's a long story, but as a result of trying to install hp-ux
7.05, i had to re-initialize my main disk and restore everything from
backups.  that's when i found out that for at least the last six
months, all of the backups we have been doing were corrupted!


DO NOT, i repeat DO NOT use ftio or cpio for backups.  there are
SERIOUS problems with them.  for those who do not know, the
/etc/backup script uses cpio.  remove /etc/backup from your disk.  it
is an accident waiting to happen.


basically, there are three problems with ftio, and two of the three
problems also happen with cpio because ftio and cpio use compatible
file formats.


problem #1:

    the first problem with ftio is that it uses shared memory and it
    allocates the shared memory segment at the default location.  this
    default location only leaves 64k to be used by malloc.  ftio needs
    to malloc memory to keep track of the links on the files, and 64k
    doesn't go very far.  when the memory runs out, ftio stops linking
    files.  this means that you get duplicate files, and that can cause
    your disk space usage to go way up.  (in our case, it filled up
    the disk.)

    you can change a kernel parameter (shmbrk) to
    "fix" this problem, but i don't know what formula to use to
    calculate what this value needs to be, and if you set it too large,
    then programs that allocate large amounts of shared memory will
    start to fail because you are reserving too much memory for
    malloc.  *sigh*.

    we are currently using a shmbrk value of 1024 (4MB for malloc),
    but i don't know if this will restore our entire system since i
    didn't try restoring /usr/spool/news.  

    this problem also happens with cpio.  cpio documents the problem
    in the bugs section of the man page; ftio makes no such warning.
    i (wrongly) assumed that this meant that ftio didn't have the
    problem.  unfortunately, i don't know of any way to get around this
    problem when using cpio.


problem #2:

    the second problem is much more serious, and it affects both cpio
    and ftio.  the problem is that the inode numbers that are written
    to tape are limited to 64k.  under the old AT&T file system, you
    could not have more than 64k inodes on the disk.  using the
    berkeley fast file system (like hp and sun have used for years) you
    can have more than 64k inodes.  this means that if you have a file
    at inode 65556 (64k+20), cpio can't tell the difference between
    that file and the file at inode 20.  since the size of the inode
    field is defined to be this way for compatibility reasons, i will
    probably not see any change in cpio.
    
    so, if you have a file with inode 65556 (64k+20) that is linked to
    another file, when you try to restore that file, it will actually
    link it to the file at inode 20.  this means that when you try to
    restore your system, you could end up with, say, an article out of
    talk.bizarre linked to /dev/kmem.  in fact, when we restored our
    disk, we had things that were just as bad, if not worse, than this.
    
    
    ****  THE RESULT  ****

    IF you have usenet news, OR you have an hp cluster (lots of links
    in the /dev/pty directories), OR you have a large hard disk, OR you
    use ACLs, you could very easily run into this problem.  to the
    best of my knowledge, there is no way around the problem because
    the information is lost when you do the backup.  you are screwed.
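
    the aliasing follows directly from storing the inode in a 16-bit
    field on tape.  a minimal sketch of the arithmetic (illustrative
    only, not hp's code):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* the old binary cpio/ftio header stores the inode number in an
     * unsigned 16-bit field, so anything above 65535 wraps around */
    unsigned short header_ino(unsigned long ino)
    {
        return (unsigned short)(ino & 0xffffUL);  /* truncate to 16 bits */
    }

    int main(void)
    {
        /* inode 65556 (64k+20) looks exactly like inode 20 on tape */
        assert(header_ino(65556UL) == header_ino(20UL));
        printf("inode 65556 maps to %u in the header\n",
               header_ino(65556UL));
        return 0;
    }
    ```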


problem #3:

    this one i didn't personally run into, but i gather it is a
    fundamental problem with cpio, ftio, tar, and fbackup.  only dump
    and dd from the raw device do not have this problem.

    basically, when cpio et al read a file to back it up, it causes the
    system to change the "access time" of the file.  if the access
    time isn't reset, then you can't do incremental backups 'cause time
    stamp info has been lost.  so, cpio et al will reset the access
    time via the utime(2) system call.  but that utime(2) call itself
    changes the "inode changed time".  so, cpio et al can't depend on
    the inode changed time to tell them if the file needs to be backed
    up.  other commands, such as chmod, chown and such, will change the
    inode change time but not the access time.

    this means that if the only thing you have done is change
    permissions or ownership of a file since the last backup, then the
    incremental backup will not notice, and the new file stats will not
    be backed up.

    this is only a problem if you are doing incremental backups.
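
    the effect is easy to demonstrate with a few system calls.  a
    small stand-alone test (my own sketch, not code from any backup
    program):

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <utime.h>

    /* returns 1 if putting the access time back via utime(2) still
     * bumped the inode change time, 0 otherwise */
    int utime_bumps_ctime(void)
    {
        char path[] = "/tmp/atimeXXXXXX";
        int fd = mkstemp(path);
        if (fd < 0)
            return -1;
        close(fd);

        struct stat before, after;
        stat(path, &before);

        sleep(1);                         /* make sure the clock ticks */

        /* what cpio et al do after reading a file: restore the
         * original access and modification times */
        struct utimbuf t = { before.st_atime, before.st_mtime };
        utime(path, &t);

        stat(path, &after);
        unlink(path);

        /* atime is back where it was, but the utime() call itself
         * updated ctime, so ctime-based incrementals can't be trusted */
        return after.st_atime == before.st_atime
            && after.st_ctime  > before.st_ctime;
    }

    int main(void)
    {
        assert(utime_bumps_ctime() == 1);
        puts("utime(2) restored atime but changed ctime");
        return 0;
    }
    ```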


in the HP manuals, they list 4 different ways of doing backups.  the
first two are cpio and ftio.  i can't believe that with such
fundamental problems with cpio/ftio they would even suggest using
them.  the other two methods that they suggest are dd and fbackup.  dd
doesn't work if the backup media is smaller than the file system you
are backing up, it doesn't back up a live file system very well, and
it is hard to restore from.  fbackup is non-standard, and still has
one of the three problems.

it kind of looks like the one command that hp doesn't mention is dump,
and it is the only way to get really reliable backups.  (actually, they
don't mention tar either, but from what i understand, tar has its own
set of major problems if you try to use it for backups.)



we had our computer down for 2 1/2 days during the week, and it took
me and another sysadmin the better part of a week to finish cleaning
up this mess.  i really expected to be able to format the disk
and restore from backups and have a working system in a few hours.
instead, the other sysadmin and i worked, literally, night and day for
over 2 days just to get things working again.

we are a vab, and the thought of this kind of thing happening to one of
our customers really scares me.  i couldn't imagine trying to walk
someone through all of this over the phone.  we would probably have
ended up flying out to the customer site or something.

the only thing i can think of that is more important than good
backups is being able to install the system in the first place.  does
hp really expect people with less than 5 years of unix experience and
good software development skills to be able to do backups?  how were
we supposed to know that the backups we were doing were corrupt?
scratch the disk and do a restore, see if it works, and if it doesn't,
go off and find another way?  how many weeks worth of work does hp
expect every customer to do in order to find a workable backup system?
why do they even mention cpio and ftio in the manuals?


in case you can't tell, i am more than a little upset about all this.
because of the backups being bad, we weren't able to get some important
things done for a trade show.  working 18 hours a day for 3 days didn't
do much for my disposition.  



anyway, i guess fbackup, dd or dump are the _only_ commands you
should be using for backups.  


-wayne

tm@othello.altair.fr (Timothy Mullins) (06/10/91)

>anyway, i guess fbackup, dd or dump are the _only_ commands you
>should be using for backups.

That is very bad news.  I cannot use dump either, because it cannot
handle Context Dependent Files.
Don't we just love HP.


Tim Mullins

kgj@hpcndxyz.CND.HP.COM (~Karl Jensen) (06/10/91)

>   this means that if the only thing you have done is change
>   permissions or ownership of a file since the last backup, then the
>   incremental backup will not notice, and the new file stats will not
>   be backed up.

You could alleviate this problem by modifying the backup script to
record the list of files on the backup disk along with their modes and
ownership.  This is also useful to detect files which have been deleted
after the base backup and before the incremental so that you don't
restore deleted files.

rsh@hpfcdc.HP.COM (Scott Holbrook) (06/11/91)

> problem #2:
> 
>   the second problem is much more serious, and it affects both cpio
>   and ftio.  the problem is that the inode numbers that are written
>   to tape are limited to 64k.  under the old AT&T file system, you
>   could not have more than 64k inodes on the disk.  using the
>   berkeley fast file system (like hp and sun have used for years) you
>   can have more than 64k inodes.  this means that if you have a file
>   at inode 65556 (64k+20), cpio can't tell the difference between
>   that file and the file at inode 20.  since the size of the inode
>   field is defined to be this way for compatibility reasons, i will
>   probably not see any change in cpio.

This problem has been fixed in cpio (6.5 and later releases on the
s300, 7.0 and later releases on the s800).  The problem is handled in
a way that does not change the format, but still preserves all link
information.

>   so, if you have a file with inode 65556 (64k+20) that is linked to
>   another file, when you try to restore that file, it will actually
>   link it to the file at inode 20.  this means that when you try to
>   restore your system, you could end up with, say, an article out of
>   talk.bizarre linked to /dev/kmem.  in fact, when we restored our
>   disk, we had things that were just as bad, if not worse, than this.

This is incorrect.  Ftio still does (and cpio did before the 6.5
release) detect inode numbers that are > 65535 and map them to 65535.
Upon restore, both cpio and ftio detect a file with inode 65535 and
restore the file as a plain file, losing the link information.  It
does not, however, link the file to the "wrong" file.  The only bad
behavior is that you end up with multiple copies of your data.  While
this isn't great, at least you aren't losing your data.

I don't know why ftio hasn't been fixed, but cpio does work properly
and will not lose any information.  I will submit a defect report
against ftio so that it can get fixed for a future release.

Scott Holbrook
HP-UX kernel

The opinions expressed here are mine and mine only, they do not
represent an official or un-official statement of the Hewlett-Packard
Company.

gabby@gabs.lbl.gov (Gabby Obegi) (06/11/91)

In article <2268@seti.inria.fr> tm@othello.altair.fr (Timothy Mullins) writes:
>>anyway, i guess fbackup, dd or dump are the _only_ commands you
>>should be using for backups.
>
>That is very bad news. I cannot use dump either because it cannot handle
>Context Dependent Files.
>Do'nt we just love HP.

We've been using dump for over a year now and have had great luck with it.  As
a matter of fact, six months ago we lost over 500Mb of data when the inode for /
got corrupted.  Even though this was the root disk with CDFs on it, we were able
to recover the disk entirely!

--
-Gab

franks@hpuamsa.neth.hp.com (Frank Slootweg CRC) (06/11/91)

Just some quick notes :

problem #1: memory usage by ftio and cpio

  I don't think that cpio uses shared memory. It uses normal (i.e.
virtual) memory. As long as you have enough virtual memory there is no
problem.

problem #2: multiply linked files with inodes >64k

  For ftio : I *think* you are right (SR# 1650147959). Can investigate if
needed.

  For cpio : This was fixed with some clever algorithms in 6.5 (300) and
7.0 (800). The tape format is still compatible.
  Are you sure you have a new and stock (and HP :-)) cpio? At some
release (can find out number if needed) there was some problem with
getting the wrong cpio. If I remember correctly a "what" on the wrong
cpio said something like "update version". Customers were informed at
the time about this wrong version and how to get a good one.

problem #3: can not (re)set the change-of-inode time

  You are right. This is a fundamental one. Any backup utility can only
be as good as the OS on which it runs. In this respect UNIX (not just
HP-UX) is just not good.

> in the HP manuals, they list 4 different ways of doing backups.  the
> first two are cpio and ftio.  i can't believe that with such
> fundamental problems with cpio/ftio they would even suggest using
> them.  the other two methods that they suggest are dd and fbackup.

  Can you please be specific (i.e. which manual, edition and page)? The
7.0 HP-UX System Administration Tasks manual for the Series 300 (you
said 7.05, so I assume 300/400) recommends (on page 11-5) to use
"fbackup".

  Hope this helps somewhat. If you want me to investigate then mail me
and I will see what I can do.

Frank Slootweg, HP, Dutch Customer Response Center.

lanzo@wgate.UUCP (Mark Lanzo) (06/12/91)

In a prior article franks@hpuamsa.neth.hp.com (Frank Slootweg CRC) wrote:
  > Just some quick notes :
  > 
  > problem #2: multiply linked files with inodes >64k
  > 
  >   For cpio : This was fixed with some clever algorithms in 6.5 (300) and
  > 7.0 (800). The tape format is still compatible.

How is this done?  I did "man 4 cpio".  It clearly shows that the header
structure stores the inode as a "ushort" value.  No other field in the
header structure seems to represent the upper word of the inode value.
Therefore either (1) the inode number must still be only 16 bits;
or (2) the header has been changed & the man pages are now incorrect.

If (1) is true, then the only choices I can see are that:
   (a) inodes are mod 64K, just like the original article suggested (with
       the attendant problems of bogus links being made), or
   (b) there's some magic value used for the inode which never gets
       overlaid and link information is discarded after a certain point.

Choice (b) is at least a little better than (a) in that it means you
can restore the files without corrupting something else, but it
implies that links won't be made in those cases where they *should* be,
so you'll end up with duplicate copies of things.

Neither of these qualifies as an acceptable solution.
Is there some other option which I've overlooked?
I for one am not going to rest comfortably until I get some positive info
from HP sounding the "all's clear" signal!

Are there any other PD/Freeware type backup utilities out there that I
should consider (perhaps the GNU folks have something?) ?

			-- Mark --

P.S.:  The cpio/ftio/etc stuff has been reported to the HP response
       center (probably multiple times!).  One ref number is A1784569.

franks@hpuamsa.neth.hp.com (Frank Slootweg CRC) (06/14/91)

Mark Lanzo writes :

>   >   For cpio : This was fixed with some clever algorithms in 6.5 (300) and
>   > 7.0 (800). The tape format is still compatible.
> 
> How is this done?  I did "man 4 cpio" ...

  Since I do not have the algorithms and formats handy (it is "ages" ago
that this was done) *and* I do not know if I can make this public, I
leave this one to others to answer.

> I for one am not going to rest comfortably until I get some positive info
> from HP sounding the "all's clear" signal!

  You are of course entitled to your position, but I think (for cpio)
all *is* clear. The basenote poster said something is seriously wrong. I
and Scott Holbrook just wanted to put people at ease. HP has fixed a
*design limitation* (not a *bug*) in cpio without compromising
compatibility. If we deserve anything for that, I think we deserve a
compliment more than anything else.

Frank Slootweg, HP, Dutch CRC.

franks@hpuamsa.neth.hp.com (Frank Slootweg CRC) (06/14/91)

Mark Lanzo wrote :

> Is there some other option which I've overlooked?

and I wrote :

>   Since I do not have the algorithms and formats handy (it is "ages" ago
> that this was done) *and* I do not know if I can make this public, I
> leave this one to others to answer.

  I can no longer control myself. :-) Without giving away any (possible)
HP private information, I can probably safely say the following :

- The *absolute* value of the inode number on tape has no meaning for
  the target system. I.e. when restoring/overwriting a file, you do not
  create/open an *inode*, you create/open a *file* (which just happens
  to have an inode).
- The inode number on the source (cpio -o) system *can* be the same as on
  the target (cpio -i), but this will normally not be the case and it is
  surely not a requirement.
- The inode number on tape is only *needed* for files with more than one
  link.

  Now you go figure how we did it! :-)

  Hint: You wrote :

> How is this done?  I did "man 4 cpio". 

  Sure you did, but did you *read* it? :-)

Frank Slootweg, HP, Dutch CRC.

vandys@sequent.com (Andrew Valencia) (06/15/91)

franks@hpuamsa.neth.hp.com (Frank Slootweg CRC) writes:

>  Since I do not have the algorithms and formats handy (it is "ages" ago
>that this was done) *and* I do not know if I can make this public, I
>leave this one to others to answer.

I don't remember how HP fixed it.  I seem to remember that when it
bit me, I came up with the idea that once you had emitted an entry
<h_ino, h_nlink> and written all h_nlink files for that h_ino, that
inode number was unused for the rest of the cpio dump.  Thus you
could map one of your > 64K inumbers down into it and reuse it that
way.  Since inumbers > 64K are utterly broken for cpio anyway, there
shouldn't be much of a compatibility issue.
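
Sketched out, that reuse scheme might look like this (a toy
illustration of the idea, not HP's actual algorithm): map_ino() hands
out 16-bit slots and frees a slot once all of its links have been
written.

```c
#define SLOTS 65536

/* toy remapper for cpio-style headers: a big inode number gets a
 * 16-bit slot; the slot is freed once all of its links are written,
 * so it can be handed to another big inode later in the dump */
static unsigned long owner[SLOTS];       /* inode holding each slot  */
static int           links_left[SLOTS];  /* links still to be written */
static unsigned      high = 1;           /* slots handed out; 0 unused */

unsigned short map_ino(unsigned long ino, int nlink)
{
    unsigned s;
    for (s = 1; s < high; s++)      /* already mapped and still live? */
        if (links_left[s] > 0 && owner[s] == ino) {
            links_left[s]--;
            return (unsigned short)s;
        }
    for (s = 1; s < SLOTS; s++)     /* grab a fresh or freed slot */
        if (links_left[s] == 0) {
            owner[s] = ino;
            links_left[s] = nlink - 1;  /* this call emits one link */
            if (s >= high)
                high = s + 1;
            return (unsigned short)s;
        }
    return 0;                        /* table exhausted */
}
```

The restore side never cares what the absolute numbers are; it only
needs entries of the same file to carry the same h_ino, which this
preserves.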

					Andy Valencia
					vandys@sequent.com