[comp.sys.hp] find and cpio

pearmana@prlhp1.prl.philips.co.uk (Andy Pearman) (09/11/90)

I've been thinking about how our dumping works on our various
Unix based machines and I'm now a bit worried about our HP systems.

On our Suns we use dump/restore which works on inodes.  This means
that a file with several hard links will only be dumped once - no
problem then when restoring.

On HP-UX (6.21) we use their backup script, which effectively does:

  cd /
  find . -hidden -print | cpio -ocxa | tcio ......

Am I right in thinking that find works by looking in directories
and -prints everything it finds, which of course means that
several directory entries hard-linked to the same file will be
picked up individually and passed to cpio for dumping?

When performing a restore using cpio I assume that each file read
is allocated a fresh inode and therefore what may have been
one file hard-linked several times will be restored as several
individual files (taking up more disk-space than originally used).

I would be grateful if someone could clear this up for me.

   Andy
-- 

Andy Pearman, Computer Dept, Philips Research Labs, Redhill, Surrey, England. 
              pearmana@prl.philips.co.uk

stroyan@hpfcso.HP.COM (Mike Stroyan) (09/13/90)

> When performing a restore using cpio I assume that each file read
> is allocated a fresh inode and therefore what may have been
> one file hard-linked several times will be restored as several
> individual files (taking up more disk-space than originally used).
> 
> I would be grateful if someone could clear this up for me.

Actually, cpio makes a list of inodes and links those files
with duplicated inodes when performing a restore.

Mike Stroyan, mike_stroyan@fc.hp.com

fritz@hpfcbig.SDE.HP.COM (Gary Fritz) (09/13/90)

> Am I right in thinking that find works by looking in directories
> and -print's everything it finds, which of course means that
> several directory entries hard-linked to the same file will be
> picked up individually and passed to cpio for dumping ?

Yes, and this means linked files will be written to the tape multiple times.  
(Try creating a test directory containing two files linked together, then
do a "find . -print | cpio -ocxa > /tmp/test" and examine /tmp/test.  You'll
see the file is written out twice.)
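
The first half of that experiment (what find hands to cpio) can be
sketched in Python.  This is only an illustration, not how find(1) is
implemented: walk_like_find and the file names "a" and "b" are made up
for the example, and it assumes a file system that supports hard links.

```python
import os
import tempfile

def walk_like_find(root):
    """Return (relative path, device, inode, link count) for each file under root."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            entries.append(
                (os.path.relpath(path, root), st.st_dev, st.st_ino, st.st_nlink)
            )
    return entries

with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "a")
    with open(original, "w") as f:
        f.write("some data\n")
    # Second directory entry for the same inode (hypothetical name "b").
    os.link(original, os.path.join(root, "b"))

    entries = walk_like_find(root)
    for rel, dev, ino, nlink in entries:
        print(rel, dev, ino, nlink)
```

Both names are listed, each carrying the same device/inode pair and a
link count of 2, so anything reading this list sees two path names
that refer to one file.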

However, when restoring the cpio archive, cpio will look for linked files
and create links appropriately.  (Try "cpio -icv < /tmp/test".)  It does this
by keeping a list of linked files in memory, though, and can therefore run
out of space.  The cpio(1) man page says:

				... If there are too many unique
	linked files, the program runs out of memory to keep track
	of them, and thereafter, linking information is lost.

If you prefer to use dump/restore, you can.  At least, it's present
on my 7.03 HP-UX system.  I'm not certain it was on 6.21.

Gary

Not an official statement of HP, etc. etc.  Just trying to be helpful.

rer@hpfcdc.HP.COM (Rob Robason) (09/13/90)

cpio does get the names of the various paths to the links as you
suggest, but is smart enough to detect the linkage and record it on the
archive.  When the archive is read, the links are reestablished.

I think the file does actually get archived for each link, even though
the link is recognized.  This is so individual files can be retrieved.

Rob

djw@hpldsla.sid.hp.com (David Williams) (09/14/90)

I hope this makes sense.  I've rewritten it a couple of times,
but there are just too many "words"....

> On HP-UX (6.21) we use  their backup script which effectively does:
> 
>   cd /
>   find . -hidden -print | cpio -ocxa | tcio ......
> 
> Am I right in thinking that find works by looking in directories
> and -print's everything it finds, which of course means that
> several directory entries hard-linked to the same file will be
> picked up individually and passed to cpio for dumping ?

That's pretty much the way that it works.  Note, though, that
cpio saves the file's inode number (and the device number of the
file's file system) and the number of links in the archive.
This leads to...

> When performing a restore using cpio I assume that each file read
> is allocated a fresh inode and therefore what may have been
> one file hard-linked several times will be restored as several
> individual files (taking up more disk-space than originally used).

Not quite.  When doing the restore, cpio(1) tracks files in the
archive that have a link count greater than one.  cpio -i saves
the pathname of the first such file loaded, along with the
inode/device number recorded in the archive.  If another file in
the archive has the same inode/device number, it is linked to this
first file.  Simple, eh?  So only the first 'file' in the archive
allocates a new inode on the target file system; all the others
are linked to it, as desired.
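
The restore-side bookkeeping described above might look roughly like
the following Python sketch.  It is a toy model, not cpio's actual
code: the restore function, the (path, dev, ino, nlink, data) record
layout and the sample values are all assumptions made for illustration.

```python
import os
import tempfile

def restore(members, target):
    """Recreate files under target, linking repeated (dev, ino) pairs.

    members is an iterable of (path, dev, ino, nlink, data) tuples, a
    simplified stand-in for the headers and data in a real archive.
    """
    seen = {}  # (dev, ino) -> path of the first copy restored
    for path, dev, ino, nlink, data in members:
        dest = os.path.join(target, path)
        key = (dev, ino)
        if nlink > 1 and key in seen:
            # Later name for a file already restored: link, don't copy.
            os.link(seen[key], dest)
            continue
        with open(dest, "wb") as f:
            f.write(data)
        if nlink > 1:
            seen[key] = dest  # remember only files that have other links
    return seen

# Hypothetical archive contents: "a" and "b" were links to one file.
members = [
    ("a", 1, 100, 2, b"payload"),
    ("b", 1, 100, 2, b"payload"),
    ("c", 1, 101, 1, b"other"),
]

with tempfile.TemporaryDirectory() as target:
    restore(members, target)
    st_a = os.stat(os.path.join(target, "a"))
    st_b = os.stat(os.path.join(target, "b"))
    st_c = os.stat(os.path.join(target, "c"))
    same_inode = st_a.st_ino == st_b.st_ino
    links_a = st_a.st_nlink
    links_c = st_c.st_nlink
    print(same_inode, links_a, links_c)
```

The seen table grows with the number of distinct multiply-linked files,
which is the list the man page warns can run out of memory.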

Note that back on the dump (cpio -o) side of things, each of the N
links to a file is archived as a complete file.  I guess this
means tapes get filled up more than needed, but it also means that
on restore you don't have to start with the first tape to get the
"real" file.  This is a different strategy from that used by some
other tools, which just store pointers for all links after the
first, sometimes resulting in a "go find tape number N if you want
file 'blah'" type of message.

Ftio(1) uses a similar strategy to cpio for storage of the links.
Tar(1) (and I think fbackup(1)) use the pointer strategy for
saving every link after the first.
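
To give a feel for the space trade-off between the two strategies,
here is a rough Python sketch.  It counts file data only and ignores
header overhead; archive_payload_bytes, the strategy names and the
sample paths and sizes are invented for the example and are not taken
from any of the tools mentioned.

```python
def archive_payload_bytes(members, strategy):
    """Sum the file data an archive would hold under a given strategy.

    members: iterable of (path, dev, ino, size) tuples.
    strategy: "copy-each-link" stores data for every name;
              "pointer" stores data only for the first name of each file.
    """
    seen = set()
    total = 0
    for path, dev, ino, size in members:
        key = (dev, ino)
        if strategy == "pointer" and key in seen:
            continue  # later link recorded as a reference, no data written
        seen.add(key)
        total += size
    return total

# Hypothetical listing: three names for one 200,000-byte file, plus one
# unlinked 500-byte file.
members = [
    ("bin/vi", 1, 100, 200_000),
    ("bin/ex", 1, 100, 200_000),
    ("bin/view", 1, 100, 200_000),
    ("etc/motd", 1, 101, 500),
]

copy_total = archive_payload_bytes(members, "copy-each-link")
pointer_total = archive_payload_bytes(members, "pointer")
print(copy_total, pointer_total)
```

With these made-up numbers the copy-each-link archive carries 600,500
bytes of data against 200,500 for the pointer form, in exchange for
each name being restorable without the first copy.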

Hope that helps,

David Williams
___________________________________________________________________
Hewlett-Packard Scientific Instruments Division (SID) /\___________
1601 California Ave, Palo Alto, CA, USA. /\______________/\________
phone: 415 857 6100. FAX: 415 852 8011  //\\____________|__________
HP-UX Mail:  djw@hpldsla.hp.com         /  \____/\____/\___________
HPdesk:     (djw)hpldsla/HP1900/00     /\____________/  \__________

<usual disclaimer>

seligman@CS.Stanford.EDU (Scott Seligman) (09/15/90)

In article <1150@prlhp1.prl.philips.co.uk> pearmana@prlhp1.prl.philips.co.uk (Andy Pearman) writes:
> 
> On HP-UX (6.21) we use  their backup script which effectively does:
> 
>   cd /
>   find . -hidden -print | cpio -ocxa | tcio ......

Could someone please explain why HP chooses to use this incantation?
It appears to make a list of all file names, and then individually
seek each and every one (!) before writing them to the backup medium.

Why not use the more direct dump(1M) and restore(1M)?

Scott Seligman

Internet:  seligman@cs.stanford.edu
UUCP:      ...{apple,decwrl,ucbvax}!cs.stanford.edu!seligman

jad@hpcndnm.hp-sdd (John A Dilley) (09/18/90)

In article <1990Sep15.061919.3988@Neon.Stanford.EDU> seligman@CS.Stanford.EDU (Scott Seligman) writes:

   Could someone please explain why HP chooses to use this incantation?
   It appears to make a list of all file names, and then individually
   seek each and every one (!) before writing them to the backup medium.

   Why not use the more direct dump(1M) and restore(1M)?

	In HP-UX 6.0/1.1 we support dump/restore(1M).  In 7.0 we also
support rdump/rrestore(1M), so you should be able to choose your
favorite dump method and go for it.  I believe the 7.0 rdump/rrestore
can dump from an HP system to a BSD-based DEC VAX or Sun system that
is able to run /etc/rmt (and vice versa).

                          --      jad      --
			      John DILLEY
			    Hewlett-Packard
                       Colorado Networks Division
UX-mail:      		     jad@cnd.hp.com
Phone:                       (303) 229-2787
--
This is not an official statement of Hewlett-Packard Corp., and does not 
necessarily reflect the views of HP.  The information above is provided
completely without warranty of any kind.

