[comp.bugs.4bsd] SVR2/SunOS 3.5 cpio

rcodi@yabbie.rmit.oz (Ian Donaldson) (08/02/88)

Description:
	cpio sometimes silently refuses to link destination files back
	together.   This happens with the -p and -i options, and the
	-o option can generate an incorrect archive.

	At the end of this report is a list of other problems with cpio,
	some major, some minor.

Versions:
	The cpio supplied with SunOS 3.5 exhibits the fault.  The cpio
	supplied with the SVR2 Vax distribution also exhibits the fault.

Repeat by:
	On BSD fast-filesystems, this is easy; go to a filesystem that has more
	than 32767 inodes (practically any large one), and do this:

		rm -fr a b c
		cat </dev/null > a
		ln a b
		mkdir c
		ls a b | cpio -pdmv c

		ls -lg c

	This -may- fail; it depends on "a" getting an inode number > 32767.
	When it fails, the directory "c" ends up with two separate files
	"a" and "b" that have the same contents but are not linked together.
	cpio also doesn't report having linked them (because it didn't).

	On a System-V filesystem, you need to find a filesystem that has
	more than 32767 inodes currently allocated (not as common), and
	do the same thing.

	This fails more on BSD because of the way the inodes are allocated;
	higher i-numbers are commonplace.  On the System-V filesystem,
	inodes are allocated at the lower end of the scale.

Fix:
	Not trivial.  Best fix is to "rm /usr/bin/cpio" :-)

	A partial fix that will work with high probability is to change
	the type of "m_ino" from "short" to "ino_t" in struct "ml", routine
	postml().  This lets inode numbers up to 65535 be handled correctly.
	Above that you will get unpredictable failures: two multiply-linked
	inodes whose inode numbers share the same lower 16 bits end up
	being linked together when they shouldn't be.
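
	A minimal sketch of the likely mechanism, not the real cpio source.
	It assumes the link table is compared against an unsigned 16-bit
	inode value from the header; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value assumed to be carried in the header: the inode number
 * truncated to 16 bits and treated as unsigned. */
uint16_t header_ino(uint32_t ino)
{
    return (uint16_t)ino;
}

/* Assumed original behaviour: the table entry keeps the same bits in a
 * signed short.  Above 32767 the stored value goes negative (on the usual
 * two's-complement machines) and never equals the unsigned header value. */
int found_with_short(uint32_t first_link, uint32_t later_link)
{
    int16_t m_ino = (int16_t)header_ino(first_link);

    /* The argument relies on both operands being promoted to a wider int. */
    assert(sizeof(int) > sizeof(int16_t));
    return m_ino == header_ino(later_link);
}

/* Partial fix: the table entry is wide enough to hold the unsigned value,
 * but only the low 16 bits of the inode number ever reach it. */
int found_with_wide(uint32_t first_link, uint32_t later_link)
{
    uint32_t m_ino = header_ino(first_link);

    return m_ino == header_ino(later_link);
}
```

	With these definitions, found_with_short(40000, 40000) misses a
	genuine link, while found_with_wide(5, 65536 + 5) reports a link
	between two unrelated inodes.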

	(I strongly suspect that SVR3 cpio has the above partial fix)

	A proper fix is more difficult because the cpio(5) format only
	has a 16-bit field for the inode-number, whereas BSD systems
	(at least) have 32-bit inode numbers.

	My workaround is to use the cpio(5) inode-field only when a file has
	multiple links (ie: st_nlink > 1), and to generate for it a unique
	number bearing no relationship to the original inode number.  If the
	file has a single link or is a directory, the cpio(5) inode
	number field is written as zero.

	Since cpio only uses the inode-number field when nlinks > 1, this
	should not pose a problem, and gives a maximum of 65534 multiply-linked
	files per archive, a limit that probably won't be hit quickly.
	(if there is a cpio that uses the inode-number field all the time,
	then this would be broken, but I cannot see why it should do so)

	(I strongly suspect that SVR3 cpio doesn't have this fix, since
	the inode numbers in the archive seem to match the originals
	on singly-linked files)
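
	The workaround described above could be sketched roughly as follows.
	This is an illustration only: the struct and function names are
	hypothetical, the table uses caller-supplied storage with a linear
	scan, and returning -1 when the table is full is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LINK_IDS 65534u     /* limit quoted in the text above */

/* One entry per multiply-linked file seen so far (hypothetical layout). */
struct link_ent {
    uint32_t dev;
    uint32_t ino;
};

struct link_table {
    struct link_ent *ent;       /* caller-supplied storage */
    size_t cap;                 /* entries available in ent[] */
    size_t used;                /* entries filled; ent[i] has id i + 1 */
};

/*
 * Return the value to write in the archive's inode field:
 *   0           single link or directory
 *   1..65534    id shared by every name of one multiply-linked file
 *  -1           table full (caller must report an error)
 */
long link_id(struct link_table *t, uint32_t dev, uint32_t ino,
             unsigned nlink, int is_dir)
{
    size_t i;

    assert(t != NULL && t->ent != NULL);
    if (nlink <= 1 || is_dir)
        return 0;

    /* Seen before: hand back the id allocated for the first name. */
    for (i = 0; i < t->used; i++)
        if (t->ent[i].dev == dev && t->ent[i].ino == ino)
            return (long)(i + 1);

    if (t->used >= t->cap || t->used >= MAX_LINK_IDS)
        return -1;

    /* First sighting: allocate the next sequential id. */
    t->ent[t->used].dev = dev;
    t->ent[t->used].ino = ino;
    t->used++;
    return (long)t->used;
}
```

	Because the lookup keys on the full dev and inode numbers, two files
	whose inode numbers differ only above bit 15 get different ids.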

	The SVR2 cpio source is a complete disaster, and should be totally
	rewritten from scratch (if it hasn't already been done in SVR3).
	There are more bugs in it than you can poke a stick at!

Other-bugs:
	Among other things wrong with -this- cpio are:
	(I'm not necessarily speaking of the SunOS cpio here, but I know some
	of these bugs do apply to it)

		- disk errors could easily result in an archive being corrupted
		  because the file-size is written in the header based
		  on what stat(2) says, but the copy routine gives up copying
		  the file when it comes across a bad read from disk,
		  and doesn't pad the file out on the archive to its
		  stated size.  This means that the header for the file
		  is incorrect, and reading the archive back will get cpio
		  very confused.

		- somebody truncating a file that is being dumped (eg: via
		  creat() or truncate()) could result in a corrupt archive, 
		  making cpio absolutely useless for dumping live filesystems
		  (I suspect this might have been fixed in SVR3)

		- cpio will restore the modification time of a destination
		  file during '-p' or '-i' commands, even if the file was
		  not successfully restored.  This makes it damn difficult
		  to redo the restore/copy and find out which files weren't
		  restored (eg: filesystem full, you end up with lotsa
		  inodes of zero length with the original mod-dates!).
		  
		- having errno in an error message is absurd
		  (this seems to have been eradicated in SVR3, and in 
		  SunOS 3.5)

		- many arrays can be indexed past their end because of a lack
		  of bounds checking (eg: you supply more than 100 patterns
		  on the command line, or a file name that exceeds 256
		  characters).  Both of these situations could cause
		  a coredump.

		- the number of reels of tape is limited by a bug:
		  /dev/tty is reopened, but never closed, each time input
		  is requested when changing reels.  On many systems, this
		  limit is around 16, if NOFILE=20.

		- there are calls to utime(2) using the stat structure
		  directly, which is incorrect on many (eg: BSD) systems,
		  resulting in the inode mtime being reset to the epoch
		  if the -a option is used.  This is because the stat structure
		  is different.  
		  
		  This bug isn't present in SunOS 3.5 cpio.

		- attempting to dump files with inode numbers > 32767 or
		  NFS inodes will result in a corrupt archive if using -c,
		  due to sign extension in bintochar() causing the octal
		  numbers to be written wider than their fixed field width.
		  (NFS inodes have a negative dev; i-numbers > 32767
		  are negative).

		  This bug isn't present in SunOS cpio.

		- writing an archive with files owned by "nobody" (uid = -2)
		  will create a corrupt archive when using -c.  

		  This bug IS present in SunOS 3.5 cpio.

		- opening of /dev/tty isn't even checked for success, resulting
		  in a probable coredump if you run it from cron and multi-reels
		  are required.  (SVR3 cpio has options to specify a 
		  substitute for /dev/tty anyway, which would allow it to
		  run from cron ok, and I strongly suspect that this
		  has been fixed in SunOS 3.5, by the looks of the error
		  messages in the binary)

		- restoring files that sit within read-only directories in
		  the archive is impossible unless you don't restore the
		  directories.  This is because the mode of the directory
		  is set before the files are extracted.

		  This bug IS present in SunOS 3.5 cpio.
		  (could be tricky to fix, unless you keep a list of 
		  directories and set all the modes for them at the end
		  of the run; or chmod the directories if you need
		  to write into them.  Tar typically has this problem too,
		  when you use its 'p' option)
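
	The sign-extension problem in the -c header (negative dev and inode
	values, or uid = -2) can be sketched as below.  This is not the
	bintochar() source; the function names are hypothetical, and it
	assumes the affected fields are 6 octal digits wide and that the
	values were held in signed 16-bit variables.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed buggy behaviour: the signed 16-bit value is widened before it
 * is printed, so a negative value drags its sign bits into the text. */
int put_octal_signext(char *buf, size_t n, int16_t v)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%.6lo",
                    (unsigned long)(uint32_t)(int32_t)v);
}

/* Masked variant: keep only the low 16 bits, which always fit in the
 * 6-digit field (at most 177777 octal). */
int put_octal_masked(char *buf, size_t n, int16_t v)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%.6lo",
                    (unsigned long)((uint32_t)v & 0xFFFFu));
}
```

	For v = -2 the first function produces 11 digits (37777777776),
	which shifts every later header field, while the second produces
	the 6 digits 177776.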

All in all, cpio is a total mess and should be totally rewritten or replaced.
Beware if you use this as your only backup mechanism.

I would be interested if anybody with the "latest" cpio (SVR3?) could
peek at the source and comment on which of the above bugs remain.
(I don't have access to SVR3 sources)

Ian D

les@chinet.chi.il.us (Leslie Mikesell) (08/05/88)

In article <824@yabbie.rmit.oz> rcodi@yabbie.rmit.oz (Ian Donaldson) writes:
>	At the end of this report is a list of other problems with cpio,
>	some major, some minor.

Thanks for the warning.
Have you looked at the "afio" program that was posted to the net a while
back?   Perhaps it would be more reliable, or at least easier to fix for
those of us without source to cpio. 

 Les Mikesell