[comp.unix.questions] Cpio and long devices and inodes

dce@mips.UUCP (David Elliott) (01/01/70)

>In article <596@quacky.UUCP> dce@quacky.UUCP (David Elliott) writes:
>-Now, anyone want to hear how you can change cpio to handle long
>-device numbers and/or long inode numbers without losing data or
>-even changing the magic number?

OK. Here goes. I'll discuss the general case, in which both inode and
device numbers are handled, instead of just one or the other (UTek cpio
just handles long device numbers).

This idea plays on two properties of cpio:

	1. The code has to go ahead and read the data associated with
	   all headers, including special files. The System V and BRL
	   versions of cpio both do this, and it is probably wrong
	   to ignore this field in any case.

	2. Device numbers are used for linking and creating special
	   devices (character, block, and FIFO) only, so the device
	   number pair for two files in the same directory need not
	   necessarily be the same.

What happens is that each device/inode pair is mapped into a
unique "long" integer, which is stored as 2 shorts in the cpio
device and inode number fields. Later, I'll give a neat method
of mapping this stuff.
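
Here is a minimal sketch of storing one mapped value in the two header
fields. It assumes 16-bit header fields and a mapped value that fits in
32 bits. The names (id_fields, split_id, join_id) are made up for
illustration, and which half lands in which field is my assumption;
the scheme only needs the split to be reversible.

```c
#include <assert.h>

/* Hypothetical layout: which half goes in which field is an assumption. */
struct id_fields {
    unsigned short dev;   /* upper 16 bits of the mapped value */
    unsigned short ino;   /* lower 16 bits of the mapped value */
};

/* Split a mapped value into the two short header fields. */
static struct id_fields split_id(unsigned long id)
{
    struct id_fields f;

    assert(id <= 0xFFFFFFFFUL);   /* must fit in two 16-bit fields */
    f.dev = (unsigned short)((id >> 16) & 0xFFFFUL);
    f.ino = (unsigned short)(id & 0xFFFFUL);
    return f;
}

/* Rebuild the mapped value from the two header fields. */
static unsigned long join_id(struct id_fields f)
{
    return ((unsigned long)f.dev << 16) | (unsigned long)f.ino;
}
```

On extraction, join_id() gives back the value that two linked files
share, which is all the link-detection code needs to compare.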

Additionally, special devices are given a special "magic cookie"
hash value (I used 32767 in UTek, though any value will do),
a data section big enough to contain the real device and
inode numbers (I'd use a fixed field of 12 digits each), and the
h_filesize value adjusted appropriately. It's easiest to always
make the data section ASCII instead of worrying about matching it
with the header type.
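
A rough sketch of building that data section, assuming two
zero-padded decimal fields of 12 digits each, device first. The decimal
base, the field order, and the names (encode_ids, ID_DIGITS,
ID_DATA_LEN) are assumptions for illustration, not what UTek cpio
actually does.

```c
#include <assert.h>
#include <stdio.h>

#define ID_DIGITS   12                /* width of each ASCII field */
#define ID_DATA_LEN (2 * ID_DIGITS)   /* bytes in the data section */

/*
 * Hypothetical helper: format the real device and inode numbers into
 * buf as two fixed-width decimal fields.  Returns the length of the
 * data section, which is the value to store in h_filesize.
 * buf must hold ID_DATA_LEN + 1 bytes (snprintf adds a NUL).
 */
static long encode_ids(char *buf, size_t buflen,
                       unsigned long dev, unsigned long ino)
{
    int n;

    assert(buflen >= ID_DATA_LEN + 1);
    n = snprintf(buf, buflen, "%012lu%012lu", dev, ino);
    assert(n == ID_DATA_LEN);   /* a value wider than 12 digits won't fit */
    return (long)n;
}
```

Only the first ID_DATA_LEN bytes of buf would go into the archive; the
trailing NUL is just a convenience for the caller.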

Extraction of special devices with the known "magic cookie" is done
by reading the data section to obtain the "real" device and inode
numbers. In the UTek version, I went ahead and made the header-reading
routine read the data and reset the h_filesize field to 0, but any
method for reading ahead will do.
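
For the reading side, a minimal sketch, assuming the data section is
two 12-digit decimal ASCII fields (device, then inode) and that the
caller has already read it into a NUL-terminated buffer. The name
decode_ids and the decimal format are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define ID_DIGITS 12   /* width of each ASCII field */

/*
 * Hypothetical helper: recover the real device and inode numbers from
 * the data section of a special-file entry.  Returns 0 on success, or
 * -1 if the entry does not carry exactly two 12-digit fields.
 * data must be NUL-terminated by the caller.
 */
static int decode_ids(const char *data, long filesize,
                      unsigned long *dev, unsigned long *ino)
{
    assert(data != NULL && dev != NULL && ino != NULL);

    if (filesize != 2 * ID_DIGITS)
        return -1;
    if (sscanf(data, "%12lu%12lu", dev, ino) != 2)
        return -1;
    return 0;
}
```

After a successful decode the caller would set h_filesize to 0 so the
rest of the extraction code sees an ordinary special-file header.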

When I did this for UTek, I used a naive system that kept a table of
all of the device number pairs, and this took a lot of space. After
thinking about it again, I realized that there is a special property
of the mapping: if the file has a link count of 1, it will never be
seen again. Thus, the algorithm is:

	if (link count is 1) {
		increment "unique" counter
		return "unique" counter
	}
	if (device/inode is in table) {
		return the value for that entry
	} else {
		increment "unique" counter
		store device/inode in the table
		return "unique" counter
	}
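
The pseudocode above could be sketched in C roughly as follows. The
names (id_map, map_entry, map_id) are hypothetical, and the linear
search and doubling array are just the simplest thing that works, not
necessarily what a real cpio should use.

```c
#include <assert.h>
#include <stdlib.h>

struct map_entry {
    unsigned long dev, ino;   /* real device/inode pair */
    unsigned long id;         /* mapped value handed out for it */
};

struct id_map {
    struct map_entry *entries;
    size_t count, cap;
    unsigned long next_id;    /* the "unique" counter */
};

/* Return the mapped value for one file; only multiply-linked
 * files are remembered in the table. */
static unsigned long map_id(struct id_map *m, unsigned long dev,
                            unsigned long ino, unsigned nlink)
{
    size_t i;

    assert(m != NULL);

    if (nlink == 1)                    /* never seen again: no table entry */
        return ++m->next_id;

    for (i = 0; i < m->count; i++)     /* linear search of known pairs */
        if (m->entries[i].dev == dev && m->entries[i].ino == ino)
            return m->entries[i].id;

    if (m->count == m->cap) {          /* grow the table when full */
        size_t ncap = m->cap ? m->cap * 2 : 16;
        struct map_entry *p = realloc(m->entries, ncap * sizeof *p);
        if (p == NULL)
            abort();
        m->entries = p;
        m->cap = ncap;
    }
    m->entries[m->count].dev = dev;
    m->entries[m->count].ino = ino;
    m->entries[m->count].id = ++m->next_id;
    return m->entries[m->count++].id;
}
```

Start with a zeroed struct id_map. The table then grows only with the
number of multiply-linked files, not with the size of the archive.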

In practice, it looks like you can start the unique counter at 1 and
change the first statement to

	if (link count is 1) {
		return 0
	}

This works because cpio doesn't worry about making links unless the
file has a link count greater than 1. Thus, it will never know that a
whole bunch of files "look" like they are linked. On the other hand, if
someone were to change cpio to handle things a little differently,
this could easily be broken. What do people think about this?

-- 
David Elliott		{decvax,ucbvax,ihnp4}!decwrl!mips!dce