[comp.unix.wizards] Holey files, Batman!

mbr@aoa.UUCP (Mark Rosenthal) (05/21/87)

In article <6899@alice.UUCP> ark@alice.UUCP writes:
>In article <7421@brl-adm.ARPA>, evins@nrl-radar.arpa writes:
>> 	Does anyone know of a version of rm that allows you to scrub blocks
>> clean before making them available, for BSD VAXes and Suns.
>
>Before you unlink the file, open it for output and write a block
>of zeroes over every block of the file.  Then sync() and unlink it.

How do you deal with "holey" files?  The method you describe could chew up
humongous amounts of disk space.  You could read through the file and only write
over locations which are already non-zero, but this could run into significant
processing time if the holes are large.

In general, how can a utility tell where the "real" bytes in a file are, as
opposed to the empty spots which read back as zero?  Is there any way other
than opening the raw disk and following through starting at the block pointers
in the inode?  Ya gotta run as root to get at the raw disk anyway (unless your
system administrator is a real turkey).
-- 
	Mark of the Valley of Roses
	...!{harvard,ima}!bbn!aoa!mbr
	...!{wjh12,mit-vax}!biomed!aoa!mbr

mbr@aoa.UUCP (Mark Rosenthal) (03/17/88)

How do you copy holey files?

You know, the kind of file which would be created by the following code.

    #include <sys/file.h>

    main(argc, argv)
    int argc;
    char **argv;
    {
	static long zeroes[] = { 0, 0, 0, 0 };
	int fd;

	fd = creat("holey", 0666);
	lseek(fd, 123456L, L_SET);
	write(fd, zeroes, sizeof(zeroes));
    }

As far as I can tell, when reading such a file, there is no way to determine
whether bytes whose value is 0 are really stored on disk, or are in the middle
of a hole in the file.  Therefore, when copying such a file, the holes get
converted to actual blocks which take up space on the disk, so the file expands.

I can think of two possible ways to copy such a file without expansion, neither
of which is entirely satisfactory:

    1. Read through the bytes sequentially, but don't output any whose value
	is 0.  Use lseek() before writing the next non-zero byte.  To guarantee
	that the created file has the same extent, always write the last byte.
	Disadvantage: When copying a file with huge holes and little data,
	this would be very slow.

    2. Go outside the filesystem.  Read the raw device for the partition,
	and interpret through the structures which represent the filesystem
	to determine where the holes are.  Disadvantages: 1. system dependent -
	needs to be rewritten for each different implementation of the
	filesystem; 2. a non-privileged user may well not have read access
	to the raw device.

Is there any direct method which will allow me to determine where the holes
in a file are?  Does anybody know of any other tricks besides the two I have
described above?  What, if any, are the differences among the following
utilities in terms of how they handle holey files: cp, tar, cpio, restor (V7
and successors), restore (4.2bsd and successors)?
-- 
	Mark of the Valley of Roses
	...!{harvard,ima}!bbn!aoa!mbr

davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (03/17/88)

  Your original idea will work for copying sparse files from disk to
disk.  The problem comes in copying them to a backup device and
reloading them.  I know of no program to do this, although it would be
easy to have the reload program write a sparse file if blocks of zeros
were encountered. 

Is there a problem with this? That is, is there a way to tell a sparse
file from a file with a lot of zeros?

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

donn@utah-cs.UUCP (Donn Seeley) (03/18/88)

Mark Rosenthal (mbr@aoa.UUCP) wants to know how to copy files with
holes in them.  He makes one suggestion that he doesn't think is
'entirely satisfactory':

	Read through the bytes sequentially, but don't output any whose
	value is 0.  Use lseek() before writing the next non-zero
	byte.  To guarantee that the created file has the same extent,
	always write the last byte.  Disadvantage: When copying a file
	with huge holes and little data, this would be very slow.

There are faster ways to do this if you don't need to be general about
making holes, for example if you only deal with sequential copying of
files.  We produced a modified 'cp' here at Utah that detects aligned
blocks of zeroes in input and performs seeks instead of writes on
output.  The detection of blocks of zeroes can be very cheap -- we find
that it's faster to copy a file with zeroes in it using our modified
'cp' than to copy it using the standard 'cp'.  Of course it helps that
most files have very predictable patterns of 'zero' use: null bytes
typically appear either in large blocks (produced by unexec(), for
example) or among other data that has roughly random bits (such as VAX
instructions).  The effect of this is that testing a block of zeroes is
inexpensive since the cost is made up by the subsequent seek, while
testing a block of random data that contains nulls is inexpensive
because the test almost always fails very quickly.  It's not hard to
imagine a pathological file that defeats this algorithm (8191 nulls
followed by one non-null byte, repeated many times, will take care of
our 4.3 VAXen with 8k filesystems) but we don't see such files in
practice.

As for Mark Rosenthal's list of utilities, I believe that only
restor/restore will replace holes in files (because dump actually puts
block number information on a tape).

Donn Seeley    University of Utah CS Dept    donn@cs.utah.edu
40 46' 6"N 111 50' 34"W    (801) 581-5668    utah-cs!donn

PS -- Here's the code...  To build a cp that creates holes, compile
cp.c with -Dwrite=zwrite and -Dclose=zclose and load with an object
made from the following file.  This code is specifically written for
4.2 or 4.3 BSD on the VAX, but the algorithm should be easy to port.

------------------------------------------------------------------------
#ifndef lint
static char RCSid[] = "$Header: zwrite.c,v 1.4 85/07/01 03:53:18 lepreau Exp $";
#endif
/*
 * zwrite(): like write(2), but creates holes in files if possible.
 * If a write() fails zwrite returns -1 instead of the number of
 * bytes written-- what do you think about this?
 *
 * The current code is only optimal if zwrite's are done in lengths
 * evenly dividing the filesystem blksize-- must be fixed.
 * 
 * zclose() should be called instead of close() if fd's are to be re-used
 * or if you might have zeroes at the end of the file.  And you should
 * check that zclose returns 0, also!
 *
 * Current `vax' code assumes st_blksize (filesys blksize) is <= 64K;
 * non-vax code not yet tested.
 *
 * Donn Seeley and Jay Lepreau, Univ of Utah, April 1985.
 */

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/file.h>

#define min(a,b)	((a) < (b) ? (a) : (b))

struct dtab {
	int blksize;
	off_t offset;
	char flags;
};
#define DF_LASTSEEK	0x2

static struct dtab (*bp)[];
static int dtabsize;

int
zwrite(fd, buf, len)
	int fd;
	register char *buf;		/* known to be r11 below */
	int len;
{
	register int count;		/* known to be r10 below */
	register int notzero = 0;	/* known to be r9 below */
	int savelen = len;
	int obsize;
	struct stat statb;

	if (dtabsize == 0) {
		register int i;
		extern char *malloc();

		dtabsize = getdtablesize();
		bp = (struct dtab (*)[]) malloc((unsigned) (dtabsize * sizeof(struct dtab)));
		if (bp == 0)
			return write(fd, buf, len);
		for (i = 0; i < dtabsize; i++)
			(*bp)[i].blksize = -1;
	}
	/* get output blocksize if we don't have it already */
	if ((*bp)[fd].blksize < 0) {
		if (fstat(fd, &statb) < 0 ||
		   (statb.st_mode & S_IFMT) == S_IFSOCK ||
		   (statb.st_mode & S_IFMT) == 0)	/* avoid kernel bug */
			(*bp)[fd].blksize = 0;	/* remember we are screwed */
		else
			(*bp)[fd].blksize = statb.st_blksize;
	}
	if ((obsize = (*bp)[fd].blksize) == 0)
		return write(fd, buf, len);

	while (len > 0) {
		count = min(len, obsize);

		/* Are there any non-zero chars in this block? */
#ifdef vax
		;asm("	skpc	$0,r10,(r11)")
		;asm("	beql	1f")		/* all zeroes */
		;asm("	incl	r9")		/* notzero++ */
		;asm("1:")
#else
		{
		register char *p;
		char *endbuf;
		
		p = buf;
		endbuf = buf + count;
		while (p < endbuf)
			if (*p++) {
				notzero++;
				break;
			}
		}
#endif
		if (notzero) {
			(*bp)[fd].flags &= ~DF_LASTSEEK;
			if (write(fd, buf, count) != count)
				return -1;
		}
		else {
			(*bp)[fd].flags |= DF_LASTSEEK;
			if (lseek(fd, (off_t) count, L_INCR) < 0)
				return -1;
		}
		len -= count;
		buf += count;
	}
	return savelen;
}

zclose(fd)
{
	int rc = 0;

	if (bp) {
		/* Can't just seek at end and expect to get anything! */
		if ((*bp)[fd].flags & DF_LASTSEEK) {
			if (lseek(fd, (off_t) -1, L_INCR) < 0) 
				rc = -1;
			else if (write(fd, "", 1) != 1)
				rc = -1;
		}
		(*bp)[fd].blksize = -1;
		(*bp)[fd].flags = 0;
	}
	if (close(fd) < 0)
		return -1;
	else
		return rc;
}
------------------------------------------------------------------------

Apologies for the asm()s -- they were fun at the time.  Didn't you ever
wonder what some of those CISC instructions were for?

Some typical timings on our 11/785:

% ls -ls /usr/site/src/psl/bin/bare-psl
 412 -rwxrwxr-x  1 psl      psl       1651920 Oct 19 08:17 /usr/site/src/psl/bin/bare-psl*
% time cp /usr/site/src/psl/bin/bare-psl /tmp
0.0u 4.0s 0:07 55% (11t+25ds+37avg+19max)k 106i+210o (0maj+7min)pf 0swaps
% rm /tmp/bare-psl
% time zcp /usr/site/src/psl/bin/bare-psl /tmp
0.2u 2.4s 0:04 65% (15t+27ds+43avg+23max)k 104i+60o (0maj+9min)pf 0swaps

chris@mimsy.UUCP (Chris Torek) (03/18/88)

In article <9978@steinmetz.steinmetz.UUCP> davidsen@steinmetz.steinmetz.UUCP
(William E. Davidsen Jr) writes:
>The problem comes in copying [sparse or `holey' files] to a backup device
>and reloading them.  I know of no program to do this, although it would be
>easy to have the reload program write a sparse file if blocks of zeros
>were encountered. 

4BSD dump and restore know about holes.  Curiously, `restore' makes the
assumption that the last block of a file is never a hole.  This was
true until the introduction of the `truncate' system call, because
there was no way to shorten a file, and the only way to create a file
with holes was

	fd = open(...);
	lseek(fd, (long)way_out, 0);
	write(fd, "", 1);

which did allocate at least one block (or fragment) containing only
zero bytes.  Now, however, it is possible to create a file containing
ONLY holes; a program that does this is appended.

>Is there a problem with this?

No, althought it might be slow.  `restore' works with an image of
the original inode, and only replaces holes with holes; it does not
test each block.

>That is, is there a way to tell a sparse file from a file with a lot of zeros?

There are two ways.  One is to look at the inode (bypass the file
system).  The other is to count the number of blocks used (easy in
4.2/4.3BSD, hard in others).  But all normal file system operations are
obliged to treat them the same.

	/* create a file that is entirely hole */
	#include <sys/types.h>
	#include <sys/stat.h>

	struct stat st;

	main(){
		int fd = creat("t.hole", 0666);

		if (fd < 0)
			perror("t.hole"), exit(1);
		(void) lseek(fd, 65536L, 0);
		(void) write(fd, "", 1);
		(void) fstat(fd, &st);
		(void) printf("t.hole has %ld blocks\n", st.st_blocks);
		(void) ftruncate(fd, 32768L);
		(void) fstat(fd, &st);
		(void) printf("t.hole now has %ld blocks\n", st.st_blocks);
		exit(0);
	}
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

dan@maccs.UUCP (Dan Trottier) (03/20/88)

Then there was the case of a student who inadvertently fseek'ed to a 
pointer value and wrote a couple of bytes of data and closed the file.

The morning backup of a 146 MB partition wanted 26 40 MB tapes! Yes folks
thats a total of 1040 MB. Of course df(1), du(1), and find(1) indicated
all was fine. We had to do an "ls -laR" to find that someone had created
a 1 Gigabyte file! You would think dump(8) would be a little smarter.

-- 
       A.I. - is a three toed sloth!        | ...!uunet!mnetor!maccs!dan
-- Official scrabble players dictionary --  | dan@mcmaster.BITNET

root@shockeye.UUCP (Bourne-again Superuser) (03/20/88)

In article <143@aoa.UUCP> mbr@aoa.UUCP (Mark Rosenthal) writes:
>How do you copy holey files?
...
>As far as I can tell, when reading such a file, there is no way to determine
>whether bytes whose value is 0 are really stored on disk, or are in the middle
>of a hole in the file.  Therefore, when copying such a file, the holes get
>converted to actual blocks which take up space on the disk, so the file expands.
>    1. Read through the bytes sequentially, but don't output any whose value
>	is 0.  Use lseek() before writing the next non-zero byte.  To guarantee
>	that the created file has the same extent, always write the last byte.
>	Disadvantage: When copying a file with huge holes and little data,
>	this would be very slow.
...
>Is there any direct method which will allow me to determine where the holes
>in a file are?  Does anybody know of any other tricks besides the two I have
>described above?  What, if any, are the differences among the following

You don't have to check every byte of the file - if you know what the
filesystem's block size (say 1024) then you can read the next 1024 bytes and
if any of them are not zero, write them all, otherwise lseek past them.

--
Mark Buda, The Embattled Hermit		Domain: hermit@chessene.uucp
Dumb: ...{rutgers,ihnp4,cbosgd}!bpa!vu-vlsi!devon!chessene!hermit
"Dr. Johnson, can you come over right away? My father has a hibachi on his
head."

jerry@oliveb.olivetti.com (Jerry Aguirre) (03/23/88)

In article <5356@utah-cs.UUCP> donn@utah-cs.UUCP (Donn Seeley) writes:
>There are faster ways to do this if you don't need to be general about
>making holes, for example if you only deal with sequential copying of
>files.  We produced a modified 'cp' here at Utah that detects aligned
>blocks of zeroes in input and performs seeks instead of writes on
>output.  The detection of blocks of zeroes can be very cheap -- we find
>that it's faster to copy a file with zeroes in it using our modified
>'cp' than to copy it using the standard 'cp'.  Of course it helps that

I tried this one time also.  The problem is that each program requires a
separate solution.  Fixing "cp" does nothing about tar, ld, dd, etc.

Has anyone tried puting the zero test into "write".  As Donn pointed out
an "all zero" check will usually fail in the first byte of the block and
so is, on the average, cheap.

The only problem is I doubt a Vax750 can generate a block of zeros
faster than it can read it from disk.  I seem to remember that the
PDP-11 version of UNIX kept a block of zeros on disk to zero memory,
that being faster than a cpu loop to do the same.

				Jerry Aguirre