[comp.unix.questions] sparse files

emcmanus@cs.tcd.ie (Eamonn McManus) (11/12/88)

In article <17462@adm.BRL.MIL> Makey@logicon.arpa (Jeff Makey) writes:
>Few files on most UNIX systems need the sparse-file feature.  However,
>I once was messing with some custom database files that were very
>large and very sparse.  In order to copy one of these database files
>and not fill up the file system I had to write my own version of "cp"
>which scanned each input block and didn't write it out if it contained
>all zeroes.

Another useful application of a program like this is for systems like TeX,
where an executable file is created that has its initialisation code
already done.  This is (effectively) achieved by running the initialisation
code, producing a core-dump, and making a new executable from the result.
There is no bss segment in the resultant program so it tends to have huge
patches of zeros in the data segment, especially if it has been configured
with large arrays.  I was able to reduce the TeX executable to 47% of its
original size by using a program like yours.  (Do more recent TeX
distributions have this idea already?)

JAZBO@brownvm.brown.edu (James H. Coombs) (11/30/89)

"Sparse files" have been mentioned in several recent postings.  For example:

>Kemp@DOCKMASTER.NCSC.MIL writes:
>>Just for the record, is there *any* way to do a recursive copy
>>correctly?  I.e.  one that doesn't:
>> * turn symbolic links into actual files
>> * turn link loops into a series of infinitely nested copies
>> * alter the modify and change times
>> * choke on block and character special files
>> * turn holes in sparse files into real disk blocks
>I think afio will do this. I am not sure about the symlink
>stuff, though, as we're a SYS V only site.

Can someone explain exactly what a sparse file is?  How does one get created?

--Jim

Dr. James H. Coombs
Senior Software Engineer, Research
Institute for Research in Information and Scholarship (IRIS)
Brown University, Box 1946
Providence, RI 02912
jazbo@brownvm.bitnet
Acknowledge-To: <JAZBO@BROWNVM>

jik@athena.mit.edu (Jonathan I. Kamens) (11/30/89)

In article <21581@adm.BRL.MIL> JAZBO@brownvm.brown.edu (James H. Coombs)
writes:
>Can someone explain exactly what a sparse file is?  How does one get created?

  A "sparse file" is a file with a lot more NULLs in it than anything
else (well, that's a general definition, but it's basically correct).

  Many (although not all -- the Andrew File System, for example, does
not) Unix filesystem types support the ability to greatly reduce the
amount of space taken up by a file that is mostly nulls by not really
storing the file blocks that are filled with nulls.

  Instead, the OS keeps track of how many blocks of nulls there are in
between each block that has something other than nulls in it, and
feeds nulls to anybody that tries to read the file, even though
they're not really being read off of a disk.

  You can create a sparse file by fopen'ing a file, fseek'ing far
past the end of the file, and then writing something at the new
offset -- the gap up to where you fseek'ed will read back as NULLs,
and the kernel (probably) won't save all of those NULLs to disk.
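
  For instance, a minimal sketch of that (the file name and offset here
are made up purely for illustration):

	#include <stdio.h>

	int
	main(void)
	{
		FILE *fp = fopen("sparse.demo", "w");

		if (fp == NULL)
			return 1;
		/* Skip a megabyte without writing anything on the way;
		   the skipped range reads back as NULLs, and most Unix
		   filesystems allocate no disk blocks for it. */
		fseek(fp, 1024L * 1024L, SEEK_SET);
		fputs("past the hole\n", fp);
		fclose(fp);
		return 0;
	}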

  Programs that use dbm often create sparse files, because dbm uses
file location as part of its hashing and tries to spread out entries
in the database file so there is lots of blank space between them.

  The reason sparse files are a problem when it comes to copying is
that the kernel isn't smart enough (or perhaps it won't do it because
it *is* smart :-) to figure out you're feeding it a sparse file if you
actually feed it the NULLs.  Therefore, standard file copying programs
like cp that just read the file in and write it out in a different
location lose, because they end up creating a file that really does
take up as much physical space as there are NULLs in the abstract
file object.

Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710

cpcahil@virtech.uucp (Conor P. Cahill) (12/01/89)

In article <21581@adm.BRL.MIL>, JAZBO@brownvm.brown.edu (James H. Coombs) writes:
> "Sparse files" have been mentioned in several recent postings.  For example:

 	[ example deleted.. ]

> Can someone explain exactly what a sparse file is?  How does one get created?

A sparse file is a file that has "holes" in it.  A hole is a portion of
the file that does not exist on the disk.  Reading data from a hole will
always get null bytes for the portion of the read that was a hole.  Writing
data to the hole (even if it is all nulls) will cause the hole to become
a non-hole (in other words, it will take up space on the file system).

Holes usually occur in binary database files and can also occur in 
core files on some systems.

An easy way to create a sparse file is as follows:

	fd = open(newfile, ...);
	lseek(fd, 1024L * 1024L, 0);	/* seek to 1 meg */
	write(fd, "at one meg", 10);

If you ls the file it will appear as if it takes up 1meg of space.
If you read the file, you will see 1meg of nulls followed by the "at one meg".

However, if you do a df and compare it to the value before the file 
was created you will find that the system space has gone down by just
a few blocks.

I have seen 80 meg files on 40 meg file systems.  This can be a problem if
you try to restore such a file from a backup: since most backup utilities,
like cpio or tar, do not know about sparse files, the restore will fail when
you run the file system out of space.

-- 
+-----------------------------------------------------------------------+
| Conor P. Cahill     uunet!virtech!cpcahil      	703-430-9247	!
| Virtual Technologies Inc.,    P. O. Box 876,   Sterling, VA 22170     |
+-----------------------------------------------------------------------+

rec@dg.dg.com (Robert Cousins) (12/01/89)

In article <21581@adm.BRL.MIL> JAZBO@brownvm.brown.edu (James H. Coombs) writes:
>"Sparse files" have been mentioned in several recent postings.  For example:
>
>>Kemp@DOCKMASTER.NCSC.MIL writes:
>>>Just for the record, is there *any* way to do a recursive copy
>>>correctly?  I.e.  one that doesn't:
>>> * turn symbolic links into actual files
>>> * turn link loops into a series of infinitely nested copies
>>> * alter the modify and change times
>>> * choke on block and character special files
>>> * turn holes in sparse files into real disk blocks
>>I think afio will do this. I am not sure about the symlink
>>stuff, though, as we're a SYS V only site.
>
>Can someone explain exactly what a sparse file is?  How does one get created?
>
>--Jim
>
>Dr. James H. Coombs
>Senior Software Engineer, Research
>Institute for Research in Information and Scholarship (IRIS)
>Brown University, Box 1946
>Providence, RI 02912
>jazbo@brownvm.bitnet
>Acknowledge-To: <JAZBO@BROWNVM>

A sparse file is one which has "holes" in it.  Specifically, the amount of 
space required to store the file on disk is less than the length of the
file (offset of the last byte).  A sparse file can be created under UNIX
by creating a file and then simply choosing not to write some portions
of the file.  The following program creates a sparse file:


#include	<stdio.h>
#include	<stdlib.h>
#include	<fcntl.h>
#include	<sys/file.h>
#include	<sys/types.h>
#include	<unistd.h>

int
main(void)
{
	int	fp, status;
	off_t	position;
	static char buffer[] = "This is a test of sparse files.";

	fp = open("test.file", O_RDWR | O_CREAT, 0666);
	if (fp < 0) { printf("Unable to open/create file.\n"); exit(1); }
	position = lseek(fp, 100000, SEEK_SET);
	printf("Moved the file to offset %ld\n", (long) position);
	status = write(fp, buffer, sizeof(buffer));
	printf("Result status of write is %d\n", status);
	close(fp);
	exit(0);
}

UNIX treats the "holes" as 0's when read. In fact, UNIX has only
minimal support for sparse files.  Backing up sparse files often
involves copying large amounts of nulls.  Once an area of a file is
written, it cannot be returned to its previous sparse state.  One
cannot REALLY tell (without heroic effort) if  a given area of a file
is just 0's or is not there.  In arguments that UNIX is not suitable for
DP applications, sparse files usually come up if the conversation goes
on long enough between knowledgeable people.

Some operating systems return an error which amounts to 
"you can't read that because there isn't anything there."  
Sparse files are quite popular for a number of Data Processing
applications.  (Effectively you can use them for hash buckets amongst
other applications.)  Furthermore, for some scientific applications, sparse
files can be used to store sparse matrices.  This, however, would
require finer granularity than normally found in the sparse storage
system.  Most operating systems just check to see if there is a block
allocated which would contain that information and if so return that
value.  Hence, a "sparse" file in which every other byte was written would
appear to an application to be continuous.
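
As an illustration of the hash-bucket style of use mentioned above, a
direct-offset record store might look like the sketch below (the record
layout and function names are invented for the example, not taken from any
particular product): records live at offsets computed from their keys, and
keys that are never stored cost no disk space.

	#include <sys/types.h>
	#include <unistd.h>

	struct record {
		long	key;
		char	data[120];
	};

	/* Store a record at the offset implied by its key; slots that
	   are never written remain holes and occupy no disk space. */
	int
	put_record (int fd, struct record *rp)
	{
		off_t	off = (off_t) rp->key * sizeof (struct record);

		if (lseek (fd, off, SEEK_SET) == (off_t) -1)
			return -1;
		if (write (fd, (char *) rp, sizeof (struct record))
		    != sizeof (struct record))
			return -1;
		return 0;
	}

	/* Fetch the record for a key; an all-zero result means the slot
	   was never written (the read came out of a hole). */
	int
	get_record (int fd, long key, struct record *rp)
	{
		off_t	off = (off_t) key * sizeof (struct record);

		if (lseek (fd, off, SEEK_SET) == (off_t) -1)
			return -1;
		if (read (fd, (char *) rp, sizeof (struct record))
		    != sizeof (struct record))
			return -1;
		return 0;
	}

Note that the saving only happens at the file system's block granularity,
as mentioned above: a slot here is much smaller than a block, so only long
runs of unused keys actually turn into holes.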

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.

chris@mimsy.umd.edu (Chris Torek) (12/02/89)

In article <235@dg.dg.com> rec@dg.dg.com (Robert Cousins) writes:
>UNIX treats the "holes" as 0's when read. In fact, UNIX has only
>minimal support for sparse files.  Backing up sparse files often
>involves copying large amounts of nulls.  Once an area of a file is
>written, it cannot be returned to its previous sparse state.

*Real* backup programs (V7 dump and its descendents, but not cpio or
tar) understand holes in files.  It is true, though, that the average
Unix system will fill in a hole if you so much as breathe too loudly
near it.

(Time to argue once again for a change to bmap and rdwri to look to
see if a full block is being written with zero, and if so, to release
the previously allocated block if any....  At the very least `cp' could
look for opportunities to create holes.  Typically this will not slow
things down much, since the first word of each block is rarely zero.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

guy@auspex.auspex.com (Guy Harris) (12/09/89)

>UNIX treats the "holes" as 0's when read. In fact, UNIX has only
>minimal support for sparse files.  Backing up sparse files often
>involves copying large amounts of nulls.  Once an area of a file is
>written, it cannot be returned to its previous sparse state.

Not in general, anyway.  At least the first version of AIX for the RT PC
claimed, in its documentation, that it had an "fclear()" call to punch
holes in files; I think this may show up in future releases of other
UNIXes as well.

>In arguments that UNIX is not suitable for DP applications, sparse
>files usually come up if the conversation goes on long enough between
>knowledgeable people.

Umm, what other operating systems support sparse files *and* return a
"there's a hole there" indication?  For instance, are there any OSes
with extent-based file systems (VMS, OS/360 and successors as I
remember, IRIX with SGI's Extent File System) that support sparse files?

fnf@estinc.UUCP (Fred Fish) (12/10/89)

In article <235@dg.dg.com> uunet!dg!rec (Robert Cousins) writes:
>In article <21581@adm.BRL.MIL> JAZBO@brownvm.brown.edu (James H. Coombs) writes:
>>Can someone explain exactly what a sparse file is?  How does one get created?
>
>A sparse file is one which has "holes" in it.  Specifically, the amount of 
>space required to store the file on disk is less than the length of the
>file (offset of the last byte).  A sparse file can be created under UNIX
>by creating a file and then simply choosing not to write some portions
>of the file.  The following program creates a sparse file:

[program deleted]

This is about the 2nd or 3rd posting I've seen with a sample program to
create sparse files.  You guys are working too hard!  :-)

Try:

	echo "This is a sparse file." | dd of=sparsefile seek=1000 bs=1k

-Fred
-- 
# Fred Fish, 1835 E. Belmont Drive, Tempe, AZ 85284,  USA
# 1-602-491-0048           asuvax!{nud,mcdphx}!estinc!fnf

darcy@druid.uucp (D'Arcy J.M. Cain) (12/11/89)

In article <2700@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>UNIX treats the "holes" as 0's when read. In fact, UNIX has only
>>minimal support for sparse files.  Backing up sparse files often
>>involves copying large amounts of nulls.  Once an area of a file is
>>written, it cannot be returned to its previous sparse state.
>
>Not in general, anyway.  At least the first version of AIX for the RT PC
>claimed, in its documentation, that it had an "fclear()" call to punch
>holes in files; I think this may show up in future releases of other
>UNIXes as well.
>
Seems simple enough to write a utility.  The core would be something like
the following:

	file_pointer = 0L;
	skip_space = 0;

	while ((c = fgetc(in_fp)) != EOF)
	{
		if (skip_space)
		{
			fseek(out_fp, file_pointer, SEEK_SET);
			skip_space = 0;
		}

		if (c)
			fputc(c, out_fp);
		else
			skip_space = 1;

		file_pointer++;

	}

Of course this is just off the cuff and probably needs some fleshing out
and optimizing, but I think it would work on any system that supports sparse
files and returns nulls for the empty parts.

-- 
D'Arcy J.M. Cain (darcy@druid)     |   "You mean druid wasn't taken yet???"
D'Arcy Cain Consulting             |                    - Everybody -
West Hill, Ontario, Canada         |
No disclaimers.  I agree with me   |

rec@dg.dg.com (Robert Cousins) (12/12/89)

In article <2700@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
(I wrote)
>>UNIX treats the "holes" as 0's when read. In fact, UNIX has only
>>minimal support for sparse files.  Backing up sparse files often
>>involves copying large amounts of nulls.  Once an area of a file is
>>written, it cannot be returned to its previous sparse state.
>Not in general, anyway.  At least the first version of AIX for the RT PC
>claimed, in its documentation, that it had an "fclear()" call to punch
>holes in files; I think this may show up in future releases of other
>UNIXes as well.

It is unclear whether support for sparse files is necessary.  My only
point is that at one time they were very popular amongst a particular
class of heavy DP applications.  Today we have the technology to more
effectively use system resources.  Don't forget that B-trees are relatively
recent inventions!

>>In arguments that UNIX is not suitable for DP applications, sparse
>>files usually come up if the conversation goes on long enough between
>>knowledgeable people.
>Umm, what other operating systems support sparse files *and* return a
>"there's a hole there" indication?  For instance, are there any OSes
>with extent-based file systems (VMS, OS/360 and successors as I
>remember, IRIX with SGI's Extent File System) that support sparse files?

There are a number of OS's which support sparse files.  An 
incomplete list of them includes:

	TurboDOS (1.3 and later)
	S1 (all revs if my memory is correct)
	RM/COS 
	IBM System/3 OS (I think, it's been 10 years)
	VM
	VMS
	CP/M (It's not really an OS but . . . it is extent based)
	Any operating system which supports honest-and-for-true ISAMs
	In fact, a number of OS's designed for COBOL or RPG support
		have these features.
	Anyone care to add to the list?
	
It is true, however, that newer operating systems don't support sparse
files, although add-ons such as VTAM do still support them.  One reason
for the demise of sparse files is the lack of support for the concept
of records in the more popular operating systems (UNIX, DOS, etc.).  It is
much more difficult to treat a file as a sparse collection of bytes
efficiently than as a collection of records.  Several of the above-mentioned
operating systems were plagued with handling sparse files in some form
of system-imposed record scheme.  Often this system-imposed scheme did
hide the "sparseness" from programmers under certain circumstances.  For
example, I have been told that VMS allows programs to sequentially read
a sparse file and skip over gaps in the file.  ISAM files were intrinsically
sparse.  ("ISAM" is a term which has recently been corrupted to mean
"keyed indexed access system of some form" instead of the traditional
surface/track/sector indexing scheme.)

As an aside, TurboDOS used sparse files as the extension mechanism for
files.  To extend a file, one would lock the region beyond the end of the
file, write to it (implicitly extending the file) and then release the lock.
Since file locks were for system-imposed quantities, it was possible for
a program to create a sparse file by accident.  If one program wanted to
write 1k bytes but the lock quantity was set at 2k bytes, it would have to
lock the entire physical record (2k bytes), which would cause any program
attempting to extend the file at the same time to skip beyond the lock
region (over the second half of the 2k bytes) and do the same thing.
Effectively a sparse file was created where the file ended in 1k of written
data, 1k of "nothing", and 1k of written data.  Depending upon other
circumstances, it was possible that the sparse area could be shown as either
unwritten (and return sparse file status) or, in certain obscure cases, it
would appear to contain the previous contents of some physical disk sectors.
This made porting some business applications quite difficult, since business
applications tend to depend upon shared files extended in real time.
Properly written applications could use sparse files to their own advantage
without difficulty, however.

Robert Cousins
Dept. Mgr, Workstation Dev't.
Data General Corp.

Speaking for myself alone.

meissner@dg-rtp.dg.com (Michael Meissner) (12/12/89)

In article <2700@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

|  >UNIX treats the "holes" as 0's when read. In fact, UNIX has only
|  >minimal support for sparse files.  Backing up sparse files often
|  >involves copying large amounts of nulls.  Once an area of a file is
|  >written, it cannot be returned to its previous sparse state.
|  
|  Not in general, anyway.  At least the first version of AIX for the RT PC
|  claimed, in its documentation, that it had an "fclear()" call to punch
|  holes in files; I think this may show up in future releases of other
|  UNIXes as well.
|  
|  >In arguments that UNIX is not suitable for DP applications, sparse
|  >files usually come up if the conversation goes on long enough between
|  >knowledgeable people.
|  
|  Umm, what other operating systems support sparse files *and* return a
|  "there's a hole there" indication?  For instance, are there any OSes
|  with extent-based file systems (VMS, OS/360 and successors as I
|  remember, IRIX with SGI's Extent File System) that support sparse files?

Data General's AOS/VS and AOS/VS-II operating systems support sparse
files, and recently they added a version of the read block system call
that would indicate where the holes are.  The default read and read
block system calls just return all zeroes.  Both the file level backup
programs, and the low level disk copy programs preserve holes.

Actually, the user level backup programs will never write blocks of all
0's to the backup media (unlike either tar or cpio).  Before the
system call to tell where the holes are was added, you could watch the
backup program get into a tight CPU loop when it was processing a
large hole (say a couple of meg or so).  The OS would realize that
there was a hole and fill the buffer with all 0's; the dump program
would then scan the block to see if it contained all 0's, and if it
did, skip it, maintaining the file position internally so it could be
recorded when the next real data was encountered.
--
Michael Meissner, Data General.
Until 12/15:	meissner@dg-rtp.DG.COM
After 12/15:	meissner@osf.org

fnf@estinc.UUCP (Fred Fish) (12/13/89)

In article <1989Dec10.170841.26798@druid.uucp> darcy@druid.UUCP (D'Arcy J.M. Cain) writes:
>In article <2700@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>Not in general, anyway.  At least the first version of AIX for the RT PC
>>claimed, in its documentation, that it had an "fclear()" call to punch
>>holes in files; I think this may show up in future releases of other
>>UNIXes as well.
>>
>Seems simple enough to write a utility.  The core would be something like
>the following:

I think Guy (and others) were talking about punching holes in-place without
having to make a copy of the file.  Recreating sparse files by copying is
much easier.

The necessary changes to add preservation of sparseness (or creation of
sparseness from nonsparse files) are fairly trivial and can probably be
added to cp, tar, cpio, etc. in a matter of a few minutes.  Here is the
relevant code from BRU (Backup and Restore Utility) with some minor changes
to simplify variable names:

	if (!allnulls (buffer, nbytes)) {
	    iobytes = write (fildes, buffer, nbytes);
	} else {
	    if (lseek (fildes, nbytes, 1) != -1) {
		iobytes = nbytes;
	    } else {
		bru_message (MSG_SEEK, fname);
		iobytes = write (fildes, buffer, nbytes);
	    }
	}

Note that the file starts off truncated to zero length, so the lseeks only
extend the file from the current last written position.  By falling back
to doing writes if the seek fails, the code is portable to systems where
the files cannot be extended with holes (or nulls) by seeking, at the
expense of performing occasional failing lseeks.
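
(The allnulls() routine isn't shown above; a plausible version, written
from scratch here rather than taken from BRU, just bails out at the first
nonzero byte, so the scan costs almost nothing on blocks that aren't holes:)

	/*
	 * Return nonzero if the first nbytes of buffer are all zero.
	 * Most nonzero blocks fail the test almost immediately.
	 */
	static int
	allnulls (char *buffer, int nbytes)
	{
		while (nbytes-- > 0) {
			if (*buffer++ != '\0') {
				return (0);
			}
		}
		return (1);
	}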

-Fred
-- 
# Fred Fish, 1835 E. Belmont Drive, Tempe, AZ 85284,  USA
# 1-602-491-0048           asuvax!{nud,mcdphx}!estinc!fnf

chris@mimsy.umd.edu (Chris Torek) (12/13/89)

In article <244@estinc.UUCP> fnf@estinc.UUCP (Fred Fish) writes:
>The necessary changes to add preservation of sparseness (or creation of
>sparseness from nonsparse files) are fairly trivial and can probably be
>added to cp, tar, cpio, etc. in a matter of a few minutes.  Here is the
>relevant code from BRU (Backup and Restore Utility) with some minor changes
>to simplify variable names:
>
>	if (!allnulls (buffer, nbytes)) {
>	    iobytes = write (fildes, buffer, nbytes);
>	} else {
>	    if (lseek (fildes, nbytes, 1) != -1) {
>		iobytes = nbytes;
>	    } else {
>		bru_message (MSG_SEEK, fname);
>		iobytes = write (fildes, buffer, nbytes);
>	    }
>	}

This code is not sufficient.  In particular, a file that ends in
an `allnulls' block will come out too short.  In older Unix systems,
the following is required:

	while (there are more blocks) {
		read this block, handling any errors;
		if (it is all nulls)
			nullblocks++;
		else {
			if (nullblocks) {
				(void) lseek(fd, nullblocks * blocksize, 1);
				nullblocks = 0;
			}
			write this block, handling any errors;
		}
	}
	if (nullblocks) {
		(void) lseek(fd, nullblocks * blocksize - 1, 1);
		err = write(fd, "", 1) != 1;
		if (err)
			handle error;
	}
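
(For concreteness, a full C rendering of that loop might look like the
sketch below; the function name, block size, and error handling are mine,
not part of the original article:)

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	#define BLOCKSIZE 8192

	/*
	 * Copy ifd to ofd, turning runs of all-zero blocks into holes.
	 * A trailing run of zero blocks is finished off by seeking to
	 * one byte short of the end and writing a single zero byte.
	 */
	void
	copysparse (int ifd, int ofd)
	{
		static char buf[BLOCKSIZE];
		long nullblocks = 0;
		int n, i;

		while ((n = read (ifd, buf, BLOCKSIZE)) > 0) {
			for (i = 0; i < n && buf[i] == '\0'; i++)
				;
			if (i == n && n == BLOCKSIZE) {
				nullblocks++;	/* defer; may become a hole */
				continue;
			}
			if (nullblocks) {
				(void) lseek (ofd, nullblocks * (long) BLOCKSIZE, 1);
				nullblocks = 0;
			}
			if (write (ofd, buf, n) != n) {
				perror ("write");
				exit (1);
			}
		}
		if (n < 0) {
			perror ("read");
			exit (1);
		}
		if (nullblocks) {
			(void) lseek (ofd, nullblocks * (long) BLOCKSIZE - 1, 1);
			if (write (ofd, "", 1) != 1) {
				perror ("write");
				exit (1);
			}
		}
	}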

On newer systems, the file can be made to end in a hole by using
ftruncate().  If ftruncate() is actually fsetsize() (SunOS and some
other systems), the last section can be replaced by

	if (nullblocks) {
		if (ftruncate(fd, lseek(fd, 0L, 1) + nullblocks*blocksize))
			handle error;
	}

If ftruncate() does what its name claims to do (truncate only), the file
can still be made to end in a hole:

	if (nullblocks) {
		long newoff = lseek(fd, nullblocks * blocksize, 1);
		err = write(fd, "", 1) != 1;
		if (err || ftruncate(fd, newoff))
			handle error;
	}

Note, however, that the 4.2BSD and 4.3BSD `restore' programs have the
very same bug that this article is about: if a file ends in a hole,
the restored version of the file will have the trailing hole omitted.
For this reason, the first version---write(fd,"",1)---may be preferable.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

darcy@druid.uucp (D'Arcy J.M. Cain) (12/13/89)

In article <244@estinc.UUCP> fnf@estinc.UUCP (Fred Fish) writes:
>In article <1989Dec10.170841.26798@druid.uucp> darcy@druid.UUCP (D'Arcy J.M. Cain) writes:
>>In article <2700@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>>Not in general, anyway.  At least the first version of AIX for the RT PC
>>>claimed, in its documentation, that it had an "fclear()" call to punch
>>>holes in files; I think this may show up in future releases of other
>>>UNIXes as well.
>>>
>>Seems simple enough to write a utility.  The core would be something like
>>the following:
>
>I think Guy (and others) were talking about punching holes in-place without
>having to make a copy of the file.  Recreating sparse files by copying is
>much easier.
>
>The necessary changes to add preservation of sparseness (or creation of
>sparseness from nonsparse files) are fairly trivial and can probably be
>added to cp, tar, cpio, etc. in a matter of a few minutes.  Here is the
>relevant code from BRU (Backup and Restore Utility) with some minor changes
>to simplify variable names:
>
>	if (!allnulls (buffer, nbytes)) {
>	    iobytes = write (fildes, buffer, nbytes);
>	} else {
>	    if (lseek (fildes, nbytes, 1) != -1) {
>		iobytes = nbytes;
>	    } else {
>		bru_message (MSG_SEEK, fname);
>		iobytes = write (fildes, buffer, nbytes);
>	    }
>	}
>
>Note that the file starts off truncated to zero length, so the lseeks only
>extend the file from the current last written position.  By falling back
>to doing writes if the seek fails, the code is portable to systems where
>the files cannot be extended with holes (or nulls) by seeking, at the
>expense of performing occasional failing lseeks.
>
In my example, I only intended to show a method of creating a sparse file
given an input stream of characters.  This is why I left out any mention
of opening the file in_fp.  It could in fact be standard input; that is
how I thought of it myself.  Your example doesn't seem to do it any
differently, except that you assume the input stream is hard coded to
come from a particular program (tar, cpio, etc.).  You also flesh it out a
little better to test for seek failures, but you use low-level routines.  I
used stdio to make it more portable across systems that don't support the
low-level calls but do support sparse files.

-- 
D'Arcy J.M. Cain (darcy@druid)     |   Thank goodness we don't get all 
D'Arcy Cain Consulting             |   the government we pay for.
West Hill, Ontario, Canada         |
No disclaimers.  I agree with me   |

johnl@esegue.segue.boston.ma.us (John R. Levine) (12/14/89)

In article <2700@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>At least the first version of AIX for the RT PC claimed in its documentation
>that it had an "fclear()" call to punch holes in files; ...

Yes, it did, I put it there myself.  The semantics were basically the same
as a write of a buffer of zeros but the implementation promised to make holes
where it could.  In retrospect, it was a mistake.  One of my coworkers
proposed that we make write() a little smarter and have it notice when the
buffer was all-zero and write a hole.  At the time I thought that would be
too slow, but he correctly pointed out that few buffers that are not entirely
zero start with a lot of zeros, so in most cases the overhead would be tiny.
Fixing write() would have the enormous advantage of automatically making
cp, cpio, tar, dump/restor, and everything else do the right thing with
sparse files.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."

gwyn@smoke.BRL.MIL (Doug Gwyn) (12/14/89)

One problem with sparse files is that simply updating (i.e., reading a
record, modifying it, and writing it back on top of the original record)
can result in an I/O failure if the original record overlapped a hole
and the file system doesn't have any more free data blocks.

I don't offer a solution for this, just a warning that even I/O
operations that "can't" fail might, so applications should always
check and be prepared to recover.

johnl@esegue.segue.boston.ma.us (John R. Levine) (12/14/89)

In article <11813@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>One problem with sparse files is that simply updating can result in an I/O
>failure if the original record overlapped a hole and the file system doesn't
>have any more free data blocks.
>
>I don't offer a solution for this, just a warning that even I/O operations
>that "can't" fail might, so applications should always check and be prepared
>to recover.

Given the traditional tendency for disks to fail at the least convenient
time in the least convenient way, one should be prepared for any write to
fail as a disk block that used to be good becomes bad.  Doug's point is well
taken, though, since your error routine might be prepared for a write error
which makes a write() return -1, but not an out of space error which tends
to return a positive number less than the amount you wanted to write.
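
A write wrapper that catches both cases might look something like this
(the function name and the choice to lump the two cases together are mine):

	#include <unistd.h>

	/*
	 * Return 0 if all nbytes were written, -1 otherwise.  Catches
	 * both failure modes: write() returning -1 outright, and write()
	 * returning a positive count smaller than what was asked for
	 * (typically the file system filling up part way through).
	 */
	int
	checked_write (int fd, char *buf, int nbytes)
	{
		int n;

		n = write (fd, buf, nbytes);
		if (n < 0)
			return -1;	/* hard error, errno is set */
		if (n != nbytes)
			return -1;	/* short write: probably out of space */
		return 0;
	}
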
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."

fnf@estinc.UUCP (Fred Fish) (12/18/89)

In article <21261@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>This code is not sufficient.  In particular, a file that ends in
>an `allnulls' block will come out too short.  In older Unix systems,
>the following is required:
> ...
>Note, however, that the 4.2BSD and 4.3BSD `restore' programs have the
>very same bug that this article is about: if a file ends in a hole,
>the restored version of the file will have the trailing hole omitted.
>For this reason, the first version---write(fd,"",1)---may be preferable.

Chris (along with the dozen or so people who emailed me about my example)
is of course 100% correct.  My example was misleading because I edited it
quite heavily, and in the course of doing so, broke it with respect to the
trailing hole bug.  The actual code (unedited) is:

	if (bytes > DATASIZE && flags.Sflag
		&& allnulls (blkp -> FD, (int) wbytes)) {
	    if (s_lseek (fildes, (S32BIT) wbytes, 1) == SYS_ERROR) {
		bru_message (MSG_SEEK, fname);
		iobytes = s_write (fildes, blkp -> FD, wbytes);
	    } else {
		iobytes = wbytes;
	    }
	} else {
	    iobytes = s_write (fildes, blkp -> FD, wbytes);
	}

where of course the seek is only done when there is at least one more
buffer full of data (besides the current one) to be written to the file
and the -S flag was given.

-Fred
-- 
# Fred Fish, 1835 E. Belmont Drive, Tempe, AZ 85284,  USA
# 1-602-491-0048           asuvax!{nud,mcdphx}!estinc!fnf