[comp.unix.wizards] Releasing blocks from a file

michi@anvil.oz (Michael Henning) (05/17/89)

4.3 BSD has a truncate() system call to get rid of trailing blocks at the end
of a file. I believe that this was introduced to allow certain FORTRAN
libraries to work under UNIX, which depended on that feature.
I would like to know why this has not been generalised to allow
*any* block of a file to be released. For example, in random access files
such as B-trees, on deletion, one would like to get rid of a disk block.
Because there is no way to "shrink" a file under UNIX, B-tree packages are
forced to either keep a list of free'd blocks for reuse, or to copy the entire
tree to get rid of the unused space. Is there any reason not to have a system
call something like:

	release(fd, offset, num_blocks)
	int fd;
	off_t offset;
	unsigned num_blocks;

The idea is to specify that num_blocks are to be released beginning at
the specified offset (which must be aligned on a file system block boundary).
I believe that all the kernel would have to do is to set the corresponding
pointers in the file's inode to NULL and to return these blocks to the free list.
On a subsequent read of a block at that offset, the kernel could return
a block of NULL characters, just like for a file that was written with
random access and has no data in the region being read.
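
The read-back behaviour relied on here can be checked with a minimal
sketch. The helper name hole_reads_as_zero and the 64K offset are
illustrative assumptions, not part of any existing interface; whether the
skipped region actually occupies no disk blocks depends on the file system.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a file whose first 64K is never written,
 * then check that this region reads back as null bytes.
 * Returns 1 if it does, 0 if it does not, -1 on error. */
int hole_reads_as_zero(const char *path)
{
    char buf[512];
    ssize_t n, i;
    int ok = 1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    /* Seek past the start and write a single byte; nothing is ever
     * written below offset 65536. */
    if (lseek(fd, (off_t)65536, SEEK_SET) < 0 || write(fd, "x", 1) != 1) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Read one buffer from the never-written region. */
    if (lseek(fd, (off_t)0, SEEK_SET) < 0 ||
        (n = read(fd, buf, sizeof buf)) != (ssize_t)sizeof buf) {
        close(fd);
        unlink(path);
        return -1;
    }
    for (i = 0; i < n; i++)
        if (buf[i] != '\0')
            ok = 0;

    close(fd);
    unlink(path);
    return ok;
}
```

A released block would be expected to behave the same way as the
never-written region in this sketch.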

					Michi.


-- 
               | The opinions expressed are my own, not those of my employer. |
               |                                                              |
               | Michael (Michi) Henning                                      |
               | - We have three Michaels here, that's why they call me Michi |

chris@mimsy.UUCP (Chris Torek) (05/19/89)

In article <461@anvil.oz> michi@anvil.oz (Michael Henning) writes:
>I would like to know why this has not been generalised to allow
>*any* block of a file to be released.

That task is harder (but not impossible; and truncating to an arbitrary
size is harder than truncating to zero).  It might be worth trying.
However:

>... system call something like:
>
>	release(fd, offset, num_blocks)
>	int fd;
>	off_t offset;
>	unsigned num_blocks;

File system calls should always be specified in bytes, not blocks.

>The idea is to specify that num_blocks are to be released beginning at
>the specified offset (which must be aligned on a file system block boundary).

The call should really `zero out' the part of the file from the
given offset up to (but not including) offset+size:

	int wzero(int fd, off_t offset, size_t size)

where the zeroing would be done by freeing allocated blocks whenever
this region spans full blocks; the system call itself is then immune
to changes in the file system representation (would work over extents,
e.g.).

You could then reduce this system call to

	lseek(fd, offset, 0), write(fd, buffer_containing_zeroes, size)

by simply making write() notice blocks of zeroes.  (But you then have
to fix /etc/restore :-) )
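
A user-level approximation of a zero-noticing write() might look like the
sketch below. The names all_zero and sparse_write and the chunk parameter
are assumptions made for illustration. Seeking over a chunk only leaves a
hole where nothing has been written yet, so this helps when building a new
file; it cannot free blocks that already hold data, which is the part that
needs kernel support.

```c
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if the n bytes at buf are all zero, else 0. */
int all_zero(const char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (buf[i] != '\0')
            return 0;
    return 1;
}

/* Hypothetical stand-in for a zero-noticing write(): write len bytes at
 * the current offset, chunk bytes at a time, seeking over all-zero
 * chunks instead of writing them.  The final chunk is always written so
 * that the file length comes out right.  A short write is treated as an
 * error.  Returns 0 on success, -1 on error. */
int sparse_write(int fd, const char *buf, size_t len, size_t chunk)
{
    if (chunk == 0)
        return -1;

    while (len > 0) {
        size_t n = len < chunk ? len : chunk;

        if (n < len && all_zero(buf, n)) {
            /* Not the last chunk, and nothing but zeroes: skip it. */
            if (lseek(fd, (off_t)n, SEEK_CUR) < 0)
                return -1;
        } else if (write(fd, buf, n) != (ssize_t)n) {
            return -1;
        }
        buf += n;
        len -= n;
    }
    return 0;
}
```

Choosing chunk equal to the file system block size gives the best chance
that a skipped chunk really corresponds to an unallocated block.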
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

jmm@eci386.uucp (John Macdonald) (05/19/89)

In article <461@anvil.oz> michi@anvil.oz (Michael Henning) writes:
>
>4.3 BSD has a truncate() system call ...
>I would like to know why this has not been generalised to allow
>*any* block of a file to be released. ...
>... Is there any reason not to have a system
>call something like:
>
>	release(fd, offset, num_blocks)
>	int fd;
>	off_t offset;
>	unsigned num_blocks;
>
>...
>
>					Michi.

There is one very good reason - the size of a block is not constant.
Some systems have a single system-wide block size (but different
instances might use different sizes), and other systems can support
different block sizes for different file systems simultaneously.

The end of file position is kept as a byte offset, and its position
relative to the underlying block layout used is handled by the file
system code.  Handling the block layout mapping transparently for
a release call could be done, but would be painful.  Using your
suggested system/file-system dependent format would lead to obscure
bugs when the program gets the block size wrong, and to awkward code
to determine when a partial block release means that the entire block
can now be released.

Any method chosen that has user code based upon the underlying mapping
of bytes in a file to the physical storage units would interfere with
the freedom to make future changes in that mapping.  (If your proposed
release call above had been in effect early enough, there would have
been much more difficulty in going from 512 byte blocks to larger blocks.
If it had come into effect somewhat later, there would have been much more
difficulty in introducing file-system-switch and per-file-system-blocksize
alternatives.)

A workable mechanism would be to have the system calls:
    release(int fd, off_t offset, off_t num_bytes)     and
    frelease(FILE *fp, off_t offset, off_t num_bytes)
(versions without the offset argument, working relative to the current
seek position, would be usable alternatives)

which would only guarantee that future reads of the file will return
all null bytes within the specified range, while permitting the kernel
to release any full blocks within the range, and possibly (if the kernel
implementors were ambitious) also to release blocks that overlap the ends
of the range and have become entirely null bytes after the required
zeroing.  The only difference between this and (seek to offset, write
a null buffer num_bytes long) would be the strong hint to the kernel that
blocks ought to be released.

cowan@marob.MASA.COM (John Cowan) (05/20/89)

In article <461@anvil.oz> michi@anvil.oz (Michael Henning) writes:
>
>I would like to know why [truncate()] has not been generalised to allow
>*any* block of a file to be released. For example, in random access files
>such as B-trees, on deletion, one would like to get rid of a disk block.
>Because there is no way to "shrink" a file under UNIX, B-tree packages are
>forced to either keep a list of free'd blocks for reuse, or to copy the entire
>tree to get rid of the unused space. Is there any reason not to have a system
>call something like:
>
>	release(fd, offset, num_blocks)
>	int fd;
>	off_t offset;
>	unsigned num_blocks;
>
>The idea is to specify that num_blocks are to be released beginning at
>the specified offset (which must be aligned on a file system block boundary).

The idea may be a good one, but I don't like the syntax/semantics of
this call.  It is better not to embed into programs knowledge about how big
a "block" is.  Here's a different proposed function:

	nullify(int fd, long size)

(note ANSI-ism) which causes "size" null bytes to be written to the file
at the current position.  The kernel then computes which blocks, if any,
can be freed, and which blocks must have actual zero bytes written in them
(at most two, the one at the beginning of the nullified region and the one
at the end).
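
The block arithmetic behind the "at most two" claim can be sketched as
follows. The struct and function names are hypothetical, and the sketch
assumes pos >= 0, size >= 0 and bsize > 0; a real kernel would take bsize
from the file system rather than from the caller.

```c
/* Hypothetical description of how a nullified byte range splits over
 * file system blocks. */
struct range_split {
    long head_bytes;   /* zero bytes to write in the partial block at the start */
    long first_block;  /* index of the first block that could be freed */
    long nblocks;      /* number of whole blocks that could be freed */
    long tail_bytes;   /* zero bytes to write in the partial block at the end */
};

struct range_split split_range(long pos, long size, long bsize)
{
    struct range_split r;
    long end = pos + size;                   /* one past the last byte */
    long first = (pos + bsize - 1) / bsize;  /* start rounded up to a block */
    long last = end / bsize;                 /* end rounded down to a block */

    r.first_block = first;
    if (first > last) {
        /* The range lies strictly inside one block: nothing to free. */
        r.head_bytes = size;
        r.nblocks = 0;
        r.tail_bytes = 0;
    } else {
        r.head_bytes = first * bsize - pos;
        r.nblocks = last - first;
        r.tail_bytes = end - last * bsize;
    }
    return r;
}
```

For example, nullifying 2000 bytes at position 100 with 512-byte blocks
gives 412 head bytes, 3 freeable blocks starting at block 1, and 52 tail
bytes.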

Note that this does not remove the need for B-tree and suchlike programs
to keep their own free-lists.  There is no other way to know which parts of
your space are semantically empty and which parts contain information.
It simply avoids taking up more disk space than necessary.

Also note that this operation is fundamentally different from truncate(),
which changes the >size< of a file.  These operations change the content of
a file, not its size.
-- 
John Cowan <cowan@marob.masa.com> or <cowan@magpie.masa.com>
UUCP mailers:  ...!uunet!hombre!{marob,magpie}!cowan
Fidonet (last resort): 1:107/711
Aiya elenion ancalima!

jfh@rpp386.Dallas.TX.US (John F. Haugh II) (05/20/89)

In article <461@anvil.oz> michi@anvil.oz (Michael Henning) writes:
>4.3 BSD has a truncate() system call to get rid of trailing blocks at the end
>of a file. I believe that this was introduced to allow certain FORTRAN
>libraries to work under UNIX, which depended on that feature.
>I would like to know why this has not been generalised to allow
>*any* block of a file to be released. For example, in random access files
>such as B-trees, on deletion, one would like to get rid of a disk block.

I believe fclear() allows you to de-allocate blocks from random
locations within a file.  I think the syntax is like

	fclear (fd, whence, nbytes);

where fd is the open file descriptor, whence is the file offset and
nbytes is the number of bytes worth of hole to create.  This may be
wrong.  The manuals are at work.

This will not truncate the length of a file, as I recall.  However
ftruncate() is available for that situation.

Oh - I'm not an AIX developer.  I just play one on the net :-)  You
will want to read your manuals for more accurate information.
-- 
John F. Haugh II                        +-Button of the Week Club:-------------
VoiceNet: (512) 832-8832   Data: -8835  | "AIX is a three letter word,
InterNet: jfh@rpp386.Cactus.Org         |  and it's BLUE."
UucpNet : <backbone>!bigtex!rpp386!jfh  +--------------------------------------

rcodi@chudich.co.rmit.oz (Ian Donaldson) (05/20/89)

In article <461@anvil.oz>, michi@anvil.oz (Michael Henning) writes:
> call something like:
> 
> 	release(fd, offset, num_blocks)
> 	int fd;
> 	off_t offset;
> 	unsigned num_blocks;
> 
> The idea is to specify that num_blocks are to be released beginning at
> the specified offset (which must be aligned on a file system block boundary).

I would prefer to see num_blocks be in terms of bytes, not blocks.  A block
size is filesystem dependent (although fstat(2) and stat(2) on BSD tell
you how large it is).
Also no other current UNIX system calls take arguments in terms of blocks.

Of course if you release the space in a file on byte boundaries that
do not fall exactly on block boundaries then some blocks won't be releasable,
hence the call will have a lesser effect than intended.

If the semantics of the call were right, it would be possible to 
implement ftruncate() as a library call that uses release() internally.

Also it would be possible to implement infinite length FIFOs as ordinary
files.  You could release the space at the *beginning* of the file after
it has been read.  You might need extra support to determine where the
new logical beginning of the file *is*, however.  Its FIFO use may be limited
by the fact that the new logical start position would be limited to a
block boundary unless FIFO byte pointers were installed in the inode or
something, which probably means creating a new FIFO file type (as opposed
to named pipes).  The implementation could be similar to that of named pipes.
But named pipes would differ in that the data is lost after a reboot or
close.  Named FIFOs would retain the data like ordinary files.
In fact you could even implement named pipes as a special case of
named FIFOs.  (cf: BSD, which does it as a special case of sockets)

This is all tied in with the ability to copy holey files around, which 
is a facility that is also currently lacking.

eg: dump/tar/cpio/cp could use this facility *now* and make themselves
more functional and portable:  dump would no longer be required to read 
raw filesystems to get the info.  cp would no longer allocate more disk
space in the destination of a holey file copy.
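
Until such a facility exists, about the best a copy program can do is
guess whether a file has holes. A rough heuristic, sketched below with
hypothetical names, compares the allocated block count with the byte size.
It assumes st_blocks counts 512-byte units, which is the BSD convention
but not something every system guarantees, and it says nothing about
where the holes are.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Pure helper: 1 if `blocks` units of 512 bytes are too few to hold
 * `size` bytes, else 0.  The 512-byte unit is an assumption. */
int fewer_blocks_than_bytes(off_t size, long blocks)
{
    return (off_t)blocks * 512 < size;
}

/* Hypothetical heuristic: 1 if the file appears to contain holes,
 * 0 if not, -1 if it cannot be examined. */
int looks_holey(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    return fewer_blocks_than_bytes(st.st_size, (long)st.st_blocks);
}
```

A copy program could use this only as a hint to switch to a hole-preserving
strategy; it still has to read the file to find the zero regions.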

Ian D

guy@auspex.auspex.com (Guy Harris) (05/20/89)

>4.3 BSD has a truncate() system call to get rid of trailing blocks at the end
>of a file. I believe that this was introduced to allow certain FORTRAN
>libraries to work under UNIX, which depended on that feature.

They depended on being able to do something like that, and the hack used
prior to that - open a new file, copy the first N bytes of the old file
to that file, close the files, and rename the new file on top of the old
file, or some such - was kind of gross and slow.

>I would like to know why this has not been generalised to allow
>*any* block of a file to be released.

In the case of 4.3BSD, perhaps nobody got around to it, and not enough
people asked for it to push somebody into getting around to it?

I think AIX has precisely such a function, called "fclear"; it's defined
to zero out a specific range of bytes in the file, and will do so by
putting "holes" into the files whenever it can.

This may appear in S5R4 under the guise of F_FREESP, as may the
"ftruncate" function. 
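
A hedged sketch of what calling F_FREESP might look like, assuming it is
reached through fcntl() with a struct flock describing the range, as with
the record-locking commands. The wrapper name free_space is hypothetical,
which ranges are accepted is up to the system, and on systems that do not
define F_FREESP the sketch simply fails with ENOSYS.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: ask the system to free the space in
 * [start, start+len) of an open file.  Returns 0 on success, -1 on error. */
int free_space(int fd, off_t start, off_t len)
{
#ifdef F_FREESP
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_whence = SEEK_SET;   /* l_start is measured from the start of file */
    fl.l_start = start;
    fl.l_len = len;           /* assumed: 0 means "through end of file" */
    return fcntl(fd, F_FREESP, &fl) < 0 ? -1 : 0;
#else
    /* No F_FREESP on this system: report the call as unsupported. */
    (void)fd;
    (void)start;
    (void)len;
    errno = ENOSYS;
    return -1;
#endif
}
```

With l_len of 0 the request reduces to truncation at start, which is
presumably how "ftruncate" could be provided on top of it.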

Provision for a "punch a hole in a file" function of that sort also
appears in drafts of the NFS Version 3 protocol; if the file system
can't punch holes in the file, it must write zeroes to that region of
the file.  (The same would presumably be true of local file systems put
under the S5R4 VFS mechanism that can't support files with holes; I
think, for instance, SGI's extent-based file system doesn't support
holes.)