michi@anvil.oz (Michael Henning) (05/17/89)
4.3 BSD has a truncate() system call to get rid of trailing blocks at the end
of a file. I believe that this was introduced to allow certain FORTRAN
libraries to work under UNIX, which depended on that feature.

I would like to know why this has not been generalised to allow
*any* block of a file to be released. For example, in random access files
such as B-trees, on deletion, one would like to get rid of a disk block.
Because there is no way to "shrink" a file under UNIX, B-tree packages are
forced to either keep a list of free'd blocks for reuse, or to copy the entire
tree to get rid of the unused space. Is there any reason not to have a system
call something like:

    release(fd, offset, num_blocks)
    int fd;
    off_t offset;
    unsigned num_blocks;

The idea is to specify that num_blocks are to be released beginning at
the specified offset (which must be aligned on a file system block boundary).
I believe that all the kernel would have to do is to set the corresponding
pointers in the file's inode to NULL and to return these blocks to the free
list. On a subsequent read of a block at that offset, the kernel could return
a block of NULL characters, just like for a file that was written with random
access and has no data in the region being read.

                        Michi.
--
| The opinions expressed are my own, not those of my employer.
|
| Michael (Michi) Henning
| - We have three Michaels here, that's why they call me Michi

chris@mimsy.UUCP (Chris Torek) (05/19/89)
In article <461@anvil.oz> michi@anvil.oz (Michael Henning) writes:
>I would like to know why this has not been generalised to allow
>*any* block of a file to be released.

That task is harder (but not impossible; and truncating to an arbitrary
size is harder than truncating to zero). It might be worth trying.
However:

>... system call something like:
>
>    release(fd, offset, num_blocks)
>    int fd;
>    off_t offset;
>    unsigned num_blocks;

File system calls should always be specified in bytes, not blocks.

>The idea is to specify that num_blocks are to be released beginning at
>the specified offset (which must be aligned on a file system block boundary).

The call should really `zero out' the part of the file from the given
offset to the end of offset+size:

    int wzero(int fd, int offset, size_t size)

where the zeroing would be done by freeing allocated blocks whenever
this region spans full blocks; the system call itself is then immune
to changes in the file system representation (would work over extents,
e.g.).

You could then reduce this system call to

    lseek(fd, offset, 0), write(fd, buffer_containing_zeroes, size)

by simply making write() notice blocks of zeroes. (But you then have
to fix /etc/restore :-) )
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu    Path: uunet!mimsy!chris

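A minimal sketch of the lseek()/write() reduction described above, assuming a
POSIX-like system. The name zero_range and the 4096-byte chunk size are
illustrative assumptions, not part of any proposal in this thread. The function
only overwrites the range with zero bytes; it frees no blocks itself, which
would still need kernel support (for example, a write() that notices blocks of
zeroes).

```c
#include <sys/types.h>
#include <string.h>
#include <unistd.h>

/*
 * zero_range: hypothetical user-space stand-in for the proposed wzero().
 * Overwrites `size` bytes starting at `offset` with zero bytes, using
 * nothing but lseek() and write().  No blocks are released here; whether
 * zero-filled blocks ever become holes is up to the kernel.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
int zero_range(int fd, off_t offset, size_t size)
{
    char zeros[4096];                   /* arbitrary chunk size (assumption) */

    memset(zeros, 0, sizeof zeros);

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;

    while (size > 0) {
        size_t chunk = size < sizeof zeros ? size : sizeof zeros;
        ssize_t n = write(fd, zeros, chunk);

        if (n <= 0)                     /* error, or no progress made */
            return -1;
        size -= (size_t)n;              /* a short write just loops again */
    }
    return 0;
}
```

Note that the file offset is left at the end of the zeroed range, and that
zeroing past the current end of file extends the file, exactly as an ordinary
write() would.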
jmm@eci386.uucp (John Macdonald) (05/19/89)
In article <461@anvil.oz> michi@anvil.oz (Michael Henning) writes:
>
>4.3 BSD has a truncate() system call ...
>I would like to know why this has not been generalised to allow
>*any* block of a file to be released. ...
>... Is there any reason not to have a system
>call something like:
>
>    release(fd, offset, num_blocks)
>    int fd;
>    off_t offset;
>    unsigned num_blocks;
>
>...
>
>                        Michi.

There is one very good reason - the size of a block is not constant.
Some systems have a single system-wide block size (but different
instances might use different sizes), and other systems can support
different block sizes for different file systems simultaneously.

The end of file position is kept as a byte offset, and its position
relative to the underlying block layout is handled by the file system
code. Handling the block layout mapping transparently for a release
call could be done, but would be painful. Using your suggested
system/file-system dependent format would lead to obscure bugs when the
program gets the block size wrong, and to awkward code to determine when
a partial block release means that the entire block could now be
released.

Any method chosen that has user code based upon the underlying mapping
of bytes in a file to the physical storage units would interfere with
the freedom to make future changes in that mapping. (If your proposed
release call above had been in effect early enough, there would have
been much more difficulty in going from 512 byte blocks to larger
blocks. If it came into effect somewhat later, there would have been
much more difficulty in introducing file-system-switch and
per-file-system-blocksize alternatives.)

A workable mechanism would be to have the system calls:

    release(int fd, off_t offset, off_t num_bytes)

and

    frelease(FILE *fp, off_t offset, off_t num_bytes)

(versions without the offset argument that work relative to the current
seek position would be usable alternatives) which would only guarantee
that future reads of the file will return all null bytes within the
specified range, while permitting the kernel to release any full blocks
within the range, and possibly (if the kernel implementors were
ambitious) also release blocks that overlap the ends of the range and
have become entirely null bytes after the required zeroing. The only
difference between this and (seek to offset, write a num_bytes-long null
buffer) would be the strong hint to the kernel that blocks ought to be
released.

cowan@marob.MASA.COM (John Cowan) (05/20/89)
In article <461@anvil.oz> michi@anvil.oz (Michael Henning) writes:
>
>I would like to know why [truncate()] has not been generalised to allow
>*any* block of a file to be released. For example, in random access files
>such as B-trees, on deletion, one would like to get rid of a disk block.
>Because there is no way to "shrink" a file under UNIX, B-tree packages are
>forced to either keep a list of free'd blocks for reuse, or to copy the entire
>tree to get rid of the unused space. Is there any reason not to have a system
>call something like:
>
>    release(fd, offset, num_blocks)
>    int fd;
>    off_t offset;
>    unsigned num_blocks;
>
>The idea is to specify that num_blocks are to be released beginning at
>the specified offset (which must be aligned on a file system block boundary).

The idea may be a good one, but I don't like the syntax/semantics of this
call. It is better not to embed into programs knowledge about how big a
"block" is. Here's a different proposed function:

    nullify(int fd, long size)      (note ANSI-ism)

which causes "size" null bytes to be written to the file at the current
position. The kernel then computes which blocks, if any, can be freed, and
which blocks must have actual zero bytes written in them (at most two, the
one at the beginning of the nullified region and the one at the end).

Note that this does not remove the need for B-tree and suchlike programs to
keep their own free-lists. There is no other way to know which parts of your
space are semantically empty and which parts contain information. It simply
avoids taking up more disk space than necessary.

Also note that this operation is fundamentally different from truncate(),
which changes the >size< of a file. These operations change the content of
a file, not its size.
--
John Cowan <cowan@marob.masa.com> or <cowan@magpie.masa.com>
UUCP mailers: ...!uunet!hombre!{marob,magpie}!cowan
Fidonet (last resort): 1:107/711
Aiya elenion ancalima!

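The block computation described above can be sketched as plain arithmetic.
This is only an illustration under stated assumptions: struct null_plan and
plan_nullify are made-up names, and the block size is passed in by the caller
here, whereas a real kernel would take it from the file system.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical result type: how a nullified byte range splits over blocks. */
struct null_plan {
    off_t head_bytes;   /* bytes to zero in the partial block at the start */
    off_t free_start;   /* byte offset of the first wholly covered block   */
    off_t free_blocks;  /* number of whole blocks that could be freed      */
    off_t tail_bytes;   /* bytes to zero in the partial block at the end   */
};

/*
 * plan_nullify: split the range [offset, offset+size) into at most two
 * partial blocks that need real zero bytes, plus the run of whole blocks
 * between them that could simply be released.
 */
struct null_plan plan_nullify(off_t offset, off_t size, off_t bsize)
{
    struct null_plan p = {0, 0, 0, 0};
    off_t end, first_full, last_full;

    assert(offset >= 0 && size >= 0 && bsize > 0);

    end = offset + size;
    first_full = (offset + bsize - 1) / bsize * bsize;  /* round up   */
    last_full = end / bsize * bsize;                    /* round down */

    if (first_full > last_full) {
        /* The range lies strictly inside one block: zero it all by hand. */
        p.head_bytes = size;
        return p;
    }

    p.head_bytes = first_full - offset;
    p.free_start = first_full;
    p.free_blocks = (last_full - first_full) / bsize;
    p.tail_bytes = end - last_full;
    return p;
}
```

With 1024-byte blocks, nullifying 3000 bytes at offset 1000 gives 24 head
bytes, two whole blocks starting at offset 1024, and 928 tail bytes.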
jfh@rpp386.Dallas.TX.US (John F. Haugh II) (05/20/89)
In article <461@anvil.oz> michi@anvil.oz (Michael Henning) writes:
>4.3 BSD has a truncate() system call to get rid of trailing blocks at the end
>of a file. I believe that this was introduced to allow certain FORTRAN
>libraries to work under UNIX, which depended on that feature.
>I would like to know why this has not been generalised to allow
>*any* block of a file to be released. For example, in random access files
>such as B-trees, on deletion, one would like to get rid of a disk block.

I believe fclear() allows you to de-allocate blocks from random locations
within a file. I think the syntax is like

    fclear (fd, whence, nbytes);

where fd is the open file descriptor, whence is the file offset and nbytes
is the number of bytes worth of hole to create. This may be wrong. The
manuals are at work.

This will not truncate the length of a file, as I recall. However
ftruncate() is available for that situation.

Oh - I'm not an AIX developer. I just play one on the net :-) You will
want to read your manuals for more accurate information.
--
John F. Haugh II                        +-Button of the Week Club:-------------
VoiceNet: (512) 832-8832   Data: -8835  | "AIX is a three letter word,
InterNet: jfh@rpp386.Cactus.Org         |  and it's BLUE."
UucpNet : <backbone>!bigtex!rpp386!jfh  +--------------------------------------

rcodi@chudich.co.rmit.oz (Ian Donaldson) (05/20/89)
In article <461@anvil.oz>, michi@anvil.oz (Michael Henning) writes:
> call something like:
>
>    release(fd, offset, num_blocks)
>    int fd;
>    off_t offset;
>    unsigned num_blocks;
>
> The idea is to specify that num_blocks are to be released beginning at
> the specified offset (which must be aligned on a file system block boundary).

I would prefer to see num_blocks be in terms of bytes, not blocks. A block
size is filesystem dependent (although fstat(2) and stat(2) on BSD tell you
how large it is). Also no other current UNIX system calls take arguments in
terms of blocks.

Of course if you release the space in a file on byte boundaries that do not
fall exactly on block boundaries then some blocks won't be releasable, hence
the call will have a lesser effect than intended.

If the semantics of the call were right, it would be possible to implement
ftruncate() as a library call that uses release() internally.

Also it would be possible to implement infinite length FIFOs as ordinary
files. You could release the space at the *beginning* of the file after it
has been read. You might need extra support to determine where the new
logical beginning of the file *is* however.

Its FIFO use may be limited by the fact that the new logical start position
would be limited to a block boundary unless FIFO byte pointers were installed
in the inode or something, which probably means creating a new FIFO file type
(as opposed to named pipes). The implementation could be similar to that of
named pipes. But named pipes would differ in that the data is lost after a
reboot or close. Named FIFOs would retain the data like ordinary files. In
fact you could even implement named pipes as a special case of named FIFOs.
(cf. BSD, which does it as a special case of sockets)

This is all tied in with the ability to copy holey files around, which is a
facility that is also currently lacking.
E.g. dump/tar/cpio/cp could use this facility *now* and make themselves more
functional and portable: dump would no longer be required to read raw
filesystems to get the info, and cp would no longer allocate more disk space
in the destination of a holey file copy.

Ian D

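As an illustration of the holey-copy problem mentioned above, here is a rough
sketch of a copy loop that avoids writing all-zero blocks and seeks over them
instead, so that a file system supporting holes can leave them unallocated.
The name copy_sparse is hypothetical, and the sketch assumes a POSIX-like
system, an empty destination positioned at offset 0, and a source read from
its start. It cannot tell a real hole from a block of written zeroes, which is
part of the missing facility.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <unistd.h>

/* Return 1 if the first n bytes of buf are all zero, else 0. */
static int all_zero(const char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/*
 * copy_sparse: hypothetical helper.  Copies `in` to `out` one block at a
 * time; blocks that read back as all zeroes are skipped with lseek()
 * rather than written.  Returns 0 on success, -1 on error.
 */
int copy_sparse(int in, int out)
{
    struct stat st;
    size_t bsize;
    char *buf;
    ssize_t n;
    off_t total = 0;
    int rc = -1;

    if (fstat(out, &st) == -1)
        return -1;
    /* Use the destination's preferred block size, with a fallback guess. */
    bsize = st.st_blksize > 0 ? (size_t)st.st_blksize : 4096;
    if ((buf = malloc(bsize)) == NULL)
        return -1;

    while ((n = read(in, buf, bsize)) > 0) {
        if (all_zero(buf, (size_t)n)) {
            if (lseek(out, (off_t)n, SEEK_CUR) == (off_t)-1)
                goto done;
        } else {
            ssize_t off = 0;

            while (off < n) {           /* cope with short writes */
                ssize_t w = write(out, buf + off, (size_t)(n - off));
                if (w <= 0)
                    goto done;
                off += w;
            }
        }
        total += n;
    }
    /* A trailing hole is never written, so set the final size explicitly. */
    if (n == 0 && ftruncate(out, total) == 0)
        rc = 0;
done:
    free(buf);
    return rc;
}
```

Whether the skipped regions really occupy no disk space depends on the
destination file system; on one without hole support the result is still a
correct copy, just not a smaller one.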
guy@auspex.auspex.com (Guy Harris) (05/20/89)
>4.3 BSD has a truncate() system call to get rid of trailing blocks at the end
>of a file. I believe that this was introduced to allow certain FORTRAN
>libraries to work under UNIX, which depended on that feature.

They depended on being able to do something like that, and the hack used
prior to that - open a new file, copy the first N bytes of the old file to
that file, close the files, and rename the new file on top of the old file
(or some such) - was kind of gross and slow.

>I would like to know why this has not been generalised to allow
>*any* block of a file to be released.

In the case of 4.3BSD, perhaps nobody got around to it, and not enough
people asked for it to push somebody into getting around to it?

I think AIX has precisely such a function, called "fclear"; it's defined to
zero out a specific range of bytes in the file, and will do so by putting
"holes" into the files whenever it can. This may appear in S5R4 under the
guise of F_FREESP, as may the "ftruncate" function.

Provision for a "punch a hole in a file" function of that sort also appears
in drafts of the NFS Version 3 protocol; if the file system can't punch
holes in the file, it must write zeroes to that region of the file. (The
same would presumably be true of local file systems put under the S5R4 VFS
mechanism that can't support files with holes; I think, for instance, SGI's
extent-based file system doesn't support holes.)