chris@mimsy.UUCP (Chris Torek) (12/17/87)
Personally, I have never cared whether my files were contiguous.  All I care about is that they be reasonably fast to access.

Completely contiguous files are hard to build.  You must either declare their size initially or else constantly rearrange disk blocks.  Contiguous files tend to be fast to access, but if your OS has a heavy multiuser load, it will often be using several files at once, destroying the effect created by the contiguous files.

Extent based file systems are usually implemented with a small maximum number of extents.  This means that the file size must (implicitly, by extent size) be declared in advance, if the file may be large.  Many contiguous file allocators actually work with extents, since this allows a contiguous file to be grown in place.  Many extent based file systems force user code to work with extents, rather than with bytes; this has a tendency to make programs inflexible (they deal either with `files' or with `terminals' but not both).  Some systems solve this fairly well with I/O libraries.

The Unix file interface does not prevent a system from doing contiguous or otherwise well-scheduled allocation, although it does make it difficult to declare a file's size in advance.  I consider this a feature, not a bug: very often I have no idea how big the file will be.  This facilitates using filters and pipes, where the input file size is often unavailable.

The 4.2/4.3BSD block allocator does a fairly good job of arranging the disk, although it uses more compute power than one would like on, e.g., a Vax 11/750.  Other allocators exist that also do a good job; the PJW file system in Research Unix is supposed to work well (I have no firsthand experience).  All of these manage without leaking the internal disk organisation out to user programs, which means said organisation can also be changed relatively easily.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu    Path: uunet!mimsy!chris
gupta@cullsj.UUCP (Yogesh Gupta) (12/18/87)
In article <9828@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> Personally, I have never cared whether my files were contiguous.  All
> I care about is that they be reasonably fast to access.

I definitely agree.  However, I find that if I create a 100MB file under Unix (BSD 4.2, System V Rel 1), the overhead in randomly accessing various parts of it is too high (due to indirect inode structures).  Any comments?

What if I know that my file will be of the order of 100MB and then declare an extent size of 1MB (wastage of 0.5%, on the average) in an extent based file system?
-- 
Yogesh Gupta                  | If you think my company will let me
Cullinet Software, Inc.       | speak for them, you must be joking.
jlc@didsgn.UUCP (jlc) (12/18/87)
In article <9828@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> Completely contiguous files are hard to build.  You must either
> declare their size initially or else constantly rearrange disk
> blocks.  Contiguous files tend to be fast to access, but if your
> OS has a heavy multiuser load, it will often be using several files
> at once, destroying the effect created by the contiguous files.
>
> Extent based file systems are usually implemented with a small
> maximum number of extents.  This means that the file size must
> (implicitly, by extent size) be declared in advance, if the file
> may be large.

If you are interested in contiguous files, here is an example of how it is done on UN/V, which is a real version of System V designed by Charles River Data.  I have personally used this implementation for real-time and image-processing software, and it is very, very fast.

(All the following is copyrighted by Charles River Data System, etc. etc....)

The following is extracted from the open() subroutine manual entry:

NAME
     open() - opens a file for reading or writing

FORMAT
     #include <fcntl.h>

     int open (path, oflag [, omodes [, minext, maxext]])
     char *path;
     int oflag;
     int omodes, minext, maxext;

DESCRIPTION
     open() opens a file descriptor for the named file and sets the
     file status flags according to the value of oflag.  path points
     to a pathname naming a file.  oflag values are composed by ORing
     values from the following, which are declared in <fcntl.h> (only
     one from the first three):

     O_RDONLY   Open for reading only.
     O_WRONLY   Open for writing only.
     O_RDWR     Open for reading and writing.

     { some verbiage cut here }

     O_CONTIG   Force a contiguous file (a file which is allocated
                disk space in large extents).  If the file is
                truncated or created by this open, it becomes a
                contiguous file with an initial allocated extent of
                extent_size bytes.  Otherwise, the file must already
                be contiguous and have extent_size bytes available
                for growth (a new extent is allocated if necessary)
                or the open will fail.

     If O_CONTIG is not specified, contiguous files can be opened
     just like any other file.

The following is extracted from the commands manual:

contig                   UNOS COMMANDS MANUAL                    contig

NAME
     contig - make a contiguous file

FORMAT
     contig [extent=range] [alloc=rangelist] [-freeze] [-truncate] file ...

DESCRIPTION
     contig makes a contiguous file (a file which owns disk space in
     large extents rather than in single blocks).  If the file named
     already exists, it must be truncated in order to become
     contiguous; this command saves the old contents of the file (in
     a file in /tmp or in main memory, depending on size) and writes
     it back later.

     Contiguous files have two parameters, min and max, which govern
     allocation of disk space as the file grows.  Whenever a write
     past the end of the space currently used by the file is done,
     the file system allocates a new extent of at least min blocks
     and at most max blocks (or at most large enough to contain the
     write, if that's larger).  The write fails if there is no free
     extent of at least min blocks.  There are system default values
     for min and max, which are used if the file's min and max are
     -1.  All allocation is inhibited when the max value is 0.

OPTIONS
     extent=range
          Sets the file's min and max allocation parameters, but does
          not cause any space to be allocated.  range is a number
          (sets both min and max) or a pair of numbers separated by
          '-' (e.g. extent=1024-4096 sets min to 1024 and max to
          4096).
          The number(s) can end in 'b' or 'k' to indicate units of
          blocks (times 512) or kilobytes (times 1024).  This
          multiplier will be applied to both min and max even if only
          max specifies it, so extent=10-20b means extent=10b-20b.
          If extent= is omitted, min and max are set to -1, which
          causes the system default values to be used.

     alloc=rangelist
          Allocates one or more extents of disk space to the file.
          rangelist is a range (as above) or a list of ranges
          separated by ',' (e.g., alloc=10b,10b causes the file to
          have two ten-block extents).  If alloc= is omitted, contig
          tries to get a single extent to hold the current file
          contents (if any) by assuming alloc=512-filesize.

     -freeze
          Marks the file to prevent any further allocation, by
          setting its extent allocation values to 0.  Any attempt to
          write past the end of the allocated space will fail.  Since
          the filesize may indicate the file ends anywhere in the
          last extent, it is still possible to extend the file up to
          the allocation limit.  The contig command always unfreezes
          files if -freeze is not specified.  -freeze and extent= are
          mutually exclusive.

     -truncate
          Discards any current file contents and disk space.
          Truncating a file requires write access to it.

EXAMPLES
     To turn all commands into contiguous files for speed:

          @>contig /bin/* /sys/* /usr/bin/*

     To prevent additional allocation of space to a contiguous file:

          @>contig -freeze able

     To allocate 5 1000-block extents:

          @>contig -truncate alloc=1000b,1000b,1000b,1000b,1000b able

     To cause subsequent allocations to be 300 blocks while allowing
     allocations down to 100 blocks to be acceptable if 300 blocks is
     not available:

          @>contig extent=100-300b able

NOTES
     -freeze does not prevent the file from being truncated, for
     example by '> filename' in the shell, or by the create()
     subroutine.  If a frozen file is truncated it will be frozen at
     zero length, so no write to it can succeed.

Hope this is of interest....

...gatech!rebel!didsgn!jlc
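[A call using the interface quoted above might look roughly like the following sketch.  Only O_RDWR and O_CONTIG come from the manual extract; the O_CREAT flag, the 0644 mode, and the extent sizes are guesses for illustration, and the program only compiles on a system (such as UNOS) that actually defines O_CONTIG.]

#include <fcntl.h>
#include <stdio.h>

int
main()
{
	int fd;

	/*
	 * Create (or truncate) "bigfile" as a contiguous file.  The
	 * last two arguments are the minext/maxext growth parameters
	 * from the FORMAT line above; 65536 and 1048576 are arbitrary
	 * illustrative values.
	 */
	fd = open("bigfile", O_RDWR | O_CREAT | O_CONTIG, 0644, 65536, 1048576);
	if (fd < 0) {
		perror("open (O_CONTIG)");
		return 1;
	}
	/* ... write data; new extents are allocated as the file grows ... */
	close(fd);
	return 0;
}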
trt@rti.UUCP (Thomas Truscott) (12/18/87)
In article <177@cullsj.UUCP>, gupta@cullsj.UUCP (Yogesh Gupta) writes:
> ... .  However, I find that if I create a 100MB file
> under Unix (BSD 4.2, System V Rel 1), the overhead in
> randomly accessing various parts of it is too high (due to indirect
> inode structures).  Any comments?

Do you have, or would someone be willing to write, a benchmark to demonstrate this?  For example, a program that does I/O on a 100 000 byte file vs a 100 000 000 byte file.

Sequential I/O certainly seems linear with file size.  E.g. I made a 20 000 000 byte file and a 2 000 000 byte file on our Gould, which has a 1k/8k BSD filesystem, and the larger file took "exactly" 10 times longer to copy (0.925 real, 0.00 user, 0.40 sys per 1 000 000 bytes).

I think random access would be the same.  The larger file needs ~ceil(20 000 000 / ((8192/4) * 8192)) == 2 indirect blocks, and any reasonable I/O system will keep both in the buffer cache at all times.

Rather than making files contiguous I would prefer larger blocksizes.  For example, blocksize == tracksize lets a smart disk controller eliminate rotational delay.
	Tom Truscott
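[One rough sketch of such a benchmark: time a fixed number of random block-aligned reads, and run it once against a small file and once against a 100MB file created beforehand (e.g. with dd).  The read size, read count, and argument conventions below are arbitrary choices, not anyone's published benchmark.]

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <time.h>

#define NREADS  10000           /* random reads to time */
#define BUFSZ   8192            /* bytes per read */

int
main(int argc, char **argv)
{
	char buf[BUFSZ];
	int fd;
	long i, nblocks, size;
	time_t start, stop;

	if (argc != 3) {
		fprintf(stderr, "usage: %s file size-in-bytes\n", argv[0]);
		return 1;
	}
	size = atol(argv[2]);
	if (size < BUFSZ) {
		fprintf(stderr, "file must be at least %d bytes\n", BUFSZ);
		return 1;
	}
	nblocks = size / BUFSZ;
	if ((fd = open(argv[1], O_RDONLY)) < 0) {
		perror(argv[1]);
		return 1;
	}
	srand(1);                       /* fixed seed: same offsets every run */
	start = time(NULL);
	for (i = 0; i < NREADS; i++) {
		/*
		 * Pick a random BUFSZ-aligned offset.  rand() only
		 * reaches RAND_MAX distinct blocks, which is plenty
		 * for a 100MB file with 8K reads.
		 */
		off_t off = (off_t)(rand() % nblocks) * BUFSZ;
		if (lseek(fd, off, SEEK_SET) == (off_t)-1 ||
		    read(fd, buf, BUFSZ) < 0) {
			perror("seek/read");
			return 1;
		}
	}
	stop = time(NULL);
	printf("%d random %d-byte reads in %ld seconds\n",
	    NREADS, BUFSZ, (long)(stop - start));
	close(fd);
	return 0;
}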
ron@topaz.rutgers.edu (Ron Natalie) (12/19/87)
First, there is no such thing as an indirect inode.  Second, contiguous files are only good if you are the only one using the drive and you are guaranteed to be reading the file in one single pass.  Far better are the recent Berkeley file systems, with decent interleaving and clustering.

-Ron
jpdres10@usl-pc.UUCP (Green Eric Lee) (12/20/87)
In message <1931@rti.UUCP>, trt@rti.UUCP (Thomas Truscott) says:
>In article <177@cullsj.UUCP>, gupta@cullsj.UUCP (Yogesh Gupta) writes:
>> ... .  However, I find that if I create a 100MB file
>> under Unix (BSD 4.2, System V Rel 1), the overhead in
>> randomly accessing various parts of it is too high (due to indirect
>> inode structures).  Any comments?
>Sequential I/O certainly seems linear with file size.

Yes, because of block caching knarfing the indirect blocks.

>I think random access would be the same.

Not necessarily.  You may need to re-read indirect blocks, or other things of that sort.

>The larger file needs ~ceil(20 000 000 / ((8192/4) * 8192)) == 2 indirect blocks,
>and any reasonable I/O system will keep both in the buffer
>cache at all times.

BSD, perhaps.  I'm at home, so I don't have access to the BSD FFS documentation.  But let's see what Sys V does (quick, flip to the back of the Bach book):

Well, most Sys V systems that I've seen have 1K logical blocks (physical blocks may be larger).  A 100MB file would thus have 100,000 logical blocks.  The direct blocks plus a single indirect block take us up to 266 blocks.  Double indirect takes us up to 65802 blocks.  We'd be well into the triple indirect.

Now, for sequential access it doesn't matter, because those indirect blocks will be stored in the block cache.  But for random access... consider that we have about 390 blocks full of indexes to those 100,000 data blocks, not even counting the higher-level indirect blocks.  And accessing those 100,000 blocks is likely to displace the earliest-accessed of those 390 index blocks from the block cache (because the data blocks will be cached too, of course -- the disk algorithms don't know the difference between blocks full of block numbers and blocks full of data), thus eliminating any caching effects.  So you may average 2-3 disk hits in order to access a block of data, if you access the data randomly enough (note that 4 disk hits is the worst case, the first time you hit a tail block... but it's quite likely that the first of those indirect blocks, at least, is going to stick around in the cache for a while).  In other words, a large-scale random access program quite possibly will run at half speed.

I'd like it, too, if someone benchmarked this (we don't happen to have any file systems handy with 100MB free, or I'd do it myself :-).

>Rather than making files contiguous I would prefer larger blocksizes.
>For example blocksize == tracksize lets a smart disk controller
>eliminate rotational delay.

That works only when you have a file system like the BSD FFS which can allocate fragments of blocks.  Otherwise, most of the space is wasted.  Also note that it makes the job of the fragment allocator much more... fragmentary.

Contiguous files can greatly speed things up, in conjunction with things such as track buffers.  For example, allocating the icons contiguously can more than double the speed of pulling up icons under AmigaDOS/Intuition.  Unfortunately, contiguous allocation greatly increases the complexity of the file system, much more than BSD's fragment system (since there's a fixed number of 256 byte fragments in a 4k-8k block).  I'd have to think about it for quite a while to come up with a contiguous allocation scheme that worked, that was transparent to applications, and, most importantly, had adequate speed on currently-existing machines.
History is filled with machines that died because of elegant algorithms that sucked up more CPU time than they were worth (an example of such an algorithm is the Multics process scheduling algorithm, which took a dozen factors including phase-of-moon into account when trying to decide what process to run next ;-).
-- 
Eric Green   elg@usl.CSNET       P.O. Box 92191, Lafayette, LA 70509
{ihnp4,cbosgd}!killer!elg,       {ut-sally,killer}!usl!elg
"what exactly is a dream... and what exactly is a joke?" -- Syd Barrett
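[The capacity arithmetic in the post above can be checked with a few lines of code.  This sketch assumes the classic Sys V layout of 10 direct block pointers in the inode plus single, double, and triple indirect blocks, 1K logical blocks, and 4-byte block numbers; the constants are assumptions, not measurements from any particular system.]

#include <stdio.h>

int
main(void)
{
	long bsize = 1024;              /* Sys V logical block size (bytes) */
	long nptr = bsize / 4;          /* block numbers per indirect block: 256 */
	long ndirect = 10;              /* direct block pointers in the inode */

	long single = ndirect + nptr;                   /* 266 blocks */
	long dbl = single + nptr * nptr;                /* 65802 blocks */
	long triple = dbl + nptr * nptr * nptr;         /* ~16.8 million blocks */

	printf("single indirect reaches %ld blocks (%.0f bytes)\n",
	    single, (double)single * bsize);
	printf("double indirect reaches %ld blocks (%.0f bytes)\n",
	    dbl, (double)dbl * bsize);
	printf("triple indirect reaches %ld blocks (%.0f bytes)\n",
	    triple, (double)triple * bsize);

	/*
	 * A 100MB file is about 100,000 1K blocks, which is past the
	 * double indirect limit of 65,802 blocks, so its tail is
	 * reachable only through the triple indirect block.
	 */
	return 0;
}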
atbowler@orchid.waterloo.edu (Alan T. Bowler [SDG]) (01/05/88)
Extent based file systems can be very nice.  If the file is contiguous, file access, random or sequential, is very fast.  The in-memory representation is very compact.  It is easy to change a program to use larger buffers, and this will almost certainly have a positive effect on program and system performance.

The problems occur when you need to grow an existing file.  A lot of operating systems have gotten screwed up at this point and given extent based systems an undeserved bad name.  Thus we have people in this group making incorrect statements about extent based systems in general, and giving examples from some badly implemented system.  So far I have seen the following arguments:

1) You can't grow the file.
   This is not true.  You simply allocate another block and add it to the file description.

2) You can only grow the file "n" times.
   No, you simply allocate an extension to the file description and put the new extent descriptor there.

So far, if you just follow the above rules, you will degenerate into a fixed block scheme like the one Unix systems have classically used.  Simply call the file descriptor extensions indirect blocks and you have essentially the same thing.

3) You have to specify an initial and maximum size when creating the file.
   No, but you can allow the user to do such a thing and be rewarded with a performance improvement.

4) The available space gets fragmented and you can't allocate a file because there is not a big enough chunk available.
   No, you of course try to allocate new files contiguously, but if you can't, simply return the space as a set of chunks.  The logical to physical mapping is done by the system, so except for performance the user program can't tell.

The real point is that an extent based system requires a reasonably regular file system maintenance procedure to "compact" multi-extent files into a single extent, so the user doesn't have to worry about this.  One approach is to simply dump and restore the whole file system once every few months.  This is a reasonably adequate approach in most cases, but an even better approach involves running a daemon or other automatically scheduled program that once a day, or once a week, simply walks through the system and rearranges files to compact multi-extent files, and coalesces the available space so that large holes are available.  This program does not have to do a "perfect" job.  If a file is currently busy, it can simply skip it and go on.  It will probably get it the next time.

There are real advantages to extent based systems, but like everything else they do not come for free.  In this case there is a lot of work required to do a "complete" implementation job.  Incomplete extent based systems seem to work for a while, but a user feels like he has hit a brick wall when he runs into a restriction imposed by an incomplete implementation.
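[To make points 1 and 2 above concrete, here is a toy sketch of an extent based file description that grows by appending extent descriptors, first to a set of direct slots and then to an extension area.  Every name and constant in it is invented for illustration; it does not describe any particular system's on-disk format.]

#include <stdio.h>
#include <stdlib.h>

/* One contiguous run of disk blocks. */
struct extent {
	long start;             /* first physical block of the run */
	long len;               /* number of blocks in the run */
};

#define NDIRECT 8               /* extent slots held directly in the file description */

/*
 * Toy in-memory file description: a few direct extent slots, plus a
 * growable "extension" array standing in for an extension block on disk.
 */
struct filedesc {
	struct extent direct[NDIRECT];
	int ndirect;            /* direct slots in use */
	struct extent *ext;     /* extension extents */
	int next, extcap;       /* extension slots used / allocated */
};

/*
 * Grow the file by one more extent.  Point 1: just add another extent.
 * Point 2: when the direct slots run out, put new descriptors in an
 * extension instead of failing.
 */
static void
add_extent(struct filedesc *fd, long start, long len)
{
	if (fd->ndirect < NDIRECT) {
		fd->direct[fd->ndirect].start = start;
		fd->direct[fd->ndirect].len = len;
		fd->ndirect++;
		return;
	}
	if (fd->next == fd->extcap) {
		struct extent *tmp;

		fd->extcap = fd->extcap ? fd->extcap * 2 : 8;
		tmp = realloc(fd->ext, fd->extcap * sizeof(*fd->ext));
		if (tmp == NULL) {
			perror("realloc");
			exit(1);
		}
		fd->ext = tmp;
	}
	fd->ext[fd->next].start = start;
	fd->ext[fd->next].len = len;
	fd->next++;
}

/* Map a logical block number to a physical block by walking the extents. */
static long
map_block(struct filedesc *fd, long lblock)
{
	int i;

	for (i = 0; i < fd->ndirect; i++) {
		if (lblock < fd->direct[i].len)
			return fd->direct[i].start + lblock;
		lblock -= fd->direct[i].len;
	}
	for (i = 0; i < fd->next; i++) {
		if (lblock < fd->ext[i].len)
			return fd->ext[i].start + lblock;
		lblock -= fd->ext[i].len;
	}
	return -1;              /* past end of file */
}

int
main(void)
{
	struct filedesc fd = { 0 };

	add_extent(&fd, 1000, 64);      /* first extent: blocks 1000-1063 */
	add_extent(&fd, 5000, 128);     /* grown later, not contiguous */
	printf("logical 70 -> physical %ld\n", map_block(&fd, 70));  /* 5006 */
	return 0;
}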
fmayhar@killer.UUCP (Frank Mayhar) (01/09/88)
In article <12210@orchid.waterloo.edu> atbowler@orchid.waterloo.edu (Alan T. Bowler [SDG]) writes:
>The real point is that an extent based system requires a reasonably
>regular file system maintenance procedure to "compact" multi-extent
>files into a single extent, so the user doesn't have to worry about
>this.  One approach is to simply dump and restore the whole file
>system once every few months.  [...]
>There are real advantages to extent based systems, but like
>everything else they do not come for free.  In this case there
>is a lot of work required to do a "complete" implementation job.
>Incomplete extent based systems seem to work for a while, but
>a user feels like he has hit a brick wall when he runs into
>a restriction imposed by an incomplete implementation.

Just out of curiosity (I "own" the file management part of an OS that uses an extent-based file system), exactly what do you mean by a "complete" implementation?

The OS I work with (and help maintain) is the Honeywell (Bull) CP-6 Operating System.  Your remarks (quoted above) regarding file fragmentation are right on the money in our case.  It's even worse because the only way we have to effectively de-frag our disks is to do a full backup/restore, since the level of abstraction we provide at the user level prevents us from having a program (that's not actually part of the OS itself) work with the disk directly.  Not to mention the fact that having a program moving extents around would play hell with our disk I/O caching scheme.  (Nothing like having a file directory page in the middle of a FORTRAN source file. :-)
-- 
Frank Mayhar
UUCP: ..!{ihnp4,dj3b1}!killer!fmayhar
ARPA: Frank-Mayhar%ladc@BCO-MULTICS.ARPA
USmail: 680 Grand Ave #201, Long Beach, CA  90814
DeathStarNet: (213) 438-7899
mac3n@babbage.acc.virginia.edu (Alex Colvin) (01/10/88)
See the latest issue of ACM Transactions on Computer Systems (TOCS).