chris@mimsy.UUCP (Chris Torek) (10/26/88)
In article <8332@alice.UUCP> debra@alice.UUCP (Paul De Bra) writes:
>Given a fast CPU, a not-very-intelligent disk controller and the optimal
>interleaving and file system gapsize, the performance is roughly linearly
>proportional to the block-size. ...

Block size is a large, and probably the largest, factor in actual I/O
performance on real Unix machines.  The BSD Fast File System's cylinder
group arrangement does have a non-negligible effect on at least one thing,
however: backups speed up by more than the ratio of block sizes when
switching from a V7/4.1BSD/SysV style file system.  Faster seeks make the
effect of cylinder groups less dramatic, but we still have a number of old
washtub drives in service (until they fail; they would be expensive to
fix, as none are under service contract anymore).

>The main reason why block-size is the limiting factor is that both the
>OS and the disk-controller have only slightly more work handling an 8k
>block than a 1k block.  So you don't hit the hardware speed-limit as soon
>with larger block-sizes.

It would help if the disk drivers were clever, and coalesced adjacent
block requests and/or read whole tracks at a time.  (A Fuji Eagle with 48
sectors/track holds 3 4.3BSD 8K blocks per track, and with `-a 3' files
might be contiguous across one track often enough to make this
worthwhile.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
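[For concreteness, here is a minimal sketch of the request coalescing
Chris describes.  This is NOT the 4.3BSD driver code; the "req" structure
and queue discipline are simplified stand-ins for a real driver's struct
buf chain.]

    /*
     * Walk a blkno-sorted request queue and merge requests whose
     * sector ranges abut, so the controller sees one large transfer
     * instead of several small ones.  A real driver would also have
     * to check that merged requests go the same direction (read vs.
     * write) and address the same drive.
     */
    #include <stdlib.h>

    struct req {
    	long	blkno;		/* starting sector */
    	long	nsect;		/* sectors to transfer */
    	struct req *next;	/* queue, kept sorted by blkno */
    };

    void
    coalesce(struct req *q)
    {
    	struct req *next;

    	while (q != NULL && (next = q->next) != NULL) {
    		if (q->blkno + q->nsect == next->blkno) {
    			q->nsect += next->nsect;	/* absorb neighbor */
    			q->next = next->next;
    			free(next);
    		} else
    			q = next;			/* gap: move on */
    	}
    }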
hedrick@geneva.rutgers.edu (Charles Hedrick) (10/26/88)
>Given a fast CPU, a not-very-intelligent disk controller and the optimal
>interleaving and file system gapsize, the performance is roughly linearly
>proportional to the block-size.

Two problems:

(1) To get really good performance, you have to use a block size so large
that you waste lots of disk space on small files.  The BSD file system can
split the last block in a file into fragments.  You get the benefits of a
big block size on all but the last block, and don't waste the disk space.

(2) System V (at least SVr2, and I think also SVr3) uses a free list,
which it does not keep in order, so an active file system fragments very
soon.  The BSD file system is designed to avoid this fragmentation.  Of
course this problem will not show up if you do your tests right after
creating the file system.

The BSD fast file system is more than just larger block sizes.  It makes
sense for SVr4 to support both the old file system and the new, to allow
people to move between SVr3 and SVr4 during testing and conversion.
However, once you have committed to SVr4, I'd think you would want to
move to the fast file system.
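[A worked sketch of the fragment arithmetic behind point (1), assuming
the common 4.3BSD configuration of 8192-byte blocks with 1024-byte
fragments; the file size is a made-up example.]

    #include <stdio.h>

    #define BSIZE	8192	/* file system block size */
    #define FSIZE	1024	/* fragment size (BSIZE/8) */

    int
    main(void)
    {
    	long size = 11500;	/* hypothetical file size in bytes */
    	long fullblocks = size / BSIZE;
    	long tail = size % BSIZE;
    	/* the last partial block rounds up to fragments, not a whole block */
    	long frags = (tail + FSIZE - 1) / FSIZE;
    	long ffs_alloc = fullblocks * BSIZE + frags * FSIZE;
    	long naive_alloc = ((size + BSIZE - 1) / BSIZE) * BSIZE;

    	printf("file of %ld bytes: FFS allocates %ld, plain 8K blocks %ld\n",
    	    size, ffs_alloc, naive_alloc);	/* 12288 vs. 16384 here */
    	return 0;
    }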
henry@utzoo.uucp (Henry Spencer) (10/28/88)
In article <Oct.25.22.42.50.1988.1890@geneva.rutgers.edu> hedrick@geneva.rutgers.edu (Charles Hedrick) writes:
>(2) System V (at least SVr2, and I think also SVr3) uses a free list,
>which it does not keep in order, so an active file system fragments
>very soon.  The BSD file system is designed to avoid fragmentation.
>Of course this problem will not show if you do your tests right after
>creating the file system.

Or if you run your tests in a time-sharing environment, where the disk
heads are always on their way to somewhere else anyway.  If you read the
fine print, all the Berkeley performance tests were run single-user!!

We conjectured a long time ago that the only feature of the 4.2
filesystem that matters much in a timesharing environment is the big
block size; I haven't yet seen any solid results (numbers, not
anecdotes) that would contradict this.
-- 
The dream *IS* alive...    | Henry Spencer at U of Toronto Zoology
but not at NASA.           | uunet!attcan!utzoo!henry  henry@zoo.toronto.edu
bostic@ucbvax.BERKELEY.EDU (Keith Bostic) (10/29/88)
In article <1988Oct27.173247.2789@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> Or if you run your tests in a time-sharing environment, where the disk
> heads are always on their way to somewhere else anyway.

This depends solely on your job mix; I recently saw figures someone
derived while trying to decide how best to queue requests for a new disk
driver.  The sampled system normally showed no head movement between the
original request and the subsequent/read-ahead requests.  If you have a
system with an overloaded or limited number of disks, your scenario is
much more likely to be correct.

> We conjectured a long time ago that the only feature of the 4.2 filesystem
> that matters much in a timesharing environment is the big block size; I
> haven't yet seen any solid results (numbers, not anecdotes) that would
> contradict this.

Given the nebulousness of the word "timesharing", I suspect you never
will.

--keith
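[One classic answer to "how best to queue requests" from that era is the
elevator sort done by the 4BSD disksort() routine.  A simplified sketch,
not the actual kernel code: a real disksort() handles two sweeps, while
this keeps a single ascending sweep.]

    struct req {
    	long	cylin;		/* target cylinder */
    	struct req *next;
    };

    /*
     * Keep the queue sorted by cylinder so the head makes one sweep
     * across the disk per pass over the queue, instead of seeking
     * back and forth in arrival order.
     */
    void
    enqueue(struct req **headp, struct req *rp)
    {
    	struct req **qp = headp;

    	while (*qp != NULL && (*qp)->cylin <= rp->cylin)
    		qp = &(*qp)->next;
    	rp->next = *qp;
    	*qp = rp;
    }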
chris@mimsy.UUCP (Chris Torek) (10/29/88)
>In article <Oct.25.22.42.50.1988.1890@geneva.rutgers.edu>
>hedrick@geneva.rutgers.edu (Charles Hedrick) notes that
>>... The BSD file system is designed to avoid fragmentation
[of the free list, eventually resulting in blocks being allocated `at
random'].
>>Of course this problem will not show if you do your tests right after
>>creating the file system.

In article <1988Oct27.173247.2789@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>Or if you run your tests in a time-sharing environment, where the disk
>heads are always on their way to somewhere else anyway.  If you read
>the fine print, all the Berkeley performance tests were run single-user!!
>We conjectured a long time ago that the only feature of the 4.2 filesystem
>that matters much in a timesharing environment is the big block size; I
>haven't yet seen any solid results (numbers, not anecdotes) that would
>contradict this.

I actually agree with this (in spite of the point others have noted as
to the weak definition of `time shared').  But it is important to
consider several things.

Not the least is that workstations (e.g., Suns) are virtually
single-user.  Certainly there are servers and daemons running, and you
may be multiprocessing, but `on the average' you tend not to have more
than one process doing file system I/O.

Second, read-ahead blocks are moved into the buffer cache at the same
time as the block actually being read; if the two are adjacent on the
disk, the read-ahead block will come in more or less immediately, even
if the disk then has to move the heads elsewhere for someone else's
page-outs.  So you get `two for the price of one', as it were.

Also---and to us, this is not the least important point---often, when
the machine really *is* in single user mode, you will want file reading
to be as fast as possible, so that your backups will finish soon and you
can allow the next batch of news to flow in.

The BSD FFS allocation policies do a fair job of keeping files straight
even in the presence of multiprocessed/timeshared writes.  In other
words, while I think that the large blocks are the most important
factor, I am not unhappy about all the rest of it.  (After all, *I* did
not have to write the code . . . :-) ---and I have not had to do much to
`maintain' it, either; Kirk did a good job of the code [think that is a
hint, Mike? :-) ])
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
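[The `two for the price of one' read-ahead Chris describes is the
classic breada() pattern.  A hedged sketch: the helper declarations
below stand in for the kernel's buffer-cache primitives, and the code is
a simplification, not the actual 4.3BSD bio code.]

    #include <sys/types.h>

    struct buf;
    struct buf *bread(dev_t dev, long blkno);   /* read block, wait */
    struct buf *getblk(dev_t dev, long blkno);  /* cache lookup/alloc */
    void        startio(struct buf *bp);        /* start async read */
    int         incore(dev_t dev, long blkno);  /* already cached? */

    /*
     * When a file is being read sequentially, start the read-ahead
     * block first so that both requests reach the driver's queue
     * together; if the blocks are adjacent on disk, the second one
     * costs almost no extra head motion.
     */
    struct buf *
    read_with_readahead(dev_t dev, long blkno, long rablkno)
    {
    	if (rablkno != 0 && !incore(dev, rablkno))
    		startio(getblk(dev, rablkno));
    	return bread(dev, blkno);	/* wait only for the block asked for */
    }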
jfh@rpp386.Dallas.TX.US (The Beach Bum) (10/30/88)
In article <26599@ucbvax.BERKELEY.EDU> bostic@ucbvax.BERKELEY.EDU (Keith Bostic) writes:
>In article <1988Oct27.173247.2789@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
>> Or if you run your tests in a time-sharing environment, where the disk
>> heads are always on their way to somewhere else anyway.
>
>If you have a system with an overloaded/limited number of disks, your
>scenario is much more likely to be correct.

In the real world, where more than one process is accessing the disks at
any given time, the heads are always in the wrong place.  If you
localize all of the information for a given file, as the Berkeley Fast
File System does, you only need to access more than one file to break
that locality.

I have never seen a realistic benchmark [multi-process, multi-file,
random access] validate the claims BSD FFS puts forward - except to the
extent that the larger block size dictates.  And soon USG Unix will have
2K blocks, so expect that advantage to diminish.
-- 
John F. Haugh II                     +----Make believe quote of the week----
VoiceNet: (214) 250-3311 Data: -6272 | Nancy Reagan on Richard Stallman:
InterNet: jfh@rpp386.Dallas.TX.US    |      "Just say `Gno'"
UucpNet : <backbone>!killer!rpp386!jfh +-------------------------------------
mash@mips.COM (John Mashey) (10/30/88)
In article <8338@rpp386.Dallas.TX.US> jfh@rpp386.Dallas.TX.US (The Beach Bum) writes:
>I have never seen a realistic benchmark [multi-process, multi-file,
>random access] validate the claims BSD FFS puts forward - except to the
>extent that the larger block size dictates.  And soon USG Unix will have
>2K blocks, so expect that advantage to diminish.

I don't have the benchmark either.  I do note that when we brought up
V.3 on our systems, we started with a vanilla port, intending to put the
FFS in later.  We did (8K blocks).  Overall performance, responsiveness,
etc., in a multi-user environment went way up.  On 5-10 MIPS machines,
the vanilla 1K-block System V file system was tremendously disk-bound.
(Again, I don't have the numbers handy, but I remember what it felt
like.)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
friedl@vsi.COM (Stephen J. Friedl) (10/31/88)
In article <8338@rpp386.Dallas.TX.US>, jfh@rpp386.Dallas.TX.US (The Beach Bum) writes:
> I have never seen a realistic benchmark [multi-process, multi-file,
> random access] validate the claims BSD FFS puts forward - except to the
> extent that the larger block size dictates.  And soon USG Unix will have
> 2K blocks, so expect that advantage to diminish.

These are available now.  System V Release 3.1.1 for the 3B15 has had 2K
blocks for some time, and System V Release 3.2.1 for the 3B2 just came
out with them.

How hard is it for an instantiation of UNIX to support multiple kinds of
blocksizes?  I would think that keeping the blocksize in the superblock
would make it pretty easy, so I could use 1K blocks for root and (say)
8K for the /database partition with a dozen files all > 1MB.  Currently
it seems like a big deal for them to come out with each newly supported
blocksize.

Steve
-- 
Steve Friedl    V-Systems, Inc.    +1 714 545 6442    3B2-kind-of-guy
friedl@vsi.com   {backbones}!vsi.com!friedl   attmail!vsi!friedl
----Nancy Reagan on 120MB SCSI cartridge tape: "Just say *now*"----
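[A minimal sketch of the scheme Steve suggests: record the block size in
each file system's superblock and derive everything from it at mount
time, instead of compiling one BSIZE into the kernel.  Field, macro, and
structure names here are hypothetical, not from any particular UNIX
release.]

    #include <sys/types.h>

    struct superblock {
    	u_long	s_magic;	/* identifies the file system type */
    	u_long	s_bsize;	/* logical block size: 1K, 2K, ... 8K */
    	/* ... free list, inode counts, etc. ... */
    };

    struct mount {
    	struct superblock m_sb;	/* read in at mount time */
    	/* ... */
    };

    /* Every later block computation uses the mounted fs's own size. */
    #define FS_BSIZE(mp)	((mp)->m_sb.s_bsize)
    #define FS_BTODB(mp, b)	((b) * (FS_BSIZE(mp) / 512))	/* block -> sector */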
djg@sequent.UUCP (Derek Godfrey) (11/01/88)
> How hard is it for an instantiation of UNIX to support multiple
> kinds of blocksizes?  I would think that keeping the blocksize in
> the superblock would make it pretty easy, so I could use 1k blocks
> for root, and (say) 8k for the /database partition with a dozen
> files all > 1MB.  Currently it seems like a big deal for them
> to come out with a new supported blocksize.

Not difficult at all, since the block size field of the superblock is
already 32 bits wide.  The code needed in fs/s5 is minimal - just a
matter of changing a few case statements into an algorithm (assuming
you're willing to increase the size of a system buffer).  The biggest
effort, however, is converting all the utilities that are still using
BSIZE rather than FsBSIZE.
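[The shape of the BSIZE-to-FsBSIZE conversion Derek mentions, for
readers without System V source.  The exact macro definitions varied by
release; these are illustrative reconstructions, not copies of AT&T
code.]

    /* Old style: one compiled-in block size for every file system. */
    #define BSIZE	1024
    char buf_old[BSIZE];		/* utility's I/O buffer, fixed size */

    /* New style: block size derived from the superblock's type field. */
    #define Fs1b	1		/* 512-byte block file system */
    #define Fs2b	2		/* 1024-byte block file system */
    #define FsBSIZE(type)	((type) == Fs2b ? 1024 : 512)

    /*
     * A utility now has to size its buffers at run time from the file
     * system it is actually looking at, e.g.
     *
     *	char *buf = malloc(FsBSIZE(sb.s_type));
     *
     * which is why "converting all the utilities" is the big job.
     */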
guy@auspex.UUCP (Guy Harris) (11/01/88)
>How hard is it for an instantiation of UNIX to support multiple
>kinds of blocksizes?

If you have the right buffer cache mechanism (or the moral equivalent;
cf. SunOS 4.0, which uses the buffer cache only for control information,
using the paging system for data caching), it's not that hard.

>I would think that keeping the blocksize in the superblock would make
>it pretty easy, so I could use 1k blocks for root, and (say) 8k for
>the /database partition with a dozen files all > 1MB.

That's basically what the BSD file system does.

>Currently it seems like a big deal for them to come out with a new
>supported blocksize.

That's because they *don't* have the right buffer cache mechanism, and
have to hack in a new buffer cache for 2KB file systems (although at
least both buffer caches are sort of subclasses of a more general
"buffer cache" class, so they do get to share some code).

With any luck, S5R4 will have the right buffer cache mechanism, namely
the BSD one (i.e., with any luck, they'll put the V7/S5 file system on
top of it, rather than having *both* the BSD *and* the V7/S5 buffer
caches to support the two different file systems), or the moral
equivalent (cf. SunOS 4.0, whose VM subsystem will be in S5R4 and which
will, like SunOS 4.0, use it for data caching).
crossgl@ingr.UUCP (Gordon Cross) (11/01/88)
In article <6413@daver.UUCP>, dlr@daver.UUCP (Dave Rand) writes:
> Why is the System V.[23] file system _SO_ much slower than the
> BSD file system?  On several systems I have, the disk performance
> seems dreadful.  On one system, it is around 20K bytes per second.
> The best I have seen from System V is 200K per second - but the
> actual disk controller is capable of 1.5 megabytes per second!

The problem is with the file system organization itself, so you will not
be able to "fix" it.  Any further explanation here would be far too
lengthy, but I can direct you to an informative article on the subject:

     A Fast File System for UNIX
     Marshall Kirk McKusick, William N. Joy, Samuel J. Leffler,
     and Robert S. Fabry
     Computer Systems Research Group
     Computer Science Division
     Department of Electrical Engineering and Computer Science
     University of California, Berkeley
     Berkeley, CA 94720

Gordon Cross
Intergraph Corp.  Huntsville, AL
sl@van-bc.UUCP (pri=-10 Stuart Lynne) (11/01/88)
In article <917@vsi.COM> friedl@vsi.COM (Stephen J. Friedl) writes:
>In article <8338@rpp386.Dallas.TX.US>, jfh@rpp386.Dallas.TX.US (The Beach Bum) writes:
}> I have never seen a realistic benchmark [multi-process, multi-file,
}> random access] validate the claims BSD FFS puts forward - except to the
}> extent that the larger block size dictates.  And soon USG Unix will have
}> 2K blocks, so expect that advantage to diminish.
}
}These are available now.  System V Release 3.1.1 for the 3B15 has had 2K
}blocks for some time, and System V Release 3.2.1 for the 3B2 just came
}out with them.

My obsolete Callan Unistar running UniSoft 5.0 (a *very* early variant
of System V, possibly about release 0 or -1), with vintage binaries from
1983/1984, supports 1-, 2- and 4-block file systems (that's .5/1/2 KB).
I would suggest that various releases of System V have supported 2K
blocks for as long as there has been a System V; it has just been up to
the porting house whether they thought it was needed for a particular
machine and worth using.

In the case of the Callan they provided the 2KB support for use with SMD
drives (although it works on other drives as well).  Unfortunately they
shipped the system to generate .5KB blocks for all file systems as the
default; you have to gen your own kernel to use either 1KB or 2KB
blocks.  To make things worse, the boot ROM only knows about .5KB
blocks, so you are stuck with those for your root partition (it's a
fixed size too).

At least on a slow 68010 with mediocre drives, the difference between
1KB and 2KB blocks is not that great (although both were a big
improvement over .5KB).  I use 1KB to help minimize the impact of the
block buffers on my 2MB of RAM.
-- 
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl Vancouver,BC,604-937-7532
dave@celerity.UUCP (David L. Smith) (11/01/88)
In article <917@vsi.COM> friedl@vsi.COM (Stephen J. Friedl) writes:
>How hard is it for an instantiation of UNIX to support multiple
>kinds of blocksizes?  I would think that keeping the blocksize in
>the superblock would make it pretty easy, so I could use 1k blocks
>for root, and (say) 8k for the /database partition with a dozen
>files all > 1MB.  Currently it seems like a big deal for them
>to come out with a new supported blocksize.

We support multiple block sizes on our new toy (the Model 500), ranging
from 4K to 256K (for big striped disks).  It was relatively
straightforward, except for some problems during development with
several of the system utilities that depend on the system block size for
the size of their internal buffers.  We also ferreted out quite a few
"magic" block-size numbers.
dwc@homxc.UUCP (Malaclypse the Elder) (11/05/88)
In article <917@vsi.COM>, friedl@vsi.COM (Stephen J. Friedl) writes:
> How hard is it for an instantiation of UNIX to support multiple
> kinds of blocksizes?  I would think that keeping the blocksize in
> the superblock would make it pretty easy, so I could use 1k blocks
> for root, and (say) 8k for the /database partition with a dozen
> files all > 1MB.  Currently it seems like a big deal for them
> to come out with a new supported blocksize.

the difficulty that i see is in the maintenance of the buffer cache: do
you have separate buffer caches for each size, or somehow share them?
then there is the modification of the file system maintenance programs
(have you looked at the fsck source lately?).  not such a big deal, but
a consideration.

danny chen
att!homxc!dwc
guy@auspex.UUCP (Guy Harris) (11/06/88)
>> How hard is it for an instantiation of UNIX to support multiple
>> kinds of blocksizes? ...
>the difficulty that i see is in the maintenance of the buffer cache:
>do you have separate buffer caches for each size, or somehow share
>them?

Share them.  See 4.[23]BSD.
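[A sketch of how one shared buffer cache can serve several block sizes,
after the fashion of 4.[23]BSD's allocbuf(); simplified, not the actual
Berkeley code.  Each buffer header records how much memory it currently
holds, and the cache resizes a buffer to fit the file system block being
read into it.]

    #include <stdlib.h>

    struct buf {
    	long	b_blkno;	/* block number on device */
    	char	*b_addr;	/* data area */
    	size_t	b_bufsize;	/* bytes allocated to b_addr */
    	struct buf *b_next;	/* hash/free chains, etc. */
    };

    /*
     * Make bp's data area exactly "size" bytes.  4.3BSD steals and
     * returns memory in pages from other buffers rather than calling
     * an allocator, but the effect is the same: one cache, many sizes.
     */
    int
    allocbuf(struct buf *bp, size_t size)
    {
    	char *p;

    	if (bp->b_bufsize == size)
    		return 0;
    	if ((p = realloc(bp->b_addr, size)) == NULL)
    		return -1;
    	bp->b_addr = p;
    	bp->b_bufsize = size;
    	return 0;
    }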