banderso@sagpd1.UUCP (Bruce Anderson) (01/11/89)
We are in the process of switching over from an HP9000/540, which uses a "Structured Directory Format" (proprietary HP) file system where everything (files, inodes and swap space) is dynamically allocated, to an HP9000/835, which uses a Berkeley file system with everything statically allocated at system configuration, and I am wondering if anyone has any data on the effects of some of the possible configuration tradeoffs.

First, I gather that using multiple disk sections is supposed to increase speed, but at first glance it appears to do this by giving you 5 or 6 places to run out of disk space rather than just one, and if you have a file system which is just a little bigger than one of the sections you can waste incredible amounts of space (for example, if you have 31 MB of data you may have to use a 110 MB section rather than a 30 MB section). My first question is: how much speed do you gain by breaking everything up into small chunks on the disk rather than using it as just one (or possibly two) very large blocks?

My second question is: how much effect does changing the block and fragment size have? The manual says that if you use an 8K block and fragment size it speeds up the file system but wastes space. Does anyone have a quantitative feel of how much the tradeoff is?

When allocating inodes, what kind of ratio of disk space to inodes do people use? The default on the system is an inode for every 2KB of disk space in a file system but this seemed like an awfully high number of inodes. Is it?

This is probably HP specific but if you define multiple swap sections, does it fill up the first before starting on the secondary ones or does it use all in a balanced manner? If the first then obviously the primary swap space should be on the fastest drive but otherwise it doesn't matter.

Any information would be appreciated. Post or mail as you wish.

Bruce Anderson - Scientific Atlanta, Government Products Division
...!sagpd1!banderso
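P.S. To put the inode question in concrete numbers, here is the back-of-envelope arithmetic I'm doing (the 300 MB file system size is just an example, and this ignores whatever rounding the real configuration tools do):

	#include <stdio.h>

	/*
	 * Rough estimate of how many inodes get created for a file
	 * system of a given size at a given bytes-per-inode ratio.
	 * Cylinder-group rounding is ignored, so this is approximate.
	 */
	int main()
	{
		long fs_bytes = 300L * 1024 * 1024;	/* example: a 300 MB section */
		long ratios[] = { 1024, 2048, 4096, 8192 };
		int i;

		for (i = 0; i < 4; i++)
			printf("%5ld bytes/inode -> about %ld inodes\n",
			    ratios[i], fs_bytes / ratios[i]);
		return 0;
	}

At 2KB per inode that works out to over 150,000 inodes on a 300 MB file system, which is what struck me as an awfully high number.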
dhesi@bsu-cs.UUCP (Rahul Dhesi) (01/12/89)
In article <310@sagpd1.UUCP> banderso@sagpd1.UUCP (Bruce Anderson) writes:
>First, I gather that using multiple disk sections is supposed to
>increase speed...

I've heard this said, but I don't see why breaking up a disk into pieces will speed up access. The only exception I can see is the rare case when you have a big partition containing files that are almost never accessed. If this partition is at the end of the disk the disk head almost never has to travel that far.

The main (4.3BSD-specific) reasons for having multiple disk partitions are: (a) dump and restore work on entire partitions, so the smaller a partition the more flexible your backup procedures can be; (b) filesystem parameters can be individually adjusted for partitions in case you want to use different block sizes etc.; (c) to do swapping on a disk you have to have a partition dedicated to that; and (d) you can protect the rest of the disk from filling up by giving a directory like /usr/tmp (or /a/crash :-) its own filesystem.

I think the *most* popular reason for having disks partitioned in a certain way is because that's how the operating system was configured when you got it and it's too much trouble to change it. That certainly is why we have our disks partitioned the way they are.
breck@aimt.UU.NET (Robert Breckinridge Beatie) (01/12/89)
In article <5324@bsu-cs.UUCP>, dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
> In article <310@sagpd1.UUCP> banderso@sagpd1.UUCP (Bruce Anderson) writes:
> >First, I gather that using multiple disk sections is supposed to
> >increase speed...
>
> I've heard this said, but I don't see why breaking up a disk into
> pieces will speed up access.  The only exception I can see is the rare
> case when you have a big partition containing files that are almost
> never accessed.  If this partition is at the end of the disk the disk
> head almost never has to travel that far.

I interpreted his question as referring to cylinder groups in the BSD Fast File System. There are two (performance-related) reasons that I can think of for cylinder groups. How effective cylinder groups are, I cannot say. The BSD file system certainly seems faster than the old-style file system, but how much of that is due to the 8K (fs_bsize, actually) block size and how much is due to improved locality of reference resulting from cylinder groups?

First: I think the BSD file system attempts to keep inodes that are all referenced by the same directory in the same cylinder group. This way, when you stat(2) all the files in a directory, the inodes that the system will have to read will probably be (somewhat) closer together.

Second: If the file system manages to keep all the blocks for a file in the same cylinder group as that file's inode, then the seek distance from inode to file data will (typically) be smaller than in the old-style file system. I'm not sure how big a win this is, since under the BSD file system the disk heads will have to seek across cylinder groups all the time.
--
Breck Beatie (408)748-8649
{uunet,ames!coherent}!aimt!breck  OR  breck@aimt.uu.net
"Sloppy as hell Little Father.  You've embarassed me no end."
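P.S. By "stat(2) all the files in a directory" I mean the sort of scan below (a quick, untested sketch with error handling omitted; ls -l and find do essentially this). If the directory's inodes all sit in one cylinder group, the stat() calls in this loop should touch inodes that are physically close together:

	#include <stdio.h>
	#include <dirent.h>
	#include <sys/types.h>
	#include <sys/stat.h>

	/*
	 * Stat every entry in a directory -- the access pattern that
	 * benefits from keeping a directory's inodes in one cylinder group.
	 */
	void scan(char *dir)
	{
		DIR *dp = opendir(dir);
		struct dirent *d;
		struct stat sb;
		char path[1024];

		while ((d = readdir(dp)) != NULL) {
			sprintf(path, "%s/%s", dir, d->d_name);
			if (stat(path, &sb) == 0)
				printf("%-14s  inode %lu  %lu bytes\n", d->d_name,
				    (unsigned long) sb.st_ino,
				    (unsigned long) sb.st_size);
		}
		closedir(dp);
	}

	int main(int argc, char **argv)
	{
		scan(argc > 1 ? argv[1] : ".");
		return 0;
	}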
larry@macom1.UUCP (Larry Taborek) (01/13/89)
From article <310@sagpd1.UUCP>, by banderso@sagpd1.UUCP (Bruce Anderson):
>My second question is: how much effect does changing the block and
>fragment size have? The manual says that if you use an 8K block and
>fragment size it speeds up the file system but wastes space. Does
>anyone have a quantitative feel of how much the tradeoff is?

I kept some old copies of the 4.2BSD documentation from my old job, and in Volume 2 of the documentation they have a section on the 4.2BSD file system (A Fast File System for UNIX, by McKusick, Joy, Leffler, and Fabry). From it I have selected the following:

	Space used	% Waste		Organization
	775.2mb		0		Data only, no separation
	807.8		4.2		Data only, 512 byte boundary
	828.7		6.9		512 byte block
	866.5		11.8		1024 byte block
	948.5		22.4		2048 byte block
	1128.3		45.6		4096 byte block

It also states: "The space overhead in the 4096 (byte block) / 1024 (byte fragment) new file system organization is empirically observed to be about the same as in the 1024 byte old file system organization." ... "The net result is about the same disk utilization when the new file systems fragment size equals the old file systems block size."

Thus by determining your fragment size, you can compare it to the table above to determine your amount of wasted space.

You can also determine whether you have 2, 4, or 8 fragments per block, but I believe that 4 is about right. Too high a fragment to block count (8), and the data from fragments may have to be copied up to 7 times to rebuild into a block (this would happen when a file would grow beyond the size that 7 fragments could hold, and the file system would copy these fragments into a block). Too low a fragment to block count (2), and the block/fragment concept isn't helping very much.

They also post a table that seems to show me that there is not all that much speed difference between a 4K block FS and an 8K block FS. Instead, they state that the biggest factor that helps speed things up is keeping at least 10% of the partition free.

>When allocating inodes, what kind of ratio of disk space to inodes
>do people use? The default on the system is an inode for every 2KB
>of disk space in a file system but this seemed like an awfully high
>number of inodes. Is it?

It depends. The default ratio of inodes to file system size is meant as a good rule of thumb for most partitions. If you plan on holding usenet news on a partition (lots of small files), then you may wish to lower this to one inode for every 1KB of disk space in the file system. On the other hand, if you have a few very large files filling a partition, then you may wish to raise the parameter to one inode for every 8KB of disk space in the file system. When you look at this sort of problem, you begin to understand why there are partitions, and what use they satisfy.

>This is probably HP specific but if you define multiple swap sections,
>does it fill up the first before starting on the secondary ones or
>does it use all in a balanced manner? If the first then obviously
>the primary swap space should be on the fastest drive but otherwise
>it doesn't matter.

What I noticed on BSD systems I used to administer was that the SECOND swap area was used exclusively until it filled, and then the swap overflow went to the first. To me, this made sense as the second swap area was on our second physical disk, which generally has less i/o than the first physical disk is expected to have. (Any comments to this are appreciated).
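Getting back to the block/fragment question: if you want to estimate the waste for your own mix of file sizes, the arithmetic is easy enough to sketch. This is just my own rough model (the last partial block of a file goes in fragments, everything else in full blocks), not code from the paper:

	#include <stdio.h>

	/*
	 * Rough model of FFS allocation: a file's last partial block is
	 * stored in fragments, the rest in full blocks.  Returns the bytes
	 * allocated on disk for a file of `size' bytes, ignoring indirect
	 * blocks and the inode itself.
	 */
	long allocated(long size, long bsize, long fsize)
	{
		long full = (size / bsize) * bsize;		/* whole blocks */
		long tail = size - full;			/* leftover bytes */
		long frags = (tail + fsize - 1) / fsize;	/* fragments, rounded up */

		return full + frags * fsize;
	}

	int main()
	{
		long size = 1100;	/* example file size in bytes */

		printf("1024/1024: %ld bytes on disk\n", allocated(size, 1024, 1024));
		printf("4096/1024: %ld bytes on disk\n", allocated(size, 4096, 1024));
		printf("8192/8192: %ld bytes on disk\n", allocated(size, 8192, 8192));
		return 0;
	}

For an 1100 byte file, for instance, this gives 2048 bytes on disk for both the old 1024 byte file system and a 4096/1024 new file system, but 8192 bytes with an 8K block and fragment size, which is in line with the quote above about the 4096/1024 organization wasting about the same space as the old 1024 byte organization.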
--
Larry Taborek		..grebyn!macom1!larry	Centel Federal Systems
larry@macom1.UUCP				11400 Commerce Park Drive
						Reston, VA 22091-1506
						703-758-7000
chris@mimsy.UUCP (Chris Torek) (01/14/89)
In article <4787@macom1.UUCP> larry@macom1.UUCP (Larry Taborek) writes:
>... You can also determine whether you have 2, 4, or 8 fragments per
>block, but I believe that 4 is about right. Too high a fragment to
>block count (8), and the data from fragments may have to be copied
>up to 7 times to rebuild into a block (this would happen when a file
>would grow beyond the size that 7 fragments could hold, and the file
>system would copy these fragments into a block). Too low a fragment to
>block count (2), and the block/fragment concept isn't helping
>very much.

This is not quite how things work; and there is not too much reason to worry about fragment expansion in 4.3BSD. (It *is* a problem in 4.2BSD if you use `vi', for instance, although just how much so varies.)

Only the last part of a file ever occupies a fragment. When extending a file, the kernel decides whether it needs a full block or whether a fragment will suffice. If a fragment will do, the kernel looks for an existing block (in the right cg) that is already appropriately fragmented. If one exists and has sufficient space, it is used; otherwise the kernel allocates a full block and carves it up.

In 4.3BSD, Kirk added an `optimisation' flag (space/time; tunefs -o) which is normally set to `time'. The kernel automatically switches it to `space' if the file system becomes alarmingly fragmented, then back to `time' when things are cleaned up. This flag does not exist in 4.2BSD; in essence, 4.2 always chooses `space'. Now, when expanding a file that already ends in a fragment to a new size that can be a fragment, if the flag is set to `space', the kernel uses the usual best-fit search. But if the flag is set to `time', the kernel finds a fragment that can be expanded in place to a full block, or takes a full block if no such fragments exist.

All of this affects only poorly-behaved programs that write files a little bit at a time. In 4.2BSD, vi always wrote 1024 bytes, which in a 4k/1k file system is as bad as possible. It was possible for every write system call to have to allocate a new set of fragments, copying the data from the old fragments to the new. In 4.3BSD, even such programs only lose once per fragment expansion, because the next three (in a 4:1 FS) can always be done in place (provided that fs->fs_optim is FS_OPTTIME). vi was fixed in 4.3BSD to write statb.st_blksize blocks. (And enbugged at the same time: if st_blksize is greater than the MAXBSIZE with which vi was compiled, it scribbles over some of its own variables. I keep telling them that compiling in MAXBSIZE is wrong.... Yes, it *does* break, if you speak NFS with a Pyramid for instance.)

[and on paging:]
>What I noticed on BSD systems I used to administer was that the
>SECOND swap area was used exclusively until it filled, and then
>the swap overflow went to the first. To me, this made sense as the
>second swap area was on our second physical disk, which generally
>has less i/o than the first physical disk is expected to have. (Any
>comments to this are appreciated).

No: Swap space is created in dmmax-sized segments scattered evenly across all paging devices; its allocation approximates a uniform random distribution. (See swfree() in /sys/sys/vm_sw.c and swpexpand() in /sys/sys/vm_drum.c.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
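P.S. In case the interleaving isn't obvious, here is a toy model of the idea (an illustration only, not the actual kernel code; the NSWDEV and DMMAX values are made up):

	#include <stdio.h>

	#define NSWDEV	2	/* number of paging devices (example) */
	#define DMMAX	512	/* interleave size in blocks (example) */

	/*
	 * Toy model: map a logical swap block number onto a device and a
	 * block offset on that device, with DMMAX-sized segments rotating
	 * round-robin across all the paging devices.
	 */
	void whereis(long blk)
	{
		long seg = blk / DMMAX;		/* which DMMAX-sized segment */
		int dev = seg % NSWDEV;		/* segments rotate across devices */
		long off = (seg / NSWDEV) * DMMAX + blk % DMMAX;

		printf("swap block %5ld -> device %d, block %5ld\n", blk, dev, off);
	}

	int main()
	{
		long blk;

		for (blk = 0; blk < 4 * DMMAX; blk += DMMAX)
			whereis(blk);
		return 0;
	}

With more than one paging device, consecutive dmmax-sized chunks land on alternating drives, so neither swap area should fill up before the other.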