macmillan%wnre.aecl.cdn%ubc.CSNET@RELAY.CS.NET (John MacMillan) (12/01/86)
We have about 12 RA81's in a VAX cluster (8650, 785 and 2 750's). The user population is around 400. The main programs are large simulations. Disk fragmentation has become a problem, requiring disk compresses every two weeks. Weird program behaviour (jobs taking 3 to 4 times as long to run) was observed before one of the compress sessions. No one is happy about losing the entire cluster for most of a day (10 hours+) while the compress goes on (i.e. operators working overtime, users locked out, the system manager trying to keep costs down, etc.).

Is our situation normal? How often do other sites compress their disks? What is the experience of other sites? Are there general guidelines for reducing the frequency of compresses? What is the optimal way to organize disks (i.e. number of user disks, system disks, scratch disks, where files are stored, etc.)?

We are about to retire a DECsystem-10 model KL. Ironically, disk compresses on it were a rarity - no problems. So why does the VAX give so much trouble, and what is the best way to deal with it?

Thanks..

John MacMillan
Atomic Energy of Canada
Whiteshell Nuclear Research Establishment
Pinawa, Manitoba, Canada  R0E 1L0
(204) 753-2311 x2539
cetron@utah-cs.arpa (Edward J Cetron) (12/02/86)
Given that you have 12 RA81's and a disk compress is needed every 14 days, I see no reason that the whole cluster must go down. My suggestions:

1. Dedicate 1 RA81 to daily temporary files - announce that every night at, say, 4:00 am this disk will be completely flushed.

2. Every night at 4:02 am, run a script (woops, wrong OS) - an indirect or batch command file - to do a full backup from one of the 11 other RA81's to the 12th RA81 (in effect doing a compress), and then reverse the process to restore the first RA81 to a compressed state as well. (An option is to access the RA's by logical name only and rotate the logical names, so that you need do only one backup.)

3. Have the command file reschedule itself, as well as recording which of the other 11 disks is next.

This will give you 1 compressed disk each day and the full series in 12 days (slightly better than two weeks), and the cluster can be available (minus 2 disks at some points) all day.

	-ed cetron
	 Center for Engineering Design
	 Univ. of Utah
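A rough DCL sketch of such a rotating command file, for anyone who wants a concrete starting point. Everything in it is an assumption made up for illustration - the device names, the file name ROTATE_COMPRESS.COM, the batch queue, the 4:02 schedule - and the remount/rotation bookkeeping is only hinted at in comments:

    $ ! ROTATE_COMPRESS.COM - sketch only; all names are invented.
    $ ! P1 = disk to compress tonight (e.g. DUA3:)
    $ ! P2 = spare/scratch drive receiving the compressed copy (e.g. DUA12:)
    $ SET NOON
    $ MOUNT/FOREIGN 'P2'                ! image copy writes to a foreign-mounted output
    $ ! Assumes the source disk is quiet at 4 am; add /IGNORE=INTERLOCK
    $ ! if files may still be open.
    $ BACKUP/IMAGE/VERIFY 'P1' 'P2'     ! disk-to-disk image copy comes out defragmented
    $ DISMOUNT 'P2'
    $ ! Here the real file would remount the fresh copy, repoint the volume's
    $ ! logical name at it, and make tonight's source drive the new spare.
    $ ! Reschedule for tomorrow with the next disk in the rotation and the
    $ ! new spare (rotation bookkeeping left as an exercise):
    $ NEXT_DISK = "DUA4:"               ! placeholder; compute this for real
    $ SUBMIT/AFTER="TOMORROW+4:02" -
            /PARAMETERS=("''NEXT_DISK'","''P1'") -
            SYS$MANAGER:ROTATE_COMPRESS.COM
    $ EXIT

The point of the BACKUP/IMAGE copy is that files are rewritten onto the output volume contiguously where possible, which is what makes the copy a compress.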
SEDAYAO@sc.intel.com.UUCP (12/04/86)
I read in DIGITAL REVIEW, November 24, 1986, about a product by Executive Software, Inc., that supposedly defragments disks in the background, even while the disk is in use. The product is called Diskeeper and costs $750-$2500 depending on the VAX model. I have never used it and can't really comment on its actual performance, but the article was very positive about it.

Jeff Sedayao
CSNET: sedayao@sc.intel.com
UUCP: {hplabs,decwrl,oliveb,amdcad}!intelca!mipos3!td3cad!sedayao

Disclaimer - My opinions are my own and not my employer's; this is not an endorsement.
LEICHTER-JERRY@YALE.ARPA (12/04/86)
> We have about 12 RA81's in a VAX cluster (8650, 785 and 2 750's). The
> user population is around 400. The main programs are large simulations.
> Disk fragmentation has become a problem, requiring disk compresses every
> two weeks. Weird program behaviour (jobs taking 3 to 4 times as long to
> run) was observed before one of the compress sessions. No one is happy
> about losing the entire cluster for most of a day (10 hours+) while the
> compress goes on.... Is our situation normal? How often do other sites
> compress their disks? What is the experience of other sites? Are there
> general guidelines for reducing the frequency of compresses? What is the
> optimal way to organize disks (i.e. number of user disks, system disks,
> scratch disks, where files are stored, etc.)? ....

Let's step back and look at how VMS disk space allocation works. On each disk there's a bit map, with one bit per cluster, where a cluster is a group of c consecutive disk blocks. (You set c, the cluster size, when you initialize the disk.) Bits are clear for unused blocks, set for used blocks - or the other way around; I don't recall, and it doesn't matter for our purposes. When an allocation request is made for k blocks, the bit map is scanned for roughly k/c consecutive free bits, and the corresponding blocks are used. When blocks are freed, the corresponding bits are cleared. Note that there is no record of where the boundaries between previously-allocated groups of blocks were placed - free segments are implicitly merged.

(Actually, in a cluster things are a bit more complex. Each cluster member keeps its own copy of the bitmap. The copies must obviously be kept in agreement. To avoid overhead, each member pre-allocates some number of clusters, which it can now use for local requests without having to inform the other members. If it needs more blocks than it has pre-allocated, it has to coordinate with the other cluster members; a process called something like CACHE_SERVER runs on each cluster member and does this coordination. If a disk isn't dismounted properly - if the system crashes, for example - the pre-allocated blocks are lost. That's what the rebuild operation during MOUNT is all about - it scans all the file headers and builds an up-to-date allocation bitmap.)

If you start with an unfragmented disk, allocate a lot of files, and then delete them all, you will end up with exactly the same free-space configuration you started with. You can only get fragmentation if you interleave the allocations of two sets of files, A and B, then delete all the A files while retaining all the B files. As long as the B files are there, they will be splitting up - fragmenting - the free space that used to belong to the A files.

I've over-simplified by talking about files. In fact, files often grow dynamically. What matters is not files but file extents. Suppose I open two files A and B, then alternately write to each of them, allowing each to extend until the disk is full. A and B's extents - contiguous groups of blocks - will alternate. If I now delete A, the disk will be terribly fragmented by all the pieces of B.
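Incidentally, you can see how badly a given file has been split into extents by dumping its file header; the "retrieval pointers" near the end of the display list one pointer per extent, so a badly fragmented file shows a long string of small pieces. Something like the following should do it (the file name is made up, and I'm quoting the qualifier from memory - check HELP DUMP):

    $ DUMP/HEADER/BLOCK=COUNT:0 DUA3:[SIM]RESULTS.DAT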
Tying all this together, we can state a few rules to minimize disk fragmentation:

1. Put temporary files and more permanent files on separate disks. (Most studies of running systems show that files fall into two classes: those that stick around for a short time, and those that stay essentially forever. The exact values of "short time" and "essentially forever" will depend on the particular mix of programs. You probably have a large population of files that exist as long as typical simulation runs, say several hours, plus many files, like source files, that stick around for weeks to years.)

2. Pre-allocate files to your best estimate of their maximum size. This will tend to keep them to a small number of large extents. (It will also improve performance - extending a file is an expensive operation.)

3. For files that you can't pre-extend, but know will grow, use a large extension quantity. (This is the number of blocks by which RMS will extend the file when it needs to. See the RMS documentation for more information.)

4. Similarly, on disks that will contain mainly large files, use a large cluster size. This will gain performance in almost every way at a modest cost in wasted disk space. (The space wasted is, on average, (c-1)/2 blocks per file for cluster size c. For disks with a lot of small files, this is likely to be a problem; for disks with a small number of large files, it is usually irrelevant.)

5. Keep disks with a lot of allocation and deletion from filling up. Fragmentation increases rapidly for such disks when they fill - a similar phenomenon occurs in hash tables. An active disk that's 95% full will be very fragmented; if the same disk were 75% full, the merging of free segments would be much, much more effective and the fragmentation wouldn't (on average, assuming "random" allocations and deletions, not the kind of worst-case examples I gave above) be bad at all. As with hash tables, the curve gets pretty steep at large "fill factors", so if the disks are quite full, every little bit helps.

You may find that you still have to compress the disks, though perhaps not as frequently. If you can afford a spare disk, you can avoid taking the whole cluster down. (Considering typical salaries, it doesn't take much down time to cost as much as an extra disk!) With a spare disk, you can do a disk-to-disk backup (and compress) of one disk at a time. Only those users who use that particular disk are unable to use the system during the backup, which is simple and doesn't take long (an hour and a half at most). That's what we do here; I got the idea from the system manager of a large cluster at DEC. Note that you should be sure to avoid references to particular physical disk drives; any given volume will migrate around from drive to drive as backups are done. This is exactly what concealed device names are for!

You can minimize inconvenience by not putting users' default directories on the same disk with those large simulation files. Since it will be the latter disks that need the most frequent compression, users would still be able to log in - they might just not be able to run some programs. (In any case, this is part of the policy of separating disks with "permanent" files from those with "temporary" ones where possible.)

For this policy to be effective, you shouldn't have to compress the system disk, as that DOES require taking the whole cluster down. This means that you should have no active file creation/deletion on the system disk - just system files. That's a good idea anyway. Also, you won't be able to do this if you use volume sets; you'd need as many free spindles as you have disks in the largest volume set. This is one aspect of the most serious liability of volume sets - you have to back up whole sets at once.

							-- Jerry
-------
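To make a couple of the suggestions above concrete, here are minimal DCL sketches. All device names, volume labels, file names and sizes are invented for illustration, and only the attributes relevant to fragmentation are shown:

    $ ! --- Rules 2 and 3: pre-allocate, and extend in big chunks ---
    $ ! BIGFILE.FDL is a hypothetical FDL description containing, e.g.:
    $ !     FILE
    $ !             ALLOCATION              200000  ! best guess at final size
    $ !             EXTENSION               5000    ! RMS extend quantity
    $ !             BEST_TRY_CONTIGUOUS     yes
    $ CREATE/FDL=BIGFILE.FDL DUA3:[SIM]RESULTS.DAT
    $ ! or raise the default RMS extend quantity for a process:
    $ SET RMS_DEFAULT/EXTEND_QUANTITY=5000
    $
    $ ! --- Spare-drive compress of one volume at a time ---
    $ ! Users reference only a concealed device name, never the physical drive:
    $ DEFINE/SYSTEM/EXEC/TRANSLATION_ATTRIBUTES=CONCEALED USER1 _DUA5:
    $ ! ... so file specifications look like USER1:[SMITH]LOGIN.COM
    $ MOUNT/FOREIGN DUA12:                  ! the spare drive
    $ BACKUP/IMAGE/VERIFY DUA5: DUA12:      ! compressed copy of this one volume
    $ ! Then dismount both, repoint USER1 at _DUA12:, remount, and DUA5:
    $ ! becomes the spare for the next volume's turn.

The per-volume BACKUP/IMAGE copy here is the same operation that Ed Cetron's rotating command file earlier in this thread automates nightly.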
carl@CITHEX.CALTECH.EDU.UUCP (12/07/86)
The single parameter with the greatest impact on disk fragmentation is the disk's storage cluster factor. If you've got mainly applications that generate LARGE files, the default cluster factor of 3 (or for uVMS, 1) is simply too small. Try reinitializing your disks with a larger cluster factor next time you do the compression.
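If you do change the cluster factor, remember that an image restore normally puts the saved volume attributes (including the old cluster factor) back onto the disk. A sketch of one way to do it, with invented device, tape, and label names, and a cluster size chosen purely as an example:

    $ ! Save the volume, reinitialize with a larger storage cluster factor,
    $ ! then restore without letting BACKUP re-apply the old initialization
    $ ! (check the BACKUP documentation for /NOINITIALIZE):
    $ MOUNT/FOREIGN MUA0:
    $ BACKUP/IMAGE/VERIFY DUA3: MUA0:SIMDISK.BCK/SAVE_SET
    $ DISMOUNT DUA3:
    $ INITIALIZE/CLUSTER_SIZE=12 DUA3: SIMDISK
    $ MOUNT/FOREIGN DUA3:
    $ BACKUP/IMAGE/NOINITIALIZE MUA0:SIMDISK.BCK/SAVE_SET DUA3:
    $ ! Finally remount the volume for normal use (e.g. MOUNT/SYSTEM DUA3: SIMDISK).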