[mod.computers.vax] disk compresses

macmillan%wnre.aecl.cdn%ubc.CSNET@RELAY.CS.NET (John MacMillan) (12/01/86)

We have about 12 RA81's in a VAX cluster (8650, 785 and 2 750's). The user
population is around 400. The main programs are large simulations.

Disk fragmentation has become a problem, requiring disk compresses every
two weeks. Weird program behaviour (such as runs taking 3 to 4 times as long)
was observed before one of the compress sessions.

No-one is happy about losing the entire cluster for most of a day (10 hours+)
while the compress goes on (i.e. operators working overtime, users locked out,
the system manager trying to keep costs down, etc.)

Is our situation normal? How often do other sites compress their disks?
What is the experience of other sites? Are there general guidelines to
reducing the frequency of compresses? What is the optimal way to organize
disks? (i.e. number of user disks, system disks, scratch disks, where files
are stored, etc.)

We are about to retire a DECsystem-10 model KL. Ironically, disk compresses
on it were a rarity. No problems. 

So why does the VAX give so much trouble and what is the best way to deal with
it?

Thanks..

John MacMillan
Atomic Energy of Canada
Whiteshell Nuclear Research Establishment
Pinawa, Manitoba, Canada R0E 1L0

(204) 753-2311 x2539

cetron@utah-cs.arpa (Edward J Cetron) (12/02/86)

Given that you have 12 RA81's and a disk compress is needed every 14 days, I
see no reason why the whole cluster must go down.....

My suggestions:

	1. Dedicate 1 RA81 to daily temporary files - announce that every
night at, say, 4:00 am, this disk will be completely flushed....

	2. Every night at 4:02 am, run a script (whoops, wrong OS), i.e. a
batch command file, to do a full BACKUP from one of the 11 other RA81's to
the 12th RA81 (in effect doing a compress), then reverse the process to
restore the first RA81 to a compressed state as well.  (An option is to access
the RA's by logical name only and rotate the logical names so that you need
do only one backup.....)

	3. Have the command file reschedule itself as well as selecting the
next of the other 11 disks.... This will give you 1 compressed disk each day
and the full series in 12 days (slightly better than two weeks), and the
cluster can be available (minus 2 disks at some points) all day.  A rough
sketch of such a command file follows.
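
For concreteness, here is a rough sketch of such a nightly command file.  It
is only a sketch: the device names (DUA3: as tonight's data disk, DUA11: as
the temporary/spare disk), the file name COMPRESS_ONE.COM and the 4:00 am
time are invented, and details such as getting the spare remounted /FOREIGN
and keeping the source disk quiet during the copy are site-dependent - check
the BACKUP and SUBMIT documentation before trusting any of it.

$ ! COMPRESS_ONE.COM - compress one data disk per night with BACKUP/IMAGE.
$ ! All device and file names are examples only.
$ !
$ ! Take the spare/temporary disk away from the users and remount it
$ ! foreign for BACKUP.  (Its nightly flush happens implicitly, since
$ ! the image copy below re-initializes it.)
$ DISMOUNT/CLUSTER DUA11:
$ MOUNT/FOREIGN DUA11:
$ !
$ ! Image-copy tonight's data disk onto it; BACKUP lays the files back
$ ! down in (mostly) contiguous extents - that is the "compress".
$ BACKUP/IMAGE/VERIFY DUA3: DUA11:
$ !
$ ! Either copy it back the same way, or simply swap logical names so
$ ! that the freshly written disk becomes the live one and DUA3: becomes
$ ! tomorrow's spare.
$ !
$ ! Finally, reschedule this procedure for 4:00 tomorrow morning.
$ SUBMIT/AFTER="TOMORROW+04:00" COMPRESS_ONE.COM

The logical-name rotation mentioned in item 2 is what saves the copy-back
step: after the image copy, just redefine the user-visible logical to point
at what used to be the spare.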

-ed cetron
Center for Engineering Design
Univ. of Utah

SEDAYAO@sc.intel.com.UUCP (12/04/86)

I read in DIGITAL REVIEW, November 24, 1986, about a product by Executive
Software, Inc., that supposedly defragments disks in the background, even
while the disk is being used.  The product is called Diskeeper, and costs
$750-$2500 depending on the VAX.  I have never used it myself and cannot
really comment on its actual performance, but the article seems to be very
positive about it.

Jeff Sedayao

CSNET:  sedayao@sc.intel.com
UUCP:  {hplabs,decwrl,oliveb,amdcad}!intelca!mipos3!td3cad!sedayao

Disclaimer - My opinion and not my employers, also not an endorsement

LEICHTER-JERRY@YALE.ARPA (12/04/86)

    We have about 12 RA81's in a VAX cluster (8650, 785 and 2 750's). The user
    population is around 400. The main programs are large simulations.
    
    Disk fragmentation has become a problem, requiring disk compresses every
    two weeks. Weird program behaviour (such as runs taking 3 to 4 times as
    long) was observed before one of the compress sessions.
    
    No-one is happy about losing the entire cluster for most of a day (10
    hours+) while the compress goes on....
    
    Is our situation normal? How often do other sites compress their disks?
    What is the experience of other sites? Are there general guidelines to
    reducing the frequency of compresses? What is the optimal way to organize
    disks? (i.e. number of user disks, system disks, scratch disks, where files
    are stored, etc.)

    ....
Let's step back and look at how VMS disk space allocation works.  On each
disk, there's a bit map, with one bit per cluster, where a cluster is a group
of c consecutive disk blocks.  (You set c, the cluster size, when you
initialize the disk.)  Bits are clear for unused blocks, set for used blocks -
or the other way around; I don't recall, and it doesn't matter for our
purposes.  When an allocation request is made for k blocks, the bit map is
scanned for k/c (rounded up) consecutive 0 bits, and the corresponding blocks
are used.
When blocks are freed, the corresponding bits are cleared.  Note that there is
no record of where the boundaries between previously-allocated groups of
blocks were placed - free segments are implicitly merged.
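
As an aside, the cluster size and free space of a mounted volume are easy to
check.  The device name below is only an example, and the F$GETDVI item names
are from memory, so verify them against HELP before relying on this:

$ SHOW DEVICES/FULL DUA3:        ! look for "Cluster size" and "Free blocks"
$ WRITE SYS$OUTPUT "Cluster size: ", F$GETDVI("DUA3:","CLUSTER")
$ WRITE SYS$OUTPUT "Free blocks:  ", F$GETDVI("DUA3:","FREEBLOCKS")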

(Actually, in a cluster things are a bit more complex.  Each cluster member
keeps its own copy of the bitmap.  The copies must obviously be kept in
agreement.  To avoid overhead, each member pre-allocates some number of
clusters, which it can now use for local requests without having to inform the
other members.  If it needs more blocks than it has pre-allocated, it has to
coordinate with the other cluster members; a process called something like
CACHE_SERVER runs on each cluster member and does this coordination.  If a
disk isn't dismounted properly - if the system crashes, for example - the
pre-allocated blocks are lost.  That's what the rebuild operation during
MOUNT is all about - it scans all the file headers and builds an up-to-date
allocation bitmap.)
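
In DCL terms (device and label names invented for the example) the choice at
mount time looks like this; the rebuild is the default, and /NOREBUILD defers
it at the cost of temporarily losing the pre-allocated space:

$ ! Either let MOUNT rebuild the bitmap (the default; can be slow on a
$ ! large disk that wasn't dismounted cleanly):
$ MOUNT/SYSTEM DUA3: USERDISK
$ ! ...or bring the disk up immediately and accept that the lost
$ ! pre-allocated space stays unavailable until a rebuild is done later:
$ MOUNT/SYSTEM/NOREBUILD DUA3: USERDISK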

If you start with an unfragmented disk, allocate a lot of files, and then
delete them all, you will end up with exactly the same free-space
configuration you started with.  You can only get fragmentation if you
interleave the allocations of two sets of files, A and B, then delete all the
A files while retaining all the B files.  As long as the B files are there,
they will be splitting up - fragmenting - the free space that used to hold
the A files.

I've over-simplified by talking about files.  In fact, files often grow
dynamically.  What matters is not files but file extents.  Suppose I
open two files A and B, then alternately write to each of them, allowing
each to extend until the disk is full.  A and B's extents - contiguous
groups of blocks - will alternate.  If I now delete A, the free space will be
terribly fragmented by all the remaining pieces of B.
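
Incidentally, you can see how many extents a particular file has ended up
with by dumping its file header and counting the retrieval pointers in the
map area.  The file name here is invented, and the exact spelling of the
/BLOCK qualifier should be checked against HELP DUMP:

$ ! One retrieval pointer per extent; many pointers = a badly fragmented file.
$ DUMP/HEADER/BLOCK=COUNT=0 DUA3:[SIMS]RESULTS.DAT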

Tying all this together, we can make a couple of rules to minimize disk
fragmentation:

	1.  Put temporary files and more permanent files on separate disks.
		(Most studies of running systems show that files fall into
		two classes:  Those that stick around for a short time, and
		those that stay essentially forever.  The exact values of
		"short time" and "essentially forever" will depend on the
		particular mix of programs.  You probably have a large popu-
		lation of files that exist as long as typical simulation runs,
		say several hours, plus many files, like source files, that
		stick around for weeks to years.)
	2.  Pre-allocate files to your best estimate of their maximum size.
		This will tend to keep them to a small number of large
		extents.  (It will also improve performance - extending a file
		is an expensive operation.)  A short sketch of rules 2 and 3
		in DCL/FDL terms follows the list.
	3.  For files that you can't pre-extend, but know will grow, use a
		large extension quantity.  (This is the number of blocks by
		which RMS will extend the file when it needs to.  See the RMS
		documentation for more information.)
	4.  Similarly, on disks that will contain mainly large files, use a
		large cluster size.  This will gain performance in almost
		every way at a modest cost in wasted disk space.  (The space
		wasted is, on average, (c-1)/2 blocks per file for cluster size c.
		For disks with a lot of small files, this is likely to be
		a problem; for disks with a small number of large files, it
		is usually irrelevant.)
	5.  Keep disks with a lot of allocation and deletion from filling
		up.  Fragmentation increases rapidly for such disks when
		they fill - a similar phenomenon occurs in hash tables.  An
		active disk that's 95% full will be very fragmented; if the
		same disk were 75% full, the merging of free segments would
		be much, much more effective and the fragmentation wouldn't
		(on average, assuming "random" allocations and deletions, not
		the kind of worst-case examples I gave above) be bad at all.
		As with hash tables, the curve gets pretty steep at large
		"fill factors", so if the disks are quite full, every little
		bit helps.
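
As promised under rule 2, here is roughly what rules 2 and 3 look like in
practice.  Everything in it is illustrative - the file names, the
100000-block allocation, the 10000-block extension and the 500-block RMS
default are numbers I made up, and the minimal FDL relies on defaults for
everything not shown - so see the FDL and RMS documentation before using any
of it.

$ ! Describe a pre-allocated, coarsely-extended file in a small FDL file
$ ! (the unprefixed lines are input to the CREATE command):
$ CREATE BIG.FDL
FILE
        ALLOCATION              100000
        EXTENSION               10000
$ ! Create the (empty but pre-extended) data file from that description:
$ CREATE/FDL=BIG.FDL DUA3:[SIMS]RESULTS.DAT
$ !
$ ! Alternatively, raise the default RMS extension quantity so that
$ ! ordinary programs extend their files in larger chunks:
$ SET RMS_DEFAULT/EXTEND_QUANTITY=500

(If I remember the VAX FORTRAN keywords correctly, the same effect is
available from inside programs via INITIALSIZE and EXTENDSIZE on the OPEN
statement, but the FDL route needs no source changes.)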

You may find that you still have to compress the disks, though perhaps not
as frequently.  If you can afford a spare disk, you can avoid taking the
whole cluster down.  (Considering typical salaries, it doesn't take much
down time to cost as much as an extra disk!)  With a spare disk, you can do
a disk-to-disk backup (and compress) of one disk at a time.  Only those users
who use that particular disk are unable to use the system during the backup,
which is simple and doesn't take long (an hour and a half at most).  That's
what we do here; I got the idea from the system manager of a large cluster at
DEC.

Note that you should be sure to avoid references to particular physical disk
drives; any given volume will migrate around from drive to drive as backups
are done.  This is exactly what concealed device names are for!
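
For what it's worth, the setup is just a MOUNT plus a concealed system-wide
logical.  The names below (USER1, SIM$DISK) are examples only:

$ ! Mount the volume, then give it a concealed, system-wide logical name;
$ ! users and command files refer only to SIM$DISK: and never to DUA3:,
$ ! so the volume can move to another spindle without breaking anything.
$ MOUNT/SYSTEM DUA3: USER1
$ DEFINE/SYSTEM/EXEC/TRANSLATION_ATTRIBUTES=CONCEALED SIM$DISK DUA3:
$ DIRECTORY SIM$DISK:[JOHN]     ! listings show SIM$DISK:[JOHN], not DUA3: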

You can minimize inconvenience by not putting users' default directories
on the same disk with those large simulation files.  Since it will be the
latter disks that need most frequent compression, users would still be able
to log in - they might just not be able to run some programs.  (In any case,
this is part of the policy of separating disks with "permanent" files from
those with "temporary" ones where possible.)

For this policy to be effective, you shouldn't have to compress the system
disk, as that DOES require taking the whole cluster down.  This means that
you should have no active file creation/deletion on the system disk - just
system files.  That's a good idea anyway.

Also, you won't be able to do this if you use volume sets; you'd need as many
free spindles as you have disks in the largest volume set.  This is one
aspect of the most serious liability of volume sets - you have to back up
whole sets at once.
							-- Jerry
-------

carl@CITHEX.CALTECH.EDU.UUCP (12/07/86)

The single parameter with the greatest impact on disk fragmentation is the
disk's storage cluster factor.  If you've got mainly applications that generate
LARGE files, the default cluster factor of 3 (or for uVMS, 1) is simply too
small.  Try reinitializing your disks with a larger cluster factor next time
you do the compression.
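
In case it helps, that amounts to roughly the following.  The cluster size,
device names and save-set location are made up, and whether /NOINITIALIZE is
the right way to preserve the freshly set volume attributes should be checked
against the BACKUP documentation for your VMS version:

$ ! Re-initialize the emptied data disk with a larger cluster factor...
$ INITIALIZE/CLUSTER_SIZE=12 DUA3: SIMDATA
$ MOUNT/FOREIGN DUA3:
$ ! ...then restore its contents; /NOINITIALIZE is intended to keep the
$ ! volume attributes just set rather than copying the old ones back.
$ BACKUP/IMAGE/NOINITIALIZE/VERIFY DUA11:[SAVESETS]SIMDATA.BCK/SAVE_SET DUA3: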