[mod.computers.vax] Disk fragmentation

GWD21T@DGOGWD01.BITNET.UUCP (05/21/86)

I have observed the disk fragmentation produced by XQP (VMS V4.3)
for some time and got the impression that it is unnecessarily high.
For instance the XQP appears to be distributing small files all
over the volume, splitting large free "extents" into ever smaller
fragments. I think that "extent caching" could be the culprit.

Question: Has anyone successfully "tuned" his system to minimize
disk fragmentation? Does it help to turn off extent caching?

Thanks in advance for any hints.
                                        W.J.M.


J.W.Moeller, GWDG, D-3400 Goettingen, F.R.Germany       0551-201516
                                                   GWD21T@DGOGWD01.BITNET

Magill@upenn.CSNET (CETS Operations Manager) (05/24/86)

-----------------------------------------------------------------
I have observed the disk fragmentation produced by XQP (VMS V4.3)
for some time and got the impression that it is unnecessarily high.
For instance the XQP appears to be distributing small files all
over the volume, splitting large free "extents" into ever smaller
fragments. I think that "extent caching" could be the culprit.

Question: Has anyone successfully "tuned" his system to minimize
disk fragmentation? Does it help to turn off extent caching?
----------------------------------------------------------------
I am not conversant with the details of the XQP; however, here are some general
observations about the fragmentation problem. Assuming that you have a specific
production problem as opposed to general timesharing problems, read the Guide to
VAX/VMS File Applications for specific recommendations; there is an entire
section on file tuning. I suspect that clustering, rather than "extent caching",
is what you really need to modify. A related area is the SET RMS_DEFAULT command.
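
For example, something along these lines raises the default extension quantity
so that RMS extends files in larger, less fragmenting chunks. The value of 100
blocks is only an illustration; check SHOW RMS_DEFAULT and the DCL dictionary
for your version before changing anything system-wide:

    $ SHOW RMS_DEFAULT                              ! current defaults
    $ SET RMS_DEFAULT/EXTEND_QUANTITY=100/SYSTEM    ! default extend, in blocks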

For a detailed discussion of file-system caching, read the description of the
MONITOR FILE_SYSTEM_CACHE command in the Utility manual. Caching is a mechanism
for optimizing disk reads (and writes), not the cause of fragmentation.
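
For instance, you can watch the XQP caches and look at the extent cache
parameters as below. The parameter names are as I recall them from the V4
SYSGEN parameter list; verify them against your documentation. Setting
ACP_EXTCACHE to 0 is what "turning off extent caching" amounts to, but
measure the hit rates first:

    $ MONITOR FILE_SYSTEM_CACHE     ! hit rates on the XQP caches
    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> SHOW ACP_EXTCACHE       ! number of extent cache entries (0 = none)
    SYSGEN> SHOW ACP_EXTLIMIT       ! limit on free space the cache may hold
    SYSGEN> EXIT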

Disk fragmentation is a very relative thing. To some operating systems or
programs it is important; on other systems, minimizing disk fragmentation is
essentially a waste of time and effort. File fragmentation, on the other hand,
is typically only a concern when one has large files in many extents which are
accessed frequently by many people, as in a database. A single file accessed
by a single person once in a while might "load faster", but unless it is the
"black book" of the company president, you cannot justify expending even one
hour of your time as a system administrator to save a few seconds a month.
The size at which a file becomes a concern is also relative to the capacity
and performance of the drive (seek times, rotational speed, existence of fixed
heads or multiple arms or actuators). A large file on an RM03 would be
considered a small file on an RA81, for example.

Fragmentation really only exists if you have single files with multiple extents.
Files spread randomly over the disk are not fragmentation, and will only lead to
fragmentation if you need to create a file larger than the largest contiguous
free space. Similarly, when is a file fragmented? At two extents, 10, 20, or
over 50? It depends upon your environment. Maximum contiguous free space is not
particularly important unless you are running something like an Intergraph
system, which requires that large files be allocated contiguously.
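
To see how many extents a particular file actually has, dump its header and
count the retrieval pointers in the map area; a badly split file can then be
copied back into one piece. The file names below are only examples, and the
/BLOCKS=COUNT:0 trick to suppress the data blocks is from memory; check the
DUMP description in the Utility manual:

    $ DUMP/HEADER/BLOCKS=COUNT:0 DUA1:[DATA]BIG.DAT  ! one map pointer per extent
    $ COPY/CONTIGUOUS BIG.DAT BIG.DAT                ! new version, contiguous
                                                     ! if free space permits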

As in all system tuning situations, you must know your users and your workload.
Many times simply changing the cluster factor will have far more effect than
anything else you can do. For example, if your average file is 3000 bytes, then
a cluster factor of 6 or 7 would make far more sense than one of 2 or 3.
(See the INITIALIZE command and the Guide to VAX/VMS File Applications for
details.)
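
The arithmetic: 3000 bytes is about six 512-byte blocks, so with a cluster size
of 6 the average file fits in a single cluster and therefore a single extent.
The cluster size can only be set when the volume is initialized, so changing it
means a BACKUP, re-INITIALIZE, and restore. The device and label below are just
placeholders:

    $ INITIALIZE/CLUSTER_SIZE=6 DUA1: USERDISK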

Another major concern is how the files are created. Are they created once at
their maximum size, or are they created and then extended frequently?
The first case will only fragment if your cluster size is too small, or if they
are in fact large files being created into small contiguous free spaces. The
latter case is almost always guaranteed to add another extent every time the
file is extended, unless one preallocates space to the file when it is created.
This, however, can leave a great deal of space allocated but unused.
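
One way to preallocate is with an FDL description and CREATE/FDL. The attribute
names below are the ones I remember from the File Applications guide, and the
sizes are purely illustrative; EDIT/FDL will build such a file for you:

    FILE
            ALLOCATION              12000   ! preallocate full expected size
            BEST_TRY_CONTIGUOUS     yes
            EXTENSION               600     ! generous extend if it grows anyway
    RECORD
            FORMAT                  variable

    $ CREATE/FDL=BIG.FDL BIG.DAT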

Under most operating systems the goal is to distribute "small" files all over
a physical volume, thereby optimizing seek times. It is assumed that a random
placement of randomly sized files accessed randomly is preferable to a
sequential placement of those files; the randomness means that the heads will
travel a shorter distance more often than a longer one (Murphy ignored).
It is also assumed that on a true interactive system files come and go
frequently, and therefore randomness is assured.

Most disk file placement algorithms are predicated on a volume staying at
(and therefore being used at) approximately 40-60% utilization. These algorithms
lose effectiveness on very lightly loaded volumes (10-15% utilization) or
on volumes with very little free space (85-100% utilization). Also, a volume
with low file volatility (creation and deletion activity) will be less optimal
than one with high file volatility.

If you have "permanent" files, i.e. relatively large files (several cylinders'
worth) which are accessed frequently, especially by many users (like a
swap file or paging file), these should be placed a) on fixed-head devices,
b) "surrounding" the central cylinder, or c) on different physical volumes.
You build out from this central file with your next most heavily used file,
and so on. These are typically files which occur only on a production system
and only rarely (as in databases) on a true interactive system.
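
If you do decide to put a page or swap file on another spindle, a secondary
file can be created with SYSGEN and installed at boot time from
SYS$MANAGER:SYPAGSWPFILES.COM. The device, directory, and size below are only
examples; see the system management documentation for the exact procedure on
your version:

    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> CREATE DUA2:[SYSEXE]PAGEFILE1.SYS /SIZE=20000  ! contiguous if possible
    SYSGEN> EXIT
    $ ! then, in SYS$MANAGER:SYPAGSWPFILES.COM, so it is installed at boot:
    $ RUN SYS$SYSTEM:SYSGEN
    INSTALL DUA2:[SYSEXE]PAGEFILE1.SYS /PAGEFILE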