info-vax@ucbvax.ARPA (02/19/85)
From: saether%lassie.DEC@decwrl.ARPA

This problem sounds a lot like something I discovered, very much to my chagrin, a few weeks ago. If so, it is related to files with lots of extension headers, which a 60000 block file made up mostly of 5 to 10 block extents would have.

The problem arises when a given file has more extension headers than there are header buffers in the file system cache at the time. The total number of file header buffers in the cache is controlled by the sysgen parameter ACP_HDRCACHE; how many of them are actually available at any given moment depends on how many processes are trying to use how many buffers at the same time. For example, if two processes are simultaneously accessing two different files with 10 extension headers each, they could use 20 buffers from the file header pool (ACP_HDRCACHE) if that many were available. If only 10 buffers were in the pool, they are supposed to toss out some of the buffers they hold as they traipse through the extension headers. Alas, alack, when that happens they go awry and cause strange things to happen, some of which you have observed, I suspect.

You've probably already guessed that the workaround is to give the file system enough file header buffers that it doesn't get into trouble. You can discover how many headers a file has by doing a $ DUMP/HEADER of the file and counting how many headers go by before data starts coming out. Then multiply by at least 2 or 3 to get a reasonable size for ACP_HDRCACHE.

The last thing to note is that as of VMS V4, file system buffers are allocated from paged pool, so it needs to be sized appropriately. Assuming all disks use the default system disk cache, add up ACP_MAPCACHE, ACP_DIRCACHE, ACP_HDRCACHE, and ACP_DINDXCACHE (all in blocks), add about 10 percent for cache overhead, and multiply by 512 to get the number of bytes of paged pool the cache will use. AUTOGEN should know how to do this if you use it to modify the parameters.

Given that you say the problem is reproducible, I suspect you have ended up with a reduced cache. When the requested amount of paged pool for the buffers (the ACP_xxx parameters added up) is not available, the volume is mounted instead with a minimal cache of only 14 buffers total (6 of them headers); there is usually enough pool for that. If this occurs as the result of specifying a new cache with the /PROCESSOR=UNIQUE qualifier, you will get a REDCACHE (volume mounted with reduced cache) warning from MOUNT. I don't believe you get any message if it happens while mounting the system disk (and allocating the default file system cache), however. The way to know for sure is to do a $ SHOW DEVICE /FULL; in the lower right hand corner is a line to the effect of "Maximum buffers in FCP cache" - if the value is 14, that is what happened.

I hope this helps you get around the problem, and maybe helps others avoid it. This is not a formal commitment, but the fix for this should be in V4.2.

Christian Saether - VMS V4 distributed file system project leader/developer.
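P.S. For anyone who wants the paged pool arithmetic spelled out, here it is as a little DCL calculation. The parameter values below are made-up numbers purely for illustration - substitute whatever your SYSGEN settings actually are.

    $ ACP_MAPCACHE   = 50    ! blocks
    $ ACP_DIRCACHE   = 80    ! blocks
    $ ACP_HDRCACHE   = 300   ! blocks - bumped up for files with many extension headers
    $ ACP_DINDXCACHE = 25    ! blocks
    $ BLOCKS = ACP_MAPCACHE + ACP_DIRCACHE + ACP_HDRCACHE + ACP_DINDXCACHE
    $ BYTES  = (BLOCKS + (BLOCKS / 10)) * 512   ! about 10% overhead, 512 bytes per block
    $ SHOW SYMBOL BYTES                         ! 256000 bytes of paged pool with these numbers

That total is what the default cache will want out of paged pool (PAGEDYN); if you let AUTOGEN set the parameters it does this same bookkeeping for you.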
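P.P.S. If you want to raise ACP_HDRCACHE by hand rather than through AUTOGEN, the usual SYSGEN sequence is below. The value 300 is just a for-instance, not a recommendation - take it from the DUMP/HEADER count times 2 or 3 as described above.

    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> USE CURRENT
    SYSGEN> SET ACP_HDRCACHE 300
    SYSGEN> WRITE CURRENT
    SYSGEN> EXIT

Keep in mind the cache is carved out of paged pool when a volume is mounted, so an already-mounted volume won't see the bigger value until it is remounted, and for the default system disk cache that means a reboot.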
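P.P.P.S. A quick way to tell whether a particular volume got stuck with the minimal 14 buffer cache, and to give it a private cache of its own, goes something like this (DUA1: and the label USERPACK are invented names - use your own):

    $ SHOW DEVICE/FULL DUA1:
    $ !  Look for "Maximum buffers in FCP cache" near the bottom of the display;
    $ !  a value of 14 means the volume came up with the reduced cache.
    $ DISMOUNT DUA1:
    $ MOUNT/SYSTEM/PROCESSOR=UNIQUE DUA1: USERPACK
    $ !  A REDCACHE warning here means there still wasn't enough paged pool
    $ !  for the cache sizes you asked for.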