info-vax@ucbvax.ARPA (02/19/85)
From: saether%lassie.DEC@decwrl.ARPA

This problem sounds a lot like something I discovered, very much to my chagrin, a few weeks ago. If so, it is related to files with lots of extension headers, which a 60000 block file made up mostly of 5 to 10 block extents would have.

The problem arises when a given file has more extension headers than there are header buffers in the file system cache at the time. The total number of file header buffers in the cache is controlled by the sysgen parameter ACP_HDRCACHE; how many of them are actually available at any given moment depends on how many processes are trying to use how many buffers at the same time. For example, if two processes are simultaneously accessing two different files with 10 extension headers each, they could use 20 buffers from the file header pool (ACP_HDRCACHE) if that many were available. If only 10 buffers were in the pool, they are supposed to toss out some of the buffers they hold as they traipse through the extension headers. Alas, alack, when that happens they go awry and cause strange things to happen, some of which you have observed, I suspect.

You've probably already guessed that the workaround is to give the file system enough file header buffers that it doesn't get into trouble. You can discover how many headers a file has by doing a $ DUMP/HEADER of the file and counting how many headers go by before data starts coming out. Then multiply by at least 2 or 3 to get a reasonable size for ACP_HDRCACHE.

The last thing to note is that as of VMS V4, file system buffers are allocated from paged pool, so it needs to be sized appropriately. Assuming all disks use the default system disk cache, add up ACP_MAPCACHE, ACP_DIRCACHE, ACP_HDRCACHE, and ACP_DINDXCACHE (all in blocks), add about 10 percent for cache overhead, and multiply by 512 to get the number of bytes of paged pool the cache will use. AUTOGEN should know how to do this if you use it to modify the parameters.

Given that you say the problem is reproducible, I suspect you have ended up with a reduced cache. When the requested amount of paged pool for the buffers (the ACP_xxx parameters added up) is not available, the volume is mounted instead with a minimal cache of only 14 buffers total (6 of them headers); there is usually enough pool for that. If this occurs as the result of specifying a new cache with the /PROCESSOR=UNIQUE qualifier, you will get a REDCACHE (volume mounted with reduced cache) warning from MOUNT. I don't believe you get any message if it happens while mounting the system disk (and allocating the default file system cache), however. The way to know for sure is to do a $ SHOW DEVICE /FULL; in the lower right hand corner is a line to the effect of "Maximum buffers in FCP cache" - if the value is 14, that is what happened.

I hope this helps you get around the problem, and maybe helps others avoid it. This is not a formal commitment, but the fix for this should be in V4.2.

Christian Saether - VMS V4 distributed file system project leader/developer.
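P.S. For anyone who wants the paged pool arithmetic spelled out, here it is as a little DCL calculation. The parameter values below are made-up numbers purely for illustration - substitute whatever your SYSGEN settings actually are.

    $ ACP_MAPCACHE   = 50    ! blocks
    $ ACP_DIRCACHE   = 80    ! blocks
    $ ACP_HDRCACHE   = 300   ! blocks - bumped up for files with many extension headers
    $ ACP_DINDXCACHE = 25    ! blocks
    $ BLOCKS = ACP_MAPCACHE + ACP_DIRCACHE + ACP_HDRCACHE + ACP_DINDXCACHE
    $ BYTES  = (BLOCKS + (BLOCKS / 10)) * 512   ! about 10% overhead, 512 bytes per block
    $ SHOW SYMBOL BYTES                         ! 256000 bytes of paged pool with these numbers

That total is what the default cache will want out of paged pool (PAGEDYN); if you let AUTOGEN set the parameters it does this same bookkeeping for you.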
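P.P.S. If you want to raise ACP_HDRCACHE by hand rather than through AUTOGEN, the usual SYSGEN sequence is below. The value 300 is just a for-instance, not a recommendation - take it from the DUMP/HEADER count times 2 or 3 as described above.

    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> USE CURRENT
    SYSGEN> SET ACP_HDRCACHE 300
    SYSGEN> WRITE CURRENT
    SYSGEN> EXIT

Keep in mind the cache is carved out of paged pool when a volume is mounted, so an already-mounted volume won't see the bigger value until it is remounted, and for the default system disk cache that means a reboot.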
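P.P.P.S. A quick way to tell whether a particular volume got stuck with the minimal 14 buffer cache, and to give it a private cache of its own, goes something like this (DUA1: and the label USERPACK are invented names - use your own):

    $ SHOW DEVICE/FULL DUA1:
    $ !  Look for "Maximum buffers in FCP cache" near the bottom of the display;
    $ !  a value of 14 means the volume came up with the reduced cache.
    $ DISMOUNT DUA1:
    $ MOUNT/SYSTEM/PROCESSOR=UNIQUE DUA1: USERPACK
    $ !  A REDCACHE warning here means there still wasn't enough paged pool
    $ !  for the cache sizes you asked for.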