[mod.computers.vax] Dual-ported disk behavior

ESMP09@SLACTWGM.BITNET.UUCP (04/05/87)

  We have been using dual-ported (actually, multi-ported) disks
with VMS with reasonable success, but not without a few
problems.  These are disks on an SI9900 controller.  Each
disk is mounted read/write from a single CPU, and read-only from
the rest.  This is NOT part of a VAX cluster, nor are we using
any special software such as SILINK.

  My friend has a dual-ported DEC RP07, and is interested in using it
in a similar mode.  I mentioned some of the problems we have observed
(described below), and suggested to him that the RP07 dual-port
arrangement would probably behave similarly.  My friend called the DEC
software support center, described how he intended to use the
disk, and asked if there were any special problems to be expected
with this mode of operation.  The answer was that this was a
supported mode of operation, that there should be no problems--
simply mount the disk /WRITE from one CPU, /NOWRITE from the other;
no need to even turn off caching.  (DEC may also have stated that
SET DEVICE/DUAL_PORT should be used; presumably this is necessary.)

  I would be interested in hearing of the experience of others
who have tried using disks in this manner, especially DEC RP07s
and disks on an SI9900 controller.  (Sites running a VAXcluster or
SILINK are, I think, irrelevant to this question.)  The problems which
we have seen with disks on the multi-ported SI controller when mounted
in this manner are described below.  We are currently running VMS 4.5.

Problem 1:  Expansion of INDEXF.SYS makes some files inaccessible
---------
  If the number of files stored on the disk increases beyond the
  number which can be accommodated by INDEXF.SYS, that file will
  be extended.  However, any file whose header is located in this
  expansion area will be inaccessible ('file not found') from a
  read-only CPU until that CPU dismounts and remounts the disk.
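
  To make the symptom concrete, here is a toy model in Python.  It
  ASSUMES (I have not verified this) that a read-only CPU remembers
  the size of INDEXF.SYS as of mount time and rejects any header
  beyond it.  The names Volume, ReadOnlyMount, header_slots and
  known_slots are all invented for illustration; none of this is
  real VMS or XQP code.

```python
class Volume:
    """Toy stand-in for a disk whose INDEXF.SYS holds a fixed number of headers."""

    def __init__(self, header_slots):
        self.header_slots = header_slots  # current capacity of the index file
        self.headers = {}                 # slot number -> file name

    def create(self, name):
        """Create a file from the read/write CPU, extending the index file if full."""
        if len(self.headers) >= self.header_slots:
            self.header_slots *= 2        # growth amount is arbitrary in this model
        slot = len(self.headers)
        self.headers[slot] = name
        return slot


class ReadOnlyMount:
    """Toy read-only CPU: snapshots the index file size once, at mount time."""

    def __init__(self, volume):
        self.volume = volume
        self.known_slots = volume.header_slots  # assumed never refreshed

    def lookup(self, slot):
        if slot >= self.known_slots:
            raise FileNotFoundError(f"header {slot} is beyond the known index file")
        return self.volume.headers[slot]


vol = Volume(header_slots=2)
ro = ReadOnlyMount(vol)
slots = [vol.create(n) for n in ("A.DAT", "B.DAT", "C.DAT")]  # third one extends

print(ro.lookup(slots[1]))                  # header inside the original area
try:
    ro.lookup(slots[2])                     # header in the expansion area
except FileNotFoundError as err:
    print("file not found:", err)
print(ReadOnlyMount(vol).lookup(slots[2]))  # a fresh mount sees the new size
```

  In this model the only cure is a fresh ReadOnlyMount, which matches
  the dismount/remount we need in practice.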

Problem 2:  Mount verification failures
---------
  If a disk goes into 'mount verification' for some reason, a CPU will
  often fail to remount the disk, the reason reported for the failure
  being 'wrong volume'.  I believe I understand why this is so, although
  I have not verified all of these details.  Apparently when a disk is
  mounted read/write, the date/time of mounting is recorded both on the
  disk itself and in memory.  Should the disk go into the mount-
  verification state and a remount be attempted, it is required that
  not only the disk label but also this date/time match between what
  is on the disk and what is in memory--if both do not match, the
  disk is deemed to be the 'wrong volume'.  If a disk is mounted
  read-only, the date/time which is currently on the disk is recorded
  in memory for later comparison should a mount verification be necessary.
  With this background, it can be seen that there is a potential for a CPU
  with read-only access to fail mount verification.  Consider two scenarios:

   Scenario 1 (successful):
      At T0: CPU A mounts DISKX read/write, marking it with time T0
      At T1: CPU B mounts DISKX read-only, noting its mount time of T0
      At T2: DISKX goes into mount verification -- both CPUs successfully
        recover the disk, as they both agree with its recorded mount time
        of T0.

   Scenario 2 (unsuccessful):
      At T0: CPU B mounts DISKX read-only, noting its mount time (a time <T0)
      At T1: CPU A mounts DISKX read/write, marking it with time T1
      At T2: DISKX goes into mount verification -- CPU A is successful,
        as the time T1 matches its expectation, but CPU B fails mount
        verification--the time recorded on the disk (T1) is not the same
        as what was there (<T0) when it mounted the disk, hence it is
        deemed to be the 'wrong volume'.
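
  The two scenarios can be replayed with a small Python sketch.  It
  models only my understanding as described above (label plus mount
  time must match), which I have not fully verified.  The names Disk,
  Cpu, mount_verify and scenario are invented for illustration, and
  the times are plain integers rather than real VMS timestamps.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    label: str
    mount_time: int  # time written to the volume by the last read/write mount


@dataclass
class Cpu:
    name: str
    label: str = ""
    remembered_time: int = -1

    def mount(self, disk, now, writable):
        # A read/write mount stamps the disk; a read-only mount only reads the stamp.
        if writable:
            disk.mount_time = now
        self.label = disk.label
        self.remembered_time = disk.mount_time

    def mount_verify(self, disk):
        # Both the label and the mount time must match what is held in memory.
        return disk.label == self.label and disk.mount_time == self.remembered_time


def scenario(order):
    """Mount in the given order (one time step each), then run mount verification."""
    disk = Disk("DISKX", mount_time=-100)  # stamp left by some earlier mount (< T0)
    cpus = {"A": Cpu("A"), "B": Cpu("B")}
    for now, (name, writable) in enumerate(order):
        cpus[name].mount(disk, now, writable)
    return {name: cpu.mount_verify(disk) for name, cpu in cpus.items()}


scenario_1 = scenario([("A", True), ("B", False)])  # read/write CPU mounts first
scenario_2 = scenario([("B", False), ("A", True)])  # read-only CPU mounts first
print(scenario_1)  # {'A': True, 'B': True}
print(scenario_2)  # {'A': True, 'B': False}
```

  If this model is right, what matters is the mount order: a
  read-only CPU is safe only if it mounts after the read/write CPU
  has stamped the disk.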

Problem 3:  XQPERR bugcheck crashes from read-only CPU
---------
  The nature of this problem is less certain than the first two,
  and we cannot reliably reproduce it.  From discussions with DEC
  software support, a TENTATIVE interpretation is as follows:  the
  read/write CPU has caused the INDEXF.SYS file to expand;  the
  read-only CPU gets two conflicting pieces of information regarding
  the length of INDEXF.SYS; taken at face value, these conflicting
  values imply that INDEXF.SYS has DECREASED in length, a logical
  impossibility within the rules of the file system, and hence the
  XQP bails out with a bugcheck (XQPERR).  We've seen this problem
  twice, both since making the change to VMS 4.5 (with no time spent
  at 4.4).


  Clearly, there is a 'work-around' for Problems 1 and 3:  make
sure that INDEXF.SYS is large enough so that there is never any need
to expand it.  For the remaining problem (mount verification), I know of
no reasonable work-around; the best we could do was to try to reduce the
likelihood of events which lead to the need for mount verification.

                                Ed Miller
                                ESMP09@SLACTWGM.BITNET