[mod.computers.vax] A small horror story

JOHNSON@nuhub.acs.northeastern.edu ("I am only an egg.") (01/09/87)

     Following is an outline of a problem I recently encountered here at 
Northeastern University.  It points up the fact that I am possibly too 
trusting in nature.  It also shows the level of competence of DEC. I
thought I would let everyone in on it. 

               On December 30, 1986, Northeastern University did updates
          to its  single  node  cluster  VAX  8650  VMS  system.   The
          products that were updated include the following:
           
               o ... FORTRAN compiler
               o ... PASCAL compiler
               o ... COBOL compiler
               o ... CDD
               o ... TDMS
               o ... DATATRIEVE
           
               These products  were  installed  using  the procedures
          outlined  in  their  various  installation  notes  and cover
          letters.  The  installations  proceeded successfully.  These
          installations  were  all  performed  by  Chris  Johnson,   a
          Northeastern staff member.
           
               On December  31  a  complaint  was  made  by  a user to
          Northeastern's Academic  Computer  Services  that the PASCAL
          compiler didn't work.   The error given concerned an invalid
          DCL table CLD entry.   Mr.  Johnson called the  DEC Customer
          Support Center (DCSC)  and,  as a start, was referred to the
          PASCAL team.
           
               The  DCSC  software   engineer   in  the  PASCAL  group
          determined that the problem  was  of  a system type and that
          the VMS team should be called.
           
               On January 5, Mr.  Johnson  again called DCSC and asked
          for the VMS team.  The  error  was described to the DCSC VMS
          software engineer.   The  problem  was  determined to be the
          installation   of   a   SYS$SPECIFIC:[SYSLIB]   version   of
          DCLTABLES.EXE rather than the SYS$COMMON:[SYSLIB] version of
          it.  It seems  that  layered product installation procedures
          determine whether or not they are on a cluster node and,  if
          they are,  they change  the  SYS$COMMON:[SYSLIB]  version of
          DCLTABLES.EXE.  For a reason as yet unknown to both DCSC and
          Northeastern  there   was   a   copy   of  DCLTABLES.EXE  in
          SYS$SPECIFIC:[SYSLIB].  When a  cluster  node is booted, the
          SYS$LIBRARY     logical      name      is     equated     to
          SYS$SPECIFIC:,SYS$COMMON: IN THAT ORDER.   Thus, if there is
          a DCLTABLES.EXE in SYS$SPECIFIC (as  there was in this case)
          then  it  will  get  installed   on  boot  EVEN  THOUGH  THE
          SYS$COMMON version of  DCLTABLES.EXE  was  the  one that was
          updated by the layered product installation procedures.
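
The search-order effect described above can be sketched in a few lines
of Python.  This is only a toy model, not VMS code: the resolve()
function and the directory contents are hypothetical, and the only
thing taken from the account above is the order of the search list.

```python
# Hypothetical model of a search-list logical name: directories are
# tried in order and the first one that holds the file wins.

def resolve(search_list, directories, filename):
    """Return the first directory in search_list that contains filename."""
    for directory in search_list:
        if filename in directories.get(directory, set()):
            return directory
    return None  # not found anywhere in the list


# SYS$LIBRARY as described above: SYS$SPECIFIC first, then SYS$COMMON.
SYS_LIBRARY = ["SYS$SPECIFIC:[SYSLIB]", "SYS$COMMON:[SYSLIB]"]

# Illustrative directory contents only.
directories = {
    "SYS$SPECIFIC:[SYSLIB]": {"DCLTABLES.EXE"},  # stray, un-updated copy
    "SYS$COMMON:[SYSLIB]": {"DCLTABLES.EXE"},    # copy the installs updated
}

found = resolve(SYS_LIBRARY, directories, "DCLTABLES.EXE")
print(found)  # SYS$SPECIFIC:[SYSLIB]
```

In this model the updated copy is only found once the stray copy is
gone from the first directory in the list, which matches the rename
that was eventually needed here.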
           
               When asked why there  were multiple copies of DCLTABLES
          in multiple directories, the DCSC engineer was able to offer
          no explanation other  than  "We don't know, it  just happens
          sometimes."  To be kind, this answer  was unhelpful.  To be
          truthful, this answer points up a dreadful lack of knowledge
          on the part  of  DEC  and  DEC's support  staff of their own
          installation procedures.
           
               As  a  solution,  the  DCSC  engineer  had  Mr. Johnson
          install, using the  INSTALL utility, the SYS$COMMON:[SYSLIB]
          version of DCLTABLES.EXE  and  then  delete,  again with the
          INSTALL  utility,   the   SYS$SPECIFIC:[SYSLIB]  version  of
          DCLTABLES.EXE.  This was  very  shortly  determined to be an
          ill-advised  procedure.  The  SYS$SPECIFIC  version  was  of
          course still in use and a  delete pending flag was raised on
          its global sections.   This,  in turn, prevented anyone else
          from logging on to the system  since a global section with a
          delete pending flag is effectively not usable. 
           
               To solve this new  problem,  caused by implementing the
          above DCSC-advised  procedure,  the  SYS$SPECIFIC version of
          DCLTABLES.EXE had to be  renamed  and  the  system had to be
          rebooted IMMEDIATELY, causing inconvenience  to users logged
          on at the time.
           
               This delete pending problem  is  one  of two such.  The
          other is with  dismounting  disks  that  have  open files on
          them.  A REVERSE procedure IS NECESSARY for these two delete
          pending cases.   In  this  way  mistakes  made by operators,
          systems  people  AND   DCSC   would   not   REQUIRE  A  VERY
          INCONVENIENT system reboot. 

Chris Johnson
(more cynical than ever)
Northeastern University

McGuire_Ed@GRINNELL.MAILNET.UUCP (01/15/87)

>It is always bad policy to delete an installed image file. . . .  You
>can only safely delete the file if its global sections no longer show
>up as delete pending in an $ install list/glob.

As long as the file was installed /OPEN (which it was if it was
installed /SHARE) it shouldn't matter if you delete the disk file.  The
file ought to be flagged for deletion but not actually deleted until it
is closed, i.e. until the delete pending section is released by all
processes.
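
A rough Python sketch of the deferred-delete behavior I am assuming
follows.  It is a simplified model, not VMS internals: the
InstalledImage class and its method names are hypothetical.  The
refusal of new opens while the delete is pending is an added
assumption, meant to mirror the blocked logins reported in the original
post.

```python
class InstalledImage:
    """Toy model of a disk file held open by the processes that use it."""

    def __init__(self, name):
        self.name = name
        self.open_count = 0          # accessors currently holding the file
        self.delete_pending = False  # set when deleted while still open
        self.on_disk = True

    def open(self):
        # Assumption: a file marked for deletion accepts no new accessors.
        if not self.on_disk or self.delete_pending:
            raise FileNotFoundError(self.name)
        self.open_count += 1

    def close(self):
        if self.open_count == 0:
            raise RuntimeError("close without matching open")
        self.open_count -= 1
        # The deferred delete happens when the last accessor lets go.
        if self.open_count == 0 and self.delete_pending:
            self.on_disk = False

    def delete(self):
        if self.open_count > 0:
            self.delete_pending = True  # flag only; current users carry on
        else:
            self.on_disk = False


image = InstalledImage("DCLTABLES.EXE")
image.open()                                # a process already using the file
image.delete()                              # flagged, not yet removed
print(image.on_disk, image.delete_pending)  # True True
image.close()                               # last user releases it
print(image.on_disk)                        # False
```

Under this model existing users are unaffected by the delete, while
anyone arriving afterwards is turned away until the file is replaced.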

This has been my assumption so far, and nothing has broken as a
consequence.  Please set me straight if anyone has proof that deleting a
disk file corrupts the shared sections.