[comp.sys.apollo] sr10.1 Install on Stand-Alone Nodes

rchrd@well.UUCP (Richard Friedman) (07/27/89)

The following is a copy of an APR regarding the system load/install
procedure for sr10.1 on stand-alone nodes.  This APR was issued as
a request for improvement ("enhancement").


  The installation procedure for DOMAIN 10.1 on stand-alone (disked) nodes
is seriously flawed.  Anyone attempting to install the operating system
on a node that has no partner is faced with some very anxious hours.  In
my recent experiences with 10.1, it took many re-attempts before the system
was properly installed.  The following is offered to the team at Apollo
that deals with the installation procedures as a friendly attempt to make
them more aware of the particular problems facing the owner/user of a
stand-alone node in the hope they can be improved in later releases.

  A number of issues are addressed in this APR, and rather than send
separate APRs, I have bundled them together into one.

  First, let me say that my experiences over the past weeks trying to
get 10.1 installed on a stand-alone DN3000 lead me to think that no one
at Apollo ever tried to do such an installation, and that the procedures
reflect the installation team's lack of experience in this area.

  The situations typically facing the administrator of stand-alone nodes
are:
     1. First-time installation of 10.1 from a 9.x system.  (This will
        require an INVOL of the disk.)

     2. Hardware problem with disk requires INVOL of disk (or, as in my
        recent case, replacement with new disk).

  Because there is no network and no partner node, the only source
of software is the system release materials.

  I should also mention that it takes about 4 hours to install DOMAIN
and the FTN and CC compilers from cartridge tapes on a DN3000, starting
with a format/write/read-back INVOL on a 170Mb disk.  (That's a long time!)





[1]   No Diagnostics on Boot Tape

Consider the following: Disk crashes repeatedly, due to real hardware
problem.  Crashes have made mince-meat out of system on disk.  It won't
boot properly, nor will diagnostics in /sau8 run.  Only recourse is to
INVOL disk and re-install system from cartridges.  Would like to be able
to run diagnostics before attempting the install in order to determine
cause of problem.  However, the boot system (on tape) does not include
any diagnostics, only INVOL, SALVOL, CALENDAR, and DOMAIN.  This means
the only way diagnostics can be run after the disk is INVOL'd is by first
doing a complete install of the operating system, which may fail due to
the hardware problem.

Suggestion:  Include, along with system distribution tapes, a diagnostic
tape.  (Obviously, this tape would have to include all the sau's.)
An initialization procedure could prompt the user for configuration.
This tape must be bootable as a self-contained system, making no
assumptions about what is already on the disk.


[2]   Need System Backup/Restore Procedure

On a stand-alone node, should disk problems occur requiring an INVOL
or disk replacement, there is no way to restore the system except by
doing a complete re-installation.  Doing this is not only an extreme
waste of time (it may take more than one re-install to get it right),
but it also loses any customizations of the running system that the
system administrator may have done since the system was last installed.
Doing a re-install of the system from the installation media is not
something an administrator is likely to look forward to doing.

Suggestion:  Provide a system backup/restore procedure.  Once the system
is installed and customized, a backup system can be produced as
a bootable tape.  Booting from this tape regenerates the system on the
disk.  Later, the administrator can reload his user data backup tapes.

Having a system backup procedure is preferred to merely saving the
MINST configuration files to control the system installation.
The administrator has no idea how MINST works, and shouldn't need to.
The minimal DOMAIN system that comes up from the distributed boot system
immediately puts the user into MINST.  To do something extra to read in the
saved config files is too much to ask!



[3]  Streamline the Install Procedure for Stand-Alone Nodes

The "Authorized Area" concept is ridiculous for stand-alone nodes.
There is no reason why the install procedure must be partitioned into
load and install operations.  They should be combined into one operation
on stand-alone nodes.  MINST should ask at the start if this is an install
on a stand-alone node.  Then it need only ask what system options are
to be selected, and then load/install accordingly.  There is no reason to
load into the /install directory and make links.  The load should be
directly to the proper system directories.

Also, only the selected parts of the system need be loaded.  With the
current MINST, even though I have selected only SAU8, it loads ALL the
sau's, but installs only sau8 into the root directory with links.
This is true for the other parts of the system I deselect: they get loaded
into /install, but the links are not made.  To conserve disk space after
the install I must (very carefully!) delete items from /install.
This should not be necessary!  Only the installation tools should be loaded
into /install on a stand-alone node load/install.  The system should
be installed in its proper place.  After installation I should feel
confident that    rm -r /install   will not get me into trouble, and will
recover disk space successfully.
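
To make the disk-space point concrete, here is a minimal Python sketch of
how one might estimate what removing a staging directory would actually free
when its files are hard-linked elsewhere.  It is not Apollo tooling: it
assumes a Unix-style filesystem where link counts are visible through stat,
and the name reclaimable_bytes is hypothetical.  Whether the links that
MINST makes behave this way is an assumption.

```python
import os

def reclaimable_bytes(staging_root):
    """Estimate bytes freed by removing staging_root, allowing for hard links."""
    seen = {}  # (device, inode) -> [links found inside tree, total link count, size]
    for dirpath, _dirnames, filenames in os.walk(staging_root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            entry = seen.setdefault(key, [0, st.st_nlink, st.st_size])
            entry[0] += 1
    # A file's data is freed only if every one of its links lives inside the tree.
    return sum(size for found, nlink, size in seen.values() if found == nlink)
```

A file that is also linked from outside the staging directory contributes
nothing to the total, which is why deleting such entries recovers no space.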


[4]   Recovering from an Install that Went Bad

If something goes wrong during the load/install procedure (wrong answer
given, tape read error, user error, etc.), there is no way to confidently
restart the install without going all the way back to an INVOL.

Suppose something goes wrong and MINST exits with errors.  Without knowing
the exact parameters that the canned procedure used to start up MINST,
there is no way to manually restart MINST properly.  For one thing,
MINST and INSTALL++ must be more forgiving and robust.  The WORST thing
they can do is just exit!  They must be restartable, or at least indicate
to the user how to restart and proceed.
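
As an illustration of what "restartable" could mean here, the following is a
minimal Python sketch of a step runner that records each completed step in a
state file and skips those steps on the next run.  It says nothing about how
MINST or INSTALL++ work internally; run_steps, the step names, and the
state-file format are all hypothetical.

```python
import json
import os

def run_steps(steps, state_path):
    """Run (name, function) steps in order, skipping any already recorded as done."""
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run
        action()  # if this raises, state_path still lists the steps completed so far
        done.append(name)
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, state_path)  # rename into place so the file is never half-written
    return done
```

If a step fails (a tape read error, say), re-running with the same state
file resumes at the failed step rather than starting over.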


[5]  Documentation!

Much could be said about Apollo's documentation of the system install
process.  None of it would be good, and I am glad to hear that the
documentation is being re-done.  Here are some suggestions:

  o   It would be very valuable, when explaining the questions
      that MINST and INSTALL put to the administrator, to indicate
      the implications of each of the possible answers.  There are
      many instances when you ask yourself, "What happens if I say
      yes?!".  The documentation doesn't give a clue.

  o   Rather than issue Installation Notes that say "Disregard
      the 3rd paragraph on page 27 of the Installation Guide",
      issue change pages that update the installation guide!
      Currently, before attempting the installation,  I had to
      read 3 documents, each one updating and correcting details
      in the one previous.  The psychological damage to one's
      confidence that the installation procedure is workable
      creates tremendous anxiety and additional unneeded stress
      in those of us that are forced to undertake these trials.

  o   Include an appendix that explains and suggests procedures
      for installing and maintaining system software on stand-alone
      nodes.  I suspect that there are a good enough number of
      such configurations to justify at least an appendix.
      It should be written by someone who has actually and successfully
      installed and customized the system on a stand-alone node.
      Some of the issues mentioned in this APR need to be addressed
      in this appendix:

             -- How to do the install efficiently
             -- What can be ignored, and what is required
                  for a stand-alone node.
             -- Which daemons are not needed.  Which are?
             -- Recovering disk space after the install
             -- How to set up a simple registry.
             -- Doing backups.. what needs to be saved.


[6]  Misleading INSTALL++ Question Deletes System!

  After successfully installing DOMAIN once, I went on to install FTN and
CC compilers.  Running INSTALL++ I was confronted with the question:
  "Would you like to recover disk space by deleting the following
   OS branches:
      1. os       "

  Of course I would!  I assumed that this was going to delete all the
secondary links to the operating system that are in /install.  And,
after answering yes, I saw INSTALL go on to delete all the links to
the operating system that were in the /install directory.  Fine!
BUT, then it went on to delete all the primary links as well.  I watched
in horror as the operating system leaked out onto the floor and I was
left with nothing but 4 hours wasted.  I quickly opened another window
and tried the   ls   command:

        % ls
        ls: command not found.

All was lost!  And it's because none of this is properly described in
the documentation  (What is an OS branch, as far as INSTALL is
concerned?)  And, because INSTALL should not have asked!  (This is
a stand-alone system, remember??)  (You can imagine my state of composure
after this happened! "Hacker goes Berserk!  Axes Computer!")

---------------

I hope this information is helpful in improving the day-to-day life of
those of us with stand-alone nodes.  Please do not hesitate to call on
me for further information.

PS:  The opinions expressed herein are my own and do not in any way
     represent those of my employer, Pacific-Sierra Research Corp.

                            -> Richard Friedman
                               (415) 540-5216

END TEXT SET DESCRIPTION
END APOLLO PRODUCT REPORT
-- 
 ...Richard Friedman           rchrd@well.uucp                      
    (Pacific-Sierra Research/Berkeley, CA.)
     also: {lll-crg,pacbell,hplabs}!well!rchrd

tom@fangorn.gsfc.nasa.gov (Thomas D. Schardt) (07/27/89)

I wholeheartedly agree with Richard Friedman's description of what needs
to be done to improve OS installations on stand-alone Apollo systems.  As
I attempted to install SR10.1 on our DN3500, the thought of dragging the
machine to the roof and pitching it over the side crossed my mind.  I
didn't, but the experience has left a bad taste in my mouth when it comes
to that machine.
Tom Schardt                        Bitnet:    K4TDS at SCFVM
NASA/Goddard Space Flight Center   Internet:  K4TDS@SCFVM.GSFC.NASA.GOV
Code 632                           Opinions expressed are my own and do not
Greenbelt, MD 20771                necessarily reflect the opinions of my employer

abair@turbinia.oakhill.uucp (Alan Bair) (07/28/89)

I also agree with many of Richard's problems.  I would especially like to
emphasize the problem of determining just how to install the OS to use the
least amount of disk space: hard links or copy.  If you use hard links it
is not clear whether there is actually a space saving.  Then if you do a
copy, erasing /install is the next step, but what can be erased?
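
For what it is worth, the difference between the two choices can be seen in
a small Python sketch on a Unix-style filesystem.  This is only an
illustration under that assumption, not a statement about how the Apollo
install tools lay out files; link_vs_copy is a hypothetical helper.

```python
import os
import shutil
import tempfile

def link_vs_copy(workdir):
    """Return (link_shares_inode, copy_shares_inode, link_count) for a sample file."""
    original = os.path.join(workdir, "original")
    with open(original, "wb") as f:
        f.write(b"x" * 4096)
    linked = os.path.join(workdir, "linked")
    copied = os.path.join(workdir, "copied")
    os.link(original, linked)      # second name for the same data
    shutil.copy(original, copied)  # separate file with its own data
    st_o, st_l, st_c = os.stat(original), os.stat(linked), os.stat(copied)
    return (st_o.st_ino == st_l.st_ino, st_o.st_ino == st_c.st_ino, st_o.st_nlink)

with tempfile.TemporaryDirectory() as tmp:
    print(link_vs_copy(tmp))
```

A hard link is a second name for the same file, so it adds no second copy of
the data, and the space is released only when the last name is removed.  A
copy doubles the space until one of the two is erased.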

Just a little more information would be handy.  Even large networks could
make use of this information.  If there is one thing I hate about
installation software, it's when facts about what is being done or how
items are related are hidden from the user.  I can understand not telling
much as a default, but I should have an option to ask for it or read about
it.

Please consider all of these ideas and proposals in a positive light.

Alan Bair
SPS CAD  Austin, Texas
Motorola, Inc.
UUCP cs.utexas.edu!oakhill!turbinia!abair