rchrd@well.UUCP (Richard Friedman) (07/27/89)
The following is a copy of an APR regarding the system load/install procedure for sr10.1 on stand-alone nodes. This APR was issued as a request for improvement ("enhancement").

The installation procedure for DOMAIN 10.1 on stand-alone (disked) nodes is seriously flawed. Anyone attempting to install the operating system on a node that has no partner is faced with some very anxious hours. In my recent experiences with 10.1, it took many re-attempts before the system was properly installed. The following is offered to the team at Apollo that deals with the installation procedures as a friendly attempt to make them more aware of the particular problems facing the owner/user of a stand-alone node, in the hope that they can be improved in later releases. A number of issues are addressed in this APR, and rather than send separate APR's, I have bundled them together into one.

First, let me say that my experiences over the past weeks trying to get 10.1 installed on a stand-alone DN3000 lead me to think that no one at Apollo ever tried to do such an installation, and that the procedures reflect the installation team's lack of experience in this area.

The situations typically facing the administrator of stand-alone nodes are:

1. First-time installation of 10.1 from a 9.x system. (This will require INVOLing the disk.)
2. A hardware problem with the disk requires an INVOL of the disk (or, as in my recent case, replacement with a new disk).

Because there is no network and no partner node, the only source of software is the system release materials. I should also mention that it takes about 4 hours to install DOMAIN and the FTN and CC compilers from cartridge tapes on a DN3000, starting with a format/write/read-back INVOL on a 170Mb disk. (That's a long time!)

[1] No Diagnostics on Boot Tape

Consider the following: Disk crashes repeatedly, due to a real hardware problem. Crashes have made mincemeat out of the system on disk. It won't boot properly, nor will the diagnostics in /sau8 run.
Only recourse is to INVOL the disk and re-install the system from cartridges. Would like to be able to run diagnostics before attempting the install in order to determine the cause of the problem. However, the boot system (on tape) does not include any diagnostics, only INVOL, SALVOL, CALENDAR, and DOMAIN. This means the only way diagnostics can be run after the disk is INVOL'd is by first doing a complete install of the operating system, which may fail due to the hardware problem.

Suggestion: Include, along with the system distribution tapes, a diagnostic tape. (Obviously, this tape would have to include all the sau's.) An initialization procedure could prompt the user for the configuration. This tape must be bootable as a self-contained system, making no assumptions about what is already on the disk.

[2] Need System Backup/Restore Procedure

On a stand-alone node, should disk problems occur requiring an INVOL or disk replacement, there is no way to restore the system except by doing a complete re-installation. Doing this is not only an extreme waste of time (it may take more than one re-install to get it right), but it also loses any customizations of the running system that the system administrator may have made since the system was last installed. Re-installing the system from the installation media is not something an administrator is likely to look forward to.

Suggestion: Provide a system backup/restore procedure. Once the system is installed and customized, a backup system can be produced as a bootable tape. Booting from this tape regenerates the system on the disk. Later, the administrator can reload his user data backup tapes. Having a system backup procedure is preferable to merely saving the MINST configuration files to control the system installation. The administrator has no idea how MINST works, and shouldn't need to. The minimal DOMAIN system that comes up from the distributed boot system immediately puts the user into MINST.
To do something extra to read in the saved config files is too much to ask!

[3] Streamline the Install Procedure for Stand-Alone Nodes

The "Authorized Area" concept is ridiculous for stand-alone nodes. There is no reason why the install procedure must be partitioned into load and install operations. They should be combined into one operation on stand-alone nodes. MINST should ask at the start if this is an install on a stand-alone node. Then it need only ask which system options are to be selected, and then load/install accordingly. There is no reason to load into the /install directory and make links. The load should go directly to the proper system directories. Also, only the selected parts of the system need be loaded.

With the current MINST, even though I have selected only SAU8, it loads ALL the sau's, but installs only sau8 into the root directory with links. The same is true for the other parts of the system I deselect: they get loaded into /install, but the links are not made. To conserve disk space after the install I must (very carefully!) delete items from /install. This should not be necessary! Only the installation tools should be loaded into /install on a stand-alone node load/install. The system should be installed in its proper place. After installation I should feel confident that rm -r /install will not get me into trouble, and will recover disk space successfully.

[4] Recovering from an Install that Went Bad

If something goes wrong during the load/install procedure (wrong answer given, tape read error, user error, etc.) there is no way to confidently restart the install without going all the way back to an INVOL. Suppose something goes wrong and MINST exits with errors. Without knowing the exact parameters that the canned procedure used to start up MINST, there is no way to manually restart MINST properly. For one thing, MINST and INSTALL++ must be more forgiving and robust. The WORST thing they can do is just exit!
They must be restartable, or at least indicate to the user how to restart and proceed.

[5] Documentation!

Much could be said about Apollo's documentation of the system install process. None of it would be good, and I am glad to hear that the documentation is being re-done. Here are some suggestions:

o It would be very valuable, when explaining the questions that MINST and INSTALL put to the administrator, to indicate the implications of each of the possible answers. There are many instances when you ask yourself, "What happens if I say yes?!" The documentation doesn't give a clue.

o Rather than issue Installation Notes that say "Disregard the 3rd paragraph on page 27 of the Installation Guide", issue change pages that update the Installation Guide! In this case, before attempting the installation, I had to read 3 documents, each one updating and correcting details in the previous one. The psychological damage to one's confidence that the installation procedure is workable creates tremendous anxiety and unneeded stress for those of us who are forced to undertake these trials.

o Include an appendix that explains and suggests procedures for installing and maintaining system software on stand-alone nodes. I suspect there are enough such configurations to justify at least an appendix. It should be written by someone who has actually and successfully installed and customized the system on a stand-alone node. Some of the issues mentioned in this APR need to be addressed in this appendix:
  -- How to do the install efficiently
  -- What can be ignored, and what is required for a stand-alone node
  -- Which daemons are not needed? Which are?
  -- Recovering disk space after the install
  -- How to set up a simple registry
  -- Doing backups: what needs to be saved

[6] Misleading INSTALL++ Question Deletes System!

After successfully installing DOMAIN once, I went on to install the FTN and CC compilers.
Running INSTALL++, I was confronted with the question:

    "Would you like to recover disk space by deleting the following OS branches:
        1. os"

Of course I would! I assumed that this was going to delete all the secondary links to the operating system that are in /install. And, after I answered yes, I saw INSTALL go on to delete all the links to the operating system that were in the /install directory. Fine! BUT, then it went on to delete all the primary links as well. I watched in horror as the operating system leaked out onto the floor and I was left with nothing but 4 hours wasted. I quickly opened another window and tried the ls command:

    % ls
    ls: command not found.

All was lost! And it's because none of this is properly described in the documentation (what is an OS branch, as far as INSTALL is concerned?). And because INSTALL should not have asked! (This is a stand-alone system, remember??)

(You can imagine my state of composure after this happened! "Hacker goes Berserk! Axes Computer!")

---------------

I hope this information is helpful in improving the day-to-day life of those of us with stand-alone nodes. Please do not hesitate to call on me for further information.

PS: The opinions expressed herein are my own and do not in any way represent those of my employer, Pacific-Sierra Research Corp.

-> Richard Friedman (415) 540-5216

END TEXT SET DESCRIPTION
END APOLLO PRODUCT REPORT
--
...Richard Friedman
rchrd@well.uucp (Pacific-Sierra Research/Berkeley, CA.)
also: {lll-crg,pacbell,hplabs}!well!rchrd
tom@fangorn.gsfc.nasa.gov (Thomas D. Schardt) (07/27/89)
I wholeheartedly agree with Richard Friedman's description of what needs to be done to improve OS installations on stand-alone Apollo systems. As I attempted to install SR10.1 on our DN3500, the thought of dragging the machine to the roof and pitching it over the side crossed my mind. I didn't, but the experience has left a bad taste in my mouth when it comes to that machine.

Tom Schardt                          Bitnet: K4TDS at SCFVM
NASA/Goddard Space Flight Center     Internet: K4TDS@SCFVM.GSFC.NASA.GOV
Code 632                             Opinions expressed are my own and do not
Greenbelt, MD 20771                  necessarily reflect the opinions of my employer
abair@turbinia.oakhill.uucp (Alan Bair) (07/28/89)
I also agree with many of Richard's problems. I would especially like to emphasize the problem of determining just how to install the OS to use the least amount of disk space: hard links or copies. If you use hard links, it is not clear whether there is actually a space saving. Then, if you copy, erasing /install is the next step, but what can be erased? Just a little more information would be handy. Even large networks could make use of this information.

If there is one thing I hate about installation software, it's when facts about what is being done or how items are related are hidden from the user. I can understand not telling much by default, but I should have an option to ask for it or read about it.

Please consider all of these ideas and proposals in a positive light.

Alan Bair                 SPS CAD Austin, Texas
Motorola, Inc.            UUCP cs.utexas.edu!oakhill!turbinia!abair