CLAYTON@XRT.UPENN.EDU ("Clayton, Paul D.") (09/28/87)
Information From TSO Financial - The Saga Continues...
Chapter 27 - September 28, 1987

The following article was submitted for answers, which I have injected
into the text.

> Speedup VAX-cluster startup-procedure
> -------------------------------------
>
> Booting our VAX-cluster takes too long. It takes approximately 20
> minutes before we have all systems up and running from a coldstart.
>
> The first bottleneck is the meeting that the 3 VAXes seem to have.
> Perhaps adding a quorum-disk can speed up that mumbling about entering
> and leaving the cluster?

Adding a quorum disk will NOT speed things up when the 'meeting' is taking
place. It will also LENGTHEN the time taken during cluster transition when
a system leaves the cluster for any reason. I am currently engaged in a
serious conversation with my LOCAL and AREA support reps over the way my
cluster is configured. DEC is predicting disaster, and in one case it may
result. But I have lost two HSC's and THEY crapped up 4 disks. You are not
supposed to win.

> The second bottleneck is SYSTARTUP.COM itself. There are a lot of
> software products that have to be handled. Some of them I start up by
> submitting a job to batch. I edited one command-procedure to start up
> jnet from batch, and if there is a problem I get a notification by MAIL
> from that command-procedure.

Good idea to start things in batch if you can. My practice is to NOT START
the batch queues during boot time. The reason is that battery backup on the
system clocks has failed me at least 3 times. The result is a bad system
time. The worst disaster to date was an 8700 booted with a time two YEARS
ahead of what it should have been. We have 40+ batch jobs that do ALL SORTS
of things depending on the system time. We ended up rebuilding a lot of
disk packs to get the files back to what they should have been. Sigh...

> The third bottleneck sometimes is the mount-verification of 10 RA81's
> if one of the systems was not shut down properly. I can handle that by
> switching off mount-verification.

Again, I would NOT recommend that Mount-Verification be turned off. You
could end up with scrambled bit maps in the cache and a lost disk way after
any sensible point of rebuild. I HIGHLY recommend that MVTIMEOUT be set to
something HUGE like 64000 so that a HSC50 with a TU58 can reboot BEFORE
mount verification gives up on the disks.

> I am going to rewrite the startup-procedure. I want to implement a
> multi-stage rocket taking care of dependencies. For example, I should
> first start up DECnet and after that PSI. I think of writing one
> command-procedure which can be called with an argument indicating the
> command-procedure (e.g. CMSSTARTUP.COM). There should be some
> error-checking, and problems should be reported by MAIL using a
> distribution list pointing to our system-managers. SYSTARTUP.COM then
> could look like:
>
>     $ ...
>     $ @SUB_TO_BATCH CMSSTARTUP
>     $ @SUB_TO_BATCH ACMSTART
>     $ ...

Good idea. Just do this in SYSTARTUP.COM, not STARTUP.COM.

> Now for my question: Anybody having some time-saving hints how to
> handle this all? I know how to do it but every hint might save me some
> time. To save you time reading boring remarks:

We are also currently doing the same type of things. I know of no current
DECUS tape that performs as we want.

Have fun with your systems.

Paul D. Clayton - Manager Of Systems
TSO Financial - Horsham, Pa. USA
Address - CLAYTON%XRT@CIS.UPENN.EDU
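
P.S. For anyone wanting a starting point, here is a rough sketch of what a SUB_TO_BATCH.COM along the lines described above might look like. It is only an illustration, not a tested procedure. The names are assumptions: the startup procedures are taken to live in SYS$MANAGER, the queue is taken to be SYS$BATCH, and SYS$MANAGER:SYSMGR.DIS is a hypothetical MAIL distribution list of system managers. The procedure resubmits itself to batch, runs the named startup procedure there, and mails the list if that procedure exits with a failure status.

```dcl
$! SUB_TO_BATCH.COM - illustrative sketch only, names below are assumptions.
$! P1 = name of the startup procedure to run, e.g. CMSSTARTUP
$!
$ IF P1 .EQS. "" THEN EXIT
$! When already running as a batch job, go do the real work.
$ IF F$MODE() .EQS. "BATCH" THEN GOTO RUN_IT
$!
$! Called from SYSTARTUP.COM: resubmit this procedure to batch.
$ SUBMIT/NOPRINT/QUEUE=SYS$BATCH/NAME='P1' -
        /LOG_FILE=SYS$MANAGER:'P1'.LOG -
        /PARAMETERS=('P1') SYS$MANAGER:SUB_TO_BATCH.COM
$ EXIT
$!
$RUN_IT:
$! Keep going after an error so the failure can be reported.
$ SET NOON
$ @SYS$MANAGER:'P1'
$ SAVE_STATUS = $STATUS
$! An odd status value means success, so there is nothing to report.
$ IF SAVE_STATUS THEN EXIT
$! Hypothetical distribution list of system managers; empty message body.
$ MAIL/SUBJECT="''P1' failed during startup" NL: "@SYS$MANAGER:SYSMGR.DIS"
$ EXIT
```

Note that the jobs only run once the batch queue is started, so this fits with holding the queues until the system time has been checked. Dependencies such as DECnet before PSI are not handled here; one simple approach would be to start dependent products from within a single submitted procedure, in order.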