CLAYTON@XRT.UPENN.EDU ("Clayton, Paul D.") (08/23/87)
Information From TSO Financial - The Saga Continues...
Chapter 19 - August 22, 1987

I am giving serious consideration to renaming my computer room the 'Little House Of Horrors'. This chapter is a continuation of Chapters 13 and 15. It is also the basis for my wondering at what point you declare a disaster, go someplace else and start anew.

My HSC70 appears to be running well, the HSC50 is still making everyone question its sanity, one 11/785 is now running a system disk all by itself (and trashing it on a regular basis), the SNA Gateway is functioning (the IBM systems it's connected to are having problems though), I have a loaner RA81 disk from DEC to make up for the additional system disk THEY asked me to make and use, and the new 8350 is shipping TODAY. We have had 'Area Support' in conjunction with local support in our computer room for the past three weeks now and things have not settled down. The details of the horrors are as follows.

The HSC and disk problems were detailed earlier, and the recent repair sequence is one HDA replacement, one whole RA81 replacement (as in everything but the slide racks goes), six requestor cards, three L100 HSC/CI Link Cards and two HSC CPU cards. Two other disks need a format/verify to clean them up and then it 'should' be okay. Maybe.

One 11/785 has been booted from its own system disk in the hope of limiting the damage when it gets trashed, which it still does, and the FPA was pulled in hopes of eliminating the corruption problem. It seems that the FPA failed some diag tests while VMS was still running, so DEC suggested the FPA be replaced. As they had NO spares at the time (this is another sore point), the FPA was simply pulled and the machine rebooted. We are waiting for the corruption to show up again, at which point they want to swap the entire CI interface. For some reason DEC seems satisfied that the problem is not with the HSC50, on which it is the ONLY disk being worked.
I have NO confidence in the HSC50, so the rest of the farm is on the HSC70. I also have to wonder about the SNA Gateway software, but more on that later. So now I am burdened with two system disks on which to maintain the production boot files and executables, and the duration of this test is not known. It has also come to light that the FPA diags always report failure on certain tests when run while VMS is still up. This was learned after the FPA was pulled and we suffered down time to allow it to be pulled. And DEC wonders why I have no faith in their diags?? Sigh...

The SNA Gateway software was a REAL TRIP to get loaded and working. First, I am to this day not convinced it is not part of my problems. I chose the 11/785 that DEC has since asked me to boot from its own disk as the load host for the SNA Gateway software. The physical Gateway is located in my computer room at Horsham, Pa. The hardware was installed by the local Blue Bell office of DEC, which is 25 minutes away. The software was installed by a guy in the Washington DC office of DEC who traveled 4+ hours to get to Horsham. The final configuration setup and checkout was done by a lady from the Boston, Mass. ACS group of DEC. She wasted a day and flew in. Now I know why the cost was over $25K.

Anyway, before the ACS gal showed up, we were having problems getting the Gateway to load the third and final file, THE OPERATING SYSTEM. The first two are small, less than 20 blocks each, while the third is over 1,000 blocks. The errors we were getting from DECnet had the message 'Device Timeout'. After trying various configurations, we found that it would load without problems if the Gateway and the 11/785 were the ONLY two things on a DELNI. All other configurations failed with the timeout error. We also have Terminal Server Manager (TSM), and it was showing timeout errors when talking with some terminal servers on the network.
This network is rather large, going through T1's and Vitalink Ethernet bridges to eight buildings along the east coast. The network appeared to have no problems except for TSM and the Gateway. After spending one week working between 23:00 and 1:00, we got it down to one T1/bridge that was causing the problem. We then went crawling around ceilings and floors and found that one Vitalink bridge had a hardware problem and another Vitalink bridge had software problems due to a crapped up floppy load media. The strange part is that with all these problems, this link maintained LAT communications. Once this was corrected, the Gateway loaded all three files without problems, and now the ACS group came in to finish the job. It only took a couple of hours and all was working on our end. The IBMs on the other end were having problems of their own. I have to wonder if the condition is contagious??

So all was fine with the Gateway and the ACS gal wanted to go home. I raised the issue of having her install BUT not enable the Gateway software on the 'normal' system disk, since all this was done to a system disk that is going to be trashed (hopefully). She asked around and the decision was NOT to install the software, citing license issues. Now this I feel is TOTALLY inappropriate considering it was DEC who wanted me to run the current system disk configuration. I did not win the argument at the time, so she went home and the software is loaded and running from a disk that I have every intention of trashing. My local office, both sales and field service, has been made aware of my feelings on this and they will foot the bill to have everyone back on a return visit when the time comes. I knew the Gateway was trouble. Should have blown up the delivery truck.

The 'normal' display on the front of the Gateway consists of two circles, side by side, with different segments of the LEDs lighting up in sequence. The appearance is that of two circles NEVER touching but always going around one another.
I think it fits DEC and IBM as corporations perfectly. I have to wonder if this was a marketing decision or a programmer stumbled into it??

Then my HIGH capacity tape drives from Emulex showed up. The ones I am getting hold 650+MB (formatted) per cartridge. They use the Emulex TC13 controller and look like an 'MS' device to VMS. Well, the guy that showed up to do the installation was not a UNIBUS address guru and the result was a machine down all day and still no tape drives. The UNIBUS these first drives are going on has a TS11 already on it, and a couple of other devices that you have to do the 'connect' to yourself, and he could not understand what was going on. They try again on Monday, and another person who knows about UNIBUS addressing is scheduled to show up. They are slick little drives and cartridges. More on these in the future.

Then there was the lunch with the VP of Field Service from System Industries to go over some recent problems with the local office and the new SI83/93 drives that I have. Also the final test of a product we beta tested for DEC, to be announced at DECWorld, that should help some of you out there. More on this after DECWorld.

Then there were a couple of days of vacation to recover from the tribulations and to see if TSO still needed me. They didn't. Luckily I am not looking for job security.

The question remains. At what point do you declare a disaster??

Paul D. Clayton - Manager Of Systems
TSO Financial - Horsham, Pa. USA
Address - CLAYTON%XRT@CIS.UPENN.EDU