CLAYTON@XRT.UPENN.EDU ("Clayton, Paul D.") (12/29/87)
Information From TSO Financial - The Saga Continues...
Chapter 33 - December 28, 1987

The past several weeks have been an educational and trying time for the staff that supports the computers here at TSO. I have been accumulating the various items and will detail them here for your information, in the hope that you can prevent similar events from happening to you.

1. We have long planned to upgrade the PRO consoles on our VAX 8XXX systems to Rev 5. The upgrade brings several bug fixes and lets the VAX Cluster Console (VCS) system collect operator messages at a rate faster than 1200 baud. The day arrived, the upgraded RD disks were placed in the PROs, and the PROs were powered up. All appeared to be fine. The menu system was used to enable transfer from the PRO to the VCS at a higher baud rate, and all seemed in order. We have since found out that the Rev 5 consoles have some nifty problems of their own. The first one bit us when we used the VCS to control an 8XXX and wanted to edit, from the VCS, the startup command file stored on the PRO. Everything was fine until it came time to exit the 'CONTROL' program. The CONTROL program, which lives on the PRO, is the software that lets the 8XXX processor send and receive data during the boot process and lets the PRO act as the operator console. It has several other functions not covered here. Upon exiting CONTROL, the PRO shuts down the 'Remote User Port' and the 'Remote Console', very effectively isolating the PRO from the VCS. At that point the ONLY course of action is to go to the PRO itself and perform whatever task is needed. This is exactly what the VCS was supposed to eliminate: the need to BE in the computer room. My local DEC office is looking into this.

2. The second PRO problem came to light at 10:00 PM on a Friday night, on one of the two 8700 systems we use for NIGHTLY batch runs.
My staff had logged a service call for the DF112 modem used for user and RDC dial-ins to the system, because it was not working. When the CE arrived he performed several tests and apparently reached final testing of the old unit. WITHOUT asking my operators, the CE had RDC log into the PRO and putz around. When RDC logged out, the 8700 DIED. Without telling the operator about the crash, the CE then asked why the system would not respond. Further research by DEC has brought to light a possible problem when RDC exits from a Rev 5 PRO console: the net result is a crashed machine. Needless to say, I am now VERY leery of RDC on any machines except the hardy 785s I have. A side problem was that the CE, again without asking, unplugged a CRT that had a program running on it so he could use it to test the modem. In a very short time, one 8700 and a program that had run all day died. If you have experienced similar PRO problems, you might want to get in touch with your local office. I am now looking forward to Rev 6 of the PRO to eliminate this problem; with luck, it will also provide 'full' RDC capability for the 8XXX systems.

3. The Ethernet interface on the 8XXX systems has also been a point of interest in the recent past. We had been seeing a total disconnect of all terminal server connections to the 8700 processors, and the only recovery was a reboot of the machine. The interesting side issue is that DECnet continued to work over the Enet. After asking several times, and after the proper conversations between local DEC and TSC/Maynard, the solution was to upgrade the Enet interface. We had DEBnet boards and upgraded to DEBna units, rev D4. The problems did not go away, but they happened less often than before. After several more conversations, the next change was another upgrade, from DEBna rev D4 to rev F2. The problems continued, and another DEBna, rev F4, was installed along with the following new images:

    LATCP    Rev. with link id = "LAT+ V1.1-2 1-FEB-1987:14:13"
    LATSYM   Rev. with link id = "LAT+ V1.1 1-FEB-1987:14:13"
    LTDRIVER Rev. with link id = "LAT+ V1.1-27X 29-JUL-1987:15:20"

Since this latest fix went in we have had no LAT disconnects, and a period of DECnet circuit bounces has also stopped. I have also just received another upgrade of the DS100 and DS200 engines, plus further patches to LTDRIVER, LATSYM and ETDRIVER, which have not yet been installed.

4. We have purchased and installed four DECserver 500 terminal servers for a new building we are moving into. The hardware installation was done on a time-and-materials basis and saved money compared with the 'quoted' installation prices for the options we have. The Networks group installed the Enet backbone, the H4000 taps, the DS500 units and the options we purchased. We signed off on the installation and received the booklet the Networks group puts together showing the configuration and the results of the 'TDR' testing. At that point the Networks group very effectively disassociated themselves from our equipment: any service calls placed on the equipment are handled by the same group that services my 8XXX systems, not by the Networks group. Our problem was that the software we received with the DS500 units, which contained the DS500 engine, the configuration command files and assorted programs, would not work. The engine would load once; we would then start the configuration process and attempt a reload to prove the configuration, and the result was a dead DS500. We use TSM, which is a nice product if you have a number of terminal servers. With 158 DS100/DS200 terminal servers and 4 DS500s, command files are a way of life. It turns out that the Rev 1.0 engine distributed in the installation kit has trouble running at all, and your local office can obtain a replacement from DEC Colorado after several conversations.
The replacement works AS LONG AS TSM 1.0 is NOT used to configure the engine for the type of setup you want. If TSM 1.0 is used, the DS500 becomes unbootable after any change is made to the engine. A new version, TSM 1.1S, is currently available and will work with the DS500 units. My contention is that NOTHING in the DS500 documentation says TSM 1.1S or higher is needed. As it turns out, we have not received the latest version yet, and we NEVER knew the problem existed.

5. Our 8200 is a version with the 'small' BI backplane. The whole CPU, memory and BI card set sits in a box the size of a UNIBUS box, mounted high in a 36" cabinet. Air vents in the front of this cabinet let outside air into the box to provide the needed cooling. A terminal sits on top of the 8200 because it is 'the right height' for standing and working a tube. This sounds mundane until someone wearing a sports jacket or overcoat walks in front of the unit. One day I was in the room wearing an overcoat and using the tube on top of the 8200. My coat was sucked against the vents, the air flow sensors detected reduced or no flow, and the circuit breakers tripped. The time between blocking the vents and the breaker trip is about 2 seconds. This happened twice before we figured out what was going on. There is NOTHING to tell you WHY the circuit breakers tripped; they just cut out. We have since moved the tube down one cabinet, and we are leery of any coats, sport or over, in the computer room.

6. I have purchased Megatape cartridge tape drives from EMULEX that hold 650MB on one cartridge. We back up to these devices to speed up the process and shrink the space needed to store the tapes. The units were originally daisy-chained, and the PC board EMULEX designed to perform the daisy chain was a disgrace: its layout did NOT take into account that two boards had to be placed side by side.
The result was oversize boards that REQUIRED one board to be bent up and the other bent down to work. This flexes the motherboard, something not to be done often. The units have since been set up with their own controllers, and the EMULEX PC board is no longer needed.

7. I have moved an 8530 processor from one computer room to another and, in the course of doing so, discovered several interesting items. The wheels under the 8530 cabinets are set up so that two swivel and two do not, probably to allow steering from only one end. I was hoping to simply uncable the 8530, raise the legs and roll the CPU and CI expansion cabinet out the door, leaving them bolted together to save time in the other computer room. When it came time to move, we discovered that the only direction the two cabinets would roll together was widthwise. In other words, a door more than 50 inches wide is needed to move them while they are bolted together. So we had to unbolt them to get them out and back in. This means removing at least 10 bolts, and the top of the cabinet has to come off to reach the last two. Several internal cable assemblies also have to be disconnected in the process. I did not save any time.

8. I have concluded a deal with System Industries under which I will sell 10 RA-81 disks and receive 8 SI93C disks. The heat load goes down from 22.9K BTUs to 5.08K BTUs. The electrical load goes down from 78 amps to 26.88 amps. The floor space goes down from 16.5 square feet to 7.8 square feet. Access time goes down by up to 30 ms per I/O. Disk space per spindle goes up from 456MB to 858MB, and total disk space goes up from 4.56 GIGABYTES to 6.86 GIGABYTES. I also now have two free ports on each of my HSCs for future expansion before having to spend an additional $135,200 on two more HSCs and disk requestors. And the disks were shipped on December 28, 1987.
The remaining RA81 disks will be used for system packs.

I will submit another article when an accumulation of events warrants it. :-)

Paul D. Clayton - Manager Of Systems
TSO Financial - Horsham, Pa. USA
Address - CLAYTON%XRT@CIS.UPENN.EDU