[comp.os.vms] Problems And Solutions For Several Areas Of Interest...

CLAYTON@XRT.UPENN.EDU ("Clayton, Paul D.") (12/29/87)

Information From TSO Financial - The Saga Continues...
Chapter 33 - December 28, 1987

The past several weeks have been an educational and trying time for my staff 
that supports the computers here at TSO. I have been accumulating the various 
items and will proceed to detail them here for your knowledge in the hope you 
can prevent similar events from happening to you.

1. We have long planned to upgrade the PRO consoles on our VAX 8XXX systems to 
Rev 5, both to pick up several bug fixes and to allow the VAX Cluster Console 
(VCS) system to accumulate the operator messages at a faster rate than 1200 
baud. The day arrived and the upgraded RD disks were placed in 
the PRO's and powered up. All appeared to be fine. The menu system was used to 
allow the transfer of info from the PRO to the VCS at a higher baud rate and
all seemed in order. We have since come to find out that the Rev 5 consoles 
have some nifty problems of their own. The first to get us was when we used 
the VCS to control the 8XXX and wanted to edit the startup command file 
located on the PRO from the VCS system. Everything was fine till it came time 
to exit from the 'CONTROL' program. The CONTROL program, located in the PRO, 
is the software which allows the 8XXX processor to transmit/receive data in 
the boot process and also allows for the PRO to be the operator console. It 
has several additional functions which are not noted here. Anyway, upon exiting 
the CONTROL program the PRO shuts the 'Remote User Port' and 'Remote Console' 
down and very effectively isolates the PRO from the VCS. At this point the ONLY 
course of action is to go to the PRO itself and perform the task there. This 
is exactly what the VCS was supposed to eliminate: the need to BE in the 
computer room. My local DEC office is looking into this. 

2. The second PRO problem came to light at 10:00 PM on Friday night on one of 
the two 8700 systems that is used for NIGHTLY batch runs. My staff logged a 
service call for the DF112 modem that is used for user/RDC dial-ins to the 
system as it was not working. When the CE arrived he performed several tests 
and apparently got as far as final testing of the old unit. WITHOUT asking 
my operators, the CE had RDC log into the PRO and putz around. When RDC got 
out, the 8700 DIED. Without telling the operator about the crash, he inquired 
as to why the system would not respond. Further research from DEC has brought 
to light a possible problem when RDC gets out of the PRO Rev 5 console. The 
net result is a crashed machine. Needless to say, I am now VERY leery of RDC 
on any machines except the hardy 785's I have. The side problem was that the 
CE, again without asking, unplugged a CRT that had a program running on it so 
he could use it to test the modem. In a very short time, one 8700 and a 
program that had run all day died. If you have experienced similar PRO 
problems you might want to get in touch with your local office. Now I am 
looking forward to Rev. 6 of the PRO to eliminate that problem, which at the 
same time will hopefully provide a 'full' RDC capability for the 8XXX systems.

3. The Ethernet interface for the 8XXX systems has also proven to be a point 
of interest recently. We had been having a total disconnect of all 
terminal server connections to the 8700 processors and the only recovery was a 
reboot of the machine. The interesting side issue is that DECnet continued to 
work over Enet. After asking, several times, and proper conversations between 
local DEC and TSC/Maynard, the solution was to upgrade the Enet interface. We 
had DEBnet boards and we upgraded to DEBna units, rev D4. The problems did not 
go away, but did not happen as often as before. The next change after several 
more conversations was another upgrade from DEBna D4 to F2. The problems 
continued and another DEBna was installed, Rev. F4, in conjunction with the
following new images:
	LATCP Rev. with link id = "LAT+ V1.1-2 1-FEB-1987:14:13"
	LATSYM Rev.  with link id = "LAT+ V1.1 1-FEB-1987:14:13"
	LTDRIVER Rev. with link id = "LAT+ V1.1-27X 29-JUL-1987:15:20"
With this latest version of the solution we have had no LAT disconnects, and a 
period of DECnet circuit bounces has also stopped. I have also just received 
another upgrade of the DS100, DS200 engines and further patches to LTDRIVER, 
LATSYM and ETDRIVER which at this point have not been implemented.

4. We have purchased and installed four DECserver 500 terminal servers for a 
new building we are moving into. The hardware install was done on a time 
and materials basis and resulted in a savings when compared to the 
'quoted' install prices for the options we have. The networks group did the 
install of the Enet backbone, H4000 taps, DS500 units and the options we 
purchased. We signed off on the install and received the booklet that the 
Networks group puts together showing the configuration and the results of the 
'TDR' testing. At this point the networks group has very effectively 
disassociated themselves from our equipment. Any service calls placed on the 
equipment are handled by the same group that services my 8XXX systems, not the
Networks group. The problem we had is that the software we received with the 
DS500 units, which contained the DS500 engine, configuration command files and 
assorted programs, would not work. It would load once; we would then start the 
configuration process and attempt a reload to prove the configuration, and the 
result was a dead DS500. We use TSM, which is 
a nice product if you have a number of terminal servers. With 158 DS100/DS200 
terminal servers and 4 DS500's, command files are a way of life. It turns out 
that the Rev 1.0 engine distributed in the installation kit has a problem 
running at all and a replacement can be gotten from DEC Colorado by your local
office after several conversations. The replacement works AS LONG as TSM 1.0 is 
NOT used to configure the engine for the type of setup you want. If TSM 1.0 is 
used, the result is an unbootable DS500 after any changes are made to the 
engine. A new version of TSM, 1.1S, is currently available and will work with the
DS500 units. My contention here is that NO information is in the DS500 
documentation that says TSM 1.1S or higher is needed. As it turns out, we have 
not received the latest version yet and NEVER knew the problem existed.

5. The 8200 system we have is a version that has the 'small' BI backplane. The 
whole CPU, memory and BI cards are in a box the size of a UNIBUS and located 
high in a 36" cabinet. There are air vents in the front of this cabinet to 
allow the flow of outside air into the box and provide the cooling that is 
needed. There is a terminal located on top of the 8200 as it's 'the right height'
for standing and working a tube. This sounds mundane until you have someone 
with a sports jacket or overcoat walk in front of the unit. I was wearing 
an overcoat one day in the room and using the tube on top of the 8200. My coat 
was sucked across the vents and the air flow sensors detected reduced/no flow 
and tripped the circuit breakers. The mean time between covering the vents and
the circuit breaker trip is about 2 seconds. This occurred twice before we 
figured out what was happening. There is NOTHING to tell you WHY the circuit 
breakers tripped; they just cut out. We now have the tube moved down one 
cabinet and we are leery of any coats, either sport or overcoats, in the 
computer room.

6. I have purchased Megatape cartridge tape drives from EMULEX which hold 
650MB on one cartridge. We do backups to these devices to speed the process up 
and shrink the space needed to store the tapes. The units were originally 
daisy chained, and the PC board that EMULEX designed to perform the daisy 
chain was a disgrace. The PC layout did NOT take into consideration that 
two boards had to be placed side by side. The result was oversize boards that 
REQUIRED one board being bent up and one bent down to work. This results in a 
flexing of the mother board, something not to be done often. The units have 
since been set up with their own controller and the EMULEX PC board is no 
longer needed.

7. I have moved an 8530 processor from one computer room to another and in the 
course of doing so, discovered several interesting items. The wheels that are 
located under the units in an 8530 are set up such that two swivel and two do 
not. This was probably done to allow steering from only one end. What I was 
hoping to do in my case was to simply uncable the 8530, raise the legs and 
roll the CPU and CI expansion cabinet out the door. They were to remain bolted 
together to save time in the other computer room. When it came time to move 
the unit, it was discovered that the only direction the two cabinets would 
roll together was width ways. In other words, a door width of more than 50 
inches is needed to remove them when bolted together. The result was we had to 
unbolt them to get them out and back in. This requires that at least 10 bolts 
be removed and the top of the cabinet be removed to get the last two out. 
Several internal cable assemblies also have to be disconnected in the process.
I did not save any time.

8. I have concluded a deal with System Industries that will result in my 
selling 10 RA81 disks and receiving 8 SI93C disks. The heat load goes down 
from 22.9K BTU's to 5.08K BTU's. The electric load goes down from 78 amps to 
26.88 amps. The floor space goes down from 16.5 square feet to 7.8 square feet.
The access time goes down by up to 30 ms per I/O. The available disk space 
per spindle goes up from 456MB to 858MB. The total disk space goes up from 
4.56 GIGABYTES to 6.86 GIGABYTES. And I now have two free ports on each of my 
HSC's for future expansion, before having to spend an additional $135,200 on 
two more HSC's and disk requestors. And they were shipped on December 28, 
1987. The remaining RA81 disks will be used for system packs.
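
For anyone who wants to check the arithmetic in item 8, here is a minimal 
Python sketch. The figures are the ones quoted above; the dictionary layout 
and the function names are illustrative only, and it assumes 1 GIGABYTE is 
1000MB, which is what the quoted totals imply.

```python
# Figures are those quoted in item 8. OLD_SET, NEW_SET, total_gb and
# percent_change are illustrative names, not part of any vendor tool.
OLD_SET = {"spindles": 10, "mb_per_spindle": 456,   # the ten RA81s being sold
           "heat_kbtu": 22.9, "amps": 78.0, "floor_sqft": 16.5}
NEW_SET = {"spindles": 8, "mb_per_spindle": 858,    # the eight SI93Cs received
           "heat_kbtu": 5.08, "amps": 26.88, "floor_sqft": 7.8}


def total_gb(cfg):
    """Total capacity in gigabytes, assuming 1 GB = 1000 MB."""
    return cfg["spindles"] * cfg["mb_per_spindle"] / 1000


def percent_change(old, new):
    """Signed percentage change from old to new (negative means a drop)."""
    return (new - old) / old * 100


if __name__ == "__main__":
    print(f"total disk space: {total_gb(OLD_SET):.2f} GB -> "
          f"{total_gb(NEW_SET):.2f} GB")
    for key in ("heat_kbtu", "amps", "floor_sqft"):
        change = percent_change(OLD_SET[key], NEW_SET[key])
        print(f"{key}: {OLD_SET[key]} -> {NEW_SET[key]} ({change:+.1f}%)")
```

Run as a script, it prints the before and after figures with the percentage 
change for each, which makes it easy to redo the comparison with your own 
disk counts.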

I will submit another article when an accumulation of events warrants it. :-)

Paul D. Clayton - Manager Of Systems
TSO Financial - Horsham, Pa. USA
Address - CLAYTON%XRT@CIS.UPENN.EDU