[comp.os.vms] Latest Series Of Problems, Problems And MORE Problems...

CLAYTON@XRT.UPENN.EDU ("Clayton, Paul D.") (10/12/87)

Information From TSO Financial - The Saga Continues...
Chapter 29 - October 11, 1987

It's been a while since my last blurb, and this edition brings several items
to light for contemplation and comment, if needed.

Leading the list is networks. My rendition of how computers came into being is
that CPUs were created and someone said they were good. The next day, someone
wanted to log on to the CPUs, and networks were created. And EVERYTHING went to
HELL. Computer systems have not recovered. I say the above for the
following reasons.

1. We have an Ethernet backbone carried around the country via T1 links and
Ethernet bridges. On a T1 link, there is a master clock that syncs ALL T1
modems together. Our network is set up for single-branch failure on the phone
lines. In this blessed event, traffic will 'fail over' to the remaining links
and attempt to continue. The rub comes in having one clock do all the work with
the T1 modems. Should the POWER to the CLOCK or the CLOCK itself fail, all T1
modems are USELESS. For us, the power failed and the T1's went to lunch. Our
Ethernet, PBX tie lines, dedicated data circuits and computer systems went to
LUNCH with them. In researching this further, it appears it may NOT be possible
to have a second clock on the network. The vendor is checking further. Sigh...
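
For the curious, here is a toy sketch of why the failover design buys nothing
in this case. It is a minimal model in Python (the link names and the clock
object are my own invention, purely illustrative, not anything from the
vendor): single-branch failover covers the circuits, but every link ALSO
depends on the ONE master clock.

  # Toy model: a T1 link is usable only if BOTH its own circuit AND the
  # shared master clock are up.  Branch redundancy only covers circuits.

  class MasterClock:
      def __init__(self):
          self.up = True

  class T1Link:
      def __init__(self, name, clock):
          self.name = name
          self.clock = clock          # every modem syncs to the SAME clock
          self.circuit_up = True

      def usable(self):
          return self.circuit_up and self.clock.up

  def surviving(links):
      return [l.name for l in links if l.usable()]

  clock = MasterClock()
  links = [T1Link(n, clock) for n in ("EAST", "WEST", "CROSS")]

  # Case 1: one branch fails -- traffic can fail over to the rest.
  links[0].circuit_up = False
  print("branch down:", surviving(links))   # two links remain

  # Case 2: power to the clock fails -- EVERY T1 modem is useless at once.
  links[0].circuit_up = True
  clock.up = False
  print("clock down: ", surviving(links))   # nothing remains, total outage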

2. In reading all the messages about DECnet/Ethernet problems, and the latest
bit about the unavailable buffers, I consider networks to be extremely
unsophisticated and in DIRE need of new technologies and/or tools. I had to
chuckle about the response concerning the use of Ethernim to locate a
node pumping out garbage. While we have Ethernim and run it from time to time,
displaying a significant number of Broadcast or Multicast messages is
almost useless. There is no way to isolate the source, short of trimming the
network. How many networks can be trimmed to find the source? Ours cannot,
because we use it for so many different things at all hours of the day and night.
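
For what it is worth, the only 'trimming' procedure I know of amounts to a
binary search over the bridged segments: cut half of them loose, see if the
garbage stops, and repeat. A rough sketch of the idea follows (Python; the
segment names and the probe function are made up for illustration), just to
show that every step of it means physically partitioning a live network:

  # "Trimming" as a bisection: each probe = disconnect everything except
  # the given subset of segments and watch for the garbage traffic.

  def find_babbler(segments, garbage_seen):
      """segments: segment names; garbage_seen(subset) is True when the
      garbage still shows up with ONLY that subset attached."""
      candidates = list(segments)
      while len(candidates) > 1:
          half = candidates[: len(candidates) // 2]
          if garbage_seen(half):              # offender is in this half
              candidates = half
          else:                               # offender is in the rest
              candidates = candidates[len(half):]
      return candidates[0]

  # Pretend the offender sits on segment "BLDG-3":
  probe = lambda subset: "BLDG-3" in subset
  print(find_babbler(["BLDG-1", "BLDG-2", "BLDG-3", "BLDG-4"], probe))

Two trims for four segments sounds cute; on our backbone every one of those
trims is an outage for somebody, which is exactly why we cannot do it.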

On to other topics.

Our VAXcluster Console System (VCS) has arrived AT LAST. I have/had high
hopes for the system, but in light of the network problem listed in item #1,
part of the project is being SERIOUSLY reconsidered. Anyway, the boxes all
came, all fifteen (15) of them, and nothing was damaged. Ours is a LARGER
configuration than is normal for a VCS. We are putting in a separate Ethernet
backbone for it, to eliminate the problem in item #1, and adding terminal servers
to do some special functions for us. Our VCS system was planned to control 15+
systems, both local and remote. The gotchas started occurring quickly.

1. The CONSOLE port on a MicroVAX II system is NOT a DB25 connector. It's one
of those sub-D types that someone thought was a good idea. WRONG. The fiber
optic links are all based on DB-25 connectors, one of which has to go onto the
back of the processor in place of the console connector. This can NOT be done
on the MVAX II systems. Stalemate for the moment; Colorado is working on it.

2. The 85/87/88 PRO consoles ALSO present a problem. The VCS gets plugged into
the PRO, and the question is where. The current statement from DEC is that it
gets plugged into the RDC port on the PRO and the MDS01 box gets plugged into
the VCS system. This is a bad idea due to only having one MDS01 per VCS. I
have, on several bad days, lost more than one system. How is DEC to perform the
RDC function on several systems from one MDS01?? The other alternative is to
have many MDS01's on several DHV11 ports. I did NOT order enough interfaces
for that solution. My experience with the folks at RDC is that it is sometimes
extremely hard for them to get things right and be able to work my system(s)
from afar. And to expect the RDC gang to be able to UNDERSTAND and USE VCS
type commands is, I feel, asking too much, at least at this point in time. The
other purpose we use the MDS01 boxes for is dial-up lines into our systems. If
all the MDS01's get moved to the VCS system, then I have to have a bunch of
junk accounts on it for people to log into and then SET HOST to another system.
This is also something I DO NOT WANT!!!

3. The fiber optic links also caused problems. I was told ONLY of the power
supply on the VCS end of the links. It turns out that there is a power supply
on the opposite end as well. This means EXTRA power outlets have to be run for each
system that has a VCS tie-in. There is one good point here, in that the power
supply on the VCS end is a 16-tap deal with ONE power tail. Someone used their
head on that one at least.

4. On the plug that attaches to the processor being controlled by the VCS, there
is a switch that allows output to go to the VCS or a LOCAL terminal. This
would be needed should the VCS system be unavailable for whatever reason. This
sounded like a GOOD idea. I then asked about its impact when connected to the
HSC console port. The deal here is that should you TURN OFF the console terminal
that is connected to the HSC console port, the HSC WILL REBOOT. This is NOT
healthy in a production shop, and NOT something you want happening in the midst of LOSING
ALL system consoles at one time. I have one HSC50, and a reboot of it would have
me down for MINUTES/HOURS while the TU58 whirls!! The question boils down to
this: "Does switching from VCS to LOCAL maintain the signals to the HSC
that will PREVENT an HSC reboot?" The answer is UNKNOWN as yet, and Colorado is
looking into this also.

All this and the software has NOT been installed yet. Who knows what results
I will have when that task is completed. Needless to say, at least ONE more
chapter of 'The Saga...' should come out of it.

Also got a UPS system approved, and the room it will reside in has been started.
It is due to arrive by the end of October. Just to prove we were correct in getting
one, the power has FAILED four (4) times in the span of three (3) days. Three
of the hits resulted from UNKNOWN power shutdowns by our supplier. Two of
these were within 15 minutes of each other; just enough time between hits to
have the MOUNT commands in the bootup files starting on a large disk farm. You
better BELIEVE we do the REBUILDS during the MOUNTs. The fourth was a lightning
hit. All four lasted less than 10 seconds. The UPS has batteries good for 10
minutes at 75 KVA. Since we only use about 50 KVA, we have close to 20 minutes
worst case. I shall rest easier when it's in, and my local field service is also
glad we are getting one.
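
As a sanity check on that figure (my own back-of-the-envelope arithmetic, not
a vendor number), a straight energy scaling gives a floor of about 15 minutes;
batteries deliver somewhat more than that at partial load, which is presumably
where the 'close to 20 minutes' figure comes from:

  # Crude UPS runtime floor: scale the rated minutes by the load ratio.
  # Battery discharge is nonlinear, so real runtime at 50 KVA runs longer.

  rated_kva     = 75.0     # UPS rating
  rated_minutes = 10.0     # runtime at full rated load
  our_load_kva  = 50.0     # what the machine room actually draws

  floor_minutes = rated_minutes * (rated_kva / our_load_kva)
  print("linear floor: about", round(floor_minutes), "minutes")   # 15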

That's enough for this edition, so until things break, again, I shall continue
my vigil and hope for the best and get the worst. :-)

Paul D. Clayton - Manager Of Systems
TSO Financial - Horsham, Pa. USA
Address - CLAYTON%XRT@CIS.UPENN.EDU