[mod.computers.vax] SI Disk Subsystem Problems And Use Of SI83C Drives...

CLAYTON@xrt.upenn.EDU.UUCP (03/31/87)

Information From TSO Financial - The Saga Continues...
Chapter 3 - March 30, 1987

I have been reading about the various problems that people have been having 
with the disk subsystems from System Industries with a humor that only comes 
from having been there. It is from that vantage point that I want to pass 
along some information concerning my past history with SI gear.

During employment with a previous company, an agreement was signed with 
SI for the purchase of several 9700 controller systems coupled with CDC 9766 
disk drives. At the time this was the best equipment that SI offered. The 
equipment was being used in an on-line database environment, using a 
database system that the company had developed itself starting in VMS V1.X 
days and had been updating with each new VMS release.

The problems started appearing on any databases that were located on the SI 
gear and manifested themselves in the form of broken chains and lost data. 
No hardware errors were reported by VMS. At the same time we had several DEC 
RM05 drives that also contained some databases, and these databases NEVER 
seemed to break for mysterious reasons. Of course we would lose one due to 
a head crash or read/write logic problems, but that is expected.

Over the course of the next several years, and legal action between my 
company and SI, the problem was traced to lost interrupts within the 
controller, which confused it; data was getting lost as a result. 
The result at the time was that SI replaced the 9700 controller/9766 disk 
combinations with DEC Massbus/RM05 drives and paid for their maintenance. 
Clearly this was a hard pill for SI to swallow, and it took a long time to 
hash out. 

The DEC drives remained until SI announced and provided as replacements the 
9900 controller with Fujitsu's Eagles. The companies agreed on a one-for-one 
swap, and the new gear was brought in for checkout and test runs. After 
several tests, our results showed that this new combination also had 
problems, which showed up as a 'FATAL CONTROLLER ERROR' error code returned 
to the program requesting the I/O. It should be noted that these errors 
ONLY showed up on the 9700/9766 and the new 9900/Eagle combinations under 
EXTREMELY HEAVY I/O loads, which is what our application system typically 
produced during the course of the day.

It was costly to rerun our database system to reproduce the errors and prove 
our point to SI, so I wrote a program and supporting command files that place 
an EXTREMELY HEAVY I/O load on a disk, particularly if more than one job is 
run per disk. 

When the test program is run, each error is recorded and informational 
messages are generated that completely identify it. Messages are also 
printed at regular intervals that tell how much I/O has occurred. 
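
For readers who want to see the general shape of such a tester, here is a 
minimal sketch in Python. It is NOT the RATFIV/MACRO-32 program described 
above, and it knows nothing about VMS or SI controllers; the function name 
stress_file, its parameters, and the scratch-file approach are all 
assumptions made for illustration. It writes random blocks at random 
offsets in a scratch file, reads each one back, records any mismatch or 
I/O error, and prints a progress line at regular intervals.

```python
import os
import random
import time


def stress_file(path, file_blocks=256, block_size=512, total_ops=2000,
                report_every=500, seed=0):
    """Hypothetical I/O exerciser: random write, read back, compare.

    Returns (ops_done, errors); errors is a list of
    (op_number, block_number, description) tuples.
    """
    rng = random.Random(seed)
    errors = []

    # Pre-size the scratch file so every block offset is valid.
    with open(path, "wb") as f:
        f.truncate(file_blocks * block_size)

    start = time.monotonic()
    # buffering=0 avoids Python-level buffering; the operating system
    # may still cache data, so this alone does not guarantee that each
    # read actually reaches the physical disk.
    with open(path, "r+b", buffering=0) as f:
        for op in range(1, total_ops + 1):
            block = rng.randrange(file_blocks)
            payload = rng.getrandbits(8 * block_size).to_bytes(
                block_size, "little")
            try:
                f.seek(block * block_size)
                f.write(payload)
                os.fsync(f.fileno())  # ask the OS to push the write out
                f.seek(block * block_size)
                if f.read(block_size) != payload:
                    errors.append((op, block, "read-back mismatch"))
            except OSError as exc:
                errors.append((op, block, "I/O error: %s" % exc))

            # Periodic progress report: how much I/O has occurred so far.
            if op % report_every == 0:
                elapsed = time.monotonic() - start
                print("%d ops, %d errors, %.1fs elapsed"
                      % (op, len(errors), elapsed))
    return total_ops, errors
```

To approximate the "more than one job per disk" case, several copies of 
such a function could be started as separate processes, each with its own 
scratch file on the same disk. Operating system caching can hide hardware 
faults from a sketch like this, so a clean run proves much less than a 
clean run of a tester that issues I/O directly to the device.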

Once the proof of the errors was shown to SI and they ran a copy of the 
program in Calif., a firmware update was made to our controller and the 
errors have since disappeared.

If anyone is having problems with a disk system and needs a test that 
subjects it to heavy I/O loads, I recommend the program I wrote. If errors 
are reported after running the program, contact your local field service 
office and show them printouts with the errors listed.

The program is large and written in RATFIV, a FORTRAN pre-processor 
language, and also uses structured macros in the MACRO-32 subroutines. For 
this reason, I am not including the program with this message. I plan to 
get the program onto the Spring '87 VAX SIG tape in a subdirectory of TSO. 
This way, sites without a FORTRAN compiler can use the program.

It needs to be noted that SI has corrected the errors that we had been 
encountering and that no further errors have been found. The databases have 
been running on the Eagles for around 2 years now, and no problems have been 
experienced with the drives or the controllers. The subsystems are single 
ported to VAX 11/780's.

Regarding the use of the SI83C disk subsystems, I am scheduled to receive 8 
drives in a single cabinet, which is the same size as a single RA-81 quad 
pack. The drives will be dual ported between an HSC50 and an HSC70. After I 
run my I/O tester and everything looks good, including HSC failover, the 
disks will be used as normal data disks and shadow set members. If I 
encounter any problems, I will post the results to INFO-VAX for general 
consumption and comment.

I hope this clears up the debate a little or provides some guidelines for 
the future.

THE INFORMATION IN THIS DOCUMENT IS SUBJECT TO CHANGE WITHOUT NOTICE AND 
SHOULD NOT BE CONSTRUED AS A COMMITMENT OR COMMENT BY TSO FINANCIAL, NATIONAL 
TEACHERS LIFE INS., COLONIAL NATIONAL BANK, COLONIAL NATIONAL LEASING, ADVANTA 
USA AND ANY COMPANIES I HAVE WORKED WITH IN THE PAST AS EMPLOYEE OR USER. THE 
COMPANIES LISTED ABOVE ASSUME NO RESPONSIBILITY FOR ANY ERRORS THAT MAY APPEAR.

Paul D. Clayton - Systems Manager
TSO Financial - Horsham, Pa. USA
Address - CLAYTON%XRT@CIS.UPENN.EDU