CLAYTON@xrt.upenn.EDU.UUCP (03/31/87)
Information From TSO Financial - The Saga Continues...
Chapter 3 - March 30, 1987

I have been reading about the various problems that people have been having with the disk subsystems from System Industries, with a humor that only comes from being there. It is from that standpoint that I want to pass along some information concerning my past history with SI gear.

During employment with a previous company, an agreement was signed with SI for the purchase of several 9700 controller systems coupled with CDC 9766 disk drives. At the time this was the best equipment that SI offered. The equipment was being used in an on-line database environment, using a database system that the company had developed itself starting in VMS V1.X days and had been updating with each new VMS release.

The problems started appearing on any databases that were located on the SI gear and manifested themselves in the form of broken chains and lost data. No hardware errors were reported by VMS. At the same time we had several DEC RM05 drives that also contained some databases, and these databases NEVER seemed to break for mysterious reasons. Of course we would lose one due to a head crash or read/write logic problems, but that is expected.

Over the next several years, and after legal action between my company and SI, the problem was traced to lost interrupts within the controller, which confused it and caused data to be lost. The result at the time was that SI replaced the 9700 controller/9766 disk combinations with DEC Massbus/RM05 drives and paid for their maintenance. Clearly this was a hard pill for SI to swallow and took a long time to hash out.

The DEC drives remained until SI announced, and provided as replacements, the 9900 controller with Fujitsu's Eagles. The companies agreed on a one-for-one swap and the new gear was brought in for check out and test runs.
After several tests, our results showed that this new combination also had problems, which showed up as a 'FATAL CONTROLLER ERROR' error code returned to the program requesting the I/O. Now it should be noted that these errors ONLY showed up on the 9700/9766 and the new 9900/Eagle combinations under EXTREMELY HEAVY I/O loads, which is what our application system typically produced during the course of the day.

It was costly to rerun our database system to reproduce the errors and prove our point to SI, so I wrote a program and supporting command files that place an EXTREMELY HEAVY I/O load on a disk, particularly if more than one job is run per disk. When the test program hits an error, it records the error and generates informational messages that completely identify it. It also prints messages at regular intervals that tell how much I/O has occurred. Once the proof of the errors was shown to SI and they ran a copy of the program in Calif., a firmware update was made to our controller and the errors have since disappeared.

If anyone is having problems with a disk system and needs a test that subjects it to heavy I/O loads, I recommend the program which I wrote. If errors are reported after running the program, contact your local field service office and show them printouts with the errors listed.

The program is large and written in RATFIV, a FORTRAN pre-processor language, and also uses structured macros in the MACRO-32 subroutines. For this reason, I am not including the program with this message. I am going to get the program on the Spring '87 VAX SIG tape in a subdirectory of TSO. This way, sites without a FORTRAN compiler can use the program.

It needs to be noted that SI has corrected the errors that we had been encountering and that no further errors have been found. The databases have been running on the Eagles for around 2 years now and no problems have been experienced with the drives or the controllers.
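For readers who want a feel for what a tester of this kind does, the following is a minimal sketch in Python. It is NOT the RATFIV/MACRO-32 program described above, and it does not reproduce the controller error; it only illustrates the general idea of writing blocks in scattered order, reading them back, recording every error, and reporting I/O counts at regular intervals. The names `stress_disk` and `make_block`, the scratch file, the block size and the reporting interval are all assumptions made for illustration.

```python
import os
import random
import struct
import time


def make_block(pass_no, block_no, block_size):
    """Build a block whose contents identify the pass and block number."""
    return struct.pack("<II", pass_no, block_no) * (block_size // 8)


def stress_disk(path, block_size=4096, blocks=256, passes=2,
                report_every=128, seed=0):
    """Hammer one scratch file with scattered writes and verifying reads.

    Returns (io_count, errors), where errors is a list of message strings.
    Hypothetical helper for illustration only.
    """
    if block_size % 8:
        raise ValueError("block_size must be a multiple of 8")
    rng = random.Random(seed)
    io_count = 0
    errors = []
    started = time.time()

    def tick():
        # Count one I/O and print a progress line at regular intervals.
        nonlocal io_count
        io_count += 1
        if io_count % report_every == 0:
            print(f"{io_count} I/Os in {time.time() - started:.1f}s, "
                  f"{len(errors)} error(s)")

    # Preallocate the file so blocks can be written in any order.
    with open(path, "wb") as f:
        f.truncate(block_size * blocks)

    order = list(range(blocks))
    with open(path, "r+b") as f:
        for pass_no in range(passes):
            rng.shuffle(order)  # scattered writes force seeks
            for block_no in order:
                try:
                    f.seek(block_no * block_size)
                    f.write(make_block(pass_no, block_no, block_size))
                except OSError as exc:
                    errors.append(f"pass {pass_no} write {block_no}: {exc}")
                tick()
            f.flush()
            os.fsync(f.fileno())  # push the data toward the device
            rng.shuffle(order)
            # Note: reads may be served from the OS cache, not the disk.
            for block_no in order:
                try:
                    f.seek(block_no * block_size)
                    data = f.read(block_size)
                    if data != make_block(pass_no, block_no, block_size):
                        errors.append(
                            f"pass {pass_no} block {block_no}: data mismatch")
                except OSError as exc:
                    errors.append(f"pass {pass_no} read {block_no}: {exc}")
                tick()
    return io_count, errors
```

A call such as `stress_disk("scratch.dat", blocks=4096, passes=10)` against a file on the disk under test would be one way to use it. Running several copies at once, each with its own scratch file on the same disk, approximates the "more than one job per disk" case mentioned above.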
The subsystems are single ported to VAX 11/780's.

Regarding the use of the SI83C disk subsystems, I am scheduled to receive 8 drives in a single cabinet which is the same size as a single RA-81 quad pack. The drives will be dual ported between a HSC50 and a HSC70. After I run my I/O tester and everything looks good, including HSC failover, the disks will be used as normal data disks and shadow set members. If I encounter any problems, I will post the results to INFO-VAX for general consumption and comment.

I hope this clears up the debate a little or provides some guidelines for the future.

THE INFORMATION IN THIS DOCUMENT IS SUBJECT TO CHANGE WITHOUT NOTICE AND SHOULD NOT BE CONSTRUED AS A COMMITMENT OR COMMENT BY TSO FINANCIAL, NATIONAL TEACHERS LIFE INS., COLONIAL NATIONAL BANK, COLONIAL NATIONAL LEASING, ADVANTA USA AND ANY COMPANIES I HAVE WORKED WITH IN THE PAST AS EMPLOYEE OR USER. THE COMPANIES LISTED ABOVE ASSUME NO RESPONSIBILITY FOR ANY ERRORS THAT MAY APPEAR.

Paul D. Clayton - Systems Manager
TSO Financial - Horsham, Pa. USA
Address - CLAYTON%XRT@CIS.UPENN.EDU