CLAYTON@XRT.UPENN.EDU ("Clayton, Paul D.") (11/08/87)
Information From TSO Financial - The Saga Continues...
Chapter 32, Part 2 Of 2 - November 7, 1987

The last item of the installation was to become knowledgeable about the terms of the field service contract that is bundled into the purchase. The call window is 9:00 AM to 5:00 PM, Monday to Friday. The window can be expanded, at additional cost, to whatever is required. The time lag from placing a call for service to having someone here working is to be less than four (4) hours. I am told that these terms can differ according to your location and the distance to the nearest SI field service office.

It needs to be noted here that I will not have DEC or other vendors perform the maintenance on these drives for a long time. My reason is that the controller is brand new to the market and still receiving significant fixes and upgrades. I feel the ONLY way to get these fixes in a timely manner is to have the manufacturer perform the maintenance. It also helps when you concern yourself with the sparing issue at the local office.

With that done, everyone cleared out and I put the drives to immediate use by running a program I wrote some time ago to heavily load a disk with I/O. The program creates a 400,000 block file and writes random-length records, 1 to 32,000 bytes, at purely random locations in the file. After each write is complete, the same information is read back from the same location on disk and compared with what was written, to ensure that there were no errors in writing to the disk. This process continues until the program is aborted or more than 5,000,000 write operations to the disk are completed. Status report lines are generated every so often to indicate how the test is progressing. Should an error occur, all pertinent information is printed out and the operation is retried up to ten (10) times before moving on.
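
The test loop described above can be sketched roughly as follows. This is a minimal Python illustration, not the original program, which is not shown in this article. The function name `stress_file`, its parameters, the 512-byte block size and the choice to count a retry only after a failed verify are all assumptions made for the sketch. On a modern operating system the read-back may also be satisfied from a cache rather than the platter, so this exercises the logic more than the drive.

```python
import os
import random
import tempfile

BLOCK_SIZE = 512   # bytes per disk block (assumed; the VMS block size)
MAX_RETRIES = 10   # retries after a failed verify, per the description


def stress_file(path, file_blocks=400_000, max_writes=5_000_000,
                max_record=32_000, report_every=100_000, seed=None):
    """Write random-length records at random offsets, verifying each one."""
    rng = random.Random(seed)
    size = file_blocks * BLOCK_SIZE
    stats = {"writes": 0, "retries": 0, "failures": 0}

    with open(path, "w+b") as f:
        f.truncate(size)  # create the full-size test file up front
        while stats["writes"] < max_writes:
            length = rng.randint(1, min(max_record, size))
            offset = rng.randrange(size - length + 1)
            data = rng.getrandbits(8 * length).to_bytes(length, "little")

            for attempt in range(1 + MAX_RETRIES):
                f.seek(offset)
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # push the write toward the device
                f.seek(offset)
                if f.read(length) == data:
                    break
                # Mismatch: report what we know, then retry if allowed.
                print(f"verify error: offset={offset} length={length} "
                      f"attempt={attempt + 1}")
                if attempt < MAX_RETRIES:
                    stats["retries"] += 1
            else:
                stats["failures"] += 1  # still bad after all retries

            stats["writes"] += 1
            if stats["writes"] % report_every == 0:
                print(f"status: {stats}")
    return stats


if __name__ == "__main__":
    # Small demonstration run; the full-size defaults take a long time.
    with tempfile.TemporaryDirectory() as tmp:
        print(stress_file(os.path.join(tmp, "stress.dat"), file_blocks=64,
                          max_writes=200, report_every=50, seed=1))
```

The defaults mirror the figures quoted above (400,000 blocks, 1 to 32,000 byte records, 5,000,000 writes, ten retries). The demonstration at the bottom uses a much smaller file so that it finishes quickly.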

The result of running this test is a disk that physically feels like it is attempting to rip its heart out. Two copies of the test, for a total of 800,000 blocks of usage, were run on two different unloaded CPUs, one an 8700 and the other an 8500. The SPM package from DEC accumulated performance statistics during this time period, and the following results give the comparison between various types of disks. The 'response time' stat has been defined to me as the time it takes an I/O request to travel from the VAX bulkhead CI connection, through the HSC, out to the drive, have it performed, and back to the bulkhead. This therefore includes any time needed by the HSC to set up the request for the disk. The tests on the SI93C drives were done on a VAX 8350, with the attached processor enabled.

+---------- Shadow Disk Statistics ---------+
!                   Serv    Resp            !
!           Rate    Time    Time   Queue    !
!           (/s)    (ms)    (ms)   Length   !
!          ------  ------  ------  ------   !
! RA$81     14.9     67     131     2.0     !
! SI$83C    16.6     60     117     1.9     !
! SI$93C    17.5     56     101     1.8     !
+-------------------------------------------+

+-------- Separate Disk Statistics ---------+
!                   Serv    Resp            !
!           Rate    Time    Time   Queue    !
!           (/s)    (ms)    (ms)   Length   !
!          ------  ------  ------  ------   !
! RA$81     16.8     60     117     2.0     !
! SI$83C    19.2     52     100     1.9     !
! SI$93C    20.9     48      88     1.8     !
+-------------------------------------------+

                  Table B
Comparison Of Disk Performance Using Test Program

For both the tables shown above, it needs to be noted that there was always an I/O request waiting to execute on the drive, and therefore no 'idle' time as far as the drive was concerned. This is shown in the 'Queue Length' value for each test.

The other item to note here is that the 'Response Time' values for the shadow disks are larger than the values for the separate disks. The basis for this, I feel, is that while the shadow disks were on different HSC requestor cards, the test program performs a read after EVERY write.
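
As a cross-check on Table B, the columns can be compared against each other with a little arithmetic. The sketch below is an illustration only. It assumes that, with the drive never idle, service time should be close to 1/rate, and that 'Queue Length' counts all requests in flight, so that Little's law (queue = rate x response time) applies. The dictionary and function names are made up for the example; the numbers are copied from the table.

```python
# Table B rows: (rate per second, service ms, response ms, queue length)
SHADOW = {
    "RA$81": (14.9, 67, 131, 2.0),
    "SI$83C": (16.6, 60, 117, 1.9),
    "SI$93C": (17.5, 56, 101, 1.8),
}
SEPARATE = {
    "RA$81": (16.8, 60, 117, 2.0),
    "SI$83C": (19.2, 52, 100, 1.9),
    "SI$93C": (20.9, 48, 88, 1.8),
}


def implied_service_ms(rate):
    """Service time if the drive is never idle: one request per 1/rate s."""
    return 1000.0 / rate


def implied_queue(rate, response_ms):
    """Little's law: mean requests in flight = rate x response time."""
    return rate * response_ms / 1000.0


def shadow_penalty_ms(drive):
    """Extra response time of the shadow set over the separate disk."""
    return SHADOW[drive][2] - SEPARATE[drive][2]


for name, table in (("shadow", SHADOW), ("separate", SEPARATE)):
    for drive, (rate, serv, resp, queue) in table.items():
        print(f"{name:8} {drive:7} "
              f"service {serv} vs {implied_service_ms(rate):.1f} ms, "
              f"queue {queue} vs {implied_queue(rate, resp):.2f}")

for drive in SHADOW:
    print(f"{drive:7} shadow penalty {shadow_penalty_ms(drive)} ms")
```

Under those assumptions the table is internally consistent: the implied service times and queue lengths land close to the reported ones, and the shadow sets respond 13 to 17 ms slower than the separate disks.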

Because the test program reads after every write, both members of the shadow set have to complete the previous write operation, and then the HSC does a comparison between them to see which could provide the information faster on the following read. The increase that is shown, therefore, is the time to completely update both shadow members and do the comparison. I also feel that these numbers represent the worst case scenario that you should encounter, except if both shadow members are on the same requestor. The point to remember with shadow sets is that the 'win' situation is with I/O that is largely read requests instead of write requests, and the test program is exactly the opposite of this.

The problems, other than the ones listed above, that I have had to date are as follows, with any updates that I know of at the time of this writing.

Error messages on the wrong HSC channel. This problem existed if you had the drives dual ported between two HSCs and the drive was 'selected' by one of them. If the drive reported any errors, both HSCs received the message packet. The problem existed for the HSC which did not have the drive selected, and which therefore did not have the drive in its list of 'known drives'. If the errors happened frequently enough, the opposing HSC invoked ILEXER to find the problem; it could not find the drive, and this continued until the HSC would declare the drive inoperable or crash. This has since been corrected, and new firmware is being distributed to existing sites and installed on new systems. I have not had the firmware in long enough to determine for myself if it is truly fixed. Concurrent with the firmware update, there may be a need to have the disk(s) reformatted to provide more space for diagnostic testing on the disk. SI has just announced and provided to the field offices a box that will enable them to format and exercise a FUJI/NEC drive fully, without any HSC or host support needed.

The third screw back on the rack slides which hold the SI controller in the cabinet can push the metal standoff inwards, and the result is a C-Mod card that is bent. This was the case on the SI83C drives and was remedied by removing the screw completely. The problem is a screw that is about 1/16th of an inch too long, and a metal standoff between C-Mod cards that is perfectly located to coincide with this screw.

The SI controller box has only one power switch in its current form. The implication here is that should one C-Mod card fail in the controller cabinet, the entire cabinet has to be powered down to fix the problem, which would also cause up to four drives to be unavailable for use during the outage. The space inside the cabinet is laid out in such a way that any work done to one C-Mod card would require that all drives with controllers in the cabinet be turned over to field service.

The SDI and drive cables are attached to the back of the controller cabinet in a very compact way, which can cause headaches for the field service representative. Similar cables for two drives are directly over one another, and the cable hold downs have screws on the top. The result is cramped space between the top and bottom cables and the back of the cabinet when it is pulled out on the slide racks.

The cable from the SI83C drives to the SI controller cabinet is approximately three (3) inches wide and is installed in such a manner that it covers an air vent in the power supply that is four (4) inches wide. The result is an air vent with only one (1) inch of effective air flow on a power supply.

The failure rate to date has been one C-Mod card, one power supply, one FUJI drive and two NEC drives.
The C-Mod card and power supply failures I would attribute to the power failures we have had recently (four in five months), and they are minor compared to the five HSC requestor cards, two HSC CPU cards, three HSC CI Link cards and various DEC HDA and VAX 8XXX problems I have had in the same period. The FUJI HDA failure happened immediately after delivery and the initial power up, and I consider it the result of shipping damage. The two NEC HDA replacements I attribute to power failures also. These are located in another computer room I have in Wilmington, Delaware. On a GOOD week we ONLY have one power failure, enough said.

There appears to be a 'latching' problem between the 'C-MOD' card in the SI controller and the FUJI/NEC drives. The problem comes up when you power down a drive and power it up, or power a controller cabinet down then up. The write protect and drive ready signals can be latched in the wrong state and require several more power down/up cycles to clear them.

Overall I have to say that I am very pleased with the disk drives, their performance and the support of my local field service office. SI appears to have a product on the market that provides an alternative to the equipment that is offered by DEC, and the pricing is very attractive.

Of all the issues that I have talked about here, the only one that could change due to your getting the SI disk drives is the support of the local field service office. I view this as an important issue and one that needs constant monitoring and changes. I feel it is the user's responsibility to watch the equipment on a day to day basis and to notify the local office of any problems. It is also the user's responsibility to press any issues that arise over questionable or incomplete support that you may be receiving. If the field service office is not supporting you to the extent that you feel is needed, take it up with your salesman, and let them work the issue for you.
If they cannot resolve the issue, call the west coast, but only after all other avenues have been tried.

NOTE*** All comments, statements and facts here are my own, and not that of my employers, National Teachers Life Insurance, Teachers Service Organization (TSO) or any of their subsidiaries. All rights to this article are reserved. This article is not meant to be a 'Sales' pitch of the product. I have no connections with SI, short of HEAVILY using their equipment. Any electronic reprint of this article MUST completely contain this NOTE. NO PERMISSION IS GIVEN TO REPRINTING THIS ARTICLE OR ANY PARTS OF IT ON PAPER, OR SIMILAR SUBSTANCES.

Paul D. Clayton
Manager Of Systems
TSO Financial Corp.
Horsham, Pa. USA 19044
Address - CLAYTON%XRT@CIS.UPENN.EDU