CLAYTON@XRT.UPENN.EDU ("Clayton, Paul D.") (11/08/87)
Information From TSO Financial - The Saga Continues...
Chapter 32, Part 2 Of 2 - November 7, 1987
The last item of the installation was to become familiar with
the terms of the field service contract that is bundled into the
purchase. The call window is 9:00 AM to 5:00 PM, Monday to Friday. The
window can be expanded, at additional cost, to whatever is required. The
time lag from placing a call for service to having someone here working
is to be less than four (4) hours. I am told that these terms can be
different according to your location and the distance to the nearest SI
field service office. It needs to be noted here that I will not have DEC
or other vendors perform the maintenance on these drives for a long time.
My reason is that the controller is brand new to the market and still
receiving significant fixes and upgrades. I feel the ONLY way to get
these fixes in a timely manner is to have the manufacturer perform the
maintenance. It also helps when you concern yourself with the sparing
issue at the local office.
With that done, everyone cleared out and I put the drives to
immediate use by running a program I wrote some time ago to heavily
load a disk with I/O. The program is designed such that a 400,000 block
file is created and random length records, 1 to 32,000 bytes, are
written then read from the file on a purely random basis. After the
write is complete, a read of the same information at the same location
on disk is performed and the information is compared with what was
written to ensure that there were no errors in writing to the disk. This
process continues until the program is aborted or more than 5,000,000
write operations to the disk are completed. Status report lines are
generated every so often to indicate how the test is progressing.
Should an error occur, all pertinent information is printed out and the
operation is retried up to ten (10) times before moving on. The result
is a disk that physically feels like it's attempting to rip its heart
out. Two copies of the program, for a total of 800,000 blocks of usage,
were run on two different unloaded CPUs, one an 8700 and the other an
8500. The SPM package from DEC accumulated performance statistics during
this time period and the following results show the comparison between
various types of disks. The 'response time' stat has been defined to me as the
time it takes an I/O request to travel from the VAX bulkhead CI
connection, through the HSC, out to the drive, have it performed, and
back to the bulkhead. This therefore includes any time needed by the
HSC to set up the request for the disk. The tests on the SI93C drives
were done on a VAX 8350, with the attached processor enabled.
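For readers who want to try something similar, the test loop described
above can be sketched roughly as follows. This is a Python illustration
and not the original program; the function name stress_test, its
parameters, the statistics it returns and the 512-byte block size used
to size the file are all assumptions made for the sketch. Note also that
on a current operating system the read-back may be satisfied from the
file cache rather than from the drive itself, so this shows the logic
only.

```python
import os
import random

BLOCK_SIZE = 512  # assumed bytes per disk block, used only to size the test file


def stress_test(path, file_blocks=400_000, max_writes=5_000_000,
                max_record=32_000, max_retries=10, report_every=100_000):
    """Write random-length records at random offsets and verify each by read-back."""
    file_size = file_blocks * BLOCK_SIZE
    stats = {"writes": 0, "miscompares": 0, "unrecovered": 0}
    with open(path, "w+b") as f:
        f.truncate(file_size)                       # create the full-size test file
        while stats["writes"] < max_writes:
            length = random.randint(1, min(max_record, file_size))
            offset = random.randint(0, file_size - length)
            data = os.urandom(length)               # random record contents
            for attempt in range(1 + max_retries):  # first try plus the retries
                f.seek(offset)
                f.write(data)
                f.flush()
                os.fsync(f.fileno())                # push the write toward the device
                f.seek(offset)
                if f.read(length) == data:          # read same location and compare
                    break
                stats["miscompares"] += 1
                print(f"miscompare: offset={offset} length={length} "
                      f"attempt={attempt + 1}")
            else:
                stats["unrecovered"] += 1           # still bad after all retries
            stats["writes"] += 1
            if stats["writes"] % report_every == 0:
                print(f"status: {stats}")           # periodic progress line
    return stats
```

A small trial run such as stress_test("scratch.dat", file_blocks=64,
max_writes=40, max_record=2000, report_every=10) exercises the same
write, read and compare cycle without tying up a drive for hours.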
+---------- Shadow Disk Statistics -----------+
!                      Serv    Resp           !
!              Rate    Time    Time   Queue   !
!              (/s)    (ms)    (ms)  Length   !
!            ------  ------  ------  ------   !
!  RA$81       14.9      67     131     2.0   !
!  SI$83C      16.6      60     117     1.9   !
!  SI$93C      17.5      56     101     1.8   !
+---------------------------------------------+
+--------- Separate Disk Statistics ----------+
!                      Serv    Resp           !
!              Rate    Time    Time   Queue   !
!              (/s)    (ms)    (ms)  Length   !
!            ------  ------  ------  ------   !
!  RA$81       16.8      60     117     2.0   !
!  SI$83C      19.2      52     100     1.9   !
!  SI$93C      20.9      48      88     1.8   !
+---------------------------------------------+
Table B
Comparison Of Disk Performance Using Test Program
For both the tables shown above, it needs to be noted that there
was always an I/O request that was waiting to execute on the drive and
therefore no 'idle' time as far as the drive was concerned. This is
shown in the 'Queue Length' value for each test. The other item to note
here is that the 'Response Time' values for the shadow disks are larger
than the values for the separate disks. The basis for this, I feel, is
that while the shadow disks were on different HSC requestor cards, the
test program performs a read after EVERY write. The result is that both
members of the shadow set have to complete the previous write operation
and then the HSC does a comparison between them to see which could
provide the information faster on the following read. The increase that
is shown, therefore, is the time to completely update both shadow
members and do the comparison. I also feel that these numbers represent
the worst case scenario that you should encounter, unless both shadow
members are on the same requestor. The point to remember with shadow
sets is that the 'win' situation is with I/O that is largely read
requests instead of write requests, and the test program is exactly
the opposite of this.
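As a rough cross-check of the Table B figures, the arithmetic below may
be useful. It assumes that the SPM numbers are averages over the same
interval, in which case rate times service time approximates how busy
the drive was, and rate times response time (Little's law) approximates
the average number of requests outstanding. The function names are
invented for this sketch and the drive labels are copied as printed in
the tables.

```python
# Columns: drive label as printed, rate (/s), service time (ms),
# response time (ms), reported queue length. Values copied from Table B.
SHADOW = [
    ("RA$81", 14.9, 67, 131, 2.0),
    ("SI$83C", 16.6, 60, 117, 1.9),
    ("SI$93C", 17.5, 56, 101, 1.8),
]
SEPARATE = [
    ("RA$81", 16.8, 60, 117, 2.0),
    ("SI$83C", 19.2, 52, 100, 1.9),
    ("SI$93C", 20.9, 48, 88, 1.8),
]


def utilization(rate_per_s, service_ms):
    """Fraction of each second the drive spends servicing requests."""
    return rate_per_s * service_ms / 1000.0


def littles_law_queue(rate_per_s, response_ms):
    """Average requests outstanding implied by rate and response time."""
    return rate_per_s * response_ms / 1000.0


def shadow_penalty_ms():
    """Extra response time (ms) of the shadow set over separate disks, per drive."""
    return {s[0]: s[3] - p[3] for s, p in zip(SHADOW, SEPARATE)}


def report():
    for title, rows in (("Shadow", SHADOW), ("Separate", SEPARATE)):
        for label, rate, serv, resp, queue in rows:
            print(f"{title:8} {label:7} busy={utilization(rate, serv):.2f} "
                  f"implied queue={littles_law_queue(rate, resp):.2f} "
                  f"reported queue={queue}")
    print("shadow response penalty (ms):", shadow_penalty_ms())
```

Calling report() on these figures gives a busy fraction within about two
percent of 1.0 for every row, which fits the remark that the drives had
no idle time, and an implied queue within 0.05 of the reported 'Queue
Length'. The shadow sets cost between 13 and 17 ms of added response
time per request under this write-heavy load.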
The problems, other than the ones listed above, that I have had to
date are as follows, with any updates that I know of at the time of this
writing.
Error messages on the wrong HSC channel. This problem existed if
you had the drives dual ported between two HSC's and the drive was
'selected' by one of them. If the drive reported any errors, both HSC's
received the message packet. The problem existed for the HSC which did
not have the drive selected, and therefore did not have it in its list
of 'known drives'. If the errors happened frequently enough, the opposing
HSC invoked ILEXER to find the problem, could not find the drive, and
this continued until the HSC either declared the drive inoperable or
crashed. This has since been corrected and new firmware is being
distributed to existing sites and installed on new systems. I have not
had the firmware in long enough to determine if it is truly fixed
myself. Concurrent with the firmware update, there may be a need to
have the disk(s) reformatted to provide more space for diagnostic
testing on the disk. SI has just announced and provided to the field
offices a box that will enable them to format and exercise a FUJI/NEC
drive fully, without any HSC or host support needed.
The third screw back on the rack slides which hold the SI
controller in the cabinet can push the metal standoff inwards and the
result is a C-Mod card that is bent. This was the case on the SI83C
drives and was remedied by removing the screw completely. The problem
is a screw that is about 1/16th of an inch too long and the metal
standoff between C-Mod cards being perfectly located to coincide with
this screw.
The SI controller box has only one power switch in its current
form. The implication here is that should one C-Mod card fail in the
controller cabinet, the entire cabinet has to be powered down to fix
the problem, which would also cause up to four drives to be unavailable
for use during the outage. The space inside the cabinet is laid out in
such a way that any work done to one C-Mod card would require that all
drives with controllers in the cabinet be turned over to field service.
The attachment of the SDI and drive cables to the back of the
controller cabinet is done in a very compact way, which can cause
headaches for the field service representative. Similar cables for two
drives are directly over one another and the cable hold downs have
screws on the top. The result is cramped space between the top and
bottom cables and the back of the cabinet when it is pulled out on the
slide racks. The cable from the SI83C drives to the SI controller
cabinet is approximately three (3) inches wide and is installed in such
a manner that it covers over an air vent in the power supply that is
four (4) inches wide. The result is an air vent with only one (1) inch
of effective air flow on a power supply.
The failure rate to date has been one C-Mod card, one power supply,
one FUJI drive and two NEC drives. The C-Mod card and power supply
failures I would attribute to the power failures we have had recently
(four in five months), and they are minor compared to the five HSC requestor
cards, two HSC CPU cards, three HSC CI Link cards and various DEC HDA
and VAX 8XXX problems I have had in the same period. The FUJI HDA
failure happened immediately after delivery and the initial power up,
and I consider it the result of shipping damage. The two NEC HDA
replacements I attribute to power failures also. These are located in
another computer room I have in Wilmington, Delaware. On a GOOD week we
ONLY have one power failure, enough said.
There appears to be a 'latching' problem between the 'C-MOD' card
in the SI controller and the FUJI/NEC drives. The problem comes up when
you power down a drive and power it up, or power a controller cabinet
down then up. The write protect signal and drive ready signals can be
latched in the wrong state and require several more power down/up
cycles to clear them.
Overall I have to say that I am very pleased with the disk drives,
their performance and the support of my local field service office. SI
appears to have a product on the market that provides an alternative to
the equipment that is offered by DEC and the pricing is very
attractive. Of all the issues that I have talked about here, the only
one that could change due to your getting the SI disk drives is the
support of the local field service office. I view this as an important
issue and one that needs constant monitoring and changes. I feel it is
the user's responsibility to watch the equipment on a day to day basis
and to notify the local office of any problems. It is also the user's
responsibility to press any issues that arise over questionable or
incomplete support that you may be receiving. If the field service
office is not supporting you to the extent that you feel is needed,
take it up with your salesman, and let them work the issue for you. If
they cannot resolve the issue, call the west coast, but only after all
other avenues have been tested.
NOTE***
All comments, statements and facts here are my own, and not that
of my employers, National Teachers Life Insurance, Teachers Service
Organization (TSO) or any of their subsidiaries. All rights to this article
are reserved. This article is not meant to be a 'Sales' pitch of the
product. I have no connections with SI, short of HEAVILY using their
equipment. Any electronic reprint of this article MUST completely contain
this NOTE. NO PERMISSION IS GIVEN TO REPRINTING THIS ARTICLE OR ANY
PARTS OF IT ON PAPER, OR SIMILAR SUBSTANCES.
Paul D. Clayton
Manager Of Systems
TSO Financial Corp.
Horsham, Pa. USA 19044
Address - CLAYTON%XRT@CIS.UPENN.EDU