[comp.sys.hp] Serious SCSI drive problems

clark@killer.DALLAS.TX.US (Clark Brown) (12/30/88)

We are running a cluster of a 9000/360 server with two 9000/370 clients.
We have several disk drives connected to the server, including 3 hp-ib
connected drives and four SCSI connected drives.

Ever since we added the SCSI drives, we have had persistent trouble with
them.  We occasionally get data errors on reads and writes.  For instance,
we ran an experiment: we used cp to copy a 5 MB file, then used cmp to
compare the original with the copy.  Out of 10 copies, 4 to 6 fail the
comparison.
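
The copy-and-compare experiment described above could be automated roughly
as follows.  This is only a sketch in Python rather than the cp and cmp
commands actually used; the function name copy_and_compare and its
parameters are made up for illustration.

```python
import filecmp
import os
import shutil


def copy_and_compare(src, dest_dir, trials=10):
    """Copy src into dest_dir `trials` times and return the list of
    trial numbers whose copy did not match the original.

    Hypothetical helper: it mimics the 'cp then cmp' experiment, it is
    not the procedure the original poster ran.
    """
    failures = []
    for trial in range(trials):
        dst = os.path.join(dest_dir, "copy_%d" % trial)
        shutil.copyfile(src, dst)
        # shallow=False forces a byte-by-byte comparison, as cmp does,
        # instead of trusting file size and modification time.
        if not filecmp.cmp(src, dst, shallow=False):
            failures.append(trial)
        os.remove(dst)
    return failures
```

An empty result means every copy matched.  Note that a buffer cache can
satisfy the read-back from memory, so a test like this may under-report
errors that only occur on the physical read path.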

We had this problem on our 350, and now we see it on our new 370.  The new
machine has a new drive, controller, cable, etc.  The only thing it has
in common with the old machine is the O/S, which is HP-UX 6.2.

We have been working with hardware and software support at HP since 11/1,
but they don't have any answers yet.  Have any of you seen this problem?
Any ideas about what we might try next?

Thanks in advance,

Clark

killer!clark

paulp@hpfcdc.HP.COM (Paul Perlmutter) (12/31/88)

I would like to know more about the configuration you are running.
We have determined that with certain configurations, the early
SCSI boards could cause problems as described by Clark.

In particular, when the older boards are used with an HP-IB interface at
interrupt level 4 (the same interrupt level as the SCSI card), there is
a potential for data corruption.  This has been corrected, and we urge
you to contact your field engineer.

Paul Perlmutter
Ft. Collins
229-2549

paulp@hpfcdc.HP.COM (Paul Perlmutter) (12/31/88)

One more detail I forgot to mention:  if you have another interface
card at interrupt 4 (such as the HP98625) then a temporary workaround
to the problem would be to change the interrupt level of the HPIB.
I.e., let the SCSI card have a "dedicated" interrupt level.  For 
the HP98625B, change this to interrupt level 3.  If you have an 
HP98625A in your system, please call me.

Paul Perlmutter

kc@hprnd.HP.COM (Kurt Chan) (01/04/89)

Could you describe the physical configuration for the SCSI portion of your
system?

  1. Is your system cabling less than 6 meters total?

  2. Is there a plug-on terminator on the last device in the chain?

  3. Are there any other SCSI devices on the bus that are powered down (or up)?

  4. Has your HP support person measured TERMPWR to determine if it is within
     spec (and that the fuse is not blown on the host adapter)?  It should be
     no less than 4.0V at SCSI pin 38 on the 370 bulkhead - the higher the
     better.  The tri-state drivers on the 370 *might* still work even without
     the proper termination or terminator power, but would cause link errors.

Kurt Chan
HP, Roseville Networks Division
916-785-5621

glen@hpfcmr.HP.COM (Glen Robinson) (01/07/89)

Actually, this is a formally escalated site.   The workaround
provided by an earlier contributor, moving the fast HP-IB
interfaces to interrupt level 3 (with the SCSI remaining at
interrupt level 4), fixed the problem for the customer, to the
best of my knowledge.

A modified SCSI daughter card has been sent to the CEC for 
delivery to the customer, and this should solve the problem.

For any interested folks, the problem was that the 98265-66501
(SCSI daughter card on the Human Interface Board) trashed an
internal buffer if it was polled during a transfer when it was not
the card that had requested the interrupt.  Polling follows an
interrupt service request chain, so if another card at the same
interrupt level had interrupted, the (non-interrupting) SCSI card
was also polled in the chain and its data was garbled.
This has been fixed with the 98276-66502 card.  
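
The failure mode and the workaround can be illustrated with a toy model.
Everything in it (the Card class, service_interrupt, the buffer contents)
is hypothetical and greatly simplified; it is not HP-UX driver code or a
description of the real hardware, and it assumes every card at the
interrupting level gets polled.

```python
class Card:
    """Toy interface card that sits on one interrupt level."""

    def __init__(self, name, level, corrupts_on_spurious_poll=False):
        self.name = name
        self.level = level
        self.corrupts_on_spurious_poll = corrupts_on_spurious_poll
        self.buffer = []
        self.transfer_active = False
        self.interrupt_pending = False

    def poll(self):
        """Return True if this card was the one requesting service."""
        if self.interrupt_pending:
            self.interrupt_pending = False
            return True
        if self.corrupts_on_spurious_poll and self.transfer_active:
            # Models the defect: polled mid-transfer without having
            # interrupted, so the internal buffer is trashed.
            self.buffer = [None] * len(self.buffer)
        return False


def service_interrupt(level, chain):
    """Poll every card in the chain that shares the given level."""
    for card in chain:
        if card.level == level:
            card.poll()


def run(hpib_level):
    """HP-IB card interrupts while the SCSI card is mid-transfer."""
    scsi = Card("scsi", 4, corrupts_on_spurious_poll=True)
    hpib = Card("hpib", hpib_level)
    scsi.buffer = [1, 2, 3, 4]
    scsi.transfer_active = True
    hpib.interrupt_pending = True
    service_interrupt(hpib.level, [scsi, hpib])
    return scsi.buffer
```

With both cards at level 4, run(4) leaves the SCSI buffer garbled; with the
HP-IB card moved to level 3, run(3) never polls the SCSI card and the
buffer survives, which is the idea behind the workaround.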

This is a fairly low incidence (but disastrous) bug because 
the population of users having the configuration is small.
If any readers fit that configuration, the workaround will work,
and you should notify your field engineer for a permanent fix.

Note that this problem is confined to the SCSI daughter card
part number 98265-66501 only.

kinsell@hpfclm.HP.COM (Dave Kinsell) (01/08/89)

>This has been fixed with the 98276-66502 card.  

Surely this should be the 98265-66502.