[comp.periphs.scsi] BIG problem with 1542B and Quantum disks

tmanos@wyvern.uucp (Tom Manos) (05/09/91)

Hello netters,

I need help in a big way.

I've started having problems with my UNIX system since I installed an
Adaptec AHA-1542B and two Quantum SCSI disks: a ProDrive 210S and a ProDrive
105S, both internal.  I'm running with the following h/w & s/w:
Zeos 386/20, 8MB RAM, no coprocessor, generic VGA card.
Microport SysV/386 r3.2.2 with the Columbia Data Products driver set.

I'm experiencing the following symptoms:

When both drives are installed and mounted doing:
find <drive1> -depth -print | cpio -pd <drive2>
will hang the system after a random but small number of copies.  The disks
will seem to slow down for a few seconds, until they come to a complete halt
from which there is no return save the reset button.  The second disk is
totally unusable. As long as I'm only accessing one disk, everything works.

When only the 210S is mounted, the system will run, but disk throughput
seems slow, and when there is intense disk activity (like when news is being
processed), keyboard response is agonizingly slow while waiting for the
disk.  Sometimes it takes more than a minute for a response.  Also, even
with only one drive mounted, I've noticed that when using the floppy
(with cpio for instance), the floppy head will do weird seeks, like it got
a read error and is retrying, when there is SCSI disk activity.  I'm
running the floppy off the 1542B. 

The AHA-1542B is configured as follows:
SCSI id 7
Synchronous transfer enabled
SCSI parity enabled
DMA channel 5
DMA request 5
DMA acknowledge 5
Interrupt channel 11
DMA speed 5MB/s
Adaptec bios enabled
bios base addr DC00
auto sense enabled

The drives are configured as SCSI id 0 & 1, with parity enabled.  I've got
terminators on the 1542B and the farthest drive from the card (the 105S).

Can anyone tell me what my problem could be?  Microport can't.  Alliance
Peripheral Systems (my retailer) can't.  I've got a 105MB doorstop right
now that I need to make useful.

Please help!
Thanks!

Tom  (tmanos@wyvern or ...!uunet!xanth!wyvern!tmanos)
-- 
Tom Manos      Norfolk, VA      tmanos@wyvern       (...!xanth!wyvern!tmanos)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
I gotta get away from this day to day running around.
Everybody knows this is nowhere.  - Neil Young

pa@curly.appmag.com (Pierre Asselin) (05/10/91)

In article <1991May9.002915.262@wyvern.uucp>
tmanos@wyvern.uucp (Tom Manos) writes:
>I've started having problems with my UNIX system since I installed an
>Adaptec AHA-1542B and two Quantum SCSI disks: a ProDrive 210S and a ProDrive
>105S, both internal.  [...]
       ^^^^^^^^^^^^^

Here goes nothing:  my own "internal" 105S came with terminating
resistors.  Look for SIP's on the PC board just above the SCSI
connector.  Try wiring one drive at an unterminated end of the bus.

As long as I'm here, should I even *think* of removing the drive's
terminating resistors?  I hate to fix something that ain't broke,
but I'd like to do things right.  Tom will have to do something
(maybe).  Is trading-in the drives his only option?

--Pierre Asselin, R&D, Applied Magnetics Corp.  I speak for me.

abc@rock.concert.net (Alan Clegg) (05/10/91)

In article <1991May9.002915.262@wyvern.uucp> tmanos@wyvern.uucp (Tom Manos) writes:
. When both drives are installed and mounted doing:
. find <drive1> -depth -print | cpio -pd <drive2>
. will hang the system after a random but small number of copies.  The disks
. will seem to slow down for a few seconds, until they come to a complete halt
. from which there is no return save the reset button.  The second disk is
. totally unusable. As long as I'm only accessing one disk, everything works.

Are you saying that the drives actually spin down?  If so, you may not have
a large enough power supply in your machine.

[Stupid suggestion, but, who knows...]

-abc
-- 
"this is a hellacious hairball and we  | Alan Clegg - Network Programmer
 need to start by acknowledging that   | MCNC -- Center for Communications
 everybody can't get what they want"   | Research Triangle Park, NC, USA
[ Paul Vixie, on RFC-822 Extensions ]  | abc@concert.net, abc@mcnc.org

manos@wisteria.cs.odu.edu (Tom Manos) (05/11/91)

In article <1991May10.140925.7452@rock.concert.net> abc@rock.concert.net (Alan Clegg) writes:
>In article <1991May9.002915.262@wyvern.uucp> tmanos@wyvern.uucp (Tom Manos) writes:
>. When both drives are installed and mounted doing:
>. find <drive1> -depth -print | cpio -pd <drive2>
>. will hang the system after a random but small number of copies.  The disks
>. will seem to slow down for a few seconds, until they come to a complete halt
>. from which there is no return save the reset button.  The second disk is
>. totally unusable. As long as I'm only accessing one disk, everything works.
>
>Are you saying that the drives actually spin down?  If so, you may not have
>a large enough power supply in your machine.
>
>[Stupid suggestion, but, who knows...]

Not a stupid suggestion, I just didn't make myself clear.  The drives don't
spin down, but disk I/O slows to a standstill.  Not immediately, but over
the space of 10 seconds or so.

I still haven't gotten any useful suggestions on what is wrong here...
Anybody???

Tom (tmanos@wyvern  or ...!uunet!xanth!wyvern!tmanos)

bill@unixland.uucp (Bill Heiser) (05/11/91)

In article <711@curly.appmag.com> pa@appmag.com (Pierre Asselin) writes:
>
>As long as I'm here, should I even *think* of removing the drive's
>terminating resistors?  I hate to fix something that ain't broke,

It depends on where the drive is on the bus!  The rules are that the
devices at each end of the bus require termination;  other devices should
not be terminated.


-- 
bill@unixland.uucp                 The Think_Tank BBS & Public Access Unix
    ...!uunet!think!unixland!bill
    ..!{uunet,bloom-beacon,esegue}!world!unixland!bill
508-655-3848 (2400)   508-651-8723 (9600-HST)   508-651-8733 (9600-PEP-V32)

manos@wisteria.cs.odu.edu (Tom Manos) (05/12/91)

In article <1991May10.220314.13500@unixland.uucp> bill@unixland.uucp (Bill Heiser) writes:
>In article <711@curly.appmag.com> pa@appmag.com (Pierre Asselin) writes:
>It depends on where the drive is on the bus!  The rules are that the
>devices at each end of the bus require termination;  other devices should
>not be terminated.
>
>
Nope, termination is not the answer.  The 1542B is terminated, along with
the furthest drive (the 105S).  The first drive (the 210S) is not.

Still no answer to this problem.  I figured Roy Neese would figure this one
out the day I posted.  Roy, you out there?

Tom

bill@unixland.uucp (Bill Heiser) (05/12/91)

In article <1991May11.174540.13653@cs.odu.edu> manos@wisteria.cs.odu.edu (Tom Manos) writes:
>
>Still no answer to this problem.  I figured Roy Neese would figure this one
>out the day I posted.  Roy, you out there?
>

Well, I saw something similar to your problem a while back.  I was adding
a "new" disk to the system -- when I tried to tar/cpio files from the old
disk to the new disk, it would go for a few minutes then sort of sputter
out.  Then the machine would hang solid.

I took the "new" disk back and exchanged it for another, and (that) problem
went away.  

I didn't reply to this sooner, because it doesn't give you any concrete help.
I don't know if this is your problem, and if it is, I don't know how you
can tell which disk is the offender.  Sorry...

bill



rwhite@jagat.uucp (Robert White) (05/13/91)

Ok, dumb question... do you have a driver written specifically for the
SCSI board you are using?  Is it also for the UNIX release you are using?
Using the bare minimum knowledge of SCSI and the description of your
symptoms I came up with what follows.

Since SCSI is an interrupt-response kind of deal that allows for but
does not necessarily include multiple outstanding requests (depending
on the controller logic) you could be experiencing problems with
instruction interleaving on the SCSI bus.

The multiple drive disaster formula comes into play when a software
driver is expecting a blocking behavior out of the controller and
does not get it.  To wit:

Driver places a request for X:A (sector A from drive X)
   and returns with no data because the SCSI system is running ASYNC.

Driver places another request for X:A and returns with data 'cause
   the previous request has returned and the data "just happens"
   to be in place in the SCSI bus buffer.

Everybody is happy but the system is slowed down because the lower
level driver automatic retry routine is acting as a busy-wait and
is actively consuming cpu during disk activity.  The heavier the
activity the more cpu is wasted processing interrupts that are
unnecessary.

If the driver is set to do "fair scheduling" and there is more
than one device on the bus, the model fails...

   Processor                 SCSI bus
    request                 data cache
===============          ================
      X:A                      ---
      Y:B                      X:A
      X:A                      Y:B
      Y:B                      X:A
<and so on...>

When the second device is added to the system the probability
of consecutive requests for the same data diminishes as the load
reaches equal proportions across the diverse drives.

NOTE: this is a simple model of asynchronous drive scheduling
taken directly from some of my readings involving such matters
and may not accurately represent the SCSI access methods.
But it is a stock problem and it does match your symptoms.
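
The model above can be sketched in a few lines of Python.  This is
only a toy illustration of the one-slot buffer described in this post,
not of how the 1542B or any real driver behaves.  The function name
`simulate`, the `max_attempts` cutoff, and the rule that each request
overwrites the buffer after the driver has already returned are all
assumptions made for the sketch.

```python
def simulate(pending, max_attempts=20):
    """Toy one-slot buffer model (hypothetical, for illustration only).

    pending: one outstanding request label per drive, e.g. ["X:A", "Y:B"].
    Returns (attempts_used, completed_requests).
    """
    buffer = None                 # what the "SCSI bus data cache" holds
    outstanding = list(pending)   # requests the driver still needs
    completed = []
    attempts = 0
    turn = 0                      # round-robin position ("fair scheduling")

    while outstanding and attempts < max_attempts:
        request = outstanding[turn % len(outstanding)]
        attempts += 1
        if buffer == request:
            # The data from an earlier attempt "just happens" to be there.
            completed.append(request)
            outstanding.remove(request)
        else:
            # Driver returns empty-handed and moves on to the next drive.
            turn += 1
        # The asynchronous transfer lands after the driver has returned,
        # overwriting whatever the single buffer slot held before.
        buffer = request

    return attempts, completed


if __name__ == "__main__":
    print("one drive: ", simulate(["X:A"]))
    print("two drives:", simulate(["X:A", "Y:B"]))
```

With one drive, simulate(["X:A"]) finishes on the second attempt.  With
two drives taking turns, simulate(["X:A", "Y:B"]) uses up all 20
attempts without completing anything, which is the pattern shown in the
table above.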

Depending on exactly how your driver is configured in its implementation
it may be as simple as rebuilding your kernel with a couple of
parameters jiggled.  If you don't have the correct driver, and/or
the board you possess isn't one of those supported by the native
internals provided by the manufacturer you may just need to
either a) get the driver updated by the controller manufacturer,
or b) change controllers for one that is supported.
-- 
Robert C. White Jr.          |  I have moved my news reading activities
rwhite@jagat           <Home |  not directly related to my job off of my
rwhite@nusdecs         <Work |  employer's machine.  Please use "jagat"
"Like most endeavors, life is seriously over-advertised and under-funded"

herman@corpane.uucp (Harry Herman) (05/15/91)

In <711@curly.appmag.com> pa@curly.appmag.com (Pierre Asselin) writes:

>Here goes nothing:  my own "internal" 105S came with terminating
>resistors.  Look for SIP's on the PC board just above the SCSI
>connector.  Try wiring one drive at an unterminated end of the bus.

>As long as I'm here, should I even *think* of removing the drive's
>terminating resistors?  I hate to fix something that ain't broke,
>but I'd like to do things right.  Tom will have to do something
>(maybe).  Is trading-in the drives his only option?

>--Pierre Asselin, R&D, Applied Magnetics Corp.  I speak for me.

As I understand the SCSI spec, only the electrical ends of the cable
are to be terminated.  Generally the controller is at one end of
the cable, and some drive is at the other end.  We have routinely
removed the terminators from all drives in between.  In one case
we purchased a drive that had its terminators SOLDERED IN PLACE.
So, I put that at the electrical end of the cable, but still set
the SCSI ID jumper to be a higher unit number.  So, on that system, the
physical drive ordering is unit 2, unit 0 and unit 1 with the controller
after unit 1, and terminators in unit 2 and the controller.
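
The end-of-cable rule is easy to check mechanically.  The Python sketch
below is illustrative only: `check_termination` and the device labels
are made-up names, and the cable order used for Tom's system (1542B at
one end, then the 210S, then the 105S) is assumed from his description
of the 105S as the farthest drive from the card.

```python
def check_termination(chain, terminated):
    """Check the "terminate only the two ends" rule (hypothetical helper).

    chain: device names in physical cable order, from one end to the other.
    terminated: set of device names that have terminators installed.
    Returns a list of problems; an empty list means the rule is satisfied.
    """
    if len(chain) < 2:
        raise ValueError("need at least two devices on the cable")

    ends = {chain[0], chain[-1]}
    problems = []
    for device in chain:
        if device in ends and device not in terminated:
            problems.append(f"{device}: at an end of the cable but not terminated")
        elif device not in ends and device in terminated:
            problems.append(f"{device}: in the middle of the cable but terminated")
    return problems


if __name__ == "__main__":
    # Physical ordering described in this post.
    print(check_termination(["unit 2", "unit 0", "unit 1", "controller"],
                            {"unit 2", "controller"}))
    # Tom's setup as he described it (cable order assumed).
    print(check_termination(["AHA-1542B", "210S", "105S"],
                            {"AHA-1542B", "105S"}))
```

Both layouts come back with no problems.  Leaving the factory
terminators on a middle drive, for example passing
{"AHA-1542B", "210S", "105S"} as the terminated set for Tom's cable,
would flag the 210S.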

				Harry Herman
				herman@corpane