[comp.sys.amiga] 2091 problems

stan@teroach.UUCP (Stan Fisher) (07/20/90)

In article <1518@faatcrl.UUCP> jprad@faatcrl.UUCP (Jack Radigan) writes:
>papa@pollux.usc.edu (Marco Papa) writes:
>
>>Maybe I should have qualified 'stomping' as BOTH reading and writing 
>>location 0. On an A1000, ZeroMung & Co. will help you to find writing loc 0
>>problems.  No way for you to find 'reading loc 0' problems.
>
>Think about this again.  On a machine with a 2091 and a still stomped location
>zero, you get a semaphore guru.  If I stomp on location zero with a non-zero
>value using an A1000 and then run JR-Comm, it does not guru.  This is what is
>so baffling.
>
>  -jack-

I seem to be one of the lucky ones that has had no problems since I 
followed the procedure for changing which FFS is being used on my 2091...   

My problem, and one I've heard of elsewhere, which doesn't seem to be being
addressed by Commodore, is that when you've got multiple fast SCSI devices
(drives) and do copies between two of them (physical drives, not just
partitions), the whole system hangs with the drive access LED stuck on.  The
only way out is to re-boot and then wait for the validator to do its thing.
There seems to be a REAL problem with the SCSI bus arbitration on the 2091!
Can't someone at Commodore comment on this?
I already posted this problem to c.s.a.t and heard nothing back.

Stan

  Stan Fisher -  stan@teroach.phx.mcd.mot.com -  asuvax!mcdphx!teroach!stan
  Motorola Microcomputer Division, Tempe, Arizona   -  Voice (602) 438-3228
  Call our User Group BBS "M.E.C.C.A." running Atredes 1.1 @ (602) 893-0804

stevem@sauron.Columbia.NCR.COM (Steve McClure) (07/21/90)

In article <13242@mcdphx.phx.mcd.mot.com> stan@teroach.UUCP (Stan Fisher) writes:
+
+I seem to be one of the lucky ones that has had no problems since I 
+followed the procedure for changing which FFS is being used on my 2091...   
+
+My problem, and one I've heard of elsewhere, which doesn't seem to be being
+addressed by Commodore, is that when you've got multiple fast SCSI devices
+(drives) and do copies between two of them (physical drives, not just
+partitions), the whole system hangs with the drive access LED stuck on.  The
+only way out is to re-boot and then wait for the validator to do its thing.
+There seems to be a REAL problem with the SCSI bus arbitration on the 2091!
+Can't someone at Commodore comment on this?
+I already posted this problem to c.s.a.t and heard nothing back.

My dealer quoted a CBM memo that new 2091 firmware was forthcoming to fix
this problem.  Supposed to be here end of July.

Steve
-- 
----------------------------------------------------------------------
Steve		email: Steve.McClure@Columbia.NCR.COM	803-791-7054
The above are my opinions, which NCR doesn't really care about anyway!
CAUSER's Amiga BBS! | 803-796-3127 | 8pm-8am 8n1 | 300/1200/2400

phils@tekig5.PEN.TEK.COM (Philip E Staub) (07/24/90)

In article <13242@mcdphx.phx.mcd.mot.com> stan@teroach.UUCP (Stan Fisher) writes:
>My problem, and one I've heard of elsewhere, which doesn't seem to be being
>addressed by Commodore, is that when you've got multiple fast SCSI devices
>(drives) and do copies between two of them (physical drives, not just
>partitions), the whole system hangs with the drive access LED stuck on.  The
>only way out is to re-boot and then wait for the validator to do its thing.
>There seems to be a REAL problem with the SCSI bus arbitration on the 2091!
>Can't someone at Commodore comment on this?
>I already posted this problem to c.s.a.t and heard nothing back.

After looking at several device drivers, I've suspected this to be a problem 
for a while now, and it's not limited to the A2091.

I suspect that the problem has its roots in some sample device driver code
which Commodore has been distributing (and updating, from time to time). The
code implements a sample ramdisk driver, but it has shown up in various
incarnations, including as the basis for a hard disk driver by at least one
other manufacturer.

The gist of it is something like this:

The driver starts up a separate task for each physical drive. Each task
has a "busy" flag to indicate that the drive is in use, to prevent 
multiple processes from attempting to access the same drive at the same
time. This works just fine, as long as you only have one drive connected,
or if you're only accessing one drive at a time.

Unfortunately, access to multiple drives must pass through the same
SCSI controller chip on the interface board, and there isn't (last time
I looked) any mechanism to prevent these multiple tasks from trying
to access the controller at the same time. This could happen if you have
two disk-intensive programs running at the same time, or if you have a
single program which does asynchronous I/O to the hard disk and happens
to access files on both drives.

To better understand what can happen, assume that program A wants to access
a file on drive A. The SCSI transfer takes place by sequencing the SCSI bus
through a specific set of bus states. The bus cannot be in more than one
state at a time, and it is not allowed to deviate from the prescribed
sequence. So program A's transfer gets part way through when program B wants
to access drive B. Drive B's driver task has no way of knowing that
drive A's task is in the middle of a transfer, so it starts its own
transfer. Chaos follows: drive A can't understand why its sequence
has been interrupted and refuses to release the bus, and since the bus is
locked by drive A, drive B's transfer cannot begin.
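
The interleaving described above can be modelled in a few lines of C. This
is only a sketch of the suspected failure, not Commodore's driver: the
phase list is simplified, and every name in it (Bus, Drive, start_transfer,
step, demo_collision) is hypothetical. The point it tries to show is that a
per-drive busy flag says nothing about the state of the shared bus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified SCSI bus phases, in the order a transfer must visit them. */
typedef enum {
    BUS_FREE, ARBITRATION, SELECTION, COMMAND, DATA, STATUS, MESSAGE
} Phase;

typedef struct {
    Phase phase;   /* the single phase the shared bus is in right now */
    int   owner;   /* id of the drive that owns the bus, -1 when free */
    bool  wedged;  /* set once a second task disturbs a transfer in flight */
} Bus;

typedef struct {
    int  id;
    bool busy;     /* per-drive flag: guards this drive only, not the bus */
} Drive;

/* Begin a transfer the way the suspected driver does: consult only the
   drive's own busy flag, never the shared bus. Returns false only if the
   drive itself is already busy. */
bool start_transfer(Bus *bus, Drive *d)
{
    assert(bus != NULL && d != NULL);
    if (d->busy)
        return false;
    d->busy = true;
    if (bus->phase != BUS_FREE) {  /* another drive is mid-sequence */
        bus->wedged = true;        /* its sequence is disturbed */
        return true;               /* this task still believes it started */
    }
    bus->owner = d->id;
    bus->phase = ARBITRATION;
    return true;
}

/* Advance the owner's transfer by one phase. A wedged bus never advances. */
void step(Bus *bus, Drive *d)
{
    if (bus->wedged || bus->owner != d->id)
        return;
    if (bus->phase == MESSAGE) {   /* last phase: give the bus back */
        bus->phase = BUS_FREE;
        bus->owner = -1;
        d->busy = false;
    } else {
        bus->phase = (Phase)(bus->phase + 1);
    }
}

/* Replay the scenario: A is part way through when B starts.
   Returns true if the bus ends up wedged with both drives marked busy. */
bool demo_collision(void)
{
    Bus bus = { BUS_FREE, -1, false };
    Drive a = { 0, false }, b = { 1, false };

    start_transfer(&bus, &a);
    step(&bus, &a);              /* A reaches SELECTION */
    step(&bus, &a);              /* A reaches COMMAND */
    start_transfer(&bus, &b);    /* B starts without seeing A */
    step(&bus, &a);              /* neither transfer can make progress */
    step(&bus, &b);
    return bus.wedged && a.busy && b.busy;
}
```

In this model both busy flags stay set forever once the bus is wedged, which
is at least consistent with a hang that leaves the access LED on.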

I can think of at least two ways to handle this. The first would be to
implement a semaphore-like mechanism to control access to the controller
chip itself. This can be a bit tricky, though, if future support of
disconnect/reconnect-capable drives is anticipated. You have to release the
semaphore upon disconnect and re-acquire it upon reconnect, in addition to
arbitrating for the initial access.
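
A minimal, single-threaded sketch of that first approach follows, assuming a
lock with one owner slot. All names (ControllerLock, ctrl_acquire,
ctrl_release, on_disconnect, on_reconnect, demo_disconnect) are made up for
illustration. In a real driver the test-and-set inside ctrl_acquire would
have to be atomic, and a task that fails to get the lock would sleep rather
than get false back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_OWNER (-1)

/* Hypothetical guard for the one shared controller chip. */
typedef struct {
    int owner;   /* id of the task holding the controller, or NO_OWNER */
} ControllerLock;

/* Try to claim the controller for `task`. */
bool ctrl_acquire(ControllerLock *l, int task)
{
    assert(l != NULL);
    if (l->owner != NO_OWNER)
        return false;
    l->owner = task;
    return true;
}

/* Give the controller back; only the current owner may do so. */
void ctrl_release(ControllerLock *l, int task)
{
    assert(l != NULL && l->owner == task);
    l->owner = NO_OWNER;
}

/* The target disconnects (for example while seeking): free the bus. */
void on_disconnect(ControllerLock *l, int task)
{
    ctrl_release(l, task);
}

/* The target reselects the host: the lock must be won again first. */
bool on_reconnect(ControllerLock *l, int task)
{
    return ctrl_acquire(l, task);
}

/* Walk two tasks through one disconnect/reconnect cycle and count the
   steps that behaved as intended. */
int demo_disconnect(void)
{
    ControllerLock l = { NO_OWNER };
    int ok = 0;

    if (ctrl_acquire(&l, 0))   ok++;  /* task 0 starts its command */
    if (!ctrl_acquire(&l, 1))  ok++;  /* task 1 is held off */
    on_disconnect(&l, 0);             /* drive 0 lets go of the bus */
    if (ctrl_acquire(&l, 1))   ok++;  /* task 1 can now run */
    if (!on_reconnect(&l, 0))  ok++;  /* drive 0 must wait its turn */
    ctrl_release(&l, 1);
    if (on_reconnect(&l, 0))   ok++;  /* now drive 0 gets the bus back */
    ctrl_release(&l, 0);
    return ok;
}
```

The tricky part mentioned above shows up in on_reconnect: the reconnecting
drive cannot assume the controller is still its own, so the driver needs
somewhere to park it until the lock comes free.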

The second way (which may not work) would be to check the chip status
prior to beginning the sequence of states which constitutes a bus transfer.
The reason I say it may not work is that there is a potential race condition
between checking controller status and beginning the transfer. At least a
Forbid()/Permit() pair would be necessary here.
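
The race can be shown with a small deterministic model. This is a sketch
under stated assumptions: sim_forbid and sim_permit merely stand in for
exec's Forbid() and Permit(), the "other task" is a callback run inside the
window between check and start, and Chip, rival_task, start_if_idle and
collisions_for are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool in_use;        /* controller status as a task would read it */
    int  collisions;    /* transfers started on a controller already in use */
    int  forbid_depth;  /* > 0 means task switching is held off */
} Chip;

/* Stands in for another task getting the CPU. */
typedef void (*OtherTask)(Chip *);

/* Model-only stand-ins for Forbid()/Permit(). */
static void sim_forbid(Chip *c) { c->forbid_depth++; }
static void sim_permit(Chip *c)
{
    assert(c->forbid_depth > 0);
    c->forbid_depth--;
}

/* The rival task claims the controller if it looks idle. */
static void rival_task(Chip *c)
{
    if (!c->in_use)
        c->in_use = true;
}

/* Check the status, then start the transfer. When `guarded` is false the
   other task may run between the two steps. Returns what the check saw. */
static bool start_if_idle(Chip *c, OtherTask other, bool guarded)
{
    bool idle;

    if (guarded)
        sim_forbid(c);
    idle = !c->in_use;                   /* step 1: check status */
    if (other != NULL && c->forbid_depth == 0)
        other(c);                        /* the window: a task switch */
    if (idle) {
        if (c->in_use)
            c->collisions++;             /* step 2 acted on a stale check */
        c->in_use = true;
    }
    if (guarded)
        sim_permit(c);
    return idle;
}

/* Count collisions for one attempt, with or without the guard. */
int collisions_for(bool guarded)
{
    Chip c = { false, 0, 0 };
    start_if_idle(&c, rival_task, guarded);
    return c.collisions;
}
```

Note that the guard only covers the check and the start. Forbid() holds off
task switching, not interrupts, and it cannot sensibly be held for a whole
transfer, so the status flag still has to be maintained by whoever owns the
controller.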

Perhaps the use of exec semaphores will ultimately provide the cleanest
solution. But from some red flags I see in the AutoDocs on semaphores, I'd 
suspect it wouldn't be totally safe to use them on 1.3 or earlier. Perhaps
some comments from C= on the conditions under which 1.3 semaphores fail
might be appropriate here.

Now, understand that this is just presumption on my part, since I don't have
access to the source for Commodore's 2091 driver, but it seems to fit the
symptoms. I may be all wet, but I submit it's at least something to check
into.

Sorry for this technical discussion in this newsgroup. I've re-directed
followups back to comp.sys.amiga.tech. Also sorry I wasn't paying attention
when you posted to c.s.a.t before.

At any rate, as Stan said, comments from C= would be welcome.

-Phil
-- 
------------------------------------------------------------------------------
Phil Staub, phils@tekig5.PEN.TEK.COM
Definition: BUG: A feature (present or absent) which is (at best) inconvenient.