[comp.periphs.scsi] [comp.sys.sun] SCSI Bus Errors

David.B.Stewart@fas.ri.cmu.edu (02/23/90)

Original-posting-by: David.B.Stewart@fas.ri.cmu.edu
Original-subject: SCSI Bus Errors
Reposted-by: emv@math.lsa.umich.edu (Edward Vielmetti)

Our SCSI controller's interrupt handler is detecting bus errors at
arbitrary times, for no reason we can identify, whenever an M68020
coprocessor board is installed in our system.

SYSTEM:	Sun 3/160 w/ 4MBytes RAM, SunOS 4.0.3.

SCSI:  Defined as follows in the CONFIG file; it works fine as long as
the coprocessor board is not being used.

   controller      si0 at vme24d16 ? csr 0x200000 priority 2 vector siintr 0x40
   disk            sd0 at si0 drive 0 flags 0
   disk            sd1 at si0 drive 1 flags 0
   tape            st0 at si0 drive 32 flags 1

COPROCESSOR BOARD:  It was originally at interrupt priority level 3, but I
thought that could be the problem, so I lowered it to level 2, the same
priority as the SCSI controller.  (Ideally I would like to keep my device
at priority level 3.)

   device          hxm0 at vme32d32 ? csr 0x40000000 priority 2 vector hxmintr 0x52

THE ERROR THAT KEEPS RECURRING IS THE FOLLOWING:  When I get one, I get
dozens, and sometimes hundreds.  Sometimes the system recovers, other
times I am forced to Abort the system and reboot, at which time my
filesystem is usually damaged to the point where I have to run fsck
manually.

siintr: bus error during dma
	last phase= 0x0 (BUS FREE)
	csr= 0x334f  bcr= 42440  tcr= 0x4
	cbsr= 0x60 (DATA OUT)  cdr= 0x0  mr= 0x2  bsr= 0x10
	target= 0, lun= 0    DMA addr= 0x5a00  count= 42496 (57344)
	cdb=    a  0  9c  6  70  
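
For anyone reading the dump: the cdb bytes have the layout of a 6-byte
SCSI command, and opcode 0x0a in that format is WRITE(6).  The following
is only a minimal sketch of the decoding in C; the struct and function
names (cdb6, decode_cdb6) are made up for illustration and are not from
the Sun driver.  The 512-byte block size mentioned below is an assumption,
although 0x70 blocks of 512 bytes does equal the 57344 printed in
parentheses after count.

```c
#include <stdint.h>

/* Fields of a 6-byte (Group 0) SCSI command descriptor block.
 * Hypothetical helper type, not taken from the SunOS si driver. */
struct cdb6 {
    uint8_t  opcode;   /* byte 0; 0x0a is WRITE(6) */
    uint8_t  lun;      /* top 3 bits of byte 1 */
    uint32_t lba;      /* 21-bit logical block address */
    uint16_t nblocks;  /* transfer length in blocks */
};

/* Decode the first five bytes of a 6-byte CDB.  The sixth (control)
 * byte is not shown in the dump and is not read here. */
static struct cdb6 decode_cdb6(const uint8_t *b)
{
    struct cdb6 c;

    c.opcode  = b[0];
    c.lun     = (uint8_t)((b[1] >> 5) & 0x07);
    c.lba     = ((uint32_t)(b[1] & 0x1f) << 16)
              | ((uint32_t)b[2] << 8)
              |  (uint32_t)b[3];
    /* In the 6-byte format a length byte of 0 means 256 blocks. */
    c.nblocks = b[4] ? b[4] : 256;
    return c;
}
```

Fed the bytes a 0 9c 6 70 from the dump, this gives a WRITE(6) to lun 0
starting at block 0x9c06 for 0x70 (112) blocks.  Assuming 512-byte blocks,
that is a 57344-byte transfer.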

PROBLEM:

The coprocessor is accessed through an mmap() driver.  Its interrupt
handler only does a psignal().

Currently, the coprocessor board NEVER becomes bus master, with the
exception of generating the Level 2 (or 3) interrupt.  One or more
processes on the Sun Workstation are reading/writing the memory and
registers of the coprocessor board, all of which are between 0x40000000
and 0x40400000 in A32D32 space (4 MBytes of memory).  I have tried setting
the board to various other addresses, both in A32D32 and A24D32 space, but
to no avail.
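
To make the access pattern concrete, here is a rough sketch in C of how a
user process might map such a board through an mmap() driver.  It is not
the actual hxm driver interface.  The function name map_board, the macro
BOARD_WINDOW, the device path in the usage note, and the idea that file
offset 0 corresponds to the board's base address are all assumptions made
for illustration.  The sketch is written against the POSIX mmap()
prototype rather than the SunOS 4.0.3 headers.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Size of the board's window: 0x40400000 - 0x40000000 = 4 MBytes. */
#define BOARD_WINDOW 0x400000UL

/* Map len bytes of the device at path for reading and writing.
 * Returns NULL on failure.  Assumes (hypothetically) that offset 0 of
 * the device corresponds to the start of the board's memory. */
static volatile uint32_t *map_board(const char *path, size_t len)
{
    void *p;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return NULL;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);               /* the mapping remains valid after close */

    if (p == MAP_FAILED)
        return NULL;
    return (volatile uint32_t *)p;
}
```

A process would then call something like map_board("/dev/hxm0",
BOARD_WINDOW), where the path is a guess at the device node name, and
read or write the board through the returned pointer.  Each such access
becomes a VME cycle issued by the Sun CPU, which is the only kind of
traffic to the board described above.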

The problem happens at random times; I could not pinpoint a certain event
that causes the error.  

REQUEST:

Could someone with more information about the possible cause of the error
please send me email?  I will summarize if I get any solutions.

Thanks a great deal,

David B. Stewart, Dept. of Elec. & Comp. Engr., and The Robotics Institute, 
	Carnegie Mellon University,  email: stewart@faraday.ece.cmu.edu 
The following software is now available; ask me for details
        CHIMERA II, A Real-time OS for Sensor-Based Control Applications