[comp.sys.sgi] Fuji 2266SA problem

tadguy@abcfd01.larc.nasa.gov (Tad Guy) (04/10/91)

Does anyone have experience with Fujitsu 2266SA disks in power series
workstations?  I've installed these disks in three of our hosts (a
4D/210, a 4D/320 and a 4D/340 all running Irix 3.3.2), and they seemed
to work fine.  However, now that they are getting serious use, I've
begun to see this message appear in SYSLOG:

	dks0d2s10: Unrecoverable device error: Internal controller error.

(Descriptive, huh?)  

Often this message seems benign, but occasionally an inode on that
disk will become locked and attempts to access it result in the
accessing process hanging in disk wait.  The only fix I've found so
far is to reboot, and then the filesystem does NOT appear corrupted
when fsck checks it.

I've reformatted, but that didn't change the behavior.  The formatting
occurred without error.  The drive is the only terminated device on
the bus, and is the last device on the chain...

I suspect I need to change a config jumper on the drive, but don't
know which one(s) are suspect.  The current jumper settings are below.
With the exception of the SCSI command level and the sync mode being
disabled, these are the factory settings.

Any suggestions or solutions would be appreciated.  
	...tad

Drive:
	Fujitsu 2266SA
Host:
	Silicon Graphics 4D/320VGX w/64M memory, running IRIX 3.3.2
Legend:
	1 = short (closed)
	0 = open

CN3:	1111010
            ^^^-- scsi target = 2
	   ^----- time monitoring enabled
	  ^------ read-ahead caching enabled
         ^------- reserved
        ^-------- normal operation mode

CNH1:	11111111
              ^^- ack signal wait monitoring time == infinity
              ^^- selection monitoring time = 250ms, 128 retry 
             ^--- unit attention report mode = respond with check cond status
         ^^^^---- reserved
        ^-------- led display operates with scsi bus 

CNH2:   01111110
               ^- synchronous mode disabled
              ^-- synchronous transfer rate = 0.96 - 4.8MB/s
             ^--- parity check enabled
            ^---- motor start on power on
           ^----- per default value = 0
          ^------ check condition status not posted on parameter rounding
         ^------- save data pointer issued for disc after data transfer
        ^-------- scsi level = scsi-1/ccs (have also tried scsi-2, same)

CNH4:	11
	^^------- connecting termpwr pin and terminating resistor
	          and connecting power to termpwr pin
                  and connecting power to idd terminating resistor
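For anyone comparing jumper strings like the ones above across several drives, here is a minimal Python sketch that splits a pattern into named fields and reads each field as a binary number. The field positions (leftmost character = position 0) are copied from the annotations in this post. They are an assumption, not Fujitsu documentation, and the decoder reports only raw jumper states, not what each state means.

```python
# Field positions come from the annotations in the post (leftmost
# character = position 0). They are an assumption, not Fujitsu docs.
CN3_FIELDS = {
    "normal_operation_mode": (0, 1),
    "reserved": (1, 2),
    "read_ahead_cache": (2, 3),
    "time_monitoring": (3, 4),
    "scsi_target": (4, 7),
}

# Only the two CNH2 positions discussed in the thread.
CNH2_FIELDS = {
    "scsi_level": (0, 1),
    "sync_mode": (7, 8),
}

def decode(pattern, fields):
    """Read each field's jumpers as a binary number (1 = short, 0 = open)."""
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of 0s and 1s")
    width = max(hi for _, hi in fields.values())
    if len(pattern) != width:
        raise ValueError(f"expected {width} jumper positions, got {len(pattern)}")
    return {name: int(pattern[lo:hi], 2) for name, (lo, hi) in fields.items()}

cn3 = decode("1111010", CN3_FIELDS)
cnh2 = decode("01111110", CNH2_FIELDS)
print(cn3["scsi_target"], cnh2["sync_mode"])  # 2 0
```

The decoded target ID of 2 agrees with the `d2` in the `dks0d2s10` device name from the SYSLOG message.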

olson@anchor.esd.sgi.com (Dave Olson) (04/10/91)

In <TADGUY.91Apr9170915@abi2.larc.nasa.gov> tadguy@abcfd01.larc.nasa.gov (Tad Guy) writes:

| Does anyone have experience with Fujitsu 2266SA disks in power series
| workstations?  I've installed these disks in three of our hosts (a
| 4D/210, a 4D/320 and a 4D/340 all running Irix 3.3.2), and they seemed
| to work fine.  However, now that they are getting serious use, I've
| begun to see this message appear in SYSLOG:
| 
| 	dks0d2s10: Unrecoverable device error: Internal controller error.

I can't speak to the rest of the problems, but this actually *IS*
descriptive.  The driver is reporting in pseudo-English just what
the drive has just reported, namely that there was an internal
error on the embedded controller that could not be recovered from.

In other words, a command failed with check condition status, and
the primary sense code was 4 (Unrecoverable device error), with
an additional sense code value of 0x44 (Internal controller error).
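To make that decoding concrete, here is a minimal Python sketch that pulls the sense key and additional sense code out of a fixed-format (response code 0x70/0x71) SCSI sense buffer. The key names are the SCSI-2 standard ones, in which key 4 is HARDWARE ERROR and ASC 0x44 is "internal target failure"; the IRIX driver uses its own wording for both. The sense buffer below is hypothetical, built to carry the values described above.

```python
# SCSI-2 sense-key names (low nibble of byte 2 of fixed-format sense data).
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
    0xC: "EQUAL", 0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE", 0xF: "RESERVED",
}

def decode_sense(buf: bytes):
    """Return (sense key, key name, ASC, ASCQ) from fixed-format sense data."""
    if len(buf) < 14:
        raise ValueError("need at least 14 bytes of sense data for ASC/ASCQ")
    if (buf[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = buf[2] & 0x0F   # sense key
    asc = buf[12]         # additional sense code
    ascq = buf[13]        # additional sense code qualifier
    return key, SENSE_KEYS[key], asc, ascq

# Hypothetical 18-byte sense buffer: key 4, ASC 0x44, additional length 10.
sense = bytes([0x70, 0, 0x04, 0, 0, 0, 0, 10] + [0] * 4 + [0x44, 0x00, 0, 0, 0, 0])
key, name, asc, ascq = decode_sense(sense)
print(key, name, hex(asc))  # 4 HARDWARE ERROR 0x44
```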

The driver is passing on to you as much info as it got from the drive.
It doesn't sound like this error is related to jumper settings, but it
is possible.

These (or similar) wordings are found in the request sense descriptions
in most scsi device manuals.
--

	Dave Olson

Life would be so much easier if we could just look at the source code.

townsend@RAINBOW.UCHICAGO.EDU ("R. Michael Townsend") (04/11/91)

> 
> Does anyone have experience with Fujitsu 2266SA disks in power series
> workstations?  I've installed these disks in three of our hosts (a
> 4D/210, a 4D/320 and a 4D/340 all running Irix 3.3.2), and they seemed
> to work fine.  However, now that they are getting serious use, I've
> begun to see this message appear in SYSLOG:
> 
> 	dks0d2s10: Unrecoverable device error: Internal controller error.
> 

I am quite familiar with this!  Our situation: a 4D/380 with several 2266's
and several 2263's, 9-track tape, 8mm, and QIC, all on an IO2; OS: IRIX 3.3.2.
The above messages started when we installed an IO3 to allow even more SCSI
devices, and they basically left us dead in the water for a week.  (The
2263's failed to work again, even when we put them back on a PI, which is
where they were originally formatted.)  I ended up doing the following:
going back to the IO2 until separate (off-line) tests can be run on a
similarly configured machine with an IO3; leaving off some of the devices I
wanted connected to our machine (there are no SCSI slots free); and trying
to minimize the cable lengths used to connect all the devices.  In past
lives (e.g. VAX days), I have always found bus length limits to be a very
good source of spurious peripheral misbehavior, and current technology is
no exception.  Supposedly the maximum SCSI bus length is 6 meters; adding up
all the devices I currently have on my system, I seem to be WAY OVER spec.
The IO2 seems to tolerate this better than the IO3.  There may be other
incompatibilities that I do not know about, which is why the investigation
continues (especially since I want to reattach all my devices and cannot
in the current IO2 configuration).
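The "adding up all the devices" check is simple arithmetic, but it is easy to forget a run. Here is a minimal Python sketch that totals cable segments against the 6 meter figure quoted above. The segment names and lengths are hypothetical, not this site's actual configuration; if your drives have internal cable runs, those are worth listing too.

```python
# Single-ended SCSI bus limit as quoted in the post.
MAX_BUS_METERS = 6.0

def check_bus(segments, limit=MAX_BUS_METERS):
    """Return (total length, amount over the limit) for (label, meters) pairs."""
    total = sum(meters for _, meters in segments)
    return total, max(0.0, total - limit)

# Hypothetical cable runs, for illustration only.
segments = [
    ("IO2 to disk enclosure", 1.5),
    ("disk enclosure to 9-track tape", 2.0),
    ("9-track tape to 8mm", 1.5),
    ("8mm to QIC", 2.0),
]
total, over = check_bus(segments)
print(f"total {total:.1f} m, over by {over:.1f} m")  # total 7.0 m, over by 1.0 m
```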

INQUIRY and synchronous jumpers closed on the disks WILL NOT WORK.  This too
is being investigated as I very much want to increase the speed of all the
Fuji's I have on the machine.

If SG doesn't make some blanket announcement on configuration guidelines
in this matter when all is "resolved", I will do so for those interested
(the number of whom seems to grow daily).

						R. Michael Townsend
						townsend@rainbow.uchicago.edu

tadguy@abcfd01.larc.nasa.gov (Tad Guy) (04/11/91)

olson@anchor.esd.sgi.com (Dave Olson) writes:
> tadguy@abcfd01.larc.nasa.gov (Tad Guy) writes:
> | Does anyone have experience with Fujitsu 2266SA disks in power series
> | workstations? 
> | 	dks0d2s10: Unrecoverable device error: Internal controller error.
> 
> this actually *IS* descriptive.  ...  a command failed with check
> condition status, and the primary sense code was 4 (Unrecoverable
> device error), with an additional sense code value of 0x44 (Internal
> controller error).
> 
> The driver is passing on to you as much info as it got from the drive.
> It doesn't sound like this error is related to jumper settings, but it
> is possible.

Thanks...

Shortly after I posted my message, I discovered a 2266-related posting
in comp.periphs.scsi that recommended turning off the read-ahead
cache.  I've done so and haven't had any errors yet (in over 24
hours).  In a separate email message, I was told that this is a known
Fuji firmware bug...
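To confirm that a change like disabling read-ahead really stopped the errors, here is a minimal Python sketch that counts driver messages of the form quoted in this thread, grouped by controller, target, and reason. It reads the device name as dks&lt;controller&gt;d&lt;target&gt;s&lt;partition&gt; (IRIX naming). The sample lines are hypothetical, and real SYSLOG lines may carry extra prefixes, which `re.search` skips over.

```python
import re
from collections import Counter

# Matches messages like "dks0d2s10: Unrecoverable device error: ...".
ERR_RE = re.compile(r"dks(\d+)d(\d+)s(\d+): Unrecoverable device error: (.+?)\.?$")

def count_errors(lines):
    """Count unrecoverable-error reports per (controller, target, reason)."""
    counts = Counter()
    for line in lines:
        m = ERR_RE.search(line.rstrip("\n"))
        if m:
            ctlr, target, _part, reason = m.groups()
            counts[(int(ctlr), int(target), reason)] += 1
    return counts

# Hypothetical log excerpt; in practice pass an open file object instead.
sample = [
    "dks0d2s10: Unrecoverable device error: Internal controller error.\n",
    "some unrelated message\n",
    "dks0d2s10: Unrecoverable device error: Internal controller error.\n",
]
print(count_errors(sample))
```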

I'm keeping my fingers crossed, but it looks good so far.  
Thanks for the responses.

> Life would be so much easier if we could just look at the source code.

Yes.
	...tad