tadguy@abcfd01.larc.nasa.gov (Tad Guy) (04/10/91)
Does anyone have experience with Fujitsu 2266SA disks in power series workstations? I've installed these disks in three of our hosts (a 4D/210, a 4D/320 and a 4D/340 all running Irix 3.3.2), and they seemed to work fine. However, now that they are getting serious use, I've begun to see this message appear in SYSLOG:

    dks0d2s10: Unrecoverable device error: Internal controller error.

(Descriptive, huh?)

Often this message seems benign, but occasionally an inode on that disk will become locked and attempts to access it result in the accessing process hanging in disk wait. The only fix I've found so far is to reboot, and then the filesystem does NOT appear corrupted when fsck checks it.

I've reformatted, but that didn't change the behavior. The formatting occurred without error. The drive is the only terminated device on the bus, and is the last device on the chain...

I suspect I need to change a config jumper on the drive, but don't know which one(s) are suspect. The current jumper settings are below. With the exception of the SCSI command level and the sync mode being disabled, these are the factory settings.

Any suggestions or solutions would be appreciated.
...tad

Drive: Fujitsu 2266SA
Host:  Silicon Graphics 4D/320VGX w/64M memory, running IRIX 3.3.2

Legend: 1 = short (closed)
        0 = open

CN3:   1111010
           ^^^-- scsi target = 2
          ^----- time monitoring enabled
         ^------ read-ahead caching enabled
        ^------- reserved
       ^-------- normal operation mode

CNH1:  11111111
             ^^- ack signal wait monitoring time == infinity
             ^^- selection monitoring time = 250ms, 128 retry
            ^--- unit attention report mode = respond with check cond status
        ^^^^---- reserved
       ^-------- led display operates with scsi bus

CNH2:  01111110
              ^- synchronous mode disabled
             ^-- synchronous transfer rate = 0.96 - 4.8MB/s
            ^--- parity check enabled
           ^---- motor start on power on
          ^----- per default value = 0
         ^------ check condition status not posted on parameter rounding
        ^------- save data pointer issued for disc after data transfer
       ^-------- scsi level = scsi-1/ccs (have also tried scsi-2, same)

CNH4:  11
       ^^------- connecting termpwr pin and terminating resistor and
                 connecting power to termpwr pin and connecting power
                 to idd terminating resistor
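The CN3 diagram in the post can also be read mechanically. Below is a minimal Python sketch, assuming the field layout exactly as drawn in the post rather than anything taken from a Fujitsu manual. The names `CN3_FIELDS` and `decode_jumpers` are hypothetical, and reading the three target jumpers as a binary number is an assumption that agrees with the stated target of 2.

```python
# Hypothetical helper: the field layout is read off the CN3 diagram in the
# post (leftmost jumper first), not taken from a Fujitsu manual.
CN3_FIELDS = [
    # (field name, width in jumper positions)
    ("operation mode", 1),
    ("reserved", 1),
    ("read-ahead caching", 1),
    ("time monitoring", 1),
    ("scsi target", 3),
]


def decode_jumpers(bits, fields):
    """Split a jumper string (1 = short/closed, 0 = open) into named fields."""
    if set(bits) - {"0", "1"}:
        raise ValueError("jumper string may contain only 0 and 1")
    if sum(width for _, width in fields) != len(bits):
        raise ValueError("field widths do not cover the jumper string")
    decoded = {}
    pos = 0
    for name, width in fields:
        decoded[name] = bits[pos:pos + width]
        pos += width
    return decoded


cn3 = decode_jumpers("1111010", CN3_FIELDS)
# Assumption: the target jumpers form a binary number, closed = 1.
target = int(cn3["scsi target"], 2)
print("scsi target jumpers", cn3["scsi target"], "-> target", target)
print("read-ahead caching jumper:", cn3["read-ahead caching"])
```

The same function would work for CNH2 given a matching field list; CNH1 is left out because the post's diagram marks the same two positions twice.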
olson@anchor.esd.sgi.com (Dave Olson) (04/10/91)
In <TADGUY.91Apr9170915@abi2.larc.nasa.gov> tadguy@abcfd01.larc.nasa.gov (Tad Guy) writes:

| Does anyone have experience with Fujitsu 2266SA disks in power series
| workstations? I've installed these disks in three of our hosts (a
| 4D/210, a 4D/320 and a 4D/340 all running Irix 3.3.2), and they seemed
| to work fine. However, now that they are getting serious use, I've
| begun to see this message appear in SYSLOG:
|
| dks0d2s10: Unrecoverable device error: Internal controller error.

I can't speak to the rest of the problems, but this actually *IS* descriptive. The driver is reporting in pseudo-English just what the drive has just reported, namely that there was an internal error on the embedded controller that could not be recovered from.

In other words, a command failed with check condition status, and the primary sense code was 4 (Unrecoverable device error), with an additional sense code value of 0x44 (Internal controller error).

The driver is passing on to you as much info as it got from the drive. It doesn't sound like this error is related to jumper settings, but it is possible.

These (or similar) wordings are found in the request sense descriptions in most scsi device manuals.

--
Dave Olson
Life would be so much easier if we could just look at the source code.
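The mapping described in this post can be sketched in a few lines of Python. This is only an illustration, not the IRIX driver's code: it assumes the standard extended sense layout (sense key in the low four bits of byte 2, additional sense code in byte 12), the message tables hold only the two values quoted in the thread, and `describe_sense` and the sample buffer are hypothetical.

```python
# Only the two values quoted in the thread; a real driver would carry the
# full tables from the request sense section of the drive's manual.
SENSE_KEY_TEXT = {0x4: "Unrecoverable device error"}
ADDITIONAL_SENSE_TEXT = {0x44: "Internal controller error"}


def describe_sense(sense):
    """Turn extended sense bytes into a driver-style message."""
    if len(sense) < 13:
        raise ValueError("need at least 13 bytes to reach the additional sense code")
    key = sense[2] & 0x0F  # sense key: low four bits of byte 2
    asc = sense[12]        # additional sense code: byte 12
    key_text = SENSE_KEY_TEXT.get(key, "sense key 0x%x" % key)
    asc_text = ADDITIONAL_SENSE_TEXT.get(asc, "additional sense 0x%02x" % asc)
    return "%s: %s" % (key_text, asc_text)


# Made-up 18-byte sense buffer for illustration; all other bytes stay zero.
example = bytearray(18)
example[0] = 0x70   # extended sense, current error
example[2] = 0x04   # sense key 4
example[7] = 10     # additional sense length (bytes following byte 7)
example[12] = 0x44  # additional sense code

print("dks0d2s10: " + describe_sense(example) + ".")
```

With these table entries the last line reproduces the SYSLOG message from the original post; unknown codes fall back to their numeric form.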
townsend@RAINBOW.UCHICAGO.EDU ("R. Michael Townsend") (04/11/91)
> Does anyone have experience with Fujitsu 2266SA disks in power series
> workstations? I've installed these disks in three of our hosts (a
> 4D/210, a 4D/320 and a 4D/340 all running Irix 3.3.2), and they seemed
> to work fine. However, now that they are getting serious use, I've
> begun to see this message appear in SYSLOG:
>
> dks0d2s10: Unrecoverable device error: Internal controller error.

I am quite familiar with this! Our situation: a 4D/380 with several 2266's and several 2263's, 9 track tape, 8mm, and QIC all on an IO2; OS: IRIX 3.3.2.

The above messages, which left us dead in the water for a week basically (the 2263's failed to work again, even when we put them back on a PI, which is where they were originally formatted), started when we installed an IO3 to allow even more SCSI devices.

I ended up doing the following: going back to the IO2 until separate (off line) tests can be run on a similarly configured machine with an IO3; leaving off some of the devices I wanted connected to our machine (there are no SCSI slots free); and attempting to minimize the cable lengths used in connecting all the devices.

In past lives (e.g. VAX days), I have always found length limitations on buses to be a very good source of spurious bad peripheral behavior; current technology is no exception. Supposedly SCSI bus length max is 6 meters -- adding up all the devices I have on my system currently, I seem to be WAY OVER spec. The IO2 seems to tolerate this more than the IO3. There may be other incompatibilities that I do not know about, which is why the investigation continues (especially since I want to reattach all my devices and can not in the current IO2 configuration).

INQUIRY and synchronous jumpers closed on the disks WILL NOT WORK. This too is being investigated as I very much want to increase the speed of all the Fuji's I have on the machine.

If SG doesn't make some blanket announcement on configurational guidelines in this matter when all is "resolved", I will do so for those interested (the volume of which seems to grow daily).

R. Michael Townsend
townsend@rainbow.uchicago.edu
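The "adding up all the devices" check is simple arithmetic, sketched here in Python. The 6 meter figure is the limit quoted in the post; the segment names and lengths are invented for illustration and are not measurements from the poster's system. Cabling inside each enclosure counts toward the total, which is why it gets its own entry.

```python
MAX_BUS_METERS = 6.0  # limit as quoted in the post


def check_bus(segments, limit=MAX_BUS_METERS):
    """Return (total_length, over_limit) for a list of (label, meters) pairs."""
    total = sum(meters for _, meters in segments)
    return total, total > limit


# Hypothetical cable runs in meters, internal enclosure cabling included.
segments = [
    ("host to disk enclosure", 2.0),
    ("inside disk enclosure", 1.5),
    ("disk enclosure to 8mm tape", 2.0),
    ("8mm tape to QIC", 1.0),
]

total, over = check_bus(segments)
print("total %.1f m, limit %.1f m, over spec: %s" % (total, MAX_BUS_METERS, over))
```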
tadguy@abcfd01.larc.nasa.gov (Tad Guy) (04/11/91)
olson@anchor.esd.sgi.com (Dave Olson) writes:
> tadguy@abcfd01.larc.nasa.gov (Tad Guy) writes:
> | Does anyone have experience with Fujitsu 2266SA disks in power series
> | workstations?
> | dks0d2s10: Unrecoverable device error: Internal controller error.
>
> this actually *IS* descriptive. ... a command failed with check
> condition status, and the primary sense code was 4 (Unrecoverable
> device error), with an additional sense code value of 0x44 (Internal
> controller error).
>
> The driver is passing on to you as much info as it got from the drive.
> It doesn't sound like this error is related to jumper settings, but it
> is possible.

Thanks...

Shortly after I posted my message, I discovered a 2266 related posting in comp.periphs.scsi, that recommended turning off the read-ahead cache. I've done so and haven't had any errors yet (in over 24 hours). In a separate email message, I was told that this is a known Fuji firmware bug...

I'm keeping my fingers crossed, but it looks good so far. Thanks for the responses.

> Life would be so much easier if we could just look at the source code.

Yes.

...tad