davy@riacs.edu (08/31/89)
Hi. We have a Sun 3/180 with two Xylogics 451 controllers and a Xylogics 753 controller. One of the 451s has a Fuji M2351 Eagle and two CDC 9720-500 disks on it. The other 451 has two CDC 9720-500s. The 753 has two Fuji M2382s on it.

Everything worked fine when we first had this. Then, we had a spare 451 controller which we weren't sure was okay. We swapped it into the system, verified it worked, and then swapped it back out, restoring the original system. Unfortunately, during this exercise, we broke the pin off one of the VME/Multibus adapters, and in the process of fixing all this, might have flipped one of the dip switches on the adapter or the controller.

Now we are seeing these messages on the two Fuji M2382 drives on the 753 controller:

    xd1c: read retry (operation timeout) -- blk #64, abs blk #64
    xd1c: read retry (operation timeout) -- blk #70512, abs blk #70512
    xd0c: read restore (drive not on cylinder) -- blk #1400528, abs blk #1400528
    xd0c: read restore (drive not on cylinder) -- blk #1540976, abs blk #1540976
    xd1c: read restore (drive not on cylinder) -- blk #1400496, abs blk #1400496
    xd1c: write restore (drive not on cylinder) -- blk #1400800, abs blk #1400800
    xd1c: read retry (operation timeout) -- blk #70464, abs blk #70464

The messages seem to be more or less equally distributed between the two drives (it seems to depend on how much each drive is being used at the time), and the block numbers vary a lot. In general things seem to be okay when only one drive is being used, but problems occur when both are in use. (We determined this by seeing the messages when both drives were fsck'd in the same pass, and not seeing them when each drive was fsck'd in a different pass.) Overall we get a lot of these messages, but we may go for half an hour without getting any and then suddenly get five or six.

We have checked through all the drive manuals, controller manuals, Sun manuals, etc.
I even had a friend look things up in the Sun field engineer's manual. As near as we can tell, everything is set up properly, all the dip switches are in the right places, all the controllers are in the same slots, etc.

We tried swapping 451s again. Tried swapping 753s. Tried changing drive cables. Tried unplugging all the drives except the Eagle (root drive) and the M2382s. Tried pulling out one 451 controller. None of this seemed to make any difference.

So, the questions:

1. Has anyone else seen this behavior?
2. More importantly, does anyone know how to fix it?

Thanks in advance,

Dave Curry
davy@riacs.edu
{rutgers,ames}!riacs!davy