roy@umix.cc.umich.edu (Roy Richter) (01/16/89)
I also have a Sun 3/260. In September I started getting messages similar to yours: disk sequencer errors. At first it was once a month; by the end of November, once a week; and in mid-December, once a day. Finally I had a disk failure over Xmas. Sun installed a new disk, and I have had no problems since.

Roy Richter                 UUCP:     {umix,edsews,mcf}!rphroy!roy
Physics Dept, GM Research   CSNet:    rrichter@gmr.com
                            Internet: roy%rphroy.uucp@umix.cc.umich.edu
dinah@shell.UUCP (Dinah Anderson) (01/16/89)
> xy0a: read retry (disk sequencer error) -- blk #495, abs blk #495
>Is this a serious disk problem that I should worry about? The system
>seems to be working well. What is a disk sequencer error? Would diag
>have to be used to fix this problem?

We have received these errors on several different systems (several different controller/disk combinations.) Sometimes the disk stops working and sometimes we get a couple of the error messages and then things proceed normally.

I would like to know what the errors mean and under what circumstances they occur. I would also like to know what we should do about them.

Dinah Anderson
Shell Oil Company, Information Center
(713) 795-3287
...!{sun,psuvax,soma,rice,ut-sally,ihnp4}!shell!dinah
msommer@watson.bbn.com (01/19/89)
> > xy0a: read retry (disk sequencer error) -- blk #495, abs blk #495
>
> We have received these errors on several different systems (several
> different controller/disk combinations.) Sometimes the disk stops working
> and sometimes we get a couple of the error messages and then things
> proceed normally.
>...

Dinah,

We were besieged with a bunch of these errors a few months ago on our 3/160, with a 2351A Eagle disk, and a Xylogics 450 controller. Our problem was caused by a low voltage near the controller board's backplane slot. Luckily, our Sun field service rep had heard reports of such a problem (though he had never seen it before, himself), and knew exactly what to check. Once he found the problem, he just tweaked the voltage and, presto, the errors disappeared.

If this is the cause of your (and others') problems, and you've called Sun about them without any success, it suggests there's a lack of communication among Sun f.s. reps.

I should note our 3/160 was originally a 1/160 or 2/160. The CPU board and power supply are both Sun 3 models. We upgraded our power supply after suffering repeated crashes (several months after the CPU board was upgraded, when we tried to add a memory board). The backplane, however, has never been upgraded. My intuition tells me this may have something to do with the voltage problems. People should probably avoid upgrading their systems in such a piecemeal fashion.

mark sommer
msommer@bbn.com
rlk@think.com (Robert L. Krawitz) (01/19/89)
We had a lot of these once, and it indicated a bad spot on the disk (we also had hard errors at the same time, though). Reformatting the partition didn't help us; we had to move the affected partition to another disk.

harvard >>>>>>  |            Robert Krawitz <rlk@think.com>   245 First St.
bloom-beacon >  |think!rlk   (postmaster)                     Cambridge, MA 02142
topaz >>>>>>>>  .            Thinking Machines Corp.          (617)876-1111
eap@bu-it.bu.edu (Eric A. Pearce) (01/25/89)
tomc@dftsrv.gsfc.nasa.gov (Tom Corsetti):
>Recently, our Sun 3/260 crashed because of a power outage....
>Well, today, almost a
>week later, I shutdown and rebooted, and got the message:
> xy0a: read retry (disk sequencer error) -- blk #495, abs blk #495
>Is this a serious disk problem that I should worry about?...

dinah@shell.UUCP (Dinah Anderson):
>...
>I would like to know what the errors mean and under what circumstances
>they occur. I would also like to know what we should do about them.

I looked up the error in my Xylogics 451 manual:

"Disk Sequencer Error - The disk sequencer did not finish its operation within the allowed time. Several factors may cause this problem.

- The 451 did not receive the servo clock signal from the selected disk drive. Check the B cable; if the connection is good, try a different B cable port on the 451.
- The 451 is not receiving any read data from the selected drive. Check the B cable.
- The Multibus may be preventing the 451 from gaining proper access."

The manual entry I quote from above suggests the problem could be with the cabling or the controller itself, but this has not been the case for us.

A bad controller usually spews out large numbers of errors with random block numbers over more than one disk.

A bad cable will produce random block errors on one drive (since it's unlikely that more than one cable would crap out at a time). We had drive cable problems on some rack-mounted systems (3/180's and 3/280's). I believe they were caused by repeated flexing of the drive cables by the doors on the back of the cabinets. The older rack setups have several feet of cable that dangle out of the back of the cabinet and move every time you open the door. (The doors have since been removed; I have not seen any cooling problems so far.)

A bad disk will usually have errors that give sequential block numbers, or at least repeat the same block numbers numerous times.
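
The controller/cable/disk rule of thumb above can be sketched as a small log triage script. This is only an illustration, not a diagnostic tool: the regular expression is modelled on the single console line quoted in this thread, the function names `parse` and `suspect` are made up for the example, and the thresholds (blocks within `near` of each other count as "sequential or repeated"; half of the neighbouring pairs must cluster) are assumptions, not anything taken from Sun or Xylogics documentation. It also simplifies "large numbers of errors over more than one disk" to "errors on more than one disk".

```python
import re
from collections import defaultdict

# Modelled on the console line quoted in this thread, e.g.
#   xy0a: read retry (disk sequencer error) -- blk #495, abs blk #495
# Other message shapes are not handled (assumption).
LINE_RE = re.compile(
    r"^(?P<ctlr>[a-z]+)(?P<unit>\d+)(?P<part>[a-h]): .*?\((?P<err>[^)]*)\)"
    r" -- blk #(?P<blk>\d+), abs blk #(?P<abs>\d+)"
)


def parse(lines):
    """Return {drive: [absolute block numbers]} for every matching line."""
    by_drive = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # Group by drive (e.g. "xy0"), ignoring the partition letter,
            # because all partitions share the same physical disk and cable.
            by_drive[m["ctlr"] + m["unit"]].append(int(m["abs"]))
    return dict(by_drive)


def suspect(by_drive, near=1):
    """Apply the rule of thumb from the post; thresholds are assumptions."""
    if not by_drive:
        return "no errors"
    if len(by_drive) > 1:
        # Errors spread over more than one disk point at the controller.
        return "controller"
    blocks = sorted(next(iter(by_drive.values())))
    if len(blocks) < 2:
        return "inconclusive"
    # Count neighbouring pairs that are repeated or adjacent block numbers.
    clustered = sum(1 for a, b in zip(blocks, blocks[1:]) if b - a <= near)
    if clustered >= (len(blocks) - 1) / 2:
        return "disk"   # sequential or repeated blocks on one drive
    return "cable"      # scattered blocks on one drive
```

Fed the same line from this thread two or three times over, `suspect(parse(lines))` returns "disk"; a handful of widely scattered block numbers on one drive returns "cable". Either answer is a starting point for a call to field service, not a verdict.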
If you only get an occasional disk error, such as one a week, you might be safe to just map or slip the bad spots, but in my experience, any errors that occur with regularity are indicative of future trouble. If you have a Sun hardware contract, I would have them replace the drive as soon as possible. If they balk at replacing a drive with only a few errors, push them a bit. It *is* possible for systems to run for long periods without disk problems.

I would do a full level 0 dump of the disk as soon as possible. If you act before a crisis, you can have a scheduled downtime for a drive replacement. You would do a level 0 dump and Sun would come in and replace the drive. This would make the restore much easier, as you would not have to worry about multi-level backups, not to mention the time you would save.

I have seen this error on Fujitsu 2351's ("single" Eagle) and 2361's ("double" or "super" Eagle). It was always accompanied by a massive number of disk errors. Our local Sun field service will replace single Eagles as a whole, but they replace only parts of double Eagles (in this case the HDA and the servo board).

The "Eagle" series of drives seems to be rather sensitive to power fluctuations. The newer Hitachi DK815-10 and NEC D2363 seem to be more tolerant.

-e

Eric Pearce                                 ARPANET eap@bu-it.bu.edu
Boston University Information Technology    CSNET   eap%bu-it@bu-cs
111 Cummington Street                       JNET    jnet%"ep@buenga"
Boston MA 02215                             UUCP    !harvard!bu-cs!bu-it!eap
617-353-2780 voice  617-353-6260 fax        BITNET  ep@buenga