pes@mitre-bedford.ARPA (04/19/85)
I understand that hard disk errors with DEC UDA-50 controllers and RA60/RA80 drives have been a problem for many VAX UNIX installations. An article posted recently to UNIX-INFO by Ed Merrill and Andy Linton addressed bad block replacement and DEC Standard 144 and 166 disk media, but it did not solve my immediate problem.

I'm running 4.2BSD on a VAX-11/780 with RA60 disk drives. Occasionally, while running tasks with a lot of disk activity, I get the following errors at the console:

    uda0: hard error, disk transfer error, unit 0, grp 0x0, hdr 0x26aec
    uda0: hard error, SDI error, unit 0, event 0353, hdr 0x0

The second error repeats about 6 times. What do these errors mean? I'm not an expert on device drivers, but some people here are of the opinion that the driver is to blame. Has anyone had similar problems, or does anyone know of a cause? Or, better yet, a fix?

Thanks.

Paul Silvey
pes@mitre-bedford.arpa
mmason@psu-cs.UUCP (Mark C. Mason) (04/22/85)
> uda0: hard error, disk transfer error, unit 0, grp 0x0, hdr 0x26aec
> uda0: hard error, SDI error, unit 0, event 0353, hdr 0x0
>
> The second error repeats about 6 times.
> What do these errors mean? I'm not an expert on device drivers,
> anyone had similar problems or know of a cause? Or better yet a fix?

Ever since our DEC service rep started replacing the spindle brushes on our 3 RA81s on a regular basis, we have seen this problem all but disappear. When the error messages start cropping up again, usually after about three months, we check the brushes and usually find one or two that need replacement.

You might also persuade your rep to check the hardware revisions on your disks; new ones seem to come out about twice a year.

Mark
zemon@fritz.UUCP (Art Zemon) (04/24/85)
In article <> pes@mitre-bedford.ARPA writes:
> I'm running 4.2BSD on a VAX11/780 with RA60 disk drives.
>Occasionally while executing tasks which have a lot of disk activity, I
>get the following errors at the console:
>
> uda0: hard error, disk transfer error, unit 0, grp 0x0, hdr 0x26aec
> uda0: hard error, SDI error, unit 0, event 0353, hdr 0x0

I am having the same problem with an RA81 on an 11/750. Any advice that would let me avoid reformatting the disk would be greatly appreciated. Phone me if necessary. I'm currently planning to reformat the entire disk on May 5.

Thanks,
--
Art Zemon
FileNet Corp.
...!{decvax, ihnp4, ucbvax}!trwrb!felix!zemon
shimell@stc.UUCP (Dave Shimell) (04/26/85)
In article <10072@brl-tgr.ARPA> pes@mitre-bedford.ARPA writes:
>
> I understand that hard disk errors with DEC UDA-50 controllers
>and RA60/RA80 drives have been a problem for many VAX UNIX installations.
>............................................................. Has
>anyone had similar problems or know of a cause? Or better yet a fix?
>Thanks.
>
>Paul Silvey
>pes@mitre-bedford.arpa

We run binary Ultrix on our 785 and 750s. Since the beginning of the year we have experienced bad block problems on one of our RA81s. DEC replaced the HDA three times, and each replacement seemed to cure the problem until a month or so later. Then we would get a crash with a bad block in the inode area. This is particularly painful, as fsck can't fix this problem.

In the end DEC did two things:

1. Modified the strapping on each of our HDAs. (Set a certain way, the straps cause fewer uncorrectable ECC errors; contact DEC Field Service.)

2. Delivered /rabads, a standalone program from release 1.1 of Ultrix. Rabads can be used to inspect and patch the bad block table. Once the table is patched, the UDA50 ensures that the disk appears contiguous to the operating system. Clearly, rabads could be used under any O/S, provided it can be loaded into memory (it's a standalone program).

Now the bad news: I'm not sure whether DEC would supply you with rabads unless you run Ultrix. However, if you think you have a bad block and your Field Service engineer can't get the hardware to work reliably, it might be in both your interests for DEC to supply rabads. Unfortunately, since rabads is DEC proprietary software, I am unable to send it to anyone.

These two mods seem (fingers crossed) to have solved our problems. Yes, we have had crashes, but they do not follow the pattern we experienced previously.

Regards,
Dave Shimell.
shimell@stc.UUCP
{root44, ukc, idec, stl, creed, stc-[bcdf]}!stc-a!shimell
jsdy@hadron.UUCP (Joseph S. D. Yao) (05/07/85)
> uda0: hard error, disk transfer error, unit 0, grp 0x0, hdr 0x26aec
> uda0: hard error, SDI error, unit 0, event 0353, hdr 0x0
> The second error repeats about 6 times.
> What do these errors mean? ...

While I was working on a System V driver, this happened a lot. The field service person had just declared the hardware to be perfect, so I also assumed it was a problem in a not-yet-quite-bugless driver. I worked for weeks (off and on) to make the driver more and more perfect.

The problem was hardware. One of the boards in the drive itself (the interface to the outside world, I think) had to be replaced. For a day we ran on the other port of the dual-access drive. We found all this out when it got so bad that we went back to DEC. Our regular field service person then came out to run diagnostics. She found the problem immediately. (*sigh*)

Try swapping the drive cables at the back from "A" to "B", and spinning up the drive with the "B" button pushed in rather than the "A". See if that makes a difference. Then get your field service to run lots and lots of diagnasties (on A) to show your advisors.

Joe Yao
hadron!jsdy@seismo.{ARPA,UUCP}
ron@ron1.UUCP (Ron Saad) (06/26/85)
>> uda0: hard error, disk transfer error, unit 0, grp 0x0, hdr 0x26aec
>> uda0: hard error, SDI error, unit 0, event 0353, hdr 0x0
>> The second error repeats about 6 times.
>> What do these errors mean? ...
>
> While working on a System V driver, this happened a lot. The field
> service person had just declared the hardware to be perfect, so I also
> assumed it was a problem in a not-yet-quite-bugless driver. I worked
> for weeks (off and on) to make the driver more and more perfect.
>
> The problem was hardware. One of the boards in the drive itself (the
> interface to the outside world, i think) had to be replaced. For a day
> we ran on the opposite side of the dual access. We found this all out
> when it got so bad we went back to DEC. Our regular field service
> person then came out to run diagnostics. She found the problem
> immediately. (*sigh*)
>
> Try swapping the drive cables in back from "A" to "B", and spinning
> up the drive with the "B" button pushed in, rather than the "A".
> See if that makes a difference. Then get your field service to run
> lots and lots of diagnasties (on A) to show your advisors.
>
> Joe Yao hadron!jsdy@seismo.{ARPA,UUCP}

We are rather new on the net, so I missed the original article. I tried to contact the author of the 'solution' several times by mail but never got a response, so I assume my messages never got there.

We have been having the same problem with our UDA50/RA81/RA60 system. We are running a VAX 11/780 with 4.2BSD. Since the problem also occurred on a System V machine, I assume it's not the drivers. Our service people have replaced every board on the system (all the RA60 boards, the UDA50, the personality module on the RA81), all with no success; the problem keeps occurring. It wouldn't be so bad if it just crashed the system, but sometimes UNIX does not recover and just hangs there until I come in and force a reboot.
If anyone out there in wizard-land can give us more information, it would be greatly appreciated (the service people are now going to replace the power supply boards... :-). If the person who originally posted the problem has succeeded in solving it, PLEASE tell us how!

--
------------the opinions expressed above etc. etc. --------------
Ron Saad (4Z4UY)
Sys Adm - Center for Advanced Technology in Telecommunications
Polytechnic Institute of New York
UUCP: ...{ihnp4,seismo}!{philabs,cmcl2}!ron1!ron
MAIL: 333 Jay St. Brooklyn, N.Y. 11201
PHONE: (718) 643-7303