eichelbe@nadc.arpa (11/07/86)
I have a 4.1 BSD UNIX system on a VAX 11/780. Recently we installed a second UNIBUS and put a UDA50 controller on that second UNIBUS, all by itself. We figured that this way the UDA50 would not kill the activity on the first UNIBUS. The UDA50 is connected to three RA81 disk drives.

We are having a problem where we get a "uda0: disk transfer error" followed by a bunch of soft errors and sometimes a hard error. The RA81s have been reformatted, so it's not a bad block problem. When we get the error, the access light on the particular drive being accessed goes out and the application that was doing the write to the RA81 bombs out. So there are no reads or writes being done by an application to the RA81 at this point. But a few seconds later the RA81 access light comes back on and we get about the same error sequence. Then the application is actually able to report its exit and no more errors are printed - the RA81 is not being accessed any longer.

We are able to make a new file system ("mkfs"), mount the file system, create a lost+found directory, dismount the file system, run "fsck", and remount the file system with no problems.

The application mentioned above just keeps filling a given RA81 with 100K-long files named x.0, x.1, x.2, etc. as a write test. This is nothing more strenuous than a big disk-to-disk tar or dd would be. The errors come up at seemingly random times and have occurred on all three disks. Starting over from scratch on any given RA81 does not seem to matter - the errors don't occur at the same time/place on the disk again. Sometimes they occur earlier, sometimes later.

On another VAX 11/780 I have a single UNIBUS with a UDA50 and three RA81 disk drives, and there I can run the same application without a single error. Both VAX computers use 4.1 BSD and have the same driver software. In fact, the VAX where everything works has about 5 times the number of users on it at any given time.
All switches on the UDA50 boards on both VAXes are set exactly alike.

QUESTION: Has anyone ever seen such a problem? If so, what did you do about it?

Thank you.

Jon Eichelberger
eichelbe@NADC.ARPA
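
For anyone wanting to try a comparable load, the write test described above amounts to something like the following sketch. It is not the original program, which was not posted. The helper name fill_with_files, the reading of "100K" as 102400 bytes, the 8192-byte write size, the fill pattern, and the optional file limit are all assumptions, and the code is written against a modern POSIX C library rather than 4.1 BSD.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define FILE_SIZE (100L * 1024L)   /* assumption: "100K" means 102400 bytes */
#define CHUNK     8192             /* assumption: size of each write() call */

/*
 * Hypothetical helper: create dir/x.0, dir/x.1, ... of FILE_SIZE bytes each.
 * Stops after max_files files (a negative max_files means no limit) or at
 * the first open/write/close failure, which is reported on stderr.
 * Returns the number of files written completely.
 */
long fill_with_files(const char *dir, long max_files)
{
    char buf[CHUNK];
    char path[1024];
    long n;

    assert(dir != NULL);
    memset(buf, 0xA5, sizeof buf);          /* arbitrary fill pattern */

    for (n = 0; max_files < 0 || n < max_files; n++) {
        long left = FILE_SIZE;
        int fd;

        snprintf(path, sizeof path, "%s/x.%ld", dir, n);
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            fprintf(stderr, "%s: open: %s\n", path, strerror(errno));
            return n;
        }
        while (left > 0) {
            size_t want = left < CHUNK ? (size_t)left : (size_t)CHUNK;
            ssize_t got = write(fd, buf, want);

            if (got <= 0) {                 /* error, or no progress made */
                fprintf(stderr, "%s: write: %s\n", path, strerror(errno));
                close(fd);
                return n;
            }
            left -= got;                    /* short writes are retried */
        }
        if (close(fd) < 0) {
            fprintf(stderr, "%s: close: %s\n", path, strerror(errno));
            return n;
        }
    }
    return n;
}
```

Calling fill_with_files with the mount point of the file system under test and a negative limit would keep writing until the disk fills or a write fails, which is roughly the point at which the errors described above would be expected to appear on the console.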