steve@dartvax.UUCP (Steve Campbell) (08/10/87)
In article <6683@dartvax.UUCP> I wrote:
>Although conventional wisdom says not to put more than 1 UDA50 per
>unibus, we are trying to do just that. We have added a second UDA50 to
>the bus and a third-party device called a USI/HRS from a company named
>Shitashi which claims to enhance the unibus bandwidth enough to permit
>the second uda. The other devices on the unibus are a DEUNA and 2 DZ11s.
>
>For testing purposes, we moved 2 RA81s from uda0 to uda1, so we have...
>
>controller uda0 at uba0 csr 0172150 vector udintr
>disk ra0 at uda0 drive 0
>disk ra1 at uda0 drive 1
>controller uda1 at uba0 csr 0172550 vector udintr
>disk ra2 at uda1 drive 2
>disk ra3 at uda1 drive 3
>
>As far as we can tell, the hardware is working just fine.
>But ... if we do a large number of accesses to files on any disk
>USING PATHNAMES, then do a sync, the 2 disks on the second uda cannot
>be accessed, and the command - and terminal - trying to do so hangs
>completely.

Several people suggested that the configuration specification was the
problem; others said no, the config is OK. Ed Gould posted a nice
mini-dissertation on how configuration names are mapped. Conclusion:
the configuration is OK.

Other people suggested adjusting the time delay jumper on the UDA50.
[BTW, beware of a typo in the DEC UDA50 Users Manual table that tells
how to set that jumper. The pins are mislabeled.] Someone else
suggested changing UDABURST in the driver. Conclusion: these
adjustments no doubt affect performance, but they were not the cause of
my problem.

Scott Bradner (harvisr!sob) pointed me in the right direction:

> the 4.3 uda driver has a bug that causes the drives on a 2nd controller
> to appear to go off line under load, any processes that are accessing those
> drives will hang forever.

Jean Huens (kulcs!jean) got closer:

> I got similar problems on a microvax. We have there (on a Q-bus) an
> RQDX (+- uda compatible : same driver) from DEC and a second RQDX
> compatible controller (Sigma) with a Fujitsu Eagle. Occasionally
> processes got hung waiting for the fujitsu. The problem was that the
> controller was idle (without outstanding commands) but there were still
> requests from Unix waiting (looks like interrupt lost or race
> condition). I looked in the uda driver from Ultrix 1.2 and saw they
> start there a timer which calls the udastart routine regularly (once a
> minute). This cured the problem with the disk.

Jean sent me that modified driver. I installed it and ran my standard
test that would hang the system. It hung as always... but as soon as
the timer that Jean mentions went off, the hung command completed
normally. It was spooky, as though there was a little gremlin in there
that got poked every minute or so and un-jammed things. Now this jerky
operation of the system was not good enough for production work, but it
seems to clinch the diagnosis: the controller sits idle while requests
are still queued, and nothing restarts it. I would like to hear an
explanation from someone who knows the hardware well.

Which leads me to... the final solution to the problem. In one posting
Chris Torek wrote:

> What makes Steve's problem particularly perplexing is that everything
> works at least a little bit. The machine finds the controllers
> and drives, and can talk to them a bit, e.g., with raw I/O. Raw
> transfers do not really work the I/O system very hard, though, so
> I suspect some sort of hardware glitch with `simultaneous' transfers.
>
> (My first suggestion, of course, was to try my driver....)

Well, I hate a smart aleck, especially one who turns out to be right.
I tried Chris's driver, and it solves the problem. The new
configuration works as well as the old. Very nice work, Chris.

Chris's driver prints some identification information at boot time,
including the following from my machine:

    Aug 9 12:34:56 libdev vmunix: uda0: version 5 model 6
    ...
    Aug 9 12:34:56 libdev vmunix: uda1: version 4 model 6

Is that different version number significant to this problem?

In conclusion, thanks to all for the help, and especially to Chris
Torek for the new driver that doesn't have the bug. And how about a fix
from Berkeley for their standard driver?

Steve Campbell
Dartmouth College
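
[Editorial sketch, not from the original posting.] The stall that Jean
describes, and the periodic timer that papers over it, can be illustrated
with a toy model. This is only a simulation under stated assumptions: it
is not the Ultrix 1.2 or 4.3BSD driver code, and the names ToyController,
submit, start, finish and watchdog_tick are hypothetical, with start
standing in for the role the posting attributes to udastart. It assumes
the queue is normally restarted only from the interrupt path, so one lost
interrupt leaves the controller idle with requests still waiting.

```python
from collections import deque


class ToyController:
    """Toy model of a disk controller whose completion interrupt can be lost."""

    def __init__(self):
        self.queue = deque()   # requests still waiting in the driver
        self.busy = False      # True while a command is outstanding
        self.current = None    # request currently on the controller
        self.finished = []     # requests the controller has finished

    def submit(self, req):
        """Queue a request; start it at once if the controller is idle."""
        self.queue.append(req)
        if not self.busy:
            self.start()

    def start(self):
        """Hand the next queued request to the controller, if it is idle."""
        if self.queue and not self.busy:
            self.current = self.queue.popleft()
            self.busy = True

    def finish(self, interrupt_delivered=True):
        """The controller finishes its command; the interrupt may be lost."""
        self.finished.append(self.current)
        self.busy = False
        if interrupt_delivered:
            self.start()       # normal path: the interrupt restarts the queue

    def watchdog_tick(self):
        """Periodic timer: restart the queue if idle with work pending."""
        if not self.busy and self.queue:
            self.start()
            return True
        return False


ctl = ToyController()
for req in ("r1", "r2", "r3"):
    ctl.submit(req)

ctl.finish(interrupt_delivered=False)   # r1 is done, but the interrupt is lost
stalled = (not ctl.busy) and len(ctl.queue) == 2
kicked = ctl.watchdog_tick()            # the timer notices and restarts the queue
```

In this model nothing moves between the lost interrupt and the next
timer tick, which matches the jerky, once-a-minute behaviour reported
above. The timer hides the symptom; it does not explain why the
interrupt (or the restart) went missing in the first place.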
chris@mimsy.UUCP (Chris Torek) (08/11/87)
In article <6842@dartvax.UUCP> steve@dartvax.UUCP (Steve Campbell) writes:
>Scott Bradner (harvisr!sob) pointed me in the right direction:
>>the 4.3 uda driver has a bug that causes the drives on a 2nd controller
>>to appear to go off line under load, any processes that are accessing those
>>drives will hang forever.

I know nothing of this bug. The 4.3BSD driver does have a `feature'
which irritates a microcode bug in some UDA50s, causing the controller
itself to hang. This is rare, and current UDA50s do not exhibit the bug
at all unless you have a 785 or 8600. Controller hangs are
distinguished by the light patterns on one of the two modules in the
Unibus box: one of the LEDs stops blinking.

There is another bug in 8600s that loses UDA50 interrupts under heavy
interrupt load (we get it while using the 4.3BSD rdump on Sun 3s). I do
not understand the details, but my driver recovers eventually (at least
if you have all your UDA50s on the same Unibus!---there is a bug in the
reset code in mscp.c).

>Jean Huens (kulcs!jean) got closer:
>>... I looked in the uda driver from Ultrix 1.2 and saw they
>>start there a timer which calls the udastart routine regularly.

Ugh!

>In one posting Chris Torek wrote:
>>(My first suggestion, of course, was to try my driver....)
>Well, I hate a smart aleck, especially one who turns out to be right.

Shall I make a point of being wrong on occasion? :-)

>Chris's driver prints some identification information at boottime,
>including the following from my machine:
> Aug 9 12:34:56 libdev vmunix: uda0: version 5 model 6
> ...
> Aug 9 12:34:56 libdev vmunix: uda1: version 4 model 6
>Is that different version number significant to this problem?

I am not sure what is different between versions 4 and 5; version 3
still exhibits the Get Unit Status hang bug on 780s. We still have some
version 4 controllers here, and they work fine. Of course I *am* using
my driver....
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu Path: seismo!mimsy!chris