steve@dartvax.UUCP (Steve Campbell) (07/15/87)
We have a VAX 785 with all FCO's applied running 4.3BSD with all fixes
applied. Its unibus currently has a UDA50 with 4 RA81s on it. We plan to
add 2 more disks, requiring another UDA50.

Although conventional wisdom says not to put more than 1 UDA50 per
unibus, we are trying to do just that. We have added a second UDA50 to
the bus and a third-party device called a USI/HRS from a company named
Shitashi which claims to enhance the unibus bandwidth enough to permit
the second uda. The other devices on the unibus are a DEUNA and 2 DZ11s.

For testing purposes, we moved 2 RA81s from uda0 to uda1, so in terms of the
config file we went from this...

controller uda0 at uba0 csr 0172150 vector udintr
disk ra0 at uda0 drive 0
disk ra1 at uda0 drive 1
disk ra2 at uda0 drive 2
disk ra3 at uda0 drive 3

...to this...

controller uda0 at uba0 csr 0172150 vector udintr
disk ra0 at uda0 drive 0
disk ra1 at uda0 drive 1
controller uda1 at uba0 csr 0172550 vector udintr
disk ra2 at uda1 drive 2
disk ra3 at uda1 drive 3

As far as we can tell, the hardware is working just fine. All devices
interrupt at boot time, and all four disks are accessible AS RAW DEVICES.
We can fsck them all in parallel, mount them, and dd from the raw devices.

But - you knew there was a "but" - there is a problem. Even in single
user mode, if we do a large number of accesses to files on any disk
USING PATHNAMES, then do a sync, the 2 disks on the second uda cannot
be accessed, and the command - and terminal - trying to do so hangs
completely.

For example, just doing an ls -lR of a smallish file system on ra1
(1000 files), output to /dev/null, then sync, then an ls of anything on
ra2 or ra3, and the terminal (console) hangs, and we have to reboot. A
comparable find(1) will do the trick, too. The sync is important;
without it we can still access the disks, but after it we're dead. On
the other hand, the sync alone, i.e. without the preceding ls or find,
causes no problem.

Forcing a core dump of the hung system shows the hung command to be in
what ps(1) calls "D" state, sleeping on runout in the scheduler. The
kernel "u" structure appears to be empty - as if there were no current
process.

Needless to say, the same operation causes no problem when all four
disks are on uda0. I would suspect hardware if (a) we hadn't swapped
everything in sight, including the 2 UDA50's and removed the HRS, and
(b) things didn't work perfectly as long as we use the raw devices.

I would appreciate any comments or suggestions from the net.

Steve Campbell
steve@Dartmouth.EDU
lad@eplrx7.UUCP (Lawrence Dziegielewski) (07/17/87)
In article <6683@dartvax.UUCP>, steve@dartvax.UUCP (Steve Campbell) writes:
>
> For testing purposes, we moved 2 RA81s from uda0 to uda1, so in terms of the
> config file we went from this...
>
> controller uda0 at uba0 csr 0172150 vector udintr
> disk ra0 at uda0 drive 0
> disk ra1 at uda0 drive 1
> disk ra2 at uda0 drive 2
> disk ra3 at uda0 drive 3
>
> ...to this...
>
> controller uda0 at uba0 csr 0172150 vector udintr
> disk ra0 at uda0 drive 0
> disk ra1 at uda0 drive 1
> controller uda1 at uba0 csr 0172550 vector udintr
> disk ra2 at uda1 drive 2
> disk ra3 at uda1 drive 3
>

I have 2 uda's running on several MicroVaxes, and they all run fine. I
suspect that it's your config that may be wrong. Each uda can support 4
devices, and you have to (or should) tell config about them. So, your
config should look like this:

controller uda0 at uba0 csr 0172150 vector udintr
disk ra0 at uda0 drive 0
disk ra1 at uda0 drive 1
disk ra2 at uda0 drive 2
disk ra3 at uda0 drive 3
controller uda1 at uba0 csr 0172550(or 0160334) vector udintr
disk ra4 at uda1 drive 0
disk ra5 at uda1 drive 1
disk ra6 at uda1 drive 2
disk ra7 at uda1 drive 3

In your configuration, you're asking unix to find ra2 on uda1 drive 2.
This is not logically possible. uda0 supports ra0, 1, 2 and 3, and the
next uda device will support ra4, 5, 6 and 7. That is what works for me.
Also, logical drive ra4 must be at physical drive 0 on the 2nd uda (not 2).

Now I'll admit I don't have this up on a 785, but it does work for the 3
MicroVaxes I run. And I also use the secondary uda address of 0160334 in
the MVaxes' floating address space. You may want to check on the uda
secondary address on a 785, but I don't know why it'd be different.

I suggest you try the above configuration. You can even call me if you
get stuck; I have done this so many times I think I could do it in my sleep.
lad@eplrx7.UUCP (Lawrence Dziegielewski) (07/17/87)
Sorry, my .signature didn't get appended to the last posting, so here it is.

Lawrence A. Dziegielewski
E.I. DuPont Co.                 (302) 695-1311
Engineering Physics Lab         ...dgis!eplrx7!lad
Wilmington, DE
pdb@sei.cmu.edu (Patrick Barron) (07/18/87)
In article <441@eplrx7.UUCP> lad@eplrx7.UUCP (Lawrence Dziegielewski) writes:
>I have 2 uda's running on several MicroVaxes, and they all run fine. I

If you have a MicroVAX, then you *don't* have a UDA-50, which is a UNIBUS
device. The controller used on the Q-Bus is the KDA-50.

>suspect that it's your config that may be wrong. Each uda can support 4
>devices, and you have to (or should) tell config about them. So, your
>config should look like this:
> [config deleted]
>In your configuration, you're asking unix to find ra2 on uda1 drive 2.
>This is not logically possible. uda0 supports ra0, 1, 2 and 3, and the
>next uda device will support ra4, 5, 6 and 7. That is what works for me.
>Also, logical drive ra4 must be at physical drive 0 on the 2nd uda (not 2).
>Now I'll admit I don't have this up on a 785, but it does work for the 3
>MicroVaxes I run. And I also use the secondary uda address of 0160334 in
>the MVaxes' floating address space. You may want to check on the uda
>secondary address on a 785, but I don't know why it'd be different.

It doesn't matter what you call the ra* devices, as far as I know. If you
really wanted to do something silly, you could put ra0, ra2, ra4, and ra6
on uda0, and ra1, ra3, ra5, and ra7 on uda1. Also, if you *knew* you
weren't ever going to use more than (for instance) two drives on each
controller, you could put ra0 and ra1 on uda0, and ra2 and ra3 on uda1
(even though there is no really good reason to actually do something like
this, except for the minimal savings in the size of the kernel).

As far as the problem at hand goes: the reason I'd heard that you
shouldn't put more than one UDA-50 on a single UNIBUS is that it chews up
a *lot* of bus bandwidth. The logical consequence of two UDA's should be
degraded performance, right? I'd never heard of having the system hang
because of it.

One last consideration: do you actually have enough backplane power to
run two UDA-50's along with whatever else you have? I know that marginal
power can hang systems up or crash them (the DEUNA used to do this all
the time).

--Pat.
chris@mimsy.UUCP (Chris Torek) (07/18/87)
>In article <6683@dartvax.UUCP> steve@dartvax.UUCP (Steve Campbell) writes:
>>controller uda0 at uba0 csr 0172150 vector udintr
>>disk ra0 at uda0 drive 0
>>disk ra1 at uda0 drive 1
>>controller uda1 at uba0 csr 0172550 vector udintr
>>disk ra2 at uda1 drive 2
>>disk ra3 at uda1 drive 3

In article <441@eplrx7.UUCP> lad@eplrx7.UUCP (Lawrence Dziegielewski) writes:
>... I suspect that it's your config that may be wrong.

Nope.

>Each uda can support 4 devices,

This is true. There are only four places to attach drives to the
controller. To make the quoted statement comprehensive, add the
words `up to' between `support' and `4'.

>and you have to (or should) tell config about them.

Nay, not so.

>In your configuration, you're asking unix to find ra2 on uda1 drive 2.
>This is not logically possible.

It most certainly is. The only requirement is that no two Unix names
(raN) map to the same drive (or, in MSCP parlance, unit[*]) number on
the same uda50 controller. ra0 can be unit 2 on uda1, ra1 unit 7 on
uda3, ra2 unit 0 on uda0, and so forth.

-----
[*This is a rather unfortunate term, as Unix uses `unit' to mean the
number after the word `ra'. E.g., `ra1' is Unix unit 1, though it may
be MSCP unit 7: `ra1 at uda3 drive 7'.]

>... I also use the secondary uda address of 0160334 in the MVaxes
>floating address space.

The UDA50A's csr address is set by switches on one of the two boards.
If your configuration matches your switches, you are in good shape.
Even if not, some fancy footwork in autoconf can sometimes save the
day. The `standard' set of UDA50 addresses is 0772150, 0772550, and
0777550 (0772150 is the same as 0172150 due to the funny Unibus
mapping).

What makes Steve's problem particularly perplexing is that everything
works at least a little bit. The machine finds the controllers and
drives, and can talk to them a bit, e.g., with raw I/O. Raw transfers
do not really work the I/O system very hard, though, so I suspect some
sort of hardware glitch with `simultaneous' transfers. (My first
suggestion, of course, was to try my driver....)

Incidentally, for those running the driver I posted in April, I may
soon be posting some patches. In particular, the code should now work
on Microvax IIs, although without a small patch to ubainit() the crash
dump code will continue to fail just like the 4.3 driver. There is a
bug fix relating to disk profiling (dk_busy is cleared too soon) and
another dealing with Unibus resets (I am not sure of the presence of
the bug, but I had to rewrite that section of code anyway for a KDB50
driver). I have no Microvax handy for testing as yet, and I have other
things I must do first, so just consider this a teaser. :-)

-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu     Path: seismo!mimsy!chris
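
The rule Chris gives above (no two raN names on the same drive number of the same controller) can be checked mechanically. What follows is only a minimal sketch in Python, not anything from the original postings or from config(8) itself: the names `parse_disks` and `find_slot_conflicts` are hypothetical, and the parser only understands the `disk ... at ... drive ...` lines quoted in this thread.

```python
import re
from collections import defaultdict

# Matches lines such as "disk ra2 at uda1 drive 2". This is an assumption
# about the subset of config syntax seen in this thread, not a full parser.
DISK_RE = re.compile(r"^disk\s+(\w+)\s+at\s+(\w+)\s+drive\s+(\d+)$")


def parse_disks(config_text):
    """Return a list of (unix_name, controller, drive) tuples."""
    disks = []
    for line in config_text.splitlines():
        m = DISK_RE.match(line.strip())
        if m:
            disks.append((m.group(1), m.group(2), int(m.group(3))))
    return disks


def find_slot_conflicts(disks):
    """Return {(controller, drive): [names]} for slots claimed twice or more."""
    by_slot = defaultdict(list)
    for name, controller, drive in disks:
        by_slot[(controller, drive)].append(name)
    return {slot: names for slot, names in by_slot.items() if len(names) > 1}


# Steve's second configuration, copied from the thread.
STEVE_CONFIG = """\
controller uda0 at uba0 csr 0172150 vector udintr
disk ra0 at uda0 drive 0
disk ra1 at uda0 drive 1
controller uda1 at uba0 csr 0172550 vector udintr
disk ra2 at uda1 drive 2
disk ra3 at uda1 drive 3
"""

print(find_slot_conflicts(parse_disks(STEVE_CONFIG)))  # prints {}
```

Under this reading, Steve's configuration has no conflicts, which is consistent with Chris's "Nope" above; a file that put both ra0 and ra1 at `uda0 drive 0` would be reported.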
ed@mtxinu.UUCP (Ed Gould) (07/20/87)
>> For testing purposes, we moved 2 RA81s from uda0 to uda1, so in terms of the
>> config file we went from this...
>> controller uda0 at uba0 csr 0172150 vector udintr
>> disk ra0 at uda0 drive 0
>> disk ra1 at uda0 drive 1
>> disk ra2 at uda0 drive 2
>> disk ra3 at uda0 drive 3
>> ...to this...
>> controller uda0 at uba0 csr 0172150 vector udintr
>> disk ra0 at uda0 drive 0
>> disk ra1 at uda0 drive 1
>> controller uda1 at uba0 csr 0172550 vector udintr
>> disk ra2 at uda1 drive 2
>> disk ra3 at uda1 drive 3

>I have 2 uda's running on several MicroVaxes, and they all run fine. I
>suspect that it's your config that may be wrong. Each uda can support 4
>devices, and you have to (or should) tell config about them. So, your
>config should look like this:
>controller uda0 at uba0 csr 0172150 vector udintr
>disk ra0 at uda0 drive 0
>disk ra1 at uda0 drive 1
>disk ra2 at uda0 drive 2
>disk ra3 at uda0 drive 3
>controller uda1 at uba0 csr 0172550(or 0160334) vector udintr
>disk ra4 at uda1 drive 0
>disk ra5 at uda1 drive 1
>disk ra6 at uda1 drive 2
>disk ra7 at uda1 drive 3
>In your configuration, you're asking unix to find ra2 on uda1 drive 2.
>This is not logically possible.

Close, but not quite. The correct config would be

controller uda0 at uba0 csr 0172150 vector udintr
disk ra0 at uda0 drive 0
disk ra1 at uda0 drive 1
controller uda1 at uba0 csr 0172550 vector udintr
disk ra2 at uda1 drive 0
disk ra3 at uda1 drive 1

The config file defines a mapping between logical names (ra0) and
physical names (uda0, drive 0). This mapping is essentially arbitrary.
In order for another mapping (names in /dev to device numbers) to remain
untouched by this experiment, it is important to maintain the names ra2
and ra3 for the two drives that were moved. Otherwise, the entries in
/dev would need to be changed.

There is no need to describe devices that do not exist, nor is there a
need to reserve names for them. (Sometimes, though, it is convenient to
do so.)

The entire (new) mapping may be summarized as follows. Note that there
are eight minor devices for each physical disk.

/dev/ra0? => (9, ( 0- 7)) == ra0 => uda0 drive 0
/dev/ra1? => (9, ( 8-15)) == ra1 => uda0 drive 1
/dev/ra2? => (9, (16-23)) == ra2 => uda1 drive 0
/dev/ra3? => (9, (24-31)) == ra3 => uda1 drive 1
|             |     |         |     ------------
|             |     |         |     physical name
|             |     |         --------- logical name
|             |     ------------------- minor device number
|             ------------------------- major device number
--------------------------------------- name in file system

-- 
Ed Gould                    mt Xinu, 2560 Ninth St., Berkeley, CA  94710  USA
{ucbvax,decvax}!mtxinu!ed   +1 415 644 0146

"A man of quality is not threatened by a woman of equality."
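
The minor-number column in Ed's table follows from "eight minor devices for each physical disk": the minor number is the logical unit times eight plus the partition index. Here is a minimal sketch of that arithmetic in Python. The helper names are hypothetical, and mapping partition letters `a` through `h` to offsets 0 through 7 is an assumption based on the usual BSD naming, since the table only shows the `?` wildcard.

```python
PARTITIONS_PER_DISK = 8  # "eight minor devices for each physical disk"
RA_MAJOR = 9             # major device number shown in Ed's table


def ra_minor(unit, partition):
    """Minor number for /dev/ra<unit><partition>.

    Assumes partitions 'a'..'h' occupy offsets 0..7 in that order.
    """
    index = ord(partition) - ord("a")
    if not 0 <= index < PARTITIONS_PER_DISK:
        raise ValueError("partition must be one of a..h")
    return unit * PARTITIONS_PER_DISK + index


def ra_device(unit, partition):
    """Return the (major, minor) pair for /dev/ra<unit><partition>."""
    return (RA_MAJOR, ra_minor(unit, partition))


# The first and last minor numbers of each row in the table.
for unit in range(4):
    print(unit, ra_minor(unit, "a"), ra_minor(unit, "h"))
```

Because the minor number depends only on the logical unit (the N in raN) and not on which controller or drive the disk hangs off, keeping the names ra2 and ra3 for the moved drives leaves the existing /dev entries valid, which is the point Ed makes above.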
lad@eplrx7.UUCP (Lawrence Dziegielewski) (07/20/87)
In article <1929@aw.sei.cmu.edu>, pdb@sei.cmu.edu (Patrick Barron) writes:
> In article <441@eplrx7.UUCP> lad@eplrx7.UUCP (Lawrence Dziegielewski) writes:
> >I have 2 uda's running on several MicroVaxes, and they all run fine. I
>
> If you have a MicroVAX, then you *don't* have a UDA-50, which is a UNIBUS
> device. The controller used on the Q-Bus is the KDA-50.

We are using MSCP controllers that look like uda-50's to our unix. In
the config I call 'em uda0 and uda1, not kda (never even heard of a
kda...). His config file still could be wrong. I know enough about the
subject to know that it may be possible.

> It doesn't matter what you call the ra* devices, as far as I know. If you
> really wanted to do something silly, you could put ra0, ra2, ra4, and ra6
>

It may matter to his flavor of unix. Mine (mt Xinu 4.3) expects the
config just as I originally posted it.

I hope this fellow posts the fix as soon as he gets it; I'd be
interested in finding out what the problem was.

Larry D.
steve@dartvax.UUCP (Steve Campbell) (07/21/87)
In article <6683@dartvax.UUCP> I wrote:
>Although conventional wisdom says not to put more than 1 UDA50 per
>unibus, we are trying to do just that. We have added a second UDA50 to
>the bus and a third-party device called a USI/HRS from a company named
>Shitashi which claims to enhance the unibus bandwidth enough to permit
>the second uda. The other devices on the unibus are a DEUNA and 2 DZ11s.
>
>For testing purposes, we moved 2 RA81s from uda0 to uda1, so we have...
>
>controller uda0 at uba0 csr 0172150 vector udintr
>disk ra0 at uda0 drive 0
>disk ra1 at uda0 drive 1
>controller uda1 at uba0 csr 0172550 vector udintr
>disk ra2 at uda1 drive 2
>disk ra3 at uda1 drive 3
>
>As far as we can tell, the hardware is working just fine.
>But ... if we do a large number of accesses to files on any disk
>USING PATHNAMES, then do a sync, the 2 disks on the second uda cannot
>be accessed, and the command - and terminal - trying to do so hangs
>completely.

Several people replied with suggestions about the hardware, including
adjusting the delay jumper on the UDA50s, swapping the backplane
position of the 2 UDA's, and changing the value of UDABURST in the
driver. None of these experiments made any difference; the system still
hangs as described. I am therefore reasonably confident that the problem
is not in the hardware.

Further experimenting (always in single-user mode) has turned up the
following evidence that perhaps someone with more knowledge of the
kernel than I have might be able to use.

The following sequence ALWAYS hangs:

[reboot]; mount -a; find ...; sync; ls ...

The find searches about 1000 files for a nonexistent file name, so it
just chases around the file system. The ls is of a directory on a disk
on the second UDA50, and it's this ls that hangs. BUT, the hardware is
NOT hung; a dd of the raw disk done instead of the ls works fine.

Moreover, an extra sync done after the mount will postpone the hang,
i.e. the ls shown will not hang, but a later one will. A umount/mount
sequence will also postpone the problem, but only for a few minutes.

So what's going on? Is the namei cache perhaps involved? Any suggestions
or pointers toward further tests would be welcome.

Steve Campbell
rbj@icst-cmr.arpa (Root Boy Jim) (07/31/87)
    Close, but not quite. The correct config would be

    controller uda0 at uba0 csr 0172150 vector udintr
    disk ra0 at uda0 drive 0
    disk ra1 at uda0 drive 1
    controller uda1 at uba0 csr 0172550 vector udintr
    disk ra2 at uda1 drive 0
    disk ra3 at uda1 drive 1

    The config file defines a mapping between logical names (ra0) and
    physical names (uda0, drive 0). This mapping is essentially
    arbitrary. In order for another mapping (names in /dev to device
    numbers) to remain untouched by this experiment, it is important to
    maintain the names ra2 and ra3 for the two drives that were moved.
    Otherwise, the entries in /dev would need to be changed.

    There is no need to describe devices that do not exist, nor is
    there a need to reserve names for them. (Sometimes, though, it is
    convenient to do so.)

I will preface this remark by saying that I don't know RA's and UDA's
from Proteon ring nets, but I do know something about disk drives.
In my config file, we have the following:

controller mba0 at nexus ?
disk hp0 at mba0 drive 0
disk hp1 at mba0 drive 1
controller mba1 at nexus ?
disk hp2 at mba1 drive 2
disk hp3 at mba1 drive 3

I see no reason why one must start numbering drives at zero. Whether or
not this is true of UDA/RA devices I don't know. It is clear that the
way he did it involves the least amount of switch diddling or unit
number plug swapping.

As to why it dies when you `sync' it, a possible explanation is that
you are forcing I/O to *all* drives at once, something that may freak
it out.

    -- 
    Ed Gould                    mt Xinu, 2560 Ninth St., Berkeley, CA  94710  USA
    {ucbvax,decvax}!mtxinu!ed   +1 415 644 0146

    "A man of quality is not threatened by a woman of equality."

A man of exclusive or is not threatened by a woman of greater than or
equal to.

(Root Boy) Jim Cottrell <rbj@icst-cmr.arpa>
National Bureau of Standards
Flamer's Hotline: (301) 975-5688
You mean you don't want to watch WRESTLING from ATLANTA?
P.S. How about the SUPERBOWL from DALLAS?