[comp.bugs.4bsd] Can't access disks on second UDA50

steve@dartvax.UUCP (Steve Campbell) (07/15/87)

We have a VAX 785 with all FCO's applied running 4.3BSD with all fixes applied.
Its unibus currently has a UDA50 with 4 RA81s on it.  We plan to add 2 more
disks, requiring another UDA50.

Although conventional wisdom says not to put more than 1 UDA50 per
unibus, we are trying to do just that.  We have added a second UDA50 to
the bus and a third-party device called a USI/HRS from a company named
Shitashi which claims to enhance the unibus bandwidth enough to permit
the second uda.  The other devices on the unibus are a DEUNA and 2 DZ11s.

For testing purposes, we moved 2 RA81s from uda0 to uda1, so in terms of the
config file we went from this...

controller	uda0	at uba0 csr 0172150		vector udintr
disk		ra0	at uda0 drive 0
disk		ra1	at uda0 drive 1
disk		ra2	at uda0 drive 2
disk		ra3	at uda0 drive 3

...to this...

controller	uda0	at uba0 csr 0172150		vector udintr
disk		ra0	at uda0 drive 0
disk		ra1	at uda0 drive 1
controller	uda1	at uba0 csr 0172550		vector udintr
disk		ra2	at uda1 drive 2
disk		ra3	at uda1 drive 3

As far as we can tell, the hardware is working just fine.  All devices
interrupt at boot time, and all four disks are accessible AS RAW DEVICES.
We can fsck them all in parallel, mount them, and dd from the raw devices.

But - you knew there was a "but" - there is a problem.  Even in single
user mode, if we do a large number of accesses to files on any disk
USING PATHNAMES, then do a sync, the 2 disks on the second uda cannot
be accessed, and the command - and terminal - trying to do so hangs
completely.  For example just doing an ls -lR of a smallish file system
on ra1 (1000 files), output to /dev/null, then sync, then an ls of
anything on ra2 or ra3, and the terminal (console) hangs, and we have
to reboot.  A comparable find(1) will do the trick, too. 

The sync is important; without it we can still access the disks, but
after it we're dead.  On the other hand, the sync alone, i.e., without the
preceding ls or find, causes no problem.  Forcing a core dump of the
hung system shows the hung command to be in what ps(1) calls "D" state,
sleeping on runout in the scheduler.  The kernel "u" structure appears
to be empty - as if there were no current process.  Needless to say,
the same operation causes no problem when all four disks are on uda0.

I would suspect hardware if (a) we hadn't swapped everything in sight,
including the 2 UDA50's and removed the HRS, and (b) things didn't work
perfectly as long as we use the raw devices.

I would appreciate any comments or suggestions from the net.

						Steve Campbell
						steve@Dartmouth.EDU

lad@eplrx7.UUCP (Lawrence Dziegielewski) (07/17/87)

In article <6683@dartvax.UUCP>, steve@dartvax.UUCP (Steve Campbell) writes:
> 
> For testing purposes, we moved 2 RA81s from uda0 to uda1, so in terms of the
> config file we went from this...
> 
> controller	uda0	at uba0 csr 0172150		vector udintr
> disk		ra0	at uda0 drive 0
> disk		ra1	at uda0 drive 1
> disk		ra2	at uda0 drive 2
> disk		ra3	at uda0 drive 3
> 
> ...to this...
> 
> controller	uda0	at uba0 csr 0172150		vector udintr
> disk		ra0	at uda0 drive 0
> disk		ra1	at uda0 drive 1
> controller	uda1	at uba0 csr 0172550		vector udintr
> disk		ra2	at uda1 drive 2
> disk		ra3	at uda1 drive 3
> 
I have 2 uda's running on several MicroVaxes,  and they all run fine.  I
suspect that it's your config that may be wrong.  Each uda can support 4
devices,  and you have to (or should) tell config about them.  So,  your
config should look like this:

controller	uda0	at uba0 csr 0172150		vector udintr
disk		ra0	at uda0 drive 0
disk		ra1	at uda0 drive 1
disk		ra2	at uda0 drive 2
disk		ra3	at uda0 drive 3
controller	uda1	at uba0 csr 0172550(or 0160334)	vector udintr
disk		ra4	at uda1 drive 0
disk		ra5	at uda1 drive 1
disk		ra6	at uda1 drive 2
disk		ra7	at uda1 drive 3

In your configuration,  you're asking unix to find ra2 on uda1 drive 2.
This is not logically possible.  uda0 supports ra0, 1, 2 and 3,  and the
next uda device will support ra4, 5, 6 and 7.  That is what works for me.
Also,  logical drive ra4 must be at physical drive 0 on the 2nd uda (not 2).
Now I'll admit I don't have this up on a 785,  but it does work for the 3
MicroVaxes I run.  And I also use the secondary uda address of 0160334 in
the MVaxes floating address space.  You may want to check on the uda
secondary address on a 785,  but I don't know why it'd be different.

I suggest you try the above configuration.  You can even call me if you get
stuck,  I have done this so many times I think I could do it in my sleep.

lad@eplrx7.UUCP (Lawrence Dziegielewski) (07/17/87)

Sorry,  my .signature didn't get appended to the last posting,  so here it is.


Lawrence A. Dziegielewski			E.I. DuPont Co.
(302) 695-1311					Engineering Physics Lab
...dgis!eplrx7!lad				Wilmington

pdb@sei.cmu.edu (Patrick Barron) (07/18/87)

In article <441@eplrx7.UUCP> lad@eplrx7.UUCP (Lawrence Dziegielewski) writes:
>I have 2 uda's running on several MicroVaxes,  and they all run fine.  I

If you have a MicroVAX, then you *don't* have a UDA-50, which is a UNIBUS
device.  The controller used on the Q-Bus is the KDA-50.

>suspect that it's your config that may be wrong.  Each uda can support 4
>devices,  and you have to (or should) tell config about them.  So,  your
>config should look like this:
> [config deleted]
>In your configuration,  you're asking unix to find ra2 on uda1 drive 2.
>This is not logically possible.  uda0 supports ra0, 1, 2 and 3,  and the
>next uda device will support ra4, 5, 6 and 7.  That is what works for me.
>Also,  logical drive ra4 must be at physical drive 0 on the 2nd uda (not 2).
>Now I'll admit I don't have this up on a 785,  but it does work for the 3
>MicroVaxes I run.  And I also use the secondary uda address of 0160334 in
>the MVaxes floating address space.  You may want to check on the uda
>secondary address on a 785,  but I don't know why it'd be different.

It doesn't matter what you call the ra* devices, as far as I know.  If you
really wanted to do something silly, you could put ra0, ra2, ra4, and ra6
on uda0, and ra1, ra3, ra5, and ra7 on uda1.  Also, if you *knew* you weren't
ever going to use more than (for instance) two drives on each controller,
you could put ra0 and ra1 on uda0, and ra2 and ra3 on uda1 (even though there
is no really good reason to actually do something like this, except for the
minimal savings in the size of the kernel).

As far as the problem at hand goes:  the reason I'd heard that you shouldn't
put more than one UDA-50 on a single UNIBUS is that it chews up a *lot* of
bus bandwidth.  The logical consequence of two UDA's should be degraded
performance, right?  I'd never heard of having the system hang because of it.

One last consideration:  do you actually have enough backplane power to
run two UDA-50's along with whatever else you have?  I know that marginal
power can hang systems up or crash them (the DEUNA used to do this all the
time).

--Pat.

chris@mimsy.UUCP (Chris Torek) (07/18/87)

>In article <6683@dartvax.UUCP> steve@dartvax.UUCP (Steve Campbell) writes:
>>controller	uda0	at uba0 csr 0172150		vector udintr
>>disk		ra0	at uda0 drive 0
>>disk		ra1	at uda0 drive 1
>>controller	uda1	at uba0 csr 0172550		vector udintr
>>disk		ra2	at uda1 drive 2
>>disk		ra3	at uda1 drive 3

In article <441@eplrx7.UUCP> lad@eplrx7.UUCP (Lawrence Dziegielewski) writes:
>... I suspect that it's your config that may be wrong.

Nope.

>Each uda can support 4 devices,

This is true.  There are only four places to attach drives to the
controller.  To make the quoted statement comprehensive, add the
words `up to' between `support' and `4'.

>and you have to (or should) tell config about them.

Nay, not so.

>In your configuration, you're asking unix to find ra2 on uda1 drive 2.
>This is not logically possible.

It most certainly is.  The only requirement is that no two Unix names
(raN) map to the same drive number (or, in MSCP parlance, unit[*]
number) on the same uda50 controller.  ra0 can be unit 2 on uda1, ra1
unit 7 on uda3, ra2 unit 0 on uda0, and so forth.

-----
[*This is a rather unfortunate term, as Unix uses `unit' to mean
the number after the word `ra'.  E.g., `ra1' is Unix unit 1, though
it may be MSCP unit 7: `ra1 at uda3 drive 7'.]
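
The rule above can be sketched as a small consistency check.  This is
only an illustration, not anything config(8) itself does: the tuple
layout and the check_mapping() helper are hypothetical, and the sample
data is simply the set of disk lines from Steve's config file.

```python
from collections import Counter

# Hypothetical representation of the "disk" lines in a config file:
# (unix_name, controller, mscp_unit).  Values copied from Steve's config.
DISKS = [
    ("ra0", "uda0", 0),
    ("ra1", "uda0", 1),
    ("ra2", "uda1", 2),
    ("ra3", "uda1", 3),
]


def check_mapping(disks):
    """Return a list of problems; an empty list means the mapping is consistent."""
    problems = []
    # Each Unix name (raN) may be declared only once.
    name_counts = Counter(name for name, _, _ in disks)
    # Each (controller, MSCP unit) pair may be claimed by only one name.
    slot_counts = Counter((ctlr, unit) for _, ctlr, unit in disks)
    for name, n in sorted(name_counts.items()):
        if n > 1:
            problems.append(f"{name} is declared {n} times")
    for (ctlr, unit), n in sorted(slot_counts.items()):
        if n > 1:
            problems.append(f"{ctlr} drive {unit} is claimed by {n} names")
    return problems


if __name__ == "__main__":
    print(check_mapping(DISKS))
```

Under this check Steve's layout passes, as does the deliberately
scattered example (ra0 at uda1 drive 2, ra1 at uda3 drive 7, ra2 at
uda0 drive 0); only a repeated name or a doubly claimed drive is flagged.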

>... I also use the secondary uda address of 0160334 in the MVaxes
>floating address space.

The UDA50A's csr address is set by switches on one of the two
boards.  If your configuration matches your switches, you are in
good shape.  Even if not, some fancy footwork in autoconf can
sometimes save the day.  The `standard' set of UDA50 addresses is
0772150, 0772550, and 0777550 (0772150 is the same as 0172150 due
to the funny Unibus mapping).
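
A minimal sketch of why the two spellings name the same register,
assuming the usual arrangement in which device registers live in the
top 8 KB of the 18-bit Unibus address space (starting at 0760000) and
only the low 13 bits of the csr select a location within that page.
The constant and function names below are made up for illustration.

```python
IO_PAGE_BASE = 0o760000   # assumed start of the I/O page in 18-bit Unibus space
IO_PAGE_MASK = 0o017777   # low 13 bits pick a register within that page


def unibus_io_address(csr):
    """Fold a 16-bit or 18-bit style csr into the 18-bit I/O page address."""
    return IO_PAGE_BASE | (csr & IO_PAGE_MASK)


if __name__ == "__main__":
    for csr in (0o172150, 0o772150, 0o172550):
        print(f"{csr:07o} -> {unibus_io_address(csr):07o}")
```

With these assumptions 0172150 and 0772150 both come out as 0772150,
and 0172550 comes out as 0772550, the second `standard' address.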

What makes Steve's problem particularly perplexing is that everything
works at least a little bit.  The machine finds the controllers
and drives, and can talk to them a bit, e.g., with raw I/O.  Raw
transfers do not really work the I/O system very hard, though, so
I suspect some sort of hardware glitch with `simultaneous' transfers.

(My first suggestion, of course, was to try my driver....)

Incidentally, for those running the driver I posted in April, I
may soon be posting some patches.  In particular, the code should
now work on Microvax IIs, although without a small patch to ubainit()
the crash dump code will continue to fail just like the 4.3 driver.
There is a bug fix relating to disk profiling (dk_busy is cleared
too soon) and another dealing with Unibus resets (I am not sure of
the presence of the bug, but I had to rewrite that section of code
anyway for a KDB50 driver).  I have no Microvax handy for testing
as yet, and I have other things I must do first, so just consider
this a teaser.  :-)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

ed@mtxinu.UUCP (Ed Gould) (07/20/87)

>> For testing purposes, we moved 2 RA81s from uda0 to uda1, so in terms of the
>> config file we went from this...

>> controller	uda0	at uba0 csr 0172150		vector udintr
>> disk		ra0	at uda0 drive 0
>> disk		ra1	at uda0 drive 1
>> disk		ra2	at uda0 drive 2
>> disk		ra3	at uda0 drive 3

>> ...to this...

>> controller	uda0	at uba0 csr 0172150		vector udintr
>> disk		ra0	at uda0 drive 0
>> disk		ra1	at uda0 drive 1
>> controller	uda1	at uba0 csr 0172550		vector udintr
>> disk		ra2	at uda1 drive 2
>> disk		ra3	at uda1 drive 3

>I have 2 uda's running on several MicroVaxes,  and they all run fine.  I
>suspect that it's your config that may be wrong.  Each uda can support 4
>devices,  and you have to (or should) tell config about them.  So,  your
>config should look like this:

>controller	uda0	at uba0 csr 0172150		vector udintr
>disk		ra0	at uda0 drive 0
>disk		ra1	at uda0 drive 1
>disk		ra2	at uda0 drive 2
>disk		ra3	at uda0 drive 3
>controller	uda1	at uba0 csr 0172550(or 0160334)	vector udintr
>disk		ra4	at uda1 drive 0
>disk		ra5	at uda1 drive 1
>disk		ra6	at uda1 drive 2
>disk		ra7	at uda1 drive 3

>In your configuration,  you're asking unix to find ra2 on uda1 drive 2.
>This is not logically possible.

Close, but not quite.  The correct config would be

controller	uda0	at uba0 csr 0172150		vector udintr
disk		ra0	at uda0 drive 0
disk		ra1	at uda0 drive 1
controller	uda1	at uba0 csr 0172550		vector udintr
disk		ra2	at uda1 drive 0
disk		ra3	at uda1 drive 1

The config file defines a mapping between logical names (ra0) and
physical names (uda0, drive 0).  This mapping is essentially arbitrary.
In order for another mapping (names in /dev to device numbers) to remain
untouched by this experiment, it is important to maintain the names
ra2 and ra3 for the two drives that were moved.  Otherwise, the entries
in /dev would need to be changed.  There is no need to describe devices
that do not exist, nor is there a need to reserve names for them.
(Sometimes, though, it is convenient to do so.)

The entire (new) mapping may be summarized as follows.  Note that there
are eight minor devices for each physical disk.

	/dev/ra0?   =>   (9, ( 0- 7)) == ra0   =>   uda0 drive 0
	/dev/ra1?   =>   (9, ( 8-15)) == ra1   =>   uda0 drive 1
	/dev/ra2?   =>   (9, (16-23)) == ra2   =>   uda1 drive 0
	/dev/ra3?   =>   (9, (24-31)) == ra3   =>   uda1 drive 1
	    |		  |	|	  |	    ------------
	    |		  |	|	  |	    physical name
	    |		  |	|	  --------- logical name
	    |		  |	------------------- minor device number
	    |		  ------------------------- major device number
	    --------------------------------------- name in file system
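
The arithmetic behind the table can be sketched as follows.  The major
number 9 and the eight minors per drive come from the table; the
assumption that the trailing letter a..h selects minor offsets 0..7 is
the usual partition convention, and the ra_device() helper is purely
illustrative.

```python
RA_MAJOR = 9           # major device number shown in the table above
PARTS_PER_DRIVE = 8    # eight minor devices per physical disk
PART_LETTERS = "abcdefgh"  # assumed partition letters, a = offset 0


def ra_device(name):
    """Map a name such as 'ra2c' or '/dev/ra2c' to (major, minor)."""
    base = name.rsplit("/", 1)[-1]         # drop any leading directory
    if not base.startswith("ra") or base[-1] not in PART_LETTERS:
        raise ValueError(f"not an ra partition name: {name!r}")
    unit = int(base[2:-1])                 # the N in raN (the logical name)
    offset = PART_LETTERS.index(base[-1])  # partition within the drive
    return RA_MAJOR, unit * PARTS_PER_DRIVE + offset


if __name__ == "__main__":
    for dev in ("/dev/ra0a", "/dev/ra2a", "/dev/ra3h"):
        print(dev, ra_device(dev))
```

Note that the controller and drive never enter into the calculation:
the minor number depends only on the logical name, which is why keeping
the names ra2 and ra3 leaves the entries in /dev untouched.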




-- 
Ed Gould                    mt Xinu, 2560 Ninth St., Berkeley, CA  94710  USA
{ucbvax,decvax}!mtxinu!ed   +1 415 644 0146

"A man of quality is not threatened by a woman of equality."

lad@eplrx7.UUCP (Lawrence Dziegielewski) (07/20/87)

In article <1929@aw.sei.cmu.edu>, pdb@sei.cmu.edu (Patrick Barron) writes:
> In article <441@eplrx7.UUCP> lad@eplrx7.UUCP (Lawrence Dziegielewski) writes:
> >I have 2 uda's running on several MicroVaxes,  and they all run fine.  I
> 
> If you have a MicroVAX, then you *don't* have a UDA-50, which is a UNIBUS
> device.  The controller used on the Q-Bus is the KDA-50.

We are using MSCP controllers that look like uda-50's to our unix.  In the
config I call 'em uda0 and uda1,  not kda (never even heard of a kda...).

His config file still could be wrong.  I know enough about the subject to
know that it may be possible.

> It doesn't matter what you call the ra* devices, as far as I know.  If you
> really wanted to do something silly, you could put ra0, ra2, ra4, and ra6
> 
It may matter to his flavor of unix.  Mine (mt Xinu 4.3) expects the config
just as I originally posted it.  

I hope this fellow posts the fix as soon as he gets it,  I'd be interested
in finding out what the problem was.

Larry D.

steve@dartvax.UUCP (Steve Campbell) (07/21/87)

In article <6683@dartvax.UUCP> I wrote:

>Although conventional wisdom says not to put more than 1 UDA50 per
>unibus, we are trying to do just that.  We have added a second UDA50 to
>the bus and a third-party device called a USI/HRS from a company named
>Shitashi which claims to enhance the unibus bandwidth enough to permit
>the second uda.  The other devices on the unibus are a DEUNA and 2 DZ11s.
>
>For testing purposes, we moved 2 RA81s from uda0 to uda1, so we have...
>
>controller	uda0	at uba0 csr 0172150		vector udintr
>disk		ra0	at uda0 drive 0
>disk		ra1	at uda0 drive 1
>controller	uda1	at uba0 csr 0172550		vector udintr
>disk		ra2	at uda1 drive 2
>disk		ra3	at uda1 drive 3
>
>As far as we can tell, the hardware is working just fine.  
>But ...  if we do a large number of accesses to files on any disk
>USING PATHNAMES, then do a sync, the 2 disks on the second uda cannot
>be accessed, and the command - and terminal - trying to do so hangs
>completely.

Several people replied with suggestions about the hardware, including
adjusting the delay jumper on the UDA50s, swapping the backplane position
of the 2 UDA's, and changing the value of UDABURST in the driver.  None
of these experiments made any difference; the system still hangs as
described.  I am therefore reasonably confident that the problem is not
in the hardware.

Further experimenting (always in single-user mode) has turned up the
following evidence that perhaps someone with more knowledge of the kernel
than I have might be able to use.  The following sequence ALWAYS hangs:

	[reboot]; mount -a; find ...; sync; ls ...

The find searches about 1000 files for a nonexistent file name, so it
just chases around the file system.  The ls is of a directory on a disk
on the second UDA50, and it's this ls that hangs.  BUT, the hardware is
NOT hung; a dd of the raw disk done instead of the ls works fine.
Moreover, an extra sync done after the mount will postpone the hang, i.e.,
the ls shown will not hang, but a later one will.  A umount/mount sequence
will also postpone the problem, but only for a few minutes.

So what's going on?  Is the namei cache perhaps involved?  Any suggestions
or pointers toward further tests would be welcome.

						Steve Campbell