[net.unix-wizards] 4.2bsd uba question

parker@nrl-css.ARPA (Alan Parker) (03/26/85)

I'm trying to add another dhdm (Able) to my /780 4.2 system.  At
the same time, I'm removing a dmf-32 (also Able).  This is my uba
startup before:

dh0 at uba0 csr 160040 vec 200, ipl 15
dm0 at uba0 csr 170510 vec 100, ipl 14
dh1 at uba0 csr 160020 vec 210, ipl 15
dm1 at uba0 csr 170500 vec 104, ipl 14
dh2 at uba0 csr 160320 vec 340, ipl 15
dm2 at uba0 csr 171000 vec 410, ipl 14
dh3 at uba0 csr 160340 vec 360, ipl 15
dm3 at uba0 csr 171020 vec 430, ipl 14
dz0 at uba0 csr 160100 vec 300, ipl 15
dz1 at uba0 csr 160110 vec 310, ipl 15
dz2 at uba0 csr 160120 vec 320, ipl 15
dz3 at uba0 csr 160130 vec 330, ipl 15
dmf0 at uba0 csr 160400 vec 764, ipl 15
acc0 at uba0 csr 167600 vec 270, ipl 15
ec0 at uba0 csr 164324 vec 440, ipl 16

This is when I replace the dmf with the dhdm (of course, this
kernel is configured for the dhdm instead of the dmf):

dh0 at uba0 csr 160040 vec 200, ipl 15
dm0 at uba0 csr 170510 vec 100, ipl 14
dh1 at uba0 csr 160020 vec 210, ipl 15
dm1 at uba0 csr 170500 vec 104, ipl 14
dh2 at uba0 csr 160320 vec 340, ipl 15
dm2 at uba0 csr 171000 vec 410, ipl 14
dh3 at uba0 csr 160340 vec 360, ipl 15
dm3 at uba0 csr 171020 vec 430, ipl 14
dh4 at uba0 csr 160400 vec 240, ipl 15
dm4 at uba0 csr 171040 vec 120, ipl 14
dz0 at uba0 csr 160100 vec 300, ipl 15
dz1 at uba0 csr 160110 vec 310, ipl 15
dz2 at uba0 csr 160120 vec 320, ipl 15
dz3 at uba0 csr 160130 vec 330, ipl 15
acc0 at uba0 csr 167600 vec 270, ipl 15
uba0: too many zero vectors
uba0: reset dz0 dz1 dz2 dz3 acc0
ec0 at uba0 csr 164324 didn't interrupt

What does the "too many zero vectors" message mean? The problem
seems to be related to the 3Com board (ec0) because with it out
of the configuration, the new dhdm works just fine.  I tried
moving ec0 to the top of the uba configuration, but it also fails
there.  Notice that the vectors of the new dhdm are far away from
the ec0 vector (440).  Any ideas would be appreciated.
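
One way to double-check the "far away" claim is to parse the autoconfigure
lines and look for repeated vectors. The following is only a sketch: the
function names are invented for illustration, the csr and vec fields are
assumed to be octal, and the ec0 line is taken from the earlier, working
boot because the failing boot never reports a vector for it.

```python
import re
from collections import Counter

# Lines copied from the second listing above; the ec0 line comes from the
# first (working) listing, since the failing boot prints no vector for it.
BOOT_LINES = """\
dh0 at uba0 csr 160040 vec 200, ipl 15
dm0 at uba0 csr 170510 vec 100, ipl 14
dh1 at uba0 csr 160020 vec 210, ipl 15
dm1 at uba0 csr 170500 vec 104, ipl 14
dh2 at uba0 csr 160320 vec 340, ipl 15
dm2 at uba0 csr 171000 vec 410, ipl 14
dh3 at uba0 csr 160340 vec 360, ipl 15
dm3 at uba0 csr 171020 vec 430, ipl 14
dh4 at uba0 csr 160400 vec 240, ipl 15
dm4 at uba0 csr 171040 vec 120, ipl 14
dz0 at uba0 csr 160100 vec 300, ipl 15
dz1 at uba0 csr 160110 vec 310, ipl 15
dz2 at uba0 csr 160120 vec 320, ipl 15
dz3 at uba0 csr 160130 vec 330, ipl 15
acc0 at uba0 csr 167600 vec 270, ipl 15
ec0 at uba0 csr 164324 vec 440, ipl 16
"""

# The ipl field is matched but kept as text; its radix is not assumed here.
LINE_RE = re.compile(r"^(\w+) at (\w+) csr ([0-7]+) vec ([0-7]+), ipl (\w+)$")


def parse_autoconf(text):
    """Return (device, csr, vec) tuples, reading csr and vec as octal."""
    devices = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # lines such as "uba0: reset ..." simply do not match
            dev, _bus, csr, vec, _ipl = m.groups()
            devices.append((dev, int(csr, 8), int(vec, 8)))
    return devices


def duplicate_vectors(devices):
    """Return the base vectors (as octal strings) claimed by more than one device."""
    counts = Counter(vec for _dev, _csr, vec in devices)
    return sorted(format(v, "o") for v, n in counts.items() if n > 1)


if __name__ == "__main__":
    devs = parse_autoconf(BOOT_LINES)
    for dev, csr, vec in sorted(devs, key=lambda d: d[2]):
        print(f"{dev:5s} csr {csr:o} vec {vec:o}")
    print("duplicates:", duplicate_vectors(devs))
```

This compares only the base vector printed for each device; a device that
uses several consecutive vectors would need a per-driver count that the
boot messages do not show.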

chris@umcp-cs.UUCP (Chris Torek) (03/27/85)

Aha, I get to expose my ignorance again.  Hopefully I won't flop as
badly as with the "T1 links" question...  (Still, aren't those 56Kbit
links just a single T1 channel?  So in a sense... oh, forget it.)

If I'm not mistaken, a "zero vector" occurs when a device asks for the
Unibus, then doesn't respond when it gets it.  The 3Com boards are
rumored to be very persnickety about Unibus access.  Probably it
doesn't like sharing the bus with that DH/DM.  You might try moving it
from slot to slot....
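
The boot messages in the original post (the complaint followed at once by
"uba0: reset ...") are consistent with the kernel counting these empty
grants and resetting the adapter once there have been too many. Here is a
toy model of that bookkeeping, not the 4.2BSD code itself; the class name,
the method name, and the threshold are all made up for illustration.

```python
class UbaModel:
    """Toy model of zero-vector bookkeeping on a Unibus adapter (not kernel code)."""

    def __init__(self, zero_vector_limit=5):
        # Hypothetical threshold; the real kernel's limit is not assumed here.
        self.zero_vector_limit = zero_vector_limit
        self.zero_vectors = 0
        self.log = []

    def interrupt(self, vector):
        """Handle one bus grant; vector 0 means the requester never answered."""
        if vector != 0:
            return f"dispatch {vector:o}"
        self.zero_vectors += 1
        if self.zero_vectors > self.zero_vector_limit:
            self.log.append("too many zero vectors")
            self.zero_vectors = 0  # start counting afresh after the reset
            return "reset"
        return "ignored"


if __name__ == "__main__":
    uba = UbaModel(zero_vector_limit=2)
    for vec in (0, 0o300, 0, 0):
        print(uba.interrupt(vec))
```

A few stray zero vectors are tolerated in this model; only a persistent
offender pushes the count over the limit and triggers the reset.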
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

lrr@siemens.UUCP (03/29/85)

Gawd - I thought I was the only person who had this problem.
We had that here at Siemens and it turned out to be a marital problem between
a DZ11 and the 3COM board.  In the process of moving the 3COM board to
another slot in the backplane, I removed one DZ11 and the problem went away!
I put the DZ11 back in (anywhere, in fact) and the problem resurfaced.  Ok,
so I call my DEC Field Service and they bring out another DZ.  Of course,
they want to run diagnostics to prove that my board is bad (any board) and
of course nothing is wrong.  I told them to just replace the ``bad'' board
with the new one and I'll be happy.  After a while (a few hours) they simply
agree; they do the swap and leave.  Now, I don't think that they were
convinced that there was a problem with that DZ, so they might have put it
back in circulation.  Maybe you got it!

My recommendation?  Pull DZ boards one at a time and reboot.  Convince DEC
that the one that you pulled is bad and have them replace it.  Good luck!
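
The pull-one-and-reboot procedure can be written down as a simple loop.
This is only a sketch of the bookkeeping: find_suspect_boards and the
boots_cleanly callback are hypothetical names, and in real life
"boots_cleanly" means a person pulling the board and watching the console.

```python
def find_suspect_boards(boards, boots_cleanly):
    """Return the boards whose removal, one at a time, gives a clean boot.

    boards        -- names of the boards to try pulling (e.g. the DZs)
    boots_cleanly -- callable taking the list of boards still installed and
                     returning True if the system came up without complaint
    """
    suspects = []
    for board in boards:
        remaining = [b for b in boards if b != board]
        if boots_cleanly(remaining):
            suspects.append(board)
    return suspects


if __name__ == "__main__":
    # Pretend machine in which dz2 is the board that upsets the 3Com.
    def fake_boot(installed):
        return "dz2" not in installed

    print(find_suspect_boards(["dz0", "dz1", "dz2", "dz3"], fake_boot))
```

If more than one board comes back as a suspect, the trouble is probably
not a single bad board, and swapping one will not cure it.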


Larry Rogers
Siemens Research and Technology Labs
Princeton, NJ
princeton!siemens!jaguar!lrr

terryl@tekcrl.UUCP (04/06/85)

>Gawd - I thought I was the only person who had this problem.
>We had that here at Siemens and it turned out to be a marital problem between
>a DZ11 and the 3COM board.  In the process of moving the 3COM board to
>another slot in the backplane, I removed one DZ11 and the problem went away!
>I put the DZ11 back in (anywhere, in fact) and the problem resurfaced.  Ok,
>so I call my DEC Field Service and they bring out another DZ.  Of course,
>they want to run diagnostics to prove that my board is bad (any board) and
>of course nothing is wrong.  I told them to just replace the ``bad'' board
>with the new one and I'll be happy.  After a while (a few hours) they simply
>agree; they do the swap and leave.  Now, I don't think that they were
>convinced that there was a problem with that DZ, so they might have put it
>back in circulation.  Maybe you got it!


     Frankly, I agree with you 100%, but would like to add a little info to
the fray: almost all of DEC's diagnostics run standalone, and check only one
device at a time. They will not find any problems associated with timings
between two or more boards, whether they are the same type or different. We
went through a similar problem here with our 750 a couple of months ago:
We had two Unibuses (unibi?) on our system, along with one Massbus with a
couple of Eagles with Emulex's SC750 controller, and an RA81 on one Unibus,
with a TU80 on the other Unibus. We first booted up the system with the RA81,
and then genned a system with the Eagles on the Massbus as root. We did a dump
of the root on the RA81 (which worked fine), but when we tried to do a restor
onto one of the Eagles, the system crashed with reserved operand faults.
Well, we scratched our heads and called our local support group. Luckily,
they had run into this problem before and knew what to do. The whole upshot
of this was that before the system was released to us, they ran diagnostics
for two days with no problems, BECAUSE ALL DIAGNOSTICS RUN STANDALONE, testing
one device at a time. BTW, the fix was to put the Massbus at a different bus
arbitration level (other than the standard) because the second Unibus on a
750 is at a fixed bus arbitration level.


					Terry Laskodi
					     of
					Tektronix