[net.unix-wizards] Unibus zero vectors on vax/780

matt@oddjob.UChicago.UUCP (Matt Crawford) (10/02/84)

We had been getting periodic crashes as the system (BSD 4.2) tried to
do a UBA reset.  The cause was the internal counter of unibus zero
vectors (uba_hd[0].uh_zvcnt) reaching 250000 every 10 days or so.
According to our DECmen, there should only be a few zero vectors per
day or per week.

"Naturally" they suspected the software.  After a few months it was
made clear that the hardware was responsible.  They brought up VMS
for a weekend and hooked up a logic analyzer to verify that the BRRVR
was in fact empty when read.  They have also apparently tested
a system configured similarly to ours and it behaves the same way.

You may wish to check your system for this behavior.  Our
configuration is: 780 cpu with 1 DW, which holds:
1 UDA50 -- 2 RA81's
3 DZ11's
1 RX211 -- seldom used
1 3COM ethernet interface -- (You may be sure that they tried pulling
				this out.)
The zero vectors come fastest when one or both RA81's are active, and
high activity on the DZ's seems to increase the rate also.  Maybe
this is a design flaw in the UDA50?  VMS users may never observe this
symptom because VMS apparently ignores zero vectors completely.  I
can't tell you for sure what fraction of the unibus or SBI time is
wasted by this problem, but sometimes all terminals halt for several
minutes on end while the CPU and disks remain active.  This could be
related.

You can look at your zero vector count with adb via the command
uba_hd$<ubahd, or by hacking vmshow or a similar program, if you have
it.  I would be interested to know whether other similarly configured
systems have or don't have this problem.
_____________________________________________________
Matt		University	crawford@anl-mcs.arpa
Crawford	of Chicago	ihnp4!oddjob!matt

dmmartindale@watcgl.UUCP (Dave Martindale) (10/03/84)

All of the 780's that I've seen slowly accumulate zero vectors over time.
As far as I can tell, it is a feature of the interrupt controller used
in most DEC hardware.  When a device sees a bus grant going by at the
program interrupt level that it is on (BG4 or BG5 usually) and sees a
request for a DMA cycle (NPR) active, it will grab the grant, return
SACK to the arbitrator, and then release the bus to allow the NPR to
be serviced.  On PDP11's, this just caused a bit of wasted bus activity
but gave faster service to NPR's.  On the 780, at the point that the
BG is issued, the CPU is already in its interrupt service routine for
the UBA, and when a device doesn't complete an interrupt sequence,
the UBA returns 0 as the vector.

Thus zero vectors are a "feature" of the interrupt controllers
in some devices (DZ's included) which come into play when there is
DMA and programmed I/O activity occurring at the same time on the UNIBUS.
Having DZ's makes it worse, since they do so much programmed I/O.
But I believe that this condition is quite normal and there is nothing
you can do about it, short of rearranging your UNIBUS.  Probably,
the kernel should be changed so that it does the UBA reset only if
the zero vector count has been very high over a very short period.
Accumulating the number since boot time without decrementing it
periodically is just silly.
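
The rate-based check suggested above might look roughly like the
following.  This is only a sketch, not the 4.2BSD code: the names
zv_window and zv_note, the 60-second window, and the limit of 1000 are
all assumptions made up for illustration, and a real kernel would take
the time from its own clock and call its UBA reset routine where this
returns 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values, chosen only for illustration. */
#define ZV_WINDOW_SECS 60L    /* length of one counting window, in seconds */
#define ZV_WINDOW_MAX  1000L  /* zero vectors tolerated within one window */

/* Hypothetical per-adapter state; not the layout of the real uba_hd. */
struct zv_window {
    long start;   /* time (seconds) at which the current window began */
    long count;   /* zero vectors seen so far in the current window */
};

/*
 * Record one zero vector seen at time `now` (seconds).
 * Returns 1 if the count within the current window has exceeded
 * ZV_WINDOW_MAX, meaning a UBA reset looks warranted; otherwise 0.
 * The count is cleared whenever a window expires, so a slow trickle
 * never adds up to a reset.
 */
int zv_note(struct zv_window *zw, long now)
{
    assert(zw != NULL);

    /* Window expired: start a new one and forget the old count. */
    if (now - zw->start >= ZV_WINDOW_SECS) {
        zw->start = now;
        zw->count = 0;
    }

    if (++zw->count > ZV_WINDOW_MAX) {
        zw->count = 0;   /* start counting afresh after the reset */
        return 1;
    }
    return 0;
}
```

With these assumed numbers, one zero vector per second (more than three
times the rate reported at the top of this thread) never triggers a
reset, while a burst of more than 1000 inside a single minute does.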