[net.unix-wizards] Tracking down zero UNIBUS interrupt vectors

v.wales%ucla-locus@sri-unix.UUCP (09/20/83)

From:            Rich Wales <v.wales@ucla-locus>

On our VAX 11/780 running 4.1BSD, every so often some device on the
UNIBUS starts spewing forth zero interrupt vectors.  After about
250,000 of these, of course, UNIX's solution is to do a UNIBUS reset.

Clearly, it would be nice if I could figure out which device is causing
all these zero vectors -- but I can't seem to figure out any way to get
the culprit to 'fess up, since (Catch-22!) the only way I can think of
to identify an interrupting device on the UNIBUS is by its vector.

By adding a few lines to sys/locore.s and dev/uba.c, I was able to tell
that the guilty device is interrupting at IPL 15.  That doesn't really
help me much, though, because just about EVERY device I have interrupts
at IPL 15.
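
For anyone wanting to repeat that measurement, here is a minimal sketch
of a per-level tally, not the actual changes made to locore.s and
uba.c.  It assumes the "IPL 15" above is hexadecimal, so that UNIBUS
levels BR4..BR7 reach the VAX as IPL 0x14..0x17.  The names zv_by_br
and zv_count_ipl are made up for illustration.

```c
#include <assert.h>

#define BR_LEVELS 4     /* BR4..BR7, assumed to arrive as IPL 0x14..0x17 */

/* Hypothetical tally of zero vectors seen at each bus request level. */
static unsigned long zv_by_br[BR_LEVELS];

/* Record one zero vector taken at the given IPL (0x14..0x17). */
void zv_count_ipl(int ipl)
{
    assert(ipl >= 0x14 && ipl <= 0x17);
    zv_by_br[ipl - 0x14]++;     /* index 0 is BR4, index 1 is BR5, ... */
}
```

A tally like this only narrows the search to one BR level, which is
exactly the limitation described above when nearly every device shares
that level.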

Does anyone out there have any helpful hints?

Our UNIBUS configuration is as follows, by the way:

	    1   SI 9400 disk controller
	    1   ABLE DH/DM
	    6   DEC DZ-11's
	    1   Proteon V2LNI interface
	    1   Interlan Ethernet interface

Everything we have interrupts at IPL 15, except for the LNI interface
and the DM half of the DH/DM (both of which interrupt at IPL 14).

-- Rich <wales@UCLA-LOCUS>

dmmartindale@watcgl.UUCP (Dave Martindale) (09/25/83)

First, is a device suddenly spewing forth these vectors, or are 250,000
of them slowly accumulating over a long period?  As distributed, the
system never resets this count, so if you seldom crash, the total can
reach the limit through gradual accumulation alone.

One "normal" source of zero vectors is the interrupt controller used on
some DEC devices.  Some of these controllers are designed to speed up
DMA transactions by throwing away a bus grant if NPR (the UNIBUS DMA
request line) is asserted.

During a normal interrupt sequence, the device pulls down BR5 (in this
case, since IPL 15 hex corresponds to BR5) and waits to see BG5.  When
it gets BG5, it returns SACK; then, once the previous bus transaction
completes, it asserts BBSY and INTR along with the vector.  SACK is
negated after INTR is asserted.

This is probably fine on PDP11's, but on the 780 the UBA doesn't know
what priority the processor is at and thus can't issue BG's on its
own.  Thus it just passes the BR on to the processor as an interrupt
request on the SBI, and when the UBA interrupt handler goes to read the
appropriate BRRVR, the UBA then knows that the processor is ready to
handle that interrupt and issues the BG.  If the grant is then thrown
away without producing an interrupt vector, the UBA just returns zero,
since it has to pass back something.  That is the zero vector you see.

(The above description is my own understanding of how this works, based
on reading manuals and circuit diagrams and watching the bus.  I could
be wrong....)
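
The sequence described above can be sketched as a toy model in C.  This
is only an illustration of the description, not real UBA or driver
code.  The struct, its fields, and toy_read_brrvr are hypothetical
names, and the model assumes that a device which discards a grant keeps
its request pending and tries again later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not hardware or 4.1BSD. */
struct toy_device {
    unsigned vector;            /* vector the device supplies with INTR */
    bool requesting;            /* device is asserting its BR line */
    bool drops_grant_on_npr;    /* controller discards BG while NPR is up */
};

/*
 * Model of the CPU reading BRRVR: the UBA issues the grant and hands
 * back whatever vector results, or zero if nobody supplied one.
 */
unsigned toy_read_brrvr(struct toy_device *dev, bool npr_asserted)
{
    assert(dev != NULL);
    if (!dev->requesting)
        return 0;               /* nobody takes the grant */
    if (dev->drops_grant_on_npr && npr_asserted)
        return 0;               /* grant thrown away: zero vector */
    dev->requesting = false;    /* grant accepted, vector delivered */
    return dev->vector;
}
```

In this model, reading BRRVR while NPR is asserted returns zero and
leaves the request pending, and a later read with NPR clear returns the
real vector.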

Anyway, you would expect to get these frequently if you have devices
with this sort of interrupt controller (and I think the DZ's do) plus
lots of UNIBUS DMA activity.  The UNIBUS disk would provide the
latter.  Now, if the zero-vector count builds up gradually, there
isn't much you can practically do about it; just reset the count to
zero every once in a while so you don't get UNIBUS resets.
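
One way to do that resetting automatically is to count zero vectors per
time window, so that only a genuine burst reaches the limit.  The
following is a minimal sketch of that idea, not the distributed 4.1BSD
code (which, as noted above, never resets the count).  The names
zv_state and zv_note are hypothetical, ZV_LIMIT is the rough figure
quoted in this thread, and the 300-second window is an arbitrary value
chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ZV_LIMIT   250000L  /* approximate threshold quoted in the thread */
#define ZV_WINDOW  300L     /* seconds; arbitrary illustrative value */

/* Hypothetical per-adapter bookkeeping. */
struct zv_state {
    long count;             /* zero vectors seen in the current window */
    long window_start;      /* time, in seconds, the window opened */
};

/*
 * Note one zero vector at time `now` (seconds).  Returns true when the
 * count within a single window exceeds the limit, i.e. when a UNIBUS
 * reset would be warranted.
 */
bool zv_note(struct zv_state *zv, long now)
{
    assert(zv != NULL);
    if (now - zv->window_start > ZV_WINDOW) {
        zv->window_start = now;     /* old window expired: start over */
        zv->count = 0;
    }
    return ++zv->count > ZV_LIMIT;
}
```

With this scheme a slow trickle of zero vectors never triggers a reset,
while a device that really is stuck still trips the limit quickly.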

If you really do get very large bursts of zero vectors all at once and
can produce the problem on demand, or observe it while it is happening,
you should be able to find out which device is actually requesting the
interrupt, and which is throwing away the grant, with an oscilloscope
or (better) a logic analyzer.  Probably not an attractive prospect, but
a useful last resort...

Hope this helps.

	Dave Martindale