matt@oddjob.UChicago.UUCP (Matt Crawford) (10/02/84)
We had been getting periodic crashes as the system (BSD 4.2) tried to do a UBA reset. The cause was the internal counter of unibus zero vectors (uba_hd[0].uh_zvcnt) reaching 250000 every 10 days or so. According to our DECmen, there should only be a few zero vectors per day or per week. "Naturally" they suspected the software. After a few months it was made clear that the hardware was responsible. They brought up VMS for a weekend and hooked up a logic analyzer to verify that the BRRVR was in fact empty when read. They have also apparently tested a system configured similarly to ours, and it behaves the same way.

You may wish to check your system for this behavior. Our configuration is:

780 cpu with 1 DW, which holds:
	1 UDA50 -- 2 RA81's
	3 DZ11's
	1 RX211 -- seldom used
	1 3COM ethernet interface -- (You may be sure that they tried pulling this out.)

The zero vectors come fastest when one or both RA81's are active, and high activity on the DZ's seems to increase the rate also. Maybe this is a design flaw in the UDA50? VMS users may never observe this symptom because VMS apparently ignores zero vectors completely.

I can't tell you for sure what fraction of the unibus or SBI time is wasted by this problem, but sometimes all terminals halt for several minutes on end while the CPU and disks remain active. This could be related.

You can look at your zero vector count with adb via the command uba_hd$<ubahd, or by hacking vmshow or a similar program, if you have it. I would be interested to know whether other similarly configured systems have or don't have this problem.
_____________________________________________________
Matt		University	crawford@anl-mcs.arpa
Crawford	of Chicago	ihnp4!oddjob!matt
dmmartindale@watcgl.UUCP (Dave Martindale) (10/03/84)
All of the 780's that I've seen slowly accumulate zero vectors over time. As far as I can tell, it is a feature of the interrupt controller used in most DEC hardware. When a device sees a bus grant going by at the program interrupt level that it is on (BG4 or BG5 usually) and sees a request for a DMA cycle active (NPR), it will grab the grant, return SACK to the arbitrator, and then release the bus to allow the NPR to be serviced. On PDP11's, this just caused a bit of wasted bus activity but gave faster service to NPR's. On the 780, at the point that the BG is issued, the CPU is already in its interrupt service routine for the UBA, and when a device doesn't complete an interrupt sequence, the UBA returns 0 as the vector.

Thus zero vectors are a "feature" of the interrupt controllers in some devices (DZ's included) which comes into play when there is DMA and programmed I/O activity occurring at the same time on the UNIBUS. Having DZ's makes it worse, since they do so much programmed I/O. But I believe that this condition is quite normal and there is nothing you can do about it, short of rearranging your UNIBUS.

Probably, the kernel should be changed so that it does the UBA reset only if the zero vector count has been very high over a very short period. Accumulating the number since boot time without decrementing it periodically is just silly.
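
A minimal sketch of the "very high over a very short period" check suggested above, written as standalone C rather than as a kernel patch. Everything named here (zv_state, zv_record, ZV_WINDOW_SEC, ZV_WINDOW_MAX) and both threshold values are hypothetical and chosen only for illustration; none of it is taken from the 4.2BSD source. The idea is to count zero vectors within a short time window and restart the count when the window expires, so a slow trickle never adds up to a reset.

```c
/* Hypothetical tuning values, for illustration only. */
#define ZV_WINDOW_SEC 8UL     /* length of one counting window, in seconds */
#define ZV_WINDOW_MAX 1000UL  /* zero vectors tolerated within one window */

struct zv_state {
    unsigned long window_start;  /* time (seconds) the current window began */
    unsigned long window_count;  /* zero vectors seen in the current window */
    unsigned long total;         /* zero vectors since boot, for reporting only */
};

/*
 * Record one zero vector observed at time `now` (in seconds).
 * Returns 1 if the caller should reset the adapter, 0 otherwise.
 */
int zv_record(struct zv_state *zv, unsigned long now)
{
    zv->total++;

    /*
     * Unsigned subtraction: if the clock was set back, the difference
     * wraps to a huge value and the window simply restarts.
     */
    if (now - zv->window_start > ZV_WINDOW_SEC) {
        zv->window_start = now;
        zv->window_count = 0;
    }

    if (++zv->window_count > ZV_WINDOW_MAX) {
        /* Start a fresh window so one burst asks for only one reset. */
        zv->window_start = now;
        zv->window_count = 0;
        return 1;
    }
    return 0;
}
```

In a real handler the caller would do the UBA reset where zv_record() returns 1. The since-boot total is still kept, so it can be inspected the same way the existing counter is, but it no longer triggers anything.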