[comp.os.vms] DECNET dies and users are disconnected

billy%ntvaxa.DECnet@UTADNX.CC.UTEXAS.EDU ("NTVAXA::BILLY") (09/28/87)

DECnet event 4.18, adjacency down
Circuit UNA-0, dropped by adjacent node

DECnet event 4.7, circuit down, circuit fault
Circuit DMC-0, Line synchronization lost

DECnet event 4.10, circuit up
Circuit UNA-0

DECnet event 4.10, circuit up
Circuit DMC-0

DECnet event 4.15, adjacency up
Circuit UNA-0
-----------------------------------------------------------------------------
I keep getting the above messages on the console once or twice a day.  At the
same time, all users are disconnected from that node.  Some of the users are on
a DZ11 and some are over ETHERNET.  The node doing this is a part of a two node
homogeneous cluster.  This problem does not occur on the other node at all.
However, the node causing the problem is an area router for DECnet.  

I called the Remote Diagnostics Center about this.  They said "software
problem" and transferred me over to software support.  Software support told me
to increase NPAGEDYN, LRPCOUNT, and LRPCOUNTV.  I did so, ran AUTOGEN, and
rebooted.  That didn't solve the problem.  I have used SHOW MEMORY right after
one of these mass disconnections has happened and I am convinced that it is not
a memory problem, as DEC said it was.

Any ideas?

--------------------------------------------------------------------------------
Billy Barron                  Bitnet : BILLY@NTSUVAX or AC02@NTSUVAX
VAX Programmer/Operator       TEXNET : NTVAXB::BILLY or NTVAXB::AC02
North Texas State Univ.     Internet : billy%ntvaxb.decnet@utadnx.cc.utexas.edu
--------------------------------------------------------------------------------
------

leichter@VENUS.YCC.YALE.EDU ("Jerry Leichter") (09/28/87)

	DECnet event 4.18, adjacency down
	Circuit UNA-0, dropped by adjacent node

	DECnet event 4.7, circuit down, circuit fault
	Circuit DMC-0, Line synchronization lost

	DECnet event 4.10, circuit up
	Circuit UNA-0

	DECnet event 4.10, circuit up
	Circuit DMC-0

	DECnet event 4.15, adjacency up
	Circuit UNA-0

	I keep getting the above messages on the console once or twice a day.
	At the same time, all users are disconnected from that node.  Some of
	the users are on a DZ11 and some are over ETHERNET.  The node doing
	this is a part of a two node homogeneous cluster.  This problem does
	not occur on the other node at all.  However, the node causing the
	problem is an area router for DECnet.  

	I called the Remote Diagnostics Center about this.  They said
	"software problem" and transferred me over to software support.
	Software support told me to increase NPAGEDYN, LRPCOUNT, and
	LRPCOUNTV.  I did so, ran AUTOGEN, and rebooted.  That didn't solve
	the problem.  I have used SHOW MEMORY right after one of these mass
	disconnections has happened and I am convinced that it is not a memory
	problem, as DEC said it was.

Sounds like a hardware problem to me.  Are the DZ11, DMC, and DEUNA on the
same Unibus?  Is there enough power available to that Unibus?  Did the
problems by any chance start after you added some new device to the
configuration?

The only "common elements" among the three failures you are seeing are either
some fairly broad software problem (I suppose it could be memory, though I
would have expected VMS to complain a lot more if it were really running out)
or some piece of shared hardware.

Take a look at your error logs, BTW; they should show something (probably
errors reported against the UBA) if it is, indeed, the hardware that is at
fault.

Another clue to look for:  Do users report "DAP CRC checksum errors" when
transferring files through this machine?  Unibus problems often present
themselves in this way....
							-- Jerry
------

jeh@crash.CTS.COM (Jamie Hanrahan) (09/28/87)

In article <8709272318.AA21530@ucbvax.Berkeley.EDU> "NTVAXA::BILLY" <billy%ntvaxa.decnet@utadnx.cc.utexas.edu> writes:
>I keep getting the above messages [DECnet circuit bounce reports]
> on the console once or twice a day.  At the
>same time, all users are disconnected from that node.  Some of the users are on
>a DZ11 and some are over ETHERNET.  The node doing this is a part of a two node
>homogeneous cluster.  This problem does not occur on the other node at all.
>However, the node causing the problem is an area router for DECnet.  
> ...

You didn't say what kind of VAX this was.  Many of the larger VAXes with
Unibus adapters can tolerate a power failure in the Unibus expansion box;
in fact, field service can power off the BA11, swap a card, and power it
on, and VMS keeps going... but users are thrown off terminal lines
that go to boards in that box, and of course DECnet circuits get dropped.
That sounds a lot like what's happening to you.

So, if it's not a 730 or 750, I'd look for intermittent power problems (or
perhaps grounding problems) in the BA11.  You might also check the 
error log for Unibus adapter power failure reports.