[comp.os.vms] decnet DDCMP causes system hangup, or "The TGIF syndrome?"

lane@DUPHY4.DREXEL.EDU (Charles Lane) (03/16/88)

We've been having some problems lately with system hangups, and we
suspect that an async DECnet (DDCMP) link could be the cause.

         VAX#1                            VAX#2   <-sync decnet->[vax]
         750            DDCMP 9600bps     750                     +->(many nodes)
<=ether= VMS 4.5     <------------------> VMS 4.4 <=ethernet====> (~5 nodes)
         DZ11                             DZ11

VAX#2 hangs when the DDCMP line is connected, about once every 1-2
weeks, typically on a Friday afternoon.  (Why this is, no one
knows... "the TGIF syndrome?")  The line is not totally noise-free,
and typically loses sync a couple of times a day.  But when it
is up, the DECnet link performs quite well.

The modems have been replaced by better, less error-prone modems, but
this seems to have had no effect.  VAX#1 is heavily loaded, but has never
crashed or hung without the cause being obvious and unrelated to
DECnet.  VAX#2 is less heavily loaded.

VAX#2 *seems* to be stable when the line is just a `terminal line'
(although LOGINOUT is going constantly, since the other end is trying
to start up the DDCMP line).  This indicates that the hardware is
probably not (completely) to blame.

One thing that comes to mind is that perhaps VAX#2 is running short of
some system resource, and adding the DDCMP line is just enough to make
something fail.  I've looked at memory, page file, and swap file usage,
and the margins seem to be quite adequate.  Are there other
resources/quotas that could cause the problem?

So, is there anyone out there with some ideas or experience that could
help track down the problem?  This link is fairly important to us, and
we really want to get it going.  Even if the system hangups are caused by
something else, eliminating the DECnet link as a possible source
of problems would be very useful.

                            --Chuck Lane
                                cel@cithex.caltech.edu
                                cel@cithex.bitnet

jeh@crash.cts.com (Jamie Hanrahan) (03/23/88)

In article <880315171454.001@DUPHY4.Drexel.Edu> cel@cithex.caltech.edu writes:
> ...
>Vax#2 hangs up when the [async] DDCMP line is connected, about once every 1-2
>weeks, typically on a Friday afternoon.  

You said you checked "memory", but I'm not sure whether that includes
your nonpaged pool.  Since you're running Ethernet, your LRPSIZE should be
set to 1504.  If any of your pool regions have expanded past their initial
allocation into their "virtual" regions, set the initial allocation to the
expanded size, plus maybe 25 to 50%.  Do this in MODPARAMS.DAT and let
AUTOGEN make any other required changes.
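
To make the "expanded size plus 25 to 50%" arithmetic concrete, here is a
rough Python sketch.  It is only an illustration: the function names are
made up, the parameter name LRPCOUNT is used as an example of a pool
region's initial allocation, and the input value is hypothetical rather
than taken from the system in question.  It rounds the headroom up and
formats the result as a `NAME = value` line of the kind MODPARAMS.DAT holds.

```python
def padded_allocation(expanded_size, margin_percent=25):
    """Return expanded_size plus margin_percent of headroom, rounded up."""
    if expanded_size < 0:
        raise ValueError("expanded_size must be non-negative")
    if not 25 <= margin_percent <= 50:
        raise ValueError("margin_percent is outside the suggested 25-50 range")
    # Integer ceiling division so the headroom is never rounded down.
    headroom = (expanded_size * margin_percent + 99) // 100
    return expanded_size + headroom


def modparams_line(parameter, expanded_size, margin_percent=25):
    """Format a 'NAME = value' line for the given (assumed) parameter name."""
    return f"{parameter} = {padded_allocation(expanded_size, margin_percent)}"


if __name__ == "__main__":
    # Hypothetical expanded size; substitute what the system actually reports.
    print(modparams_line("LRPCOUNT", 40, 50))
```

The same calculation would apply to each pool region that has grown past
its initial allocation; AUTOGEN still has to be run afterwards for the
change to take effect.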