[comp.protocols.tcp-ip] X.25 problems

lars@ACC-SB-UNIX.ARPA (Lars Poulsen) (10/14/87)

> Date: 14 Oct 1987 05:13-EDT
> Subject: X.25 problems
> From: CERF@A.ISI.EDU
> To: service@ACC-SB-UNIX.ARPA
> 
> I don't understand why the introduction of release 7.0 should
> exacerbate X.25 VC shortages - the limitation is in the ACC
> software, isn't it (maximum VCs set at 64?) and this would be
> a bottleneck regardless of the IMP release (7 or its predecessor)?
> Why would these problems surface only with release 7?
> 
> Vint

Vint,
	The limitation is actually in firmware rather than in software.
We run the entire packet level on a 68000 on our X.25 board. And yes,
the limit is 64.
	Our device driver closes virtual circuits after a period of idle
time (currently 10 minutes). Since the timer value is a #define in the
source code and we provide source code, any system manager who wishes can
tighten this and free up VCs after, say, two minutes.
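
	For illustration only (this is not the actual driver source, and
the names here are made up), the idle-timeout logic amounts to something
like this:

/* Illustration only -- not the actual ACC driver source; the names are
 * invented.  VC_IDLE_TIMEOUT is the sort of #define a site could lower
 * from 10 minutes to 2 to free circuits sooner.
 */
#include <stdio.h>
#include <time.h>

#define VC_IDLE_TIMEOUT (10 * 60)   /* seconds of idle time before clearing */
#define MAX_VC          64          /* firmware limit on open circuits      */

struct vc {
    int    open;                    /* circuit currently established?  */
    time_t last_traffic;            /* time of last packet in or out   */
};

struct vc vc_table[MAX_VC];

/* Called periodically; clears any circuit idle longer than the timeout. */
void vc_idle_scan(time_t now)
{
    int i;

    for (i = 0; i < MAX_VC; i++) {
        if (vc_table[i].open &&
            now - vc_table[i].last_traffic >= VC_IDLE_TIMEOUT) {
            vc_table[i].open = 0;   /* would send an X.25 Clear Request */
            printf("cleared idle VC %d\n", i);
        }
    }
}

int main(void)
{
    time_t now = time(NULL);

    vc_table[3].open = 1;
    vc_table[3].last_traffic = now - 11 * 60;   /* idle for 11 minutes */
    vc_idle_scan(now);
    return 0;
}
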
	The timer is set fairly long; we at one time closed circuits after
one idle minute, only to find that we were thrashing VCs: under certain
network conditions, the packet round-trip time could go up to 80 seconds.
Under pathological conditions (buffer shortage in the PSN) we have even seen
a 30-second round-trip time for an ICMP echo addressed to the host itself.
This can only be explained by the X.25 equivalent of 1822 "blocking".
We are REALLY looking forward to the new End-to-End module curing this
problem.
	The lack of virtual circuits usually becomes a problem when
the network becomes pathologically slow. We speculate that this is
because transfers that normally complete in a couple of minutes
may take up to a half hour under these conditions, and thus there is
much more overlap.
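
	To put rough numbers on that speculation (the arrival rate below is
invented purely for illustration): if transfers arrive at a steady rate and
each one holds a VC for its whole duration, the average number of circuits
in use is roughly the arrival rate times the holding time (Little's law),
so a fifteen-fold slowdown means roughly fifteen times as many circuits
open at once:

/* Back-of-the-envelope only; the arrival rate below is invented.  If
 * connections arrive at a steady rate and each holds a VC for its whole
 * transfer, the average number of circuits in use is roughly
 * arrival_rate * holding_time (Little's law).
 */
#include <stdio.h>

int main(void)
{
    double arrivals_per_min = 2.0;  /* assumed new connections per minute  */
    double normal_minutes   = 2.0;  /* typical transfer time, healthy net  */
    double slow_minutes     = 30.0; /* same transfer when the net is slow  */

    printf("healthy net: ~%.0f VCs in use\n", arrivals_per_min * normal_minutes);
    printf("slow net:    ~%.0f VCs in use\n", arrivals_per_min * slow_minutes);
    return 0;
}
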
	The transition release PSN 7.0 has more code in it than either
PSN 6.0 or PSN 7.1; this means fewer buffers. This tends to provoke
the situation described above.
	Finally, I should mention that on hosts that use many virtual
circuits, I have seen a few VCs carrying bursts of real traffic, such as
you would expect from "normal" TCP use (SMTP, FTP, TELNET), and a large
number of VCs carrying very short bursts (< 5 packets) at large intervals
(one burst every 15 minutes or so). Invariably, these VCs are to GATEWAYS,
which is why I speculated that this might be EGP traffic (I have never
really read up on routing protocols). I am told that each gateway peers
with no more than 3 core EGP-mumblers. My current guess is that some
gateway daemons like to ping each gateway they hear about to make sure it
is reachable, but that is pure speculation.
	I hope this helps you understand why we are concerned about
the transition.

	/ Lars Poulsen
	  ACC Customer Service
	  Service@ACC-SB-UNIX.ARPA

CERF@A.ISI.EDU (10/14/87)

Lars,

thanks for the explanation - what you say means that the
problem could arise in earlier releases, but is exacerbated by
the shortage of buffers. Memory again! In this day and age, one
wishes that memory problems would be a thing of the past. Do you
know what memory complement is carried by each C/30E IMP on
the ARPANET and/or MILNET?

Vint

pogran@CCQ.BBN.COM (Ken Pogran) (10/14/87)

Vint,

C/30Es in the ARPANET and MILNET have 256KW (1/2 Megabyte) of
memory.  C/300s, just beginning to be introduced at particularly
"busy" nodes, have twice that.  It's certainly a far cry from the
"old days" of Honeywell 516s and 316s; then again, there's a lot
more functionality in PSNs these days, and each PSN typically
serves a larger number of host interfaces than in the past.

By the way, I second Lars Poulsen's comment about "REALLY looking
forward to the new End-to-End module" alleviating some of the
X.25 performance problems that have been seen.  In PSN 7.0,
interoperability between X.25-connected and 1822-connected hosts
is "built in" rather than "grafted on," and we should see a good
bit of improvement.  Nothing that can, in and of itself, make it
seem like the network has infinitely more transmission resources,
but ...

Finally, everyone should understand that all of the changes and
improvements, to both the network and its hosts, are being
introduced into an environment of ever-increasing traffic and
numbers of gateways.  So, when changes are made and they settle
down after initial problems are corrected, etc., we must
remember, in making "before" and "after" performance
comparisons, that the load being imposed upon the network is
higher "after" than it was "before"!

Ken Pogran

Mills@UDEL.EDU (10/14/87)

Lars,

Non-core gateway daemons, the only ones likely to use ACC interfaces, do
NOT ping gateways other than the three corespeakers and then only with
EGP, which has intrinsic provisions to limit the polling frequency.
The only other polling-type protocol likely to appear over ARPANET paths
is the Network Time Protocol (NTP) spoken by a few gateways, but not
PSC to my knowledge. Therefore, I must conclude that the vast numbers
of ARPANET host pairs with one end at PSC are due to normal traffic
spasms. Note that the so-called extra-hop problem due
to incomplete knowledge at the corespeakers can create a non-reciprocal
situation where two circuits, not one, are required between certain
host pairs.

What I am not hearing in your explanation of how the ACC interface handles
VC allocation is what happens when all VCs are allocated. I have
heard from PSC staff that the driver complains in messages to the operator
when an attempt is made to open another VC past the 64-circuit maximum. I
would assume the polite driver would keep an activity table with entries for
each active VC and clear the oldest/least-used to make room for the next
one. I would assume this would also happen if an incoming call request
appeared from the network and all VCs were committed. Further, I would
assume both the PSN and ACC would do this kind of thing, no matter what
timeouts are chosen.
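
In rough C (just to pin down what I mean; this is not anything ACC or BBN
actually ships), the behavior I would expect is:

/* Sketch of the behavior I would expect, not actual ACC or PSN code:
 * when a circuit is needed and all 64 are committed, clear the one
 * that has been idle longest to make room.
 */
#include <stdio.h>
#include <time.h>

#define MAX_VC 64

struct vc {
    int    in_use;
    time_t last_traffic;            /* updated on every packet */
};

struct vc vc_table[MAX_VC];

/* Return a slot for a new circuit, clearing the least-recently-used
 * one if every slot is committed.
 */
int vc_allocate(time_t now)
{
    int i, oldest = 0;

    for (i = 0; i < MAX_VC; i++) {
        if (!vc_table[i].in_use) {
            oldest = i;
            goto take;
        }
        if (vc_table[i].last_traffic < vc_table[oldest].last_traffic)
            oldest = i;
    }
    /* All committed: "oldest" now names the least-recently-used circuit;
     * a real driver would send an X.25 Clear Request for it here.
     */
take:
    vc_table[oldest].in_use = 1;
    vc_table[oldest].last_traffic = now;
    return oldest;
}

int main(void)
{
    time_t now = time(NULL);
    int i;

    for (i = 0; i < MAX_VC + 1; i++)    /* the 65th call reuses a slot */
        printf("VC slot %d\n", vc_allocate(now + i));
    return 0;
}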

Dave

CERF@A.ISI.EDU (10/15/87)

Ken,

if the improvements are not keeping up with the load, they are the wrong
improvements!

Vint

pogran@CCQ.BBN.COM (Ken Pogran) (10/16/87)

Dave,

In your message Wednesday to Lars Poulsen of ACC you said,

    "I would assume the polite driver would keep an activity with
     entries for each active VC and clear the oldest/least-used
     to make room for the next one. I would assume this would
     also happen if an incoming call request appeared from the
     network and all VCs were committed. Further, I would assume
     both the PSN and ACC would do this kind of thing, no matter
     what the imeouts chosen."

The PSN developers tell me that the PSN CAN'T initiate a clear of
an active (and non-idle) VC just to make room for the next VC as
that is a violation of the X.25 spec.  The DTE can, of course,
initiate a clear of VCs for any reason.

One thing we CAN do in the PSN is limit, from its side, the
number of active VCs with a host.  For example, if the PSN is
configured to not allow more than N VCs with a host because we
know that's a limitation in the host, the PSN would decline an
incoming call request for a host if it would be the N+1st active
VC with that host.  This might be desirable if receiving an incoming
call request from the PSN that put it over its limit would cause the
host's software to hang or otherwise behave badly instead of cleanly
rejecting the call.  Don't know
if this pertains to any behavior we're currently seeing in the
net, though.
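
Schematically, the policy amounts to no more than this (an illustration
only, not PSN source):

/* Illustration of the policy only, not PSN source: the PSN declines an
 * incoming call that would exceed the per-host circuit limit "N" it has
 * been configured with for that host.
 */
#include <stdio.h>

#define PER_HOST_VC_LIMIT 64    /* "N", configured per host */

int active_vcs_with_host;       /* circuits currently open to the host */

/* Returns 1 if the call may be passed to the host, 0 if the PSN should
 * clear the call itself rather than hand the host its N+1st circuit.
 */
int accept_incoming_call(void)
{
    if (active_vcs_with_host >= PER_HOST_VC_LIMIT)
        return 0;
    active_vcs_with_host++;
    return 1;
}

int main(void)
{
    int i, accepted = 0;

    for (i = 0; i < PER_HOST_VC_LIMIT + 1; i++)
        accepted += accept_incoming_call();
    printf("accepted %d of %d calls\n", accepted, i);
    return 0;
}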

Ken

P.S.  The developers also tell me that having an idle timer
"stretches" the spec.  I am NOT going to split hairs and get into
semantic discussions over when a VC is "active" and when it is
"idle!"

Mills@UDEL.EDU (10/17/87)

Ken,

I'm not particularly persuaded by the "non-spec" argument. Let them eat
resets. Holding a VC in reserve may be a useful workaround, but it
does not get at the heart of the problem. Granted, if the VC pool
were exhausted in the PSN, there is not much you can do; however,
if the pool were exhausted in the host, an incoming-call packet
could still be passed to the firmware, which could then close a
VC on its own. The same trick could work in the PSN, which would
close the VC with a dirty-smelly reset.

Aitcha glad we have X.25 to bash? We could always gang up on the TCP
reliable-close issue...

Dave