[fa.tcp-ip] Interactive traffic punishment via MILNET?

tcp-ip@ucbvax.ARPA (07/25/85)

From: the tty of Geoffrey S. Goodfellow <Geoff@SRI-CSL.ARPA>

For the last few months we have noticed a dreadful condition that
seems to strike with a good deal of regularity when using a
MILNET TAC to connect to an ARPANET Host.  The same thing also
happens when using a local network host gatewayed into the ARPANET
which in turn ends up at a MILNET TAC.

Specifically, this has to do with interactive "links" where two
users are TALKing to one another and there are single character
packets in both directions.  The symptom is that output to the person
at the TAC goes into molasses mode, where they receive a
character from the host once every second or so.  This happens on
two different operating systems (Tenex and TOPS-20), and, as I
said, with directly connected ARPANET hosts as well as those
behind a local network gateway.

Any ideas what is exacerbating this situation?  Anyone else out
there experienced it?

g

tcp-ip@ucbvax.ARPA (07/25/85)

From: LARSON@SRI-KL.ARPA

  I published a discussion of this problem on the TOPS-20 mailing
list a month or two ago.  The situation seems to result from round
trip time estimates being calculated incorrectly.
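
  For reference, the estimator in question is the RFC 793 smoothed
round-trip scheme.  A minimal sketch in C, with illustrative constants
and names (this is not our actual code):

    /* Smoothed round-trip estimation per RFC 793.  A bad sample, or
     * smoothing applied to the wrong measurement, can inflate the
     * retransmission timeout into exactly this kind of molasses.
     * Constants and bounds are illustrative. */
    #define ALPHA  0.85   /* smoothing gain; RFC 793 suggests .8 to .9  */
    #define BETA   1.5    /* variance factor; suggested 1.3 to 2.0      */
    #define LBOUND 1.0    /* lower bound on timeout, seconds (assumed)  */
    #define UBOUND 60.0   /* upper bound on timeout, seconds (assumed)  */

    static double srtt = 1.0;         /* smoothed round-trip time, secs */

    /* Fold one measured round trip into the estimate; return the
     * retransmission timeout to use for the next segment. */
    double update_rto(double measured)
    {
        double rto;

        srtt = ALPHA * srtt + (1.0 - ALPHA) * measured;
        rto = BETA * srtt;
        if (rto < LBOUND) rto = LBOUND;
        if (rto > UBOUND) rto = UBOUND;
        return rto;
    }
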
  I have a 'fix' that seems to make things better.  It is installed
on SRI-KL.
	Alan
-------

tcp-ip@ucbvax.ARPA (07/25/85)

From: Mark Crispin <MRC@SIMTEL20.ARPA>

Welcome to the club.  This problem affects much more than links,
as just about anybody who does interactive character-at-a-time
traffic between MILNET and ARPANET has found out.  I've taken to
putting my TELNET into local echo mode and using the local line
editor to compose messages line at a time.  I don't even try to
do any editing across the gateways any more.

I believe BBN is doing some work on the problem.

-- Mark --
-------

tcp-ip@ucbvax.ARPA (07/25/85)

From: "J. Noel Chiappa" <JNC@MIT-XX.ARPA>

	It's not MILNET <-> ARPANET. I've been pissing and wailing
about this on MIT-XX since 1980; at that point traffic from my
machine to MIT-XX went in via a LAN connected to my machine and
a front end PDP11 on the 20; no 1822 nets at all.
	It still happens, although the configuration is a little
different now. There's still no ARPANET<->MILNET gateway, though.
It seems to happen whenever I type any distance ahead of the echoing.

	Noel
-------

tcp-ip@ucbvax.ARPA (07/26/85)

From: mgardner@BBNCC5.ARPA

BBN is well aware of the problems and is working on them.

--Marianne

tcp-ip@ucbvax.ARPA (07/26/85)

From: CERF@USC-ISI.ARPA

Geoff,

I wonder if it is possible that the Tenex and TOPS-20 TCPs, or
the TAC TCP, react to source quenches (which are likely when
sending many short packets) by throttling back on packet output
rate?

Vint

tcp-ip@ucbvax.ARPA (08/09/85)

From: mgardner@BBNCC5.ARPA


Lixia,

It is not easy to give a brief answer to your question of what exactly the
problems with the mailbridges are, but I will do my best.

Gateways are inherently bottlenecks to traffic between two networks.  For
example, ARPANET and MILNET are reliable networks, but their traffic is
funneled through gateways designed to drop data whenever pressed for space.
Retransmission at the link level is fast, because the retransmission timer
triggers a retransmission fairly quickly.  The retransmission timers at the
transport layer must be slower, and so retransmission by TCP will affect what
the user sees.  The interactive user is, of course, most likely to notice.  
Speeding up this timer, by the way, is not a good solution, since the effect is
increased congestion and poorer service for everyone. (More dropped datagrams,
more retransmitted datagrams.)
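
The usual remedy runs the other way: back the retransmission timer off
on each retry, so that a congested path sees fewer duplicates rather
than more.  A sketch in C (illustrative only, not any particular
implementation):

    /* Exponential back-off of the retransmission timeout.  Each
     * unacknowledged retransmission doubles the interval, up to a
     * ceiling, so retransmissions thin out exactly when the gateways
     * are dropping datagrams.  The ceiling is an assumed value. */
    #define RTO_CEILING 64.0           /* seconds (assumed) */

    double backoff_rto(double rto)
    {
        rto *= 2.0;
        if (rto > RTO_CEILING)
            rto = RTO_CEILING;
        return rto;
    }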

Another reason that the internet will never function as well as a subnet is
that the gateways link heterogeneous systems.  If one side is sending much
faster than the other side is receiving, the gateways are designed to drop
datagrams.  This problem is exacerbated by the current lack of buffer space in
the LSI/11s, by the lack of an effective means of slowing down a source, and by
a rudimentary routing metric that does not allow routing to respond to bursts
in traffic.
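
To make the dropping concrete: once its buffer pool is full, a gateway
has no option but to discard the arriving datagram and leave recovery
to TCP.  Roughly, with hypothetical names and sizes:

    /* A gateway receive path with a fixed buffer pool.  When the fast
     * side fills the pool before the slow side drains it, arriving
     * datagrams are simply discarded (dequeue path omitted). */
    #define NBUF 8                     /* assumed pool size */

    struct dgram;                      /* opaque datagram */
    extern void discard(struct dgram *d);

    static struct dgram *pool[NBUF];
    static int head, count;

    int gw_enqueue(struct dgram *d)
    {
        if (count == NBUF) {
            discard(d);                /* dropped; TCP must retransmit */
            return -1;
        }
        pool[(head + count++) % NBUF] = d;
        return 0;
    }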

The mailbridges are a worse bottleneck than other gateways for several good
reasons.  First, they were placed with the idea that the traffic between them
would be filtered for mail.  We expected a reduction in traffic.  On the
contrary, since the physical split of ARPANET and MILNET, there has been a
sharp rise in the amount of traffic between the two networks.  The bridges are
overloaded.  In addition, there are a number of hosts which send almost all
their traffic to the other net.  These hosts may be on the wrong network.  A
third problem for the mailbridges is load-sharing.  It is important that the
traffic between the two networks be spread among the different mailbridges.
This is the function of the load-sharing tables.  But this is static routing,
based on expected traffic.  Since the destination is not known, the routing
most likely to provide good service is to home a host to its nearest
mailbridge.  However, when the host has a one or two hop path on one side of
the mailbridge and a five or six hop path on the other side, the mailbridge
will see speed-mismatch problems, similar to those associated with mismatched
network speeds.  The solution is not to ignore the load-sharing, since everyone
sending to the same bridge would create even worse problems.
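
A load-sharing table of this sort amounts to a fixed source-to-bridge
map, along these lines (the IMP and bridge numbers here are made up
for illustration):

    /* Static homing: each source is bound to one mailbridge by a
     * fixed table built from expected traffic, since the eventual
     * destination is not known in advance. */
    struct homing { int imp; int bridge; };

    static const struct homing table[] = {
        { 22, 0 },                     /* hosts on IMP 22 use bridge 0 */
        { 27, 1 },
        { 52, 2 },
    };

    int home_bridge(int imp)
    {
        int i;

        for (i = 0; i < (int)(sizeof table / sizeof table[0]); i++)
            if (table[i].imp == imp)
                return table[i].bridge;
        return 0;                      /* default bridge */
    }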

These are the problems we see in a perfect world where hardware and software
problems have been banished.  Unfortunately, we live in the real world.  The
software and hardware problems themselves can be in the hosts, the lines, or
the network.  They are usually hard to diagnose, since the symptom of the
problem, for example congestion, may be physically remote from the source of
the problem.  It is often not even clear where in the chain the problem lies.
For example, is congestion at an ISI IMP caused by the mailbridge, by ARPANET
congestion around ISI, by back-up from a local net, by ARPANET congestion
remote from ISI, by a host at another IMP, or by still another factor?

I look at mailbridge statistics every day.  I see, almost daily, the effects of
host problems.  Although these problems are most often caught by the host
administrators, and, if not, are tracked by our monitoring center, let me list
a few of the problems that I followed personally.  I have seen a run-away
ethernet bring MILISI to its knees, a gateway with a routing plug cause
congestion felt by a host on the other side of the network, and three cases of
hosts flooding the network with faulty IP datagrams.  The internet is
pathetically vulnerable to congestion caused by a single host.

At BBN we have a number of tools to monitor the long-range performance of
the internet.  The gateways send messages, called traps, any time an event of
interest occurs.  We summarize these on a daily basis, and keep the detailed
trap reports on hand for use when we see a problem.  The gateways store
throughput information, including how many datagrams each gateway processed,
both summarized for the whole gateway and broken down by interface or neighbor.
Throughput reports give us detailed information, such as how many datagrams are
dropped (discarded) by the gateway, broken down by reason, and the number of
datagrams sent back out the same interface they used on arrival.  We can also
collect statistics on the number of datagrams between each source and destination
host.  In addition, we can measure a wide range of parameters in ARPANET or
MILNET.  These include detailed throughput statistics, statistics about the
end-to-end traffic and about the store-and-forward traffic.
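
The per-gateway records boil down to a handful of counters; in outline
(field names illustrative, not our actual report format):

    /* The shape of the throughput data described above: totals per
     * gateway, discards broken down by reason, and datagrams sent
     * back out the interface they arrived on. */
    enum drop_reason { DROP_NO_BUFFER, DROP_NO_ROUTE, DROP_OTHER,
                       NDROPREASON };

    struct gw_throughput {
        unsigned long processed;               /* datagrams handled   */
        unsigned long dropped[NDROPREASON];    /* discards, by reason */
        unsigned long turned_around;           /* out same interface  */
    };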

But even with all these tools (and others) at our disposal, we are stopped at
the host.  There we find TCP/IP implementations written by many different
people and containing subtle differences in interpretation that could lead to
major problems.

Given this range of sources for the problems, what can we, at BBN, do to
improve the situation?  Keep in mind that we affect the mailbridges, the IMPs,
and, since we monitor the lines, the line quality, but we can only open a
discussion concerning host problems.

Analysis of the host-to-mailbridge traffic data has revealed that there are a
number of hosts (including TACs) sending most of their traffic to the other
net.  Some of this traffic can be moved off the internet, reducing the load, by
adding TACs and rehoming hosts.  We are considering adding a
mailbridge.  Software to increase the number of buffers in the LSI/11 gateways
has already been written.  We are investigating ways to reduce the control
traffic, which should also reduce the load on the mailbridges.  We have
increased our attention to host problems and are notifying the host
administrators when we see problems.  We are also considering writing guidelines
for optimizing communication with ARPANET/MILNET.  This would include
appropriate settings for retransmission timers and sending rates.  It should
also include guidelines for reasonable responses to source quenches, those
largely ignored messages sent by the gateway to a host which is sending data
too fast.
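
A "reasonable response", for the purpose of such guidelines, could be
as simple as this sketch: widen the sending interval on a quench and
let it recover slowly (the constants are illustrative, not a policy we
have settled on):

    /* Throttling on ICMP source quench.  Quenches widen the
     * inter-datagram interval; acknowledged traffic narrows it
     * gradually, so the source does not immediately re-congest the
     * gateway.  All values are assumed. */
    static double send_interval;       /* seconds between datagrams */

    void on_source_quench(void)
    {
        send_interval = (send_interval < 0.5) ? 0.5
                                              : send_interval * 2.0;
    }

    void on_acknowledgment(void)
    {
        send_interval *= 0.9;          /* slow recovery */
        if (send_interval < 0.01)
            send_interval = 0.0;       /* back to full speed */
    }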

I hope this answers your question and will open up some interesting discussion
on this mailing list.  

Marianne

tcp-ip@ucbvax.ARPA (08/12/85)

From: Charles Hedrick <HEDRICK@RUTGERS.ARPA>

To complicate things, the host administrators often don't know that
much about how their software works.  When somebody posts a message
on the net saying that some horrible thing is causing some inconceivable
result, I have no way of knowing whether any of my hosts are contributing.
I run TOPS-20, Unix, and Eunice TCP's, and I do not know the details
of any of the TCP implementations.  (With Eunice I do not even have
access to the source.)  If you sent me a patch and told me to install
it, I would, but if you asked me whether my retransmission gizmo was
frabulating the gateway matter-antimatter mix, I would have no way
to respond.  I'm not sure quite what you can do about this, but in 
some ways it may make your problem easier.  What you probably need
is one or two knowledgeable sites for each OS.  Then you could distribute
the fixes they develop to the rest of us.  You will also have to find a
stick big enough to get these fixes put into the next release from
the vendor.  Maybe DCA could arrange to have NORAD point a few
missiles in the direction of <name omitted to protect the guilty, of
which there are several>.  One problem that is making this more
complex is that the natural experts on TOPS-20 TCP are ISI and BBN,
but their code has diverged from the code supported by DEC and used
by the less sophisticated sites such as ourselves.  This is an area that
seems particularly amenable to the use of strategic weaponry.  Whether
the missiles should be pointed at Marlboro or Cambridge and California is
a decision I would be happy to leave up to you.  (There are some
unpleasant politics hiding behind the surface here, which I am going to
avoid talking about in public, at least at the moment.)
-------

tcp-ip@ucbvax.ARPA (08/12/85)

From: Dan <LYNCH@ISIB>

Charles,  Your displeasure at some combination of ISI/BBN/DEC for the
sorry state of affairs in TCP updates/maintenance is noted.  Since I
was in the middle of that menage for a few years I can shed some
light (and dark?) on the subject.  There are two main issues:

1) Money
2) Research

Take the "research" issue first.  Many of the "problems" seen in TCP
usage are truly complicated and need to be examined carefully in the
diverse internet environment.  That brings up "money"...
DEC gets money for selling machines (and attendant software).
BBN gets money for doing research on networking (and for operating
some networks).
ISI gets money for running systems and keeping customers content.

The above simplifications are accurate enough for this diatribe.

The major flaw in the above division of effort is that the vendor,
DEC, does not spend enough money on making a great TCP for TOPS20.
They do not live in the Internet environment on a daily basis.
I am sure that they do a much better job with DECNET because they live
in that environment daily.  And make money on it.  
As for BBN, they have many fish to fry these days and have been
known to refuse to work on a problem unless they got paid for it.

ISI (where I was located from 1980-1983) basically gave up on both DEC 
and BBN as timely sources of help in resolving vexing performance
and functionality problems.  We relied on them heavily for
longer term solutions while we tried to keep our systems on
the air for our thousands of users.  
ISI would readily give out its code to anyone who had a source
license from DEC.  Of course the recipient would have to take out
our ISI site specific enhancements to get a running system...
And we did not have a lot of time/energy to promulgate and assist
others in the quest of a stable, high performance TCP.

That's a short recap of the history.  What did we learn and what
can we do better in the future?  We learned that Internetting
is very complex, that declaring something to be a product
does not make it so, and that money is the root of all good.

I'd better cut it short on the "future" part.  Since TOPS20 is
dying I don't see much impetus (money) for improving the 
mechanisms in that arena.  But Unix sure ain't dead, nor is VMS.
If improvements are to be readily produced and distributed, then I
suggest that some entity be formed (or identified as existing) and
funded to do a quality job for all internet users.  Laissez faire
just doesn't cut it.

Dan

PS.  I have been entering this via a MILNET TAC to the ARPANET host
at ISIB and have held my breath until now!  Geoff, thank you for
airing this subject.  The stuttering and delays are awesome.