[comp.protocols.tcp-ip] high cost of routing

snorthc@RELAY.NSWC.NAVY.MIL (08/29/89)

	The cost of routing?

scenario:
	I am collecting data on network protocols and applications.
The experiments are conducted on a test subnet only (128.38.45)
and from the '45' subnet to another (128.38.48).  The '45' cable has
only 4 computers: a sun, a dec 3100, a lanalyser and a lan watch.
The 48 cable is a "living lab": various protocols are allowed to
exist on it (decnet, novell_ipx, apple_localtalk, osi, etc.).  It is
also pretty quiet; a 2% peak for 1 second is the max traffic observed
to date (I haven't broken out the LAN MDs yet).  The router is
currently a Network Systems Corporation EN-641.  In a few more weeks
it will be replaced by a cisco box and the tests will be rerun.

question:
	There is a repeatable difference in the number of packets
required to conduct an operation on the 45 cable alone or routed
from the 45 cable to the 48 cable for certain applications.
The difference is fairly high for xterm and telnet; it cannot be
detected for ftp.  Below are two fragments of the test results
form.  They are fairly representative; I have been running tests for
three weeks and have collected a fair quantity of data.

examples:	45 CABLE		45 -> 48 CABLE
	Traffic required to initiate an xterm connection:
		60 packets		71 packets
	Traffic telnet requires to transmit a known string:
		37 packets		45 packets	
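
To put the two pairs of counts above on a common scale, the relative
overhead can be worked out directly.  This is only a minimal sketch of
the arithmetic; the function name and labels are made up for
illustration, and the only data used are the four counts quoted above.

```python
def extra_packets_pct(local: int, routed: int) -> float:
    """Percentage of additional packets on the routed path, relative to the local path."""
    return (routed - local) / local * 100.0


# (45 cable, 45 -> 48 cable) packet counts taken from the figures above.
counts = {
    "xterm connection setup": (60, 71),
    "telnet known string": (37, 45),
}

for name, (local, routed) in counts.items():
    pct = extra_packets_pct(local, routed)
    print(f"{name}: +{routed - local} packets ({pct:.1f}% more)")
```

This comes to roughly 18% more packets for xterm and roughly 22% more
for telnet.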

YES! arps are stripped out and maintained as a separate stat.
YES! the test has been run to/from similar hw/sw platforms *
NO! fragments have not been observed... in fact in the case of
xterm or telnet you tend to have small packets anyway.

* there is only one dec 3100, so an ultrix vax was used on the 48 cable;
however, there is a sun 2 (sunos 3.5) on both cables and the results are
quite similar.

So I am confused.  What causes this overhead?  Is there an RFC I should
have read on this subject?  Could it be the router?  Any ideas?

		Thank You,

		Stephen Northcutt (snorthc@relay.nswc.navy.mil)

craig@bbn.com (Craig Partridge) (08/29/89)

In article <8908291343.AA07813@ucbvax.Berkeley.EDU> snorthc@RELAY.NSWC.NAVY.MIL writes:
>	There is a repeatable difference in the number of packets
>required to conduct an operation on the 45 cable alone or routed
>from the 45 cable to the 48 cable for certain applications.
>The difference is fairly high for xterm and telnet; it cannot be
>detected for ftp.
> ...
>examples:	45 CABLE		45 -> 48 CABLE
>	Traffic required to initiate an xterm connection:
>		60 packets		71 packets
>	Traffic telnet requires to transmit a known string:
>		37 packets		45 packets	
>
>YES! arps are stripped out and maintained as a separate stat.
>YES! the test has been run to/from similar hw/sw platforms *
>NO! fragments have not been observed... in fact in the case of
>xterm or telnet you tend to have small packets anyway.

In general, it would be much easier to diagnose this problem with
a packet trace, but...

    - Have you stripped out duplicate SYN segments?  Some TCPs retransmit
	SYNs pretty quickly at the start.  As a result, you are likely
	to see a few more SYNs/SYN-ACKs in each direction during connection
	setup.  This is simply the cost of getting the connection calibrated
	to the path.  [Although note you can retransmit SYNs in more or
	less intelligent ways].

    - Similarly, does this problem of extra packets persist during the
	entire connection, or only at startup?  If only at startup you
	may just be seeing the effects of the retransmit timer calibrating
	itself to the slightly longer delay.
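
One way to check the first point against a capture is to count SYN
segments that repeat an earlier one.  The sketch below assumes the
trace has already been decoded into simple records; the `Segment`
type, its field names, and the sample addresses and ports are all
hypothetical, not the output format of any particular analyzer.

```python
from collections import Counter
from typing import NamedTuple


class Segment(NamedTuple):
    """One decoded TCP segment (hypothetical record layout)."""
    src: str
    dst: str
    sport: int
    dport: int
    flags: frozenset  # e.g. frozenset({"SYN"}) or frozenset({"SYN", "ACK"})
    seq: int


def count_duplicate_syns(trace):
    """Count SYN-bearing segments that repeat an earlier (flow, flags, seq)."""
    seen = Counter()
    for seg in trace:
        if "SYN" in seg.flags:
            seen[(seg.src, seg.dst, seg.sport, seg.dport, seg.flags, seg.seq)] += 1
    # Every occurrence beyond the first is a retransmitted SYN or SYN-ACK.
    return sum(n - 1 for n in seen.values())


# Made-up trace: the initial SYN is sent twice before the SYN-ACK arrives.
trace = [
    Segment("128.38.45.10", "128.38.48.7", 1025, 23, frozenset({"SYN"}), 1000),
    Segment("128.38.45.10", "128.38.48.7", 1025, 23, frozenset({"SYN"}), 1000),
    Segment("128.38.48.7", "128.38.45.10", 23, 1025, frozenset({"SYN", "ACK"}), 5000),
    Segment("128.38.45.10", "128.38.48.7", 1025, 23, frozenset({"ACK"}), 1001),
]

print(count_duplicate_syns(trace))  # 1
```

Subtracting this count from the routed and unrouted totals would show
how much of the difference is connection setup rather than steady
state.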

Speculating in the absence of enough data....

Craig

jas@proteon.com (John A. Shriver) (08/29/89)

Two possibilities:

1.  One of the host TCP implementations is hyper-sensitive to small
changes in the round trip time.  There will be an increase in
end-to-end delay passing through the router.  This could affect
various congestion control algorithms in the hosts (like the Nagle
algorithm). 

2.  Someone is dropping a packet.  It could be that the router is not
keeping up, or it could be that the router is sending faster than one
of the hosts can receive.  Many Ethernet interfaces have subtle "deaf
time" problems.

You might want to look at the TCP, IP, and device stats on the two
machines.  Unfortunately, until 4.3tahoe (?), the TCP stats were not
very detailed.

Alternatively, you could use a Sniffer (or the like), and interpret the
TCP packets to see what's happening.
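
If the trace can be exported, the dropped-packet theory can be checked
by looking for data segments that were sent more than once, and by
whom.  A rough sketch follows, assuming a single connection reduced to
(src, dst, seq, length) tuples; that record layout and the host names
are hypothetical.

```python
from collections import defaultdict


def retransmissions_by_sender(trace):
    """Return {src: number of data segments whose (seq, length) was already sent}.

    trace: iterable of (src, dst, seq, length) tuples for one connection.
    """
    seen = set()
    resent = defaultdict(int)
    for src, dst, seq, length in trace:
        if length == 0:
            continue  # pure ACKs carry no data; they are not counted here
        key = (src, dst, seq, length)
        if key in seen:
            resent[src] += 1
        else:
            seen.add(key)
    return dict(resent)


# Made-up trace: hostA resends its first 10-byte segment once.
trace = [
    ("hostA", "hostB", 1, 10),
    ("hostB", "hostA", 1, 0),
    ("hostA", "hostB", 1, 10),
    ("hostA", "hostB", 11, 5),
]

print(retransmissions_by_sender(trace))  # {'hostA': 1}
```

Which side is resending hints at where the loss is: repeats from one
host suggest its segments (or the ACKs for them) are being lost on the
way.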

barns@GATEWAY.MITRE.ORG (08/29/89)

It could be many things, and the way to find out for sure is to look
at the packets.  However, here is an example of a possible candidate.
Telnet is an especially difficult protocol to analyze in terms of
theoretical packet behavior.  Neither end is exactly sure when it 
ought to send next, so there are implementation decisions involved.
The decisions found in BSD-flavored code (and many others) tend to
induce a dependency on round-trip time.  If the timing works out
favorably, you will have echoes and acknowledgements and window
updates traveling together.  If it works out less favorably, there
will be extra packets carrying "naked ACKs" and possibly also
window updates.  By going through a router, you increase the RTT by
some amount.  This may increase the chance that the TCP will feel
the need to send an ACK for an incoming segment before the echo or
other output is ready to go back.
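
The timing dependence can be illustrated with a toy model.  The
assumptions are deliberately crude and are mine, not measurements: the
receiving TCP flushes pending ACKs on a fixed periodic tick, the
application needs a fixed time to produce its echo, and the router
adds a fixed delay.  All of the numbers are invented.

```python
def segments_sent(arrival_ms, app_delay_ms=20, tick_ms=200):
    """Segments the receiver sends in response to one incoming keystroke.

    Returns 1 if the ACK can ride on the echo, 2 if the periodic ACK tick
    fires first and a naked ACK goes out ahead of the echo.
    """
    next_tick = (arrival_ms // tick_ms + 1) * tick_ms
    echo_ready = arrival_ms + app_delay_ms
    return 2 if next_tick < echo_ready else 1


def total_segments(arrivals_ms, extra_delay_ms=0):
    """Total response segments when every arrival is shifted by extra_delay_ms."""
    return sum(segments_sent(t + extra_delay_ms) for t in arrivals_ms)


arrivals = [0, 150, 350, 570]                 # invented keystroke arrival times
print(total_segments(arrivals))               # 4: every ACK is piggybacked
print(total_segments(arrivals, 35))           # 6: two keystrokes draw a naked ACK
```

With these particular arrival times the added delay costs two extra
packets; other arrival times can just as easily go the other way (an
arrival at 190 ms needs 2 segments unshifted but only 1 when shifted
by 35 ms), which is why counts like this are hard to predict on paper.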

Many other things affect packetization, such as the Nagle algorithm,
need for retransmission, all the parameters that affect retransmission
(since they indirectly determine what will go in the packets),
context or task switching behavior of the OS, etc.  I've done a few 
limited paper vs. reality studies on what makes packets and find that 
there are so many (nonlinear) factors that even if you know quite a 
lot about the underlying factors, there are likely to be yet more 
factors that you didn't know about.  To summarize, real systems have
a lot of timing-related properties, and the TCP (and perhaps its higher
layer) both contribute to and are affected by them.  These influences 
show up more with irregular flows of small data chunks than with regular 
data flows of big data chunks.

I also feel that the Nagle algorithm is too blunt an instrument to
handle such flows nicely in certain sub-congested regimes, but I
don't think that is what is biting here.

Bill Barns / MITRE-Washington / barns@gateway.mitre.org

stev@VAX.FTP.COM (08/29/89)

some people (us included) send smaller packets when sending traffic
across networks. this is to avoid the routers fragmenting packets.

this is considered a good thing. the overhead is higher when you are
going through a fast router between fast networks, but you end up
winning more when you consider more networks and more heavily loaded
routers.
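
a minimal sketch of that kind of policy, not a description of any
particular product: use the full ethernet-sized segment for a
destination on the local network and fall back to the conventional
536-byte default (a 576-byte datagram minus 40 bytes of headers) for
anything reached through a router. the /24 mask, the host addresses
and the function names are assumptions made for the example.

```python
import ipaddress
import math

IP_TCP_HEADERS = 40  # 20-byte IP header + 20-byte TCP header, no options
DEFAULT_MSS = 536    # 576-byte datagram minus 40 bytes of headers


def choose_mss(local_interface, destination, mtu=1500):
    """Pick a segment size: MTU-based on the local network, default MSS otherwise."""
    iface = ipaddress.ip_interface(local_interface)
    if ipaddress.ip_address(destination) in iface.network:
        return mtu - IP_TCP_HEADERS
    return DEFAULT_MSS


def segments_needed(nbytes, mss):
    """Number of full or partial segments needed to carry nbytes of data."""
    return math.ceil(nbytes / mss)


# Hypothetical hosts on the two subnets, assuming a 255.255.255.0 mask.
local = choose_mss("128.38.45.10/24", "128.38.45.20")
routed = choose_mss("128.38.45.10/24", "128.38.48.7")

print(local, segments_needed(8192, local))    # 1460 6
print(routed, segments_needed(8192, routed))  # 536 16
```

a policy like this only changes the packet count when the data being
sent is larger than the smaller segment size, so it matters for bulk
transfers far more than for keystroke-sized telnet or xterm traffic.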


stev knowles
ftp software
stev@ftp.com