[fa.tcp-ip] problems getting to CISL-SERVICE-MULTICS

HEDRICK@RED.RUTGERS.EDU (Charles Hedrick) (09/30/85)

We can't seem to get CISL-SERVICE-MULTICS to accept our mail.
We can normally send very short messages, but for anything
substantial, the connection times out.  This is a fairly new
problem.  We just changed our network configuration.  Our
DEC-20 used to be directly on the Arpanet.  It is now on an
Ethernet, and we are using a PDP-11/23 gateway, with the MIT
code, further hacked and supported by Noel Chiappa.  The
major difference here is that we now have an MTU of 1500
instead of 9xx.  The gateway is supposed to do fragmentation
and reassembly.  We wonder whether either the gateway or
CISL is blowing it.  Maybe some others of you can help us
in this diagnosis by indicating whether you are having
similar problems.  Here is what we see:
  - we can get to MIT-MULTICS.  They are running the same
	SMTP but a different TCP from CISL
  - we have the same problem from Topaz, our major Unix system,
	so it isn't TOPS-20 specific
Are other folks having trouble getting long mail items to
CISL-SERVICE-MULTICS?
-------

jis@MIT-BITSY.MIT.EDU (Jeffrey I. Schiller) (09/30/85)

Actually the situation with CISL-SERVICE-MULTICS is more complicated
than that. CISL is running an IP/TCP which doesn't understand about
any other network but net 10 (the Arpanet). There is a kludge installed
in its IP that sends any non-net-10 packet to MIT-MULTICS, which then
routes it appropriately.

Now MIT-MULTICS is constrained not to send packets larger than 200
bytes, because if it does, the IMP it is plugged into crashes (a problem
with message mode HDH support in the IMP, unresolved now for over two
years). So chances are that CISL is now sending you 572-byte packets
which are being fragmented by MIT-MULTICS. It is quite possible that
either your reassembly code is bad or MIT-MULTICS's fragmentation code
is bad (MIT-MULTICS's TCP arranges never to send segments that would
be fragmented by the IP layer). Some IP tracing would probably help.
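
As a rough illustration of what such fragmentation would look like on the
wire, here is a minimal sketch in Python. It assumes the 200-byte limit
applies to the whole IP datagram and that the header is 20 bytes with no
options; the function name `fragment` is made up for this example and is
not taken from any of the implementations discussed.

```python
IP_HEADER = 20  # bytes; assumes an IP header with no options


def fragment(total_len, mtu):
    """Split one IP datagram of total_len bytes for a link with the given MTU.

    Returns a list of (offset_in_8_byte_units, fragment_total_length,
    more_fragments_flag) tuples, one per fragment.
    """
    data_len = total_len - IP_HEADER
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the fragment offset field counts in 8-byte units.
    max_data = (mtu - IP_HEADER) // 8 * 8
    frags = []
    sent = 0
    while sent < data_len:
        chunk = min(max_data, data_len - sent)
        more = sent + chunk < data_len
        frags.append((sent // 8, IP_HEADER + chunk, more))
        sent += chunk
    return frags


# A 572-byte datagram crossing a 200-byte limit becomes four fragments.
for offset, length, more in fragment(572, 200):
    print(offset, length, more)
```

Under these assumptions the receiver has to reassemble four pieces for every
572-byte packet, so an IP trace should show fragment offsets 0, 22, 44 and 66
arriving for each datagram.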

			-Jeff

CLYNN@BBNA.ARPA (09/30/85)

Welcome to the world of gateways.  I have observed problems with the
same symptoms.  They resulted from a combination of two factors.  One,
you are no longer getting any flow control; the IMPs are very good at not
accepting more data than they can handle.  (Are you getting any ICMP
Source Quench messages from the gateway?  What does your TCP do with
them?)  Several TCPs run in bang-bang mode: sending everything they
can until they either run out of data or fill the window.  Going from
an Ethernet to an 1822 net with such an implementation is bound to
flood the gateway (which has to process packets from all the TCP
connections as well as any UDP traffic).

The second factor which I have observed relates to fragmentation.
Most fragmentation algorithms split a single packet into two or more
fragments and queue them sequentially for the output interface.  This
increases the number of packets for a particular destination by at
least a factor of two.  1822 nets have limits on the number of packets
for an 1822 connection.  The fragments cause this limit to be reached
more quickly.  Also, some receivers do not seem to be capable of
receiving back-to-back packets, which frequently result from
fragmented packets.  Note that in such a situation no amount of TCP
retransmissions will ever get the fragmented packet through.
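
The multiplication effect can be put in numbers with a small sketch. The MTU
of 1000 and the limit of 8 packets per connection used below are illustrative
assumptions only, not measured values for any particular gateway or IMP, and
the function names are invented for the example.

```python
IP_HEADER = 20  # bytes; assumes an IP header with no options


def packets_on_wire(datagram_len, mtu):
    """Number of packets one datagram turns into on a link with this MTU."""
    if datagram_len <= mtu:
        return 1
    per_frag = (mtu - IP_HEADER) // 8 * 8   # data bytes per full fragment
    data_len = datagram_len - IP_HEADER
    return -(-data_len // per_frag)          # integer ceiling division


def burst_size(n_datagrams, datagram_len, mtu):
    """Packets queued at the gateway when a sender emits a back-to-back burst."""
    return n_datagrams * packets_on_wire(datagram_len, mtu)


ASSUMED_MTU = 1000    # hypothetical MTU of the far-side network
ASSUMED_LIMIT = 8     # hypothetical per-connection packet limit

# Six full-size Ethernet datagrams, each split in two, exceed the limit;
# six 576-byte datagrams pass through unfragmented and stay under it.
print(burst_size(6, 1500, ASSUMED_MTU), ASSUMED_LIMIT)
print(burst_size(6, 576, ASSUMED_MTU), ASSUMED_LIMIT)
```

With these made-up figures a window that would be harmless when sent as
small packets doubles in packet count once the gateway has to fragment it.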

What can you do?  Are the TCPs exchanging maximum segment size
options?  If you are not receiving any, your maximum packet size
should be 576, so fragmentation should not be needed.  The option
works well if one of the hosts is on the network with the minimum MTU.
Does TCP get information from IP about the size of the maximum IP
packet received?  It is a useful "hint" about the MTU of intermediate
networks.  Make sure that ICMP source quench messages are being
processed.  Consider algorithms to monitor traffic flow to give some
feedback to TCP about how much data it is reasonable to have in
transit at one time.  (When is the retransmission timer started for a
given packet?  Hopefully not before the packet preceding it has been
acked.)  Think about flow control in an internet environment (maybe
some students might like to work on the problem).
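
The maximum segment size rule mentioned above can be sketched as follows.
This is a simplified illustration assuming 20-byte IP and TCP headers with
no options; the function name `send_segment_size` is invented here and does
not come from any particular TCP.

```python
DEFAULT_DATAGRAM = 576  # assumed maximum datagram size when no MSS option arrives
HEADERS = 40            # 20-byte IP header + 20-byte TCP header, no options


def send_segment_size(peer_mss_option, local_mtu):
    """Largest TCP data segment to send on this connection.

    peer_mss_option is the value from the peer's maximum segment size
    option, or None if the peer did not send one.
    """
    if peer_mss_option is None:
        peer_limit = DEFAULT_DATAGRAM - HEADERS
    else:
        peer_limit = peer_mss_option
    # Never exceed what the local interface can carry in one packet.
    return min(peer_limit, local_mtu - HEADERS)


print(send_segment_size(None, 1500))   # no option received from the peer
print(send_segment_size(160, 1500))    # peer sits behind a 200-byte limit
print(send_segment_size(1460, 1500))   # both ends on Ethernet
```

Note that this only protects against fragmentation when one of the two hosts
is on the network with the smallest MTU; a small intermediate network is
invisible to the option.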

Charlie