[comp.protocols.tcp-ip] Life after source quench

Mills@UDEL.EDU.UUCP (11/08/87)

Folks,

Thanks to Hans-Werner Braun, who scrounged the log of the NCAR (National
Center for Atmospheric Research) fuzzball gateway on the NSFNET Backbone net,
we may have additional insight into the effectiveness of its quench
mechanism and the implications for TCP implementations. The NCAR fuzzball is
seriously overloaded at times and, using the preemption and quench policies
described previously to this group, can be quite vocal about it. ICMP Source
Quench messages are sent when the mean queue length exceeds about 1.5 and at a
rate depending on the number of 256-octet blocks queued for a selected host.
Presently, only the host with the largest number of blocks is selected on the
assumption that quenchable flows do not occur very often and are almost always
due to a single host. See my previous messages for justification.
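
The selection policy above can be sketched roughly as follows. This is a
hypothetical illustration of the described behavior; the names and structure
are mine, not taken from the fuzzball sources:

```python
QUEUE_THRESHOLD = 1.5  # mean queue length above which quenching starts


def pick_quench_target(blocks_by_host, mean_queue_len):
    """Quench only when the mean queue length exceeds about 1.5, and then
    select the single host with the most 256-octet blocks queued, on the
    assumption that a quenchable flow is almost always due to one host."""
    if mean_queue_len <= QUEUE_THRESHOLD or not blocks_by_host:
        return None
    return max(blocks_by_host, key=blocks_by_host.get)
```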

The following data illustrates typical scenarios found at NCAR. Each line
represents one quench message sent for traffic in the direction shown between
the two hosts. The two three-digit numbers are the ICMP type and code fields
(octal), where the code (second) field reveals the number of 256-octet blocks
queued at the time the quench was sent. (This interpretation of the code field
is at variance with the spec, but this is research, right?)
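
Decoding a logged code field is then a two-step conversion, octal count to
blocks to octets. The helper below is illustrative only, not part of the
fuzzball:

```python
def decode_code_field(code_octal: str):
    """Interpret the logged (octal) code field as a count of 256-octet
    blocks queued, returning (blocks, total_octets)."""
    blocks = int(code_octal, 8)
    return blocks, blocks * 256
```

For instance, a code field of 170 octal decodes to 120 blocks, or 30720
octets sitting on one output queue.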

As expected, quenchable flows are relatively infrequent and are characterized
by large traffic surges lasting up to several minutes. For example, the code
field for the first line shows 120 (170 octal) 256-octet blocks sent by host
128.6.4.7 to host 128.102.16.10 living on a single output queue! In the first
surge the flow lasted about a minute during which six quenches were sent. It
is not clear from these data what the preemption policy was doing, but it is
likely that some packets were being dropped during this period.

HOST : 128.6.4.7 : RUTGERS.EDU,RUTGERS.RUTGERS.EDU,RUTGERS.ARPA : SUN-3/180
18:25:45 ?TRAP-I-ICMP 004 170 [128.6.4.7] -> [128.102.16.10]
18:25:46 ?TRAP-I-ICMP 004 135 [128.6.4.7] -> [128.102.16.10]
18:25:47 ?TRAP-I-ICMP 004 105 [128.6.4.7] -> [128.102.16.10]
18:26:38 ?TRAP-I-ICMP 004 127 [128.6.4.7] -> [128.102.16.10]
18:26:42 ?TRAP-I-ICMP 004 140 [128.6.4.7] -> [128.102.16.10]
18:26:44 ?TRAP-I-ICMP 004 135 [128.6.4.7] -> [128.102.16.10]

The next trace shows a seven-minute surge at the beginning and two shorter
surges at the end, with only sporadic quenches in between.

HOST : 128.117.8.7 : (unlisted - who is this USAN dude?)
20:26:43 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:26:45 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:26:46 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:27:16 ?TRAP-I-ICMP 004 101 [128.117.8.7] -> [128.118.28.2]
20:27:28 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:27:30 ?TRAP-I-ICMP 004 151 [128.117.8.7] -> [128.118.28.2]
20:27:30 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:27:45 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:28:11 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.118.28.2]
20:28:12 ?TRAP-I-ICMP 004 142 [128.117.8.7] -> [128.118.28.2]
20:28:32 ?TRAP-I-ICMP 004 142 [128.117.8.7] -> [128.118.28.2]
20:28:33 ?TRAP-I-ICMP 004 110 [128.117.8.7] -> [128.118.28.2]
20:28:48 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:29:31 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:29:31 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:30:27 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.118.28.2]
20:30:47 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:31:23 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:31:24 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:31:36 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:31:37 ?TRAP-I-ICMP 004 151 [128.117.8.7] -> [128.118.28.2]
20:31:41 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:31:53 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:32:06 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:32:07 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:32:10 ?TRAP-I-ICMP 004 142 [128.117.8.7] -> [128.118.28.2]
20:32:27 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:32:33 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:32:34 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:32:35 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:32:56 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:32:58 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:33:15 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:48:14 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:54:21 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.118.28.2]
21:46:50 ?TRAP-I-ICMP 004 104 [128.117.8.7] -> [128.118.28.2]
21:46:51 ?TRAP-I-ICMP 004 104 [128.117.8.7] -> [128.112.18.2]
21:58:30 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.112.18.2]
21:58:37 ?TRAP-I-ICMP 004 110 [128.117.8.7] -> [128.112.18.2]
23:28:31 ?TRAP-I-ICMP 004 112 [128.117.8.7] -> [128.112.18.2]
23:28:35 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.112.18.2]
23:28:36 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.112.18.2]
23:28:37 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.112.18.2]
23:28:40 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.112.18.2]
23:28:43 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.112.18.2]

The next one is a real hotrod, with 17 quenches in 24 seconds.

HOST : 129.93.1.3 : (unlisted)
22:04:31 ?TRAP-I-ICMP 004 113 [129.93.1.3] -> [128.84.252.18]
22:04:35 ?TRAP-I-ICMP 004 110 [129.93.1.3] -> [128.84.252.18]
22:04:36 ?TRAP-I-ICMP 004 116 [129.93.1.3] -> [128.84.252.18]
22:04:36 ?TRAP-I-ICMP 004 127 [129.93.1.3] -> [128.84.252.18]
22:04:37 ?TRAP-I-ICMP 004 140 [129.93.1.3] -> [128.84.252.18]
22:04:37 ?TRAP-I-ICMP 004 151 [129.93.1.3] -> [128.84.252.18]
22:04:38 ?TRAP-I-ICMP 004 146 [129.93.1.3] -> [128.84.252.18]
22:04:38 ?TRAP-I-ICMP 004 132 [129.93.1.3] -> [128.84.252.18]
22:04:39 ?TRAP-I-ICMP 004 105 [129.93.1.3] -> [128.84.252.18]
22:04:51 ?TRAP-I-ICMP 004 113 [129.93.1.3] -> [128.84.252.18]
22:04:52 ?TRAP-I-ICMP 004 143 [129.93.1.3] -> [128.84.252.18]
22:04:52 ?TRAP-I-ICMP 004 165 [129.93.1.3] -> [128.84.252.18]
22:04:53 ?TRAP-I-ICMP 004 170 [129.93.1.3] -> [128.84.252.18]
22:04:53 ?TRAP-I-ICMP 004 157 [129.93.1.3] -> [128.84.252.18]
22:04:54 ?TRAP-I-ICMP 004 140 [129.93.1.3] -> [128.84.252.18]
22:04:54 ?TRAP-I-ICMP 004 127 [129.93.1.3] -> [128.84.252.18]
22:04:55 ?TRAP-I-ICMP 004 102 [129.93.1.3] -> [128.84.252.18]

I am told the Craymonsters do in fact do something useful with ICMP Source Quench
messages. There is some evidence for that in the following, which shows a
surge lasting about a minute, but with the quenches mostly spread out at about
thirty-second intervals (except the last one), not in terrible spasms like the
above. If a Craymonster can be tamed with a quench every thirty seconds or so,
they may be pussycats, not monsters, after all.

HOST : 128.174.10.48 : NCSAD.ARPA : CRAY-X/MP :
22:15:30 ?TRAP-I-ICMP 004 102 [128.174.10.48] -> [128.84.252.18]
22:16:06 ?TRAP-I-ICMP 004 110 [128.174.10.48] -> [128.84.252.18]
22:16:36 ?TRAP-I-ICMP 004 110 [128.174.10.48] -> [128.84.252.18]
22:16:37 ?TRAP-I-ICMP 004 124 [128.174.10.48] -> [128.84.252.18]

The data suggest that a quench policy operating with a relatively long
integration time, such as the fuzzball policy or the policy suggested by Raj
Jain (the so-called DEC bit), can indeed be effective. However, it is not at
all clear from the above data that the surges are due to a single TCP
connection, unless that connection was using window sizes in the 26000-octet
range. If multiple connections are involved, an effective quench strategy may
need to operate over several concurrent connections and retain state over
periods up to a minute or more. The operating system would
then have to restrain individual connections as a function of environment
variables independent of window modulation by the protocol itself. If it is
true that single connections with humungus windows are most prevalent, then
TCP window-drawdown strategies such as previously suggested would work
peachy-keen.

Comments from the host administrators of the above hosts would be welcome.
Can somebody describe the Craykitten anti-monster implementation?

Dave

krol@UXC.CSO.UIUC.EDU (Ed Krol) (11/09/87)

The following is an excerpt from a paper by Charlie Kline
(kline@uxc.cso.uiuc.edu) given at this summer's SIGCOMM in Stowe, describing
the Cray CTSS source-quench algorithm which Dave seems impressed
by:

  CTSS TCP/IP treats quenches as IP events rather than TCP events. Berkeley
  responds to quenches by reducing the size of the TCP window. We respond, as
  suggested by Postel in a draft RFC, by introducing a delay between the
  sending of IP packets to the host which is producing the quenches. The delay
  increases linearly as more quenches are received. If no quenches are
  received in a certain interval, the delay is decreased exponentially.

kline@uxc.cso.uiuc.EDU (Charley Kline) (11/09/87)

Gosh Professor Mills, you guys all yelled at me so much when I
mentioned that d.ncsa.uiuc.edu didn't respond to quenches that I
thought I'd better do it right. Do I get an A on my project?

As Ed pointed out, the method that our "Craykitten" uses in response to
a source quench is simply to shackle all packets in the IP output queue
destined for the originator of the quench (I mean the ip_dst of the
packet returned in the quench) such that there is a delay of X
milliseconds before each is transmitted. X is initially zero, and the
current parameter is to increase it by 500 milliseconds for each quench
received. If no quenches are received for 30 seconds, X is halved. An X
lower than 500 causes the quench reaction to stop. This all happens in
the IP module, and TCP is unaware of the quenches.
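
With the parameters Charley gives (500 ms increment per quench, halving after
30 quiet seconds, reaction stopping once the delay falls below 500 ms), the
response can be sketched as a small state machine. This is my reconstruction
from his description, not the CTSS code:

```python
class QuenchThrottle:
    """Per-destination IP output throttle driven by ICMP Source Quench."""

    INCREMENT_MS = 500   # delay added to X per quench received
    IDLE_SECS = 30       # quiet period after which X is halved
    FLOOR_MS = 500       # an X below this stops the quench reaction

    def __init__(self):
        self.delay_ms = 0      # X: delay before each packet to the quenched dst
        self.last_event = None

    def on_quench(self, now):
        """A quench arrived naming this destination: increase X linearly."""
        self.delay_ms += self.INCREMENT_MS
        self.last_event = now

    def tick(self, now):
        """Call periodically; halves X after a quiet interval."""
        if self.last_event is None:
            return
        if now - self.last_event >= self.IDLE_SECS:
            self.delay_ms //= 2
            self.last_event = now
            if self.delay_ms < self.FLOOR_MS:
                self.delay_ms = 0       # reaction stops; full blast resumes
                self.last_event = None
```

Note how a single quench sets X to 500 ms, and thirty quiet seconds later the
halved delay falls below the floor, so transmission resumes full blast and
draws the next quench — the roughly thirty-second oscillation Charley
describes in the next paragraph.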

I'm sure that the reason that the fuzzball is issuing quenches every
thirty seconds is because if only one quench is sent, IP throttles back
to one packet every 500 milliseconds (which should keep the fuzzball
happy), but when the 30 second quench reaction stops, IP starts
vomiting the packets full blast again, which causes another quench. I'm
pleased that the quench mechanism creates such effective data rate
communication between an IP module and IP gateways.

I can't take credit for the method, it's an implementation of Postel's
proposal. I only messed with the parameters.

-----
Charley Kline
University of Illinois Computing Services
kline@uxc.cso.uiuc.edu
kline@uiucvmd.bitnet
{ihnp4,uunet,pur-ee,convex}!uiucuxc!kline

Mills@UDEL.EDU (11/10/87)

Ed,

Thanks for the info; however, to fully evaluate the Cray response I would need
to know the parameters of the algorithm: what is the delay, the increment and
the weighting constant for the decrease? How long does it wait before
decreasing? In order to set these properly, it would be helpful for the Cray
to know something about the overall path, such as path delay, estimated flow
rate, packet loss rate, etc. At least one counter-motivation for effecting the
quench policy at the IP level is that this information is hard to come by.

Dave

Mills@UDEL.EDU (11/10/87)

Charley,

Gee, you didn't say how mungus the packet is - 65K give/take fragments?
An incremental delay of 500 ms is probably okay for the 56-Kbps Backbone
or ARPANET, but certainly not for the ARPANET/MILNET gateways. To do it
right, you should know something more about the path, such as overall
delay, estimated flow rates and loss rates. TCP of course could give
that to you, assuming TCP were involved. I doubt UDP or raw IP would
generate the observed horsepower; on the other hand, a Craycreature
may well be needed to supply the watts for tomorrow's domain-name
server turbines.

If your suggested scenario is correct and the quench needs to nick it only
once every thirty seconds or so, that would be real swell news. However,
Hans-Werner Braun reports finding quench gushers for the UIUC Craykiller
at other times, which suggests additional testing and observation may
be in order.

Dave

ehrlich@psuvax1.psu.edu (Dan Ehrlich) (11/13/87)

In article <8711081434.aa25516@Huey.UDEL.EDU> Mills@UDEL.EDU writes:
> ...
>HOST : 128.117.8.7 : (unlisted - who is this USAN dude?)
>20:26:43 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]

I do not know who [128.117.8.7] is but [128.118.28.2] is
psumeteo01.psu.edu, a MicroVAX II run by the meteorology department
here at Penn State.  I believe that they run VMS and are using an
EXCELAN card for their TCP/IP.  If I can be of more help get in touch.

> ...
>Dave


-- 
Dan Ehrlich <ehrlich@psuvax1.{psu.edu,bitnet,uucp}>
The Pennsylvania State University, Department of Computer Science
333 Whitmore Laboratory, University Park, PA   16802
+1 814 863 1142 or +1 814 865 9723

lekash@orville.nas.nasa.GOV.UUCP (11/13/87)

I take care of 128.102.16.10.  It's one of the '2nd-root-servers'.
Quite why someone at Rutgers is sending such a stream of traffic to it,
I couldn't tell you.  Maybe a broken server at their end, and they are
cleverly using the wrong set of servers due to a leak somewhere.

				john

cruff@scdpyr.UUCP (Craig Ruff) (11/13/87)

In article <3083@psuvax1.psu.edu> ehrlich@psuvax1.psu.edu (Dan Ehrlich) writes:
>In article <8711081434.aa25516@Huey.UDEL.EDU> Mills@UDEL.EDU writes:
>> ...
>>HOST : 128.117.8.7 : (unlisted - who is this USAN dude?)
>>20:26:43 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
>
>I do not know who [128.117.8.7] is but [128.118.28.2] is

128.117.8.7 is an IBM 4381 here at NCAR running Spartacus KNET software.
-- 
Craig Ruff      NCAR                         INTERNET: cruff@scdpyr.UCAR.EDU
(303) 497-1211  P.O. Box 3000                   CSNET: cruff@ncar.CSNET
		Boulder, CO  80307               UUCP: cruff@scdpyr.UUCP

hedrick@ATHOS.RUTGERS.EDU (Charles Hedrick) (11/14/87)

There can't really be two sets of root servers.  The problem is that
when a server doesn't know the answer to a question, it generally
sends a response that refers the questioner to the root.  Bind
processes such responses by adding all the data they contain to its
cache and then posing its question again.  (The hope is that some of
the data just added will let it find the answer this time.)  So if the
official root servers point to any other servers that even indirectly
ever refer us to a server that lists the bogus root servers in a
response, we will eventually end up with the bogus root servers in our
cache.  Furthermore, the problem is contagious, because now we will
refer other people who talk to us to you as a root server.  I don't
doubt that there are still bugs in our named (although I think it is
better than any of the released versions).  But it doesn't dream up
name servers out of whole cloth.  I believe you are seeing a
combination of a bug that causes unreasonably large rates of name
server requests, with the fact that somebody has referred us to you as
a root server.  Thus you get caught in the crossfire between us and
the roots.  Tonight I'm going to spend some more time inside named.  I
have found a number of pieces of code in it already that are
non-functional.  I suspect I'll find another one or two tonight.

(Definition: When I use the term "bogus root name server", I mean any
name server that claims to be a root name server, which is not listed
as one by SRI-NIC when we ask it who the root servers are.)

montnaro@sprite.steinmetz (Skip Montanaro) (11/16/87)

In article <8711091811.AA23668@uxc.cso.uiuc.edu> kline@uxc.cso.uiuc.EDU
(Charley Kline) writes:
>I'm sure that the reason that the fuzzball is issuing quenches every
>thirty seconds is because if only one quench is sent, IP throttles back
>to one packet every 500 milliseconds (which should keep the fuzzball
>happy), but when the 30 second quench reaction stops, IP starts
>vomiting the packets full blast again, which causes another quench. I'm
>pleased that the quench mechanism creates such effective data rate
>communication between an IP module and IP gateways.
>
>I can't take credit for the method, it's an implementation of Postel's
>proposal. I only messed with the parameters.

I'm no whiz at anything related to TCP/IP (although I find this group
interesting, if not necessary, reading), but this seems like a
situation that calls for either

1. Some hysteresis. Is it wise (correct?) to have the start quench and
stop quench thresholds be the same?

2. Finer granularity. Given the rate at which most machines can spew
out packets, 500 milliseconds sounds rather coarse.
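
Skip's first suggestion might look like this in outline. The thresholds here
are invented for illustration; the fuzzball's actual stop condition isn't
stated above:

```python
def should_quench(mean_qlen, currently_quenching, start=1.5, stop=1.0):
    """Hysteresis on the quench decision: begin quenching when the mean
    queue length rises above `start`, and keep quenching until it has
    fallen below a lower `stop` threshold."""
    if currently_quenching:
        return mean_qlen >= stop
    return mean_qlen > start
```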

Skip (montanaro@ge-crd.arpa or uunet!steinmetz!sprite!montanaro)