[mod.protocols.tcp-ip] Maintaining Statistics for TCP/IP Implementations

cam@ACC-SB-UNIX.ARPA (Chris Markle) (12/17/86)

On the last day of the TCP/IP Implementor's Workshop held in August 86,
a gentleman from BBN spoke about network monitoring protocols. He
mentioned a "list" of statistics that TCP/IP implementations could
maintain, either for their own internal use or for query by a network
monitor device via some sort of network monitoring protocol.  The BBN
gent was asked if he would post this list of statistics on this mailing
list; he seemed to imply that he would.

Did this information get posted and I missed it? If so, does anyone
know what msg number it was or what date it was posted? If not, would 
the folks at BBN be interested in posting it?

Also, if anyone else has notions on what sort of statistics would be
worthwhile for a TCP/IP (etc.) implementation to maintain, please send
mail to me directly and I will summarize the responses in a later 
posting to this mailing list.

Thanks in advance for any help in this matter.

Chris Markle (cam@acc-sb-unix) (301-290-8100)

LYNCH@A.ISI.EDU (Dan Lynch) (12/18/86)

Chris,  It was Charles Lynn at BBN (CLynn@BBN.COM) who gave that network
statistics presentation.  He is leading the session on Network Management
in the March 87 TCP/IP Interoperability conference and, like Diogenes, is
still looking for the truth...  I remember his presentation and was
awestruck at the level of implementation detail he sought in order
to get a suitable baseline of statistics for monitoring the health of
the "network".  (All of the statistics he asked for do essentially
exist in any real implementation because of the retransmission 
requirements of TCP.)  

Dan
-------

jbn@GLACIER.STANFORD.EDU.UUCP (John Nagle) (12/20/86)

     Much of what I learned about congestion in the Internet I learned by
instrumenting a TCP implementation.  The information that you need is
not necessarily the information that a typical implementation keeps.
Yet as it turns out, collecting this information is quite inexpensive.
Management of the exceptional cases is the crucial issue.

     During the life of a TCP connection, it is useful to maintain some
event counts, and at the conclusion of the connection, it is useful to
generate a log entry of some form, at least for connections that meet
some criteria.  

     When a packet is received, there are several possibilities as to
its disposition.  The most useful (not, unfortunately, always the most
common) case is that it contains new and acceptable data, an ACK that
acknowledges previously unacknowledged data, or a window update that
advances the window.  This case must of course be handled efficiently.
Packets which change the state of the connection are also useful, but
efficiency is less of an issue.  But packets which do none of these
things are redundant; they represent an error somewhere in the system.
It is immensely useful to count the useful packets over the life of a
connection.  My criterion was that if fewer than 95% of the packets
received over the life of a connection were useful (allowing for at
least 5 non-useful packets on short sessions to handle startup issues),
then a log entry should be generated to indicate trouble.
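
     As an illustration only, the counting and the 95% test might be
sketched as below.  The class, field, and argument names are hypothetical,
and the handling of the 5-packet startup allowance is one reading of the
criterion above, not necessarily the original implementation.

```python
from dataclasses import dataclass

USEFUL_FRACTION = 0.95   # threshold quoted in the text
STARTUP_ALLOWANCE = 5    # non-useful packets tolerated on any connection


@dataclass
class ConnStats:
    """Per-connection event counts (hypothetical structure)."""
    received: int = 0
    useful: int = 0

    def note_packet(self, new_data: bool, new_ack: bool,
                    window_advanced: bool, state_changed: bool) -> None:
        """Count one received packet and classify it as useful or not."""
        self.received += 1
        if new_data or new_ack or window_advanced or state_changed:
            self.useful += 1

    def should_log(self) -> bool:
        """Decide at connection close whether to write a log entry."""
        non_useful = self.received - self.useful
        # Assumption: short sessions get a flat allowance before the
        # percentage test is applied at all.
        if non_useful <= STARTUP_ALLOWANCE:
            return False
        return self.useful < USEFUL_FRACTION * self.received
```

The per-packet cost here is one or two increments; the comparison runs
only once, when the connection closes.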

     Reading such a log is an edifying experience.  The most notable fact
about such a log is that certain machines are represented all out of 
proportion to the amount of traffic they generate.  One of course logs
the identities of the hosts involved in the connections.  A log entry here
corresponds to "dropping a trouble ticket" in a telephone central office;
it indicates something to be fixed.  Enough said.

     One also wants to keep a tally of retransmission attempts; again, if
the number of retransmitted packets is large over the life of the connection,
something is wrong and this should be noted.  Of course, if a connection
closes abnormally, one logs that fact for later analysis.

     It is also useful to log rejected packets.  Find all those places
in your TCP where you decide to drop a packet because it is "bad", and
make them calls to a routine that logs the packet with an error code.
One turns up all sorts of dirty laundry that way.
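
     A minimal sketch of such a routine follows.  The error codes and
names are invented for illustration (a real TCP would define its own),
and the cap on the log size is an added assumption to keep a flood of
bad packets from exhausting memory.

```python
from collections import Counter
from enum import Enum


class DropReason(Enum):
    # Hypothetical error codes, one per place where the TCP drops a packet.
    BAD_CHECKSUM = 1
    OUT_OF_WINDOW = 2
    NO_CONNECTION = 3


drop_counts = Counter()   # (source host, reason) -> number of drops
drop_log = []             # bounded list of individual drop records


def drop_packet(src_host: str, reason: DropReason, max_log: int = 1000) -> None:
    """Single routine called from every point where a packet is rejected."""
    drop_counts[(src_host, reason)] += 1
    # Assumption: keep only the first max_log records; counts stay exact.
    if len(drop_log) < max_log:
        drop_log.append((src_host, reason.name))
```

Routing every rejection through one routine means no drop goes
uncounted, and the per-host tallies show which machines are responsible.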

     The number of ICMP Source Quenches received is also quite useful;
again, large values compared to the volume of data traffic are significant.

     When I operated a VAX with such logging two years ago, there would
be five or six connections logged as bad when the network was operating
properly; there might be hundreds when something was wrong.  That's how
I managed to make a large network based on slow links work properly.

     It is worth thinking about how one might report such data in a standard
way to a network monitor node.  Something that generated one datagram per
"bad" TCP connection might be quite useful; some would of course get lost,
but numbering the datagrams serially would allow the network monitor to
detect this, and statistical techniques could be used to compensate for
the lost data.
You do need to log a measure of the total data transmitted in each
direction on the connection, and log entries should also contain cumulative
information about the total amount of data and total number of connections
so that statistical computations can be made.
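
     To make the idea concrete, here is a rough sketch of what the monitor
side might do with such reports from one host.  The record layout and
field names are hypothetical, no standard format is implied, and the
scale-up for lost reports is the simplest possible estimate: it assumes
lost reports resemble the ones that arrived and that serial numbers start
at 1 and count every bad connection.

```python
from dataclasses import dataclass


@dataclass
class BadConnReport:
    serial: int             # increases by one per report sent by the host
    bytes_sent: int         # data sent on the bad connection
    bytes_received: int     # data received on the bad connection
    total_connections: int  # cumulative connections on the host so far


def summarize(reports):
    """Estimate losses and bad-connection statistics for one host."""
    if not reports:
        return None
    by_serial = {r.serial: r for r in reports}   # discard duplicates
    lowest, highest = min(by_serial), max(by_serial)
    expected = highest - lowest + 1              # reports the host sent
    lost = expected - len(by_serial)             # gaps in the serials
    latest = by_serial[highest]
    observed = sum(r.bytes_sent + r.bytes_received
                   for r in by_serial.values())
    bad_fraction = (latest.serial / latest.total_connections
                    if latest.total_connections else 0.0)
    return {
        "lost_reports": lost,
        "bad_fraction": bad_fraction,
        # Scale observed bytes up to cover the reports that went missing.
        "est_bad_bytes": observed * expected / len(by_serial),
    }
```

Because each report carries the cumulative connection count, the most
recent one to arrive is enough to compute the bad-connection fraction
even when earlier reports were lost.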

     One needs this information to manage a network.  With it, one can
manage one's network, and make it perform well.  Without it, one can just
grumble and make excuses.

				John Nagle