[comp.protocols.tcp-ip] more interesting numbers

mike@BRL.ARPA (Mike Muuss) (04/14/88)

----- Forwarded message # 1:

Received: from brl-smoke.arpa by SEM.BRL.ARPA id aa10255; 31 Mar 88 21:59 EST
Received: from brl-spark.arpa by SMOKE.brl.ARPA id aa16517; 31 Mar 88 21:55 EST
Date:     Thu, 31 Mar 88 21:45:08 EST
From:     Phil Dykstra <phil@BRL.ARPA>
To:       datacom@BRL.ARPA
Subject:  more interesting numbers
Message-ID:  <8803312145.aa04086@SPARK.BRL.ARPA>

Some recently obtained per node averages for gateways:

The seven BBN ARPA/MILNET Core gateways:
	10.04 packets/sec	5.78 % drop rate

The NSFNET Backbone "Fuzzball" gateways:
	15.55 packets/sec	0.18 % drop rate

The Bld 394 BRL gateway:
	~20 packets/sec		~0.8 % drop rate

The vast majority of our dropped packets are coming from the
currently sick Proteon Ring, so this value would normally be
much lower.

The 394 gateway is currently handling over 1.5 Million packets/day.
The 328 gateway is probably similar.  [I will have better data for
BRL in a few weeks.]

That's a lot of packets!

- Phil

----- End of forwarded messages

gross@GATEWAY.MITRE.ORG (Phill Gross) (04/14/88)

Mike, Phil,

Fascinating numbers.  Do we understand why the mailbridges have so much
higher a drop rate?

Until recently, of course, the 7 mailbridges were still clunky old LSI
11/23's.  That would account for some performance difference since I
believe the NSFnet Fuzzies are 73's.  However, the IETF Adopt-A-Core-Gateway
program finally caught up to the mailbridges a couple weeks ago, so they
should all be 73's now too.  What is the date for your numbers?

Or is it that the mailbridges (indeed all core gateways) simply spend too
much time processing routing updates and not enough time forwarding
packets?  Almost half the core traffic seems to be routing updates.  That
means for every other packet, the gateway has to go off and spend cycles
thinking about something besides packet forwarding.  (Oh where are those
Butterflies?)
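
A back-of-the-envelope sketch in Python (the 10.04 pps figure is from
Phil's numbers above; treating "almost half" as a round 50% is my own
assumption):

    # If roughly half of a core gateway's traffic is routing updates,
    # only about half of its packet rate is left for user data.
    total_pps = 10.04            # BBN core gateway average from above
    overhead_fraction = 0.5      # "almost half the core traffic"
    user_pps = total_pps * (1 - overhead_fraction)
    print("user packets/sec: %.2f" % user_pps)    # ~5 pps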

Phill Gross

Mills@UDEL.EDU (04/15/88)

Phil, et al,

From Mike Minnich's data presented at a recent IAB meeting and other
sources such as the LSI-11 reports issued by BBN, I suspect Phil's
numbers are total aggregate and include ICMP and routing overheads.
As you point out, between 40 and 60 percent of all mailbridge traffic
is overhead, while the NSFNET backbone overhead is much lower. While
the NSFNET backbone fuzzballs do use the 11/73, they are memory
limited, not CPU limited. I would be surprised if this were not the
case for the mailbridges, at least those using the 11/73. As reported
in the SIGCOMM 87 paper and in another submitted to SIGCOMM 88, I would
like to believe the difference in drop rates is due to the design of
the NSFNET backfuzz selective-preemption and source-quench schemes.
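
For those who have not seen the papers, here is a minimal Python
sketch of one plausible selective-preemption policy; it is only a
guess at the flavor of the scheme, not the actual backfuzz algorithm:

    # Assumed policy: when the output queue is full, preempt a packet
    # from the source holding the most queue slots instead of dropping
    # the new arrival, and (in a real gateway) source-quench that source.
    from collections import Counter

    class PreemptiveQueue:
        def __init__(self, limit=16):
            self.limit = limit
            self.q = []                 # queued (source, packet) pairs

        def enqueue(self, src, pkt):
            if len(self.q) >= self.limit:
                hog = Counter(s for s, _ in self.q).most_common(1)[0][0]
                # Remove the newest packet of the heaviest source.
                for i in range(len(self.q) - 1, -1, -1):
                    if self.q[i][0] == hog:
                        del self.q[i]
                        break
            self.q.append((src, pkt))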

Dave

narten@PURDUE.EDU (Thomas Narten) (04/15/88)

> Almost half the core traffic seems to be routing updates.

How true is that statement? The stats I have seen come from core
gateways. Hardly a representative sample. Ignoring mailbridge traffic,
one expects "real" traffic to go to and from "hosts", where hosts are
non-core gateways to LANs, NSFnet, etc.  An interesting data point is
last week's traffic stats for the Purdue LSI-11 EGP core server:

GWY         RCVD           RCVD     IP       % IP         DEST   % DST
NAME        DGRAMS         BYTES    ERRORS  ERRORS       UNRCH   UNRCH
PURDUE   7,184,830 1,097,986,642       101   0.00%      27,954   0.39%

GWY         SENT           SENT    DROPPED    % DROPPED
NAME        DGRAMS         BYTES   DGRAMS        DGRAMS
PURDUE   7,557,696 1,424,771,755    24,090        0.32%

That's an average of 11.9 and 12.5 pps respectively. The funny thing
is, Purdue directs all its traffic out through its Butterfly gateway;
the only Purdue traffic traveling through the LSI-11 would be
misrouted packets.

If we assume that an EGP connection exchanges one hello/I-H-U packet
every 60 seconds, and one fragmented and one unfragmented NR update in
place of a hello/I-H-U every 180 seconds, one expects 9 EGP packets
every 180 seconds, 4 to the LSI, 5 from it. In addition, if we
(over)estimate the number of EGP peers the 11 maintains at 260, EGP traffic
accounts for 260*(4/180) = 5.8 pps RCVD, 260*(5/180) = 7.2 pps SENT.
Certainly GGP doesn't consume the remaining 5 pps.  Who is responsible
for the remaining traffic?
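
To make the arithmetic explicit, a small Python sketch using the
figures above (the 260-peer estimate and the 9-packets-per-180-seconds
breakdown are the stated assumptions):

    # Convert last week's totals to packets/sec and estimate the EGP
    # share for ~260 peers.
    week_seconds = 7 * 24 * 3600                 # 604800 seconds

    rcvd_pps = 7184830 / float(week_seconds)     # ~11.9 pps received
    sent_pps = 7557696 / float(week_seconds)     # ~12.5 pps sent

    # Per peer, per 180 seconds: 9 EGP packets, 4 to the LSI-11, 5 from it.
    peers = 260
    egp_rcvd_pps = peers * 4 / 180.0             # ~5.8 pps
    egp_sent_pps = peers * 5 / 180.0             # ~7.2 pps

    print("unexplained pps in:  %.1f" % (rcvd_pps - egp_rcvd_pps))  # ~6.1
    print("unexplained pps out: %.1f" % (sent_pps - egp_sent_pps))  # ~5.3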

Thomas Narten

phil@BRL.ARPA (Phil Dykstra) (04/21/88)

The numbers that I posted for the NSFNET fuzzies and the ARPA/MILNET
Core gateways were for the week ending 21 March 88 (and came via Dave
Mills).  Only one of the mailbridges and five EGP speakers had been
upgraded to 11/73's at that time.  It would be interesting to see
how they have improved.

The numbers for the BRL gateway came from ~48 hours ending 30 March 88.
That gateway has three ethernets, one 10 Mbit proteon ring, and two
1822 IMP connections (one MILNET, one local).  Packet counts were
half of in+out and included everything to/from the interfaces - EGP,
ICMP, etc.  If I had been posting to this list originally
I would have spoken a bit more carefully.

To answer/comment on a few replies:

> Mike Brescia
> ... "average" throughput is a measure of packets actually offered over the
> course of the day or week reporting period, ....
> ... is a measure of handling offered load rather than limitation.

Good point.  They were all long term averages.  For the record I believe
that we all agree that "one packet" goes both in and back out of a gateway.
Most vendors seem to count that as two, for obvious reasons.

> Phill Gross
> Do we understand why the mailbridges have so much higher a drop rate?

Mike Brescia mentioned a few possible reasons.  Another, which I have
heard about but don't know any details of, is a claim that the Core
gateways maintain max queue lengths of eight packets per destination
subnet (I would presume to avoid overloading the PSN's).  That probably
causes many more drops than a more generous gateway would have (unless
there are only PSN's connected to it and they return the favor).
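
A minimal Python sketch of how such a limit would behave (the
eight-packet figure is the hearsay mentioned above, and the code is
only my guess at the policy, not BBN's implementation):

    # Per-destination-subnet queue limit: once eight packets are
    # already queued for a subnet, further arrivals for that subnet
    # are dropped, regardless of how empty the rest of the gateway is.
    MAX_PER_DEST = 8
    queues = {}                    # destination subnet -> queued packets

    def enqueue(dest_subnet, pkt):
        q = queues.setdefault(dest_subnet, [])
        if len(q) >= MAX_PER_DEST:
            return False           # dropped; counts against the drop rate
        q.append(pkt)
        return True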

> Dave Mills
> I would like to believe the difference in drop rates is due to the design
> of the NSFNET backfuzz selective-preemption and source-quench schemes.

I note however that your numbers went from good to excellent.  I am hoping
to be able to test that theory by playing the same game locally.

- Phil
<phil@brl.arpa>

phil@BRL.ARPA (Phil Dykstra) (04/21/88)

> Who is responsible for the remaining traffic?

Good question.  I would wager that it is the old GGP-induced
"extra-hop" problem.  Speaking EGP on the MILNET side of the world
I can't verify this in your case, but here is an example of this very
bad phenomenon in action on this half of the core system.

Several weeks ago a typical set of EGP routes received from MINET-GW
(a MILNET Core EGP speaker) looked like this (summarized by number of
routes per gateway):

 # routes  Example net     Gateway
   307     10.0.0.0        26.0.0.106       Random "mailbridge"
    19     128.165.0.0     26.3.0.75        EGP speaker (YUMA-GW)
    18     128.171.0.0     26.1.0.65        EGP speaker (AERO-GW)
     7     128.56.0.0      26.3.0.29        BRL gw1
     6     128.20.0.0      26.2.0.29        BRL gw2
     4     128.115.0.0     26.6.0.21
     4     128.102.0.0     26.4.0.16
     3     128.60.0.0      26.20.0.8
     3     128.229.0.0     26.0.0.103
     2     129.43.0.0      26.0.0.88
     2     128.47.0.0      26.5.0.60
     2     128.122.0.0     26.0.0.58
     1     192.5.13.0      26.2.0.55
     1     192.31.98.0     26.5.0.129
    ... many more single route entries ...

The mailbridge 26.0.0.106 happened to be the "choice of the day" for
routes via the ARPANET.  Seeing a very large number of routes to a
single mailbridge is quite common; it changes every few hours or days.
[I would like, by the way, to hear whether this is load balanced on a
per-peer basis or something, or if everyone on a given EGP speaker gets
the same selection.]

But the real problem is the 37 routes to the Core EGP speakers!  We
got these routes by polling the MINET gateway, and MINET did what
it was supposed to do - never gave ITSELF as a route to anything.
Any exterior gateway which advertised its routes to MINET came out
correctly.  However, >>> any exterior gateway which advertised its
routes to YUMA and/or AERO but not to MINET (i.e. not to the EGP
peer that we polled), showed up as reachable via (one of) the EGP
peer(s) that they spoke to! <<<

This is a serious problem, because besides the silliness of inducing
an extra "hop" to reach those networks, it also directs a large amount
of traffic to the Core EGP speakers - something which BBN(?) has been
trying to avoid!  Thus to answer Thomas Narten's question (I gather
that the machine in question is an ARPA-side Core EGP speaker): The
traffic is probably "extra-hop problem" induced.

How It Happens (in brief - those who know this can skip it):

Internal to the Core system, GGP is used to communicate route information.
A GGP speaker can only say "I CAN REACH netX", not HOW. EGP on the other
hand says "I CAN REACH netX VIA gateY."  When you speak EGP to one of
the Core EGP speakers, he learns how to reach your nets VIA your
gateway.  If you ask that same EGP speaker how to get to netX you will
get the "correct" answer - gateY.  However, if you ask a *different* EGP
speaker, his knowledge of the network in question came via GGP in which
the first Core EGP speaker simply said "I CAN REACH netX."  The HOW
part, i.e. the gateway that advertised netX in the first place, has been
dropped (due to this GGP limitation).  Thus someone receiving this
information will end up (needlessly) sending packets to the Core EGP
speaker netX was advertised to, rather than to the gateway that
advertised it.
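
A tiny Python sketch of that information loss (netX, gateY, and the
two EGP speakers are just the hypothetical players from the
explanation above):

    # EGP carries "net via gateway"; GGP inside the core carries only
    # "I can reach net".  Ask the speaker that heard the original
    # advertisement and you get the right next hop; ask any other core
    # EGP speaker and you get the first speaker itself - the extra hop.

    # gateY advertises netX to core EGP speaker EGP1 only.
    egp_learned = {"EGP1": {"netX": "gateY"}}    # full EGP information
    ggp_flooded = {"netX": "EGP1"}               # GGP: reachability only

    def route_from(speaker, net):
        if net in egp_learned.get(speaker, {}):
            return egp_learned[speaker][net]     # correct answer: gateY
        return ggp_flooded[net]                  # the speaker netX was
                                                 # advertised to: extra hop

    print(route_from("EGP1", "netX"))            # gateY
    print(route_from("EGP2", "netX"))            # EGP1 <- the extra hop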

How To Avoid the Problem:

You can prevent others from getting extra-hop routes to YOU by advertising
your nets to all available EGP speakers.  You can avoid getting extra-hop
routes to someone else by polling all available EGP speakers for routes
and favoring those routes that DON'T point to an EGP speaker.  [The real
solution of course is to fix GGP.]
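
A rough Python sketch of the "favor routes that don't point to an EGP
speaker" rule (the speaker set and addresses are placeholders drawn
from the table above, not a complete list):

    # Poll every available EGP speaker for a net, then prefer an
    # answer whose gateway is NOT itself a core EGP speaker (those
    # are the likely extra-hop routes).
    EGP_SPEAKERS = {"26.3.0.75", "26.1.0.65"}    # YUMA, AERO; real list is longer

    def best_route(answers):
        """answers: gateway address returned by each EGP speaker polled."""
        direct = [gw for gw in answers if gw not in EGP_SPEAKERS]
        return direct[0] if direct else answers[0]

    # e.g. a net advertised only to YUMA: MINET answers 26.3.0.75 (the
    # extra hop), while YUMA itself answers the real gateway, say 26.3.0.29.
    print(best_route(["26.3.0.75", "26.3.0.29"]))   # 26.3.0.29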

Of course, if everyone did the above, the EGP speakers would be all the
more loaded.  One could also question at that point why there was more
than one EGP speaker.  On the other hand, licking the extra-hop problem
might get a lot of unnecessary non-EGP traffic off of the EGP speakers.
It's hard to tell where the balance would lie.  [It is interesting to
note in the recent timetable how the EGP speakers were upgraded before
most of the mailbridges were.]

My apologies for such a long-winded answer, but it has been a long time
since anyone discussed this problem on this list.

- Phil
<phil@brl.arpa>