[mod.protocols.tcp-ip] Gateway Slots

mike@BRL.ARPA (Mike Muuss) (12/12/85)

Sirs -

I am writing this letter to bring to your attention a serious operational
problem with the CORE gateway system which provides routing connectivity
between the ARPANET, MILNET, SATNET, and all LANs within the InterNet system.
Briefly stated, the problem is that the current core gateway software only
has room for a fixed number of routes between networks, currently about 100.
(I'll call these routing table entries "slots").

Within the past few weeks, the number of networks (mostly LANs)
connected to the InterNet system has exceeded the number of slots.
Attempts to provide routing information
to the core system are processed only as slots become available -- on a
first-come, first-served basis.  Some gateway somewhere has to crash to
relinquish a slot for another gateway to gain connectivity.
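The first-come, first-served behavior described above can be modeled as a routing table with a fixed capacity. What follows is a minimal Python sketch under assumptions of our own: the class name SlotTable, the string network and gateway names, and the rule that a slot is freed only when the gateway holding it goes down are all hypothetical. It is not BBN's core gateway code, only an illustration of why a new network cannot get in until someone else drops out.

```python
# Hypothetical model of a fixed-size routing table ("slots"), filled
# first-come, first-served. Not the real core gateway implementation.
from typing import Dict, Optional


class SlotTable:
    """Routing table with a fixed number of slots."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.routes: Dict[str, str] = {}  # network -> gateway advertising it

    def advertise(self, network: str, gateway: str) -> bool:
        """Accept a route only if the network already holds a slot or one is free."""
        if network in self.routes:
            self.routes[network] = gateway  # refresh an existing slot
            return True
        if len(self.routes) >= self.capacity:
            return False  # table full: the advertisement is simply ignored
        self.routes[network] = gateway
        return True

    def gateway_down(self, gateway: str) -> None:
        """Drop every route learned from a gateway that went down, freeing its slots."""
        for net in [n for n, g in self.routes.items() if g == gateway]:
            del self.routes[net]

    def lookup(self, network: str) -> Optional[str]:
        """Return the gateway for a network, or None if it has no slot."""
        return self.routes.get(network)


table = SlotTable(capacity=2)
print(table.advertise("net-A", "gw-1"))  # True
print(table.advertise("net-B", "gw-2"))  # True
print(table.advertise("net-C", "gw-3"))  # False: every slot is taken
table.gateway_down("gw-1")               # some gateway crashes...
print(table.advertise("net-C", "gw-3"))  # True: ...and its slot is reused
```

Note that nothing in this model ranks networks by importance; whoever asks first while a slot is free wins, which is exactly the situation BRL ran into after its power outage.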

MAJOR FAILURE IN OPERATIONAL SYSTEM.

This past weekend, due to an extensive power outage, both of BRL's gateways
were down, relinquishing the slots we had been using.  BRL's IMP resumed
operating Sunday night, and BRL's two gateways resumed operation Monday
morning, but BRL was completely without network connectivity throughout the
day Monday as we waited for slots to become available within the core
gateway system.  Lack of slots prevented any access to or from the MILNET,
blocking mail flow between BRL and AMC-HQ, USNA, ARDC, WSMR, and the
numerous other hosts we do regular business with.  Fortunately, other
gateways went down through the day, and by Monday evening BRL had reacquired
routing slots.  A one-day network outage was no disaster, and we survived.
However, if we lose network connectivity for a day or more every time our
gateways or IMP go down, BRL has a major operational problem.

Unless corrective action is taken, this problem will steadily become worse,
because more and more MILNET sites will be operating attached LANs, and
traffic is shifting from directly attached hosts to LAN-attached hosts.  BRL
feels the effect of this problem more keenly because BRL hosts are
exclusively LAN-attached.  However, all LAN-attached hosts within the
InterNet system are affected by this problem!

This problem was also encountered a few months ago, and BBN responded
promptly by increasing the number of slots to the current limit.  BBN
is aware of the current problem, and is investigating solutions.  However,
they may not be able to increase the table sizes this time, due to limited
memory in the core gateways.  The medium-term solution to this problem
would be to replace all LSI-11/03 core gateways with LSI-11/23 gateways,
which have 4 times as much memory.  I am under the impression that BBN
has already developed software which takes advantage of the extra memory
in the 11/23.  The long-term solution is, of course, to replace the core
gateway system with Butterfly gateways, but that is a long time away.

SHORT-TERM SOLUTION NEEDED.

There are several options.

1)  Take administrative action.  Insist that the most recent N new
networks connected to the InterNet system immediately disconnect, until
the number of available slots can be increased.

2)  Provide a technological response.  As an emergency measure,
rapidly replace the 11/03 core gateways with 11/23 systems.

2a)  Have BBN immediately upgrade all 11/03 systems within the GGP core.
2b)  If BBN does not have the necessary equipment on hand, or en route,
additional 11/23 systems could be borrowed.  For example, BRL has an 11/23
system which is temporarily not being used.  BRL would be willing to loan it
to DCA on a short term basis until BBN could procure the necessary 11/23
hardware.  Certainly there are enough unused 11/23 systems throughout the
combined Services that an immediate hardware solution could be implemented
using loaned equipment.

3) Apply software magic, and increase the current table size without
changing any hardware.  This may be easy, but more likely it will be costly
in time, costly in manpower, or simply impossible.

MEDIUM-TERM DISASTER AWAITS.

Even assuming that the current difficulty can be overcome, this problem will
soon reappear in another form.  Indeed, the second stage of this
problem is almost upon us.  Here, the difficulty is again a growth limitation in
the core gateway software.  The core exchanges routing information between
its gateways using GGP (Gateway-to-Gateway Protocol).  There exists an upper
limit on the length of a GGP packet, and GGP is currently defined so as to
contain information about the total InterNet system in a single packet.
Thus, when the number of gateways increases beyond the number that can fit
in a GGP packet, we will again experience competition for "slots" -- this
time GGP packet "slots".
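The limit follows from simple arithmetic: if the entire routing picture must fit in one packet, the number of entries is capped at (maximum packet size minus fixed overhead) divided by the size of one entry. The Python sketch below works through that calculation. Every size in it (MAX_PACKET_BYTES, HEADER_BYTES, BYTES_PER_ENTRY) is a made-up placeholder chosen only to make the arithmetic concrete; none of them is taken from the actual GGP message format.

```python
# Back-of-the-envelope sketch: how many entries fit in one routing update
# when the whole table must travel in a single packet.
# All sizes below are HYPOTHETICAL placeholders, not GGP's real format.

MAX_PACKET_BYTES = 576   # hypothetical upper limit on one update packet
HEADER_BYTES = 28        # hypothetical fixed overhead (headers)
BYTES_PER_ENTRY = 4      # hypothetical size of one network entry


def max_entries(max_packet: int, header: int, per_entry: int) -> int:
    """Largest number of whole entries that fit after the fixed header."""
    if max_packet <= header:
        return 0
    return (max_packet - header) // per_entry


def entries_left_out(n_entries: int) -> int:
    """Entries that cannot be carried once the single packet is full."""
    cap = max_entries(MAX_PACKET_BYTES, HEADER_BYTES, BYTES_PER_ENTRY)
    return max(0, n_entries - cap)


print(max_entries(MAX_PACKET_BYTES, HEADER_BYTES, BYTES_PER_ENTRY))  # 137
print(entries_left_out(150))  # 13 entries would have no room
```

With any real numbers the shape of the result is the same: the capacity is fixed by the packet limit, so growth past it can only be absorbed by changing the protocol, which is what the options below address.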

Again, several solutions exist:

1)  Administratively prohibit connecting more LANs than the GGP protocol
can support.

2)  Modify or extend the GGP protocol and the supporting core gateway
software to ease or eliminate the current limits.

3)  Replace the GGP protocol with something else (no finished design for
a replacement exists yet, although it is being thought about).

3a)  Replace GGP within the existing 11/23 systems with the new protocol.
3b)  Replace all the 11/23 systems with Butterfly systems and the new
protocol.

Current plans for GGP replacement are being formed within BBN and the GADS
Task Force (chaired by the able Dave Mills).  I would like to suggest that
the priority of this task be elevated, and that its funding be increased.
Investing in an extra man-year now might give us a long-term solution to
this problem before disaster strikes.  (I might also point out that the GADS
Task Force is presently operating with little or no funding).  Either GADS
or BBN must get switched into "high gear" to solve this problem.

SUMMARY.

The "lack of slots" problem is upon us.  Serious operational failures have
already been experienced, and the problem will be getting worse.  A short
term solution is needed.  Several options are available, none expensive.

Worse, a secondary form of the problem will strike soon, even if we weather
the current storm.  Solutions can be found, but all will require effort and
money.  Spending money takes time, so we need to worry now.

	Sincerely,

	 Mike Muuss
	 Leader, Advanced Computer Systems Team
	 U. S. Army Ballistic Research Lab

Geoff@SRI-CSL.ARPA (the tty of Geoffrey S. Goodfellow) (12/14/85)

With respect to gateway slots filling up, can anyone explain, why
in this day and age of endless approvals and OK's from upon high
for the trivialest of things, such as controlling net access down
to the RS-232 terminal port level on TACs its possible for anyone
to just "plug a gateway in" and your up?

It's my impression that to join the internet club these days you
just find a friendly site that lets you plug into their local net
and you then EGP your existence out to the world.  None of the
paperwork "stuff" you now need just to enable a TAC port
to connect a simple terminal!

Can anyone explain why there is such control over hooking
terminals onto TAC's when there is no control over hooking
gateways onto the Internet?

Has anyone dumped the gateway routing tables just to see the
difference between "who is out there" and "who is authorized to
be connected out there"?  What prevents anyone else from adding on?

g

mike@BRL.ARPA (Mike Muuss) (12/14/85)

The response form issued with a new network number includes a statement
that your network cannot be connected to the core without prior
approval from the NIC, and that you also need to become part of a registered
Autonomous System.

However, there is presently no room in the core gateways for either the
code or the data needed to validate A.S. numbers, so nothing prevents
people from just plugging in.

The source of the current problem is that now that 4.2BSD UNIX is capable of
being a full EGP gateway, lots of people are getting LANs and connecting
them.  Indeed, the single most common gateway system on the InterNet these
days is 4.2BSD, somewhat to the surprise of the original networking folks.

Implementing subnets will take some of the strain off (if we can do it fast
enough), but converting to subnet numbers requires a massive change in all
local host addresses.  Also, for those of us who purchase TCP-speaking
devices (like laser printers, LISP machines, etc.) from random vendors, we
must depend on the vendor to implement subnet support.  As the RFC
documenting the subnet strategy is fairly recent, not all vendors have taken
notice yet.  Some vendors (most notably Excelan) are still struggling with
things like IP routing and ICMP (sigh), and their boards are found in many
current "off the shelf" products.
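Subnetting relieves the core because a site with several LANs then consumes one network number, and therefore one core slot, instead of one per LAN; only the site's own gateways need to know how the number is divided. The Python sketch below illustrates this with the standard ipaddress module. The network number 130.10.0.0 and the choice of the third octet as the subnet field are hypothetical, and the "/16" and "/24" suffixes are modern prefix notation, standing here for the class B network and a 255.255.255.0 subnet mask.

```python
# Sketch: one class B network number carved into per-LAN subnets.
# 130.10.0.0 is a HYPOTHETICAL site network, chosen for illustration only.
import ipaddress

site = ipaddress.ip_network("130.10.0.0/16")  # the single number the core sees

# Use the third octet as the subnet field: one /24 subnet per LAN.
lans = list(site.subnets(new_prefix=24))[:4]  # four LANs at the site
for lan in lans:
    print(lan)  # 130.10.0.0/24, 130.10.1.0/24, 130.10.2.0/24, 130.10.3.0/24

# Every LAN lies inside the one site network, so the core needs one entry.
assert all(lan.subnet_of(site) for lan in lans)

host = ipaddress.ip_address("130.10.2.17")
print(host in site)  # True: the core routes on the network number alone
print([str(lan) for lan in lans if host in lan])  # ['130.10.2.0/24']: local choice
```

The catch noted above still applies: every host and gateway on the site, including vendor boxes, must understand the subnet mask for the local part of this scheme to work.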

We at BRL are working towards reducing the number of network numbers we
require, but currently I expect it to take us another month or two to really
make progress in this direction; others will need similar time to implement
subnet routing within their gateways and then convert their
hosts.  I would wager that the subnet tide will not turn until 4.3BSD is in
widespread use.  Even if 4.3 tapes were to teleport out to all 4.2 sites
tomorrow, it would take most sites a month or two to switch, so 4.3 is not
the cure to our immediate woes.

I predict that if everything goes well, and all the core gateways are
upgraded to 11/23 systems, and most sites drop back to using just one or two
net numbers, that we might just barely survive the continued InterNet growth
until the GGP replacement (core IGP, really) is designed and implemented.
Maybe.

	Best,
	 -Mike

martin%blade@MOUTON.ARPA (Martin J Levy) (12/14/85)

as an extra note to the one about TAC terminal connections being hard and
gateway connections being easy, i would also like to
ask the question:

"why are core gateways still 11/03's and not 11/23's or even 73's?"

these days, with the reduced cost of these processors and memory,
and with the software knowledge of how to program their memory
mapping registers, why is there still a restriction on the number
of network numbers that can be held in the gateways' memory?
if i remember correctly, the C-GATE and the BRL gateways both use
memory-mapped code.

please don't take this as a vote against subnetting, but more as a note
from someone worried about what will happen later, when the number of
networks goes up.  the subnet solution will help big sites (like us),
but not the many small sites where one cable is all they have
anyway.

martin levy.
bellcore, nj.

HEDRICK@RED.RUTGERS.EDU (Charles Hedrick) (12/16/85)

There have been a couple of messages implying that you can just plug
into the Internet.  We connected a gateway a few months ago.  At that
time, it was necessary to get approval to connect a network to the
Internet.  It was also necessary to get approval to change which machine
acted as a gateway.

As you probably know, there is a separate process to apply for an
Internet network number.  It is interesting in the context of this
discussion that the application implies that the authorities would
rather give you a class C address or a range of class C addresses
than a single class B address.  In fact an industrial group which I have
been working with did get a range of class C addresses.  (They have no
immediate plans to connect to the Internet.)
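In the slot discussion this preference matters because, under the classful addressing of the day, each class C number is a separate network number, so a range of class C addresses presumably takes one core routing entry per number, while a class B takes one entry in total. The Python sketch below derives the classful network number from the leading bits of the first octet and counts distinct numbers for both cases. The function name classful_network and all of the example addresses are hypothetical, and the one-entry-per-network-number assumption is ours, inferred from the slot discussion rather than stated by the authors.

```python
# Sketch of classful addressing: the leading bits of the first octet fix
# how much of the address is the network number. The example addresses
# are HYPOTHETICAL; classful_network is an illustrative helper.
import ipaddress


def classful_network(addr: str) -> ipaddress.IPv4Network:
    first = int(ipaddress.IPv4Address(addr)) >> 24  # first octet
    if first < 128:        # class A: leading bit 0, 8-bit network number
        prefix = 8
    elif first < 192:      # class B: leading bits 10, 16-bit network number
        prefix = 16
    elif first < 224:      # class C: leading bits 110, 24-bit network number
        prefix = 24
    else:
        raise ValueError("class D/E addresses are not assigned to networks")
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


# Four LANs numbered from a range of class C addresses versus one class B.
class_c_hosts = ["192.5.8.1", "192.5.9.1", "192.5.10.1", "192.5.11.1"]
class_b_hosts = ["128.20.8.1", "128.20.9.1", "128.20.10.1", "128.20.11.1"]

print(len({classful_network(h) for h in class_c_hosts}))  # 4 network numbers
print(len({classful_network(h) for h in class_b_hosts}))  # 1 network number
```

If the core does keep one entry per network number, handing out ranges of class C addresses conserves the class B space at the price of more routing slots.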

-------