mike@BRL.ARPA (Mike Muuss) (12/12/85)
Sirs -

I am writing this letter to bring to your attention a serious operational problem with the CORE gateway system, which provides routing connectivity between the ARPANET, MILNET, SATNET, and all LANs within the InterNet system.

Briefly stated, the problem is that the current core gateway software only has room for a fixed number of routes between networks, currently about 100. (I'll call these routing table entries "slots".) Within the past few weeks, the number of networks (mostly LANs) connected to the InterNet system has exceeded the number of slots, resulting in a shortage of slots. Attempts to provide routing information to the core system are processed only as slots become available -- on a first-come, first-served basis. Some gateway somewhere has to crash to relinquish a slot for another gateway to gain connectivity.

MAJOR FAILURE IN OPERATIONAL SYSTEM.

This past weekend, due to an extensive power outage, both of BRL's gateways were down, relinquishing the slots we had been using. BRL's IMP resumed operating Sunday night, and BRL's 2 gateways resumed operation Monday morning, but BRL was completely without network connectivity throughout the day Monday as we waited for slots to become available within the core gateway system. Lack of slots prevented any access to or from the MILNET, blocking mail flow between BRL and AMC-HQ, USNA, ARDC, WSMR, and the numerous other hosts we do regular business with. Fortunately, other gateways went down through the day, and by Monday evening BRL had reacquired routing slots.

A one-day network outage was no disaster, and we survived. However, if we lose network connectivity for a day or more every time our gateways or IMP go down, BRL has a major operational problem.

Unless corrective action is taken, this problem will steadily become worse, because more and more MILNET sites will be operating attached LANs, and traffic is shifting from directly attached hosts to LAN-attached hosts.
BRL feels the effect of this problem more keenly because BRL hosts are exclusively LAN-attached. However, all LAN-attached hosts within the InterNet system are affected by this problem!

This problem was also encountered a few months ago, and BBN responded promptly by increasing the number of slots to the current limit. BBN is aware of the current problem, and is investigating solutions. However, they may not be able to increase the table sizes this time, due to limited memory in the core gateways.

The medium-term solution to this problem would be to replace all LSI-11/03 core gateways with LSI-11/23 gateways, which have 4 times as much memory. I am under the impression that BBN has already developed software which takes advantage of the extra memory in the 11/23. The long-term solution is, of course, to replace the core gateway system with Butterfly gateways, but that is a long time away.

SHORT-TERM SOLUTION NEEDED.

There are several options.

1) Take administrative action. Insist that the most recent N new networks connected to the InterNet system immediately disconnect, until the number of available slots can be increased.

2) Provide a technological response. Instituting emergency measures, rapidly replace the core gateway system with 11/23 systems.

   2a) Have BBN immediately upgrade all 11/03 systems within the GGP core.

   2b) If BBN does not have the necessary equipment on hand, or en route, additional 11/23 systems could be borrowed. For example, BRL has an 11/23 system which is temporarily not being used. BRL would be willing to loan it to DCA on a short-term basis until BBN could procure the necessary 11/23 hardware. Certainly there are enough unused 11/23 systems throughout the combined Services that an immediate hardware solution could be implemented using loaned equipment.

3) Apply software magic, and increase the current table size without changing any hardware. This may be easy, but more likely it will be costly in time, costly in manpower, or simply impossible.
MEDIUM-TERM DISASTER AWAITS.

Even assuming that the current difficulty can be overcome, this problem will soon reappear in another form. Indeed, the second stage of this problem is almost upon us. Here, the difficulty is again a growth limitation in the core gateway software. The core exchanges routing information between its gateways using GGP (Gateway-to-Gateway Protocol). There exists an upper limit on the length of a GGP packet, and GGP is currently defined so as to contain information about the total InterNet system in a single packet. Thus, when the number of gateways increases beyond the number that can fit in a GGP packet, we will again experience competition for "slots" -- this time GGP packet "slots".

Again, several solutions exist:

1) Administratively prohibit connecting more LANs than the GGP protocol can support.

2) Modify or extend the GGP protocol and the supporting core gateway software to ease or eliminate the current limits.

3) Replace the GGP protocol with something else (no finished design for a replacement exists yet, although it is being thought about).

   3a) Replace GGP within the existing 11/23 systems with the new protocol.

   3b) Replace all the 11/23 systems with Butterfly systems and the new protocol.

Current plans for GGP replacement are being formed within BBN and the GADS Task Force (chaired by the able Dave Mills). I would like to suggest that the priority of this task be elevated, and that its funding be increased. Investing in an extra man-year now might give us a long-term solution to this problem before disaster strikes. (I might also point out that the GADS Task Force is presently operating with little or no funding.) Either GADS or BBN must get switched into "high gear" to solve this problem.

SUMMARY.

The "lack of slots" problem is upon us. Serious operational failures have already been experienced, and the problem will be getting worse. A short-term solution is needed. Several options are available, none expensive.
Worse, a secondary form of the problem will strike soon, even if we weather the current storm. Solutions can be found, but all will require effort and money. Spending money takes time, so we need to worry now.

Sincerely,
Mike Muuss
Leader, Advanced Computer Systems Team
U. S. Army Ballistic Research Lab
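
The slot shortage described in the letter above can be illustrated with a minimal Python sketch. It assumes a fixed-capacity table keyed by network number, with entries granted first come, first served and released only when the advertising gateway goes down. The names (SlotTable, advertise, gateway_down), the network and gateway labels, and the capacity of 3 are all hypothetical and are not taken from the core gateway software.

```python
class SlotTable:
    """Fixed-capacity routing table: first come, first served (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.routes = {}  # network number -> gateway advertising it

    def advertise(self, network, gateway):
        """Try to install a route; return True if a slot is held or obtained."""
        if network in self.routes:
            self.routes[network] = gateway
            return True
        if len(self.routes) >= self.capacity:
            return False  # no free slot: the network stays unreachable
        self.routes[network] = gateway
        return True

    def gateway_down(self, gateway):
        """Release every slot held by a gateway that has gone down."""
        freed = [net for net, gw in self.routes.items() if gw == gateway]
        for net in freed:
            del self.routes[net]
        return freed


table = SlotTable(capacity=3)          # hypothetical capacity
table.advertise("net-a", "gw-1")
table.advertise("net-b", "gw-2")
table.advertise("net-c", "gw-3")

rejected = table.advertise("net-d", "gw-4")   # table is full
table.gateway_down("gw-2")                    # some other gateway crashes
accepted = table.advertise("net-d", "gw-4")   # a slot has been relinquished

print(rejected, accepted)
```

In this model a returning site is in the same position as a brand-new one: once its slots have been given up, it waits until some other gateway fails, which matches the day-long wait the letter reports.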
Geoff@SRI-CSL.ARPA (the tty of Geoffrey S. Goodfellow) (12/14/85)
With respect to gateway slots filling up, can anyone explain why, in this day and age of endless approvals and OKs from on high for the most trivial of things, such as controlling net access down to the RS-232 terminal port level on TACs, it's possible for anyone to just "plug a gateway in" and be up? It's my impression that to join the internet club these days you just find a friendly site that lets you plug into their local net, and you then EGP your existence out to the world. None of this paperwork "stuff" like you now need to do to enable a TAC port to connect a simple terminal!

Can anyone explain why there is such control over hooking terminals onto TACs when there is no control over hooking gateways onto the Internet? Has anyone dumped the gateway routing tables just to see what the difference between "who is out there" and "who is authorized to be connected out there" is? What prevents anyone else from adding on?

g
mike@BRL.ARPA (Mike Muuss) (12/14/85)
The response form issued with a new network number includes a statement that your network cannot be connected to the core without prior approval from the NIC, and that you also need to become part of a registered Autonomous System. However, there is presently no room for either the code or data needed to validate A.S. numbers in the core gateways, so there is nothing which prevents people from just plugging in.

The source of the current problem is that now that 4.2BSD UNIX is capable of being a full EGP gateway, lots of people are getting LANs and connecting them. Indeed, the single most common gateway system on the InterNet these days is 4.2BSD, somewhat to the surprise of the original networking folks.

Implementing subnets will take some of the strain off (if we can do it fast enough), but converting to subnet numbers requires a massive change in all local host addresses. Also, those of us who purchase TCP-speaking devices (like laser printers, LISP machines, etc.) from random vendors must depend on the vendor to implement subnet support. As the RFC documenting the subnet strategy is fairly recent, not all vendors have taken notice yet. Some vendors (most notably Excelan) are still struggling with things like IP routing and ICMP (sigh), and their boards are found in many current "off the shelf" products.

We at BRL are working towards reducing the number of network numbers we require, but currently I expect it to take us another month or two to really make progress in this direction; others will need similar time to implement subnet routing within their gateways, and then convert their hosts. I would wager that the subnet tide will not turn until 4.3BSD is in widespread use. Even if 4.3 tapes were to teleport out to all 4.2 sites tomorrow, it would take most sites a month or two to switch, so 4.3 is not the cure to our immediate woes.
I predict that if everything goes well, and all the core gateways are enhanced to 11/23 systems, and most sites drop back to using just one or two net numbers, we might just barely survive the continued InterNet growth until the GGP replacement (core IGP, really) is designed and implemented. Maybe.

Best,
 -Mike
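
The relief that subnetting offers to the core tables can be sketched with Python's standard ipaddress module. This is only an illustration under stated assumptions: the addresses are hypothetical, four LANs per site is an arbitrary choice, and the sketch assumes that each distinct network number advertised to the core occupies one slot while subnets stay visible only inside the site.

```python
import ipaddress
from itertools import islice

# Hypothetical addresses, chosen only for illustration.
# Before: each LAN at the site has its own network number,
# so each one needs its own slot in the core.
separate_lans = [ipaddress.ip_network(f"192.0.{i}.0/24") for i in range(4)]

# After: one class B network number, divided into subnets that
# only the site's own gateways need to know about.
site = ipaddress.ip_network("128.66.0.0/16")
subnetted_lans = list(islice(site.subnets(new_prefix=24), 4))


def core_slots(advertised):
    """Count the distinct network numbers the core would have to hold."""
    return len(set(advertised))


slots_before = core_slots(separate_lans)
# With subnetting, the site advertises only the enclosing network.
slots_after = core_slots([site])

# Every internal LAN is covered by the single advertised network.
assert all(lan.subnet_of(site) for lan in subnetted_lans)

print(slots_before, slots_after)  # prints: 4 1
```

The saving comes at the cost noted in the message above: every host has to be renumbered into the single network, and every gateway and vendor device on the site has to understand the subnet mask.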
martin%blade@MOUTON.ARPA (Martin J Levy) (12/14/85)
as an extra note to the one about TAC terminal connections and gateway connections being respectively hard and easy, i would also like to ask the question: "why are core gateways still 11/03's and not 11/23's or even 73's?" in these days, with the reduced cost of these processors and memory, and even with the software knowledge of how to program the memory mapping registers of these processors, why is there still a restriction on the number of network numbers that can be held in the gateway's memory? if i remember correctly, the C-GATE and the BRL gateways both use memory mapped code.

please don't take this as a vote against subnetting, but more as a note from someone who is worried about what will happen later, when the number of networks goes up. the subnet solution will help big sites (like us), but not the lots of small sites where one cable is all they have anyway.

martin levy. bellcore, nj.
HEDRICK@RED.RUTGERS.EDU (Charles Hedrick) (12/16/85)
There have been a couple of messages implying that you can just plug into the Internet. We connected a gateway a few months ago. At that time, it was necessary to get approval to connect a network to the Internet. It was also necessary to get approval to change which machine acted as a gateway.

As you probably know, there is a separate process to apply for an Internet network number. It is interesting in the context of this discussion that the application implies that the authorities would rather give you a class C address or a range of class C addresses than a single class B address. In fact, an industrial group which I have been working with did get a range of class C addresses. (They have no immediate plans to connect to the Internet.)
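
Why the class C preference is interesting for the slot discussion can be shown with a little arithmetic. The sketch below uses the standard classful ranges (first octet below 128 is class A, 128 to 191 is class B, 192 to 223 is class C). It assumes one core routing entry per network number; the addresses and the choice of eight class C networks are hypothetical.

```python
def address_class(address):
    """Return the classful category of a dotted-quad address."""
    first_octet = int(address.split(".")[0])
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    return "other"


# Usable host addresses per network, excluding the all-zeros
# and all-ones host parts.
HOSTS_PER_NETWORK = {"A": 2**24 - 2, "B": 2**16 - 2, "C": 2**8 - 2}


def footprint(networks):
    """Return (routing entries needed, total host capacity)."""
    entries = len(networks)  # assumption: one entry per network number
    hosts = sum(HOSTS_PER_NETWORK[address_class(n)] for n in networks)
    return entries, hosts


class_c_range = [f"192.0.{i}.0" for i in range(8)]  # hypothetical range
single_class_b = ["128.66.0.0"]                      # hypothetical network

print(footprint(class_c_range))   # prints: (8, 2032)
print(footprint(single_class_b))  # prints: (1, 65534)
```

Under these assumptions a range of class C numbers conserves address space but multiplies the number of entries the core must carry once the site connects, which is the resource currently in short supply.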