rick@SEISMO.CSS.GOV (Rick Adams) (10/31/86)
I can't help but wonder if the poor Internet performance is related to the HORRIBLE routes that EGP says to use. I have seen improvements in round-trip ICMP echo times of 1000% by ignoring the route EGP says to use and manually forcing a route into the system. In many cases, it is the difference between connecting at all and timing out.

Today's horrible case has been routing to 128.96 (bellcore.com) through lbl-milnet-gw instead of the rational relay.cs.net. Other horrible routes have included rutgers through purdue instead of the direct rutgers ARPANET host.

Are the EGP routes supposed to be reasonable? I'm not that familiar with the theory behind them, but in practice, they suck badly. When I can get a 10 to 1 performance improvement by hard coding specific routes to override EGP, I wonder if this is part of the Internet congestion problem. It seems like a major performance gain for everyone could be realized by having the EGP core systems advertise rational routes.

---rick
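The manual override described above amounts to a preference rule: when a hand-entered route and an EGP-learned route exist for the same network, the hand-entered one wins. The following is a minimal sketch of that idea in Python, not the behaviour of any particular gateway or host implementation. The `Route` record, the `PREFERENCE` table, and `select_route` are hypothetical names introduced for illustration; only the network number and gateway names come from the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    network: str  # destination network number, e.g. "128.96"
    gateway: str  # next-hop gateway for that network
    source: str   # how the route was learned: "static" or "egp"


# Assumed policy: a lower number wins, so static entries override EGP ones.
PREFERENCE = {"static": 0, "egp": 1}


def select_route(routes, network):
    """Return the preferred route for a network, or None if there is none."""
    candidates = [r for r in routes if r.network == network]
    if not candidates:
        return None
    return min(candidates, key=lambda r: PREFERENCE[r.source])


# Only the EGP-learned route is known at first.
routes = [Route("128.96", "lbl-milnet-gw", "egp")]
before = select_route(routes, "128.96").gateway

# Forcing a route by hand adds a static entry that takes precedence.
routes.append(Route("128.96", "relay.cs.net", "static"))
after = select_route(routes, "128.96").gateway

print(before, "->", after)
```

The sketch deliberately ignores metrics and timeouts; it only shows why a forced route hides whatever EGP advertises for the same network.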
braden@ISI.EDU (Bob Braden) (10/31/86)
The examples you cite of "horrible" EGP routing are probably due to the extra-hop problem in the core. Apparently we have not done an adequate job of information-spreading, if you are not aware of this problem. I seem to recall a blaze of messages on this very subject within the past 6 months, probably on the tcp-ip list. It began with a complaint almost identical to yours, and ended with a scholarly explanation of the extra-hop problem by Dave Mills. The extra-hop problem can at worst double the core traffic, and it is scheduled to go away when the Butterflies take over the core. I forget the exact predicted date from BBN, but rescue is in sight.

As for performance, in some funny sense EGP is (deliberately) designed for poor performance, in the sense that it is intended to serve as a firewall against misbehaviour by routing domains outside the core. It is true, as Mike StJohns says, that EGP is not a routing protocol; it is also true that this fact has led to serious restrictions in topology, and therefore a crash effort is being mounted to replace EGP with a routing protocol, under the direction of the INENG and INARCH task forces.

However, maybe we are asking too much of EGP. Perhaps we are trying to make it a technical fix for administrative problems. To avoid bad things like oscillations and routing loops in the face of the "diversity" (to use a nice word) of the Internet as a whole, EGP or whatever replaces it will always have to use long time constants and provide some sub-optimal routes. At the present time, the Internet is growing largely by accretion of new Autonomous Systems, and this must lead to some degradation as you cross boundaries. If we want better overall performance, we need to persuade these systems to aggregate into bigger systems, each run by centralized and professional Internet management, and each using a carefully-optimized IGP.
I go into all this polemic because lately I have been exposed to an awful lot of technological optimism (ask NASA about that!) about Internetting. I wish we could convince some of the new players in the Internet game that it takes great technical sophistication and wisdom to make this stuff work well. The Anarchy Model of Internetting, while theoretically feasible due to EGP, is not really a very wise way to go.

Bob Braden
JNC@XX.LCS.MIT.EDU ("J. Noel Chiappa") (11/01/86)
I seem to explain this every 2 months. The problem is not caused by EGP, which is telling you exactly what the gateway you are neighbours with is doing itself with packets to given destinations, but by the routing protocol (GGP) which is used by the core gateways among themselves. It predates EGP, was not designed with the pattern of information flows that you see in EGP in mind, and is the cause of the problem. When GGP is replaced (which will probably be when the PDP11's are) the problem will magically disappear without any changes to EGP.

For a more detailed explanation of the problem, look in the TCP-IP archive for a message I sent out on Thu 6 Mar 86 18:16:01-EST, which goes into great detail. Just out of interest, were you on TCP-IP then?

Noel
rick@SEISMO.CSS.GOV (Rick Adams) (11/03/86)
Let me see if I have this correct. Based on the letters I have received:

There is a major problem with GGP.
This has been known for a long time.
There is no plan to fix it in the foreseeable future.
This problem "at most" doubles the load on the ARPANET.

Can anyone explain why this doesn't warrant immediate attention? If someone told me there was a kernel bug that "at most" wasted 50% of my CPU, I'd be quite concerned about it. I wouldn't wait for the next hardware release and hope it was fixed then. Observation indicates a 10 to 1 degradation in performance, which is not what I would expect from doubling the load.

There seems to be some belief that the BBN Butterfly will be the salvation of the world. I hope the Butterfly being considered is a lot different from the Butterfly sitting about 25 feet from me (css-gateway 10.2.0.25). This particular Butterfly is one of the most unreliable things I have ever seen. It often needs to be MANUALLY (i.e. they call me up) rebooted several times per day.

Waiting for a solution based on the Butterfly seems quite foolish, especially when people are forced to install their own leased lines because the ARPANET performance is unacceptable. (We already have 2. I'm sure there are many others. I find it especially ironic that our DARPA project manager cannot use the ARPANET to access our machine (unacceptable performance), but has to use a leased line.)

---rick
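On the gap between "doubling the load" and a 10 to 1 degradation: in a simple queueing model, delay does not grow in proportion to load, so a doubling can cost far more than a factor of two once a link is busy. The sketch below uses the textbook M/M/1 queue, where mean time in system is 1 / (service rate - arrival rate). This is an illustrative assumption only; nothing in the thread says the core gateways behave like M/M/1 queues, and the utilization figures are made up for the example rather than measured. The function names are hypothetical.

```python
def mm1_delay(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue (queueing plus service)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def slowdown_from_doubling(utilization):
    """Factor by which mean delay grows when the offered load is doubled."""
    service_rate = 1.0  # normalised; only the ratio of the two delays matters
    base = mm1_delay(utilization * service_rate, service_rate)
    doubled = mm1_delay(2 * utilization * service_rate, service_rate)
    return doubled / base


# Illustrative starting utilizations (assumed, not measured).
for u in (0.10, 0.30, 0.45, 0.475):
    factor = slowdown_from_doubling(u)
    print(f"utilization {u:.3f} -> {2 * u:.3f}: delay grows by a factor of {factor:.1f}")
```

Under this toy model, doubling a lightly loaded link barely matters, while doubling a link that starts just under half busy multiplies the delay by roughly five to ten. Whether that accounts for the observed numbers is a separate question that the model cannot settle.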