BILLW@SU-SCORE.ARPA (William "Chops" Westfield) (09/27/86)
Both response and throughput between Stanford and SRI are pretty awful, and they are only one IMP apart. Trying to FTP a file from a host that is further away seems nearly impossible. Is this just a local problem, say with the Stanford IMP, or are other people having similar problems? Note that this is NOT a gateway-related problem, since for many of the paths I've tried, no gateways should be involved. BillW -------
SRA@XX.LCS.MIT.EDU (Rob Austein) (09/28/86)
Bill, No, the ARPANET problem is definitely not just at Stanford. MIT has been moderately crippled by this for weeks now (since the start of the fall semester, which is probably -not- a coincidence). MC and XX have a hard time talking to each other and they are on the same IMP. The NOC claims that this is true for pretty much the entire ARPAnet. Apparently MILNET is somewhat better off.

The NOC is referring to this mess as a "congestion problem" at the IMP level. The current theory, the last few times I talked to the NOC, was that we have managed to reach the bandwidth limit of the existing hardware. A somewhat scary thought. If this is in fact the case (and there is circumstantial evidence that it is, such as the fact that the net becomes usable again during off hours), we are in for a long siege, since it is guaranteed to take the DCA and BBN a fair length of time to deploy any new hardware or bring up new trunks.

Current thoughts and efforts at MIT are (1) we need more data on the traffic going through the IMPs, and (2) we need to cut down on the amount of traffic going through the IMPs. The two go along with each other to some extent (preliminary results show that roughly 25% of the traffic through the MIT gateway is to or from XX). Some interesting ideas have come up for minimizing load due to email, if that turns out to be a prime offender (surprisingly, the preliminary statistics don't seem to indicate that).

If there is anybody else out there doing analysis of network traffic, please share it. Also, if there is anybody from BBN who knows more about the problem and is willing to share it, -please- do. It's hard to make any kind of contingency plans in a vacuum. --Rob
dave@RSCH.WISC.EDU (Dave Cohrs) (09/28/86)
I don't know why it's so bad, but no, it is *not* a localized problem. Hosts at UW-Madison are also having problems reaching hosts farther away than our local PSN. The worst problems (of course) are reaching hosts on the east coast, especially Rutgers and CSS.GOV sites. The problem seems to be time/day-of-the-week related, so I assume it's a congestion problem (response time seems pretty good right now), but I'm not a net-watcher, so don't take that as gospel.

There also seem to be some severe routing problems. On one occasion this past week, the packet turnaround time from our gateway to the CSS gateway (10.0.0.25) was about 1 sec, while one hop farther, from a host on our Pronet to 10.0.0.25, it was about 8 sec with peaks of over 20 sec, and many packets were lost.

Actually, one site in the Bay Area has started setting up new UUCP links (using good ol' dialup connections) to make sure that their mail will get through. dave
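[The kind of turnaround-time measurement Dave describes reduces to a simple summary over probe samples. A minimal sketch; the sample values are hypothetical, invented only to illustrate the mean/peak/loss summary, and are not Dave's actual data:]

```python
# Summarize round-trip-time samples from a ping-style probe.
# The sample list is hypothetical (None marks a lost probe); it is
# shaped to resemble the 8 sec average / 20 sec peak Dave reports.

def summarize_rtts(samples):
    """Return (mean_rtt, peak_rtt, loss_fraction) for RTT samples
    in seconds, where None means the probe went unanswered."""
    answered = [s for s in samples if s is not None]
    lost = len(samples) - len(answered)
    mean_rtt = sum(answered) / len(answered)
    peak_rtt = max(answered)
    return mean_rtt, peak_rtt, lost / len(samples)

samples = [1.1, 0.9, 8.2, None, 21.5, 7.8, None, 1.0]
mean_rtt, peak_rtt, loss = summarize_rtts(samples)
print(f"mean {mean_rtt:.1f}s  peak {peak_rtt:.1f}s  loss {loss:.0%}")
```

[The point of tracking peak and loss separately from the mean is visible here: a 1 sec "typical" path can hide 20 sec outliers and 25% loss, which is what makes interactive use hopeless while bulk transfer limps along.]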
hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (09/28/86)
We apologize for the problems we have caused other sites. I am well aware that Rutgers is among the hardest places to reach. This is a combination of our 9600 baud line into the IMP and continual crashes of our gateway.

We have now replaced our gateway with a gateway from Cisco. It is based on a 68000. It appears to be more reliable than the old 11/23 code we were using before, and has much better tools to monitor what is going on and adjust things. We think that the reliability problems will largely go away, except for TCP protocol problems with individual hosts on our network.

Early results suggest that the 9600 baud line has enough bandwidth to keep mail and news flowing. We have long since given up on telnet, though at some times of the day even that may now be practical. We are also exploring an upgrade of the line speed.
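[Hedrick's claim that 9600 baud suffices for mail and news but not telnet is easy to sanity-check with back-of-the-envelope arithmetic. A rough sketch; the 80% efficiency factor is an assumed round number for framing and protocol overhead, not a measured figure:]

```python
# Sanity check: what can a 9600 baud trunk actually carry?
# The efficiency factor is an assumption standing in for IP/TCP
# header and link-framing overhead.

line_bps = 9600
efficiency = 0.80                       # assumed payload fraction
bytes_per_sec = line_bps * efficiency / 8
per_day_mb = bytes_per_sec * 86_400 / 1_000_000

print(f"~{bytes_per_sec:.0f} payload bytes/sec, ~{per_day_mb:.1f} MB/day")
```

[Roughly 80 MB/day is plenty for queued mail and news. Telnet suffers for a different reason: at 9600 baud each byte takes about 0.8 ms just to serialize, so remote character echo stacks serialization delay on top of any congestion delay.]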
karn@MOUTON.BELLCORE.COM (Phil R. Karn) (09/28/86)
I wonder how much of the existing congestion problems would go away if DARPA banned all 4.2BSD sites from the net until they convert to 4.3? Phil
ron@BRL.ARPA (Ron Natalie) (09/28/86)
It may not use gateways, but the ping wars between the BBN gateways impact all net performance, as their random behaviour wreaks havoc with the IMPs' virtual-circuit setup time. -Ron
Lixia@XX.LCS.MIT.EDU (Lixia Zhang) (09/29/86)
The following replies to two internet-congestion-related messages together.

    Date: Sat, 27 Sep 1986 21:35 EDT
    From: Rob Austein <SRA@XX.LCS.MIT.EDU>
    Subject: Why is the ARPANet in such bad shape these days?

    ...... The NOC is referring to this mess as a "congestion problem" at
    the IMP level. The current theory the last few times I talked to the
    NOC was that we have managed to reach the bandwidth limit of the
    existing hardware. A somewhat scary thought...

Could someone from BBN provide measured network throughput numbers to convince us that we indeed have hit the HARDWARE bandwidth limit?

    ...If this is in fact the case (and there is circumstantial evidence
    that it is, such as the fact that the net becomes usable again during
    off hours), we are in for a long siege, since it is guaranteed to
    take the DCA and BBN a fair length of time to deploy any new hardware
    or bring up new trunks.

Better performance during off hours surely indicates that the problem is network-load related, but it does not necessarily mean that the DATA traffic has hit the hardware limit -- there is a large percentage of non-data traffic flowing in the net. According to measurements on a number of gateways in the week of 9/15-9/21 (more or less the same for any week):

    43% of all received packets are addressed to a gateway
    48% of all sent packets originate at a gateway

Presumably these gateway-gateway packets are routing updates, ICMP redirects, etc. But why should they take such a high percentage of the total traffic? Can someone explain this to us? Even for data packets, I wonder if anyone has an idea of how much extra traffic is generated by the known extra-hop routing problem. More on this later.

    ALSO, IF THERE IS ANYBODY FROM BBN WHO KNOWS MORE ABOUT THE PROBLEM
    AND IS WILLING TO SHARE IT, -PLEASE- DO. IT'S HARD TO MAKE ANY KIND
    OF CONTINGENCY PLANS IN A VACUUM. --Rob

I capitalized the sentence, hoping no one would pretend not to see it.
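[Lixia's two percentages can be combined into a rough estimate of how much of the gateways' total packet load is control traffic rather than forwarded user data. A back-of-the-envelope sketch; only the 43%/48% figures come from the message, and the absolute packet counts are invented to make the arithmetic concrete:]

```python
# Rough overhead estimate from the gateway measurements quoted above:
# 43% of received packets are addressed to a gateway itself, and 48%
# of sent packets originate at a gateway itself -- the candidates for
# routing updates, ICMP redirects, and other non-data traffic.

received_total = 1_000_000   # hypothetical weekly counts, chosen
sent_total     = 1_000_000   # only to make the percentages concrete

to_gateway   = 0.43 * received_total   # control traffic received
from_gateway = 0.48 * sent_total       # control traffic sent

total   = received_total + sent_total
control = to_gateway + from_gateway
print(f"control traffic: {control / total:.1%} of all packets handled")
```

[Under these assumptions nearly half of every packet a pure forwarding gateway touches is overhead, which is exactly why Lixia argues the net could be load-limited without the data traffic having hit any hardware ceiling.]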
    Date: Sun, 28 Sep 86 04:48:39 edt
    From: hedrick@topaz.rutgers.edu (Charles Hedrick)
    Subject: odd routings

    I have been looking at our EGP routings. I checked a few sites that
    I know we talk to a lot. Our current EGP peers are yale-gw and
    css-ring-gw. (We keep a list of possible peers, and the gateway
    picks 2. It will change if one of them becomes inaccessible. This
    particular pair seems to be fairly stable.) Here is what I found: ......

    MIT: They seem to have 4 different networks. The ones with direct
    Arpanet gateways are 18 (using 10.0.0.77) and 128.52 (using 10.3.0.6).
    EGP was telling us to use 10.3.0.27 (isi) and 10.2.0.37 (purdue)
    respectively...

This is probably caused by the EGP extra-hop problem: if the MIT gateways are EGP neighbors of the isi and purdue gateways, all other core gateways will tell you to go through the isi/purdue gateways to get to MIT, even though everyone is on the ARPANET. This should be a contributor to the congestion too. One question is: can anyone tell us WHEN this extra-hop problem will be completely eliminated?

Another question is how the stubs select core EGP neighbors; if they all concentrate on a small number of core gateways, bottlenecks will be created, because the extra-hop problem means that if a stub gateway EGP-neighbors with a core gateway, most traffic to the stub is likely to travel through that core gateway as well. Hedrick listed their coded-in core EGP gateway candidates in his message. Is the same list used by all non-core gateways? Does anyone know how many stub gateways EGP-neighbor with one core gateway? Would some stub-core rebinding help relieve the congestion?

In short, reducing network overhead and fixing some long-standing protocol problems may be a way to relieve the current poor net performance. Lixia -------
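[The load-concentration worry in Lixia's message can be illustrated with a toy model: if the extra-hop problem routes most traffic for a stub network through the core gateway that stub peers with, then the choice of peers determines how forwarding load piles up. A sketch under invented numbers; all gateway names and traffic volumes here are hypothetical:]

```python
# Toy model of stub/core EGP binding: traffic destined for a stub
# network tends to transit the core gateway the stub is EGP-neighboring
# with (the "extra hop").  If many stubs bind to the same core gateway,
# that gateway becomes a bottleneck.  All numbers are invented.

from collections import Counter

def core_load(bindings, traffic_per_stub):
    """Packets/sec each core gateway must forward, given a mapping
    of stub gateway -> chosen core EGP neighbor."""
    load = Counter()
    for stub, core in bindings.items():
        load[core] += traffic_per_stub[stub]
    return load

traffic = {"rutgers-gw": 40, "mit-gw": 90, "wisc-gw": 30, "brl-gw": 60}

# Most stubs happen to bind to one popular core gateway:
concentrated = {"rutgers-gw": "css-gw", "mit-gw": "isi-gw",
                "wisc-gw": "css-gw", "brl-gw": "css-gw"}

# The same stubs spread across four core gateways:
spread = {"rutgers-gw": "css-gw", "mit-gw": "isi-gw",
          "wisc-gw": "purdue-gw", "brl-gw": "yale-gw"}

print("concentrated:", dict(core_load(concentrated, traffic)))
print("spread:      ", dict(core_load(spread, traffic)))
```

[The total offered load is identical in both cases; only the binding choice changes. That is the sense in which Lixia suggests stub-core rebinding alone, with no new trunks, might relieve the worst of the congestion.]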
swb@DEVVAX.TN.CORNELL.EDU (Scott Brim) (09/29/86)
Lixia: I've always wondered about figures like that. Aren't the overwhelming majority of the gateways on Arpanet also decent-sized hosts in their own right -- so that much of the traffic in your figures might be legitimate user traffic? Scott p.s. talk about degenerative congestion -- when the network gets slow we all start sending gobs of mail back and forth in order to improve it!
Lixia@XX.LCS.MIT.EDU (Lixia Zhang) (09/29/86)
Scott, As far as I know, the numbers in my message were from measurements (by BBN) on pure forwarding gateways, NOT including hosts. Lixia

P.S. Also talking about degenerative congestion -- if no one used the net, surely no congestion would exist, but probably neither would the net itself. With no congestion, people would still send mail daily, though probably on different subjects. -------
mike@BRL.ARPA (Mike Muuss) (10/02/86)
Many sites with really large sets of LANs (including MIT and BRL) run dedicated IP gateways as their attachment to the IMPs. In these cases, all traffic on those IMP ports is either user traffic or EGP. BRL-GATEWAY and BRL-GATEWAY2 are pretty high up among the largest sources of packets on the MILNET today. When our Cray-XMP48 comes online on 2 November, I expect our MILNET trunks to melt. -Mike