tcp-ip@ucbvax.ARPA (07/25/85)
From: the tty of Geoffrey S. Goodfellow <Geoff@SRI-CSL.ARPA> For the last few months we have noticed a dreadful condition that seems to strike with a good deal of regularity when using a MILNET TAC to connect to an ARPANET host. The same thing also happens when using a local network host gatewayed into the ARPANET which in turn ends up at a MILNET TAC. Specifically, this has to do with interactive "links" where two users are TALKing to one another and there are single-character packets in both directions. The symptom is that output to the person at the TAC goes into molasses mode, where they receive a character from the host once every second or so. This happens on two different operating systems (Tenex and TOPS-20), and, as I said, with directly connected ARPANET hosts as well as those behind a local network gateway. Any ideas what is exacerbating this situation? Anyone else out there experienced it? g
tcp-ip@ucbvax.ARPA (07/25/85)
From: LARSON@SRI-KL.ARPA I published a discussion of this problem on the tops-20 mailing list a month or two ago. The situation seems to result from round trip time estimates being calculated incorrectly. I have a 'fix' that seems to make things better. It is installed on SRI-KL. Alan -------
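Larson's fix itself is not shown, but the round-trip estimator he is talking about is the classic RFC 793 exponential smoothing, from which the retransmission timeout (RTO) is derived. A minimal sketch; the constants and the one-second floor are illustrative, not taken from the TOPS-20 sources:

```python
# Sketch of the classic RFC 793 retransmission-timeout calculation.
# ALPHA, BETA, and the floor/ceiling are illustrative values of the
# era, not Larson's actual fix (which his message does not include).

ALPHA = 0.9          # smoothing gain for the RTT estimate
BETA = 2.0           # variance multiplier for the timeout
RTO_FLOOR = 1.0      # seconds
RTO_CEILING = 60.0   # seconds

def update_rto(srtt, measured_rtt):
    """Fold one round-trip measurement into the smoothed RTT,
    then derive the retransmission timeout from it."""
    srtt = ALPHA * srtt + (1.0 - ALPHA) * measured_rtt
    rto = min(RTO_CEILING, max(RTO_FLOOR, BETA * srtt))
    return srtt, rto
```

A bad sample, for example timing an ack against a retransmitted copy rather than the original send, corrupts srtt and with it every subsequent timeout, which is one way an incorrect estimate could produce the sluggishness Goodfellow describes.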
tcp-ip@ucbvax.ARPA (07/25/85)
From: Mark Crispin <MRC@SIMTEL20.ARPA> Welcome to the club. This problem is with much more than links, as just about anybody who does interactive character-at-a-time traffic between MILNET and ARPANET has found out. I've taken to putting my TELNET into local echo mode and using the local line editor to compose messages line at a time. I don't even try to do any editing across the gateways any more. I believe BBN is doing some work on the problem. -- Mark -- -------
tcp-ip@ucbvax.ARPA (07/25/85)
From: "J. Noel Chiappa" <JNC@MIT-XX.ARPA> It's not just MILNET <-> ARPANET. I've been pissing and wailing about this on MIT-XX since 1980; at that point traffic from my machine to MIT-XX went in via a LAN connected to my machine and a front-end PDP11 on the 20; no 1822 nets at all. It still happens, although the configuration is a little different now. There's still no ARPANET<->MILNET gateway, though. It seems to happen whenever I type any distance ahead of the echoing. Noel -------
tcp-ip@ucbvax.ARPA (07/26/85)
From: mgardner@BBNCC5.ARPA BBN is well aware of the problems and is working on them. --Marianne
tcp-ip@ucbvax.ARPA (07/26/85)
From: CERF@USC-ISI.ARPA Geoff, I wonder if it is possible that the Tenex and TOPS-20 TCPs, or the TAC TCP, react to source quenches (which are likely when sending many short packets) by throttling back on the packet output rate?? Vint
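Cerf's conjecture is mechanical enough to sketch: a sender that halves its output rate on each ICMP Source Quench and recovers only slowly would degrade to roughly one packet per second under a steady stream of quenches, which is the molasses-mode symptom. Everything here (class name, constants, recovery rule) is invented for illustration; it is not the Tenex, TOPS-20, or TAC code.

```python
# Hypothetical quench-reacting sender of the kind Cerf speculates
# about.  All names and constants are illustrative.

class QuenchingSender:
    def __init__(self, pkts_per_sec=20.0):
        self.rate = pkts_per_sec  # current send rate, packets/sec
        self.floor = 1.0          # never drop below ~1 packet/sec

    def on_source_quench(self):
        # Throttle back hard; repeated quenches drive the rate down
        # to the floor, i.e. one character-packet per second.
        self.rate = max(self.floor, self.rate / 2.0)

    def on_quiet_second(self):
        # Linear recovery while no quenches arrive.
        self.rate = min(20.0, self.rate + 1.0)
```

Under this rule a burst of five quenches pins a 20-packet/sec sender at the one-per-second floor, and recovery takes many quiet seconds, so the user keeps seeing the trickle long after the congestion has cleared.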
tcp-ip@ucbvax.ARPA (08/09/85)
From: mgardner@BBNCC5.ARPA Lixia, It is not easy to give a brief answer to your question of what exactly the problems with the mailbridges are, but I will do my best.

Gateways are inherently bottlenecks to traffic between two networks. For example, ARPANET and MILNET are reliable networks, but their traffic is funneled through gateways designed to drop data whenever pressed for space. Retransmission at the link level is fast, because the retransmission timer triggers a retransmission fairly quickly. The retransmission timers at the transport layer must be slower, and so retransmission by TCP will affect what the user sees. The interactive user is, of course, most likely to notice. Speeding up this timer, by the way, is not a good solution, since the effect is increased congestion and poorer service for everyone. (More dropped datagrams, more retransmitted datagrams.)

Another reason that the internet will never function as well as a subnet is that the gateways link heterogeneous systems. If one side is sending much faster than the other side is receiving, the gateways are designed to drop datagrams. These problems are exacerbated by the current lack of buffer space in the LSI/11s, by the lack of an effective means of slowing down a source, and by a rudimentary routing metric that does not allow routing to respond to bursts in traffic.

The mailbridges are a worse bottleneck than other gateways for several good reasons. First, they were placed with the idea that the traffic between them would be filtered for mail. We expected a reduction in traffic. On the contrary, since the physical split of ARPANET and MILNET, there has been a sharp rise in the amount of traffic between the two networks. The bridges are overloaded. In addition, there are a number of hosts which send almost all their traffic to the other net. These hosts may be on the wrong network. A third problem for the mailbridges is load-sharing.
It is important that the traffic between the two networks be spread among the different mailbridges. This is the function of the load-sharing tables. But this is static routing, based on expected traffic. Since the destination is not known in advance, the routing most likely to provide good service is to home a host to its nearest mailbridge. However, when the host has a one- or two-hop path on one side of the mailbridge and a five- or six-hop path on the other side, the mailbridge will see speed mismatch problems, similar to those associated with mismatched network speeds. The solution is not to ignore the load-sharing, since everyone sending to the same bridge would create even worse problems.

These are the problems we see in a perfect world, where hardware and software problems have been banished. Unfortunately, we live in the real world. The software and hardware problems themselves can be in the hosts, the lines, or the network. They are usually hard to diagnose, since the symptom of the problem, for example congestion, may be physically remote from the source of the problem. It is often not even clear where in the chain the problem lies. For example, is congestion at an ISI IMP caused by the mailbridge, by ARPANET congestion around ISI, by back-up from a local net, by ARPANET congestion remote from ISI, by a host at another IMP, or by still another factor?

I look at mailbridge statistics every day. I see, almost daily, the effects of host problems. Although these problems are most often caught by the host administrators, and, if not, are tracked by our monitoring center, let me list a few of the problems that I followed personally. I have seen a run-away ethernet bring MILISI to its knees, a gateway with a routing plug cause congestion felt by a host on the other side of the network, and three cases of hosts flooding the network with faulty IP datagrams. The internet is pathetically vulnerable to congestion caused by a single host.
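The load-sharing tables Marianne describes amount to a static host-to-bridge map consulted ahead of time rather than per datagram, which is why they cannot adapt when the expected traffic pattern turns out wrong. A toy rendering; the bridge names and homings are hypothetical, not the actual tables:

```python
# Toy static load-sharing table: each host is homed to a fixed
# mailbridge chosen in advance from expected traffic, not routed
# dynamically by current load.  All entries are invented.

HOMING_TABLE = {
    "SRI-KL":  "MILBBN",   # nearest bridge by expected hop count
    "MIT-XX":  "MILBBN",
    "USC-ISI": "MILISI",
}
DEFAULT_BRIDGE = "MILARPA"

def mailbridge_for(host):
    """Static routing: the table, not current load, picks the bridge."""
    return HOMING_TABLE.get(host, DEFAULT_BRIDGE)
```

Because the table is fixed, a burst of cross-net traffic from one host simply piles onto its assigned bridge; nothing reroutes it, which is the speed-mismatch scenario described above.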
At BBN we have a number of tools to monitor the long-range performance of the internet. The gateways send messages, called traps, any time an event of interest occurs. We summarize these on a daily basis, and keep the detailed trap reports on hand for use when we see a problem. The gateways store throughput information, including how many datagrams were processed by each gateway, summarized for the gateway, and separated by interface or neighbor. Throughput reports give us detailed information, such as how many datagrams are dropped (discarded) by the gateway, broken down by reason, and the number of datagrams sent back out the same interface they used on arrival. We can also collect statistics on the number of datagrams between each source and destination host. In addition, we can measure a wide range of parameters in ARPANET or MILNET. These include detailed throughput statistics, and statistics about the end-to-end traffic and about the store-and-forward traffic.

But even with all these tools (and others) at our disposal, we are stopped at the host. There we find TCP/IP implementations written by many different people and containing subtle differences in interpretation that could lead to major problems.

Given this range of sources for the problems, what can we, at BBN, do to improve the situation? Keep in mind that we can affect the mailbridges, the IMPs, and, since we monitor the lines, the line quality, but we can only open a discussion concerning host problems. Analysis of the host-to-mailbridge traffic data has revealed that there are a number of hosts (including TACs) sending most of their traffic to the other net. Some of this traffic can be moved off the internet, reducing the load, by the addition of TACs and the rehoming of hosts. We are considering adding a mailbridge. Software to increase the number of buffers in the LSI/11 gateways has already been written.
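The per-gateway bookkeeping described above (forwarded datagrams per interface, drops broken down by reason, and datagrams sent back out their arrival interface) can be sketched as a small counter structure. The field and reason names are invented, not BBN's actual report format:

```python
# Sketch of hypothetical per-gateway throughput bookkeeping of the
# kind the BBN monitoring reports describe.  Names are invented.
from collections import Counter

class GatewayStats:
    def __init__(self):
        self.forwarded = Counter()   # datagrams out, per interface
        self.dropped = Counter()     # drops, per reason
        self.turned_around = 0       # sent back out the arrival interface

    def record(self, in_if, out_if, drop_reason=None):
        if drop_reason is not None:
            self.dropped[drop_reason] += 1
        else:
            self.forwarded[out_if] += 1
            if out_if == in_if:
                self.turned_around += 1
```

The turned-around count is the interesting one operationally: datagrams leaving by the interface they arrived on often indicate a routing problem rather than ordinary transit traffic.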
We are investigating ways to reduce the control traffic, which should also reduce the load on the mailbridges. We have increased our attention to host problems and are notifying the host administrator when we see problems. We are also considering writing guidelines for optimizing communication with ARPANET/MILNET. These would include appropriate settings for retransmission timers and sending rates. They should also include guidelines for reasonable responses to source quenches, those largely ignored messages sent by the gateway to a host which is sending data too fast. I hope this answers your question and will open up some interesting discussion on this mailing list. Marianne
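Marianne's warning that speeding up the transport retransmission timer only worsens congestion can be put in toy arithmetic: if a gateway drops a fraction p of arrivals and every drop is eventually retransmitted, the steady-state offered load grows as 1/(1-p), so anything that generates extra retransmissions raises p and feeds back on itself. The model below is purely illustrative:

```python
# Toy steady-state model of retransmission load through a lossy
# gateway.  Assumes every dropped datagram is retransmitted until
# delivered; numbers are illustrative only.

def offered_load(new_pkts, drop_rate):
    """Total arrivals per unit time when a fraction `drop_rate` of
    arrivals is dropped and re-offered: the geometric series
    new * (1 + p + p^2 + ...) = new / (1 - p)."""
    return new_pkts / (1.0 - drop_rate)
```

At a 20% drop rate, 100 new datagrams become 125 arrivals; at 50%, they become 200. A hastier timer that retransmits datagrams which were not actually lost inflates the numerator as well, making the congestion strictly worse.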
tcp-ip@ucbvax.ARPA (08/12/85)
From: Charles Hedrick <HEDRICK@RUTGERS.ARPA> To complicate things, the host administrators often don't know that much about how their software works. When somebody posts a message on the net saying that some horrible thing is causing some inconceivable result, I have no way of knowing whether any of my hosts are contributing. I run TOPS-20, Unix, and Eunice TCP's, and I do not know the details of any of the TCP implementations. (With Eunice I do not even have access to the source.) If you sent me a patch and told me to install it, I would, but if you asked me whether my retransmission gizmo was frabulating the gateway matter-antimatter mix, I would have no way to respond. I'm not sure quite what you can do about this, but in some ways it may make your problem easier. What you probably need is one or two knowledgeable sites for each OS. Then you could download fixes they develop to the rest of us. You will also have to find a stick big enough to get these fixes put into the next release from the vendor. Maybe DCA could arrange to have NORAD point a few missiles in the direction of <name omitted to protect the guilty, of which there are several>. One problem that is making this more complex is that the natural experts on TOPS-20 TCP are ISI and BBN, but their code has diverged from the code supported by DEC and used by the less sophisticated sites such as ourselves. This is an area that seems particularly amenable to the use of strategic weaponry. Whether the missiles should be pointed at Marlboro or Cambridge and California is a decision I would be happy to leave up to you. (There are some unpleasant politics hiding behind the surface here, which I am going to avoid talking about in public, at least at the moment.) -------
tcp-ip@ucbvax.ARPA (08/12/85)
From: Dan <LYNCH@ISIB> Charles, Your displeasure at some combination of ISI/BBN/DEC for the sorry state of affairs in TCP updates/maintenance is noted. Since I was in the middle of that menage for a few years I can shed some light (and dark?) on the subject. There are two main issues: 1) Money 2) Research. Take the "research" issue first. Many of the "problems" seen in TCP usage are truly complicated and need to be examined carefully in the diverse internet environment. That brings up "money"... DEC gets money for selling machines (and attendant software). BBN gets money for doing research on networking (and for operating some networks). ISI gets money for running systems and keeping customers content. The above simplifications are accurate enough for this diatribe. The major flaw in the above division of effort is that the vendor, DEC, does not spend enough money on making a great TCP for TOPS20. They do not live in the Internet environment on a daily basis. I am sure that they do a much better job with DECNET because they live in that environment daily. And make money on it. As for BBN, they have many fish to fry these days and have been known to refuse to work on a problem unless they got paid for it. ISI (where I was located from 1980-1983) basically gave up on both DEC and BBN as timely sources of help in resolving vexing performance and functionality problems. We relied on them heavily for longer-term solutions while we tried to keep our systems on the air for our thousands of users. ISI would readily give out its code to anyone who had a source license from DEC. Of course the recipient would have to take out our ISI site-specific enhancements to get a running system... And we did not have a lot of time/energy to promulgate and assist others in the quest of a stable, high-performance TCP. That's a short recap of history. What did we learn and what can we do better in the future?
We learned that Internetting is very complex, that declaring something to be a product does not make it so, and that money is the root of all good. I'd better cut it short on the "future" part. Since TOPS20 is dying I don't see much impetus (money) for improving the mechanisms in that arena. But Unix sure ain't dead, nor is VMS. If improvements are to be readily produced and distributed then I suggest that some entity be formed (or identified as existing) and funded to do a quality job for all internet users. Laissez faire just doesn't cut it. Dan PS. I have been entering this via a Milnet TAC to the Arpanet host at ISIB and have held my breath until now! Geoff, thank you for airing this subject. The stuttering and delays are awesome.