trn@aplcen.apl.jhu.edu (Tony Nardo) (02/27/90)
For the past few weeks, links between the *.jhuapl.edu nodes and the non-MILNET community have been somewhat unstable. Today, however, is the first time that I've seen an extreme case of gateway thrashing:

warper.110% traceroute uunet.uu.net
traceroute to uunet.uu.net (192.48.96.2), 30 hops max, 40 byte packets
 1  apl-b3-gw (128.244.3.1)  0 ms  10 ms  0 ms
 2  apl-gw (128.244.1.1)  0 ms  10 ms  0 ms
 3  RESTON-DCEC-MB.DDN.MIL (26.21.0.104)  290 ms MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  320 ms  330 ms
 4  MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  690 ms CAMBRIDGE-MB.DDN.MIL (10.3.0.5)  430 ms  430 ms
 5  CAMBRIDGE-MB.DDN.MIL (10.3.0.5)  740 ms * MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  1340 ms
 6  MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  1210 ms CAMBRIDGE-MB.DDN.MIL (10.3.0.5)  2060 ms  2080 ms
 7  CAMBRIDGE-MB.DDN.MIL (10.3.0.5)  990 ms MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  1290 ms  1710 ms
 8  * CAMBRIDGE-MB.DDN.MIL (10.3.0.5)  1640 ms *
 9  * MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  3010 ms  2490 ms
10  MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  2240 ms * *
11  CAMBRIDGE-MB.DDN.MIL (10.3.0.5)  2220 ms * *
12  * CAMBRIDGE-MB.DDN.MIL (10.3.0.5)  3920 ms *
13  * * *
14  * * *
15  * * *
etc.

Does anyone have any insights as to how this thrashing starts? How it may be stopped?

--
Tony Nardo, INET: trn@warper.jhuapl.edu, trn@aplcen.apl.jhu.edu
Johns Hopkins Univ./APL UUCP: {backbone!}mimsy!aplcen!trn
Quote(s) relocated to my finger .plans
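The pattern in the trace above (the same two mail bridges answering at hop after hop) can be spotted mechanically. What follows is only a rough sketch, not part of the original post: it assumes each hop has been rejoined onto a single line that starts with the hop number, and it assumes every responding gateway is printed with its address in parentheses. The function names and the sample excerpt are illustrative choices. A gateway that answers at more than one hop number suggests the packets are visiting it repeatedly, which is what a routing loop looks like from traceroute.

```python
import re
from collections import defaultdict

# "(a.b.c.d)" address fields as printed by traceroute.
ADDR_RE = re.compile(r"\((\d+\.\d+\.\d+\.\d+)\)")
# A hop line: optional leading spaces, the hop number, then the probe results.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")


def hops_by_gateway(trace_text):
    """Map each responding gateway address to the set of hop numbers where it answered."""
    seen = defaultdict(set)
    for line in trace_text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # skip the command line, the header line, "etc."
        hop = int(match.group(1))
        for addr in ADDR_RE.findall(match.group(2)):
            seen[addr].add(hop)
    return seen


def looping_gateways(trace_text, min_hops=2):
    """Return gateways that answered at min_hops or more distinct hop numbers."""
    return {
        addr: sorted(hops)
        for addr, hops in hops_by_gateway(trace_text).items()
        if len(hops) >= min_hops
    }


# Excerpt of the trace from the post (hops 1 to 5).
SAMPLE = """\
traceroute to uunet.uu.net (192.48.96.2), 30 hops max, 40 byte packets
 1  apl-b3-gw (128.244.3.1)  0 ms  10 ms  0 ms
 2  apl-gw (128.244.1.1)  0 ms  10 ms  0 ms
 3  RESTON-DCEC-MB.DDN.MIL (26.21.0.104)  290 ms MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  320 ms  330 ms
 4  MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  690 ms CAMBRIDGE-MB.DDN.MIL (10.3.0.5)  430 ms  430 ms
 5  CAMBRIDGE-MB.DDN.MIL (10.3.0.5)  740 ms * MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  1340 ms
"""

if __name__ == "__main__":
    for addr, hops in looping_gateways(SAMPLE).items():
        print(addr, "answered at hops", hops)
```

On this excerpt the sketch flags 26.6.0.103 (hops 3, 4, 5) and 10.3.0.5 (hops 4, 5). It says nothing about why the bridges hand packets to each other; it only makes the symptom easy to log over time.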
curt@dtix.dt.navy.mil (Welch) (02/27/90)
In article <4790@aplcen.apl.jhu.edu> trn@aplcen.apl.jhu.edu (Tony Nardo) writes:
>For the past few weeks, links between the *.jhuapl.edu nodes and the
>non-MILNET community have been somewhat unstable. Today, however, is
>the first time that I've seen an extreme case of gateway thrashing:
>
>warper.110% traceroute uunet.uu.net
>traceroute to uunet.uu.net (192.48.96.2), 30 hops max, 40 byte packets
> 1 apl-b3-gw (128.244.3.1) 0 ms 10 ms 0 ms
> 2 apl-gw (128.244.1.1) 0 ms 10 ms 0 ms
> 3 RESTON-DCEC-MB.DDN.MIL (26.21.0.104) 290 ms MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 320 ms 330 ms
[multiple MB hops deleted]
>12 * CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 3920 ms *
>etc.
>
>Does anyone have any insights as to how this thrashing starts? How it
>may be stopped?

We have been seeing this same problem for weeks. One minute, traceroute shows a normal route off of the MILNET through one of the mail bridges, and the next minute, we see traceroute output like the example above. Our packets are being passed around between the mail bridges, but they never leave the MILNET/ARPANET.

Whenever this gateway thrashing starts, it lasts long enough to break TCP connections. It has gotten so bad in the last week that it almost stopped our news feed. The nntp connections, when they could get started, would only last for about 3 to 5 minutes before being disconnected.

For weeks, ftp and telnet connections to anywhere off of the MILNET have been terrible. They would only last a few minutes before disconnecting, and even when they were connected they were really too slow to use.

For the past few months, ftp connections to non-MILNET sites have been getting worse and worse. I installed traceroute a month ago in an effort to get a handle on these network problems. When I first saw this problem, I assumed that some of the mail bridges must be going down. Now, I would guess that this problem is being caused by too much traffic through the mail bridges.
Who runs the mail bridges and who can tell me what's going on? What has been changing in the past few months that has caused this? Has the traffic really been increasing or has the number of gateways been decreasing? Or is something much more complex causing this problem?

Why do the mail bridges bounce packets around like that? Do they really think that the best route is through the other bridge, or do they use a packet routing algorithm that gives the packets to another bridge when the queue for the outbound link is full?

Who do I need to contact to get this problem resolved? Do we have to get a connection to NSFnet to get away from this problem? Is there anything we could be doing wrong to cause this? Is there anything we can change to get around this problem?

Thanks in advance for any help anyone can give us. (while I can still talk to you...)

Curt Welch
curt@dtix.dt.navy.mil

P.S. Our gateway to the MILNET is through dtrc-b1-gw.dt.navy.mil, a cisco router, MILNET address 26.22.0.81.
ron@MANTA.NOSC.MIL (Ron Broersma) (03/02/90)
I'm wondering if some of this gateway thrashing isn't related to the fact that EGP packets from the MILNET core started exceeding 4096 bytes a month or two ago. At that time, I was tracing some thrashing problems and I noticed the following symptoms.

Over the course of an hour, the packets would gradually increase in size. Just as they got within 10 to 20 bytes of 4096, many of the EGP implementations would suddenly start getting checksum errors or buffer overflows because they had 4K buffers. The ones that got a few packets with bad checksums would suddenly stop peering with that core gateway. Then, all of a sudden the EGP packets out of the core would be smaller by a few hundred bytes because of many fewer peers. As the EGP players all tried to acquire a different gateway, they would not get the checksum errors for an hour or so until the packets approached 4096 bytes and they would again perform this dance-of-the-gateways.

The message here is to make sure your EGP implementation can handle packets larger than 4K bytes. The most recent gated supports 8K packets as I recall.

Something I had considered was to make a list of all the networks that disappeared from the EGP packets right after the "dance". Then if one could determine who announced those nets to the core you could get a handle on where the broken EGP implementations were located.

There's some other strangeness going on too. We had a case this week where one site running EGP was announcing its network to the core but the core wasn't telling anybody else about it. By peering with a different mailbridge, it started working. Strange.

And to top it off, the ground started shaking yesterday. But we think that is an unrelated (hardware) problem.

--Ron
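The bookkeeping Ron describes (list the networks that vanish from the EGP updates right after the "dance", then group them by whoever announced them) amounts to a set difference followed by a lookup. The sketch below is only an illustration under assumptions: it presumes the network lists have already been extracted from two EGP updates by some other tool, and that a mapping from network to announcing gateway is available from somewhere. The function names, the gateway labels "gw-a" and "gw-b", and the network numbers are all hypothetical placeholders, not data from this thread.

```python
from collections import defaultdict


def vanished_networks(before, after):
    """Networks present in the update before the 'dance' but absent afterwards."""
    return set(before) - set(after)


def suspects_by_announcer(before, after, announced_by):
    """Group vanished networks by the gateway that announced them to the core.

    announced_by maps network -> announcing gateway. Networks with no known
    announcer are grouped under the key None.
    """
    suspects = defaultdict(set)
    for net in vanished_networks(before, after):
        suspects[announced_by.get(net)].add(net)
    return dict(suspects)


if __name__ == "__main__":
    # Hypothetical snapshots and announcer table, for illustration only.
    before = {"192.0.2.0", "198.51.100.0", "203.0.113.0"}
    after = {"203.0.113.0"}
    announced_by = {
        "192.0.2.0": "gw-a",
        "198.51.100.0": "gw-a",
        "203.0.113.0": "gw-b",
    }
    for gateway, nets in suspects_by_announcer(before, after, announced_by).items():
        print(gateway, "lost", sorted(nets))
```

A gateway whose networks drop out after every cycle would be a reasonable first place to look for an EGP implementation with a 4K buffer, though a single disappearance could have other causes.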
mcdaniel%hqeis.decnet@HQAFSC-VAX.AF.MIL ("HQEIS::MCDANIEL") (03/03/90)
Andrews AFB
I N T E R O F F I C E   M E M O R A N D U M

Date: 02-Mar-1990 11:46am EST
From: Mr Rodney McDaniel MCDANIEL
Dept: HQ AFSC/SCXP
Tel No: AV 858-7909 COMM 981-7909
Owner: Mr Rodney McDaniel

TO: _MAILER! ( _DDN[TCP-IP@NIC.DDN.MIL] )
TO: _MAILER! ( _DDN[NIC@NIC.DDN.MIL] )
CC: _MAILER! ( _DDN[RON@MANTA.NOSC.MIL] )
CC: _MAILER! ( _DDN[CURT@DTIX.DT.NAVY.MIL] )

Subject: RE: *.JHUAPL.EDU -- SERIOUS GATEWAY THRASHING

HAS ANYONE THOUGHT ABOUT CONTACTING THE FOLLOWING OFFICES RELATING TO DDN MILNET PROBLEMS:

CONUS MILNET MONITORING CENTER
AUTOVON 222-2268/5726
COMM: 202-692-2268/5726
EMAIL ADDRESS: DCA-MMC.DCA-EMS.DCA.MIL

CONUS TROUBLE DESK (MILNET & DSNET)
1-800-451-7413
AUTOVON 231-1787
COMM: 202-486-1982

NAVY POINT OF CONTACT: THIS FOLLOW-UP PROBLEM WAS FURTHER IDENTIFIED BY TWO NAVY.MIL SYSTEMS.

NAVAL TELECOMMUNICATIONS COMMAND
ATTN: N521
4401 MASSACHUSETTS AVENUE NW
WASHINGTON, DC 20390-5290
AUTOVON 292-0381
COMM: 202-282-0381
EMAIL: NAVTELCOM@DDN2.DCA.MIL

DCA/DDOM (B651)
WASHINGTON, DC 20305-2000
MAJOR CORDER - MILNET MANAGER
AUTOVON 222-7580
COMM: 202-692-7580
EMAIL ADDRESS: MILNETMGR@DDN3.DCA.MIL

THIS INFORMATION IS AVAILABLE IN DDN NEWSLETTER #56, STORED ON THE NIC.DDN.MIL, <TACNEWS> MENU ITEM 6. OPTION, VIA TELENET, BY CALLING 1-800-235-3155 OR SENDING A REQUEST TO: NIC@NIC.DDN.MIL (USER ASSISTANCE). THIS NEWSLETTER PROVIDES THE POINTS OF CONTACT FOR DDN PROBLEMS.

PLEASE NOTE: A NEW DDN NEWSLETTER #57 IS FORTHCOMING AND A DRAFT CAN BE OBTAINED SAME AS ABOVE OR USING FTP AND REQUESTING NETINFO:WHO-DDN.TXT AND PROVIDES ALL THE DCA DDN PROGRAM OFFICE FUNCTIONS AND PERSONNEL. HOWEVER, STILL AWAITING THE DDN NIC TO POST AN UPDATED VERSION OF DDN NEWSLETTER #56, DATED 8 JUN 88.

HOPE THIS HELPS DIRECTING THE PROBLEM INTO THE PROPER CHANNELS. PLUS, DCA IS RESPONSIBLE FOR THE MAILBRIDGES BETWEEN MILNET & INTERNET SO SUGGEST THIS BE DIRECTED TO THE DDN MILNET POC'S LISTED ABOVE FOR WORKING A POSSIBLE EGP PROBLEM.
WOULD LIKE TO SEE A SUMMARY RESPONSE ON HOW THE PROBLEM WAS CORRECTED ON THE TCP-IP MAILER.

RODNEY A. MCDANIEL, DAFC
AIR FORCE SYSTEMS COMMAND DDN PROGRAM MANAGER
EMAIL: MCDANIEL@HQAFSC-VAX.AF.MIL
ANDREWS AFB MD - AUTOVON 858-7909 - COMM: 301-981-7909