trn@aplcen.apl.jhu.edu (Tony Nardo) (02/27/90)
For the past few weeks, links between the *.jhuapl.edu nodes and the
non-MILNET community have been somewhat unstable. Today, however, is
the first time that I've seen an extreme case of gateway thrashing:
warper.110% traceroute uunet.uu.net
traceroute to uunet.uu.net (192.48.96.2), 30 hops max, 40 byte packets
1 apl-b3-gw (128.244.3.1) 0 ms 10 ms 0 ms
2 apl-gw (128.244.1.1) 0 ms 10 ms 0 ms
3 RESTON-DCEC-MB.DDN.MIL (26.21.0.104) 290 ms MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 320 ms 330 ms
4 MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 690 ms CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 430 ms 430 ms
5 CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 740 ms * MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 1340 ms
6 MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 1210 ms CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 2060 ms 2080 ms
7 CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 990 ms MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 1290 ms 1710 ms
8 * CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 1640 ms *
9 * MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 3010 ms 2490 ms
10 MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 2240 ms * *
11 CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 2220 ms * *
12 * CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 3920 ms *
13 * * *
14 * * *
15 * * *
etc.
Does anyone have any insights as to how this thrashing starts? How it
may be stopped?
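A trace like the one above can be checked mechanically: a gateway that
answers at more than one hop count has handled the same packet more than
once, which is the signature of a loop. The following is a minimal Python
sketch of that check, not a tool from the original discussion. The
function names (parse_hops, looping_gateways) and the regular expressions
are illustrative assumptions, and the parser assumes the classic
traceroute output format shown above, with each responding gateway's
address in parentheses.

```python
import re

# Assumption: each hop line starts with the hop number, and every
# responding gateway is printed as "name (a.b.c.d)".
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
ADDR_RE = re.compile(r"\((\d+\.\d+\.\d+\.\d+)\)")

# First six hops of the trace posted above.
SAMPLE = """\
1 apl-b3-gw (128.244.3.1) 0 ms 10 ms 0 ms
2 apl-gw (128.244.1.1) 0 ms 10 ms 0 ms
3 RESTON-DCEC-MB.DDN.MIL (26.21.0.104) 290 ms MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 320 ms 330 ms
4 MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 690 ms CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 430 ms 430 ms
5 CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 740 ms * MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 1340 ms
6 MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 1210 ms CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 2060 ms 2080 ms
"""


def parse_hops(text):
    """Return {hop_number: [addresses that answered at that hop]}."""
    hops = {}
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # prompt, banner line, "etc.", blank lines
        hops[int(match.group(1))] = ADDR_RE.findall(match.group(2))
    return hops


def looping_gateways(hops):
    """Return {address: [hop numbers]} for addresses seen at several hops."""
    seen = {}
    for ttl in sorted(hops):
        for addr in set(hops[ttl]):  # count each address once per hop
            seen.setdefault(addr, []).append(ttl)
    return {addr: ttls for addr, ttls in seen.items() if len(ttls) > 1}


if __name__ == "__main__":
    for addr, ttls in sorted(looping_gateways(parse_hops(SAMPLE)).items()):
        print(addr, ttls)
```

On the six hops above this flags 26.6.0.103 (hops 3 to 6) and 10.3.0.5
(hops 4 to 6). Two different addresses answering at a single hop is not
counted on its own, since that only shows the route changed between
probes.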
--
Tony Nardo, INET: trn@warper.jhuapl.edu, trn@aplcen.apl.jhu.edu
Johns Hopkins Univ./APL UUCP: {backbone!}mimsy!aplcen!trn
Quote(s) relocated to my finger .plans
curt@dtix.dt.navy.mil (Welch) (02/27/90)
In article <4790@aplcen.apl.jhu.edu> trn@aplcen.apl.jhu.edu (Tony Nardo) writes:
>For the past few weeks, links between the *.jhuapl.edu nodes and the
>non-MILNET community have been somewhat unstable. Today, however, is
>the first time that I've seen an extreme case of gateway thrashing:
>
>warper.110% traceroute uunet.uu.net
>traceroute to uunet.uu.net (192.48.96.2), 30 hops max, 40 byte packets
> 1 apl-b3-gw (128.244.3.1) 0 ms 10 ms 0 ms
> 2 apl-gw (128.244.1.1) 0 ms 10 ms 0 ms
> 3 RESTON-DCEC-MB.DDN.MIL (26.21.0.104) 290 ms MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103) 320 ms 330 ms
[multiple MB hops deleted]
>12 * CAMBRIDGE-MB.DDN.MIL (10.3.0.5) 3920 ms *
>etc.
>
>Does anyone have any insights as to how this thrashing starts? How it
>may be stopped?

We have been seeing this same problem for weeks. One minute, traceroute
shows a normal route off of the MILNET through one of the mail-bridges,
and the next minute, we see traceroute output like the example above.
Our packets are being passed around between the mail bridges, but they
never leave the MILNET/ARPANET.

Whenever this gateway thrashing starts, it lasts long enough to break
TCP connections. It has gotten so bad in the last week that it almost
stopped our news feed. The nntp connections, when they could get
started, would only last for about 3 to 5 minutes before being
disconnected.

For weeks, ftp and telnet connections to anywhere off of the MILNET have
been terrible. They would only last a few minutes before disconnecting,
and even when they were connected they were really too slow to use. For
the past few months, ftp connections to non-MILNET sites have been
getting worse and worse.

I installed traceroute a month ago in an effort to get a handle on these
network problems. When I first saw this problem, I assumed that some of
the mail bridges must be going down. Now, I would guess that this
problem is being caused by too much traffic through the mail bridges.

Who runs the mail bridges and who can tell me what's going on? What has
been changing in the past few months that has caused this? Has the
traffic really been increasing or has the number of gateways been
decreasing? Or is something much more complex causing this problem?

Why do the mail bridges bounce packets around like that? Do they really
think that the best route is through the other bridge or do they use a
packet routing algorithm that gives the packets to another bridge when
the queue for the outbound link is full?

Who do I need to contact to get this problem resolved? Do we have to get
a connection to NSFnet to get away from this problem? Is there anything
we could be doing wrong to cause this? Is there anything we can change
to get around this problem?

Thanks in advance for any help anyone can give us. (while I can still
talk to you...)

Curt Welch
curt@dtix.dt.navy.mil

P.S. Our gateway to the MILNET is through dtrc-b1-gw.dt.navy.mil, a
cisco router, MILNET address 26.22.0.81.
ron@MANTA.NOSC.MIL (Ron Broersma) (03/02/90)
I'm wondering if some of this gateway thrashing isn't related to the
fact that EGP packets from the MILNET core started exceeding 4096 bytes
a month or two ago. At that time, I was tracing some thrashing problems
and I noticed the following symptoms.

Over the course of an hour, the packets would gradually increase in
size. Just as they got within 10 to 20 bytes of 4096, many of the EGP
implementations would suddenly start getting checksum errors or buffer
overflows because they had 4K buffers. The ones that got a few packets
with bad checksums would suddenly stop peering with that core gateway.
Then, all of a sudden the EGP packets out of the core would be smaller
by a few hundred bytes because of many fewer peers. As the EGP players
all tried to acquire a different gateway, they would not get the
checksum errors for an hour or so until the packets approached 4096
bytes and they would again perform this dance-of-the-gateways.

The message here is to make sure your EGP implementation can handle
packets larger than 4K bytes. The most recent gated supports 8K packets
as I recall.

Something I had considered was to make a list of all the networks that
disappeared from the EGP packets right after the "dance". Then if one
could determine who announced those nets to the core you could get a
handle on where the broken EGP implementations were located.

There's some other strangeness going on too. We had a case this week
where one site running EGP was announcing its network to the core but
the core wasn't telling anybody else about it. By peering with a
different mailbridge, it started working. Strange.

And to top it off, the ground started shaking yesterday. But we think
that is an unrelated (hardware) problem.

--Ron
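Ron's idea of listing the networks that vanish right after the "dance"
amounts to a set difference between two snapshots of the core's EGP
updates, followed by a tally of who had been announcing the missing
nets. A minimal Python sketch follows, assuming the network lists have
already been extracted from the updates by some other means. The
function names, the announced_by mapping, the gateway names, and the
sample network numbers are all hypothetical and are not taken from the
thread.

```python
from collections import Counter

# Hypothetical snapshots of the networks listed in the core's EGP
# updates just before and just after a "dance". The numbers are
# placeholders, not real MILNET data.
BEFORE = ["192.0.2.0", "198.51.100.0", "203.0.113.0"]
AFTER = ["192.0.2.0"]

# Hypothetical map from network to the gateway that announced it to the
# core. Building this map is the hard part and is not shown here.
ANNOUNCED_BY = {
    "192.0.2.0": "gw-b",
    "198.51.100.0": "gw-a",
    "203.0.113.0": "gw-a",
}


def vanished_networks(before, after):
    """Networks present before the dance but missing afterwards."""
    return sorted(set(before) - set(after))


def rank_announcers(vanished, announced_by):
    """Count vanished networks per announcing gateway, most first."""
    counts = Counter(announced_by.get(net, "unknown") for net in vanished)
    return counts.most_common()


if __name__ == "__main__":
    gone = vanished_networks(BEFORE, AFTER)
    print(gone)
    print(rank_announcers(gone, ANNOUNCED_BY))
```

Gateways that keep turning up near the top of this tally across several
dances would be the first places to look for a 4K-buffer EGP
implementation.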
mcdaniel%hqeis.decnet@HQAFSC-VAX.AF.MIL ("HQEIS::MCDANIEL") (03/03/90)
Andrews AFB
I N T E R O F F I C E   M E M O R A N D U M
Date: 02-Mar-1990 11:46am EST
From: Mr Rodney McDaniel
MCDANIEL
Dept: HQ AFSC/SCXP
Tel No: AV 858-7909 COMM 981-7909
Owner: Mr Rodney McDaniel
TO: _MAILER! ( _DDN[TCP-IP@NIC.DDN.MIL] )
TO: _MAILER! ( _DDN[NIC@NIC.DDN.MIL] )
CC: _MAILER! ( _DDN[RON@MANTA.NOSC.MIL] )
CC: _MAILER! ( _DDN[CURT@DTIX.DT.NAVY.MIL] )
Subject: RE: *.JHUAPL.EDU -- SERIOUS GATEWAY THRASHING
HAS ANYONE THOUGHT ABOUT CONTACTING THE FOLLOWING OFFICES RELATING
TO DDN MILNET PROBLEMS:
CONUS MILNET MONITORING CENTER
AUTOVON 222-2268/5726
COMM: 202-692-2268/5726
EMAIL ADDRESS: DCA-MMC.DCA-EMS.DCA.MIL
CONUS TROUBLE DESK (MILNET & DSNET)
1-800-451-7413
AUTOVON 231-1787
COMM: 202-486-1982
NAVY POINT OF CONTACT:
THIS FOLLOW-UP PROBLEM WAS FURTHER IDENTIFIED BY TWO NAVY.MIL
SYSTEMS.
NAVAL TELECOMMUNICATIONS COMMAND
ATTN: N521
4401 MASSACHUSETTS AVENUE NW
WASHINGTON, DC 20390-5290
AUTOVON 292-0381
COMM: 202-282-0381
EMAIL: NAVTELCOM@DDN2.DCA.MIL
DCA/DDOM (B651)
WASHINGTON, DC 20305-2000
MAJOR CORDER - MILNET MANAGER
AUTOVON 222-7580
COMM: 202-692-7580
EMAIL ADDRESS: MILNETMGR@DDN3.DCA.MIL
THIS INFORMATION IS AVAILABLE IN DDN NEWSLETTER #56, STORED ON
NIC.DDN.MIL UNDER <TACNEWS> MENU ITEM 6. IT CAN BE OBTAINED VIA
TELENET, BY CALLING 1-800-235-3155, OR BY SENDING A REQUEST TO
NIC@NIC.DDN.MIL (USER ASSISTANCE). THIS NEWSLETTER PROVIDES THE
POINTS OF CONTACT FOR DDN PROBLEMS. PLEASE NOTE: A NEW DDN
NEWSLETTER #57 IS FORTHCOMING. A DRAFT CAN BE OBTAINED THE SAME WAY
AS ABOVE, OR BY USING FTP AND REQUESTING NETINFO:WHO-DDN.TXT; IT
PROVIDES ALL THE DCA DDN PROGRAM OFFICE FUNCTIONS AND PERSONNEL.
HOWEVER, STILL WAITING FOR THE DDN NIC TO POST AN UPDATED VERSION OF
DDN NEWSLETTER #56, DATED 8 JUN 88. HOPE THIS HELPS DIRECT THE
PROBLEM INTO THE PROPER CHANNELS. ALSO, DCA IS RESPONSIBLE FOR THE
MAILBRIDGES BETWEEN MILNET & INTERNET, SO SUGGEST THIS BE DIRECTED
TO THE DDN MILNET POCS LISTED ABOVE FOR WORKING A POSSIBLE EGP
PROBLEM. WOULD LIKE TO SEE A SUMMARY RESPONSE ON THE TCP-IP MAILER
DESCRIBING HOW THE PROBLEM WAS CORRECTED.
RODNEY A. MCDANIEL, DAFC
AIR FORCE SYSTEMS COMMAND
DDN PROGRAM MANAGER
EMAIL: MCDANIEL@HQAFSC-VAX.AF.MIL
ANDREWS AFB MD - AUTOVON 858-7909 - COMM: 301-981-7909