[comp.protocols.tcp-ip] Odd FTP Problem

netcoor@NCS.DND.CA (DRENET Coordinator) (02/23/89)

Has anyone seen, or can anyone explain this problem?

We have users on network 128.43 who have reported trouble retrieving
files from several hosts in the Internet. The FTP connection is opened
and the user and password are exchanged and the login completed message
is received. After this, problems occur for any command for which a
data connection is to be opened. Other commands not needing the data
connection (e.g. cd, ascii, binary) work as expected, but commands like
get and dir fail. The usual message received is:

	< 425 Can't build data connection: Connection timed out

although the message:

	< 425 Can't build data connection: Network is unreachable.

is also common. After this, cd and other commands not needing data
connections still work.

This problem is variable in that sometimes it strikes and sometimes it
doesn't. Sometimes it will interrupt an already started data transfer
(the above messages don't apply in this case).

Network 128.43 is gatewayed onto the ARPANET through a Butterfly gateway
(10.1.0.15). I am on network 192.12.98, which is also gatewayed through
the same Butterfly. I have yet to see the problem affect my host (Ultrix).
The host affected on net 128.43 is a DEC 2065 running TOPS-20. 

What confuses me is that packets are being transferred between the two systems
over the command connection throughout, yet any attempt to establish a
new connection for data fails.

Can anyone explain this? I would sure like to understand what is going on
to create this situation, so that I can try to do something about it.

Thanks.

Bob Bradford				netcoor@ncs.dnd.ca
DREnet Coordinator			(613) 998-2520

mrc@SUMEX-AIM.STANFORD.EDU (Mark Crispin) (02/24/89)

     I've seen this problem in other forms.  Apparently there are a lot of
"ICMP Destination Network Unreachable" messages getting sent in instances
where network connectivity is broken for only a brief duration (perhaps due
merely to congestion at a gateway).

     Some versions of BSD Unix nuke the connection when this or "host
unreachable" occur; it is reputed that patching location _inetctlerrmap+8 in
the kernel from 0x3341 to 0x0 remedies this problem but I haven't been able to
verify it.

     Since the "425 Can't build data connection" message is coming from the
remote server, it suggests that the problem is occurring when the remote server
tries to open a connection to the FTP client on the 128.43 TOPS-20 host.
Because of this, I'm inclined to absolve the TOPS-20 host of guilt
(particularly in the "Network is unreachable" case), and more likely to blame
network routing.
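
For readers following along: in this mode of FTP the server connects back to an address the client supplies, which is why the control connection can stay healthy while every get or dir fails. The C sketch below shows how a client would encode that address in a PORT command as described in RFC 959. The function name format_port_command and the example address and port are made up for illustration; nothing here is taken from the TOPS-20 or BSD sources.

```c
#include <stdio.h>

/*
 * Build the PORT command a client sends before a transfer.
 * ip holds the client's four address bytes, port its listening data port.
 * The port travels as two decimal bytes, high byte first.
 * Returns the number of characters written (as snprintf does).
 */
int format_port_command(char *buf, size_t buflen,
                        const unsigned char ip[4], unsigned int port)
{
    return snprintf(buf, buflen, "PORT %u,%u,%u,%u,%u,%u\r\n",
                    (unsigned int)ip[0], (unsigned int)ip[1],
                    (unsigned int)ip[2], (unsigned int)ip[3],
                    (port >> 8) & 0xffu, port & 0xffu);
}
```

For a hypothetical client at 128.43.0.1 listening on port 1234 this produces "PORT 128,43,0,1,4,210", since 1234 = 4*256 + 210. If the server's attempt to connect to that address is answered with an ICMP unreachable, the server is the one that reports the 425.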

     Question: could an ICMP Destination Network Unreachable happen when
something like an X.25 virtual circuit limitation is reached?

-------

reschly@BRL.MIL ("Robert J. Reschly Jr.") (02/24/89)

      Mark,

   A couple of weeks ago BRL noted severe difficulties with
connectivity. We were able to trace this to ICMP Network Unreachables
(we're a BSD shop), which appeared to be the result of core route
"flopping".  At the end of this message, I'll tack on the message I sent
to BBN on the subject.  The raw data file mentioned in that message
is still available if anyone is masochistic enough to want to look at
it.

   When we spoke to BBN the afternoon before sending that message, they
told us they had identified a problem with routing, and a message
received the next afternoon confirmed that a fix to the problem alluded
to in the phone conversation would be fielded in the next few days.  By
mid-week the following week, connectivity did indeed appear to be better
than previously.

   Since this last change, we still see more routing variability than we
feel should be present, though it does look better than before the
change.  One curious thing: has anyone else noticed the EGP peers
bouncing in and out?  We peer with BMILDCEC, BMILBBN, and BMILMTR in
that order (though we only exchange updates with one at any given time),
and we are continually having to re-acquire one or more of these
beasties (as I write this, the gateway is trying to acquire BMILMTR).
Have these gateways been bouncing up and down a lot?   We have also
started looking at the EGP information we are getting a little more
closely, and have seen hopcounts as high as 62(!).

   In the last few days, our PSN insufficient resource (type 4)
messages are haunting us again.  We had earlier reported these and BBN
reconfigured our PSN with more space allocated to buffers to lessen the
severity of that problem.  I suppose we'll have to complain about this
again.

   Has anyone else noted any interesting behavior since the change?

				Later,
				    Bob 
   --------
Phone:  (301)278-6678   AV: 298-6678    FTS: 939-6678
Arpa:   reschly@BRL.MIL (or BRL.ARPA)   UUCP: ...!brl-smoke!reschly
Postal: Robert J. Reschly Jr.
        U.S. Army Ballistic Research Laboratory
        Systems Engineering and Concepts Analysis Division
        Advanced Computer Systems Team
        ATTN: SLCBR-SE  (Reschly)
        APG, MD  21005-5066             (Hey, *I* don't make 'em up!)

****  For a good time, call: (303) 499-7111.   Seriously!  ****

================
Date:     Thu, 9 Feb 89 5:51:55 EST
From:     "Robert J. Reschly Jr." <reschly@brl.mil>
To:       meason@wash.bbn.com, amalis@bbn.com
cc:       jcst@BRL.MIL
Subject:  More Node 29 troubles.


      Mike,

   Here is a summary of our recent experience and a copy of Phil's message.

   First, the incompletes are still with us though they appear to be at
the reduced level we noted after the PSN buffer configuration changes.
The only note here is that these messages are still coming in at a much
greater rate than before our switching to EGP peering with the
Buttergates.  We are currently seeing these 5 to 10 (on average) times
an hour, rather than 5 to 10 times a day.

   Second, as Phil notes in the enclosed message, we have been suffering
from what looks like significant routing instability since switching to
EGP peering with the Buttergates.  The variability in numbers of reported
routes was noted as soon as we switched, but we did not notice any actual
reachability problems until a while later.  A typical sequence would be:

	Establish a connection (e.g. FTP, TELNET, rlogin); everything
	appears fine, connectivity is good and round trip times are
	reasonable.

	After a few minutes of operation, suddenly the connection
	freezes.  The connection usually closes at this time.

	Attempt to restart the connection -- this usually fails.

	Wait a few minutes, then attempt to restart the connection.
	This usually succeeds as if there was never any problem.  At
	this point the cycle repeats.

   Running an experiment with ping shows that the loss of communication
coincides with the receipt of ICMP Network Unreachable messages.  I ran
a ping experiment against louie.udel.edu to see if I could  duplicate
and record the symptoms today.  I'll include a summary from the first
part of that at the end of this message, and will put the raw data,
(roughly 1.3MB collected over 4 hours between 1800 EST and 2200 EST 8
Feb 1989) in the public FTP area of vgr.brl.mil.  Note that since this
is a script of a terminal session, there are a few control characters
and escape sequences buried in this file.  We currently EGP peer with
the buttergates at DCEC and CAMBRIDGE as our primary and fallback.
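
The loss pattern in a ping transcript like the one at the end of this message can be pulled out mechanically. What follows is a rough C sketch, not the tooling actually used at BRL: parse_icmp_seq and seq_step are hypothetical helpers that read the icmp_seq field from each reply line and report gaps and duplicates.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the sequence number from a ping reply line such as
 * "64 bytes from 128.175.1.3: icmp_seq=96 time=981 ms",
 * or -1 if the line is not an echo reply (timestamps, ICMP errors,
 * gateway console output).
 */
int parse_icmp_seq(const char *line)
{
    const char *p = strstr(line, "icmp_seq=");
    int seq;

    if (p == NULL || sscanf(p, "icmp_seq=%d", &seq) != 1)
        return -1;
    return seq;
}

/*
 * Compare a reply against the highest sequence number seen so far.
 * Returns -1 for a duplicate or late reply, otherwise the number of
 * replies missing between the previous highest and this one
 * (0 means the reply arrived in order). Start with *highest = -1.
 */
int seq_step(int *highest, int seq)
{
    int missing;

    if (seq <= *highest)
        return -1;
    missing = seq - *highest - 1;
    *highest = seq;
    return missing;
}
```

Fed the transcript line by line, this flags the jump from icmp_seq=417 to icmp_seq=514 as 96 missing replies and a repeated icmp_seq=96 as a duplicate.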

   I have also made some changes to the gateway software to extract a
bit more information but have nothing to present at this time.

   The raw data is the composite of a 15 second timestamp loop, the
ping, and the gateway console all smashed together and intertwingled.
The ping generates the "xx bytes" messages as well as the verbose dumps
of most other ICMP messages.  Much of the gateway console output is
prepended by "<process_name>: ", though there are a few messages which
are different (e.g. "ICMP redirect" and "UPTIME" messages).  The gateway
software is of local origin.  If you have any questions about any of it,
get in touch with us and we will clarify.

   Finally, you will find a number of "milr:  msg with link 27 from
4/48" followed by an equal number of "milr: pack len <value1>, format 15,
illen <value2>" messages.  The values range over a small set for each.
We only started noticing these today, but had not been closely watching
the gateway for the few days prior to today.  The "link" parameter is
the link type from the IMP leader -- we are 1822 connected.

   I hope this stuff helps.
				Later,
				    Bob 

----- Forwarded message # 1:

Received: from smoke.brl.mil by SEM.BRL.MIL id aa07207; 2 Feb 89 7:56 EST
Received: from SMOKE.BRL.MIL by SMOKE.BRL.MIL id aa12789; 2 Feb 89 7:52 EST
Received: from SRI-NIC.ARPA by SMOKE.BRL.MIL id aa12653; 2 Feb 89 7:45 EST
Received: from vgr.brl.mil by SRI-NIC.ARPA with TCP; Thu, 2 Feb 89 01:47:18 PST
Date:     Thu, 2 Feb 89 4:41:04 EST
From:     Phil Dykstra <phil@BRL.MIL>
To:       tcp-ip@sri-nic.arpa
Subject:  Instability in the Core
Message-ID:  <8902020441.aa16937@VGR.BRL.MIL>

Tonight I was trying to talk to some machines on XEROX-NET (net 13), and
once again was hit with oscillating Net-Up/Net-Unreachable.  This has been
happening to me for the past several days for net 13 as well as several
other nets (FYI, I'm 26.2.0.29).

We have been getting EGP info from the RESTON-DCEC Butterfly (26.21.0.104).
I started watching tonight to see why these routes kept appearing and
disappearing and found major unrest in the routing information we were
getting.  Here are nine consecutive EGP routing updates (taken at three
minute intervals).  They span 0400 EST.

	Int  Ext Routes (~A   B   C)
	 5   95   479
	 6   85   536
	 5   95   401
	 6   86   598    17  333  263
	 6   84   507    15  266  241
	 5   94   456     8  270  193
	 6   91   599    16  335  263
	 4   93   453     8  266  194
	 6   87   580    17  321  257

The fields are number of internal and external EGP gateways, total number
of routes, and the approximate number of class A, B, and C (approx because
this includes a few of our fixed routes).  I have complete EGP dumps for
the last six updates if anyone wishes to study the changes.

It really bothers me that the number of class A networks could double/halve
every three minutes!  There is also a 10% to 50% change in the total number
of routes every three minutes.  One wouldn't expect the number of internal
EGP gateways to change so fast either [though the LSI-11's used to flop
like that too].
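
The "10% to 50%" figure can be checked directly against the route totals in the table above. This is only a worked-arithmetic sketch in C; pct_change, change_range and the array name route_totals are hypothetical, and the nine numbers are copied from the Routes column.

```c
#include <stddef.h>

/* Total route counts from the nine consecutive EGP updates in the table. */
static const int route_totals[9] = {479, 536, 401, 598, 507, 456, 599, 453, 580};

/* Signed percentage change from one update to the next. */
double pct_change(int prev, int next)
{
    return 100.0 * (double)(next - prev) / (double)prev;
}

/*
 * Find the smallest and largest absolute percentage change between
 * consecutive samples. With fewer than two samples both results are 0.
 */
void change_range(const int *totals, size_t n, double *min_abs, double *max_abs)
{
    size_t i;

    *min_abs = 0.0;
    *max_abs = 0.0;
    for (i = 1; i < n; i++) {
        double c = pct_change(totals[i - 1], totals[i]);
        if (c < 0.0)
            c = -c;
        if (i == 1 || c < *min_abs)
            *min_abs = c;
        if (i == 1 || c > *max_abs)
            *max_abs = c;
    }
}
```

On these nine samples the smallest swing is about 10.1% (507 to 456) and the largest about 49.1% (401 to 598), which agrees with the range quoted in the text.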

It is nearly impossible to get data through when the routes come and go
this fast.  I realize that the Butterfly folks are probably working on
this, but I wasn't sure everyone was aware how bad things are right now
(I recall one other TCP-IP note about it).  Is there anything we can do
to help diagnose this?

- Phil
<phil@brl.mil>
uunet!brl!phil

----- End of forwarded messages

================
Script started on Wed Feb  8 18:11:57 1989
PING louie.udel.edu (128.175.1.3): 56 data bytes
64 bytes from 128.175.1.3: icmp_seq=0 time=466 ms
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=95 time=433 ms
64 bytes from 128.175.1.3: icmp_seq=96 time=981 ms
Wed Feb  8 18:14:15 EST 1989
64 bytes from 128.175.1.3: icmp_seq=96 time=1948 ms	<<<DUPLICATE!
64 bytes from 128.175.1.3: icmp_seq=97 time=1084 ms
64 bytes from 128.175.1.3: icmp_seq=98 time=451 ms
64 bytes from 128.175.1.3: icmp_seq=99 time=514 ms
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=175 time=566 ms
Wed Feb  8 18:15:36 EST 1989
64 bytes from 128.175.1.3: icmp_seq=176 time=414 ms
64 bytes from 128.175.1.3: icmp_seq=177 time=448 ms
64 bytes from 128.175.1.3: icmp_seq=178 time=414 ms
64 bytes from 128.175.1.3: icmp_seq=179 time=499 ms
64 bytes from 128.175.1.3: icmp_seq=180 time=481 ms
64 bytes from 128.175.1.3: icmp_seq=181 time=481 ms
64 bytes from 128.175.1.3: icmp_seq=182 time=499 ms
64 bytes from 128.175.1.3: icmp_seq=183 time=599 ms
36 bytes from MCLEAN-MB.DDN.MIL (26.20.0.17): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 a304   0 0000  fb  01 c3e4 c0051708 80af0103 
36 bytes from MCLEAN-MB.DDN.MIL (26.20.0.17): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 a30f   0 0000  fb  01 c3d9 c0051708 80af0103 
 ... 21 more net unreachables deleted ...
egp: default of 26.1.0.49 with 293 routes		<<<GATEWAY EGP UPDATE
egp: 87 gwys, 6 int, 81 ext (565 routes).
ip: 587 routes, 15 A, 306 B, 259 C, 7 S, 0 O.
36 bytes from MCLEAN-MB.DDN.MIL (26.20.0.17): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 a3a4   0 0000  fb  01 c344 c0051708 80af0103 
Wed Feb  8 18:16:09 EST 1989
36 bytes from MCLEAN-MB.DDN.MIL (26.20.0.17): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 a3aa   0 0000  fb  01 c33e c0051708 80af0103 
 ... 45 more net unreachables deleted ...
milr:  msg with link 27 from 4/48			<<<FUNNY AFWL MESSAGES
milr: pack len 2352, format 15, illen 28681		<<<"4/48" IS PORT/NODE
36 bytes from MCLEAN-MB.DDN.MIL (26.20.0.17): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 a4c1   0 0000  fb  01 c227 c0051708 80af0103 
36 bytes from MCLEAN-MB.DDN.MIL (26.20.0.17): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 a4c6   0 0000  fb  01 c222 c0051708 80af0103 
Wed Feb  8 18:16:58 EST 1989
 ... 42 more net unreachables deleted ...
36 bytes from MCLEAN-MB.DDN.MIL (26.20.0.17): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 a5f3   0 0000  fb  01 c0f5 c0051708 80af0103 
64 bytes from 128.175.1.3: icmp_seq=300 time=633 ms	<<< ONLY DROPPED 3PKTS
64 bytes from 128.175.1.3: icmp_seq=301 time=733 ms
Wed Feb  8 18:17:47 EST 1989
64 bytes from 128.175.1.3: icmp_seq=302 time=666 ms
64 bytes from 128.175.1.3: icmp_seq=303 time=881 ms
92 bytes from BRL.ARPA (26.2.0.29): Source Quench
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 a605   0 0000  fc  01 bfe3 c0051708 80af0103 
64 bytes from 128.175.1.3: icmp_seq=304 time=1248 ms
64 bytes from 128.175.1.3: icmp_seq=306 time=633 ms
64 bytes from 128.175.1.3: icmp_seq=307 time=748 ms
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=331 time=1381 ms

Wed Feb  8 18:18:19 EST 1989
64 bytes from 128.175.1.3: icmp_seq=333 time=933 ms
milr:  msg with link 27 from 4/48
milr: pack len 2352, format 15, illen 28681
64 bytes from 128.175.1.3: icmp_seq=334 time=1281 ms
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=384 time=784 ms
64 bytes from 128.175.1.3: icmp_seq=386 time=566 ms
64 bytes from 128.175.1.3: icmp_seq=387 time=651 ms
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=410 time=766 ms
Wed Feb  8 18:19:40 EST 1989
64 bytes from 128.175.1.3: icmp_seq=411 time=833 ms
64 bytes from 128.175.1.3: icmp_seq=412 time=633 ms
64 bytes from 128.175.1.3: icmp_seq=413 time=548 ms
64 bytes from 128.175.1.3: icmp_seq=414 time=766 ms
64 bytes from 128.175.1.3: icmp_seq=415 time=848 ms
64 bytes from 128.175.1.3: icmp_seq=416 time=633 ms
egp: default of 26.1.0.49 with 232 routes		<<< GATEWAY EGP UPDATE
egp: 86 gwys, 6 int, 80 ext (434 routes).
ip: 456 routes, 14 A, 243 B, 192 C, 7 S, 0 O.
64 bytes from 128.175.1.3: icmp_seq=417 time=848 ms	<<< 1 MORE PACKET THEN
Wed Feb  8 18:19:56 EST 1989				<<< NOTHING UNTIL
Wed Feb  8 18:20:12 EST 1989
Wed Feb  8 18:20:28 EST 1989
milr:  msg with link 27 from 4/48
milr: pack len 2352, format 15, illen 28681
Wed Feb  8 18:20:44 EST 1989
Wed Feb  8 18:21:00 EST 1989
milr: incomplete 15/115 3
Wed Feb  8 18:21:16 EST 1989
64 bytes from 128.175.1.3: icmp_seq=514 time=418 ms	<<< HERE
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=534 time=499 ms
92 bytes from BRL.ARPA (26.2.0.29): Source Quench
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 a836   0 0000  fc  01 bdb2 c0051708 80af0103 
64 bytes from 128.175.1.3: icmp_seq=536 time=514 ms
Wed Feb  8 18:21:49 EST 1989
64 bytes from 128.175.1.3: icmp_seq=537 time=533 ms
36 bytes from localhost (127.0.0.1): Destination Port Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 003d a842   0 0000  1e  11 0000 7f000001 7f000001 
UDP: from port 53, to port 3500 (decimal)
64 bytes from 128.175.1.3: icmp_seq=538 time=433 ms
64 bytes from 128.175.1.3: icmp_seq=539 time=448 ms
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=546 time=448 ms
92 bytes from BRL.ARPA (26.2.0.29): Source Quench
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 a882   0 0000  fc  01 bd66 c0051708 80af0103 
64 bytes from 128.175.1.3: icmp_seq=547 time=5048 ms	<<< UNUSUAL DELAY
64 bytes from 128.175.1.3: icmp_seq=548 time=4181 ms
Wed Feb  8 18:22:05 EST 1989
64 bytes from 128.175.1.3: icmp_seq=552 time=2381 ms
64 bytes from 128.175.1.3: icmp_seq=553 time=2266 ms
64 bytes from 128.175.1.3: icmp_seq=554 time=2099 ms
64 bytes from 128.175.1.3: icmp_seq=555 time=1866 ms
64 bytes from 128.175.1.3: icmp_seq=556 time=1448 ms
64 bytes from 128.175.1.3: icmp_seq=557 time=999 ms
64 bytes from 128.175.1.3: icmp_seq=558 time=433 ms
64 bytes from 128.175.1.3: icmp_seq=559 time=533 ms
 ... through ...
Wed Feb  8 18:23:58 EST 1989
64 bytes from 128.175.1.3: icmp_seq=662 time=518 ms
64 bytes from 128.175.1.3: icmp_seq=663 time=518 ms
64 bytes from 128.175.1.3: icmp_seq=664 time=499 ms
64 bytes from 128.175.1.3: icmp_seq=665 time=551 ms
64 bytes from 128.175.1.3: icmp_seq=666 time=451 ms
64 bytes from 128.175.1.3: icmp_seq=667 time=418 ms
64 bytes from 128.175.1.3: icmp_seq=668 time=599 ms
64 bytes from 128.175.1.3: icmp_seq=669 time=466 ms
64 bytes from 128.175.1.3: icmp_seq=670 time=566 ms
64 bytes from 128.175.1.3: icmp_seq=671 time=633 ms
64 bytes from 128.175.1.3: icmp_seq=672 time=433 ms
36 bytes from MCLEAN-MB.DDN.MIL (26.20.0.17): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 aa58   0 0000  fb  01 bc90 c0051708 80af0103 
36 bytes from MCLEAN-MB.DDN.MIL (26.20.0.17): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 aa64   0 0000  fb  01 bc84 c0051708 80af0103 
 ... 174 more net unreachables deleted ...
64 bytes from 128.175.1.3: icmp_seq=848 time=666 ms
64 bytes from 128.175.1.3: icmp_seq=849 time=833 ms
Wed Feb  8 18:27:14 EST 1989
64 bytes from 128.175.1.3: icmp_seq=850 time=751 ms
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=878 time=833 ms
egp: default of 26.1.0.49 with 318 routes
egp: 83 gwys, 6 int, 77 ext (569 routes).
ip: 591 routes, 15 A, 311 B, 258 C, 7 S, 0 O.
64 bytes from 128.175.1.3: icmp_seq=879 time=918 ms
64 bytes from 128.175.1.3: icmp_seq=880 time=1051 ms
Wed Feb  8 18:27:46 EST 1989
64 bytes from 128.175.1.3: icmp_seq=881 time=818 ms
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=895 time=551 ms
64 bytes from 128.175.1.3: icmp_seq=896 time=851 ms
Wed Feb  8 18:28:02 EST 1989
64 bytes from 128.175.1.3: icmp_seq=897 time=1151 ms
64 bytes from 128.175.1.3: icmp_seq=895 time=3833 ms	<<< DUPLICATE
64 bytes from 128.175.1.3: icmp_seq=898 time=799 ms
64 bytes from 128.175.1.3: icmp_seq=899 time=933 ms
64 bytes from 128.175.1.3: icmp_seq=900 time=584 ms
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=1039 time=566 ms
64 bytes from 128.175.1.3: icmp_seq=1039 time=766 ms	<<< DUPLICATE
64 bytes from 128.175.1.3: icmp_seq=1040 time=748 ms
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=1115 time=748 ms
64 bytes from 128.175.1.3: icmp_seq=1116 time=499 ms
Wed Feb  8 18:31:49 EST 1989
64 bytes from 128.175.1.3: icmp_seq=1117 time=599 ms
64 bytes from 128.175.1.3: icmp_seq=1118 time=799 ms
64 bytes from 128.175.1.3: icmp_seq=1119 time=533 ms
64 bytes from 128.175.1.3: icmp_seq=1120 time=699 ms
64 bytes from 128.175.1.3: icmp_seq=1121 time=533 ms
64 bytes from 128.175.1.3: icmp_seq=1121 time=781 ms	<<< DUPLICATE
64 bytes from 128.175.1.3: icmp_seq=1122 time=648 ms
64 bytes from 128.175.1.3: icmp_seq=1124 time=981 ms	<<< MISSING 1123
64 bytes from 128.175.1.3: icmp_seq=1125 time=799 ms
64 bytes from 128.175.1.3: icmp_seq=1126 time=548 ms
64 bytes from 128.175.1.3: icmp_seq=1127 time=799 ms
64 bytes from 128.175.1.3: icmp_seq=1128 time=614 ms
64 bytes from 128.175.1.3: icmp_seq=1129 time=448 ms
64 bytes from 128.175.1.3: icmp_seq=1130 time=666 ms
64 bytes from 128.175.1.3: icmp_seq=1131 time=681 ms
64 bytes from 128.175.1.3: icmp_seq=1132 time=681 ms
Wed Feb  8 18:32:06 EST 1989
64 bytes from 128.175.1.3: icmp_seq=1133 time=614 ms
64 bytes from 128.175.1.3: icmp_seq=1134 time=514 ms
egp: default of 26.1.0.49 with 276 routes		<<< GATEWAY EGP UPDATE
egp: 88 gwys, 6 int, 82 ext (492 routes).
ip: 514 routes, 16 A, 267 B, 224 C, 7 S, 0 O.
64 bytes from 128.175.1.3: icmp_seq=1135 time=814 ms
36 bytes from RESTON-DCEC-MB.DDN.MIL (26.21.0.104): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 b1ae   0 0000  fb  01 b53a c0051708 80af0103 
36 bytes from RESTON-DCEC-MB.DDN.MIL (26.21.0.104): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 b1b3   0 0000  fb  01 b535 c0051708 80af0103 
 ... 73 more net unreachables deleted ...
milr:  msg with link 27 from 4/48		<<< ANOTHER LINK MESSAGE
milr: pack len 2370, format 15, illen 10
Wed Feb  8 18:33:29 EST 1989
milr: incomplete 3/13 4				<<< THESE SEEM MORE COMMON
milr: incomplete 3/13 4				<<< WHEN RESTON IS COMPLAINING
milr: incomplete 3/13 3
milr:  msg with link 27 from 4/48
milr: pack len 2378, format 15, illen 16394
Wed Feb  8 18:33:45 EST 1989			<<< A BUNCH MORE MISSING
Wed Feb  8 18:34:01 EST 1989
milr:  msg with link 27 from 4/48
milr: pack len 2352, format 15, illen 28681
milr:  msg with link 27 from 4/48
milr: pack len 2378, format 15, illen 16394
milr:  msg with link 27 from 4/48
milr: pack len 2378, format 15, illen 16394
milr:  msg with link 27 from 4/48
milr: pack len 2370, format 15, illen 10
milr:  msg with link 27 from 4/48
milr: pack len 2378, format 15, illen 16394
milr:  msg with link 27 from 4/48
milr: pack len 2370, format 15, illen 10
milr:  msg with link 27 from 4/48
milr: pack len 2378, format 15, illen 16394
milr:  msg with link 27 from 4/48
milr: pack len 2370, format 15, illen 10
Wed Feb  8 18:34:17 EST 1989
milr:  msg with link 27 from 4/48
milr: pack len 2370, format 15, illen 10
milr:  msg with link 27 from 4/48
milr: pack len 2378, format 15, illen 16394
Wed Feb  8 18:34:34 EST 1989
Wed Feb  8 18:34:50 EST 1989
Wed Feb  8 18:35:06 EST 1989
Wed Feb  8 18:35:22 EST 1989
64 bytes from 128.175.1.3: icmp_seq=1330 time=581 ms
64 bytes from 128.175.1.3: icmp_seq=1331 time=699 ms
64 bytes from 128.175.1.3: icmp_seq=1332 time=614 ms
64 bytes from 128.175.1.3: icmp_seq=1333 time=481 ms
64 bytes from 128.175.1.3: icmp_seq=1334 time=748 ms
64 bytes from 128.175.1.3: icmp_seq=1335 time=448 ms
64 bytes from 128.175.1.3: icmp_seq=1336 time=599 ms
64 bytes from 128.175.1.3: icmp_seq=1337 time=448 ms
64 bytes from 128.175.1.3: icmp_seq=1338 time=499 ms
Wed Feb  8 18:35:38 EST 1989
64 bytes from 128.175.1.3: icmp_seq=1339 time=566 ms
 ... through ...
64 bytes from 128.175.1.3: icmp_seq=1401 time=818 ms
Wed Feb  8 18:36:44 EST 1989
egp: default of 26.1.0.49 with 228 routes
egp: 88 gwys, 6 int, 82 ext (529 routes).
ip: 551 routes, 14 A, 299 B, 231 C, 7 S, 0 O.
64 bytes from 128.175.1.3: icmp_seq=1402 time=848 ms
64 bytes from 128.175.1.3: icmp_seq=1403 time=1199 ms
64 bytes from 128.175.1.3: icmp_seq=1404 time=681 ms
64 bytes from 128.175.1.3: icmp_seq=1404 time=699 ms	<<< DUPLICATE
64 bytes from 128.175.1.3: icmp_seq=1404 time=1166 ms	<<< DUPLICATE
64 bytes from 128.175.1.3: icmp_seq=1405 time=514 ms
64 bytes from 128.175.1.3: icmp_seq=1406 time=699 ms
64 bytes from 128.175.1.3: icmp_seq=1407 time=766 ms
64 bytes from 128.175.1.3: icmp_seq=1408 time=881 ms
64 bytes from 128.175.1.3: icmp_seq=1409 time=648 ms
64 bytes from 128.175.1.3: icmp_seq=1410 time=648 ms
64 bytes from 128.175.1.3: icmp_seq=1410 time=714 ms	<<< DUPLICATE
64 bytes from 128.175.1.3: icmp_seq=1411 time=648 ms
64 bytes from 128.175.1.3: icmp_seq=1412 time=1248 ms
64 bytes from 128.175.1.3: icmp_seq=1413 time=1148 ms
64 bytes from 128.175.1.3: icmp_seq=1414 time=814 ms
64 bytes from 128.175.1.3: icmp_seq=1415 time=933 ms
   ... through ...
64 bytes from 128.175.1.3: icmp_seq=1493 time=884 ms
64 bytes from 128.175.1.3: icmp_seq=1494 time=866 ms
64 bytes from 128.175.1.3: icmp_seq=1494 time=1833 ms	<<< DUPLICATE
64 bytes from 128.175.1.3: icmp_seq=1495 time=1051 ms
64 bytes from 128.175.1.3: icmp_seq=1496 time=799 ms
64 bytes from 128.175.1.3: icmp_seq=1497 time=866 ms
Wed Feb  8 18:38:23 EST 1989
64 bytes from 128.175.1.3: icmp_seq=1498 time=618 ms
   ... through ...
64 bytes from 128.175.1.3: icmp_seq=1749 time=718 ms
Wed Feb  8 18:42:44 EST 1989
64 bytes from 128.175.1.3: icmp_seq=1750 time=818 ms
64 bytes from 128.175.1.3: icmp_seq=1751 time=1133 ms
64 bytes from 128.175.1.3: icmp_seq=1752 time=818 ms
64 bytes from 128.175.1.3: icmp_seq=1753 time=1318 ms
36 bytes from MCLEAN-MB.DDN.MIL (26.20.0.17): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 bba1   0 0000  fb  01 ab47 c0051708 80af0103 
36 bytes from MCLEAN-MB.DDN.MIL (26.20.0.17): Destination Net Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst Data
 4  5  00 0054 bba6   0 0000  fb  01 ab42 c0051708 80af0103 

 ...and the data goes on for roughly three more hours ....
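
The verbose dumps above print the header of the offending IP datagram in hex, so the Src and Dst columns can be turned back into dotted-decimal addresses by hand or with a few lines of C. The helper below, hex_to_dotted, is a hypothetical name and only a sketch; it assumes the field is exactly eight hex digits as printed by this ping.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert an 8-digit hex address field (e.g. "80af0103") into
 * dotted-decimal form. out should hold at least 16 bytes.
 * Returns 0 on success, -1 if the input is not eight hex digits.
 */
int hex_to_dotted(const char *hex8, char *out, size_t outlen)
{
    char *end;
    unsigned long v;

    if (strlen(hex8) != 8)
        return -1;
    v = strtoul(hex8, &end, 16);
    if (*end != '\0' || v > 0xffffffffUL)
        return -1;
    snprintf(out, outlen, "%lu.%lu.%lu.%lu",
             (v >> 24) & 0xff, (v >> 16) & 0xff,
             (v >> 8) & 0xff, v & 0xff);
    return 0;
}
```

Applied to the dump, Dst 80af0103 comes out as 128.175.1.3, the louie.udel.edu address being pinged, and Src c0051708 as 192.5.23.8. The other one-byte fields read the same way: TTL fb is 251, Pro 01 is ICMP, and Len 0054 is 84 bytes, i.e. 20 bytes of IP header plus the 64-byte echo.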

jas@proteon.com (John A. Shriver) (02/25/89)

Well Mark, that indeed sounds like a great way to improve the
antisocial behaviour of 4.[23]bsd in the face of ICMP host and net
unreachables.  I looked at the source (ip_input.c, tcp_subr.c,
protosw.h), and it certainly seems that will have the desired result.

The u_char array inetctlerrmap (in ip_input.c) maps the generic error
types from protosw.h to error numbers from errno.h.  Offsets 8 and 9
are (protosw.h):

#define	PRC_UNREACH_NET		8	/* no route to network */
#define	PRC_UNREACH_HOST	9	/* no route to host */

which are mapped respectively to (errno.h):

#define	ENETUNREACH	51		/* Network is unreachable */
#define	EHOSTUNREACH	65		/* No route to host */

Entries in that array which are 0 do not cause user errors.  Patching
8 & 9 to zero should do this.

Here's an example.  Be careful to use lower case `w'.

# adb -w /vmunix /dev/kmem
inetctlerrmap+8?w0				patches disk
inetctlerrmap+8/w0				patches memory

I have not tried this, no guarantees.  It looks like this works in
4.3bsd, Ultrix 2.2, and SunOS 3.5.
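
The mapping and the effect of the patch can be modelled in user space. This is a sketch only, not kernel code: the array size and every function name are hypothetical, only the two entries named above are filled in (the rest of the real table is left out), and the errno values are the BSD ones quoted from errno.h rather than those of whatever system compiles the example.

```c
#include <string.h>

/* Offsets and error numbers as quoted from protosw.h and errno.h. */
#define PRC_UNREACH_NET   8
#define PRC_UNREACH_HOST  9
#define BSD_ENETUNREACH   51   /* 0x33 */
#define BSD_EHOSTUNREACH  65   /* 0x41 */
#define SKETCH_MAP_SIZE   16   /* illustrative size, not the kernel's */

/* Fill in only the two entries discussed; 0 means "no user error". */
void sketch_init_map(unsigned char *map)
{
    memset(map, 0, SKETCH_MAP_SIZE);
    map[PRC_UNREACH_NET]  = BSD_ENETUNREACH;
    map[PRC_UNREACH_HOST] = BSD_EHOSTUNREACH;
}

/* Read two adjacent bytes as a 16-bit word, high byte first. */
unsigned int word_big_endian(const unsigned char *map, int off)
{
    return ((unsigned int)map[off] << 8) | map[off + 1];
}

/* Same two bytes read low byte first. */
unsigned int word_little_endian(const unsigned char *map, int off)
{
    return ((unsigned int)map[off + 1] << 8) | map[off];
}

/* The patch: zero entries 8 and 9 so neither unreachable is reported. */
void sketch_patch_unreach(unsigned char *map)
{
    map[PRC_UNREACH_NET]  = 0;
    map[PRC_UNREACH_HOST] = 0;
}
```

Read high byte first, the word at offset 8 is 0x3341, the value Mark quoted; read low byte first, the same two bytes appear as 0x4133, so the value to look for before patching depends on the machine's byte order.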

It still is not the optimal solution, which would be to pass a warning
to the user layer, so they could decide what to do.  I suspect that is
why there is a /* XXX */ at the end of tcp_ctlinput() in tcp_subr.c.
(/* XXX */ is Berkeley shorthand for kludge, should be fixed.)  There
are no comments on the entire subroutine.

Any 4.3bsd gurus out there like to verify this?

edb@fai.UUCP (Edward Bunch) (02/25/89)

In article <8902221936.AA10826@ncs.dnd.ca> netcoor@NCS.DND.CA (DRENET) writes:
>We have users on network 128.43 who has reported trouble retrieving
>files from several hosts in the Internet.
>	< 425 Can't build data connection: Connection timed out
>Can anyone explain this? I would sure like to understand what is going on
>to create this situation, then I can try to do something about it.
>
>Bob Bradford				netcoor@ncs.dnd.ca

I saw this same problem here on our WAN. The problem was this. We were trying
to talk to an Ethernet interface that was on the other end of the machine.

This is a little difficult to explain. Picture a machine with two interfaces
from which we wish to FTP something. When we start the FTP we
specify the host address of the interface on the far side. That is, packets must
pass through the first interface and then through loop-back before arriving
at ftpd. When FTP tries to build the data connection the reverse way, it fails,
i.e. Loop-Back --> Other Interface -->  Me.
I suppose FTPD wasn't smart enough to avoid the loop-back network on the return
trip.
Solution: Use the interface address on the near side.

--------------------------------------------------------------------------------
Edward A. Bunch                       UUCP: {uunet,amdahl,sun}!fai!edb
Fujitsu America, Inc.                 DOMAIN: edb@fai.com
Computer Support and Administration.
--------------------------------------------------------------------------------

narten@PURDUE.EDU (Thomas Narten) (02/26/89)

[ Stuff about using adb to zero out errors in inetctlerrmap ]

The suggested fix causes 4.3 to ignore ICMP unreachable errors in all
cases, something that one probably does not want to do.  For instance,
I *much* prefer to have telnet attempts abort quickly with a "network
unreachable" than with a "connection timed out" some 60 seconds later.
On the other hand, once a connection has been established, I'd prefer
stray ICMP errors not break a connection.  Moreover, nuking ICMP
unreachable errors weakens utilities like ping that understand such
messages.  One of ping's useful features is the printing of ICMP
errors it receives.

The following patch (perhaps not pretty, but precise) treats ICMP
unreachable errors as before, except that they won't break established
connections.

Thomas

*** /tmp/,RCSt1025442	Sat Feb 25 13:46:58 1989
--- /tmp/,RCSt2025442	Sat Feb 25 13:46:59 1989
***************
*** 258,264 ****
  tcp_notify(inp)
  	register struct inpcb *inp;
  {
! 
  	wakeup((caddr_t) &inp->inp_socket->so_timeo);
  	sorwakeup(inp->inp_socket);
  	sowwakeup(inp->inp_socket);
--- 258,271 ----
  tcp_notify(inp)
  	register struct inpcb *inp;
  {
! 	if (inp->inp_socket->so_state != SS_ISCONNECTING) {
! 	        register int error = inp->inp_socket->so_error;
! 	        if ((error == EHOSTUNREACH) || (error == ENETUNREACH)
! 		     || (error == EHOSTDOWN)) {
! 		        inp->inp_socket->so_error = 0; /* clear error */
! 			return;
! 		}
! 	} 
  	wakeup((caddr_t) &inp->inp_socket->so_timeo);
  	sorwakeup(inp->inp_socket);
  	sowwakeup(inp->inp_socket);
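
The policy in the patch can be stated apart from the kernel as a small decision function. The C below is a user-space sketch of that policy, not a replacement for the diff: the SKETCH_ flag values and the BSD_ errno values are written out here for illustration, and should_deliver is a hypothetical name. It tests the connecting flag with a bitwise AND instead of comparing the whole state word, on the assumption that so_state can carry several flag bits at once (for example a non-blocking flag alongside the connecting one).

```c
/* Illustrative state flags; the real ones live in the kernel's socketvar.h. */
#define SKETCH_SS_ISCONNECTED   0x002
#define SKETCH_SS_ISCONNECTING  0x004
#define SKETCH_SS_NBIO          0x100

/* BSD error numbers, written out so the sketch does not depend on the host. */
#define BSD_ENETUNREACH   51
#define BSD_EHOSTDOWN     64
#define BSD_EHOSTUNREACH  65

/*
 * Decide whether an error noted on a socket should be passed on.
 * Returns 1 to deliver it (wake the sleeper), 0 to clear and ignore it.
 * Unreachable and host-down errors are ignored unless the connection
 * is still being set up; every other error is always delivered.
 */
int should_deliver(int so_state, int so_error)
{
    if (!(so_state & SKETCH_SS_ISCONNECTING)) {
        if (so_error == BSD_EHOSTUNREACH ||
            so_error == BSD_ENETUNREACH ||
            so_error == BSD_EHOSTDOWN)
            return 0;
    }
    return 1;
}
```

Under this rule a telnet attempt still aborts quickly with "network unreachable", while a stray ICMP error on an established connection is dropped.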

Mills@UDEL.EDU (02/27/89)

Robert,

The Fuzzball logs on net 128.4 and various other places near the NSFNET
Bluebone are also showing unstable routes, sometimes intermittent
ICMP unretchables and other times ICMP time exceededs. The problem has
been growing slowly worse over the last few weeks. Seen from here
one or more of the Fuzzball time servers drops off the Earth only to
replanet a few minutes or hours later. Also, there is a rising tide
of ICMP unmentionables coming from distant gateways, rather than
nearby EGP gateways on ARPANET, so things are certainly unstable somewhere
in space. Finally, the rate of ARPANET error messages, especially to
a few gateways, is growing steadily worse. I conclude PSNs for those
gateways may be sinking slowly in the muck.

This report is certainly much less specific than yours; however, you may
find whatever corroboration useful.

Dave

Hampton@DOCKMASTER.ARPA ("David R. Hampton") (02/27/89)

    Are the hosts that you are having problems with Berkeley 4.2
systems?  There is a problem with the Berkeley 4.2 FTP server code.
When the 4.2 server opens a data connection, it should perform several
internal steps to get the data connection set up, and then announce the
port number to the client.  In the distributed kernel, it actually
announces the data port before it performs the final step of setup.  If
this final step fails, a 425 error is returned.

David Hampton
Hampton @ Dockmaster.ARPA

abe@mace.cc.purdue.edu (Vic Abell) (02/27/89)

There is another BSD ftpd problem in the very latest post-worm release that
can cause data connection failure.  The connection failure occurs when there
are two incoming ftpd calls from the same remote peer.  If the two receiving
ftpd processes both try to open a data connection at the same time, one can
fail with an EADDRINUSE error.  We have fixed this problem locally by adding
a retry loop in ftpd.c's getdatasock() function.

The released ftpd lacks this loop.  It also may be reporting the cause of
data connection failures improperly when they result from an error in the
bind() call within getdatasock().  There are several function calls between
the bind() and the reply() call that can change the value of errno.