pogran@CCQ.BBN.COM (Ken Pogran) (12/16/87)
Several problems with regard to the New End-to-End Protocol deployed on the ARPANET with PSN Release 7 were discussed in messages to this list over the past couple of days. Here's a report on what we at BBN have uncovered so far.

We are working on three general problems and one problem that is specific to a particular host. The "general" problems affect communication with some hosts (including some gateways) connected to the ARPANET with X.25 interfaces.

1. The "one packet problem." The scenario has been described by several folks, and runs about like this: An 1822-connected gateway has traffic for an X.25-connected host. Sending the first datagram into the net causes an X.25 VC to be opened to the destination host. One and only one packet is received by the host, and the flow stops. Various events can cause the flow to become "unblocked", such as sending traffic FROM the host back over the same VC.

This problem has been observed with several, but by no means with all, DDN Standard X.25 implementations.

This problem has especially been seen in situations where an X.25-connected ARPANET host establishes communication with a MILNET host (or vice-versa). In this situation, because of the Mailbridge "homing rules", traffic often flows across a different Mailbridge in each direction. Thus, user data flow is essentially unidirectional across each of two VCs. With other patterns of communication, a symmetric, bidirectional user data flow would generate one of those "events" that seems to "unblock" the flow over the VC.

This problem is not observed in communication between pairs of hosts that are BOTH X.25-connected, or BOTH 1822-connected, or in situations where the X.25-connected host initiates the VC. It can arise when a host with a connection-less (i.e., 1822) interface initiates communication with a host with a connection-oriented (i.e., X.25) interface and "the network" has to initiate the connection.

We believe that what's happening here is that the receiving host's X.25 isn't sending an RR to the PSN for the first data packet it receives when the PSN opens the VC. Under the New End-to-End Protocol, when going from an 1822-connected host to an X.25-connected host, the PSNs wait to see an RR for the first packet before subsequent packets are sent from the source PSN to the destination PSN (and a RFNM is returned to the originating 1822-connected host). Under the Old End-to-End Protocol, subsequent packets were sent as soon as the receiving host accepted the VC (up to the limit of the window); this could result in a RFNM being sent to the originating host before the destination host actually acknowledged the packet via an RR! (The different behavior of the New End-to-End was intended as a fix for what was a bug, or perhaps a "cheat", in the old design with respect to the meaning of a RFNM.)

In the case of a symmetric traffic flow, an RR is typically piggybacked on a data packet. But, as was mentioned above, traffic flows involving Mailbridges frequently aren't symmetric. Typically, X.25 implementations send an RR after some brief timeout if there's no user packet going out over the VC on which to piggyback the RR. But if there is neither traffic nor a timeout, and no RR is sent, the flow will cease as described above.

We're going to change the PSNs to behave as they did under the Old End-to-End in this regard, at least temporarily. This will give us time to work with implementors to resolve this issue.

2. The "pinging yourself" problem. We've found a timing bug in the PSN that is sometimes triggered when a host "pings" itself. Other situations can trigger it, but the timing of the sequence of events that occurs when some hosts ping themselves seems to be the most conducive to triggering the bug. The result is that the PSN doesn't acknowledge delivery of one or more messages to the host in question.

We've found a race condition in the PSN code. A fix for this bug will be installed in the ARPANET within the next day or so.

3. The "multiple of 128 bytes" problem. Several people have reported a problem with packets apparently being dropped by the network when they are a multiple of 128 bytes (perhaps +/- a few?) in length. We are actively investigating this problem. Anyone with data or insight with regard to this is encouraged to contact Andy Malis (Malis@bbn.com) or ARPAUPGRADE@bbn.com.

4. The gateway at Yale has mostly been off the net since the cutover to the New End-to-End Protocol. They are the only gateway connected to the ARPANET via an HDH interface running at 9.6 kb/s, and are the only ones experiencing this particular problem. We believe there is a PSN bug that we haven't been able to find yet; so far, we have been unable to duplicate this problem in our lab. In the meantime, we have developed a work-around that will enable Yale to be up on the ARPANET while we work to find and fix the bug. We apologize for the inconvenience, and thank the folks at Yale for their patience and understanding.

Finally, implementors have asked if they can be included on the "ARPAUPGRADE" mailing list. We use this list as a "hot line" for getting information from the community to us and to DCA, and for internal discussion about the problems. The idea of having an implementors' mailing list is a good one, however, and we will shortly set up a new list for those who are actively helping us track down these various problems.

Please keep those cards and letters coming!

Regards,
Ken Pogran
BBN COMMUNICATIONS CORPORATION
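[Editor's illustration] The "one packet problem" in item 1 above can be mimicked with a toy model. This is only a sketch under stated assumptions, not the PSN's actual logic: the function name, its parameters, and the window size of 4 are hypothetical, and RFNMs, timers, and piggybacked acknowledgements are ignored. It contrasts the two behaviors described in the message: holding everything after the first packet until an RR is seen, versus sending up to the window as soon as the VC is accepted.

```python
def packets_delivered(queued, window, wait_for_first_rr, host_sends_rr):
    """Count how many of `queued` packets are released on a freshly opened VC.

    wait_for_first_rr=True  : New End-to-End style, as described in the post
                              (only one packet may be outstanding until the
                              first RR arrives).
    wait_for_first_rr=False : Old End-to-End style (send up to the window
                              once the VC has been accepted).
    host_sends_rr           : whether the destination host ever sends an RR
                              when it has no outgoing data to piggyback on.
    All names and values here are illustrative assumptions.
    """
    sent = 0
    acked = 0
    while sent < queued:
        # Before the first RR, the "new" behavior allows one outstanding packet.
        limit = 1 if (wait_for_first_rr and acked == 0) else window
        if sent - acked < limit:
            sent += 1
            continue
        # The window is closed; only an RR from the host can reopen it.
        if not host_sends_rr:
            break
        acked = sent
    return sent


# A host that never sends a standalone RR receives one packet under the
# "new" behavior and a full window under the "old" one.
print(packets_delivered(10, 4, wait_for_first_rr=True, host_sends_rr=False))
print(packets_delivered(10, 4, wait_for_first_rr=False, host_sends_rr=False))
```

In this simplified model a host that never sends an RR stalls under either behavior; the "old" behavior merely postpones the stall until the window fills, which is consistent with the change being described as temporary.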
SATZ@MATHOM.CISCO.COM (Greg Satz) (12/19/87)
From: Ken Pogran <pogran@ccq.bbn.com>
Subject: An ARPANET update

We believe that what's happening here is that the receiving host's X.25 isn't sending an RR to the PSN for the first data packet it receives when the PSN opens the VC. Under the New End-to-End Protocol, when going from an 1822-connected host to an X.25-connected host, the PSNs wait to see an RR for the first packet before subsequent packets are sent from the source PSN to the destination PSN (and a RFNM is returned to the originating 1822-connected host). Under the Old End-to-End Protocol, subsequent packets were sent as soon as the receiving host accepted the VC (up to the limit of the window); this could result in a RFNM being sent to the originating host before the destination host actually acknowledged the packet via an RR! (The different behavior of the New End-to-End was intended as a fix for what was a bug, or perhaps a "cheat", in the old design with respect to the meaning of a RFNM.)

In the case of a symmetric traffic flow, an RR is typically piggybacked on a data packet. But, as was mentioned above, traffic flows involving Mailbridges frequently aren't symmetric. Typically, X.25 implementations send an RR after some brief timeout if there's no user packet going out over the VC on which to piggyback the RR. But if there is neither traffic nor a timeout, and no RR is sent, the flow will cease as described above.

We're going to change the PSNs to behave as they did under the Old End-to-End in this regard, at least temporarily. This will give us time to work with implementors to resolve this issue.

The X.25 specification omits any explanation of when to send RR acknowledgements. For a low bandwidth link, it seems reasonable to wait until the window is (almost) full before sending an RR. It is also reasonable to want to send an RR (if you don't have an outgoing data packet ready) after every received data packet when the acknowledgements must traverse the network instead of having local significance.

Our implementation has a parameter that lets you send an acknowledgment when N input data packets have been seen. An RR acknowledgement is sent if there aren't any outgoing data packets. The drawback is that RRs will be sent more often than necessary. Adding a timer is a very good idea.

After rereading the "Procedures for flow control" I came across the section on "Delivery confirmation". I don't know if it is worth it, but maybe the D-bit could be used?
-------
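[Editor's illustration] The acknowledgement policy discussed in the message above (acknowledge after N received data packets, piggyback when outgoing data is ready, and fall back to a timer) might look roughly like the sketch below. It is not taken from any real X.25 package: the class name, method names, and parameters are hypothetical, and times are plain numbers supplied by the caller.

```python
class RrPolicy:
    """Decide when a receiver should acknowledge incoming data packets."""

    def __init__(self, ack_every_n, ack_timeout):
        self.ack_every_n = ack_every_n    # the "N" parameter from the post
        self.ack_timeout = ack_timeout    # assumed timer, in seconds
        self.unacked = 0                  # data packets received but not yet acked
        self.oldest_unacked_at = None     # arrival time of the oldest unacked packet

    def _clear(self):
        self.unacked = 0
        self.oldest_unacked_at = None

    def on_data_received(self, now, outgoing_data_ready):
        """Return 'piggyback', 'rr', or 'wait' for a newly received data packet."""
        if self.unacked == 0:
            self.oldest_unacked_at = now
        self.unacked += 1
        if outgoing_data_ready:
            # An outgoing data packet carries the acknowledgement for free.
            self._clear()
            return "piggyback"
        if self.unacked >= self.ack_every_n:
            # N packets have been seen and nothing is going out: send a bare RR.
            self._clear()
            return "rr"
        return "wait"

    def on_tick(self, now):
        """Return True if the timer has expired and a bare RR should be sent."""
        if self.unacked and now - self.oldest_unacked_at >= self.ack_timeout:
            self._clear()
            return True
        return False


policy = RrPolicy(ack_every_n=3, ack_timeout=2.0)
print(policy.on_data_received(0.0, outgoing_data_ready=False))
print(policy.on_tick(2.0))
```

With `ack_every_n=1` every received packet is acknowledged at once, which avoids the one-packet stall at the cost of extra RRs; the timer bounds the delay when N is larger and traffic is one-directional.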
malis@CC5.BBN.COM (Andy Malis) (12/20/87)
Mike, If I may reply for Ken, letting one packet through per circuit setup was never an intended mode of operation, and it only seems to be affecting the Sun X.25 package. Every new release of PSN software has to pass a rigorous DCA test plan. PSN 7.0 has already passed that test. Andy
oconnor@SCCGATE.SCC.COM (Michael J. O'Connor) (12/21/87)
Andy, I finally contacted Hilarie Orman (ho@tis) who is a fellow Sun sufferer (in the sense that she is also plagued by this X.25 problem). Hilarie tells me that the problem I'm currently seeing with my L3 software locking up is one that has been plaguing her system for a while. I believe she said the problem occurred before the new End-to-End. I can assure you that we never had this problem under the old End-to-End or I would never have mentioned it to you. We may be running older Sun X.25 software than Hilarie is, but whatever the reason, it sounds like that part of my problem is Sun's responsibility. Just to keep the record straight, I still have the 'thrashing circuits'. Mike
malis@CC5.BBN.COM (Andy Malis) (12/21/87)
Mike, Thanks for the additional info. Are there any dates or version IDs available for your and Hilarie's Sun X.25 packages? Andy