pogran@CCQ.BBN.COM (Ken Pogran) (12/16/87)
Several problems with regard to the New End-to-End Protocol
deployed on the ARPANET with PSN Release 7 were discussed in
messages to this list over the past couple of days. Here's a
report on what we at BBN have uncovered so far.
We are working on three general problems and one problem that is
specific to a particular host. The "general" problems affect
communication with some hosts (including some gateways) connected
to the ARPANET with X.25 interfaces.
1. The "one packet problem." The scenario has been described by
several folks, and runs about like this: An 1822-connected
gateway has traffic for an X.25-connected host. Sending the
first datagram into the net causes an X.25 VC to be opened to
the destination host. One and only one packet is received by
the host, and the flow stops. Various events can cause the
flow to become "unblocked", such as sending traffic FROM the
host back over the same VC. This problem has been observed
with several, but by no means with all, DDN Standard X.25
implementations.
This problem has especially been seen in situations where an
X.25-connected ARPANET host establishes communication with a
MILNET host (or vice-versa). In this situation, because of
the Mailbridge "homing rules", traffic often flows across a
different Mailbridge in each direction. Thus, user data flow
is essentially unidirectional across each of two VCs. With
other patterns of communication, a symmetric, bidirectional
user data flow would generate one of those "events" that
seems to "unblock" the flow over the VC.
This problem is not observed in communication between pairs
of hosts that are BOTH X.25-connected, or BOTH
1822-connected, or in situations where the X.25-connected
host initiates the VC. It can arise when a host with a
connection-less (i.e., 1822) interface initiates
communication with a host with a connection-oriented (i.e.,
X.25) interface and "the network" has to initiate the
connection.
We believe that what's happening here is that the receiving
host's X.25 isn't sending an RR to the PSN for the first data
packet it receives when the PSN opens the VC. Under the New
End-to-End Protocol, when going from an 1822-connected host
to an X.25-connected host, the PSNs wait to see an RR for the
first packet before subsequent packets are sent from the
source PSN to the destination PSN (and a RFNM is returned to
the originating 1822-connected host). Under the Old
End-to-End Protocol, subsequent packets were sent as soon as
the receiving host accepted the VC (up to the limit of the
window); this could result in a RFNM being sent to the
originating host before the destination host actually
acknowledged the packet via an RR! (The different behavior
of the New End-to-End was intended as a fix for what was a
bug, or perhaps a "cheat", in the old design with respect to
the meaning of a RFNM.)
In the case of a symmetric traffic flow, an RR is typically
piggybacked on a data packet. But, as was mentioned above,
traffic flows involving Mailbridges frequently aren't
symmetric. Typically, X.25 implementations send an RR after
some brief timeout if there's no user packet going out over
the VC on which to piggyback the RR. But if there is neither
traffic nor a timeout, and no RR is sent, the flow will cease
as described above.
We're going to change the PSNs to behave as they did under
the Old End-to-End in this regard, at least temporarily.
This will give us time to work with implementors to resolve
this issue.
2. The "pinging yourself" problem. We've found a timing bug in
the PSN that is sometimes triggered when a host "pings"
itself. Other situations can trigger it, but the timing of
the sequence of events that occurs when some hosts ping
themselves seems to be the most conducive to triggering the
bug. The result is that the PSN doesn't acknowledge delivery
of one or more messages to the host in question. We've found a
race condition in the PSN code. A fix for this bug will be
installed in the ARPANET within the next day or so.
3. The "multiple of 128 bytes" problem. Several people have
reported a problem with packets apparently being dropped by
the network when they are a multiple of 128 bytes (perhaps
+/- a few?) in length. We are actively investigating this
problem. Anyone with data or insight with regard to this is
encouraged to contact Andy Malis (Malis@bbn.com) or
ARPAUPGRADE@bbn.com.
4. The gateway at Yale has mostly been off the net since the
cutover to the New End-to-End Protocol. They are the only
gateway connected to the ARPANET via an HDH interface running
at 9.6 kb/s, and are the only ones experiencing this
particular problem. We believe there is a PSN bug that we
haven't been able to find yet; so far, we have been unable to
duplicate this problem in our lab. In the meantime, we have
developed a work-around that will enable Yale to be up on the
ARPANET while we work to find and fix the bug. We apologize
for the inconvenience, and thank the folks at Yale for their
patience and understanding.
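For readers who want to see the stall in item 1 concretely, here is a minimal sketch in Python. It is a toy model, not PSN code. The function name, the `rr_after` host policy (the host sends an RR only after that many unacknowledged packets), and the default window of 2 are all assumptions made for illustration. The source says only that the host does not send an RR for the first packet; it does not say why.

```python
def packets_delivered(n_packets, rr_after, wait_for_first_rr, window=2):
    """Count packets the PSN delivers to the host before the flow stalls.

    n_packets         -- packets queued for the host (traffic is one-way)
    rr_after          -- hypothetical host policy: send an RR only once this
                         many received packets are unacknowledged
    wait_for_first_rr -- True models the New End-to-End behavior (hold
                         everything after the first packet until an RR is
                         seen); False models the Old End-to-End behavior
    window            -- assumed X.25 window size
    """
    delivered = 0
    unacked = 0
    first_rr_seen = False
    while delivered < n_packets:
        # Until the first RR arrives, the new protocol allows one packet only.
        limit = 1 if (wait_for_first_rr and not first_rr_seen) else window
        if unacked >= limit:
            break  # nothing more can be sent, and no RR is coming: stalled
        delivered += 1
        unacked += 1
        if unacked >= rr_after:
            # Host sends an RR covering everything received so far.
            unacked = 0
            first_rr_seen = True
    return delivered


if __name__ == "__main__":
    for rr_after in (1, 2):
        for new in (False, True):
            print(rr_after, new, packets_delivered(5, rr_after, new))
```

In this model a host that acknowledges every packet works under either protocol, while a host that waits for two packets before sending an RR gets everything under the old behavior and exactly one packet under the new one.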
Finally, implementors have asked if they can be included on the
"ARPAUPGRADE" mailing list. We use this list as a "hot line" for
getting information from the community to us and to DCA, and for
internal discussion about the problems. The idea of having an
implementors' mailing list is a good one, however, and we will
shortly set up a new list for those who are actively helping us
track down these various problems.
Please keep those cards and letters coming!
Regards,
Ken Pogran
BBN COMMUNICATIONS CORPORATION
SATZ@MATHOM.CISCO.COM (Greg Satz) (12/19/87)
From: Ken Pogran <pogran@ccq.bbn.com>
Subject: An ARPANET update
We believe that what's happening here is that the receiving
host's X.25 isn't sending an RR to the PSN for the first data
packet it receives when the PSN opens the VC. Under the New
End-to-End Protocol, when going from an 1822-connected host
to an X.25-connected host, the PSNs wait to see an RR for the
first packet before subsequent packets are sent from the
source PSN to the destination PSN (and a RFNM is returned to
the originating 1822-connected host). Under the Old
End-to-End Protocol, subsequent packets were sent as soon as
the receiving host accepted the VC (up to the limit of the
window); this could result in a RFNM being sent to the
originating host before the destination host actually
acknowledged the packet via an RR! (The different behavior
of the New End-to-End was intended as a fix for what was a
bug, or perhaps a "cheat", in the old design with respect to
the meaning of a RFNM.)
In the case of a symmetric traffic flow, an RR is typically
piggybacked on a data packet. But, as was mentioned above,
traffic flows involving Mailbridges frequently aren't
symmetric. Typically, X.25 implementations send an RR after
some brief timeout if there's no user packet going out over
the VC on which to piggyback the RR. But if there is neither
traffic nor a timeout, and no RR is sent, the flow will cease
as described above.
We're going to change the PSNs to behave as they did under
the Old End-to-End in this regard, at least temporarily.
This will give us time to work with implementors to resolve
this issue.
The X.25 specification omits any explanation of when to send RR
acknowledgements. For a low bandwidth link, it seems reasonable to
wait until the window is (almost) full before sending an RR. It is also
reasonable to want to send an RR (if you don't have an outgoing data
packet ready) after every received data packet when the acknowledgements
must traverse the network instead of having local significance.
Our implementation has a parameter that lets you send an acknowledgment
when N input data packets have been seen. An RR acknowledgement is sent
if there aren't any outgoing data packets. The drawback is that RRs will
be sent more often than necessary. Adding a timer is a very good idea.
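To make the combination concrete, here is a rough sketch in Python of an acknowledgement policy with both an "every N packets" parameter and a timer. It is not our implementation. The class, method, and parameter names are hypothetical, and piggybacking is modelled simply as "an outgoing data packet acknowledges everything received so far".

```python
class AckPolicy:
    """Decide when a receiver should send a standalone RR (illustrative only)."""

    def __init__(self, ack_every_n, ack_timeout):
        self.ack_every_n = ack_every_n    # RR after this many unacked packets
        self.ack_timeout = ack_timeout    # seconds before a lone packet is acked
        self.unacked = 0                  # received data packets not yet acked
        self.oldest_unacked_at = None     # arrival time of first unacked packet

    def on_data_received(self, now):
        if self.unacked == 0:
            self.oldest_unacked_at = now
        self.unacked += 1

    def on_ack_sent(self):
        """Call after sending an RR or a data packet carrying a piggybacked ack."""
        self.unacked = 0
        self.oldest_unacked_at = None

    def should_send_rr(self, now, outgoing_data_pending):
        if self.unacked == 0:
            return False                  # nothing to acknowledge
        if outgoing_data_pending:
            return False                  # the ack will ride on the data packet
        if self.unacked >= self.ack_every_n:
            return True                   # the "N packets seen" rule
        # Timer rule: do not leave a lone packet unacknowledged forever.
        return now - self.oldest_unacked_at >= self.ack_timeout


if __name__ == "__main__":
    policy = AckPolicy(ack_every_n=2, ack_timeout=0.5)
    policy.on_data_received(now=0.0)
    print(policy.should_send_rr(now=0.25, outgoing_data_pending=False))
    print(policy.should_send_rr(now=0.75, outgoing_data_pending=False))
```

With the timer in place, a single packet arriving on an otherwise idle VC is still acknowledged once `ack_timeout` has elapsed, which is the case that matters for one-way traffic.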
After rereading the "Procedures for flow control" I came across the
section of "Delivery confirmation". I don't know if it is worth it, but
maybe the D-bit could be used?
-------
malis@CC5.BBN.COM (Andy Malis) (12/20/87)
Mike,
If I may reply for Ken, letting one packet through per circuit
setup was never an intended mode of operation, and it only seems
to be affecting the Sun X.25 package. Every new release of PSN
software has to pass a rigorous DCA test plan. PSN 7.0 has
already passed that test.
Andy
oconnor@SCCGATE.SCC.COM (Michael J. O'Connor) (12/21/87)
Andy,
I finally contacted Hilarie Orman (ho@tis) who is a fellow Sun
sufferer (in the sense that she is also plagued by this X.25
problem). Hilarie tells me that the problem I'm currently seeing
with my L3 software locking up is one that has been plaguing her
system for a while. I believe she said the problem occurred
before the new End-to-End. I can assure you that we never had
this problem under the old End-to-End or I would never have
mentioned it to you. We may be running older Sun X.25 software
than Hilarie is, but whatever the reason, it sounds like that
part of my problem is Sun's responsibility. Just to keep the
record straight, I still have the 'thrashing circuits'.
Mike
malis@CC5.BBN.COM (Andy Malis) (12/21/87)
Mike,
Thanks for the additional info. Are there any dates or version
IDs available for your and Hilarie's Sun X.25 packages?
Andy