[comp.protocols.tcp-ip] An ARPANET update

pogran@CCQ.BBN.COM (Ken Pogran) (12/16/87)

Several problems with regard to the New End-to-End Protocol
deployed on the ARPANET with PSN Release 7 were discussed in
messages to this list over the past couple of days.  Here's a
report on what we at BBN have uncovered so far.

We are working on three general problems and one problem that is
specific to a particular host.  The "general" problems affect
communication with some hosts (including some gateways) connected
to the ARPANET with X.25 interfaces.

1.  The "one packet problem." The scenario has been described by
    several folks, and runs about like this: An 1822-connected
    gateway has traffic for an X.25-connected host.  Sending the
    first datagram into the net causes an X.25 VC to be opened to
    the destination host.  One and only one packet is received by
    the host, and the flow stops.  Various events can cause the
    flow to become "unblocked", such as sending traffic FROM the
    host back over the same VC.  This problem has been observed
    with several, but by no means with all, DDN Standard X.25
    implementations.

    This problem has especially been seen in situations where an
    X.25-connected ARPANET host establishes communication with a
    MILNET host (or vice-versa).  In this situation, because of
    the Mailbridge "homing rules", traffic often flows across a
    different Mailbridge in each direction.  Thus, user data flow
    is essentially unidirectional across each of two VCs.  With
    other patterns of communication, a symmetric, bidirectional
    user data flow would generate one of those "events" that
    seems to "unblock" the flow over the VC.

    This problem is not observed in communication between pairs
    of hosts that are BOTH X.25-connected, or BOTH
    1822-connected, or in situations where the X.25-connected
    host initiates the VC.  It can arise when a host with a
    connection-less (i.e., 1822) interface initiates
    communication with a host with a connection-oriented (i.e.,
    X.25) interface and "the network" has to initiate the
    connection.

    We believe that what's happening here is that the receiving
    host's X.25 isn't sending an RR to the PSN for the first data
    packet it receives when the PSN opens the VC.  Under the New
    End-to-End Protocol, when going from an 1822-connected host
    to an X.25-connected host, the PSNs wait to see an RR for the
    first packet before subsequent packets are sent from the
    source PSN to the destination PSN (and a RFNM is returned to
    the originating 1822-connected host).  Under the Old
    End-to-End Protocol, subsequent packets were sent as soon as
    the receiving host accepted the VC (up to the limit of the
    window); this could result in a RFNM being sent to the
    originating host before the destination host actually
    acknowledged the packet via an RR!  (The different behavior
    of the New End-to-End was intended as a fix for what was a
    bug, or perhaps a "cheat", in the old design with respect to
    the meaning of a RFNM.)

    In the case of a symmetric traffic flow, an RR is typically
    piggybacked on a data packet.  But, as was mentioned above,
    traffic flows involving Mailbridges frequently aren't
    symmetric.  Typically, X.25 implementations send an RR after
    some brief timeout if there's no user packet going out over
    the VC on which to piggyback the RR.  But if there is neither
    traffic nor a timeout, and no RR is sent, the flow will cease
    as described above.

    We're going to change the PSNs to behave as they did under
    the Old End-to-End in this regard, at least temporarily.
    This will give us time to work with implementors to resolve
    this issue.

2.  The "pinging yourself" problem.  We've found a timing bug in
    the PSN that is sometimes triggered when a host "pings"
    itself.  Other situations can trigger it, but the timing of
    the sequence of events that occurs when some hosts ping
    themselves seems to be the most conducive to triggering the
    bug.  The result is that the PSN doesn't acknowledge delivery
    of one or more messages to the host in question.  The cause is a
    race condition in the PSN code.  A fix for this bug will be
    installed in the ARPANET within the next day or so.

3.  The "multiple of 128 bytes" problem.  Several people have
    reported a problem with packets apparently being dropped by
    the network when they are a multiple of 128 bytes (perhaps
    +/- a few?) in length.  We are actively investigating this
    problem.  Anyone with data or insight with regard to this is
    encouraged to contact Andy Malis (Malis@bbn.com) or
    ARPAUPGRADE@bbn.com.

4.  The gateway at Yale has mostly been off the net since the
    cutover to the New End-to-End Protocol.  They are the only
    gateway connected to the ARPANET via an HDH interface running
    at 9.6 kb/s, and are the only ones experiencing this
    particular problem.  We believe there is a PSN bug that we
    haven't been able to find yet; so far, we have been unable to
    duplicate this problem in our lab.  In the meantime, we have
    developed a work-around that will enable Yale to be up on the
    ARPANET while we work to find and fix the bug.  We apologize
    for the inconvenience, and thank the folks at Yale for their
    patience and understanding.
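
The stall described in item 1 can be illustrated with a toy model.  This
is only a sketch under stated assumptions, not the PSN code: the function
name, its parameters, and the window size of 2 used in the example are
all hypothetical.  It counts how many packets reach the X.25 host on a
newly opened VC, depending on whether the PSN waits for the first RR (New
End-to-End) or sends up to the window right away (Old End-to-End).

```python
def packets_delivered(offered, window, wait_for_first_rr, host_sends_rr):
    """Count packets handed to the X.25 host on a newly opened VC.

    Toy model only; every name and parameter here is hypothetical.
    offered            -- packets the 1822-connected source wants to send
    window             -- X.25 window size on the VC (assumed value)
    wait_for_first_rr  -- True models the New End-to-End, False the Old
    host_sends_rr      -- whether the host acknowledges each packet with an RR
    """
    delivered = 0
    unacked = 0
    first_rr_seen = False
    while delivered < offered:
        # New End-to-End: after the first packet, hold until an RR arrives.
        if wait_for_first_rr and delivered >= 1 and not first_rr_seen:
            break
        # Either protocol: stop when the window is full and nothing opens it.
        if unacked >= window:
            break
        delivered += 1
        unacked += 1
        if host_sends_rr:
            unacked = 0
            first_rr_seen = True
    return delivered


if __name__ == "__main__":
    for new_e2e in (True, False):
        for rr in (True, False):
            n = packets_delivered(10, 2, new_e2e, rr)
            print(f"new_end_to_end={new_e2e} host_sends_rr={rr} delivered={n}")
```

With a host that never sends an RR, the model delivers exactly one
packet under the New End-to-End and a full window under the Old, which
matches the symptom reported above.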

Finally, implementors have asked if they can be included on the
"ARPAUPGRADE" mailing list.  We use this list as a "hot line" for
getting information from the community to us and to DCA, and for
internal discussion about the problems.  The idea of having an
implementors' mailing list is a good one, however, and we will
shortly set up a new list for those who are actively helping us
track down these various problems.  

Please keep those cards and letters coming!

Regards,
 Ken Pogran
 BBN COMMUNICATIONS CORPORATION

SATZ@MATHOM.CISCO.COM (Greg Satz) (12/19/87)

    From: Ken Pogran <pogran@ccq.bbn.com>
    Subject: An ARPANET update
    
        We believe that what's happening here is that the receiving
        host's X.25 isn't sending an RR to the PSN for the first data
        packet it receives when the PSN opens the VC.  Under the New
        End-to-End Protocol, when going from an 1822-connected host
        to an X.25-connected host, the PSNs wait to see an RR for the
        first packet before subsequent packets are sent from the
        source PSN to the destination PSN (and a RFNM is returned to
        the originating 1822-connected host).  Under the Old
        End-to-End Protocol, subsequent packets were sent as soon as
        the receiving host accepted the VC (up to the limit of the
        window); this could result in a RFNM being sent to the
        originating host before the destination host actually
        acknowledged the packet via an RR!  (The different behavior
        of the New End-to-End was intended as a fix for what was a
        bug, or perhaps a "cheat", in the old design with respect to
        the meaning of a RFNM.)
    
        In the case of a symmetric traffic flow, an RR is typically
        piggybacked on a data packet.  But, as was mentioned above,
        traffic flows involving Mailbridges frequently aren't
        symmetric.  Typically, X.25 implementations send an RR after
        some brief timeout if there's no user packet going out over
        the VC on which to piggyback the RR.  But if there is neither
        traffic nor a timeout, and no RR is sent, the flow will cease
        as described above.
    
        We're going to change the PSNs to behave as they did under
        the Old End-to-End in this regard, at least temporarily.
        This will give us time to work with implementors to resolve
        this issue.

The X.25 specification omits any explanation of when to send RR
acknowledgements. For a low bandwidth link, it seems reasonable to
wait until the window is (almost) full before sending an RR. It is also
reasonable to want to send an RR (if you don't have an outgoing data
packet ready) after every received data packet when the acknowledgements
must traverse the network instead of having local significance.

Our implementation has a parameter that lets you send an acknowledgment
when N input data packets have been seen. An RR acknowledgement is sent
if there aren't any outgoing data packets. The drawback is that RRs will
be sent more often than necessary. Adding a timer is a very good idea.
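
A rough sketch of such a policy, with the timer added, might look like
the following. This is not our actual implementation: the class name,
method names, and parameter values are all made up for illustration. The
receiver piggybacks the acknowledgement when outgoing data is ready,
sends a standalone RR after N unacknowledged data packets, and otherwise
falls back on a timeout.

```python
class RRPolicy:
    """Decide when to send a standalone X.25 RR (hypothetical sketch)."""

    def __init__(self, every_n, timeout):
        self.every_n = every_n  # send an RR after this many unacked data packets
        self.timeout = timeout  # seconds an unacked packet may wait before an RR
        self.unacked = 0
        self.oldest = None      # arrival time of the oldest unacked packet

    def on_data_received(self, now, outgoing_data_ready):
        """Return 'piggyback', 'rr', or None for a newly received data packet."""
        if self.oldest is None:
            self.oldest = now
        self.unacked += 1
        if outgoing_data_ready:
            # The acknowledgement rides on the outgoing data packet.
            self._clear()
            return "piggyback"
        if self.unacked >= self.every_n:
            self._clear()
            return "rr"
        return None

    def on_tick(self, now):
        """Return 'rr' if the oldest unacked packet has waited past the timeout."""
        if self.oldest is not None and now - self.oldest >= self.timeout:
            self._clear()
            return "rr"
        return None

    def _clear(self):
        self.unacked = 0
        self.oldest = None


if __name__ == "__main__":
    policy = RRPolicy(every_n=3, timeout=0.5)
    print(policy.on_data_received(0.0, outgoing_data_ready=False))
    print(policy.on_tick(0.6))
```

Setting every_n to 1 gives the "RR after every received data packet"
behavior, at the cost of the extra RRs mentioned above. With a larger N
and no timer, a single packet on a one-way flow would never be
acknowledged.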

While rereading the "Procedures for flow control" I came across the
section on "Delivery confirmation". I don't know if it is worth it, but
maybe the D-bit could be used?
-------

malis@CC5.BBN.COM (Andy Malis) (12/20/87)

Mike,

If I may reply for Ken, letting one packet through per circuit
setup was never an intended mode of operation, and it only seems
to be affecting the Sun X.25 package.

Every new release of PSN software has to pass a rigorous DCA test
plan.  PSN 7.0 has already passed that test.

Andy

oconnor@SCCGATE.SCC.COM (Michael J. O'Connor) (12/21/87)

Andy,
	I finally contacted Hilarie Orman (ho@tis) who is a fellow Sun
sufferer (in the sense that she is also plagued by this X.25 problem).
Hilarie tells me that the problem I'm currently seeing with my L3 software
locking up is one that has been plaguing her system for a while.  I believe
she said the problem occurred before the new End-to-End.  I can assure you
that we never had this problem under the old End-to-End or I would never have
mentioned it to you.  We may be running older Sun X.25 software than Hilarie is,
but whatever the reason, it sounds like that part of my problem is Sun's
responsibility.
	Just to keep the record straight, I still have the 'thrashing circuits'.

		Mike

malis@CC5.BBN.COM (Andy Malis) (12/21/87)

Mike,

Thanks for the additional info.  Are there any dates or version
IDs available for your and Hilarie's Sun X.25 packages?

Andy