pogran@CCQ.BBN.COM (Ken Pogran) (12/24/87)
Folks,

Here's where we stand on resolving ARPANET PSN 7 problems. Please refer to my message of 15 December entitled "An ARPANET Update" for a description of the problems referred to here.

We have successfully tested fixes for the "one packet problem" and the "pinging yourself" problem. These patches should be deployed ARPANET-wide within the next day or so. We have identified the "multiple of 128 bytes" problem as a host software problem. Here are the details:

1. The "one packet problem" (otherwise known as the "stuck VC problem," "thrashing VC problem," etc.). Known to affect Sun X.25 hosts.

When an 1822-connected host begins to send data to an X.25-connected host, the destination PSN, to which the X.25-connected host is attached, must open an X.25 VC to the destination host. Under PSN 7, the PSN opens the VC, sends the first IP datagram, and waits for an RR from the host before allowing the source PSN to send additional data across the network (and to return a RFNM to the source host for the first packet). This behavior is different from PSN 6, where up to 8 datagrams could be sent to the destination host. Under PSN 6, a source host could conceivably receive the RFNM for the first such datagram before the datagram was acknowledged by the destination host.

RRs are often piggy-backed on traffic flowing over the same VC in the opposite direction. However, conditions such as Mailbridge homing in the DDN can produce asymmetric flows. Many X.25 implementations send an RR to acknowledge a packet when a timer expires, if there is no reverse traffic. Sun X.25, however, does not; instead it waits for the window to become "half full". The behavior of the PSN 7 "interoperability" mechanism, together with the behavior of Sun X.25, created a "deadly embrace" in which only one datagram would be received on the VC. Behavior of the PSN 7 interoperability mechanism is being changed to eliminate this condition. The patch to do this has been tested in our lab and will be deployed to the ARPANET shortly.

NOTE: A patch was deployed last week in an attempt to fix this problem. That patch did not work, and was removed from the network last night. We had been unable to test that patch in the lab beforehand, because at the time we did not have a Sun with an X.25 interface hooked up to our lab net.

2. The "pinging yourself" problem. The timing bug described in my message of 15 December has been fixed in a patch tested earlier today. Mike Petry at UMd was our "guinea pig", and reports that the problem he saw has in fact been corrected by this patch. This patch will also be deployed to the ARPANET shortly.

3. The "multiple of 128 bytes" problem. Using our Sun with an X.25 interface on our lab net, and with a datascope on the X.25 link between the PSN and the Sun, we tried "pinging" the Sun from another host. We found that with packets of length 127, 128, 255, 256 ... the datascope showed the "ping" going to the Sun, but no response from the Sun. With packets of other lengths, the datascope showed both the "ping" and its reply going across the link. The packets from the PSN are well-formed in every respect. At this point we can only assume there's a bug in the host code.

--> Has anyone OTHER than folks with Suns with X.25 interfaces seen this problem? If so, please send a message to ARPAUPGRADE@BBN.COM.

Happy holidays, everyone.

Ken Pogran
BBN COMMUNICATIONS
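The "deadly embrace" in item 1 can be illustrated with a small simulation. This is a sketch under stated assumptions, not BBN's or Sun's code: `Receiver` and `deliver` are hypothetical names, the "half full" rule is modelled as acknowledging once at least half the window is unacknowledged, the window size of 7 used below is an assumed value, and the PSN side is reduced to one number, the count of datagrams it lets be outstanding before it needs an RR (1 for PSN 7 before the patch, 8 for PSN 6). The flow is one-way, so no acknowledgment can be piggy-backed.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    """Hypothetical model of a host's X.25 receive side on a single VC."""
    window: int        # packet-level window size (an assumed value)
    timer_rr: bool     # True: host sends an RR when its ack timer expires
    unacked: int = 0   # data packets received but not yet acknowledged

    def on_data(self) -> bool:
        """Accept one data packet; return True if an RR is sent right away."""
        self.unacked += 1
        # Assumed "half full" rule: acknowledge once at least half the window
        # is unacknowledged.
        if self.unacked * 2 >= self.window:
            self.unacked = 0
            return True
        return False

    def on_idle(self) -> bool:
        """No more data arrives; only a timer-driven host acknowledges now."""
        if self.timer_rr and self.unacked > 0:
            self.unacked = 0
            return True
        return False


def deliver(queued: int, sender_limit: int, rx: Receiver) -> int:
    """Count one-way datagrams delivered before progress stops.

    sender_limit is how many datagrams the PSN lets be outstanding before it
    needs an RR (1 models PSN 7 before the patch, 8 models PSN 6). There is
    no reverse traffic, so no acknowledgment can be piggy-backed.
    """
    if sender_limit < 1:
        raise ValueError("sender_limit must be at least 1")
    delivered = outstanding = 0
    while delivered < queued:
        # PSN side: send while the outstanding-datagram limit allows.
        while outstanding < sender_limit and delivered < queued:
            delivered += 1
            outstanding += 1
            if rx.on_data():      # host acknowledged immediately
                outstanding = 0
        if delivered == queued:
            break
        # PSN is blocked waiting for an RR; the host is waiting for more data.
        if rx.on_idle():
            outstanding = 0
        else:
            return delivered      # deadly embrace: each side waits on the other
    return delivered
```

Under these assumptions, `deliver(10, 1, Receiver(window=7, timer_rr=False))` returns 1, the "one packet problem"; giving the host a timer-driven RR (`timer_rr=True`) or allowing the PSN 6 style eight outstanding datagrams (`sender_limit=8`) lets all 10 through.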
melohn@Sun.COM (Bill Melohn) (01/05/88)
After catching up on my mail over the holidays, it appears that everyone now believes the problems related to "stuck VCs" have been fixed. However, when I drop my host Sun.COM back to the unpatched version of our kernel (the one that doesn't attempt to kludge around the BBN bug by always sending an RR with each packet), I still see "stuck VCs", easily reproducible on one-way VCs between us and machines on IMPs 11 and 68. Either the latest version of Andy's patch has not been fully deployed, or it too does NOT fix the problem.

On the 128-byte packet problem: we are in the process of getting packet traces from the ARPAnet to see exactly what the packet traffic looks like when we appear to lose the 128-byte packets. I should point out that this too only appears to happen between us and 1822 hosts running the new end-to-end; I suspect that we will find another PSN 7.0 bug at the root of this problem. More as soon as I have the traces.

We are in the process of testing a new version of our software to handle multiple incoming VCs from the same IP host. Because multiple VCs are used for X.25 loopback by the PSN under the new end-to-end, we feel we have little choice but to support them. I do feel that requiring this support without any warning that it would be needed by the new end-to-end was a mistake, one that our mutual customers may have to live with until we can test and manufacture a new software release. It also conceptually wastes VCs, which are limited resources in many X.25 implementations, because in many more cases it encourages two or more one-way VCs between host pairs where a single VC would have existed under the old end-to-end.
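Supporting "multiple incoming VCs from the same IP host" amounts to letting the host's VC table hold several logical channels per remote address instead of exactly one. The sketch below is hypothetical: `VcTable`, its methods, and the addresses in the usage note are illustrative, not Sun's kernel code, and it assumes outbound datagrams simply use the oldest VC still open to that host.

```python
from typing import Dict, List, Optional


class VcTable:
    """Hypothetical host VC table allowing several VCs per remote IP host."""

    def __init__(self) -> None:
        self._owner: Dict[int, str] = {}           # LCN -> remote IP address
        self._by_host: Dict[str, List[int]] = {}   # remote IP -> LCNs, oldest first

    def add_vc(self, remote_ip: str, lcn: int) -> None:
        """Record a newly accepted call, even if remote_ip already has a VC."""
        if lcn in self._owner:
            raise ValueError(f"logical channel {lcn} is already in use")
        self._owner[lcn] = remote_ip
        self._by_host.setdefault(remote_ip, []).append(lcn)

    def clear(self, lcn: int) -> None:
        """Forget a VC once it has been cleared; unknown LCNs are ignored."""
        remote_ip = self._owner.pop(lcn, None)
        if remote_ip is None:
            return
        vcs = self._by_host[remote_ip]
        vcs.remove(lcn)
        if not vcs:
            del self._by_host[remote_ip]

    def route(self, remote_ip: str) -> Optional[int]:
        """Choose the VC for an outbound datagram: the oldest one still open."""
        vcs = self._by_host.get(remote_ip)
        return vcs[0] if vcs else None

    def vcs_to(self, remote_ip: str) -> int:
        """Number of VCs currently open to remote_ip."""
        return len(self._by_host.get(remote_ip, []))
```

For example, after `add_vc("10.3.0.11", 1)` and `add_vc("10.3.0.11", 5)` (hypothetical address and channel numbers), `vcs_to("10.3.0.11")` is 2: the second logical channel is exactly the extra VC Bill describes as wasted where one would have sufficed under the old end-to-end.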