[comp.protocols.tcp-ip] TCP Urgent Data Handling

rjh@intelob.intel.com (Bob Hathaway) (09/27/89)

I'm implementing TCP urgent data handling for our product line and have
discovered some ambiguous semantics.  This implementation will support
multiple transport layer interfaces, including a Unix socket layer, and must
be able to communicate with other TCP/IP implementations; however, it appears
that BSD Unix doesn't implement the TCP specifications as I would expect.
I'd appreciate hearing from any internet experts or being referred to any
existing specifications which can clarify urgent data handling.


TCP RFC 793 and the MIL STD do not offer precise semantics for urgent
data handling.  Single byte messages are simple but larger messages seem
to be poorly defined.  For example, Ultrix assumes the first byte of a
multi-character message is urgent and 4.3 BSD assumes the last byte.  Also,
4.3 breaks large urgent messages into several segments with the URG bit
set and the urgent pointer pointing to just past the data *in each segment*.
The receiver will believe each segment is an urgent message and each segment
will override the last saved urgent byte unless inlining is specified.  This
implementation seems erroneous.

A more correct interpretation of the TCP specifications for multi-segment
urgent messages seems to be setting the URG bit on the first segment only 
and setting the urgent pointer to one byte past the last byte in the entire 
multi-segment urgent message.  The transport service will consider an entire
urgent message as urgent data allowing the socket layer to extract a single
byte from the urgent message if necessary.  Future socket implementations
will hopefully conform more closely to the TCP specification.  With this
interpretation, a receiver sets TCB variable RCV.UP = SEG.UP when an URG bit
is detected and arriving data up to *RCV.UP* is assumed to be urgent.

For example, this interpretation of the TCP specification will result in:

        URGENT MESSAGE               SEGMENTS
        ==============               ========
                                     URG=1, UP=3000 ---+
        m  |-------|              m  |-------|         |
           |       |                 |       |         |
           |       |       m + 1023  |-------|         |
           |       |                                   |
           |       |                 URG=UP=0          |
           ~       ~       m + 1024  |-------|         |
           |       |                 |       |         |
           |       |       m + 2047  |-------|         |
           |       |                                   |
           |       |                 URG=UP=0          |
           |       |       m + 2048  |-------|         |
           |       |                 |       |         |
 m + 2999  |-------|       m + 2999  |-------|         |
                                              <--------+

PSH will also be set on all outgoing segments. 
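
As a rough sketch of the segmentation shown in the diagram (not taken from
any existing implementation), the following Python splits one urgent message
into segments, sets URG and the urgent pointer on the first segment only, and
sets PSH on every segment. The names Segment and segment_urgent_message, and
the 1024-byte segment size, are assumptions made for illustration. The urgent
pointer is treated as an offset from the segment's sequence number, as RFC 793
defines the field, and uses the "one past the last byte" convention described
above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int     # sequence number of the first data byte
    length: int  # number of data bytes carried
    urg: bool    # URG control bit
    up: int      # urgent pointer field (offset from seq), 0 when URG is clear
    psh: bool    # PSH control bit


def segment_urgent_message(m, msg_len, mss):
    """Split an urgent message starting at sequence number m.

    Hypothetical helper: URG is set on the first segment only, with UP
    pointing one byte past the end of the whole message.
    """
    if not 0 < msg_len <= 0xFFFF:
        raise ValueError("urgent pointer is a 16-bit field")
    segments = []
    offset = 0
    while offset < msg_len:
        length = min(mss, msg_len - offset)
        first = offset == 0
        segments.append(Segment(seq=(m + offset) % 2**32,
                                length=length,
                                urg=first,
                                up=msg_len if first else 0,
                                psh=True))
        offset += length
    return segments


segs = segment_urgent_message(m=0, msg_len=3000, mss=1024)
```

With m = 0 this yields three segments of 1024, 1024 and 952 bytes, matching
the picture above.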

For reliability, SEG.UP would have to be constraint checked, and segments
with the URG bit set and a new UP which arrive after the first segment but
within the original urgent message would have to be handled, for example
when the second or third segment above arrives with URG=1, UP=new value.
These updates could be treated as errors by sending a RST with logging, or
treated as correct by updating RCV.UP.  We opt to treat UP updates within a
message as an error condition and disconnect with a RST, because they
indicate the peer is out of sync.
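
The receiver-side policy described in this paragraph could look roughly like
the sketch below. It is only an illustration under several assumptions:
segments arrive in order, sequence numbers do not wrap, a repeated URG that
points at the same end is tolerated, and an exception stands in for sending a
RST. UrgentReceiver and UrgentPointerError are hypothetical names.

```python
class UrgentPointerError(Exception):
    """Raised where a real stack would log the event and send a RST."""


class UrgentReceiver:
    def __init__(self, rcv_nxt):
        self.rcv_nxt = rcv_nxt  # next sequence number expected
        self.rcv_up = None      # one past the last urgent byte, or None

    def on_segment(self, seq, length, urg=False, up=0):
        if seq != self.rcv_nxt:
            raise ValueError("this sketch handles in-order segments only")
        if urg:
            # Constraint check on SEG.UP (assumed bounds: 1..65535).
            if not 0 < up <= 0xFFFF:
                raise UrgentPointerError("urgent pointer out of range")
            end = seq + up
            # A different end inside an open urgent message is an error.
            if self.rcv_up is not None and end != self.rcv_up:
                raise UrgentPointerError("UP changed within urgent message")
            self.rcv_up = end
        self.rcv_nxt = seq + length
        # Leave urgent mode once all urgent bytes have arrived.
        if self.rcv_up is not None and self.rcv_nxt >= self.rcv_up:
            self.rcv_up = None


rx = UrgentReceiver(rcv_nxt=0)
rx.on_segment(0, 1024, urg=True, up=3000)  # opens the urgent message
rx.on_segment(1024, 1024)                  # URG=UP=0, still urgent
```

A third segment arriving with URG=1 and an urgent pointer that ends anywhere
other than sequence number 3000 would raise UrgentPointerError here.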

This scheme raises some important questions, such as compatibility with
existing systems and correctness.  Are there any existing specifications
or internet experts who can clarify this?

Thanks,
Bob Hathaway
rjh@inteloa.intel.com
!tektronix!biin!rjh

hedrick@geneva.rutgers.edu (Charles Hedrick) (09/29/89)

The urgent bit is sort of odd.  Your questions suggest that you want
to be able to tell exactly what data is urgent and what isn't.  That's
not really what the original intent was, I don't think.  There are no
"messages" in TCP, nor is there any intent in the spec to define where
urgent data begins, only where it ends.  Urgency is intended to
indicate a condition that takes effect immediately.  Its beginning is
not synchronized with the data stream.  Thus urgent should be set in
the next datagram to be sent, even if it is a retransmission and did
not have urgent set last time.  (This may be hard to implement, of
course, and as far as I know nobody does.  But conceptually it would make
sense, and it would improve some aspects of telnet and rlogin
behavior.)  Furthermore, if two sections of "urgency" are adjacent,
they may look like one.  Both of the uses of urgent data that I know
-- telnet and rlogin -- are designed with these concepts in mind.
This means that the concept of "urgent message" is not well defined.
Only the concept of "last urgent byte" makes any sense, and with the
off by one fiasco, even that is ambiguous.  I would suggest that you
*not* try to "clarify" urgent, but stick with the 4.3 semantics.
Anyone who needs urgent "messages" should arrange to delimit the
messages by some sort of marker in the data.
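
One way to picture delimiting by a marker in the data: the application
treats the urgent notification only as a signal and then skips forward in the
ordinary byte stream until it finds an agreed marker. The sketch below is a
simplified illustration, not a Telnet implementation. It borrows the two-byte
Telnet IAC DM sequence (255, 242) as the marker, and the function name
discard_through_mark is hypothetical.

```python
MARK = b"\xff\xf2"  # Telnet IAC DM, used here only as an example marker


def discard_through_mark(stream: bytes) -> bytes:
    """Return the data that follows the first marker in the stream.

    If no marker has arrived yet, everything seen so far is discarded and
    an empty result is returned.
    """
    _head, sep, tail = stream.partition(MARK)
    return tail if sep else b""


remaining = discard_through_mark(b"stale output" + MARK + b"fresh data")
```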

guy@guy.uucp (Guy Streeter) (09/29/89)

hedrick@geneva.rutgers.edu (Charles Hedrick) writes:
> ...
>Only the concept of "last urgent byte" makes any sense, and with the
>off by one fiasco, even that is ambiguous.  I would suggest that you
>*not* try to "clarify" urgent, but stick with the 4.3 semantics.
> ...

RFC 1011 - Official Internet Protocols                          May 1987


         Urgent:  Page 17 is wrong.  The urgent pointer points to the
         last octet of urgent data (not to the first octet of non-urgent
         data).

 ...

***DRAFT RFC***          TRANSPORT LAYER -- TCP            June 16, 1989

         4.2.2.4  Urgent Pointer: RFC-793 Section 3.1

            The second sentence is in error: the urgent pointer points
            to the sequence number of the LAST octet (not LAST+1) in a
            sequence of urgent data.

 ...

BSD 4.3 sets the urgent pointer to the LAST+1 octet in a sequence of urgent
data.  Whenever anyone else violates a protocol, the response in this
newsgroup is always "Fix your software!"  Should we propagate Berkeley's
error in the interest of compatibility, or should we do it right?
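
To make the disagreement concrete, here is a small sketch of how the two
readings locate the last urgent octet from the same segment fields. The
function name and the points_past_end flag are made up for illustration. The
urgent pointer is taken as an offset from the segment's sequence number, and
arithmetic is modulo 2**32.

```python
def last_urgent_octet(seg_seq, urgent_pointer, points_past_end):
    """Sequence number of the last urgent octet under either reading.

    points_past_end=True  : pointer names LAST+1 (the 4.3 BSD reading)
    points_past_end=False : pointer names LAST (the RFC 1011 correction)
    """
    target = (seg_seq + urgent_pointer) % 2**32
    return (target - 1) % 2**32 if points_past_end else target


bsd_view = last_urgent_octet(1000, 5, points_past_end=True)
rfc_view = last_urgent_octet(1000, 5, points_past_end=False)
```

The two results differ by exactly one octet, so peers that disagree on the
convention will each pick a different byte as the urgent one.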

Guy Streeter
b11!guy!guy@ingr.com
...uunet!ingr!b11!guy!guy

subbu@hpindda.HP.COM (MCV Subramaniam) (09/30/89)

>multi-character message is urgent and 4.3 BSD assumes the last byte.  Also,
>4.3 breaks large urgent messages into several segments with the URG bit
>set and the urgent pointer pointing to just past the data *in each segment*.

	4.3 TCP considers only one byte in the message as urgent. Therefore,
	if you call send() with 2000 bytes, the last byte will be considered
	urgent, and the SND.UP pointer will be updated to that value. 
	Thereafter, every packet sent will have the URG flag set, but
	the URP in the packet will be equal to SND.UP. That is, each
	segment will contain the same value of URP in it. I believe 4.2
	used to set URG flag and URP only in the segments in which the
	OOB byte is actually transmitted. (Someone correct me if I am 
	wrong).
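
The sender behaviour described above can be pictured with the following
sketch. It is not BSD code. It assumes the LAST+1 convention for SND.UP and
expresses the urgent pointer field in each segment as an offset from that
segment's sequence number, so that every segment refers to the same absolute
SND.UP. The function name and the 1024-byte segment size are illustrative.

```python
def bsd_style_segments(snd_nxt, data_len, mss):
    """Segments for one send() of data_len bytes with MSG_OOB.

    Every segment carries URG, and seq + urp is the same for all of them.
    """
    snd_up = snd_nxt + data_len  # one past the urgent (last) byte
    segments = []
    seq = snd_nxt
    while seq < snd_up:
        length = min(mss, snd_up - seq)
        segments.append({"seq": seq, "len": length,
                         "urg": True, "urp": snd_up - seq})
        seq += length
    return segments


out = bsd_style_segments(snd_nxt=0, data_len=2000, mss=1024)
```

For a 2000-byte send this gives two segments, both with URG set and both
resolving to the same urgent position.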

>The receiver will believe each segment is an urgent message and each segment
>will override the last saved urgent byte unless inlining is specified.  This
	
	If, however, you call multiple send()s with MSG_OOB, then each
	segment sent will contain one urgent byte, and if the user has not
	received the last OOB byte, it will be overridden with the new one,
	and RCV.UP will be moved past the new URP received.

	The following bugs exist in 4.3 BSD urgent (OOB) data handling:

1.	If you send out-of-band bytes *too* frequently, i.e. send the 
	next OOB before the first is acknowledged, then 4.3 BSD TCP leads
	to data corruption (if you don't use the OOBINLINE option). 
	[Bug in the sender]

2.	Also, if 4.3 BSD TCP transmits (say) 3 segments and the third one
	had the URG flag set, and the first one got lost, then SND.UP
	gets messed up when the first segment is retransmitted. This, again,
	leads to data corruption. [Bug in the sender]

3.	If segments containing urgent bytes have to be retransmitted, and
	get reassembled in the receiver's TCP reassembly queue, data corruption
	could result. [Bug in the receiver - Reassembly code]

-Subbu

CERF@A.ISI.EDU (10/01/89)

Bob,

The most sensible implementation of URGENT POINTER is to mark the
byte just past the end of the urgent message. If the message is
broken into segments, one could continue to set URG=1 and
UP="byte number 1 past the last byte of urgent data". Resetting
URG and UP is OK, too, so long as the recipient remembers UP
until the bytes received in sequence exceed the last byte of
urgent data.
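
A minimal sketch of the "remember UP" rule, assuming in-order delivery, no
sequence number wraparound, and the "one past the last byte" convention. The
class name UrgentTracker is hypothetical. The max() step follows the RFC 793
rule RCV.UP <- max(RCV.UP, SEG.UP).

```python
class UrgentTracker:
    def __init__(self, rcv_nxt):
        self.rcv_nxt = rcv_nxt
        self.rcv_up = rcv_nxt  # nothing urgent outstanding

    def on_segment(self, seq, length, urg=False, up=0):
        if urg:
            # Keep the furthest urgent position seen so far.
            self.rcv_up = max(self.rcv_up, seq + up)
        self.rcv_nxt = seq + length

    @property
    def urgent_pending(self):
        # True until the in-sequence bytes pass the last urgent byte.
        return self.rcv_nxt < self.rcv_up


t = UrgentTracker(rcv_nxt=0)
t.on_segment(0, 1024, urg=True, up=3000)
t.on_segment(1024, 1024)  # URG reset; UP is still remembered
pending_mid = t.urgent_pending
t.on_segment(2048, 952)
pending_end = t.urgent_pending
```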

The implementation which sets UP to just past the data of each
segment isn't necessarily broken, but it seems unnecessary
to implement in that fashion. 

The question of first or last byte of an urgent message caught
me by surprise. At the TCP level, the only thing you can
specify is where the urgent data ends, not where it starts.
The interface between the process wanting to send urgent 
information and the kernel TCP service needs to have a way for
the process to say where the urgent data ENDS, since that is
the information that the TCP can convey.

The receiving process will need clues in addition to those provided
by TCP to distinguish the urgent from non-urgent information -
these semantic and syntactic matters were left to the protocol
layer above TCP to deal with.

Vint