[comp.protocols.tcp-ip] Reliable Datagram ??? Protocols

postel@VENERA.ISI.EDU (10/16/90)

Hi.

"Reliable Datagram" is an oxymoron.

--jon.

	From tcp-ip-RELAY@NIC.DDN.MIL Fri Oct 12 17:39:16 1990
	Date: 12 Oct 90 10:12:49 GMT
	From: apple.com!erekose@apple.com  (Erik Scheelke)
	Subject: Reliable Datagram Protocols
	Sender: tcp-ip-relay@nic.ddn.mil
	To: tcp-ip@nic.ddn.mil

	Does anyone know of a reliable connectionless datagram protocol that
	runs on top of UDP?  If so, is there a library out there I can get?

	Thanks in advance,
	Erik Scheelke

jdarcy@encore.com (Floating Exception) (10/16/90)

apple.com!erekose@apple.com  (Erik Scheelke):
>	Does anyone know of a reliable connectionless datagram protocol that
>	runs on top of UDP?  If so, is there a library out there I can get?

postel@VENERA.ISI.EDU writes:
>Hi.
>
>"Reliable Datagram" is an oxymoron.

Very funny.  Really.  I would guess, however, that Erik is referring to a
connectionless protocol that preserves message boundaries and guarantees
delivery but not necessarily sequencing or non-duplication.  I'm sure such
beasts exist somewhere.

--

Jeff d'Arcy, Generic Software Engineer - jdarcy@encore.com
      Nothing was ever achieved by accepting reality

cjohnson@somni.wpd.sgi.com (Chris Johnson) (10/17/90)

> postel@VENERA.ISI.EDU sez:
>
> "Reliable Datagram" is an oxymoron.
>
> --jon.

Perhaps reliable datagram using UDP is an oxymoron, depending on the
transport layer.

However XTP datagrams *are* reliable.  Mail to xtp-request@pei.com
for information or an XTP spec.

					Chris Johnson
					cjohnson@pei.com

jbvb@FTP.COM (James B. Van Bokkelen) (10/22/90)

    apple.com!erekose@apple.com  (Erik Scheelke):
    >     Does anyone know of a reliable connectionless datagram protocol that
    >     runs on top of UDP?  If so, is there a library out there I can get?

    postel@VENERA.ISI.EDU writes:
    >Hi.
    >
    >"Reliable Datagram" is an oxymoron.

    Very funny.  Really.  I would guess, however, that Erik is referring to a
    connectionless protocol that preserves message boundaries and guarantees
    delivery but not necessarily sequencing or non-duplication.  I'm sure such
    beasts exist somewhere.

What Jon said was the very, very condensed summary of an issue that he has
no doubt seen hashed over far too many times.  Even though I've been over
it twice as many times on the phone to customers as I've seen it discussed
here, I'll try my hand at laying it out in long form.

"Datagrams" are defined as single messages.  Sometimes you send one and you
don't expect an answer.  Sometimes you kind of hope for a reply, but the
transaction you are attempting isn't worth the overhead of setting up a
connection; if you don't get an answer, the request may have been lost, the
server may be down, or the reply may have been lost.  You don't care, there
are many servers, and your timeout handler has just sent a duplicate query
to another of them...

"Connectionless" means that there is no state at either the source or the
destination of the message.  Thus, a connectionless protocol cannot
guarantee delivery.  If the sender keeps enough state and includes enough
information in each message to guarantee delivery (e.g. some sort of
unique ID, and a timeout if the guaranteed response doesn't arrive), you
only need to add a little state to the receiver to allow sequencing and
non-duplication.  If the application must keep track because the
transport doesn't, it still looks like a connection to me...
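
The sender and receiver state described above can be sketched in a few
lines.  This is only an illustration under stated assumptions: the class
names, the lossy_channel helper and the drop pattern are hypothetical, the
"network" is an in-process function call, and a returned None stands in for
a timeout.  The sender keeps a unique ID and retransmits; the receiver keeps
a table of IDs it has already served, which is exactly the "little state"
that buys non-duplication.

```python
import itertools


class Receiver:
    """Hypothetical receiver: remembers request IDs it has already served."""

    def __init__(self):
        self.seen = {}       # request id -> cached reply
        self.executed = 0    # how many times the real work was done

    def on_datagram(self, req_id, payload):
        if req_id not in self.seen:           # first copy: do the work once
            self.executed += 1
            self.seen[req_id] = payload.upper()
        return self.seen[req_id]              # duplicate: replay cached reply


class Sender:
    """Hypothetical sender: unique ID per message, retransmit on timeout."""

    def __init__(self, channel, max_tries=5):
        self.channel = channel
        self.max_tries = max_tries
        self.ids = itertools.count(1)

    def request(self, payload):
        req_id = next(self.ids)
        for attempt in range(1, self.max_tries + 1):
            reply = self.channel(req_id, payload)   # None models a timeout
            if reply is not None:
                return reply, attempt
        raise TimeoutError("no reply after %d tries" % self.max_tries)


def lossy_channel(receiver, drop_pattern):
    """Stand-in for the network: a True in drop_pattern loses that reply."""
    drops = iter(drop_pattern)

    def deliver(req_id, payload):
        reply = receiver.on_datagram(req_id, payload)
        return None if next(drops, False) else reply

    return deliver


receiver = Receiver()
sender = Sender(lossy_channel(receiver, [True, True, False]))
reply, attempts = sender.request("hello")
print(reply, attempts, receiver.executed)
```

Two replies are lost, so the request is sent three times, yet the receiver
does the work only once.  Both ends are now holding per-peer state, which is
the point being made: it looks like a connection.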

So, every time this came up in the past, the next stage was to ask the
person looking for a "reliable connectionless protocol" or somesuch what
was really needed.  The most frequent goal has been some sort of transport
for a little machine, or a new one for which there is no networking
software yet.  The searcher doesn't want to implement all of TCP, and sees
"datagrams" as being easier, particularly on a single Ethernet.  Another
common goal has been to get very high throughput by avoiding the "excessive
overhead" of TCP, but Van Jacobson's research has more or less laid that
one to rest.  A third one has been "preservation of message boundaries".

There are a number of 'reliable' alternatives to TCP, including NETBLT
(optimized for block transfers over lossy links), ISO TP, RDP and others.
Those I'm familiar with offer built-in functionality for preserving message
boundaries, along with varying approaches to connection establishment and
acknowledgement.  However, it does not appear that they provide enough extra
functionality, or take sufficiently less effort to develop, that they (except
for ISO TP) will ever become very widespread.

Frequently, having been made aware of the alternatives, the searcher reads
the specs and decides that he won't save anything.  Sometimes the next stage
is a complaint about excessive complexity containing the assertion "I never
see *any* packet loss between my Frobozz boxes on my FooNeT".  Whereupon
a large number of people jump down the complainer's throat, mostly network
maintainers suffering the slings and arrows of trying to make protocols
designed for 'little' nets run on 'big, bad' nets...

In summary, if you need reliability in an "internet" protocol, those "in
tune with the Tao of the Internet" assert that you need a connection, flow
control and an end-to-end data integrity check.  If all of your
transactions are guaranteed to fit in one packet, you can replace the
connection state with server idempotency.  If not, message boundaries are
best not tied to packet boundaries, lest you fall afoul of differing
MTUs and fragmentation (see RFC 1001/1002 Netbios over TCP for an example
of a header/length-based scheme).  If you leave the integrity check out,
that's your and your customers' risk, but leaving the flow control out
could get hosts ostracized by offended backbone router operators...

"Those who do not understand TCP are doomed to re-invent it..."

(??? who said that ???)

James B. VanBokkelen		26 Princess St., Wakefield, MA  01880
FTP Software Inc.		voice: (617) 246-0900  fax: (617) 246-0901

BILLW@MATHOM.CISCO.COM (William "Chops" Westfield) (10/23/90)

I occasionally wonder whether we should just take TCP, add a comment
that says "you WILL preserve packet boundaries", change the IP protocol
type, and say "poof, here is a reliable datagram protocol".

BillW
-------

mcc@WLV.IMSD.CONTEL.COM (Merton Campbell Crockett) (10/23/90)

I may have missed the point but doesn't a PUSH accomplish the same thing?  In
which case no modification is required.

Merton

BILLW@MATHOM.CISCO.COM (William "Chops" Westfield) (10/23/90)

    I may have missed the point but doesn't a PUSH accomplish the same thing?
    [make TCP a datagram-oriented protocol]

Well, no.  A major part that is missing is a specification for the
interface between TCP and the next layer up (in ISO, this would likely
be a whole separate document.)  In particular, if a receiver gets
two packets with PUSH set, the interface may put both packets in a
single buffer.  To quote RFC793:

    The exact push point might not be visible to the receiving user and
    the push function does not supply a record boundary marker.
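
The loss of write boundaries on a byte stream is easy to observe.  The
sketch below assumes a local pair of connected stream sockets from Python's
socket.socketpair() as a stand-in for a TCP connection; it is not TCP
itself, but it offers the same byte-stream interface.  Two separate writes
go in, and the reader simply collects whatever each recv() returns.

```python
import socket

# A connected pair of stream sockets, used here as a stand-in for TCP.
a, b = socket.socketpair()
a.sendall(b"first record")
a.sendall(b"second record")
a.close()                      # lets the reader see end-of-stream

chunks = []
while True:
    data = b.recv(4096)
    if not data:               # empty read means the peer closed
        break
    chunks.append(data)
b.close()

stream = b"".join(chunks)
print(len(chunks), stream)
```

The number of chunks is not something the application can rely on: the two
writes may arrive as one read or as several.  Only the byte sequence is
preserved, so any record boundary has to be carried in the data.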

Also, on the sender side, push does not preclude use of algorithms such
as slow start, Nagle, or re-packetization on retransmit.  (Hopefully,
a system using a datagram oriented protocol does not involve situations
where these are important (well, slow start would still be useful - you
just have to do it with datagrams rather than stream data))

Finally, TCP as is will send many datagrams if you present more than a
packet's worth of data.  For a datagram oriented system, you would
force it to send fragmented IP packets instead (and the maximum
segment size would have a slightly different meaning.)

The changes to any particular TCP to achieve a reliable datagram model
would not be significant, but it would take a little work.

BillW
-------

J.Crowcroft@CS.UCL.AC.UK (Jon Crowcroft) (10/23/90)

the world used to divide into 
datagram & virtual circuit

it now cuts many ways -

connection oriented versus connectionless is one dimension, but
reliability is another orthogonal issue ... some people would assert
that using a 'first' datagram to establish buffer allocation for an
entry in a router's tables to decide on who gets packets dropped is 
connection oriented ... it's just not reliable...on the other hand,
asking for the right TOS may ensure reliability, even though the
packet format is datagram...

so i don't agree with jon postel...however, "reliable internet" may be
an oxymoron:-)

but seriously, 'reliable virtual circuits' is an oxymoron if you've
ever tried using (I)PSS...calls get ripped out with disconnection
reasons like 'congestion' , which doesn't sound too dissimilar from
routers dropping packets (except you have to wait a while longer
before sending your next packet - or do you - well prob. not if you
are running slow start x.25, now there's an idea:-))

jon crowcroft

jbvb@FTP.COM (James B. Van Bokkelen) (10/23/90)

    ....
    Finally, TCP as is will send many datagrams if you present more than a
    packet's worth of data.  For a datagram oriented system, you would
    force it to send fragmented IP packets instead (and the maximum
    segment size would have a slightly different meaning.)

Given my druthers, I'd much rather make message boundaries independent
of IP datagrams, because deliberate fragmentation is EEEVIIILLL!!!.  Also,
you'll never hear the end of "why are records limited to 65Kb" and "why
can't I use records bigger than 8Kb on my FooNix system"?

  If you're willing to re-implement everywhere
    If you're willing to settle for 16 bits of record length
      Do something really gross with the Urgent pointer and unused bits in the
      TCP header.
    Else
      Define a new TCP Record Option as: opt_type, opt_len followed by 
      (opt_len - 2)/2 "start of record offsets within this segment".
  Else
    Define a formal "Record-Oriented TCP Extension" which uses header/length
    and let the applications that want it use it.  If enough of them do so,
    someone will move support into a library, and then someone else will put
    it in the kernel.  You could even use ISO Session and get ASN.1 data
    abstraction in the ?bargain?
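
The "TCP Record Option" branch above can be made concrete with a small
encoder.  This is a sketch of the layout as described (opt_type, opt_len,
then (opt_len - 2)/2 sixteen-bit offsets), not an existing option: the
option kind value, the function names and the 40-byte option-space check
are assumptions added for illustration.

```python
import struct

RECORD_OPT_TYPE = 253    # hypothetical option kind, chosen only for this sketch
MAX_OPTION_BYTES = 40    # TCP header option space


def encode_record_option(offsets):
    """Pack opt_type, opt_len, then one 16-bit offset per record start."""
    opt_len = 2 + 2 * len(offsets)
    if opt_len > MAX_OPTION_BYTES:
        raise ValueError("too many record starts for one segment")
    return struct.pack("!BB%dH" % len(offsets),
                       RECORD_OPT_TYPE, opt_len, *offsets)


def decode_record_option(raw):
    """Recover the list of start-of-record offsets from the option bytes."""
    opt_type, opt_len = struct.unpack_from("!BB", raw)
    if opt_type != RECORD_OPT_TYPE:
        raise ValueError("not a record option")
    count = (opt_len - 2) // 2
    return list(struct.unpack_from("!%dH" % count, raw, 2))


def split_segment(data, offsets):
    """Return (bytes continuing an earlier record, pieces begun here)."""
    cuts = list(offsets) + [len(data)]
    head = data[:cuts[0]]
    pieces = [data[cuts[i]:cuts[i + 1]] for i in range(len(offsets))]
    return head, pieces


segment = b"tailAAAABBB"
opt = encode_record_option([4, 8])
head, pieces = split_segment(segment, decode_record_option(opt))
print(opt, head, pieces)
```

The last piece in a segment may be an incomplete record that continues in
the next segment, so a receiver would still need to carry a partial record
across segments.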

James B. VanBokkelen		26 Princess St., Wakefield, MA  01880
FTP Software Inc.		voice: (617) 246-0900  fax: (617) 246-0901

jqj@HOGG.CC.UOREGON.EDU (10/24/90)

I think we are being a bit ingenuous in assuming that the only reason one
might want "reliable datagrams" is to implement a sequenced packet protocol,
or that the only good way to get reliable packet delivery over IP is by using
TCP.  The current discussion does not, for example, address reliable broadcast
or any of a myriad of other real transaction-oriented applications for
reliable packet exchange where the cost of setting up a VC is prohibitive.
I would even go so far as to suggest that the lack of a standard RPX protocol
in the IP suite has inhibited development of reasonable applications that
would use it!

As an aside, 4.2bsd included Eric Cooper's courier compiler that implemented a
Xerox SPP-like protocol over TCP (message boundaries and message types are 
needed to implement the semantics of Courier).  As I recall, Eric just used
a simple counted string approach, where messages could span multiple strings
and end was delimited by an EOM bit in the message header, totally ignoring IP
packet boundaries, PUSHes, etc. (you have to guarantee a PUSH after the EOM,
I suppose).  Worked just fine if what you wanted was packet boundaries on TCP;
no changes to TCP needed.

jcurran@SH.CS.NET (10/24/90)

>> I may have missed the point but doesn't a PUSH accomplish the same thing?  In
>> which case no modification is required.
   
"A PSH bit is not a record marker and is independent of segment boundaries."
"Passing a received PSH flag to the application layer is now OPTIONAL."
(RFC1122)

Work on a TCP extension for record information rather than taking 
a step back.. 

/John

Policy Routing, 2000: "Through the networks according to their abilities,
			to the applications according to their need."

braden@VENERA.ISI.EDU (10/24/90)

If you need records, you can build a trivial framing protocol on top
of TCP.  There have been many examples of this... Mike Muuss' PKG,
records in Sun's XDR, and Dave Clark's USP come immediately to mind.
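
A minimal sketch of such a framing layer, assuming a 4-byte big-endian
length in front of each record.  The function names are made up for this
example and are not taken from PKG, XDR or USP, and socket.socketpair() is
used as a local stand-in for a TCP connection.

```python
import socket
import struct


def send_record(sock, payload):
    """Write one record: 4-byte big-endian length, then the bytes."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exactly(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("stream closed mid-record")
        buf.extend(chunk)
    return bytes(buf)


def recv_record(sock):
    """Read one record written by send_record()."""
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)


a, b = socket.socketpair()
for msg in (b"one", b"", b"three!"):
    send_record(a, msg)
records = [recv_record(b) for _ in range(3)]
a.close()
b.close()
print(records)
```

A write of X bytes at one end becomes a read of X bytes at the other,
including the empty record, with no dependence on how the stream was cut
into segments.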

Bob Braden

phil@BRL.MIL (Phil Dykstra) (10/24/90)

> The changes to any particular TCP to achieve a reliable datagram model
> would not be significant, but it would take a little work.

It is easy to layer a reliable message protocol on top of TCP.  We
designed one such protocol at BRL called PKG (Package Protocol) and have
used it for several distributed applications over the past five years
(e.g. in BRL-CAD).  Messages have user defined "types" and both
synchronous and asynchronous message exchanges are supported.  At one
time a version of it even ran over DECNET, but our current code is
BSD socket library oriented only.  We should have written an RFC about
it years ago.

If anyone is interested they can look at brl-cad/pkg.shar.Z via
anonymous FTP on ftp.brl.mil (a.k.a. vgr.brl.mil, 192.5.23.6).

- Phil

gwilliam@SH.CS.NET (George Williams) (10/24/90)

The "push" flag, present on TCP packets, has nothing to do with
reliability. It just says 'forward what's in the receiving
application buffer up... now!'

It has nothing to do with datagrams or reliability but is associated
with fragmentation and re-assembly.

Additionally, most APIs don't make this flag user accessible, since a
good working knowledge of systems and protocol(s) as they pertain
to overall response time and throughput is assumed for those who
massage this flag. It can result in overall system degradation in
a distributed compute environment if set inappropriately.

A final word on UDP...if I may:

 () Architecturally, UDP is the connection-less transport for IP.

    Reliability as it pertains to delivery and retransmissions is
    the assumed burden of associated HLPs (higher level protocols)
    or service(s) present. I did not write the protocol but did
    enough implementations to state this as fact.

 () It is generally the protocol of choice when the transmission
    medium has a low error rate or when an HLP (e.g. an interactive
    one) compensates for deficiencies.
    

   George Williams

( The above are my own humble views and opinions and are not a 
  critique )

BILLW@MATHOM.CISCO.COM (William "Chops" Westfield) (10/24/90)

    If you need records, you can build a trivial framing protocol on top
    of TCP.  There have been many examples of this... Mike Muuss' PKG,
    records in Sun's XDR, and Dave Clark's USP come immediately to mind.

Yes, yes.  This is not difficult, and is clearly the way to do things
within the framework of the currently defined protocols.  In fact, the
cisco routers do exactly this sort of thing for running X.25 over TCP.

Aesthetically, though, it bothers me to have to do the extra work of
converting datagrams to streams and back when the underlying transmission
scheme is almost certainly datagram based.  (Hmm, is anyone running TCP
over anything other than IP?)

BillW
-------

jamesp@bilby.cs.uwa.oz.au (James Pinakis) (10/24/90)

Maybe I've missed something here, but what happens if you have multiple
entities which want to send datagrams between each other, and it may be
the case that only one datagram is sent between any given pair.
In this case, isn't the overhead of establishing a TCP connection merely
to send one "datagram" (or "data which preserves message boundaries" or
whatever the pedants like to call it) too great?  Isn't this the exact
situation when a different protocol is required and building a scheme on
top of TCP is bad?  And if I don't _want_ a connection between two sites
(i.e. I only want to send discrete messages) then why should I create one,
send the data, and then pull it down?

I agree that such a protocol is not really connectionless and that state must
be maintained at both ends, but at least from the application layer (assuming
it's implemented as a transport layer protocol) it _looks_ like I am
reliably sending datagrams.  I've spent some time over the last few months
implementing just such a protocol.  Basically it is a fairly straightforward
implementation of a sliding window protocol (protocol 6 from Tanenbaum
actually) optimised to support a particular sort of client/server model.
The main thing which it has over TCP, I believe, is that establishing the
state information is sufficiently "lightweight" so that the cost of
a client sending a small number of packets to a server is not prohibitive.
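
For readers unfamiliar with sliding windows, here is a sketch of the
receiving half of a selective-repeat scheme.  It is a simplification and not
the protocol described above or Tanenbaum's protocol 6 itself: sequence
numbers are unbounded integers rather than modular, there are no NAKs or
timers, and the class name and window size are assumptions.

```python
class SelectiveRepeatReceiver:
    """Hypothetical receiver: buffers out-of-order frames within a window."""

    def __init__(self, window=4):
        self.window = window
        self.expected = 0      # lowest sequence number not yet delivered
        self.buffer = {}       # frames received ahead of 'expected'
        self.delivered = []    # payloads handed to the application, in order

    def on_frame(self, seq, payload):
        """Accept a frame and return the next sequence number expected."""
        if self.expected <= seq < self.expected + self.window:
            self.buffer.setdefault(seq, payload)     # ignore duplicates
            while self.expected in self.buffer:      # deliver any in-order run
                self.delivered.append(self.buffer.pop(self.expected))
                self.expected += 1
        return self.expected                         # cumulative acknowledgement


rx = SelectiveRepeatReceiver(window=4)
arrivals = [(0, "a"), (2, "c"), (1, "b"), (1, "b"), (3, "d"), (9, "z")]
acks = [rx.on_frame(seq, data) for seq, data in arrivals]
print(rx.delivered, acks)
```

Frame 2 arrives early and is held until frame 1 fills the gap, the duplicate
of frame 1 is ignored, and frame 9 falls outside the window and is dropped.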

james
jamesp@bison.cs.uwa.oz.au

mishkin@apollo.HP.COM (Nathaniel Mishkin) (10/24/90)

In article <9010221418.AA03839@ftp.com>, jbvb@FTP.COM (James B. Van Bokkelen) writes:
>So, every time this came up in the past, the next stage was to ask the
>person looking for a "reliable connectionless protocol" or somesuch what
>was really needed.  The most frequent goal has been some sort of transport
>for a little machine, or a new one for which there is no networking
>software yet.  The searcher doesn't want to implement all of TCP, and sees
>"datagrams" as being easier, particularly on a single Ethernet.  Another
>common goal has been to get very high throughput by avoiding the "excessive
>overhead" of TCP, but Van Jacobson's research has more or less laid that
>one to rest.  A third one has been "preservation of message boundaries".

Good summary.  Having made the mistake (no doubt through initial
fuzzy-headedness on my part), of having built an RPC system whose protocol
I called "datagram-oriented" (and which I now call "connection-oriented,
but designed with the knowledge that RPC is the application that wants
to use it" -- at least to people who might understand what I mean), I
try to be very careful when I say "datagram" these days.

Anyway, one other goal of some people in search of something other than
TCP is the reduction in the number of network messages that need to be
exchanged for short-lived connections (which often occur when you're
doing RPC), in particular eliminating some of the connection setup and
tear-down messages.  E.g., for the purposes of RPC, it sure would have
been nice if I could do something like send data (i.e., the remote call's
input parameters) in a SYN.  (I've never seen an implementation that
allows this; I can't point to the place in the TCP spec that disallows
it, but I imagine it is disallowed.)  Maybe it's not really so bad to
have the number of control messages be more than the number of
data-carrying messages in case I make one remote call to a server, but
my assumption is that the overall system will behave better if the number
of control-only messages is reduced.

--
                    -- Nat Mishkin
                       Cooperative Object Computing Operation
                       Hewlett-Packard Company
                       mishkin@apollo.hp.com

hwajin@wrs.com (Hwa Jin Bae) (10/25/90)

In article <9010231706.AA25446@hogg.cc.uoregon.edu> jqj@HOGG.CC.UOREGON.EDU writes:
>As an aside, 4.2bsd included Eric Cooper's courier compiler that implemented a
>Xerox SPP-like protocol over TCP (message boundaries and message types are 
>needed to implement the semantics of Courier).  As I recall, Eric just used
>a simple counted string approach, where messages could span multiple strings
>and end was delimited by an EOM bit in the message header, totally ignoring IP
>packet boundaries, PUSHes, etc. (you have to guarantee a PUSH after the EOM,
>I suppose).  Worked just fine if what you wanted was packet boundaries on TCP;
>no changes to TCP needed.

pretty much the same thing is done with sun rpc over tcp.  the following
is from sunrpc 4.0 release:

/*
 * xdr_rec.c, Implements TCP/IP based XDR streams with a "record marking"
 * layer above tcp (for rpc's use).
 *
 * These routines interface XDRSTREAMS to a tcp/ip connection.
 * There is a record marking layer between the xdr stream
 * and the tcp transport level.  A record is composed of one or more
 * record fragments.  A record fragment is a thirty-two bit header followed
 * by n bytes of data, where n is contained in the header.  The header
 * is represented as a htonl(u_long).  The high order bit encodes
 * whether or not the fragment is the last fragment of the record
 * (1 => fragment is last, 0 => more fragments to follow).
 * The other 31 bits encode the byte length of the fragment.
 */
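
The header layout in that comment can be illustrated as follows.  This is a
sketch written for this thread, not code from xdr_rec.c: the function names
and the deliberately tiny max_frag default are assumptions, chosen so that
a short record splits into several fragments.

```python
import struct

LAST_FRAG = 0x80000000     # high order bit: this is the final fragment
LENGTH_MASK = 0x7FFFFFFF   # remaining 31 bits: byte length of the fragment


def encode_record(payload, max_frag=8):
    """Split a record into fragments, each behind a 32-bit network-order header."""
    pieces = [payload[i:i + max_frag]
              for i in range(0, len(payload), max_frag)] or [b""]
    out = bytearray()
    for i, piece in enumerate(pieces):
        header = len(piece)
        if i == len(pieces) - 1:
            header |= LAST_FRAG
        out += struct.pack("!I", header) + piece
    return bytes(out)


def decode_records(stream):
    """Reassemble complete records from a byte string of fragments."""
    records, current, pos = [], bytearray(), 0
    while pos < len(stream):
        (header,) = struct.unpack_from("!I", stream, pos)
        pos += 4
        length = header & LENGTH_MASK
        current += stream[pos:pos + length]
        pos += length
        if header & LAST_FRAG:           # end of record: hand it over
            records.append(bytes(current))
            current = bytearray()
    return records


wire = encode_record(b"hello, record marking") + encode_record(b"x")
print(decode_records(wire))
```

The decoder assumes it has been given whole fragments; a real reader on a
socket would first have to read exactly four header bytes and then exactly
the indicated number of data bytes.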

-- 
hwajin@wrs.com

HOOPER@QUCDN.QUEENSU.CA (Andy Hooper) (10/25/90)

If you look closely at RFC 1006 (ISODE transport service on TCP), and strip
out all the stuff about TPDUs, what's left is a record boundary protocol for
TCP. It basically just puts a byte count in front of each "record" (or NPDU
in the ISODE context).
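
As a sketch of that byte-count header: the RFC 1006 packet header is four
octets, a version number of 3, a reserved octet, and a 16-bit length that
counts the header itself.  The function names below are assumptions, and
the payload is treated as opaque bytes with all the TPDU machinery left out.

```python
import struct

TPKT_VERSION = 3


def tpkt_wrap(payload):
    """Prefix payload with version, reserved octet and total length."""
    length = len(payload) + 4            # the length field includes the header
    if length > 0xFFFF:
        raise ValueError("payload too large for a 16-bit length")
    return struct.pack("!BBH", TPKT_VERSION, 0, length) + payload


def tpkt_unwrap_all(stream):
    """Return (complete payloads, leftover bytes of an incomplete packet)."""
    records, pos = [], 0
    while pos + 4 <= len(stream):
        version, _reserved, length = struct.unpack_from("!BBH", stream, pos)
        if version != TPKT_VERSION or length < 4:
            raise ValueError("not a TPKT header")
        if pos + length > len(stream):   # packet not fully received yet
            break
        records.append(stream[pos + 4:pos + length])
        pos += length
    return records, stream[pos:]


wire = tpkt_wrap(b"abc") + tpkt_wrap(b"defgh") + b"\x03\x00"
records, rest = tpkt_unwrap_all(wire)
print(records, rest)
```

The trailing two bytes are the start of a third header that has not fully
arrived, so they are returned as leftover for the next read.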

bzs@world.std.com (Barry Shein) (10/25/90)

>Aesthetically, though, it bothers me to have to do the extra work of
>converting datagrams to streams and back when the underlying transmission
>scheme is almost certainly datagram based.  (Hmm, is anyone running TCP
>over anything other than IP?)
>
>BillW

But it's orders of magnitude easier than trying to add reliability
(and performance, once you've added that reliability) to UDP or
similar. All you basically need is to add a count field to each
"packet" if you put it over TCP.

(One can get fancier, depends on the real need of the application,
originally I think all that was asked was that a write of X bytes at
this end can be turned into a read of X bytes at the other end, record
boundaries, that's trivial over TCP as it's all sequenced and reliable
already.)

-- 
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

muts@fysaj.fys.ruu.nl (Peter Mutsaers /100000) (10/25/90)

bzs@world.std.com (Barry Shein) writes:

>>Aesthetically, though, it bothers me to have to do the extra work of
>>converting datagrams to streams and back when the underlying transmission
>>scheme is almost certainly datagram based.  (Hmm, is anyone running TCP
>>over anything other than IP?)
>>
>>BillW

>But it's orders of magnitude easier than trying to add reliability
>(and performance, once you've added that reliability) to UDP or
>similar. All you basically need is to add a count field to each
>"packet" if you put it over TCP.

There may well be another reason not to use TCP. I for example am busy
with distributing programs over dozens of workstations. Every program
must be able to talk to any other one, 30 TCP connections is often the
maximum possible.
I could automatically close a connection if a new one must be opened, but
how do I know if no data is to be read, or underway, to the connection
I want to close?

If someone has another solution for this problem than making a reliable
UDP I'd like to hear it.
--
Peter Mutsaers                          email:    muts@fysaj.fys.ruu.nl     
Rijksuniversiteit Utrecht                         nmutsaer@ruunsa.fys.ruu.nl
Princetonplein 5                          tel:    (+31)-(0)30-533880
3584 CG Utrecht, Netherlands

craig@bbn.com (Craig Partridge) (10/25/90)

In article <12632159446.18.BILLW@mathom.cisco.com> BILLW@MATHOM.CISCO.COM (William "Chops" Westfield) writes:
>
>Aesthetically, though, it bothers me to have to do the extra work of
>converting datagrams to streams and back when the underlying transmission
>scheme is almost certainly datagram based.  (Hmm, is anyone running TCP
>over anything other than IP?)

Well, if you really wanna talk esthetics, I'm sure that somewhere in the
world, someone is running the cisco X.25 over TCP, over an IP, which at
some point in the path gets sent over an X.25 channel....

Craig

craig@bbn.com (Craig Partridge) (10/25/90)

>There may well be another reason not to use TCP. I for example am busy
>with distributing programs over dozens of workstations. Every program
>must be able to talk to any other one, 30 TCP connections is often the
>maximum possible.
>I could automatically close a connection if a new one must be opened, but
>how do I know if no data is to be read, or underway, to the connection
>I want to close?

From a protocol designer's point of view this is a terrible argument
(from an implementer's perspective, I understand the issues, but allow
me to wear my designer hat for a while).

To develop a "reliable UDP", you'll need state information (sequence
number, retransmission counts, round-trip time estimator); in principle
you'll need just about all the information currently in the TCP connection
block.
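
To make that list concrete, here is a sketch of the per-peer state such a
"reliable UDP" would end up carrying.  Everything in it is an assumption for
illustration: the ConnBlock name and fields, the smoothed mean and deviation
round-trip estimator with gains of 1/8 and 1/4, and the rule of discarding
round-trip samples from retransmitted packets.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConnBlock:
    """Hypothetical per-peer state for a 'reliable UDP'."""
    peer: tuple
    next_seq: int = 0
    unacked: dict = field(default_factory=dict)  # seq -> [payload, retransmits]
    srtt: Optional[float] = None                 # smoothed round-trip time
    rttvar: Optional[float] = None               # smoothed deviation

    def send(self, payload):
        """Assign a sequence number and remember the payload until acked."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = [payload, 0]
        return seq

    def on_timeout(self, seq):
        """Count a retransmission and return the payload to resend."""
        self.unacked[seq][1] += 1
        return self.unacked[seq][0]

    def on_ack(self, seq, rtt_sample):
        """Forget an acked packet; sample RTT only if it was never resent."""
        _payload, retransmits = self.unacked.pop(seq)
        if retransmits == 0:
            self._update_rtt(rtt_sample)

    def _update_rtt(self, r):
        if self.srtt is None:
            self.srtt, self.rttvar = r, r / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - r)
            self.srtt = 0.875 * self.srtt + 0.125 * r

    def rto(self, initial=1.0):
        """Retransmission timeout derived from the estimator."""
        if self.srtt is None:
            return initial
        return self.srtt + 4 * self.rttvar


cb = ConnBlock(("192.0.2.1", 5000))
first = cb.send(b"a")
cb.on_ack(first, 0.1)          # clean sample: estimator is initialised
second = cb.send(b"b")
cb.on_timeout(second)          # resent once
cb.on_ack(second, 5.0)         # ambiguous sample: ignored
print(cb.next_seq, cb.unacked, cb.rto())
```

Even this stripped-down version holds a sequence counter, a retransmission
queue and a timer estimate per peer, which is most of what a TCP connection
block holds.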

So building a "reliable UDP" is essentially as difficult as doing a TCP.
And the only reason to do it is that your operating system constrains the
number of connection blocks you can get, while the UDP interface allows
you more connection blocks, because you can put the connection blocks in
your application's memory space, rather than your kernel space (which is
putting dumb restrictions on you).

In the long run, you would almost certainly be better off enhancing
the kernel to support more connection blocks, than doing a "reliable UDP."
You'll get a more flexible kernel, access to the protocol you really want,
and won't get caught in the quagmire of maintaining yet another reliable
protocol.

Craig

braden@VENERA.ISI.EDU (10/25/90)

	Anyway, one other goal of some people in search of something other than
	TCP is the reduction in the number of network messages that need to be
	exchanged for short-lived connections (which often occur when you're
	doing RPC), in particular eliminating some of the connection setup and
	tear-down messages.  E.g., for the purposes of RPC, it sure would have
	been nice if I could do something like send data (i.e., the remote call's
	input parameters) in a SYN.  (I've never seen an implementation that
	allows this; I can't point to the place in the TCP spec that disallows
	it, but I imagine it is disallowed.) 
	
Nat,

It certainly is NOT disallowed, and all of the early research TCP's
tried to support it.  A favorite test in "bake-offs" was to send a
"Kamikaze packet", with SYN, data, and FIN all in one segment.  Not
all TCP's survived the test, but some did...  However, sending data with
the original SYN (sic) is not a big win because a full 3-way handshake
is necessary before the data can be passed to the receiving
application.  Other protocols, e.g., delta-T and VMTP, use a timer-based
mechanism to avoid 3-way handshakes; this leads to what is commonly
called a "transaction transport protocol" (see RFC-955).

Bob Braden

jbvb@FTP.COM (James B. Van Bokkelen) (10/26/90)

If you can fit your entire message in one MAC-layer packet, and you only
have one, a formal connection can be replaced by server idempotency.  However,
if you have more than one message (or they are larger than one MAC-layer
packet), consider the following:

Dir	Flags		Data.

 <-	SYN		Transaction 1 request.
 ->	SYN ACK		Transaction 1 response.
 <-	ACK FIN		Transaction 2 request.
 ->	ACK FIN		Transaction 2 response.
 <-	ACK

Two transactions, five packets, perfectly legal TCP, and the server
doesn't have to be idempotent.  Four is the absolute minimum with UDP if
the server is idempotent.  I am not contending that any existing TCP API
will allow this streamlined an exchange, or that all TCPs can handle data
with the SYN (almost all can handle data with the FIN).  However, I will
argue that work in this direction offers more bang per buck than
developing both the sophisticated API and a new protocol to go under it.

One might criticize my scenario on the grounds that transaction processing
might delay things enough to cause retransmissions.  However, the same problem
afflicts any transport in this situation.  If processing time is predictable,
the API must allow setting initial timeout values.  If not, the net suffers
regardless of the API.

James B. VanBokkelen		26 Princess St., Wakefield, MA  01880
FTP Software Inc.		voice: (617) 246-0900  fax: (617) 246-0901

karn@envy.bellcore.com (Phil Karn) (10/26/90)

In article <1990Oct24.090841@apollo.HP.COM>, mishkin@apollo.HP.COM
(Nathaniel Mishkin) writes:
|> E.g., for the purposes of RPC, it sure would have
|> been nice if I could do something like send data (i.e., the remote
|> call's input parameters) in a SYN.  (I've never seen an implementation
|> that allows this; I can't point to the place in the TCP spec that
|> disallows it, but I imagine it is disallowed.)

I don't think anything in the TCP spec specifically disallows the
piggybacking of data with SYN bits. The only possible argument against
it is the fact that the active initiator of a TCP connection doesn't
yet know the receiver's window size. But even here the worst
that should happen under the Robustness Principle is that the receiver
might not have buffer space for all (or any) of the data, requiring
the sender to retransmit it once the connection is fully open.  This
is an efficiency issue, not a protocol correctness issue, and is
identical to that associated with the "optimistic window" send policy.

But assuming no problems with buffer allocation, I see no reason why
an entire TCP connection couldn't consist of only three packets:

A->B: SYN, FIN and data
B->A: SYN, FIN, ACK (and optional data)
A->B: ACK

Of course, most application interfaces don't provide for sending such
"christmas tree" packets, but a correct implementation of TCP should
be able to handle them when received. It's hard to think of a "reliable
datagram protocol" that would take less than three packets to provide
at-most-once semantics for a single message in each direction.

Even with applications that don't require a "reliable datagram
protocol", I think that the ability of TCP to piggyback control and data
should be much more widely used. There's no reason why an SMTP or FTP
server's opening banner couldn't be piggybacked on the server's SYN/ACK
segment, saving two packets, and no reason why FIN can't be piggybacked
on the last segment of a data transfer, also saving two packets. Again,
a correct implementation of TCP should handle such packets just fine.

While we're on the subject of piggybacking, another thing I would
really like to see is widespread use of batched SMTP on the Internet.
I think the number of packets it takes for most SMTP implementations
to transfer a short mail message is criminal, especially when the
message has several recipients on the same system.  There's no reason
that you shouldn't be able to send a series of SMTP commands in a
single TCP segment and receive a series of responses, except that many
SMTP servers inexplicably blow up when you try this. Given that TCP is
supposed to be a reliable byte stream protocol, the designers of these
systems must have gone well out of their way to keep this from working.

Phil

muts@fysaj.fys.ruu.nl (Peter Mutsaers /100000) (10/26/90)

craig@bbn.com (Craig Partridge) writes:

>>There may well be another reason not to use TCP. I for example am busy
>>with distributing programs over dozens of workstations. Every program
>>must be able to talk to any other one, 30 TCP connections is often the
>>maximum possible.
>>I could automatically close a connection if a new one must be opened, but
>>how do I know if no data is to be read, or underway, to the connection
>>I want to close?

>So building a "reliable UDP" is essentially as difficult as doing a TCP.
>And the only reason to do it is that your operating system constrains the
>number of connection blocks you can get, while the UDP interface allows
>you more connection blocks, because you can put the connection blocks in
>your application's memory space, rather than your kernel space (which is
>putting dumb restrictions on you).

But suppose I want 1000's of processes to be able to communicate. Couldn't
the overhead of just having that many connections become too big if
only few of them communicate at the same time? It would be very helpful 
if I could use TCP indeed, but have a kind of a 'safe' close, which does
indicate if more data could be underway.
Besides, for my particular application I use the select() system call, which
only operates on the lowest 32 file descriptors.

--
Peter Mutsaers                          email:    muts@fysaj.fys.ruu.nl     
Rijksuniversiteit Utrecht                         nmutsaer@ruunsa.fys.ruu.nl
Princetonplein 5                          tel:    (+31)-(0)30-533880
3584 CG Utrecht, Netherlands

craig@bbn.com (Craig Partridge) (10/26/90)

In article <1666@ruunsa.fys.ruu.nl> muts@fysaj.fys.ruu.nl (Peter Mutsaers /100000) writes:
>
>>So building a "reliable UDP" is essentially as difficult as doing a TCP.
>>And the only reason to do it is that your operating system constrains the
>>number of connection blocks you can get, while the UDP interface allows
>>you more connection blocks, because you can put the connection blocks in
>>your application's memory space, rather than your kernel space (which is
>>putting dumb restrictions on you).
>
>But suppose I want 1000's of processes to be able to communicate. Couldn't
>the overhead of just having that many connections become too big if
>only a few of them communicate at the same time?

Let me repeat my assertion, slightly differently.  IF you want reliability,
THEN you have no choice but to have connection blocks.  If you have 1,000's
of processes continuously talking, you will have 1,000s (or more) connection
blocks.  If they are not continuously talking, then you can just as easily
deallocate TCP connection blocks as "reliable UDP" connection blocks.

>Besides, for my particular application I use the select() system call, which
>only operates on the lowest 32 file descriptors.

This has suddenly become a UNIX discussion, but read the manual page
again.  Select uses arbitrary sized bitmasks, and takes a parameter
telling it how large the bitmasks passed to it are.  [32 is just the
maximum size some systems support].
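
The point about bitmask size can be modeled in a few lines. This is only
a Python model of the idea, not the C interface itself: the real call is
select(), whose first argument says how many descriptor bits to examine
(the highest descriptor plus one). The helper name make_fd_mask is
hypothetical.

```python
def make_fd_mask(fds):
    """Return (mask, nfds) for a collection of descriptor numbers."""
    mask = 0
    for fd in fds:
        mask |= 1 << fd  # one bit per descriptor, as in an fd_set
    nfds = max(fds) + 1 if fds else 0  # highest descriptor plus one
    return mask, nfds


# Descriptor 40 is above the "lowest 32", so the mask needs more than one word.
mask, nfds = make_fd_mask([3, 40])
```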

Craig

hedrick@athos.rutgers.edu (Charles Hedrick) (10/29/90)

Since TCP handles all errors, flow control, etc., about all your
trivial framing protocol needs to be is a byte count followed by the
data.  As long as your TCP implementation pushes every write (which is
typical of those that I know), this is about all you need.  In order
to catch coding errors, it's probably a good idea to include some way
of catching missynchronization, such as tacking a -1 word onto the
end of each message.
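
A minimal sketch of such a framing layer follows. The details are
assumptions made for the example: a 4-byte big-endian byte count, a
4-byte -1 trailer word, and the function names frame, read_exact and
read_message. The reader loops until it has the whole message, so it
does not depend on each write arriving as a single read.

```python
import struct

TRAILER = -1  # sentinel word appended to each message to catch misframing


def frame(payload):
    """Prefix payload with a 4-byte length and append the -1 trailer word."""
    return struct.pack("!I", len(payload)) + payload + struct.pack("!i", TRAILER)


def read_exact(sock, n):
    """Read exactly n bytes; a stream read may return fewer than asked."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError("connection closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def read_message(sock):
    """Read one framed message and verify its trailer."""
    (length,) = struct.unpack("!I", read_exact(sock, 4))
    payload = read_exact(sock, length)
    (trailer,) = struct.unpack("!i", read_exact(sock, 4))
    if trailer != TRAILER:
        raise ValueError("framing lost: bad trailer word")
    return payload
```

Here sock is any object with a recv(n) method, such as a connected TCP
socket; the sender would call sock.sendall(frame(data)).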

The esthetics of this do not bother me.  Converting messages to
streams and back will be a no-op in the simple case, and useful in
other cases.  That is, if you do single query and response, TCP will
simply send off your packets one by one, handling only the sorts of
issues you'd really rather not have to handle for yourself, such as
retransmission.  If you send more than one datagram in a given
direction (e.g. you respond to a query with lots of data), then
combining datagrams, sequencing, etc., are useful.  If TCP were really
a high-overhead thing, I'd worry about this, but lots of smart people
are putting lots of time into optimizing TCP.  They have almost
certainly done a better job than you will do.  The main places I
recommend avoiding TCP are when queries are being sent to varying
destinations, e.g. named, or when you need to use broadcasts.
And frankly I sometimes think it would have been better to have
done named only over TCP.

clynn@BBN.COM (Charles Lynn) (10/30/90)

Doesn't TCP require the connection to be in the ESTABLISHED state before
it is permitted to deliver data to the application?  If so, I think B
cannot see the data until it gets the ACK of its SYN from A, thus the
response from B cannot be included with B's SYN ACK.

Charlie

Makey@Logicon.COM (Jeff Makey) (10/30/90)

In article <1990Oct25.165545@envy.bellcore.com> karn@thumper.bellcore.com writes:
>There's no reason
>that you shouldn't be able to send a series of SMTP commands in a
>single TCP segment and receive a series of responses, except that many
>SMTP servers inexplicably blow up when you try this. Given that TCP is
>supposed to be a reliable byte stream protocol, the designers of these
>systems must have gone well out of their way to keep this from working.

Five years ago, my internet host was a PDP-11/70 running PWB UNIX
(vintage 1978, for those who don't know their history).  The wonderful
folks at BBN had somehow managed to make this beast talk TCP/IP, and
had provided implementations of TELNET, FTP, and SMTP.  The SMTP code
blissfully assumed that every read() call would return exactly one
line of text, and this assumption was correct about 90% of the time.
Eventually I wanted the other 10% to work and I added buffering to the
SMTP daemon independent of TCP, but nobody had gone to any great
effort to produce the broken implementation; it was just plain old
lazy programming.
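
The kind of buffering involved can be sketched as follows. This is an
illustration, not the code from that daemon: the class name LineBuffer
and its feed method are hypothetical. Each chunk returned by a read is
appended to a pending buffer, and only complete CRLF-terminated lines
are handed back, however the stream happened to be split.

```python
class LineBuffer:
    """Accumulate stream chunks and hand back only complete CRLF lines."""

    def __init__(self):
        self.pending = b""  # bytes received that do not yet end a line

    def feed(self, chunk):
        """Add bytes from one read; return the complete lines now available."""
        self.pending += chunk
        # Everything before the last CRLF is complete; the remainder waits.
        *lines, self.pending = self.pending.split(b"\r\n")
        return lines
```

A read that returns half a line yields nothing yet, and a read that
returns several batched commands yields all of them at once.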

                           :: Jeff Makey

Department of Tautological Pleonasms and Superfluous Redundancies Department
    Disclaimer: All opinions are strictly those of the author.
    Domain: Makey@Logicon.COM    UUCP: {ucsd,nosc}!snoopy!Makey

jbvb@FTP.COM (James B. Van Bokkelen) (11/01/90)

    From: Charles Lynn <clynn@BBN.COM>

    Doesn't TCP require the connection to be in the ESTABLISHED state before
    it is permitted to deliver data to the application?  If so, I think B
    cannot see the data until it gets the ACK of its SYN from A, thus the
    response from B cannot be included with B's SYN ACK.

You're right:  RECEIVE calls are supposed to be queued until the connection
reaches ESTABLISHED (RFC 793, pg 58).  The discussion of OPEN, SEND and
RECEIVE in that section seems somewhat out of line with the requirement
elsewhere in the RFC that TCPs accept data with the initial SYN, because
it appears that the API specified in section 3.9 can't generate it (SENDs
are queued until the connection reaches ESTABLISHED as well).

I will raise this as an issue with the HR WG.  My own initial feeling
is that the API's restriction is unnecessary, and could be relaxed at
considerable benefit to the Internet...

James B. VanBokkelen		26 Princess St., Wakefield, MA  01880
FTP Software Inc.		voice: (617) 246-0900  fax: (617) 246-0901

braden@VENERA.ISI.EDU (11/01/90)

	You're right:  RECEIVE calls are supposed to be queued until the connection
	reaches ESTABLISHED (RFC 793, pg 58).  The discussion of OPEN, SEND and
	RECEIVE in that section seems somewhat out of line with the requirement
	elsewhere in the RFC that TCPs accept data with the initial SYN, because
	it appears that the API specified in section 3.9 can't generate it (SENDs
	are queued until the connection reaches ESTABLISHED as well).

	I will raise this as an issue with the HR WG.  My own initial feeling
	is that the API's restriction is unnecessary, and could be relaxed at
	considerable benefit to the Internet...

	James B. VanBokkelen		26 Princess St., Wakefield, MA  01880
	FTP Software Inc.		voice: (617) 246-0900  fax: (617) 246-0901

James,

I believe that the API defined in RFC-793 was intended to describe
the MINIMUM set of capabilities required of an implementation; in
fact, there are weasel-words about that in the RFC.  Thus, it was
not intended to be a "restriction"; an implementation could go
beyond it, to include for example, a call: OPENandSEND(...).

I also believe, however, that the detailed protocol description
in RFC-793 is inconsistent in a number of little ways with the
5-packet minimum request-response exchange that TCP could in
principle provide:

    1.   SYN, req-data, FIN ----->  
    2.                   <--------- SYN, ACK
    3.   ACK -------->  (deliver data to application)  
    4.                   <---------  ACK, rep-data, FIN   
    5.   ACK -------->   

There are problems both in the state diagram and in the event
processing rules.  Although some of the early research TCP's could
handle this 5-packet exchange, the issue did not get a lot of attention
when the DoD put the heat on the research group to tie down the TCP
spec.  Maybe there is a job here for Host Requirements Man (leaps
tall buildings at a single bound, etc). 

Pardon me while I wax philosophical for a moment.

The early TCPs were all designed to implement "what the protocol meant"
(as Charlie Lynn can attest).  The words in RFC-793 were written later,
and getting them exactly right was (and is!) HARD.  Witness the Milspec
spectacle.  Implementors are still advised to implement what the
protocol means, not what the spec says.

It is arguably a defect of TCP as a protocol (and an advantage of TP4?)
that TCP is very subtle, and therefore it is complex and difficult to
describe in detail.  Writing an efficient and fully correct TCP ab
initio is still a great programming challenge, I believe (as I think
you will agree, James!)  I wonder if there are ANY fully correct
implementations of the (entire) TCP protocol in the world today?

Bob Braden

sma@WRC.XEROX.COM (11/02/90)

We have been doing some work on a reliable multicast transport protocol,
the "Multicast Transport Protocol".  MTP (or Empty Pea) attempts to satisfy
two of the goals discussed recently - 1) fast, reliable multicasting at the
transport level, and 2) agreement on order and delivery of "messages" built
on top of the transport data.

The transport layer is a NAK-based protocol that utilizes the multicast
capability of the lower layers (e.g. IP and Ethernet).  It provides reliable
delivery of messages, which are uninterpreted sequences of bytes terminated
by an end-of-message marker.  The ordering and agreement protocol uses a
token-based scheme to grant sending rights to producers of messages, to
guarantee that a given message will either be accepted by all processes or
not accepted by any, and to guarantee that all processes agree on the order
in which the messages will be processed.
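
For readers unfamiliar with NAK-based reliability, here is a generic
sketch of the receiving side. It is not MTP's packet format or algorithm;
the class name NakReceiver and the use of simple integer sequence numbers
are assumptions for the example. The receiver stays silent while packets
arrive in order and reports only the sequence numbers it finds missing.
Loss of the final packets of a stream is not covered here.

```python
class NakReceiver:
    """Deliver packets in order; report gaps instead of acking each packet."""

    def __init__(self):
        self.expected = 0  # next sequence number to deliver
        self.held = {}     # out-of-order packets waiting for a gap to fill

    def receive(self, seq, data):
        """Return (delivered, naks) for one arriving packet."""
        if seq >= self.expected:  # older packets are duplicates; ignore them
            self.held[seq] = data
        delivered = []
        while self.expected in self.held:
            delivered.append(self.held.pop(self.expected))
            self.expected += 1
        # Any number below the highest held packet that is not held is missing.
        highest = max(self.held, default=self.expected)
        naks = [s for s in range(self.expected, highest) if s not in self.held]
        return delivered, naks
```

With multicast, the appeal of this style is that a sender with many
receivers hears from them only when something has gone wrong.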

Alan Freier at Apple and Keith Marzullo at Cornell have written a technical
report on MTP available from Cornell.  Ask marzullo@cs.cornell.edu for
information on getting a copy.

Cheers,
    Susie Armstrong
    Xerox Webster Research Center
    armstrong.wbst128@xerox.com