[mod.protocols.tcp-ip] Port Multiplexing Details

Margulies@SCRC-YUKON.ARPA.UUCP (05/19/86)

I have been thinking about how the port multiplexing protocol might
work. Part of it seems simple enough, but part does not.

For TCP, the following sketch seems easy enough:

Given a TCP port for a NAMED-TCP-SERVICE service, you connect to that
port, and send the name of the service you want followed by a CRLF.

If the service exists, your connection is handed off to it.  If not, the
connection is closed.

Server implementations are welcome to mark the TCB with the service name.

Conceptually, note that there need not be any numeric port number
associated with the protocol at all.

Some implementations may choose to implement this as a mapping from
names to ports.  It is particularly useful to let the port vary, so
that no explicit configuration is needed to avoid collisions between
protocols.
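The TCP sketch above could look roughly like the following on the server
side.  This is a minimal illustration under stated assumptions, not a
specification: the function names, the registry of handlers, and the
64-byte cap on the name are all invented for the example.  The only
thing taken from the proposal is "service name followed by CRLF".

```python
# Hypothetical sketch of the name-then-CRLF handoff.

MAX_NAME = 64  # assumed cap on the service name, not from the proposal


def parse_service_request(buf: bytes):
    """Split buffered input into (service_name, remaining_bytes).

    Returns None until a complete CRLF-terminated line has arrived.
    """
    head, sep, rest = buf.partition(b"\r\n")
    if not sep:
        if len(buf) > MAX_NAME:
            raise ValueError("service name too long")
        return None
    if not head or len(head) > MAX_NAME:
        raise ValueError("bad service name")
    return head.decode("ascii"), rest


def dispatch(buf: bytes, registry: dict):
    """Return (handler, leftover) for a known service, (None, b"") otherwise."""
    parsed = parse_service_request(buf)
    if parsed is None:
        raise ValueError("incomplete request")
    name, rest = parsed
    handler = registry.get(name)
    if handler is None:
        return None, b""      # caller closes (or aborts) the connection
    return handler, rest      # bytes after the CRLF belong to the service
```

Any bytes that arrive after the CRLF are handed to the service untouched,
which is what makes the handoff invisible to the service itself.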

For UDP, the problem is harder.

In the CHAOS protocol, the datagram equivalent carries a protocol name.
However, it doesn't carry any data.

Having UDP packets include a protocol name would nonetheless be the
most elegant solution.  I fear that it won't be practical.  It would
probably be necessary to invent UDP-2 as a full-fledged protocol that
stored the name length in the header.

The weaker alternative is a UDP service that converts a protocol name
into a port number.  The problem here is the lifetime of the resulting
information.  If the mapping from names to numbers has to be permanent,
then each server implementation has to have a way to maintain a
permanent data base of the assignments, which would be a shame.

JSLove@MIT-MULTICS.ARPA.UUCP (05/19/86)

Concerning Benson Margulies' comments on NAMED-TCP-SERVICE:

You can't just close the connection if the service is not implemented,
because the TCP close is a half-channel close.  I really think you must
abort the connection, sending an RST packet, to indicate totally
unambiguously that the service is unavailable.

Consider the timing windows if you use a closing strategy:  the user
side sends the protocol name to NAMED-TCP-SERVICE, and then passes the
connection to the protocol user program.  Perhaps it waits for the
server's TCP to acknowledge the name bytes.  The server receives the
name, and the service is unavailable.  It closes the connection.  The
user side gets the close but is already in the protocol user code.  This
could be surprising; it might even have some other interpretation if the
service multiplexor is used for some existing service.

When an abort is used instead, this more closely resembles the response
of a system which doesn't implement the service.  If you try to contact
a TCP which has nothing listening on a port, the connection is
effectively aborted.  Granted, it was never established, but a RST
packet is a RST packet.
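On systems with a Berkeley-style sockets interface, the usual way to get
an abort rather than a half-close is to set SO_LINGER with a zero timeout
before closing.  The sketch below assumes such an interface and a
Unix-like layout for the linger structure (two C ints); the function name
is made up for the example, and behaviour varies by platform.

```python
import socket
import struct


def abort_connection(conn: socket.socket) -> None:
    """Close a TCP connection so that the stack sends RST instead of FIN.

    SO_LINGER with l_onoff=1 and l_linger=0 requests an abortive close.
    The "ii" layout is an assumption that holds on common Unix systems.
    """
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    conn.close()
```

A multiplexor that does not recognise the requested name would call this
in place of an ordinary close, so that the user side sees a reset rather
than an end-of-stream it might mistake for part of the service's protocol.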

A cleaner alternative is to have some positive acknowledgement.  For
example, the server could send OK, DOWN, or UNKNOWN back, followed by a
CRLF.  The disadvantage of this is the extra overhead of the reply.  If
you are lucky, the reply data gets piggybacked on the name
acknowledgement, and the reply acknowledgement gets piggybacked on the
first packet of the user program.
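A positive acknowledgement of this kind is small enough to sketch in a
few lines.  The three reply words come from the paragraph above; the
function names and the shape of the `services` table (name mapped to
True if running, False if known but down) are assumptions for the
example.

```python
def status_reply(name: str, services: dict) -> bytes:
    """Build the one-line reply the multiplexor sends before handing off."""
    if name not in services:
        return b"UNKNOWN\r\n"
    return b"OK\r\n" if services[name] else b"DOWN\r\n"


def parse_status(line: bytes) -> str:
    """User side: validate and return the reply word."""
    if not line.endswith(b"\r\n"):
        raise ValueError("reply not terminated by CRLF")
    word = line[:-2].decode("ascii")
    if word not in ("OK", "DOWN", "UNKNOWN"):
        raise ValueError("unexpected reply: %r" % word)
    return word
```

The user side would hand the connection to the protocol user program only
after seeing OK, which closes the timing window described earlier.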

Long live UDP-2.  My proposal for this used a null-terminated string on
the theory that it constitutes encouragement to use the UDP-2 protocol
only for one shot single-query single-response datagram exchanges.  This
was probably not clever of me.  By having a single length byte, you can
hash very nicely on length and first character.  I don't believe that
more than one length byte is needed, but perhaps one or two high bits
could be swiped as flags if 127 or 63 is acceptable as the maximum
length of a protocol name.  I still think that the name field should be
an even number of bytes long because a number of machines like their 16
bit fields aligned.  The spec should thus include a pad byte for odd
length names.
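To make the layout concrete, here is one possible encoding of such a
name field.  It is only a sketch: the choice of two flag bits (hence a
63-byte maximum), the zero pad byte, and all function names are
assumptions, and the text does not say whether the length byte itself
counts toward the even-length requirement.  Here only the name is padded.

```python
FLAG_BITS = 2                            # assumed: two high bits are flags
MAX_NAME = (1 << (8 - FLAG_BITS)) - 1    # 63 with two flag bits


def encode_name(name: bytes, flags: int = 0) -> bytes:
    """Length byte (flags in the high bits), name, pad byte if name is odd."""
    if not 0 < len(name) <= MAX_NAME:
        raise ValueError("name length out of range")
    if not 0 <= flags < (1 << FLAG_BITS):
        raise ValueError("flags out of range")
    first = (flags << (8 - FLAG_BITS)) | len(name)
    pad = b"\x00" if len(name) % 2 else b""
    return bytes([first]) + name + pad


def decode_name(buf: bytes):
    """Return (flags, name, bytes_consumed)."""
    first = buf[0]
    flags = first >> (8 - FLAG_BITS)
    length = first & MAX_NAME
    consumed = 1 + length + (length % 2)
    if len(buf) < consumed:
        raise ValueError("truncated name field")
    return flags, buf[1:1 + length], consumed


def hash_key(buf: bytes):
    """Length and first character: the cheap hash key suggested above."""
    return (buf[0] & MAX_NAME, buf[1])
```

A receiver can compute `hash_key` from the first two bytes alone, before
looking at the rest of the name.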

There is still the problem of designating the other end of the
transaction.  When you send a query the service name is equivalent to
the foreign port; when sending a reply, the service name is the local
port.  Either both ports must be named, or there will have to be some
other way of distinguishing queries from replies.  I really don't like
using two names, but perhaps that is best.  I would prefer service name,
transaction ID, and a query/reply flag.  The transaction ID would be
assigned by the querying host, and for many protocols ignored by the
server except to send it back in the reply.  Uniqueness might be
required in some cases, but is only needed for a given host pair and
service name, just as the port number would be.

The extra space taken by the name combined with the packet size limits
restricts the amount of data in the datagram.  For UDP, I have heard
suggested maximum packet sizes of 512 bytes, although you can send real
jumbograms between some implementations using fragmentation.  By
requiring that transaction IDs be unique over host pairs (during some
TTL (time to live)), the service name could be omitted from reply
packets.  If this seems ugly, how about a flag in the request packet
indicating whether it is permissible to omit the service name from the
reply.  If the flag is still set in the reply, there is no service name.
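The service name, transaction ID, query/reply flag, and "name may be
omitted" flag could be combined in a header along these lines.  Every
concrete detail here is hypothetical: the bit assignments, the 16-bit
transaction ID, the field order, and the class and function names are
chosen only to illustrate the idea.

```python
import struct
from dataclasses import dataclass

FLAG_REPLY = 0x01       # hypothetical bit: set in replies, clear in queries
FLAG_OMIT_NAME = 0x02   # hypothetical bit: name may be (query) or is (reply) omitted


@dataclass
class Udp2Header:
    flags: int
    transaction_id: int   # assigned by the querying host
    service: bytes = b""  # empty only when the name has been omitted

    def pack(self) -> bytes:
        head = struct.pack("!BH", self.flags, self.transaction_id)
        if self.flags & FLAG_REPLY and self.flags & FLAG_OMIT_NAME:
            return head                      # reply without a service name
        return head + bytes([len(self.service)]) + self.service

    @classmethod
    def unpack(cls, buf: bytes):
        """Return (header, remaining payload bytes)."""
        flags, tid = struct.unpack_from("!BH", buf)
        if flags & FLAG_REPLY and flags & FLAG_OMIT_NAME:
            return cls(flags, tid), buf[3:]
        n = buf[3]
        return cls(flags, tid, buf[4:4 + n]), buf[4 + n:]


def make_reply(query: Udp2Header) -> Udp2Header:
    """Echo the transaction ID; drop the name if the query permitted it."""
    if query.flags & FLAG_OMIT_NAME:
        return Udp2Header(FLAG_REPLY | FLAG_OMIT_NAME, query.transaction_id)
    return Udp2Header(FLAG_REPLY, query.transaction_id, query.service)
```

With the name omitted, the querying host has to match the reply on the
transaction ID alone, which is why uniqueness over the host pair is
needed in that case.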

If the UDP port lookup service is written, a TTL field could be included
in the reply.  Some user sides might ignore the TTL and look up every
time they had a transaction.  For simple exchanges this doubles the
overhead.  Hosts with a permanent database could return very long times
to live; and long TTLs would be the rule when servers permitted looking
up permanently assigned values like TELNET => 23.  When the port numbers
are assigned on a per-bootload basis, or even more often, TTL values
like one minute could be used to allow for system crashes or service
restarts.
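On the user side, honouring the TTL amounts to a small cache in front of
the lookup.  A minimal sketch, assuming a `lookup` callable that performs
the actual UDP exchange and returns a (port, ttl_seconds) pair; the class
and method names are invented for the example.

```python
import time


class PortCache:
    """Cache name-to-port answers until their TTL runs out."""

    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup    # callable: name -> (port, ttl_seconds)
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # name -> (port, expiry time)

    def port_for(self, name: str) -> int:
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]                    # still within its TTL
        port, ttl = self._lookup(name)       # expired or never seen
        self._entries[name] = (port, now + ttl)
        return port

    def invalidate(self, name: str) -> None:
        """Forget an entry, e.g. after a connection to the port is refused."""
        self._entries.pop(name, None)
```

A user side that ignores the TTL corresponds to calling `lookup` directly
every time, at the cost of the doubled overhead mentioned above.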

The UDP service might be useful for protocols other than TCP.  A request
packet could include the protocol ID and the name of the service.  The
reply could begin with the request and add the port number and TTL.
Perhaps the port number could be at the end so that protocols could have
port numbers more than 16 bits long.
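One way to lay out such a request and reply is sketched below.  The
field widths (one byte each for protocol ID and name length, a 32-bit
TTL in seconds) and the function names are assumptions; only the
ordering, with the request echoed first and the port number last, comes
from the paragraph above.

```python
import struct


def build_request(protocol_id: int, service: bytes) -> bytes:
    """Protocol ID byte, name length byte, then the service name."""
    return bytes([protocol_id, len(service)]) + service


def build_reply(request: bytes, ttl_seconds: int, port: int,
                port_len: int = 2) -> bytes:
    """Echo the request, then append the TTL and finally the port number."""
    return (request + struct.pack("!I", ttl_seconds)
            + port.to_bytes(port_len, "big"))


def parse_reply(reply: bytes):
    """Return (protocol_id, service, ttl_seconds, port)."""
    protocol_id, n = reply[0], reply[1]
    service = reply[2:2 + n]
    (ttl,) = struct.unpack_from("!I", reply, 2 + n)
    port = int.from_bytes(reply[6 + n:], "big")  # whatever remains is the port
    return protocol_id, service, ttl, port
```

Because the port is simply "the rest of the packet", a protocol with
wider port numbers needs no change to the parser.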

I think it is reasonable to refine all three proposals.  I fear that
there will be few takers for UDP-2.

STJOHNS@SRI-NIC.ARPA.UUCP (05/20/86)

Let's try to make this as simple as possible, at least for the
TCP side.  I haven't taken a look at the UDP stuff yet, but there
may be a totally separate solution.

Having yielded to the original point that a multiplexing port is
necessary, I went back and took a look at the spec and came up
with the following:

1) Assign a standard TCP port for a Contact by Name server.

2) Define a TCP option, Contact Name, and give it some reasonable
maximum length (32 chars?).

3) The Contact Name option is only valid in a packet containing a
SYN (just like the max seg size option).

4) Multiplexing is still done at the TCP level, based on ports
and host addresses.  In fact, once the connection is open, there
is no difference in the way it is handled.

Looking at the implementations I am familiar with (Multics,
UNIX), this shouldn't be difficult to implement at all.
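For concreteness, a Contact Name option could be carried in the usual
kind/length/data form that TCP options use.  The sketch below is
hypothetical throughout: no option number was assigned, so 253 is just a
placeholder, and the function names are invented.  The 40-byte ceiling
is the space left for all options once the 20 fixed header bytes are
subtracted from the 60-byte maximum TCP header.

```python
CONTACT_NAME_KIND = 253   # placeholder option kind; none was assigned
MAX_OPTION_SPACE = 40     # maximum TCP header (60) minus fixed part (20)


def encode_contact_name(name: bytes, max_name: int = 32) -> bytes:
    """Kind byte, length byte (covers kind and length too), then the name."""
    if not 0 < len(name) <= max_name:
        raise ValueError("contact name length out of range")
    option = bytes([CONTACT_NAME_KIND, 2 + len(name)]) + name
    if len(option) > MAX_OPTION_SPACE:
        raise ValueError("option does not fit in the TCP header")
    return option


def find_contact_name(options: bytes):
    """Walk the option list of a SYN and return the contact name, or None."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:              # end of option list
            break
        if kind == 1:              # no-operation, a single byte
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            raise ValueError("malformed option")
        if kind == CONTACT_NAME_KIND:
            return options[i + 2:i + length]
        i += length
    return None
```

Since the SYN normally carries the maximum segment size option as well
(4 bytes), the space actually available for the name is a little under
the 38 bytes the header alone would allow.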

Mike

DCP@SCRC-QUABBIN.ARPA.UUCP (05/21/86)

It shouldn't end in CRLF.  For that matter, you might as well do what
CHAOS did: add arguments to the 'contact name'.  You could even use the
urgent pointer to delimit the end of the contact/name arguments!

In the CHAOS protocol, there is only one type of connection: you connect
with a packet that has contact name and arguments.  There is no
'datagram equivalent'.

I guess I'm not sure what the UDP problem is, possibly because I don't
know how UDP connections get started.  Isn't there a 'first' packet that
can include the contact name?

JSLove@MIT-MULTICS.ARPA.UUCP (05/21/86)

Is the general readership really interested in seeing this design
discussion continue on the TCP-IP list?  I haven't received any
complaints about my long messages, but perhaps the interested parties
have already identified themselves.  If this goes on much longer, should
we create a new list or something?

Putting the contact name in a TCP option in the header of the SYN packet
is cute:  it solves the problem of gateways finding out what service
owns the connection.  It is ugly to have to use a reserved port number
as well, but it would clearly be necessary so that hosts which didn't
implement the protocol would properly reject such packets.

However, putting the contact name in the header imposes length
limitations much more stringent than placing it in the data stream.  It
also requires significant changes to all TCPs that participate in this
game.  Perhaps that seems desirable to others, but I view both as
disastrous.

A 32 character name length may seem adequate, but it really isn't.
First, the number czar may give out long prefixes, like SYMBOLICS-,
THINKING-MACHINES-, HONEYWELL-INFORMATION-SYSTEMS-, and so on.  Within
an organization, there may be further delegation, like OFFICE-SYSTEMS-,
ULTRIX-, SCRC-, and so on.  It would be better to spell out names like
MANDELBROT, rather than having them compressed into MNDLBRT or given
less clear names like DCP3.

In the connection model we are currently using, there is no use for
contact arguments.  There is no pressing need to implement them, since
we are using a stream protocol.  The contact arguments can effectively
just appear in the TCP data stream.  I believe that FINGER is a good
example.  For TCP, the FINGER server arguments appear at the beginning
of the data stream.  In fact, they are the whole data stream in one
direction.  In CHAOS, the first packet contains the FINGER contact name
and the arguments.  The only other packets to go in that direction are
the acknowledgements of the reply.  TCPs generally assume a zero window
until the connection is established, so TCP requires at least two
packets to do what CHAOS does in one.

However, contact arguments can serve as protocol specific options, and
it is easy to envision services that might be offered for more than one
stream protocol.  For example, a system might support TCP, DECNET
(whatever their stream protocol is), CHAOS, TP4, X.25, and so on.
Defining an out-of-band mechanism for later use preserves flexibility.

Even though it makes life harder for security-minded gateways, I think
that the advantages of putting the contact information at the front of
the stream outweigh that cost.  Protocols which must cross the secure gateways
can be assigned numbers.  The gateways can reject multiplexor port SYN
packets.

Sending the contact information urgently is less disruptive since many
TCPs wouldn't have to be modified to permit this.  The advantage of a
protocol that sends the contact information in the stream is that no
underlying mechanisms need be modified on either side of the connection.
The user side just sends the contact information at the beginning of the
stream, suitably delimited, and the server side reads it and acts on it.
The server TCP doesn't need to be modified; the server could make a new
connection to a numbered port and transfer bytes between the two
connections transparently.
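The "transfer bytes between the two connections" approach needs nothing
from the TCP beyond ordinary reads and writes.  A rough sketch follows,
using one thread per direction; the function names are made up, and a
real server would also want timeouts and error reporting.

```python
import socket
import threading


def _pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reports end of stream."""
    try:
        while True:
            chunk = src.recv(4096)
            if not chunk:
                break
            dst.sendall(chunk)
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)   # pass the half-close along
        except OSError:
            pass                           # dst already closed or reset


def relay(a: socket.socket, b: socket.socket) -> None:
    """Splice two connected sockets together until both directions end."""
    t = threading.Thread(target=_pump, args=(a, b), daemon=True)
    t.start()
    _pump(b, a)
    t.join()
```

Here `a` would be the connection that arrived at the multiplexor port
(after the contact line has been read from it) and `b` a new connection
to the service's numbered port.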

Still, there may be TCPs that don't handle urgent data well.  The
rationale for using a network newline (CRLF) to delimit the connection
name is that this is a standard delimiter sequence which is widely
understood across the network.  Any ASCII character may be sent without
triggering recognition of this delimiter since a CRLF can be quoted as
CR NUL LF.  (That is asking for trouble since there are implementations
which are defective in this department.)  Certainly any printing
character can be sent with any implementation.  Sending the contact
name, an optional space followed by contact arguments, and ending the
whole thing with a newline makes it very easy for both user and server.
The server can read one line from the TCP, which is often available as a
primitive.  (If it isn't, there are other ways of ensuring that too much
data is not read from the TCP, like reading one character at a time.)
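Reading the contact line one character at a time, as suggested, might
look like this.  The sketch assumes a sockets-style interface; the
function name and the 256-byte limit are inventions for the example.
The contact name is split from the optional arguments at the first space.

```python
import socket


def read_contact_line(conn: socket.socket, limit: int = 256):
    """Read up to and including the CRLF, one byte at a time, and no further.

    Returns (contact_name, arguments); arguments is "" when none were sent.
    """
    line = bytearray()
    while not line.endswith(b"\r\n"):
        if len(line) >= limit:
            raise ValueError("contact line too long")
        byte = conn.recv(1)
        if not byte:
            raise ConnectionError("peer closed before CRLF")
        line += byte
    text = line[:-2].decode("ascii")
    name, _, args = text.partition(" ")
    return name, args
```

Because nothing past the CRLF is consumed, whatever follows is still
waiting in the connection for the service that takes it over.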

On Multics, either urgent or newline-delimited contact names would be
easy to implement.  Interoperability would be possible in a very short
time, although improvements could be made later.  A new option would
require significant changes to the underlying TCP for every system that
used it, a much bigger job.

UDP is a connectionless protocol.  It has no options, and no memory from
one packet to the next.  Requests are sent out to a well known port from
a local port which is used to identify the reply, if any.

UDP is commonly used in two ways:  for a simple announcement or query
in which each participating host sends at most one packet, and as an
underlying base for implementing some more complex protocol.  To ask
the time, for example, one host sends a UDP packet to
the time port; there may be no data in the packet (except the port
number to return the time to, which is part of the UDP header).  The
reply contains only the time, perhaps as a 32 bit number.  When it
arrives, the transaction is finished.  This might be a good candidate
for a UDP-2 protocol which carried the name
"TIME-IN-SECONDS-SINCE-1970/01/01-00:00:00-GMT" rather than a well known
port number.  At least one such protocol would get the short name
"TIME", I suppose, since the desire for precise names doesn't seem to be
widespread.

Other examples include TFTP and the remote virtual disk protocol.  Both
of these applications maintain considerable state information about a
session and exchange many packets during the session.  Using UDP-2 would
increase the overhead of the protocols, and might decrease the amount of
useful data sent in a datagram.  For these sorts of services, some other
service to translate service names into port numbers would be much more
efficient.

The UDP lookup server, like Benson's version of the multiplexor server,
is easy to implement.  UDP-2 might be nice, but is an even bigger job
than adding a new TCP option, and I never expect to see it in general
use.

Concerning contact names:  how about reserving some TCP options as
"higher-level protocol specific"?  The options could vary in meaning and
even length depending on which protocol (well known port) was used.
They would be valid only for the SYN packet.  TCPs which play this game
could indicate in the LISTEN or OPEN call that they would accept such
options, and define a way for the application server to read them when
the connection is established.  TCPs which don't implement the options,
or TCBs which don't accept the options would cause the connection to be
aborted rather than established.  This precursor would make it possible
to implement St. Johns' version of the Named-Service server, and would
provide a mechanism like contact arguments.