Margulies@SCRC-YUKON.ARPA.UUCP (05/19/86)
I have been thinking about how the port multiplexing protocol might work. Part of it seems simple enough, but part does not. For TCP, the following sketch seems easy enough: Given a TCP port for a NAMED-TCP-SERVICE service, you connect to that port, and send the name of the service you want followed by a CRLF. If the service exists, your connection is handed off to it. If not, the connection is closed. Server implementations are welcome to mark the TCB with the service name. Conceptually, note that there need not be any numeric port number associated with the protocol at all. Some implementations may choose to implement this as a mapping from names to ports. It is particularly useful to let the port used vary, so that no explicit configuration is needed to avoid collisions between protocols. For UDP, the problem is harder. In the CHAOS protocol, the datagram equivalent carries a protocol name. However, it doesn't carry any data. Having UDP packets include a protocol name would nonetheless be the most elegant. I fear that it won't be practical. It would probably be necessary to invent UDP-2 as a full-fledged protocol that stored the name-length in the header. The weaker alternative is a UDP service that converts a protocol name into a port number. The problem here is the lifetime of the resulting information. If the mapping from names to numbers has to be permanent, then each server implementation has to have a way to maintain a permanent database of the assignments, which would be a shame.
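A minimal sketch of the TCP handoff described above, assuming the server has already accumulated some bytes from the connection. The function names, the handler type, and the use of a plain dictionary as the name registry are illustrative assumptions, not part of the proposal.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical handler type: receives any bytes that arrived after the name line.
Handler = Callable[[bytes], None]


def split_service_line(buffer: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (service_name, remaining_bytes), or None if no CRLF has arrived yet."""
    line, sep, rest = buffer.partition(b"\r\n")
    if not sep:
        return None  # the name line is still incomplete
    return line.decode("ascii", errors="replace"), rest


def find_service(name: str, services: Dict[str, Handler]) -> Optional[Handler]:
    """Look up a handler by name. None means the caller should close the connection."""
    return services.get(name)
```

A server loop would call split_service_line on the bytes read so far, then either hand the connection (and any leftover bytes) to the handler or close it when find_service returns None.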
JSLove@MIT-MULTICS.ARPA.UUCP (05/19/86)
Concerning Benson Margulies' comments on NAMED-TCP-SERVICE: You can't just close the connection if the service is not implemented, because the TCP close is a half-channel close. I really think you must abort the connection, sending an RST packet, to indicate totally unambiguously that the service is unavailable. Consider the timing windows if you use a closing strategy: the user side sends the protocol name to NAMED-TCP-SERVICE, and then passes the connection to the protocol user program. Perhaps it waits for the server's TCP to acknowledge the name bytes. The server receives the name, and the service is unavailable. It closes the connection. The user side gets the close but is already in the protocol user code. This could be surprising; it might even have some other interpretation if the service multiplexor is used for some existing service. When an abort is used instead, this more closely resembles the response of a system which doesn't implement the service. If you try to contact a TCP which has nothing listening on a port, the connection is effectively aborted. Granted, it was never established, but a RST packet is a RST packet. A cleaner alternative is to have some positive acknowledgement. For example, the server could send OK, DOWN, or UNKNOWN back, followed by a CRLF. The disadvantage of this is the extra overhead of the reply. If you are lucky, the reply data gets piggybacked on the name acknowledgement, and the reply acknowledgement gets piggybacked on the first packet of the user program. Long live UDP-2. My proposal for this used a null-terminated string on the theory that it constitutes encouragement to use the UDP-2 protocol only for one shot single-query single-response datagram exchanges. This was probably not clever of me. By having a single length byte, you can hash very nicely on length and first character. 
I don't believe that more than one length byte is needed, but perhaps one or two high bits could be swiped as flags if 127 or 63 is acceptable as the maximum length of a protocol name. I still think that the name field should be an even number of bytes long because a number of machines like their 16 bit fields aligned. The spec should thus include a pad byte for odd length names. There is still the problem of designating the other end of the transaction. When you send a query the service name is equivalent to the foreign port; when sending a reply, the service name is the local port. Either both ports must be named, or there will have to be some other way of distinguishing queries from replies. I really don't like using two names, but perhaps that is best. I would prefer service name, transaction ID, and a query/reply flag. The transaction ID would be assigned by the querying host, and for many protocols ignored by the server except to send it back in the reply. Uniqueness might be required in some cases, but is only needed for a given host pair and service name, just as the port number would be. The extra space taken by the name combined with the packet size limits restricts the amount of data in the datagram. For UDP, I have heard suggested maximum packet sizes of 512 bytes, although you can send real jumbograms between some implementations using fragmentation. By requiring that transaction IDs be unique over host pairs (during some TTL (time to live)), the service name could be omitted from reply packets. If this seems ugly, how about a flag in the request packet indicating whether it is permissible to omit the service name from the reply. If the flag is still set in the reply, there is no service name. If the UDP port lookup service is written, a TTL field could be included in the reply. Some user sides might ignore the TTL and look up every time they had a transaction. For simple exchanges this doubles the overhead. 
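To make the header fields discussed above concrete, here is a rough sketch of one possible UDP-2 layout. Every detail of the layout is an assumption made for illustration: the high bit of the length byte is taken as the query/reply flag (leaving 127 as the maximum name length), the name is padded with one zero byte when its length is odd, and the transaction ID is assumed to be 16 bits.

```python
import struct

REPLY_FLAG = 0x80   # assumed: high bit of the length byte marks a reply
MAX_NAME = 127      # seven bits remain for the name length


def pack_udp2(name: str, txid: int, is_reply: bool, data: bytes = b"") -> bytes:
    """Build: length/flag byte, name, optional pad byte, 16-bit transaction ID, data."""
    raw = name.encode("ascii")
    if not 0 < len(raw) <= MAX_NAME:
        raise ValueError("service name must be 1..127 bytes")
    first = len(raw) | (REPLY_FLAG if is_reply else 0)
    pad = b"\x00" if len(raw) % 2 else b""  # keep the name field an even length
    return bytes([first]) + raw + pad + struct.pack("!H", txid) + data


def unpack_udp2(packet: bytes):
    """Inverse of pack_udp2; returns (name, txid, is_reply, data)."""
    first = packet[0]
    length = first & 0x7F
    is_reply = bool(first & REPLY_FLAG)
    name = packet[1:1 + length].decode("ascii")
    offset = 1 + length + (length % 2)  # skip the pad byte for odd-length names
    (txid,) = struct.unpack_from("!H", packet, offset)
    return name, txid, is_reply, packet[offset + 2:]
```

Note that the name, pad, and transaction ID all come out of the datagram's payload budget, which is the overhead concern raised above.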
Hosts with a permanent database could return very long times to live; and long TTLs would be the rule when servers permitted looking up permanently assigned values like TELNET => 23. When the port numbers are assigned on a per-bootload basis, or even more often, TTL values like one minute could be used to allow for system crashes or service restarts. The UDP service might be useful for protocols other than TCP. A request packet could include the protocol ID and the name of the service. The reply could begin with the request and add the port number and TTL. Perhaps the port number could be at the end so that protocols could have port numbers more than 16 bits long. I think it is reasonable to refine all three proposals. I fear that there will be few takers for UDP-2.
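The TTL handling on the user side could look something like the following sketch. The class name, the (protocol, service) key, and the injectable clock are assumptions for illustration; the message above does not specify how a client should store lookup results.

```python
import time


class PortCache:
    """Client-side cache of name-to-port lookups that honors the reply's TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (protocol, service) -> (port, expiry time)

    def store(self, protocol, service, port, ttl_seconds):
        """Record a lookup reply; it stays valid for ttl_seconds."""
        self._entries[(protocol, service)] = (port, self._clock() + ttl_seconds)

    def lookup(self, protocol, service):
        """Return the cached port, or None if absent or expired."""
        key = (protocol, service)
        entry = self._entries.get(key)
        if entry is None:
            return None
        port, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale: the caller must ask the server again
            return None
        return port
```

A user side that ignores the TTL simply never calls store and pays for a lookup on every transaction; a host with permanent assignments would hand out a very large ttl_seconds.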
STJOHNS@SRI-NIC.ARPA.UUCP (05/20/86)
Let's try and make this as simple as possible, at least for the TCP side. I haven't taken a look at the UDP stuff yet, but there may be a totally separate solution. Having yielded to the original point that a multiplexing port is necessary, I went back and took a look at the spec and came up with the following:

1) Assign a standard TCP port for a Contact by Name server.
2) Define a TCP option - Contact Name; give it some reasonable maximum (32 chars?).
3) The Contact Name option is only valid in a packet containing a SYN (just like the max seg size option).
4) Multiplexing is still done at the TCP level, based on ports and host addresses. In fact, once the connection is open, there is no difference in the way it is handled.

Looking at the implementations I am familiar with (Multics, UNIX), this shouldn't be difficult to implement at all.

Mike
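As a rough illustration of item 2, a Contact Name option could be encoded in the usual TCP option form of kind, length, and data, where the length byte counts the kind and length bytes as well. The option kind below is a placeholder chosen for this sketch, not an assigned number, and the 32-character limit is the tentative figure from the list above.

```python
CONTACT_NAME_KIND = 253   # placeholder value, not an assigned option kind
MAX_CONTACT_NAME = 32     # tentative maximum suggested in the proposal


def encode_contact_name_option(name: str) -> bytes:
    """Encode as: kind byte, total length byte (including these two), name bytes."""
    raw = name.encode("ascii")
    if not 0 < len(raw) <= MAX_CONTACT_NAME:
        raise ValueError("contact name must be 1..32 bytes")
    return bytes([CONTACT_NAME_KIND, len(raw) + 2]) + raw


def decode_contact_name_option(option: bytes) -> str:
    """Validate kind and length, then return the contact name."""
    if len(option) < 3 or option[0] != CONTACT_NAME_KIND or option[1] != len(option):
        raise ValueError("not a well-formed Contact Name option")
    return option[2:].decode("ascii")
```

With a 32-byte name the option occupies 34 bytes, which still fits alongside a 4-byte maximum segment size option in the 40 bytes TCP allows for options.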
DCP@SCRC-QUABBIN.ARPA.UUCP (05/21/86)
It shouldn't end in CRLF. For that matter, you might as well do what CHAOS did: add arguments to the 'contact name'. You could even use the urgent pointer to delimit the end of the contact/name arguments! In the CHAOS protocol, there is only one type of connection: you connect with a packet that has contact name and arguments. There is no 'datagram equivalent'. I guess I'm not sure what the UDP problem is, possibly because I don't know how UDP connections get started. Isn't there a 'first' packet that can include the contact name?
JSLove@MIT-MULTICS.ARPA.UUCP (05/21/86)
Is the general readership really interested in seeing this design discussion continue on the TCP-IP list? I haven't received any complaints about my long messages, but perhaps the interested parties have already identified themselves. If this goes on much longer, should we create a new list or something? Putting the contact name in a TCP option in the header of the SYN packet is cute: it solves the problem of gateways finding out what service owns the connection. It is ugly to have to use a reserved port number as well, but it would clearly be necessary so that hosts which didn't implement the protocol would properly reject such packets. However, putting the contact name in the header imposes length limitations much more stringent than placing it in the data stream. It also requires significant changes to all TCPs that participate in this game. Perhaps that seems desirable to others, but I view both as disastrous. A 32 character name length may seem adequate, but it really isn't. First, the number czar may give out long prefixes, like SYMBOLICS-, THINKING-MACHINES-, HONEYWELL-INFORMATION-SYSTEMS-, and so on. Within an organization, there may be further delegation, like OFFICE-SYSTEMS-, ULTRIX-, SCRC-, and so on. It would be better to spell out names like MANDELBROT, rather than having them compressed into MNDLBRT or given less clear names like DCP3. In the connection model we are currently using, there is no use for contact arguments. There is no pressing need to implement them, since we are using a stream protocol. The contact arguments can effectively just appear in the TCP data stream. I believe that FINGER is a good example. For TCP, the FINGER server arguments appear at the beginning of the data stream. In fact, they are the whole data stream in one direction. In CHAOS, the first packet contains the FINGER contact name and the arguments. The only other packets to go in that direction are the acknowledgements of the reply.
TCPs generally assume a zero window until the connection is established, so TCP requires at least two packets to do what CHAOS does in one. However, contact arguments can serve as protocol specific options, and it is easy to envision services that might be offered for more than one stream protocol. For example, a system might support TCP, DECNET (whatever their stream protocol is), CHAOS, TP4, X.25, and so on. Defining an out-of-band mechanism for later use preserves flexibility. Even though it makes life harder for security-minded gateways, I think that the advantages of putting the contact information at the front of the stream outweigh it. Protocols which must cross the secure gateways can be assigned numbers. The gateways can reject multiplexor port SYN packets. Sending the contact information urgently is less disruptive since many TCPs wouldn't have to be modified to permit this. The advantage of a protocol that sends the contact information in the stream is that no underlying mechanisms need be modified on either side of the connection. The user side just sends the contact information at the beginning of the stream, suitably delimited, and the server side reads it and acts on it. The server TCP doesn't need to be modified; the server could make a new connection to a numbered port and transfer bytes between the two connections transparently. Still, there may be TCPs that don't handle urgent data well. The rationale for using a network newline (CRLF) to delimit the connection name is that this is a standard delimiter sequence which is widely understood across the network. Any ASCII character may be sent without triggering recognition of this delimiter since a CRLF can be quoted as CR NUL LF. (That is asking for trouble since there are implementations which are defective in this department.) Certainly any printing character can be sent with any implementation.
Sending the contact name, an optional space followed by contact arguments, and ending the whole thing with a newline makes it very easy for both user and server. The server can read one line from the TCP, which is often available as a primitive. (If it isn't, there are other ways of ensuring that too much data is not read from the TCP, like reading one character at a time.) On Multics, either urgent or newline-delimited contact names would be easy to implement. Interoperability would be possible in a very short time, although improvements could be made later. A new option would require significant changes to the underlying TCP for every system that used it, a much bigger job. UDP is a connectionless protocol. It has no options, and no memory from one packet to the next. Requests are sent out to a well known port from a local port which is used to identify the reply, if any. There are two common scenarios which UDP is used for: a simple announcement or query where each participating host sends at most one packet, and as an underlying base for implementing some complex protocol. To ask the time, for example, one host sends a UDP packet to the time port; there may be no data in the packet (except the port number to return the time to, which is part of the UDP header). The reply contains only the time, perhaps as a 32 bit number. When it arrives, the transaction is finished. This might be a good candidate for a UDP-2 protocol which carried the name "TIME-IN-SECONDS-SINCE-1970/01/01-00:00:00-GMT" rather than a well known port number. At least one such protocol would get the short name "TIME", I suppose, since the desire for precise names doesn't seem to be widespread. Other examples include TFTP and the remote virtual disk protocol. Both of these applications maintain considerable state information about a session and exchange many packets during the session. 
Using UDP-2 would increase the overhead of the protocols, and might decrease the amount of useful data sent in a datagram. For these sorts of services, some other service to translate service names into port numbers would be much more efficient. The UDP lookup server, like Benson's version of the multiplexor server, is easy to implement. UDP-2 might be nice, but is an even bigger job than adding a new TCP option, and I never expect to see it in general use. Concerning contact names: how about reserving some TCP options as "higher-level protocol specific"? The options could vary in meaning and even length depending on which protocol (well known port) was used. They would be valid only for the SYN packet. TCPs which play this game could indicate in the LISTEN or OPEN call that they would accept such options, and define a way for the application server to read them when the connection is established. TCPs which don't implement the options, or TCBs which don't accept the options would cause the connection to be aborted rather than established. This precursor would make it possible to implement St. Johns' version of the Named-Service server, and would provide a mechanism like contact arguments.
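For the in-stream alternative argued for earlier in this message (contact name, an optional space followed by contact arguments, all terminated by a network newline), the server side might read the line one character at a time so that nothing belonging to the service itself is consumed. This is only a sketch: the function name and the 256-byte limit are arbitrary assumptions, and the CR NUL quoting convention is not handled.

```python
import socket


def read_contact_line(conn: socket.socket, limit: int = 256):
    """Read up to and including the first CRLF; return (contact_name, arguments)."""
    line = bytearray()
    while not line.endswith(b"\r\n"):
        if len(line) >= limit:
            raise ValueError("contact line too long")
        byte = conn.recv(1)  # one byte at a time, so later stream data stays unread
        if not byte:
            raise ConnectionError("connection closed before newline")
        line += byte
    text = line[:-2].decode("ascii")
    name, _, args = text.partition(" ")  # arguments are empty if there is no space
    return name, args
```

After this returns, the connection can be handed to the named service, or relayed to a numbered port, with the rest of the stream intact.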