[comp.protocols.tcp-ip] Batched SMTP

enag@ifi.uio.no (Erik Naggum) (10/27/90)

Phil Karn <karn@envy.bellcore.com> writes,
in article <1990Oct25.165545@envy.bellcore.com>:
>>> While we're on the subject of piggybacking, another thing I would
>>> really like to see is widespread use of batched SMTP on the Internet.
>>> I think the number of packets it takes for most SMTP implementations
>>> to transfer a short mail message is criminal, especially when the
>>> message has several recipients on the same system.  There's no reason
>>> that you shouldn't be able to send a series of SMTP commands in a
>>> single TCP segment and receive a series of responses, except that many
>>> SMTP servers inexplicably blow up when you try this. Given that TCP is
>>> supposed to be a reliable byte stream protocol, the designers of these
>>> systems must have gone well out of their way to keep this from working.

If you do this, it isn't SMTP.

RFC821, page 37:

   4.3.  SEQUENCING OF COMMANDS AND REPLIES

      The communication between the sender and receiver is intended to
      be an alternating dialogue, controlled by the sender.  As such,
      the sender issues a command and the receiver responds with a
      reply.  The sender must wait for this response before sending
      further commands.

In recently standardized parlance, the session is two-way alternate
(a.k.a. half-duplex) with implicit turn giving.
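
As a rough illustration of that alternating dialogue, here is a small
Python sketch of a sender that refuses to issue a command while a reply
is still outstanding.  The class LockStepSender and its methods are
hypothetical names made up for this example, and no network I/O is
performed; it only models the turn-taking rule, including the initial
wait for the 220 greeting.

```python
class LockStepSender:
    """Models the RFC 821 rule: one command, then wait for its reply."""

    def __init__(self):
        # The receiver speaks first (the 220 greeting), so the sender
        # starts out waiting for a reply before it may send anything.
        self.awaiting_reply = True
        self.sent = []

    def send(self, command):
        """Record a command, but only if it is the sender's turn."""
        if self.awaiting_reply:
            raise RuntimeError("reply to previous command not yet received")
        self.sent.append(command)
        self.awaiting_reply = True

    def receive(self, reply):
        """Accept one reply line and hand the turn back to the sender."""
        if not self.awaiting_reply:
            raise RuntimeError("no command outstanding")
        self.awaiting_reply = False
        return int(reply[:3])  # three-digit reply code


if __name__ == "__main__":
    s = LockStepSender()
    s.receive("220 ready")
    s.send("HELO a.example")
    try:
        s.send("MAIL FROM:<x@a.example>")  # too early: no reply to HELO yet
    except RuntimeError as err:
        print("refused:", err)
```

A batching client is, in effect, one that removes the check in send().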

I assume that you mean that batching should be done as follows:

In the first segment, send

	HELO
	MAIL
	RCPT...

wait for and check incoming 250 results, then

	DATA
	<blast>
	.
	QUIT

in the next segment if all went well, otherwise send a QUIT, only.
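
The two-step batching just described could be sketched in Python
roughly as follows.  The function names first_batch and second_batch
are hypothetical, the addresses are placeholders, and treating anything
other than a 250 as failure is a simplification that follows the
description above rather than the full set of RFC 821 reply codes.

```python
CRLF = "\r\n"


def first_batch(domain, sender, recipients):
    """Build HELO, MAIL and all RCPT commands as one block of text."""
    lines = [f"HELO {domain}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    return CRLF.join(lines) + CRLF


def second_batch(replies, expected, message_lines):
    """Build the second block from the replies to the first one.

    If every expected reply arrived and is a 250, send DATA, the
    message, the final dot and QUIT.  Otherwise send QUIT only.
    """
    codes = [reply[:3] for reply in replies]
    if len(codes) == expected and all(code == "250" for code in codes):
        # RFC 821 transparency: a leading dot in a text line is doubled.
        body = ["." + line if line.startswith(".") else line
                for line in message_lines]
        lines = ["DATA", *body, ".", "QUIT"]
    else:
        lines = ["QUIT"]
    return CRLF.join(lines) + CRLF


if __name__ == "__main__":
    rcpts = ["y@b.example", "z@b.example"]
    print(first_batch("a.example", "x@a.example", rcpts), end="")
    # One reply each for HELO, MAIL and every RCPT.
    replies = ["250 OK"] * (2 + len(rcpts))
    print(second_batch(replies, 2 + len(rcpts), ["Hello."]), end="")
```

Note that the second block sends the message without waiting for the
354 reply to DATA, which is exactly the kind of grouping that departs
from the sequencing rule quoted from RFC 821.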

So we have this minimal packet exchange with a conforming SMTP server:

	  SMTP Client			  SMTP Server
	SYN
					SYN+ACK + "220"
	ACK + "HELO/MAIL/RCPT..."
					ACK + "250"
	ACK
					"250"
{	ACK
					"250"		}  repeat
	ACK + "DATA/msg/./QUIT"
					ACK + "354"
	ACK
					"250"
	ACK
					FIN + "221"
	FIN+ACK
					ACK

We could be really, really optimistic and assume that the server was
aware of the status of the incoming queue, and make it like this:

	  SMTP Client			  SMTP Server
	SYN
					SYN+ACK + "220"
	ACK + "HELO/MAIL/RCPT..."
					ACK + "250/250/250..."
	ACK + "DATA/msg/./QUIT"
					ACK + "354/250/221"
	ACK+FIN
					FIN+ACK
	ACK

Alternatively, we could be really perverse and do this:

	  SMTP Client			  SMTP Server
	SYN + "HELO/MAIL/RCPT..."
					SYN+ACK + "220/250/250/250..."
	ACK+FIN + "DATA/msg/./QUIT"
					ACK+FIN + "354/250/221"
	ACK

There are several immense problems with this scheme, and with the very
desire to minimize use of sequence number space like this.

Unless we tell TCP not to send ACKs before we have written all we need
to it and do a PUSH, replies will not be piggy-backed on ACKs, and
those will be two segments instead of one.

Unless the SMTP server knows that there is more input, it can't delay
the PUSH until all input is processed, which will be separate segments
with separate ACKs, unless the SMTP client knows that more is coming,
and can tell TCP not to ACK until it writes more data and does a PUSH.

Unless we redefine SMTP to allow commands before the 220 reply, we
can't send HELO before we get it.

Unless we redefine SMTP to allow commands to be sent before the reply
to the previous command has been received, we can't group commands.

Unless we disregard the wasted processing, and the server's behavior
on receiving an out-of-sequence RCPT after the MAIL has been
rejected, we can't group MAIL and the first RCPT.

Unless we're very brilliant with respect to the individual 25x and 55x
replies to the RCPTs, we shouldn't group them.

Unless we ignore all sorts of local problems at the server side, we
can't group DATA and the message, and in particular, we can't group
DATA, the message, AND the final dot.

Unless we demand only one connection per message or ignore message
delivery problems at a late stage, we shouldn't group the final dot
and the QUIT.

Unless the SMTP server is able to recognize the QUIT properly, it
can't set the FIN bit in the last data segment.

Unless we have support for half-closing connections, the SMTP client
can't group the FIN and the QUIT, again unless you redefine SMTP not
to acknowledge the QUIT with a 221.

I think I have pointed to several severe problems in control and
status information propagation between TCP and the application, some
problems in end-to-end application acks of operations, and that these,
by themselves, make it very inappropriate to squeeze the living
daylight out of SMTP.  As a result, I think that we will produce a lot
of hair on the client side to take care of a server which thinks it's
seeing and replying to individual commands, and that the gain in
number of packets will be minimal, such as three, in the common case.

				 ---

Rather, if you want a fast, inexpensive mail transfer protocol, define
one with two pieces of information transferred and acked:

	Envelope

	Message

One pair per exchange, and the FIN bit used as the end-of-message
indicator, if you can handle one-sided close operations gracefully.
Most operating systems don't support this feature.  (I.e. the last
data segment has FIN set, but it still awaits a data segment with the
other side's FIN segment.)  It could go like this:

	  FMDP client			  FMDP server
	SYN + Envelope
					SYN+ACK + Envelope OK
	ACK+FIN + Message
					ACK+FIN + Message OK
	ACK

FMDP stands for "Futuristic Message Delivery Protocol".

Most probably, you will get something on the order of:

	  FMDP client			  FMDP server
	SYN
					SYN+ACK
	ACK
	Envelope
					ACK
					Envelope OK
	ACK
	Message
					ACK
	FIN
					ACK
					Message OK
	ACK
					FIN
	ACK
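
Where the socket interface does offer a one-sided close, the
end-of-message-by-FIN idea might look roughly like the Python sketch
below.  The helper names finish_sending and read_until_fin are
hypothetical, and socket.socketpair() merely stands in for a real TCP
connection so the example can run in one process; whether a given
system handles the half-closed state gracefully is a separate question.

```python
import socket


def finish_sending(sock, message):
    """Send the message, then close only our sending direction."""
    sock.sendall(message)
    # On TCP this makes the stack send our FIN; we can still receive.
    sock.shutdown(socket.SHUT_WR)


def read_until_fin(sock):
    """Read until the peer closes its sending direction (recv gives b'')."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    client, server = socket.socketpair()
    finish_sending(client, b"Message")      # client: data, then FIN
    print(read_until_fin(server))           # server sees end of message
    server.sendall(b"Message OK")           # reply still travels back
    server.close()
    print(read_until_fin(client))
    client.close()
```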

If you know the size of the message, you can send that information in
the envelope and relieve yourself of the one-sided close problem.
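
A minimal sketch of that alternative, in Python: the size travels in
the envelope and the receiver simply counts bytes, so no FIN is needed
to mark the end of the message.  The one-line envelope layout used here
(sender, comma-separated recipients, size) is invented purely for
illustration, as are the names encode and decode.

```python
import io


def encode(sender, recipients, message):
    """Prefix the message with a one-line envelope carrying its size."""
    envelope = f"{sender} {','.join(recipients)} {len(message)}\n"
    return envelope.encode("ascii") + message


def decode(stream):
    """Read one envelope line, then exactly as many bytes as it announces."""
    sender, rcpts, size = stream.readline().decode("ascii").split()
    size = int(size)
    message = stream.read(size)
    if len(message) != size:
        raise ValueError("stream ended before the full message arrived")
    return sender, rcpts.split(","), message


if __name__ == "__main__":
    # Two messages back to back on one stream, with no close in between.
    wire = (encode("a@x.example", ["b@y.example"], b"hello\n")
            + encode("a@x.example", ["c@z.example"], b"bye"))
    stream = io.BytesIO(wire)
    print(decode(stream))
    print(decode(stream))
```

Because the end of each message is known from the count, several
messages can share one connection, at the price of having to know the
size before sending.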

I believe this scheme has been used in a different set of protocols,
which has its own set of fans, not necessarily on this list/group.

				 ---

I like SMTP the way it is.  That doesn't mean I dislike improvements;
just don't call them SMTP, and don't expect my SMTP client or server
to accept your bogus idea of what the underlying transport protocol
provides, which I would then have to accept at a higher layer.  Just
because ARM (Arpanet Reference Model) doesn't have a session layer,
doesn't mean it isn't implicit in some of the protocols.  ARM just
doesn't think it's worth a whole layer.  That's why.  Try pulling the
plug at the Session layer to the mighty Priests of the Holy Seven, and
they'll react with even more rationale for its existence than I have
provided above.

--
[Erik Naggum]		Naggum Software; Gaustadalleen 21; 0371 OSLO; NORWAY
	I disclaim,	<erik@naggum.uu.no>, <enag@ifi.uio.no>
  therefore I post.	+47-295-8622, +47-256-7822, (fax) +47-260-4427
--