[mod.protocols.tcp-ip] When to acknowledge SMTP messages

Kevin_Crowston@XV.MIT.EDU.UUCP (02/27/87)

Message type: Message
Topic: SMTP
Text: 
Re: Message of 25 Feb 87 05:01 from MRC%PANDA@SUMEX-AIM.Stanford.EDU

> The server should NOT make the client wait while a message is
> being delivered...

I faced this issue when implementing our mail relay.  I decided that the
client SMTP would have to wait while the relay delivered the message.
Otherwise, the relay could acknowledge the message and then crash
or discover that the destination mail server was unable to take the message.
Either way, the mail goes on the floor, hardly desirable.  Acknowledgement
should mean that the message is really okay.

On the other hand, I also receive duplicate copies of many whole and partial
messages; it seems that some hosts are less patient than others...

Kevin Crowston
MIT Sloan School of Management

MRC%PANDA@SUMEX-AIM.STANFORD.EDU.UUCP (02/27/87)

Kevin Crowston -

     Your relay should queue the message on its local disk and acknowledge
it once it is safely written.  That protects against the system crash
problem.  If the message cannot be accepted by the other end, then the
message should be returned to the sender via the return-path address.
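A minimal Python sketch of the queue-then-acknowledge approach described above. It assumes a local queue directory and a made-up envelope layout (an "R" line for the return-path, "T" lines for recipients) that is not any real mailer's format. The reply is produced only after the data has been fsync'd and atomically renamed into place, so a crash after the 250 cannot lose the message.

```python
import os
import tempfile


def queue_message(queue_dir, return_path, recipients, body):
    """Durably queue a message; return the SMTP reply to send after DATA.

    Sketch only: the "R"/"T" envelope lines are a hypothetical format.
    """
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, prefix="tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"R " + return_path.encode() + b"\n")
            for rcpt in recipients:
                f.write(b"T " + rcpt.encode() + b"\n")
            f.write(b"\n")
            f.write(body)
            f.flush()
            os.fsync(f.fileno())  # the bytes are on disk before we reply
        name = "q-" + os.path.basename(tmp_path)[len("tmp-"):]
        os.rename(tmp_path, os.path.join(queue_dir, name))  # atomic publish
        dir_fd = os.open(queue_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)  # make the rename itself durable
        finally:
            os.close(dir_fd)
    except OSError:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        return "451 Requested action aborted: local error in processing"
    return "250 OK, queued as " + name
```

A separate delivery process would later pick up the q- files; if the destination refuses the message, it goes back to the address on the R line.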

     An SMTP server should NEVER block a client waiting for delivery.
It is STUPID and WRONG-HEADED to keep an SMTP connection open for ANY
period of time longer than is necessary to get the bits across and
acknowledged.

     The world isn't necessarily the Internet with no charges per packet
or time charges for a virtual circuit.  When time charges are a reality,
mail servers that block clients COST REAL MONEY.

     Sorry for flaming, but this really is an important concept.

-- Mark --
-------

geof@decwrl.DEC.COM@imagen.UUCP (02/27/87)

 >  > The server should NOT make the client wait while a message is
 >  > being delivered...
 >      
 >  I faced this issue when implementing our mail relay.  I decided that the
 >  client SMTP would have to wait while the relay delivered the message.
 >  Otherwise, the relay could acknowledge the message and then crash
 >  or discover that the destination mail server was unable to take the message.
 >  Either way, the mail goes on the floor, hardly desirable.  Acknowledgement
 >  should mean that the message is really okay.

I agree wholeheartedly.  The problem is with SMTP itself.  TCP leaves
it to its client, the application, to make sure that the remote end
is still up.  In other words, TCP won't probe an idle connection (the old
"keep-alive" discussion), so the higher level protocol must do so if it
cares.  This behavior on TCP's part is necessary to cope with potentially
expensive network paths (e.g., a PTT network that bills by the packet),
so that quiescent TCP connections do not run up big bills.  If you're out of
the office for lunch, you don't want your telnet connection to send
packets around uselessly for an hour or more.  As in most cases, it
doesn't matter much when you're on an Ethernet, but it does in the more
general case.

In the case of SMTP, when a message is terminated with a ".CRLF", no
SMTP data may flow except the server's success/fail response.  Since
the TCP connection is quiescent during this interval, TCP cannot detect
a remote crash.  The only reasonable thing to do is to have SMTP set its
own death timer when it sends ".CRLF" and hope the message can be
delivered during that time interval.
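The per-message death timer could look roughly like this with Python sockets. This is a sketch; the function names are made up for illustration. The timer is simply a socket timeout that starts when "." is sent; if it fires, the message must be treated as undelivered, which is exactly where the guesswork comes in.

```python
import socket


def read_reply(rfile):
    """Read one SMTP reply (possibly multi-line) and return its code."""
    while True:
        line = rfile.readline()
        if not line:
            raise ConnectionError("server closed the connection")
        if line[3:4] != b"-":  # "250-..." continues, "250 ..." ends
            return int(line[:3])


def end_data(sock, death_timer):
    """Send "." and wait at most death_timer seconds for the reply.

    Returns the reply code, or None if the death timer fired; the caller
    must then treat the message as NOT delivered and retry it later.
    """
    sock.sendall(b".\r\n")
    sock.settimeout(death_timer)  # start the death timer
    with sock.makefile("rb") as rfile:
        try:
            return read_reply(rfile)
        except socket.timeout:
            return None
```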

The trouble is that there is no way to judge how long the SMTP death timer
should be.  Some machines deliver mail fast, others not so fast (mine
is just plain slow).  No matter what value you set for the death timer,
you lose some of the time.  And the way you lose is that mail to one type of
host is always lousy.

The ultimate answer would be to fix SMTP, so that the server could still
respond with "OK, I'm still here" messages while it was delivering the
mail.  Given all the SMTP hosts out there, this is probably not going to
happen.  Ad hoc solutions include:

   1. Have the server respond before the message is delivered (bad, since
      messages can get dropped on the floor).

   2. Adjust the timeouts to try to accommodate every host you would
      reasonably connect to, i.e., every TCP implementation.  This is
      what we do now, and it doesn't work all the time.

   3. Find some random data for the message sender to periodically
      queue.  This would have the effect of taking the TCP connection
      out of its quiescent state, so that the TCP layer can detect
      a machine crash for you.  This works unless the problem is
      that the remote SMTP server is in a tight loop, with the remote
      TCP still healthy (that's a "software bug"-type situation that
      can be detected and fixed).

I favor [3].  Try this:

    When you send ".CRLF":
        set timer for how long you expect this to take (T)
        set timer for how long you are willing to hang (D >> T)
        set noops=0
        wait for input from server
    
    On TIMER T:
        send NOOP<CRLF> command to server
        noops = noops + 1
        set timer to T
        go back to waiting for input from server

    On INPUT:
        cancel timers T and D
        process success/fail message from SMTP SEND command
        while noops > 0 do
            read & discard one NOOP reply from server
            noops = noops - 1
            end

    On TIMER D:
        assume failure of message.

The idea is that sending NOOP commands makes the TCP layer
probe the underlying connection for you.  Thus, the ultimate
timer, D, can be VERY long, since it detects bugs in the remote
SMTP, not random events.  The annoyance is that you have to
ignore enough responses to match each noop you sent (I guess
the other annoyance is that it is a miserable hack that should
be shot at sunrise...).
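One way to turn the NOOP scheme above into running code, as a Python sketch. The function and helper names are hypothetical, a single select() loop stands in for the two timers (T is the select timeout, D an absolute deadline), and the caller is assumed to have already sent the message body. Because SMTP replies come back in order, the first reply read belongs to the message; one reply per NOOP is then discarded.

```python
import select
import socket
import time


def read_reply(rfile):
    """Read one SMTP reply (possibly multi-line) and return its code."""
    while True:
        line = rfile.readline()
        if not line:
            raise ConnectionError("server closed the connection")
        if line[3:4] != b"-":  # "250-..." continues, "250 ..." ends
            return int(line[:3])


def end_data_with_noops(sock, t, d):
    """Send "." then a NOOP after every t seconds of silence.

    Returns the server's reply code for the message, or None if the
    overall death timer d (d >> t) expires or the connection fails first.
    """
    deadline = time.monotonic() + d  # timer D
    noops = 0
    try:
        sock.sendall(b".\r\n")
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None  # On TIMER D: assume failure of message
            ready, _, _ = select.select([sock], [], [], min(t, remaining))
            if ready:
                break  # On INPUT
            if time.monotonic() < deadline:  # On TIMER T
                sock.sendall(b"NOOP\r\n")  # gives TCP data to retransmit
                noops += 1
        sock.settimeout(max(deadline - time.monotonic(), 0.001))
        with sock.makefile("rb") as rfile:
            code = read_reply(rfile)  # reply to the message itself
            try:
                for _ in range(noops):
                    read_reply(rfile)  # discard one reply per NOOP sent
            except OSError:
                pass  # outcome already known; connection just isn't reusable
            return code
    except OSError:
        return None  # TCP reported the connection dead
```

If the remote host crashes, the outstanding NOOPs give TCP unacknowledged data to retransmit, so a dead peer eventually shows up as a connection error rather than as silence.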

An obvious enhancement is to query the local TCP before sending
a NOOP -- it is not necessary to send anything unless the local
TCP is quiescent.  This is extremely useful in the situation
where the SMTP connection is dribbling along at 1200 baud somewhere
and the REAL problem is that the message hasn't been TRANSMITTED yet.
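The post does not say how to ask the local TCP whether it is quiescent. On Linux, one way is the TIOCOUTQ ioctl (the same value as SIOCOUTQ), which on a TCP socket reports how much data is still sitting in the socket's send queue. This is a Linux-specific assumption and the function names are hypothetical; other systems need a different call. A NOOP loop like the one above would then send NOOP only when should_send_noop() returns True.

```python
import fcntl
import socket
import struct
import termios


def send_queue_bytes(sock: socket.socket) -> int:
    """Bytes still in this TCP socket's send queue (Linux only)."""
    buf = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]


def should_send_noop(sock: socket.socket) -> bool:
    """Only poke the server when our own TCP has nothing left to send."""
    return send_queue_bytes(sock) == 0
```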

The timer T should be long enough to give the other machine a
running shot at delivering the message in that time (say 1-5
minutes).

- Geof

jordan@UCBARPA.BERKELEY.EDU.UUCP (02/27/87)

Kevin Crowston writes:

	I decided that the client SMTP would have to wait while the
	relay delivered the message.  Otherwise, the relay could
	acknowledge the message and then crash or discover that the
	destination mail server was unable to take the message.

Sendmail seems to handle this correctly, since "delivered" to that part
of the code means "placed in the queue" (i.e., wrote it to disk ... if
the machine then crashes, the daemon will pick up where it left off
since the queue file is still there) -- you can't acknowledge the
message as being sent before you have firm control of it.  That's what
lock-step is all about.  Once you have done that, if you find later
that you can't deliver it, it's up to the recipient SMTP process to
send it back to where it came from.  This can be handled
asynchronously.

/jordan

sy.Ken@CU20B.COLUMBIA.EDU.UUCP (02/27/87)

    > The server should NOT make the client wait while a message is
    > being delivered...

  I faced this issue when implementing our mail relay.  I decided that the
  client SMTP would have to wait while the relay delivered the message...

I think the furthest the acknowledgement process should go is essentially
"message received by this host and queued for delivery locally".  In many
cases there is so much processing involved in delivering to the final
destination mailbox that the sending system should NOT have to wait for
all of it.  I see the cases of local mailbox delivery and
mail forwarding as the same.  For example, host A wants to send to host C,
not on host A's network.  It must therefore forward through host B.  Should
host A have to wait while host B tries to forward the mail all the way
to host C?  This case is clearly unreasonable.  The local delivery
process can often be just as unreasonable for a variety of reasons, and
thus, the mail should be stuffed into some local delivery queue (which
would presumably be a fast process), and actual local delivery can then
happen asynchronously with the SMTP dialog.  If there is some fatal case
where the mail cannot actually be delivered after being queued on the
target system for local delivery, then the entire message can be returned
to the sender by the mailer.  This is how the TOPS-20 mailer works, and it
seems like a fairly airtight procedure in practice as well as in theory.
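The queue-then-deliver-asynchronously model, including returning undeliverable mail to the sender, might be sketched in Python as one pass of a background queue runner. This is not the TOPS-20 mailer; the deliver() callback, its "ok"/"temp"/"perm" results and the retry limit are hypothetical. Bounces are sent with an empty return-path, and a message that already has an empty return-path is never bounced, which keeps two mailers from returning error messages to each other forever.

```python
from dataclasses import dataclass


@dataclass
class QueuedMessage:
    return_path: str  # MAIL FROM address; "" stands for the null path <>
    recipients: list  # addresses still to be delivered
    body: str
    attempts: int = 0


def run_queue(queue, deliver, max_attempts=5):
    """One pass of a hypothetical background delivery process.

    deliver(rcpt, body) must return "ok", "temp" (try again later) or
    "perm" (will never work).  Returns (still_queued, bounces); the
    caller is expected to put the bounces back on the queue.
    """
    still_queued, bounces = [], []
    for msg in queue:
        msg.attempts += 1
        retry, failed = [], []
        for rcpt in msg.recipients:
            status = deliver(rcpt, msg.body)
            if status == "ok":
                continue
            if status == "temp" and msg.attempts < max_attempts:
                retry.append(rcpt)
            else:
                failed.append(rcpt)  # permanent failure, or out of retries
        if failed and msg.return_path:  # never bounce a bounce
            bounces.append(QueuedMessage(
                return_path="",
                recipients=[msg.return_path],
                body="Could not deliver to: " + ", ".join(failed)
                + "\n\n" + msg.body,
            ))
        if retry:
            msg.recipients = retry
            still_queued.append(msg)
    return still_queued, bounces
```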

/Ken
-------

mrose@nrtc-gremlin.arpa.UUCP (02/27/87)

Hack.  Hack.  Hack.  Two things:

1.  As Jordan pointed out: as soon as the SMTP server queues the message for
    delivery (not actually delivers it), the server should send the success
    acknowledgement to the client.  Even if your host is single-threaded,
    the server can always deliver the mail *after* the SMTP connection is
    closed.

2.  Why hack SMTP?  I can find similar faults with interactions in FTP.
    And in just about any command/response application that you can run
    on top of TCP.  The correct solution is to add an *option* to TCP
    saying to use keep-alives.  Things like SMTP could use it, things like
    telnet (where a failure is obvious to any interactive user) don't have
    to use it.  With this solution, you only have to make a very small
    change to the way an application opens the network, instead of complicating
    the peer-to-peer protocol used by the application.  Keep it simple guys!
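A per-connection keep-alive option of this kind is what the sockets API offers as SO_KEEPALIVE (keep-alives later appeared in RFC 1122 as an optional, off-by-default TCP feature). A Python sketch follows; the idle/interval/count values are illustrative, and the TCP_KEEP* tuning options are platform-specific, so the code sets them only where Python exposes them.

```python
import socket


def enable_keepalive(sock, idle=300, interval=60, count=5):
    """Ask TCP itself to probe an idle connection (values are illustrative).

    idle:     seconds of silence before the first probe
    interval: seconds between unanswered probes
    count:    unanswered probes before TCP declares the peer dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs below are not portable; set them only if present.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

An SMTP client would call enable_keepalive() right after connecting; an interactive application such as telnet would simply not call it.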

/mtr

MRC%PANDA@SUMEX-AIM.Stanford.EDU.UUCP (02/27/87)

A much better approach is for the SMTP server to queue the message on
its local disk and acknowledge immediately.  The delivery can be done
by an asynchronous process.  Unless your system is in real bad shape,
it shouldn't take any time at all to write a file on the disk.

It is much better to cure the disease (SMTP servers taking an
indeterminate amount of time to respond) than it is to mask the
symptoms.
-------

dms@HERMES.AI.MIT.EDU.UUCP (02/27/87)

   Date: Thu, 26 Feb 87 16:47:02 PST
   From: jordan@ucbarpa.berkeley.edu (Jordan Hayes)
   Organization: Experimental Computer Facility (XCF), UC Berkeley

   Kevin Crowston writes:

	   I decided that the client SMTP would have to wait while the
	   relay delivered the message.  Otherwise, the relay could
	   acknowledge the message and then crash or discover that the
	   destination mail server was unable to take the message.

   Sendmail seems to handle this correctly, since "delivered" to that part
   of the code means "placed in the queue" (i.e., wrote it to disk ... if
   the machine then crashes, the daemon will pick up where it left off
   since the queue file is still there) -- you can't acknowledge the
   message as being sent before you have firm control of it.  That's what
   lock-step is all about.  Once you have done that, if you find later
   that you can't deliver it, it's up to the recipient SMTP process to
   send it back to where it came from.  This can be handled
   asynchronously.

   /jordan

Actually, sendmail doesn't handle this completely correctly.  Before
sendmail queues up a message and gives the acknowledgment back to
the sender, it attempts to expand every address in a mailing list.
This expansion can take a long time, since it means a call to the
resolver to qualify host names.  So messages sent to large mailing
lists take a long time to get queued up.  What sendmail should be doing
is writing out a very simple queue file with the un-expanded
recipients.  The background delivery process should then do the
expansion the first time it comes across an un-expanded address.
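This proposal, sketched in Python: the SMTP side records the RCPT TO addresses untouched and can acknowledge at once, and the background process expands them the first time it sees the entry. The expand() callback is hypothetical (it stands in for alias and mailing-list lookup plus resolver calls), expansion here is only one level deep, and a real queue would write the entry back to disk with its expanded flag set.

```python
from dataclasses import dataclass


@dataclass
class QueueEntry:
    recipients: list  # addresses exactly as given in RCPT TO
    expanded: bool = False  # has the background process expanded them yet?


def accept_message(rcpt_to):
    """SMTP side: record recipients as-is, with no list or DNS expansion."""
    return QueueEntry(recipients=list(rcpt_to))


def expand_if_needed(entry, expand):
    """Background side: expand once, the first time the entry is seen.

    expand(addr) -> list of addresses (hypothetical callback).
    """
    if not entry.expanded:
        seen, out = set(), []
        for addr in entry.recipients:
            for target in expand(addr):
                if target not in seen:  # drop duplicates across lists
                    seen.add(target)
                    out.append(target)
        entry.recipients = out
        entry.expanded = True  # a real queue would persist this flag
    return entry.recipients
```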