[net.news] increasing mail/netnews efficiency

lauren (03/12/83)

I strongly suspect that the best place to try to improve uucp "efficiency"
is in the "higher-level" file handling routines -- I recommend
AGAINST trying to improve or replace the packet driver.  Several points:

1) The packet driver, with its non-trivial buffering techniques, manages
   to allow fairly high rates of data transfer, even on multiple lines,
   without dragging most systems into the ground.  Most other protocols
   with which I've experimented have turned out to have substantially
   lower throughput and presented a greater load (sometimes much greater)
   on the system.

   The packet driver is not the problem -- it's a nice piece of work
   and performs very nicely.

2) Occasionally we hear complaints about uucp not really using the full
   duplex capabilities of the channel.  Frankly, I doubt very much
   if true "full duplex" file transfers would really improve the
   situation.  Most heavy uucp traffic tends to flow mostly in one direction
   (especially netnews traffic!) -- the number of cases where both
   sites have even approximately equal traffic *at a given time* is
   pretty small.  The possibility of load disruptions and other
   factors, plus the "unidirectional" nature of most traffic, leads me
   to suspect that an effort into more full duplex usage of uucp would
   also NOT be the best way to proceed.

3) The queue.  The place where we see most of the "clogging" clearly
   seems to be the uucp spool directory itself.  The mechanism of
   having three files for each message to be mailed provides good
   generality (i.e. mail is but a special case of a very general
   inter-machine mechanism) but is costly in time and space.  However,
   we don't really want to toss out the generality either!  My suggestion
   would be to define a new "channel" for mail/netnews which still uses
   the conventional packet driver.  At first, since there would be
   some compatibility issues, I would propose that the new channel
   only be used for netnews between cooperating sites.  As more sites
   started running the appropriate versions of uucico/uuxqt, the channel
   could be used for mail as well.  The old channel would still exist
   as a fallback in all cases.

   The easiest way to set this up would be to define the new channel
   as an "alternative" to the normal "g" channel that we now use.
   Perhaps calling this a new channel is a bit deceptive.  What I really
   want to do is establish a new mail/netnews delivery scheme that uses
   only one or two files (instead of the current three) for delivery 
   of a single message.  Sites would indicate their ability to handle
   the procedure by negotiating to use the "new channel" (perhaps "m"?)
   instead of "g".  Channel "m" would still use the ordinary packet
   driver and still transfer files in the conventional manner -- all the
   "m" indicates is that the sites have agreed to use the new mail
   delivery mechanism.
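
   A minimal sketch of the negotiation idea, assuming each site can be
   described by the set of channel letters it handles.  The function
   name and the preference string are hypothetical; this does not
   reproduce the actual uucico handshake, only the "use m if both
   sides agree, otherwise fall back to g" rule described above.

```python
def choose_channel(offered, supported, preference="mg"):
    """Pick the first channel in `preference` that both sides handle.

    offered   : channel letters the remote site says it accepts
    supported : channel letters this site can speak
    Returns a single letter, or None if there is no common channel.
    """
    for letter in preference:
        if letter in offered and letter in supported:
            return letter
    return None


if __name__ == "__main__":
    # Both sites run the new code, so the new delivery scheme is used.
    print(choose_channel("mg", "mg"))
    # The remote site only knows the old channel, so "g" is the fallback.
    print(choose_channel("g", "mg"))
```

   Listing "m" ahead of "g" in the preference string is what keeps the
   old channel available as a fallback in all cases.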
 
   There are a number of ways in which the new delivery mechanism could
   function.  One of the most obvious is to include the addressing
   information for the message in the body of the data file *of* the message.
   This information would be stripped from the message before final 
   delivery.  For example, something like:

   *TO-USER: ihnp4!vortex!lauren
   From seymour DateTime remote from foosite
   ...


   In this case, the "*TO-USER" line (or whatever we'd call it)
   contains the addressing information which would normally be 
   contained in the "X.foo" file for mail delivery.
   
   As I suggested above, netnews would probably be implemented first:

   *TO-PROGRAM: rnews
   From...

   If a site accepted a uucico connection using the "m" protocol, it 
   would be saying that its uuxqt was ready and willing to handle
   messages in this sort of format, as an alternative to the "g" format.
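
   As a rough illustration of what the receiving side would have to do,
   here is a sketch that splits the addressing line from the rest of
   the data file.  The header names are the placeholder ones used in
   the examples above ("or whatever we'd call it"); the function name
   and the return format are assumptions made for the sketch.

```python
HEADER_KEYS = ("*TO-USER", "*TO-PROGRAM")  # placeholder names from the examples


def split_delivery_header(data):
    """Return (kind, target, body) for a data file in the proposed format.

    kind is "user" or "program"; body is the message with the addressing
    line stripped, ready for final delivery.  Raises ValueError if the
    first line is not a recognised addressing line.
    """
    first, _, body = data.partition("\n")
    key, colon, value = first.partition(":")
    if key not in HEADER_KEYS or not colon or not value.strip():
        raise ValueError("missing delivery header: %r" % first)
    kind = "user" if key == "*TO-USER" else "program"
    return kind, value.strip(), body


if __name__ == "__main__":
    sample = ("*TO-USER: ihnp4!vortex!lauren\n"
              "From seymour DateTime remote from foosite\n")
    print(split_delivery_header(sample))
```

   A file without a recognised first line is rejected rather than
   guessed at, so old-format D. files cannot be misdelivered by mistake.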

   This new technique involves the use of two files (the C. and D. files)
   in the sending system's spool dir, and the transfer of one file (D.) to
   the remote system.  With considerably more work, it would be possible
   to reduce the number of files in the sending spool dir to one, but I'm not
   sure that this would really be worth the effort involved.

   To avoid confusion, it might well be advisable to store the mail
   data file, under the new format, as something other than a typical
   D. file -- perhaps M. or something similar would be suitable.

   I believe that the overall changes required to implement such a scheme
   for netnews/mail, both in uucico and uuxqt, are actually quite small.
   I also suspect that a substantial increase in overall uucp efficiency
   might result, with minimal compatibility problems.  We would still
   be using an efficient transfer mechanism; only the upper-level file
   handling/delivery mechanism would change.  For netnews, message batching
   and other techniques could still be used to gain even more benefits.
   
   Comments?

--Lauren--

smb (03/15/83)

Lauren raises several good points.  However, I don't completely agree
with his proposed solutions.

First, I agree that the packet driver needs to be kept for dial-up use.
The MMDF packet mechanism, though apparently simpler, gets far worse
throughput and (at least in some versions) is rather susceptible to
catatonia.  Uucp's driver gets fairly good throughput on 1200 baud lines
(though it starts falling off badly at higher speeds), and it seems to
be quite reliable.

The issue is different, though, if you're using an underlying
transmission path that is itself flow-controlled and error-corrected,
such as a TCP/IP channel.  Performance improvements of at least a factor
of 10 can be obtained by replacing the packet driver with some other
protocol, as outlined by Lauren.

Which brings up my second point -- the alternate protocol mechanism
isn't geared towards higher-level interchanges like "here's some mail";
it's intended for lower-level functions.  The primitives a new protocol
must provide are "open", "close", "send/receive message", and
"send/receive file".  Decisions like what file should be sent, and what
the contents of it mean, are handled at a higher level, and are not
as easily negotiated.
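
To make the layering concrete, here is a sketch of the primitives listed
above as an abstract interface, with a toy in-memory implementation.
The class and method names are hypothetical and are not taken from the
uucp sources; the point is only that nothing at this level knows what
the files mean.

```python
from abc import ABC, abstractmethod
from collections import deque


class LineProtocol(ABC):
    """The primitives a transfer protocol provides (illustrative names)."""

    @abstractmethod
    def open(self): ...

    @abstractmethod
    def close(self): ...

    @abstractmethod
    def send_message(self, text): ...

    @abstractmethod
    def receive_message(self): ...

    @abstractmethod
    def send_file(self, data): ...

    @abstractmethod
    def receive_file(self): ...


class Loopback(LineProtocol):
    """Toy implementation: whatever is sent is queued and read back."""

    def __init__(self):
        self.is_open = False
        self._queue = deque()

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False

    def _put(self, kind, payload):
        if not self.is_open:
            raise RuntimeError("channel not open")
        self._queue.append((kind, payload))

    def _get(self, kind):
        if not self.is_open:
            raise RuntimeError("channel not open")
        got_kind, payload = self._queue.popleft()
        if got_kind != kind:
            raise RuntimeError("expected %s, got %s" % (kind, got_kind))
        return payload

    def send_message(self, text):
        self._put("message", text)

    def receive_message(self):
        return self._get("message")

    def send_file(self, data):
        self._put("file", data)

    def receive_file(self):
        return self._get("file")
```

Deciding that a given file is mail, and what to do with it, would sit
in the code that calls an interface like this, not in the interface.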

It is also unclear that changing file formats will really help netnews,
especially for sites that feed more than one other site.  The 2.10 code
can, with code changes, make use of the '-c' option to uux; this causes
the text of the article to be transmitted directly from the news spool
area, and hence avoids the creation of the second D. file in the
outbound uucp spool area.  This change, plus Truscott's subdirectory mod
to uucp (separate subdirectories for C., D., and X. files) should yield
a large performance improvement.  (To be sure, they're not the whole
problem; a lot of the overhead with uucp seems to be the per-file
handshaking that goes on.  Much of this is directory-search time, but I
don't have a feel for just how much.)
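
The subdirectory idea can be sketched as a mapping from a spool file's
prefix to the directory it lives in, so that each directory search
scans fewer entries.  The subdirectory names and the spool path below
are assumptions chosen for illustration, not the layout of the actual
mod.

```python
import posixpath

# Assumed subdirectory names, one per file type mentioned in the text.
SUBDIRS = {"C.": "C.", "D.": "D.", "X.": "X."}


def spool_path(spool_dir, filename):
    """Return where `filename` would live under a split spool directory.

    Files whose two-character prefix is not in SUBDIRS stay in the
    top-level spool directory.
    """
    subdir = SUBDIRS.get(filename[:2])
    if subdir is None:
        return posixpath.join(spool_dir, filename)
    return posixpath.join(spool_dir, subdir, filename)


if __name__ == "__main__":
    # Example spool directory; the job name is made up.
    print(spool_path("/usr/spool/uucp", "C.vortexA1234"))
```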

The real problem with mail transfer is that uucp is *too* general; it
can't do the sorts of special-purpose mail handling that one might like.
MMDF and SMTP (the ARPAnet's "simple mail transfer protocol" -- not to
be confused with the message format standard), on the other hand, allow
a site to validate each address individually before sending the body of
the message.  They also make it much easier to deal with temporary
resource problems, such as no space or no i-nodes -- by the time uuxqt
learns of such a problem, it's too late to reject the message with a
request for retry, and it may not even be possible to mail it back.
With SMTP, the sending site *knows* whether or not the next relay
received it correctly.  (Uucp also has problems with the stupidity of
uuclean; if my mail can't be delivered, I would really like the letter
back, rather than being told the uucp filenames....)

Where does this leave us?  One idea is to change the C. file mechanism
locally to some more efficient scheme.  The outbound X. file could
probably be generated dynamically, especially for the simple cases,
i.e., mail and news.  The same might be done for inbound X. files,
though there are timing considerations to worry about -- it isn't
feasible to attempt delivery immediately upon receipt of an X. file,
especially if the delivery attempt involves expensive operations like
alias-list expansion.  If you want to experiment with alternate
protocols, use uucp (rather than uux) to create Q. files (or some such)
in the receiving site's spool directory, and have some variant of uuxqt
interpret them.  To be sure, that can't be negotiated at transfer time,
but it can easily be controlled for news hops via the 'sys' file.

Finally, let me make one appeal to anyone implementing a new queuing
mechanism:  implement a "requeue" counter.  That is, any time a
transmission fails and is requeued, a counter should be bumped.  If it
reaches a certain limit, that particular job should be abandoned;
otherwise, one failing job can wedge the whole queuing system.  A good
example is an attempt to transmit a gigantic file via uucp.  Even if
it's received properly -- by no means certain -- if the receiving site
has to copy it to another file system from the TM. file, the sending
site will time out waiting for a response.  And the next time the two
systems connect, the file will be sent again....
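
A minimal sketch of the requeue counter, assuming a simple in-memory
queue.  The `Job` class, the `transmit` callback, and the limit of 5
are all hypothetical; a real implementation would have to keep the
counter somewhere that survives between connections.

```python
from collections import deque

MAX_REQUEUES = 5  # arbitrary limit chosen for illustration


class Job:
    def __init__(self, name):
        self.name = name
        self.requeues = 0  # bumped every time a transmission fails


def run_queue(jobs, transmit, max_requeues=MAX_REQUEUES):
    """Try every job; requeue failures, abandon a job at the limit.

    `transmit(job)` returns True on success.  Returns two lists of job
    names: (delivered, abandoned).
    """
    queue = deque(jobs)
    delivered, abandoned = [], []
    while queue:
        job = queue.popleft()
        if transmit(job):
            delivered.append(job.name)
            continue
        job.requeues += 1
        if job.requeues >= max_requeues:
            abandoned.append(job.name)  # give up so the queue can drain
        else:
            queue.append(job)           # try again after the other jobs
    return delivered, abandoned


if __name__ == "__main__":
    # A job that can never succeed no longer blocks the others.
    always_fails_for_huge = lambda job: job.name != "huge"
    print(run_queue([Job("huge"), Job("mail1"), Job("news1")],
                    always_fails_for_huge, max_requeues=3))
```

Without the limit, the loop above would never terminate for the
gigantic-file case, which is the wedge described in the text.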


		--Steve