[net.news] a harmless query...

paul@greipa.UUCP (Paul A. Vixie) (05/03/85)

Well, then.  I've just read 131 articles from net.news and I feel fairly
sure that no one has said anything about this in the past few weeks... (I
apologize in advance, of course, if this is not the case [of course]).

My understanding of news is that every system sends every article it
receives to every machine it feeds.  Lots of "every"'s in that...

If this understanding is correct, a system can get the same article from
every machine that feeds it.  This is not a major problem, since it will
go straight to the bit-bucket when its article-ID is found to match that
of an existing article.  However, the article must actually be sent to
a machine before it can get ignored.
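The bit-bucket step Paul describes can be sketched roughly like this (an illustrative sketch, not the actual news code; `seen_ids` stands in for the history file, and `store` is a placeholder):

```python
# Illustrative sketch: a receiving system discards any article whose
# article-ID (Message-ID) it has already recorded in its history.

seen_ids = set()          # stands in for the news history file

def receive_article(article_id, body):
    """Accept an article only if its ID has not been seen before."""
    if article_id in seen_ids:
        return False      # duplicate: straight to the bit-bucket
    seen_ids.add(article_id)
    store(body)
    return True

def store(body):
    pass                  # actually filing the article is out of scope here
```

Note the cost Paul is pointing at: the `return False` only happens after the whole article has already crossed the phone line.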

I'm not sure what the statistics would be on this, but as an example,
wouldn't it be silly if 45 minutes of each 90 minute nightly long-distance
UUCP connection was spent transferring articles which are trashed
immediately?

Folks are starting to talk about 2400-baud dialup modems for the backbone
sites.  If my above-stated worries are founded, I think there is a less-
expensive way to cut down on phone charges...

The technical problems in not sending articles already present in the
receiving system are beyond the scope of this simple inquiry, not to
mention my limited understanding of UUCP.  If this is a well-known
subject and this entire article is useless drivel to you, please inform
me personally - I'll apologize to the net.  If not, please, let's have
a rousing technical chorus on it before 2.10.3 is released.

	Paul Vixie
	{twg decwrl}!greipa!paul

chuqui@nsc.UUCP (Chuq Von Rospach) (05/04/85)

In article <185@greipa.UUCP> paul@greipa.UUCP (Paul A. Vixie) writes:
>My understanding of news is that every system sends every article it
>receives to every machine it feeds.  Lots of "every"'s in that...

Not quite true. Any site that is already in the 'Path' header line will not
see the message again because inews will recognize that as a duplication
and not ship it. Also, 'every' system doesn't send 'every' article to
'every' machine it feeds (these definitions may vary, void where prohibited
by law). For example, for 'net.all' nsc gets almost all of its news from a
single site, hplabs. We ship this data to three sites, cadtec, daisy, and
voder. In our case we almost never see a duplicate message because we only
get a single feed. Sites downstream of us also don't see duplicates because
they only get their news from us. 
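The Path check Chuq mentions amounts to something like the following (a rough sketch under the assumption that the Path: header is a `!`-separated list of site names; not the inews source itself):

```python
# Illustrative sketch of Path-based feed suppression: a site already
# named in the article's Path: header is skipped, since it has
# necessarily handled the article once before.

def sites_to_feed(path_header, feeds):
    """Return the subset of feed sites not already in the Path."""
    already_seen = set(path_header.split("!"))
    return [site for site in feeds if site not in already_seen]

# e.g. an article whose Path is "hplabs!nsc!chuqui" is not shipped
# back to hplabs, leaving only cadtec, daisy, and voder as targets.
```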

When you have multiple sites feeding in, you will get duplicates. Hplabs is
an example of this, because they get news from (I believe) hao, tektronix,
and sdcrdcf. Major sites such as this are the exception instead of the
rule, though, and the duplication tends to be acceptable because it allows
the news distribution to be sped up by cutting down on the paths it has to
travel.

It'd be nice to be able to recognize duplicates before shipping them, but
the first try at that (inews has an IHAVE/SENDME protocol that attempts to
deal with this...) doesn't work because of the turnaround delays in uucp
lines. What happens is that, in many cases, between the time you ship out
the SENDME and the time you get the messages, someone else has shipped them
as well and the duplicates show up anyway. You end up slowing news down
significantly (three phone calls instead of one for a set of messages),
adding a lot of volume to your uucp link (three sets of messages -- an
IHAVE, a SENDME, and the message itself), and it doesn't really help. 
-- 
:From the offices of Pagans for Cthulhu:          Chuq Von Rospach
{cbosgd,fortune,hplabs,ihnp4,seismo}!nsc!chuqui   nsc!chuqui@decwrl.ARPA

Who shall forgive the unrepentant?

mark@cbosgd.UUCP (Mark Horton) (05/05/85)

This is essentially correct.  Most hosts on the net are on the
fringes somewhere and get all their news from one place.  So
such duplications don't affect them.  There are many hosts, however,
that exchange news with 2 or 3 (or more) hosts, and they will get
(and throw away) a good number of messages they have already seen.

The reason for this duplication is for reliability.  A lot of things
can go wrong with a news link, and if one neighbor is your only source
of news, you may miss 10% (or more, depending on how robust your
neighbor and your link to this neighbor are) of the news.  There are
also speed arguments - you get news faster if you keep the first of
several copies that come in.  There is no real potential for saving
money by cutting back on duplicate feeds; hosts that are this concerned
about money probably already have only one feed.

There is potential for saving money here by only sending news that
has not already arrived on the machine in question.  A few simple
efforts along these lines are already in the code (for example, it
won't send news to a host in the Path: line, since it knows that host
has already seen it, or to the host the article originated on.)

An interactive protocol for only sending messages that have not arrived
yet is very hard, given the dialup/batch nature of UUCP.  There is no
way to get interactive IPC between two news processes over UUCP.  You
can get interactive IPC over TCP/IP links, but these are rarely over
dialup lines, so there's no incentive to write the code.

There is a protocol in news called "ihave/sendme" in which only a
Message-ID is sent over the wire ("ihave xxx") instead of the whole
message; then the other system sends back the Message-ID's of messages
it doesn't already have ("sendme xxx"), and then the first system sends
it.  There isn't any savings, because the headers are often bigger than
the message body, and there is a considerable time delay for the 3 way
handshake (especially if the connection is polled.)  It seems reasonable
that by batching these (sending one message with 50 or more message ID's
in it instead of separate messages) there would be significant cuts in
phone bills (but the delay would still be there), and the method for
doing this is documented, but I haven't seen anybody actually implement it.
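The batched exchange Mark describes could look roughly like this (an illustrative sketch only; the function names and message layout are invented here, not taken from the documented method):

```python
# Rough sketch of batched ihave/sendme: one "ihave" control message
# advertises many Message-IDs at once, and the reply requests only
# the IDs the receiving site does not already have.

def make_ihave(message_ids):
    """Sender: advertise a batch of Message-IDs in one control message."""
    return "ihave " + " ".join(message_ids)

def make_sendme(ihave_msg, have_already):
    """Receiver: ask only for the IDs it does not already have."""
    offered = ihave_msg.split()[1:]
    wanted = [mid for mid in offered if mid not in have_already]
    return "sendme " + " ".join(wanted)
```

With, say, fifty IDs per batch, the phone-bill savings come from collapsing fifty IHAVE messages into one, though the three-way handshake delay remains, as Mark notes.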

	Mark Horton