[news.software.nntp] Survey: C News batching vs. nntplink

coolidge@brutus.cs.uiuc.edu (John Coolidge) (11/20/89)

There's an obvious problem that many people have remarked upon:
the conflict between the C News batching code in nntpd and the
continuous transmissions of nntplink. Since the batching code only
writes articles out every so often, lots of articles are received
while nntplink is running but aren't passed on to relaynews until
later, while nntpxmit-style transfers send the article later but
get it processed sooner. The end result is slower article
propagation and lots more dups.

I'm wondering what the various C News sites running NNTP have done
to get around this problem. My solution has been to force nntpd to
write each article as a separate batch and change the article-naming
code to avoid collisions. There are some other solutions being worked
on, I know (in fact, I'm working on one of them), and I'm interested
in seeing what people are doing.
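
As a rough illustration of the one-article-per-batch idea, here is a
minimal sketch in Python (not the actual nntpd change, which is not
shown in this thread). The names batch_name and write_article_as_batch,
the "nntp." prefix, and the time/pid/counter naming scheme are all
assumptions made for the example; the only point is that two articles
arriving in the same second from the same process still get distinct
file names.

```python
import itertools
import os
import time

# Per-process counter; combined with the time and pid it keeps names
# distinct even when several articles arrive within the same second.
_sequence = itertools.count()


def batch_name(now=None, pid=None):
    """Build a (hypothetical) spool file name from time, pid and a counter."""
    now = time.time() if now is None else now
    pid = os.getpid() if pid is None else pid
    return "nntp.%d.%d.%d" % (int(now), pid, next(_sequence))


def write_article_as_batch(spool_dir, article_text):
    """Write one article as its own batch file and return the final path.

    The article is written under a temporary name first and renamed
    afterwards, so a reader scanning spool_dir never sees a half-written
    batch under its final name.
    """
    name = batch_name()
    tmp_path = os.path.join(spool_dir, "tmp." + name)
    final_path = os.path.join(spool_dir, name)
    with open(tmp_path, "w") as handle:
        handle.write(article_text)
    os.rename(tmp_path, final_path)
    return final_path
```

Calling write_article_as_batch once per received article gives one
small batch per article, at the cost of many more files in the spool
directory than time-based batching would create.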

Another side of the coin is this: if you're going to run nntplink
to someone, make sure they're not doing C News-style batching. Otherwise
you're not helping either yourself or your connection any. I've had
this problem with a few connections, since I'm now switched over to
using nntplink for just about all my connections (with varied sleep
times in nntplink --- this, perhaps, should be a flag). As a
workaround, I've implemented a flag to nntplink that says "Always
break connection". I _could_ just run nntpxmit, but I'd rather keep
things consistently one way or the other.

--John

--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.
New NNTP connections always available! Send mail if you're interested.

lamy@ai.utoronto.ca (Jean-Francois Lamy) (11/20/89)

Obviously, running nntplink to leaf nodes is one case where batching in nntpd
does not get in the way.  Also, by picking nntp feeds that don't talk to each
other you can reduce the likelihood of duplicates (i.e. there is a point of
diminishing returns in any flooding algorithm: the point where you start
getting... flooded).  Even if no duplicate article were ever transmitted, NNTP
feed mania can still lead to a case where all you do all day is say no to
IHAVE requests...
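
To make the "saying no to IHAVE all day" point concrete, here is a
small sketch in Python. It assumes the history of known articles is a
plain in-memory set of Message-IDs (a real server keeps this on disk),
and the function names are made up for the example. The reply codes
335 (send the article) and 435 (article not wanted) are the standard
NNTP responses to IHAVE.

```python
def ihave_response(message_id, history):
    """Return the NNTP reply code for an IHAVE offer of message_id."""
    if message_id in history:
        return 435  # article not wanted - do not send it
    return 335      # send article to be transferred


def count_refusals(offers, history):
    """Count refused offers; accepted Message-IDs are added to history."""
    refused = 0
    for message_id in offers:
        if ihave_response(message_id, history) == 435:
            refused += 1
        else:
            history.add(message_id)
    return refused
```

With two feeds that carry the same articles, every offer from the
second feed is refused: no duplicate article is transferred, but each
one still costs an IHAVE exchange.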

I'd say: pick your feeds carefully, and weed out the less useful ones,
independently of any technical fix you may come up with for inews/rnews/relaynews.

Jean-Francois Lamy               lamy@ai.utoronto.ca, uunet!ai.utoronto.ca!lamy
AI Group, Department of Computer Science, University of Toronto, Canada M5S 1A4

coolidge@brutus.cs.uiuc.edu (John Coolidge) (11/20/89)

wesommer@athena.mit.edu (William Sommerfeld) writes:
>Funny you should notice.  As it turns out, nntplink doesn't have to
>change; only nntpd need change.  Patches aren't available yet (and
>might not be; I'm really busy; don't even ask for them), but the
>changes are simple enough to describe:

Yup, this is my opinion too. Really, there's nothing that nntplink
_could_ do differently to fix the problem (except close the connection,
but then what's the point :-) ).

>- I made nntpd aware of the NEWSCTL/LOCKinput lock file.  If relaynews
>is running, this lock file exists.  I rearranged the code in batch.c
>to queue the batch into NEWSARTS/in.coming *first*, and only fork/exec
>newsrun if relaynews isn't running.

I dispensed with nntpd forking off newsrun a long, long time ago.
It turned out to be moderately costly, and (much worse) crashed
our machine on a regular basis (something about having big things
forking off from inetd-invoked code doesn't make SunOS 4.0.3 very
happy; portmap died on a regular basis...). Of course, I've got
something to take the place of newsrun, and most people don't
(if I had the time to get things stable, it would help...).

>If articles are flowing in a continuous stream (less than a
>five-second delay between articles), they get batched using the
>existing rules (five minutes or 300KB, whichever comes first).

This is what I'm trying to avoid, however. My optimal situation
is to have each article received back on the outgoing wire with
no delay at all. That's impossible, but sub-30 seconds is a
very reasonable approximation of the goal. This requires things
to happen very quickly, which sort of blows batching away. It's
good, though, for people (most of them, I suspect) who consider
rapid propagation a secondary goal.

--John


wesommer@athena.mit.edu (William Sommerfeld) (11/20/89)

In article <1989Nov20.002159.26404@brutus.cs.uiuc.edu> coolidge@brutus.cs.uiuc.edu (John Coolidge) writes:

   There's an obvious problem that many people have remarked upon:
   the conflict between the C News batching code in nntpd and the
   continuous transmissions of nntplink. Since the batching code only
   writes articles out every so often, lots of articles are received
   while nntplink is running but aren't passed on to relaynews until
   later, while nntpxmit-style transfers send the article later but
   get it processed sooner. The end result is slower article
   propagation and lots more dups.

Funny you should notice.  As it turns out, nntplink doesn't have to
change; only nntpd need change.  Patches aren't available yet (and
might not be; I'm really busy; don't even ask for them), but the
changes are simple enough to describe:

- I made nntpd aware of the NEWSCTL/LOCKinput lock file.  If relaynews
is running, this lock file exists.  I rearranged the code in batch.c
to queue the batch into NEWSARTS/in.coming *first*, and only fork/exec
newsrun if relaynews isn't running.

- I rearranged the loop in serve.c to make the alarm timeout and
handler come from a global variable instead of a compiled-in constant.
At the top of the loop, the timeout and alarm handler variables are
reset to the default values.

- The function which implements the ihave command sets the timeout to
five seconds, and the alarm handler to a function which, if
NEWSCTL/LOCKinput doesn't exist, terminates the batch.
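
The first change above (queue first, then start newsrun only when
relaynews isn't already running) can be sketched roughly as follows.
This is a Python illustration of the ordering only, not the batch.c
code: queue_batch and its parameters are names invented for the
example, and start_newsrun is a caller-supplied function standing in
for the fork/exec of newsrun.

```python
import os


def queue_batch(batch_path, incoming_dir, lock_path, start_newsrun):
    """Queue a finished batch, then start newsrun only if the lock is absent.

    Returns True if start_newsrun was called, False if the lock file
    existed and the batch was simply left queued for the running
    relaynews to pick up.
    """
    # Step 1: always queue the batch first.
    dest = os.path.join(incoming_dir, os.path.basename(batch_path))
    os.rename(batch_path, dest)

    # Step 2: the lock file exists while relaynews is running, so only
    # kick off processing when it is missing.
    if not os.path.exists(lock_path):
        start_newsrun()
        return True
    return False
```

Checking for the lock file and then starting newsrun is not atomic, so
two nntpd processes could both decide to start it; this sketch assumes
newsrun itself copes with that.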

The effect is that: if articles are coming in one at a time and the
machine isn't backlogged, they get processed one at a time.

If articles are flowing in a continuous stream (less than a
five-second delay between articles), they get batched using the
existing rules (five minutes or 300KB, whichever comes first).

If the machine is backlogged (relaynews is running), the articles get
processed in batches.
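
The three cases above boil down to one decision: when does nntpd end
the current batch? Here is a small Python sketch of that rule as
described. The function and constant names are invented for the
example, and 300KB is taken to mean 300 * 1024 bytes, which is an
assumption.

```python
IDLE_TIMEOUT_SECONDS = 5          # gap after an ihave before the alarm fires
MAX_BATCH_AGE_SECONDS = 5 * 60    # existing rule: five minutes
MAX_BATCH_BYTES = 300 * 1024      # existing rule: 300KB (assumed 1024-byte KB)


def should_end_batch(idle_seconds, batch_age_seconds, batch_bytes,
                     relaynews_running):
    """Decide whether the current batch should be terminated now."""
    # The existing limits always apply, backlog or not.
    if batch_bytes >= MAX_BATCH_BYTES:
        return True
    if batch_age_seconds >= MAX_BATCH_AGE_SECONDS:
        return True
    # The idle alarm only ends the batch when the lock file is absent,
    # i.e. relaynews is not running and the machine is not backlogged.
    if idle_seconds >= IDLE_TIMEOUT_SECONDS and not relaynews_running:
        return True
    return False
```

For instance, a lone article followed by five idle seconds ends the
batch on an unloaded machine, while the same gap is ignored when
relaynews is already busy, so articles pile up into a larger batch.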

We've only been running this way for a couple of days now on
bloom-beacon.mit.edu and snorkelwacker.mit.edu, and it *seems* to be
working well, but it still hasn't been exposed to a full volume
during-the-week feed, so I don't know if it will break down.

Given this kind of code in nntpd, it would make sense for nntplink to
*not* close the connection after every 20 articles... given average
article sizes, every 100 articles would be more like it; that way, if
the machine is backed up, you get large batches which allow C news to
run at full blast.
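
The arithmetic behind the "20 versus 100" suggestion can be sketched
like this in Python. The average article size is not stated in this
thread; the 3KB figure below is purely an illustrative assumption,
chosen because it makes 100 articles come out at the 300KB batch
limit. The function names are likewise made up.

```python
BATCH_LIMIT_BYTES = 300 * 1024    # the 300KB batch limit (assumed 1024-byte KB)
ASSUMED_AVG_ARTICLE_BYTES = 3 * 1024  # illustrative assumption, not measured


def bytes_per_connection(articles, avg_article_bytes):
    """Approximate bytes sent before nntplink closes the connection."""
    return articles * avg_article_bytes


def articles_to_fill_batch(limit_bytes, avg_article_bytes):
    """Smallest number of articles whose total size reaches limit_bytes."""
    # Ceiling division without floating point.
    return (limit_bytes + avg_article_bytes - 1) // avg_article_bytes
```

Under that assumption, closing the connection every 20 articles cuts
the batch off at about a fifth of the size limit, while 100 articles
per connection lets a backed-up machine fill a whole batch.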

We're running C news/NNTP on slow machines with slow disks, and it
seems to be keeping up; B news was running at the edge (bloom-beacon's
load was continuously over 10 with B news; these days, it seems to be
hovering around 1-2..).

The five second delay seems fairly short, but 30 seconds wasn't enough
to avoid lots of dups.
--
Henry Spencer is so much of a  |    Bill Sommerfeld at MIT/Project Athena
minimalist that I often forget |    sommerfeld@mit.edu
he's there - anonymous         |