[news.software.b] relaynews too slow

schoch@trident.arc.nasa.gov (Steve Schoch) (09/21/90)

We have a problem with C news.  When we first installed it, incoming nntp
connections would spool articles into /usr/spool/news/in.coming faster than
relaynews could process them.  I installed a patch that has rnews run
relaynews -r immediately, which solves our problem with the large in.coming
batches (there are no more), but now I have about 23 nntpd's that are waiting
for relaynews -r to complete.

The 23 relaynews -r we have running are all waiting for the current relaynews
to finish with the LOCK file.
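
In outline, the contention looks like this: each relaynews takes the LOCK
exclusively, and every later arrival sleeps until the holder finishes.  A
minimal Python sketch of the pattern (illustrative only, not the actual
C News code; only the LOCK file name comes from the post above):

```python
import fcntl
import os
import tempfile

def try_news_lock(lockpath):
    """Try to take the news system's exclusive lock without blocking.

    Returns an open file descriptor on success, or None if another
    relayer already holds the lock -- the situation the 23 waiting
    processes are in, except that they sleep instead of giving up."""
    fd = os.open(lockpath, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except BlockingIOError:
        os.close(fd)      # lock is busy; caller can exit and retry later
        return None
```

A relayer that gets None back can simply exit, leaving its batch spooled in
in.coming for whoever holds the lock to pick up, instead of tying up an nntpd.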

We are using dbz.

Neighbors are complaining that their nntpxmits are running too slowly to us.

What am I doing wrong?

	Steve

henry@zoo.toronto.edu (Henry Spencer) (09/23/90)

In article <1990Sep20.212757.12868@news.arc.nasa.gov> schoch@trident.arc.nasa.gov (Steve Schoch) writes:
>The 23 relaynews -r we have running are all waiting for the current relaynews
>to finish with the LOCK file.

The integration of relaynews with NNTP is far from seamless at present. :-)
The underlying problem is that NNTP does not take notice of the fact that
transferring and processing articles a batch at a time is much more
efficient than doing them one at a time.  The UUCP community learned that
most of a decade ago, but the Internet community has generally been
reluctant to admit that UUCP has anything to teach them.

There is work in progress, on various fronts, aimed at doing something
about the issue.  It's not a solved problem with a canned solution ready
to hand.
-- 
TCP/IP: handling tomorrow's loads today| Henry Spencer at U of Toronto Zoology
OSI: handling yesterday's loads someday|  henry@zoo.toronto.edu   utzoo!henry

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (09/23/90)

>The underlying problem is that NNTP does not take notice of the fact that
>transferring and processing articles a batch at a time is much more
>efficient than doing them one at a time.  The UUCP community learned that


I've modified a version of nntpxfer that batches a number of articles
into memory and then feeds them to relaynews.  Others have done similar
things.  It is MUCH more efficient - I'm surprised that the distribution
version doesn't do something similar.
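
The framing itself is trivial: C News's unbatcher splits input on
"#! rnews <length>" lines.  A rough Python sketch of how a transfer agent
might build such a batch before handing it to a single relaynews
(illustrative; this is not Jon's actual code):

```python
def make_rnews_batch(articles):
    """Frame several articles as one batch.

    Each article is preceded by a '#! rnews <length>' line giving its
    size in bytes, so the receiving unbatcher can split the batch
    back into individual articles."""
    parts = []
    for article in articles:
        data = article.encode("utf-8")
        parts.append(b"#! rnews %d\n" % len(data))
        parts.append(data)
    return b"".join(parts)
```

One relaynews invocation then digests the whole batch, instead of paying the
per-process startup cost once per article.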

-- 
Jon Zeeff (NIC handle JZ)	 zeeff@b-tech.ann-arbor.mi.us

brian@ucsd.Edu (Brian Kantor) (09/24/90)

In article <8?P*Z-=@b-tech.uucp> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
>I've modified a version of nntpxfer that batches a number of articles
>into memory and then feeds them to relaynews.  Others have done similar
>things.  It is MUCH more efficient - I'm surprised that the distribution
>version doesn't do something similar.

That's because nntpxfer is a hack kluge and wasn't really ever supposed
to be used in production systems.  I can say that: I wrote it.

Batch transmission will be supported in NNTP v2.
	- Brian

jerry@olivey.olivetti.com (Jerry Aguirre) (09/25/90)

In article <1990Sep23.000826.15925@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>The underlying problem is that NNTP does not take notice of the fact that
>transferring and processing articles a batch at a time is much more
>efficient than doing them one at a time.  The UUCP community learned that
>most of a decade ago, but the Internet community has generally been
>reluctant to admit that UUCP has anything to teach them.

Henry,

I think this is more a case of a square peg and a round hole rather
than NIH.  UUCP is inherently a batched operation in the sense that one
submits a job and it is processed sometime later.  That is what makes
the ihave/sendme protocol so inefficient for UUCP connections.

Adding batches, in the sense of multiple articles sent as one file, is
a natural optimization with only minor disadvantages.  The use of
compression makes batching for UUCP even more appealing.
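
The compression step can be sketched just as briefly.  Historically a
compressed batch travels under a "#! cunbatch" header so the receiving end
knows to run it through a decompressor; the sketch below uses zlib purely as
a stand-in for compress(1):

```python
import zlib

def compress_batch(batch):
    """Wrap a batch in a '#! cunbatch' header after compressing it.
    zlib here is only an illustrative substitute for compress(1)."""
    return b"#! cunbatch\n" + zlib.compress(batch)

def uncompress_batch(data):
    """Inverse of compress_batch: strip the header, decompress."""
    header, _, body = data.partition(b"\n")
    if header != b"#! cunbatch":
        raise ValueError("not a compressed batch")
    return zlib.decompress(body)
```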

The type of transmission used for NNTP establishes a real-time
connection.  This allows for the potential to virtually eliminate the
wasted overhead of sending again an article already on the receiver's
system.  NNTP lends itself to multiple feeds for a number of reasons,
and the number of duplicates grows proportionally.  (It is the norm
rather than the exception for me to see the same article being offered
multiple times within a few seconds.)
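
The exchange behind that duplicate elimination is NNTP's IHAVE command
(RFC 977): the sender offers a message-ID, and the receiver answers 335
(send it) or 435 (already have it).  A toy model of several feeds offering
overlapping articles (illustrative, not nntpd source):

```python
def ihave(history, message_id):
    """Receiver's side of IHAVE, using RFC 977 response codes:
    335 = please send the article, 435 = duplicate, do not send."""
    return "435" if message_id in history else "335"

def run_feeds(history, offers):
    """Process a stream of offers from several neighbors.
    Only the first offer of each message-ID is transferred;
    the rest cost one short exchange each, not a whole article."""
    sent = refused = 0
    for msgid in offers:
        if ihave(history, msgid) == "335":
            history.add(msgid)
            sent += 1
        else:
            refused += 1
    return sent, refused
```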

With B news there was little advantage to using batches (in the news
processing itself).  For most IP network connections there is little
advantage to using compression.  Therefore the advantages of eliminating
the extra overhead of the duplicate copies very much outweighed the
almost nonexistent advantages of batching.  The release of C news may
have shifted that balance, but it is hardly fair to criticize NNTP
because you changed the rules.  NNTP doesn't break when one uses
batching; it just loses a couple of its advantages.  Some of us just
happen to think they are important advantages.

The performance advantage of C news seems to rest primarily on a
deliberate delaying of the processing and retransmission of news
articles.  This is at odds with the goal of many NNTP developers who
wanted to reduce the propagation delay of articles.  Obviously we are
dealing with different design goals here.  A leaf UUCP site that is
short on CPU cycles is going to have a different set of requirements
than an NNTP site with 10 neighbors and a faster CPU.  If I were such a
leaf site I would have converted to C news long ago.  As it is, I am
still waiting for that "seam" to become less obvious.

				Jerry Aguirre

henry@zoo.toronto.edu (Henry Spencer) (09/25/90)

In article <49453@olivea.atc.olivetti.com> jerry@olivey.olivetti.com (Jerry Aguirre) writes:
>The type of transmission used for NNTP establishes a real-time
>connection...

There seems to be a general illusion that real-time connections are exempt
from considerations of efficiency.  With the volume of news we currently
see, this is not true.  Real-time or not, the most efficient way to
transfer news is to pump data bytes, in bulk, from one end to the other,
without control handshaking or other time-wasting complications interspersed.
Rev 2 of NNTP includes a batching protocol for this.

Our reaction to the way a lot of NNTP sites currently do their news
transmission is roughly:  "Jesus, are they all running on Crays?!?".
The waste of resources is mind-boggling.  We wish we could afford to
squander so many cycles on ruinously inefficient transmission methods;
it would make life a lot easier.

>The performance advantage of C news seems to rest primarily on a
>deliberate delaying of the processing and retransmission of news
>articles.  This is at odds with the goal of many NNTP developers who
>wanted to reduce the propagation delay of articles...

I confess that we fail to understand why some of the NNTP folks are so
obsessed with propagating talk.religion in seconds rather than minutes.
However, this need not imply a contradiction with C News's philosophy
of processing in bulk for efficiency.  You just have to do things more
cleverly to combine the two.  Work is in progress on this.
-- 
TCP/IP: handling tomorrow's loads today| Henry Spencer at U of Toronto Zoology
OSI: handling yesterday's loads someday|  henry@zoo.toronto.edu   utzoo!henry

jerry@olivey.olivetti.com (Jerry Aguirre) (09/26/90)

In article <1990Sep25.153101.2437@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>
>There seems to be a general illusion that real-time connections are exempt
>from considerations of efficiency.  With the volume of news we currently
>see, this is not true.  Real-time or not, the most efficient way to
>transfer news is to pump data bytes, in bulk, from one end to the other,
>without control handshaking or other time-wasting complications interspersed.
>Rev 2 of NNTP includes a batching protocol for this.

Henry,

It is a pretty convincing illusion.  Certainly the NNTP/network
connections transfer a lot more news with less CPU load than UUCP
had.  The nntpxmit asks if the receiver has a particular message
ID and if the answer is no it sends it.  Granted there are turn-around
delays, but they affect throughput, not system or network load.  On most
systems the serial input interrupts for each character.  The typical
network card interrupts per packet, resulting in an order of magnitude
less overhead.  My experience with running both UUCP and NNTP bears out
this theoretical conclusion.  A uucico can be hogging 50% of the system
while an nntpd is transferring twice the articles and is not even in the
top 10 processes.

Just how do you propose to prevent massive transmission of duplicates if
"rnews" squirrels away the articles without updating the history file?
I seriously want to know your philosophy on this.

>Our reaction to the way a lot of NNTP sites currently do their news
>transmission is roughly:  "Jesus, are they all running on Crays?!?".

I manage a full feed quite nicely, and without a Cray.  Even a 0.69-MIPS
B news system can handle multiple full NNTP feeds (if it weren't for the
damn UUCP connections).

>However, this need not imply a contradiction with C News's philosophy
>of processing in bulk for efficiency.  You just have to do things more
>cleverly to combine the two.  Work is in progress on this.

Glad to hear it.

				Jerry Aguirre

I.G.Batten@fulcrum.bt.co.uk (Ian G Batten) (09/26/90)

jerry@olivey.olivetti.com (Jerry Aguirre) writes:
> It is a pretty convincing illusion.  Certainly the NNTP/network
> connections transfer a lot more news with less CPU load than UUCP
> had.  The nntpxmit asks if the receiver has a particular message

This is no advert for NNTP, merely a statement that UUCP over serial
lines is an I/O bandwidth hog in a way that TCP isn't.  My newsfeed is
via a 64K leased line, which replaced 2K4 modems and 2K4 X.25.  Since I
already had UUCP over TCP running here for local purposes, we initially
ran our existing 100K 8-bit compressed batches over ``e'' protocol.
This screamed, and the inbound uucico consumed almost no resources.  I
then switched to NNTP for reasons of modernity and suddenly found
performance going through the floor, with the nntpd consuming
significant resources.  I'm essentially a leaf site, so I rarely get an
article presented more than once.

I now run faster with bizarre tweaks to the NNTP batching, but often
think that UUCP over TCP would be neat to go back to...

ian

henry@zoo.toronto.edu (Henry Spencer) (09/27/90)

In article <49460@olivea.atc.olivetti.com> jerry@olivey.olivetti.com (Jerry Aguirre) writes:
>>There seems to be a general illusion that real-time connections are exempt
>>from considerations of efficiency...
>
>It is a pretty convincing illusion.  Certainly the NNTP/network
>connections transfer a lot more news with less CPU load than UUCP
>had...

This is more a function of the nature of the hardware -- typically doing
a packet at a time rather than a character at a time -- than of the
protocol, I would say.  I'm told the costs of NNTP are *not* trivial.
And processing articles one at a time is massively inefficient, even
if the inefficiency is spread out so it's not so noticeable.  Your
machine is being gnawed to death by mice rather than trampled by an
elephant, but it's still losing just as much blood.

>Just how do you propose to prevent massive transmission of duplicates if
>"rnews" squirrels away the articles without updating the history file?
>I seriously want to know your philosophy on this.

There are several tactics on this one.  The one I would personally favor
is to keep a record of received-but-not-yet-processed articles separate
from the history file -- they *are* on disk, so there is no reliability
impact from not processing them immediately -- but I haven't had a chance
to experiment with this yet.
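
That tactic can be sketched in a few lines (purely illustrative, since by
Henry's own account it has not been tried yet): the duplicate check consults
a pending set as well as the history file, so even unprocessed articles are
refused on a second offer.

```python
def already_seen(msgid, history, pending):
    """Duplicate check against both the processed-article history
    and the received-but-not-yet-processed record."""
    return msgid in history or msgid in pending

def receive_article(msgid, history, pending, spool):
    """Accept an offered article unless it is a duplicate.

    Accepted articles go to the spool (on disk, so deferring the
    real processing costs no reliability) and their IDs into
    'pending', which a later batch run folds into the history."""
    if already_seen(msgid, history, pending):
        return False          # refuse -- 435 in NNTP terms
    pending.add(msgid)
    spool.append(msgid)
    return True
```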
-- 
TCP/IP: handling tomorrow's loads today| Henry Spencer at U of Toronto Zoology
OSI: handling yesterday's loads someday|  henry@zoo.toronto.edu   utzoo!henry