[news.software.nntp] improved nntpxfer

urlichs@smurf.sub.org (Matthias Urlichs) (03/15/90)

In news.software.nntp, article <X?#$?!*@b-tech.uucp>,
  zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
< I wanted to use nntpxfer, so I made some improvements - basically more 
< efficient reads and batching of articles into memory and then popening 
< rnews when the buffer gets full.  Let me know if you want to beta test 
< a copy.  
< 
I decided to go the simpler route and, instead of popen()ing inews, just
create a temporary file and rename it to "/usr/spool/news/xfer.%d.%d",
getpid(), counter++ (and let the next rnews -U deal with it).
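
In code, the idea is roughly this (untested sketch, not the actual change;
write_article() and the temp-file name are made up, and most error handling
is left out):

/*
 * Untested sketch of the spool-and-rename idea.
 */
#include <stdio.h>
#include <unistd.h>

#define SPOOL "/usr/spool/news"

static int counter;

void write_article(char *buf, int len)
{
    char tmp[256], final[256];
    FILE *fp;

    sprintf(tmp, "%s/.xfer.tmp.%d", SPOOL, (int) getpid());
    fp = fopen(tmp, "w");
    if (fp == NULL)
        return;
    fwrite(buf, 1, len, fp);
    fclose(fp);

    /* rename() is atomic, so rnews -U never sees a half-written file */
    sprintf(final, "%s/xfer.%d.%d", SPOOL, (int) getpid(), counter++);
    rename(tmp, final);
}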

Another improvement is to open two NNTP channels to your favorite server. On
one you do your NEWNEWS; the other is used to fetch articles as soon as their
IDs come in over the first channel.
This is necessary on some low-speed Internet links like ours (which frequently
makes nntpd time out, drops connections, and does other fun stuff) and
basically enabled us to get 24 hours of Usenet traffic in 14 hours instead of 30.
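
In outline, the two-channel loop looks something like this (untested sketch:
no dot-unstuffing, no spooling, hardly any error checking, and a made-up
command-line interface of host, date, time):

/*
 * Untested sketch of the two-channel transfer.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* connect to the server's NNTP port and wrap the socket in stdio */
static FILE *nntp_open(char *host)
{
    struct hostent *hp = gethostbyname(host);
    struct sockaddr_in sin;
    int s = socket(AF_INET, SOCK_STREAM, 0);
    char line[512];
    FILE *fp;

    if (hp == NULL || s < 0)
        exit(1);
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(119);
    memcpy(&sin.sin_addr, hp->h_addr_list[0], hp->h_length);
    if (connect(s, (struct sockaddr *) &sin, sizeof sin) < 0)
        exit(1);
    fp = fdopen(s, "r+");
    fgets(line, sizeof line, fp);       /* eat the greeting */
    return fp;
}

int main(int argc, char **argv)
{
    FILE *ids = nntp_open(argv[1]);     /* channel 1: NEWNEWS */
    FILE *arts = nntp_open(argv[1]);    /* channel 2: ARTICLE */
    char line[512];

    fprintf(ids, "NEWNEWS * %s %s GMT\r\n", argv[2], argv[3]);
    fflush(ids);
    fgets(line, sizeof line, ids);      /* 230: list follows */

    /*
     * Fetch each article as soon as its ID arrives; the kernel
     * buffers the rest of the NEWNEWS list on channel 1 meanwhile.
     */
    while (fgets(line, sizeof line, ids) != NULL) {
        if (line[0] == '.')             /* lone dot ends the list */
            break;
        line[strcspn(line, "\r\n")] = '\0';
        fprintf(arts, "ARTICLE %s\r\n", line);
        fflush(arts);
        fgets(line, sizeof line, arts); /* status line */
        if (line[0] != '2')             /* 4xx: article not there */
            continue;
        while (fgets(line, sizeof line, arts) != NULL)
            if (line[0] == '.' && (line[1] == '\r' || line[1] == '\n'))
                break;                  /* real code would spool the text */
    }
    return 0;
}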

I'd like to convert this to a somewhat better C programming style before
letting the rest of the world see it, though...

-- 
Matthias Urlichs

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (03/19/90)

>< I wanted to use nntpxfer, so I made some improvements - basically more 
>< efficient reads and batching of articles into memory and then popening 
>< rnews when the buffer gets full.  Let me know if you want to beta test 
>< a copy.  
>< 
>I decided to go the simpler route and, instead of popen()ing inews, just
>create a temporary file and rename it to "/usr/spool/news/xfer.%d.%d",
>getpid(), counter++ (and let the next rnews -U deal with it).

This does seem slightly simpler, but I'd be concerned about how often
rnews -U gets run.  You want to keep the delays to a minimum.  Also, some
systems (like this one) don't have an rnews -U or anything equivalent.


>Another improvement is to open two NNTP channels to your favorite server. On
>one you do your NEWNEWS; the other is used to fetch articles as soon as their
>IDs come in over the first channel.
>This is necessary on some low-speed Internet links like ours (which frequently
>makes nntpd time out, drops connections, and does other fun stuff) and
>basically enabled us to get 24 hours of Usenet traffic in 14 hours instead of 30.
>

In my experience, once the IDs start coming over, they all arrive pretty
quickly.  You still have to fall back on the old "last success time" if the
second connection fails, meaning you have to start over again on the next
connection.

Another solution would be to allow multiple lines in the 
/usr/spool/news/nntp.site file and have nntpxfer cycle through them.  You
could then break it up a bit, e.g.:

rec time time
soc time time
comp time time
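
A rough sketch of the cycling loop (untested; do_transfer() is just a stub
standing in for the real NEWNEWS/ARTICLE code, and the sample date/time
fields are made up):

/*
 * Untested sketch of cycling through a multi-line nntp.<site> file.
 */
#include <stdio.h>

static void do_transfer(char *groups, char *date, char *time)
{
    /* the real nntpxfer would run NEWNEWS/ARTICLE here and,
     * on success, update this line's timestamp */
    printf("NEWNEWS %s %s %s GMT\n", groups, date, time);
}

int main(int argc, char **argv)
{
    FILE *fp = fopen(argv[1], "r");
    char line[256], groups[64], date[32], tm[32];

    if (fp == NULL)
        return 1;
    /* one transfer per line, e.g. "comp 900315 120000" */
    while (fgets(line, sizeof line, fp) != NULL)
        if (sscanf(line, "%63s %31s %31s", groups, date, tm) == 3)
            do_transfer(groups, date, tm);
    fclose(fp);
    return 0;
}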

urlichs@smurf.sub.org (Matthias Urlichs) (03/19/90)

In news.software.nntp, article <+&*$DG_@b-tech.uucp>,
  zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
< Matthias Urlichs wrote:

< >I decided to go the simpler route and, instead of popen()ing inews, just
< >create a temporary file and rename it to "/usr/spool/news/xfer.%d.%d",
< >getpid(), counter++ (and let the next rnews -U deal with it).
< 
< This does seem slightly simpler, but I'd be concerned about how often
< rnews -U gets run.  You want to keep the delays to a minimum.  Also, some
< systems (like this one) don't have an rnews -U or anything equivalent.
< 
Delays may be a problem on some sites, but ours is not one of them.
After all, nntpxfer is restarted here every hour, and takes 30 minutes to
complete. rnews -U is run every ten minutes or so, and nntpxmits to other
sites are run every hour. The additional delay doesn't seem to be significant.

But you're right, faster is better, and I'll think about it.

< >Another improvement is to open two NNTP channels to your favorite server. On
< >one you do your NEWNEWS; the other is used to fetch articles as soon as their
< >IDs come in over the first channel.
< >This is necessary on some low-speed Internet links like ours (which frequently
< >makes nntpd time out, drops connections, and does other fun stuff) and
< >basically enabled us to get 24 hours of Usenet traffic in 14 hours instead of 30.
< 
< In my experience, once the IDs start coming over, they all arrive pretty
< quickly.  You still have to fall back on the old "last success time" if the
< second connection fails, meaning you have to start over again on the next
< connection.
< 
Your experience does not include 9600 baud Internet links, which typically
semi-hang an incoming data stream after a few kbytes (the BSD 4.2 TCP
back-off code raises its ugly head here).

You can avoid starting over by examining the Date: field of incoming articles.
(I haven't done this yet, because said backing-off does not seem to be a
problem.) I would maintain something like a floating average over all headers
whose dates fall between the starting time and the current time. When the
transfer fails, rewrite the access file with that time minus a few hours or so.
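
Sketched out, it might look like this (untested; the running mean and the
two-hour margin are invented details, and a raw time_t stands in for
whatever format the access file really uses):

/*
 * Untested sketch of the Date:-based checkpoint.
 */
#include <stdio.h>
#include <time.h>

static time_t start_time;   /* set when the transfer begins */
static double avg_date;     /* running mean of plausible Date: values */
static long n_dates;

/* feed this the parsed Date: of every article that arrives */
void note_date(time_t d)
{
    time_t now = time((time_t *) 0);

    /* ignore stamps outside [start, now]: bad clocks, old reposts */
    if (d < start_time || d > now)
        return;
    n_dates++;
    avg_date += ((double) d - avg_date) / n_dates;
}

/* on a dropped connection, restart from the mean minus a margin */
void checkpoint(char *accessfile)
{
    FILE *fp;

    if (n_dates == 0)
        return;
    fp = fopen(accessfile, "w");
    if (fp == NULL)
        return;
    fprintf(fp, "%ld\n", (long) avg_date - 2L * 60 * 60);
    fclose(fp);
}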

< Another solution would be to allow multiple lines in the 
< /usr/spool/news/nntp.site file and have nntpxfer cycle through them.  You
< could then break it up a bit, e.g.:
< 
< rec time time
< soc time time
< comp time time

This did not seem to work as well, mainly because the other side had a big
history file and the time between the NEWNEWS command and the IDs became
somewhat significant.
-- 
Matthias Urlichs