[news.software.nntp] nntpxfer

loverso@Xylogics.COM (John Robert LoVerso) (04/12/90)

In article <G&L$Z=$@b-tech.uucp> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
> >... nntpxfer ...
> >This puppy is slow.  It took about 3 hours to fetch 4K articles, and it
> 
> I have some fixes for these things that will hopefully be included in a 
> future nntp release.  Until then, copies are available on request.

After seeing how slow it really was, I changed it to just do the
NEWNEWS and return a list of the message-ids I need.  You can then
turn that list into a sendme control message to the other host.
Thus, the articles will get to your machine over whatever standard
transmit channel you already use (nntpxmit/nntplink/etc).
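
For anyone who wants to do the same, here is a rough sketch of the idea --
not the actual change to nntpxfer, just an illustration.  It assumes a
stdio read/write pair already connected to the server, and the sendme
skeleton uses placeholder names ("mysite", "to.mysite"); the exact sendme
format (IDs on the Control line vs. in the body) differs between news
systems, so check what yours expects before handing the output to inews.

/*
 * Sketch only, under the assumptions above.  Send NEWNEWS, collect the
 * message-IDs the server reports, and write them out as the body of a
 * sendme control article suitable for "inews -h".
 */
#include <stdio.h>
#include <string.h>

int
newnews_to_sendme(FILE *ser_rd, FILE *ser_wr,
    char *groups, char *date, char *hhmmss)
{
    char line[512];

    /* RFC 977: NEWNEWS newsgroups date time [GMT] */
    fprintf(ser_wr, "NEWNEWS %s %s %s GMT\r\n", groups, date, hhmmss);
    fflush(ser_wr);

    if (fgets(line, sizeof line, ser_rd) == NULL
        || strncmp(line, "230", 3) != 0)
        return -1;          /* refused, or the server went away */

    /* Skeleton sendme article; "mysite"/"to.mysite" are placeholders. */
    printf("Newsgroups: to.mysite\n");
    printf("Subject: sendme mysite\n");
    printf("Control: sendme mysite\n");
    printf("\n");

    /* The ID list ends with a line containing only ".". */
    while (fgets(line, sizeof line, ser_rd) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';
        if (strcmp(line, ".") == 0)
            break;
        printf("%s\n", line);
    }
    return 0;
}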

As an aside, I just found out that this is an easy way to
retrieve articles that the nntp access file won't let you have.
I.e., a "newnews *" will list message-ids for all articles, even
if you are not allowed to transfer them.  However, that
restriction doesn't exist on sendme control messages.  Of course,
this only works against sites that feed you news to begin with...

John
-- 
John Robert LoVerso			Xylogics, Inc.  617/272-8140 x284
loverso@Xylogics.COM			Annex Terminal Server Development Group

urlichs@smurf.sub.org (Matthias Urlichs) (04/13/90)

In news.software.nntp, article <8876@xenna.Xylogics.COM>,
  loverso@Xylogics.COM (John Robert LoVerso) writes:
< In article <G&L$Z=$@b-tech.uucp> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
< > >... nntpxfer ...
< > >This puppy is slow.  It took about 3 hours to fetch 4K articles, and it
< > 
< > I have some fixes for these things that will hopefully be included in a 
< > future nntp release.  Until then, copies are available on request.
< 
< After seeing how slow it really was, I changed it to just do the
< NEWNEWS and return a list of the message-ids I need.  You can then
< turn that list into a sendme control message to the other host.
< Thus, the articles will get to your machine over whatever standard
< transmit channel you already use (nntpxmit/nntplink/etc).
Assuming you already have one. But once you have the list of message-IDs,
nntpxfer may actually be faster than nntpxmit because it only needs one
request-reply interaction per article instead of two. Assuming you don't
have to lower-case the ID in order to get the article -- see below.
< 
There are a whole bunch of things you can do to nntpxfer if you want to
speed things up, and/or just feel inclined to add some features:
- Use alarm()/signal() instead of select() and reading one character at a
  time (see the first sketch after this list).
- Use fdopen() and fgets/fputs if your stdio library lets you.
- Ignore 5xx results on ARTICLE requests -- some sites say 5xx if you want to
  access an article which local policy forbids you to get.
- Open two channels concurrently -- one to get the IDs and one to fetch
  the article data. (This will make nntpxfer faster than nntpxmit on lines
  with large ping times.)
- Use signal() to block the SIGPIPE you get when the forked inews aborts
  before reading the whole article.
- Drop the buggers into files instead of forking every time.
- Ask for the article by the original ID. If the other side doesn't have that,
  convert the message-ID's post-@ part to lower case (which is the primitive
  version of what RFC822 says about this topic). If that resulted in a change,
  ask again. If the article is still not present, lowercase the whole
  message-ID and ask once more, again only if that changed something -- see
  the second sketch after this list.
  (The _current_ nntpd code suggests that it should be sufficient either to
   rfc822ize the ID or to leave it alone, but that does not always seem to
   work.  Anyone know for sure?)
- Batch articles by calling the batching code (../server/batch.c). This
  involves modifying batch.c to (a) optionally read from somewhere other than
  stdin and (b) not send the 2xx "Give me the article" reply in case (a).
- Log via (fake)syslog.
- Make -d an incremental switch instead of a toggle, and make the
  debug-printing logic somewhat more clever.
- Log progress into an almost-temporary file which also serves as a lock to
  make sure that no two nntpxfers concurrently access the same site. Rename
  this file when done, so that one can see why the last xfer failed while
  watching the current one die. :-(
- Control all of the above with some new option letters.
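
First sketch, for the alarm()/signal() item above.  Not my actual code, just
an illustration of the idea: wrap fgets() on the server stream in an alarm()
so a dead connection cannot hang the transfer forever, instead of select()ing
and then reading one character at a time.

#include <stdio.h>
#include <signal.h>
#include <setjmp.h>
#include <unistd.h>

static jmp_buf timeout_env;

static void
timed_out(int sig)
{
    longjmp(timeout_env, 1);
}

/* Read one line from the server, or give up after "secs" seconds of silence. */
char *
get_server_line(FILE *ser_rd, char *buf, int len, unsigned secs)
{
    char *p;

    if (setjmp(timeout_env)) {
        alarm(0);
        return NULL;        /* timed out */
    }
    signal(SIGALRM, timed_out);
    alarm(secs);
    p = fgets(buf, len, ser_rd);
    alarm(0);
    return p;
}

One way to handle the SIGPIPE item is similar: a single signal(SIGPIPE,
SIG_IGN) at startup turns the write to a dying inews into an ordinary error
return instead of killing nntpxfer.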
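
Second sketch, for the lowercasing item.  Again just an illustration of the
retry order described above; fetch_article() is a placeholder for whatever
actually issues the ARTICLE command and returns 0 on success.

#include <ctype.h>
#include <string.h>

/* Lowercase the ID from "start" onward; return 1 if anything changed. */
static int
lower_from(char *id, char *start)
{
    int changed = 0;
    char *p;

    for (p = start; *p != '\0'; p++) {
        if (isupper((unsigned char)*p)) {
            *p = tolower((unsigned char)*p);
            changed = 1;
        }
    }
    return changed;
}

/* Try the ID as given, then with the post-@ part lowercased, then all of it. */
int
fetch_with_fallback(char *id, int (*fetch_article)(char *))
{
    char *at;

    if (fetch_article(id) == 0)
        return 0;
    at = strchr(id, '@');
    if (at != NULL && lower_from(id, at + 1) && fetch_article(id) == 0)
        return 0;
    if (lower_from(id, id) && fetch_article(id) == 0)
        return 0;
    return -1;
}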

I've done most of these -- it's not much work, but my code is really ugly
  right now. (You thought it was ugly before? Ha!) It also lacks a whole lot
  of error checking and safe termination, and the aforementioned alarm()-type
  stuff isn't even tested yet.
Unfortunately I don't have time to prettify it all -- I'll mail my version to
  anyone who wants to do that.

Nntpxfer is now reasonably fast, and about the only dead time is spent in
  forking newsrun (C News) or rnews -U (B News). Besides waiting for the
  negative responses to ARTICLE commands, of course. ;-) :-(

Next project: Convincing the NEWNEWS code not to report IDs of expired
  articles. This is harder because it's not my nntpd which has that problem,
  but the nntpds of the machines we xfer news from. Not good.

Aside: Does anyone keep at least two weeks of News online? I'd like to have a
  backup to xfer our news from if our Internet link drops dead again,
  as it did last week...
-- 
Matthias Urlichs

loverso@Xylogics.COM (John Robert LoVerso) (04/13/90)

I should have been clearer...  NEWNEWS has a security hole in that it
can advertise message-ids of articles that the other end is restricted from
getting via the access file.  LIST does the same thing.  They should both
filter their output based upon restricted groups.
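
The check only needs to be a few lines in the server.  A sketch -- not
actual nntpd code -- with client_may_read() standing in for whatever the
access-file test really is:

#include <string.h>

extern int client_may_read(const char *newsgroup);  /* hypothetical */

/*
 * "newsgroups" is the article's comma-separated Newsgroups value
 * (modified in place by strtok).  Return 1 only if the client may read
 * at least one of the groups, i.e. if it is safe to advertise the
 * message-ID in NEWNEWS output (or the group in LIST output).
 */
int
may_advertise(char *newsgroups)
{
    char *group;

    for (group = strtok(newsgroups, ","); group != NULL;
         group = strtok(NULL, ","))
        if (client_may_read(group))
            return 1;
    return 0;
}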

John