[news.newusers.questions] How can I access USENET articles database on my computer ?

aaron@cbnewsh.ATT.COM (aaron.michael.chesir) (02/08/90)

Here's a question for real UNIX users....

How can I access the vast storage of USENET articles on my host computer
without running the USENET articles reader programs (i.e. readnews, vnews,
etc.) ? I would love to be able to run a program that searches a particular
newsgroup for all articles whose header contains a key word, etc.

Is this possible ? How can I do it ?


Any response will be appreciated.


Aaron Michael Chesir
AT&T Bell Laboratories
Room HO3C206
..att!twitch!aaron

wcs@cbnewsh.ATT.COM (Bill Stewart) (02/08/90)

In article <8000@cbnewsh.ATT.COM> aaron@cbnewsh.ATT.COM (aaron.michael.chesir) writes:
]How can I access the vast storage of USENET articles on my host computer
]without running the USENET articles reader programs (i.e. readnews, vnews,
]etc.) ? I would love to be able to run a program that searches a particular
]newsgroup for all articles whose header contains a key word, etc.

I'm on the same machine as you are, but this kind of stuff was
useful to me when I was a new user, so I'm posting it ...

There are two main ways netnews gets passed around:
- The B News / C News method
- The NNTP Network News Transfer Protocol.

In B/C News, all news articles that a machine might want get shipped to
that machine, using whatever network is around (uucp, ftp, etc.).
The news system stores each article in a file, and keeps a database
of what articles it has around (file name, message id, age, title).
News reader programs keep track of what articles you've read, and
use the news system databases and the spool directory of articles.
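
To illustrate the reader-side bookkeeping: readers commonly record what
you've read in a ~/.newsrc file, one line per newsgroup, with ":" after
a subscribed group (or "!" after an unsubscribed one) followed by the
ranges of article numbers already read.  A minimal Python sketch of
parsing one such line follows; the function names are made up for
illustration and do not come from any news reader.

```python
def parse_newsrc_line(line):
    """Parse one .newsrc line such as 'news.misc: 1-120,125,130-131'.

    Returns (group, subscribed, ranges), where ranges is a list of
    (first, last) tuples of article numbers already read.
    """
    line = line.strip()
    # The first ':' or '!' separates the group name from the read list.
    for i, ch in enumerate(line):
        if ch in ":!":
            break
    else:
        raise ValueError("no ':' or '!' separator in %r" % line)
    group, subscribed, rest = line[:i], ch == ":", line[i + 1:]
    ranges = []
    for part in rest.replace(" ", "").split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        ranges.append((int(first), int(last or first)))
    return group, subscribed, ranges


def is_read(ranges, number):
    """True if article `number` falls inside any read range."""
    return any(first <= number <= last for first, last in ranges)
```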

Where the databases and articles are kept depends on where your
administrator feels like putting them, but the typical locations are
/usr/lib/news for the databases and /usr/spool/news for the articles.
Subdirectories under the spool directory correspond to newsgroups:
this article is a file in /usr/spool/news/news/announce/newusers.
This is the method your machine uses, so this is where to grep.
(Article numbers are different on each machine.  The command "hgrep"
is a grep that only greps article headers and skips the body - much faster.)
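
The directory layout described above can be sketched in Python.  This
assumes the typical /usr/spool/news location and that each article is a
file named by its local article number; both vary by site, and the
function names are hypothetical.

```python
import os

SPOOL_DIR = "/usr/spool/news"  # assumption: the typical location; sites vary


def group_to_dir(group, spool=SPOOL_DIR):
    """Map a newsgroup name to its spool directory: dots become slashes."""
    return os.path.join(spool, *group.split("."))


def list_articles(group, spool=SPOOL_DIR):
    """Return the group's local article numbers, sorted.

    Articles are assumed to be files named by article number; anything
    else (such as subgroup directories) is skipped.
    """
    directory = group_to_dir(group, spool)
    numbers = [int(name) for name in os.listdir(directory) if name.isdigit()]
    return sorted(numbers)
```

For example, group_to_dir("news.announce.newusers") gives
/usr/spool/news/news/announce/newusers on a Unix machine.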

NNTP is a different approach, designed for use in a TCP/IP network.
The theory is that, if you've got high-speed, mostly-reliable networks,
you don't need to have everybody keep a copy of every article
whether they want it or not.  I don't know much about NNTP because
the news machine I used to run for my department was a leftover 3B2
that didn't have an Ethernet board, and NNTP was just coming out then.
Essentially, you'll have a server machine that feeds a bunch of others,
which distributes a certain amount of the database of what articles it has.
When a client machine wants an article (because some user wants to
read it), it retrieves the article from the server.  I don't know if
the client keeps the article around for a while or not.
NNTP is much more efficient, and is the way much of the Internet gets
its news, but requires teaching the newsreaders how to use it.
NNTP also has been ported to networks like AT&T's Datakit.

An intermediate approach is to mount /usr/spool/news over a remote
file system like RFS or NFS, with a few hacks to inews to make
outgoing news do the right thing.  This is less efficient than NNTP
(being an NFS server is more work than an NNTP server),
but is pretty transparent.
-- 
# Bill Stewart AT&T Bell Labs 4M312 Holmdel NJ 201-949-0705 erebus.att.com!wcs

# ho95c has gone the way of all VAX/785s, so I'm now on erebus.att.com

packer@chrpserv.gsfc.nasa.gov (Charles Packer) (02/10/90)

The previous article in this thread described the two methods of
organizing the Usenet news process.

The disadvantage of the NNTP method is that the message database is
not available for perusal by means other than the news reader,
since the files reside on a different computer than the user's.

At least one programmer on the net is developing a modified version
of the nntp reader that simply sucks messages off the server into
your computer. 

tale@cs.rpi.edu (David C Lawrence) (02/10/90)

In <8006@cbnewsh.ATT.COM> wcs@cbnewsh.ATT.COM (Bill Stewart) writes:
> There are two main ways netnews gets passed around:
> - The B News / C News method
> - The NNTP Network News Transfer Protocol.

This is an inaccurate way to characterise the system.  NNTP is meant
to be independent of the resident news system.  The list makes it sound
as though you can't use NNTP with either B News or C News, which
is wholly fallacious.

The rest of this article will address things in the Unix world, since
as you mention these are the main news machines.  There are many other
methods of storage/retrieval/transport in use.

> In B/C News, all news articles that a machine might want get shipped to
> that machine, using whatever network is around (uucp, ftp, etc.).

If anyone at all is using FTP to regularly pass news as part of a
feed, I'd really like to hear more about it.  Neither uucp nor ftp is
strictly a network, by the way.

> (Article numbers are different on each machine.  

Well, mostly true.  Depends on how you want to semantically look at
this.  I would say "at each site" and call the amalgamation of all
machines which share a single news database one site.

> The command "hgrep" is a grep that only greps article headers and
> skips the body - much faster.)

It is also a non-standard grep.  While easy enough to write, it is not
available on many, many machines.  Certainly a much smaller number
than have grep.
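
For sites without it, a header-only grep can be approximated in a few
lines of Python.  This is only a sketch of the idea, not the real hgrep:
it assumes one article per file with headers ending at the first blank
line, and the names header_matches and hgrep are chosen for illustration.

```python
import os
import re


def header_matches(path, pattern):
    """Return the header lines of the article at `path` matching `pattern`.

    Reading stops at the first blank line, which separates headers
    from the body, so the body is never scanned.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    with open(path, errors="replace") as article:
        for line in article:
            if line in ("\n", "\r\n"):
                break  # end of headers
            if regex.search(line):
                hits.append(line.rstrip("\n"))
    return hits


def hgrep(directory, pattern):
    """Yield (filename, header line) for every match in one directory."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            for line in header_matches(path, pattern):
                yield name, line
```

Pointing hgrep at a newsgroup's spool directory with a pattern such as
"^Subject:.*nntp" would list the matching articles by number.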

> NNTP is a different approach, designed for use in a TCP/IP network.

It is not a different approach to storage and feed arrangement.  It is
only a different way of passing articles over the line.  When RPI had
a couple of different machines hosting news, nearly all used NNTP to
get their news.  One also ran a small UUCP feed.  All of them used B
News.  Since we centralised I also moved us to C News.  rpi.edu feeds
and receives all of its news, even locally posted articles, via NNTP.

> Essentially, you'll have a server machine that feeds a bunch of others,
> which distributes a certain amount of the database of what articles it has.

One very important thing to remember is that NNTP does double duty as
both a transport mechanism between feeds and as a database retrieval
system for news readers.
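
The reader side of that double duty can be sketched in Python as a
short exchange using the GROUP and HEAD commands from the NNTP
specification (RFC 977).  The host you pass in, the function names, and
the minimal error handling are all assumptions for illustration; a real
reader does considerably more.

```python
import socket


def parse_group_reply(line):
    """Parse a '211 count first last group' reply to the GROUP command."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != "211":
        raise ValueError("unexpected GROUP reply: %r" % line)
    return {"count": int(fields[1]), "first": int(fields[2]),
            "last": int(fields[3]), "group": fields[4]}


def read_text_block(stream):
    """Read a multi-line reply up to the lone '.' terminator line."""
    lines = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line == ".":
            break
        # A leading '..' is a dot-stuffed line; drop the extra dot.
        lines.append(line[1:] if line.startswith("..") else line)
    return lines


def fetch_headers(host, group, number, port=119):
    """Ask an NNTP server for the headers of one article in `group`."""
    with socket.create_connection((host, port), timeout=30) as sock:
        stream = sock.makefile("rw", encoding="latin-1", newline="")
        stream.readline()                     # server greeting
        stream.write("GROUP %s\r\n" % group)
        stream.flush()
        parse_group_reply(stream.readline())  # raises if the group is refused
        stream.write("HEAD %d\r\n" % number)
        stream.flush()
        status = stream.readline()
        if not status.startswith("221"):
            raise ValueError("unexpected HEAD reply: %r" % status)
        return read_text_block(stream)
```

Note that the article number passed to HEAD is the server's local
number for that group, so it is only meaningful at that site.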

> When a client machine wants an article (because some user wants to
> read it), it retrieves the article from the server.  I don't know if
> the client keeps the article around for a while or not.

It depends very much on the client.  If the client is another news
site, it keeps the article around until it expires it.  If it is a
news reader then it is up to the reader how it wants to interact with
the server and its user.  Most common NNTP readers now only keep the
full text of one article at any given time.

> NNTP is much more efficient, and is the way much of the Internet gets
> its news, but requires teaching the newsreaders how to use it.

Not necessarily, but you've covered that base in the next paragraph.

> An intermediate approach is to mount /usr/spool/news over a remote
> file system like RFS or NFS, with a few hacks to inews to make
> outgoing news do the right thing.  This is less efficient than NNTP
> (being an NFS server is more work than an NNTP server), but is
> pretty transparent.

rpi.edu uses both methods.  rpi.edu:/usenet is exported read-only via
NFS to anyone on campus that wants it and can take it.  All of the
USENET operation for that machine as a server is kept under /usenet,
including the source code, spool files, client support for various
architectures, and administrative tools.  Readers can use either NFS
or NNTP.

In <859@dftsrv.gsfc.nasa.gov> packer@chrpserv.gsfc.nasa.gov (Charles Packer):

> The disadvantage of the NNTP method is that the message database is
> not available for perusal by means other than the news reader,
> since the files reside on a different computer than the user's.

If the reader is up to the challenge then it should be better than
anything except truly local spooling for most scanning (as grep(1)
would do) and paging (as more(1) would do).  I basically agree with
you though; if I have the spool mounted on my machine I would (and do)
occasionally go right to it in a shell rather than bothering to fire
up my news reader.

> At least one programmer on the net is developing a modified version
> of the nntp reader that simply sucks messages off the server into
> your computer. 

Does he know about the existing nntpxfer which comes as part of the
NNTP distribution?  While admitted by its author to be a hack, it does
exactly that.
-- 
   (setq mail '("tale@cs.rpi.edu" "tale@ai.mit.edu" "tale@rpitsmts.bitnet"))
               "Nice plant.  Looks like a table cloth."

eps@toaster.SFSU.EDU (Eric P. Scott) (02/10/90)

In article <859@dftsrv.gsfc.nasa.gov> packer@chrpserv.gsfc.nasa.gov
	(Charles Packer) writes:
>The disadvantage of the NNTP method is that the message database is
>not available for perusal by means other than the news reader,
>since the files reside on a different computer than the user's.

The advantage of the NNTP method is that the message database is
not directly accessible; this allows an enormous quantity of
data to be stored in interesting ways, taking advantage of
archiving/compression methods where appropriate--and requiring no
changes to, or sophistication in, client software.  It may be
quite impossible to make "sense" of the actual article storage in
such a system.
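
As a toy illustration of that point, here is a Python sketch of a store
that keeps articles compressed and keyed by Message-ID.  It is entirely
hypothetical and reflects no real server's storage; it only shows that
the stored bytes need not resemble the article a client gets back.

```python
import zlib


class CompressedStore:
    """Toy article store keyed by Message-ID, holding compressed bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, message_id, text):
        """Compress and store the article text under its Message-ID."""
        self._blobs[message_id] = zlib.compress(text.encode("utf-8"))

    def get(self, message_id):
        """Return the original article text for a Message-ID."""
        return zlib.decompress(self._blobs[message_id]).decode("utf-8")
```

A grep over such a store would find nothing, while a client asking the
server for the article by Message-ID would still receive plain text.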

					-=EPS=-

jill@tank.uchicago.edu (jill holly hansen) (02/13/90)

In article <8000@cbnewsh.ATT.COM> aaron@cbnewsh.ATT.COM (aaron.michael.chesir) writes:
:
:Here's a question for real UNIX users....
:

I knew there was something wrong when I woke up this morning; I was
an unreal unix user. 




-- 
========================================================================
     Jill Hansen               | Can you imagine what this world would be like
jill@tank.uchicago.edu         | if God's operating system were Unix?