[comp.sources.wanted] Looking for a netnews archiver program

marcos@caus-dp.UUCP (Marcos R. Della) (11/26/87)

I am looking for any program out there that will selectively archive and
compress news articles that come over the net, taking the sections to
archive from a file. I'm looking not to re-invent the wheel as there is
obviously something out there that several other sites are using... Better
to use someone else's proven code than to create my own bugs in-house!
 
Marcos Della

lenny@icus.UUCP (11/28/87)

In article <303@caus-dp.UUCP> marcos@caus-dp.UUCP (Marcos R. Della) writes:
>I am looking for any program out there that will selectively archive and
>compress news articles that come over the net, taking the sections to
>archive from a file. I'm looking not to re-invent the wheel as there is
>obviously something out there that several other sites are using... Better
>to use someone else's proven code than to create my own bugs in-house!
> 
>Marcos Della

Probably the easiest way to do this is to create an entry in your "sys"
file for news unbatching:

archive:<newsgroup1>,<newsgroup2>,<newsgroupN>::cat >> /usr/spool/news/Archived

When the news program "rnews -U" is executed, articles in the listed groups
are sent to the "archive" entry and hence appended to the file
/usr/spool/news/Archived for later use.
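
As a rough illustration only, the command field of such an entry could name
a small script instead of cat. The Python sketch below assumes the news
software hands each article to the command on standard input, as the cat
example implies. The function name append_article is hypothetical.

```python
def append_article(stream, archive_path):
    """Append one article read from `stream` to the archive file.

    Returns the number of characters written.
    """
    article = stream.read()
    # Make sure each article ends with a newline so the next one
    # starts on a fresh line in the archive file.
    if not article.endswith("\n"):
        article += "\n"
    with open(archive_path, "a", encoding="utf-8") as archive:
        archive.write(article)
    return len(article)
```

Calling append_article(sys.stdin, "/usr/spool/news/Archived") from a wrapper
script would do the same job as the cat command in the entry above.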

							-Lenny
-- 
============================ US MAIL:   Lenny Tropiano, ICUS Computer Group
 IIIII   CCC   U   U   SSSS             PO Box 1
   I    C   C  U   U  S                 Islip Terrace, New York  11752
   I    C      U   U   SSS   PHONE:     (516) 968-8576 [H] (516) 582-5525 [W] 
   I    C   C  U   U      S  AT&T MAIL: ...attmail!icus!lenny  TELEX: 154232428
 IIIII   CCC    UUU   SSSS   UUCP:
============================       ...{uunet!godfre, mtune!quincy}!\
               ...{ihnp4, boulder, harvard!talcott, skeeve, ptsfa}! >icus!lenny 
"Usenet the final frontier"        ...{cmcl2!phri, hoptoad}!dasys1!/

wmp@vaxine.UUCP (Wayne Power) (11/30/87)

In article <61@icus.UUCP> lenny@icus.UUCP (Lenny Tropiano) writes:
>In article <303@caus-dp.UUCP> marcos@caus-dp.UUCP (Marcos R. Della) writes:
>>I am looking for any program out there that will selectively archive and
>>compress news articles that come over the net, taking the sections to
>
Lenny responds:
>Probably the easiest way to do this is to create an entry in your "sys"
>file for news unbatching:
>
>archive:<newsgroup1>,<newsgroup2>,<newsgroupN>::cat >> /usr/spool/news/Archived

Expire will also archive selected newsgroups in a parallel tree.  Methinks the
key phrase here is "selectively archive news articles".  The signal-to-noise
ratio in the best of newsgroups leaves something to be desired.  Both Lenny's
approach and expire -a will archive entire newsgroups, not selected articles.
Additionally, you don't have much of a handle on your archives with either
approach.  You'll have some trudging to do if you want to find a selected
posting.

I've considered writing a program that would eat news articles and put them
into a news archive, leaving some header information in a data base so users
could peruse the archives and select articles for extraction.  Given my work
load, I wouldn't hold my breath.

To echo Marcos' query, has anybody out there got one?

--wmp

dave@galaxia.zone1.com (David H. Brierley) (12/06/87)

In article <705@vaxine.UUCP> wmp@vaxine.cs.ulowell.edu (Wayne Power) writes:
>In article <61@icus.UUCP> lenny@icus.UUCP (Lenny Tropiano) writes:
>>In article <303@caus-dp.UUCP> marcos@caus-dp.UUCP (Marcos R. Della) writes:
>>>I am looking for any program out there that will selectively archive and
>>>compress news articles that come over the net, taking the sections to
>>
>
>I've considered writing a program that would eat news articles and put them
>into a news archive, leaving some header information in a data base so users
>could peruse the archives and select articles for extraction.  Given my work
>load, I wouldn't hold my breath.
>
>To echo Marcos' query, has anybody out there got one?

To answer the query, yes.  I have been working on exactly this problem on
and off for a while now.  My original intention was to create a mail-based
archive server for use by the network community.  Unfortunately, my manager
decided our phone bills were too high and told me not to continue the project.
The project was recently brought back to life to allow archive access for
people within the company.  There is a possibility of opening up access to
the archives in the future but I wouldn't count on it.  What I can do is
release the software once it's complete.  A brief description of the software
follows:

- When articles are received for the specified groups (controlled by the
  sys file) they are placed into the archive.  The article itself is
  compressed and stored as YY.MM/NNN, where YY, MM, and NNN are the year,
  month, and unique id number.  The id number, article size, author, and
  subject are stored in a log file called GROUP/YY.MM where GROUP is the
  name of the newsgroup.  For cross-posted articles, a log file entry is
  made for each group.

- A mail-based archive server program is also provided which can return a
  canned help message, a list of the groups being archived, an index listing
  of a specified group, or a specific article.  The newsgroup index listing
  can be qualified by either author or subject (or both) using regular
  expressions.
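
The layout described in the two items above can be sketched roughly as
follows. This is an illustration in Python under stated assumptions, not the
actual software: gzip stands in for whatever compressor is really used, the
tab-separated log format and the "next free number" id scheme are guesses,
and the names store_article and search_index are hypothetical.

```python
import gzip
import re
from pathlib import Path


def store_article(root, groups, year, month, author, subject, body):
    """Store one article compressed as YY.MM/NNN and log it once per group."""
    period = f"{year % 100:02d}.{month:02d}"
    article_dir = Path(root) / period
    article_dir.mkdir(parents=True, exist_ok=True)
    # Assumed id scheme: next unused number within this month's directory.
    existing = [int(p.name) for p in article_dir.iterdir() if p.name.isdigit()]
    article_id = max(existing, default=0) + 1
    data = body.encode("utf-8")
    with gzip.open(article_dir / str(article_id), "wb") as fh:
        fh.write(data)
    # One log entry per group, so a cross-posted article shows up in
    # the index of every group it was posted to.
    for group in groups:
        log_path = Path(root) / group / period
        log_path.parent.mkdir(parents=True, exist_ok=True)
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{article_id}\t{len(data)}\t{author}\t{subject}\n")
    return article_id


def search_index(root, group, period, author_re=None, subject_re=None):
    """Return log entries for GROUP/YY.MM, optionally filtered by regex."""
    rows = []
    with open(Path(root) / group / period, encoding="utf-8") as log:
        for line in log:
            article_id, size, author, subject = line.rstrip("\n").split("\t", 3)
            if author_re and not re.search(author_re, author):
                continue
            if subject_re and not re.search(subject_re, subject):
                continue
            rows.append((int(article_id), int(size), author, subject))
    return rows
```

Giving both author_re and subject_re narrows the listing to entries that
match both, which corresponds to the "or both" case in the description.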

The server program can be run in either an unrestricted mode or a restricted
mode.  In restricted mode, in order to retrieve an article the user must be
listed in a special validation file.  This was required for my environment
since I am forced to restrict access to people within the company but it's
very hard to prevent people from mailing to any address (or alias) that they
want to.
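
A minimal sketch of the restricted-mode check, in Python. The validation
file format (one mail address per line, '#' for comments) and the names
load_validated_users and may_retrieve are assumptions made for illustration,
not a description of the real program.

```python
def load_validated_users(path):
    """Read one mail address per line; skip blank lines and '#' comments."""
    users = set()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            entry = line.strip()
            if entry and not entry.startswith("#"):
                users.add(entry.lower())
    return users


def may_retrieve(sender, restricted, validated_users):
    """Decide whether `sender` may fetch an article from the archive."""
    if not restricted:
        # Unrestricted mode: anyone who can mail the server may retrieve.
        return True
    return sender.strip().lower() in validated_users
```

In this sketch only article retrieval is gated; requests for help, the group
list, or an index listing would be answered without consulting the file.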

Anyway, the software is about 98% complete.  If anyone wants it let me know.
If there is enough demand I can post it.

Oh, one last thing.  In order to maintain the capacity of our disks, the
program allows previous months' archives to be moved to tape and will
(semi-) automatically fetch an article from the tape if someone sends in
a request for it.  The semi-automatic tape retrieval is the only part
that isn't completely working yet.

I know that other sites have various forms of archiving software but I am
partial to this one for several reasons.  First, the archive is maintained
compressed to conserve disk space.  Second, for the most part it is
completely automatic and requires very little attention.  Third, I wrote it.
-- 
David H. Brierley
Home: dave@galaxia.zone1.com	{cbosgd,gatech,necntc,ukma}!rayssd!galaxia!dave
Work: dhb@rayssd.ray.com	{cbosgd,gatech,necntc,ukma}!rayssd!dhb

barnett@vdsvax.steinmetz.UUCP (Bruce G Barnett) (12/09/87)

In article <705@vaxine.UUCP> wmp@vaxine.cs.ulowell.edu (Wayne Power) writes:
|To echo Marcos' query, has anybody out there got one?

I have two programs from the net that I use.
I have modified both of them to suit my needs.

First of all, expire will store articles in the same format as the
spool directory. You can specify which articles to archive.

I archive some newsgroups automatically with this.

The second method I use, which keeps a particular article (one I
don't usually archive), is a program called keepnews.

While reading an article with rn, I type "|keepnews".
This saves the article, and also modifies two log files.
One in the newsgroup directory, one on the top level.

I store them in the same directory as the expire'd articles.

I then use a program distributed with the USENET software.
It is either called savenews or keepnews. It was written by Chuq.
What this program does is strip off some of the headers, and store the
file in the following format:

	newsgroup/mm-yy/article-id

It also stores a one line summary in the file

	LOGS/newsgroup

I occasionally go thru and compress the older directories.
This works well for large archives.
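
For illustration, the storage scheme described above might look roughly like
this in Python. This is not Chuq's program: the set of headers kept, the
format of the one-line summary, and the name keep_article are all
assumptions, and the article id is simply whatever the caller passes in.

```python
from pathlib import Path

# Assumed list of headers worth keeping; everything else is stripped.
KEPT_HEADERS = ("from", "newsgroups", "subject", "date", "message-id")


def keep_article(root, newsgroup, article_id, month, year, raw_article):
    """Save an article as newsgroup/mm-yy/article-id and log a summary."""
    head, _, body = raw_article.partition("\n\n")
    kept, fields, keeping = [], {}, False
    for line in head.split("\n"):
        if line[:1] in (" ", "\t"):
            # Continuation line: keep it only if its header was kept.
            if keeping:
                kept.append(line)
            continue
        name, _, value = line.partition(":")
        keeping = name.lower() in KEPT_HEADERS
        if keeping:
            kept.append(line)
            fields[name.lower()] = value.strip()
    period = f"{month:02d}-{year % 100:02d}"
    target = Path(root) / newsgroup / period / str(article_id)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(kept) + "\n\n" + body, encoding="utf-8")
    # One-line summary appended to LOGS/newsgroup.
    log = Path(root) / "LOGS" / newsgroup
    log.parent.mkdir(parents=True, exist_ok=True)
    with open(log, "a", encoding="utf-8") as fh:
        where = target.relative_to(root).as_posix()
        sender = fields.get("from", "?")
        fh.write(f"{where} {sender}: {fields.get('subject', '?')}\n")
    return target
```

Because each month gets its own directory, older months can be compressed
as a unit without touching the LOGS files.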

Check out the news source directory for the above program.
I can send you the diffs of my version, if you wish.
It only works with BSD filenames - as far as I know.
-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett