marcos@caus-dp.UUCP (Marcos R. Della) (11/26/87)
I am looking for any program out there that will selectively archive and
compress news articles that come over the net, taking the sections to
archive from a file.  I'm not looking to re-invent the wheel, as there is
obviously something out there that several other sites are using...  Better
to use someone else's proven code than to create my own bugs in-house!

Marcos Della
lenny@icus.UUCP (11/28/87)
In article <303@caus-dp.UUCP> marcos@caus-dp.UUCP (Marcos R. Della) writes:
>I am looking for any program out there that will selectively archive and
>compress news articles that come over the net, taking the sections to
>archive from a file.  I'm not looking to re-invent the wheel, as there is
>obviously something out there that several other sites are using...  Better
>to use someone else's proven code than to create my own bugs in-house!
>
>Marcos Della

Probably the easiest way to do this is to create an entry in your "sys"
file for news unbatching:

archive:<newsgroup1>,<newsgroup2>,<newsgroupN>::cat >> /usr/spool/news/Archived

When the news program "rnews -U" is executed, any groups that you want
will be sent to "archive" and hence appended to the end of the file
/usr/spool/news/Archived for later use.

-Lenny
--
===========================    US MAIL:   Lenny Tropiano, ICUS Computer Group
 IIIII  CCC  U   U  SSSS                  PO Box 1
   I   C   C U   U S                      Islip Terrace, New York  11752
   I   C     U   U  SSS        PHONE:     (516) 968-8576 [H] (516) 582-5525 [W]
   I   C   C U   U     S       AT&T MAIL: ...attmail!icus!lenny TELEX: 154232428
 IIIII  CCC   UUU  SSSS        UUCP:
===========================      ...{uunet!godfre, mtune!quincy}!\
                                 ...{ihnp4, boulder, harvard!talcott, skeeve, ptsfa}! >icus!lenny
"Usenet the final frontier"      ...{cmcl2!phri, hoptoad}!dasys1!/
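[One caveat with the sys-entry approach above: the flat Archived file grows
without bound.  A minimal sketch of a nightly rotation, run from cron, that
would keep it in check; this is my own addition, not part of Lenny's scheme,
and the function and file names are invented.  A 1987 system would use
compress(1); gzip is the fallback here so the sketch runs where compress is
absent. --mod.]

```shell
#!/bin/sh
# Hypothetical nightly rotation for the flat archive file: move it
# aside under a datestamp and compress the copy, so the live file
# starts over empty each night.
roll_archive() {
    archive=$1
    stamp=`date +%y.%m.%d`
    if [ -s "$archive" ]; then
        mv "$archive" "$archive.$stamp"
        # compress(1) on a 1987 system; fall back to gzip otherwise
        compress "$archive.$stamp" 2>/dev/null || gzip "$archive.$stamp"
    fi
}

# e.g. from cron, just after midnight:
# roll_archive /usr/spool/news/Archived
```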
wmp@vaxine.UUCP (Wayne Power) (11/30/87)
In article <61@icus.UUCP> lenny@icus.UUCP (Lenny Tropiano) writes:
>In article <303@caus-dp.UUCP> marcos@caus-dp.UUCP (Marcos R. Della) writes:
>>I am looking for any program out there that will selectively archive and
>>compress news articles that come over the net, taking the sections to
>
Lenny responds:
>Probably the easiest way to do this is to create an entry in your "sys"
>file for news unbatching:
>
>archive:<newsgroup1>,<newsgroup2>,<newsgroupN>::cat >> /usr/spool/news/Archived

Expire will also archive selected newsgroups in a parallel tree.

Methinks the key phrase here is "selectively archive news articles".  The
signal-to-noise ratio in the best of newsgroups leaves something to be
desired.  Both Lenny's approach and "expire -a" will archive entire
newsgroups, not selected articles.  Additionally, you don't have much of a
handle on your archives with either approach; you'll have some trudging to
do if you want to find a selected posting.

I've considered writing a program that would eat news articles and put them
into a news archive, leaving some header information in a data base so users
could peruse the archives and select articles for extraction.  Given my work
load, I wouldn't hold my breath.

To echo Marcos' query, has anybody out there got one?

--wmp
dave@galaxia.zone1.com (David H. Brierley) (12/06/87)
In article <705@vaxine.UUCP> wmp@vaxine.cs.ulowell.edu (Wayne Power) writes:
>In article <61@icus.UUCP> lenny@icus.UUCP (Lenny Tropiano) writes:
>>In article <303@caus-dp.UUCP> marcos@caus-dp.UUCP (Marcos R. Della) writes:
>>>I am looking for any program out there that will selectively archive and
>>>compress news articles that come over the net, taking the sections to
>>
>
>I've considered writing a program that would eat news articles and put them
>into a news archive, leaving some header information in a data base so users
>could peruse the archives and select articles for extraction.  Given my work
>load, I wouldn't hold my breath.
>
>To echo Marcos' query, has anybody out there got one?

To answer the query, yes.  I have been working on exactly this problem on
and off for a while now.  My original intention was to create a mail-based
archive server for use by the network community.  Unfortunately, my manager
decided our phone bills were too high and told me not to continue the
project.  The project was recently brought back to life to allow archive
access for people within the company.  There is a possibility of opening up
access to the archives in the future, but I wouldn't count on it.  What I
can do is release the software once it's complete.  A brief description of
the software follows:

- When articles are received for the specified groups (controlled by the
  sys file) they are placed into the archive.  The article itself is
  compressed and stored as YY.MM/NNN, where YY, MM, and NNN are the year,
  month, and a unique id number.  The id number, article size, author, and
  subject are stored in a log file called GROUP/YY.MM, where GROUP is the
  name of the newsgroup.  For cross-posted articles, a log file entry is
  made for each group.

- A mail-based archive server program is also provided which can return a
  canned help message, a list of the groups being archived, an index
  listing of a specified group, or a specific article.
The newsgroup index listing can be qualified by either author or subject
(or both) using regular expressions.  The server program can be run in
either an unrestricted mode or a restricted mode.  In restricted mode, in
order to retrieve an article the user must be listed in a special
validation file.  This was required for my environment since I am forced to
restrict access to people within the company, but it's very hard to prevent
people from mailing to any address (or alias) that they want to.

Anyway, the software is about 98% complete.  If anyone wants it, let me
know.  If there is enough demand I can post it.

Oh, one last thing.  To keep our disks from filling up, the program allows
previous months' archives to be moved to tape and will (semi-)automatically
fetch an article from the tape if someone sends in a request for it.  The
semi-automatic tape retrieval is the only part that isn't completely
working yet.

I know that other sites have various forms of archiving software, but I am
partial to this one for several reasons.  First, the archive is maintained
compressed to conserve disk space.  Second, for the most part it is
completely automatic and requires very little attention.  Third, I wrote
it.
--
David H. Brierley
Home: dave@galaxia.zone1.com    {cbosgd,gatech,necntc,ukma}!rayssd!galaxia!dave
Work: dhb@rayssd.ray.com        {cbosgd,gatech,necntc,ukma}!rayssd!dhb
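[The storage layout David describes can be sketched in a few lines of
shell.  The code below is mine, not his; the function name and arguments
are invented, only the YY.MM/NNN and GROUP/YY.MM file names follow his
description.  He used compress(1); gzip stands in here so the sketch runs
on systems without it. --mod.]

```shell
#!/bin/sh
# Sketch of the layout: the article body (stdin) is stored compressed
# as YY.MM/NNN, and a one-line index entry (id, size, author, subject)
# is appended to the per-group log GROUP/YY.MM.
archive_article() {
    # usage: archive_article ARCHDIR GROUP AUTHOR SUBJECT < article
    archdir=$1; group=$2; author=$3; subject=$4
    ym=`date +%y.%m`
    mkdir -p "$archdir/$ym" "$archdir/$group"
    # next unique id number within this month
    nnn=`expr \`ls "$archdir/$ym" | wc -l\` + 1`
    art="$archdir/$ym/$nnn"
    cat > "$art"
    size=`wc -c < "$art" | tr -d ' '`
    gzip "$art"                        # stored as YY.MM/NNN.gz
    echo "$nnn $size $author $subject" >> "$archdir/$group/$ym"
}
```

For a cross-posted article you would call this once per group, so each
GROUP/YY.MM log gets its own entry, as David describes; the index queries
by author or subject then reduce to a grep over the log files.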
barnett@vdsvax.steinmetz.UUCP (Bruce G Barnett) (12/09/87)
In article <705@vaxine.UUCP> wmp@vaxine.cs.ulowell.edu (Wayne Power) writes:
|To echo Marcos' query, has anybody out there got one?

I have two programs from the net that I use.  I have modified both of them
to suit my needs.

First of all, expire will store articles in the same format as the spool
directory.  You can specify which articles to archive.  I archive some
newsgroups automatically with this.

The second method, which I use to keep a particular article from a group I
don't usually archive, is a program called keepnews.  While reading an
article with rn, I type "|keepnews".  This saves the article and also
modifies two log files: one in the newsgroup directory, one at the top
level.  I store them in the same directory as the expire'd articles.

I then use a program distributed with the USENET software.  It is called
either savenews or keepnews; it was written by Chuq.  What this program
does is strip off some of the headers and store the file in the following
format:

	newsgroup/mm-yy/article-id

It also stores a one-line summary in the file LOGS/newsgroup.  I
occasionally go through and compress the older directories.  This works
well for large archives.

Check out the news source directory for the above program.  I can send you
the diffs of my version, if you wish.  It only works with BSD filenames, as
far as I know.
--
Bruce G. Barnett    <barnett@ge-crd.ARPA>  <barnett@steinmetz.UUCP>
                    uunet!steinmetz!barnett
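[Bruce's "compress the older directories" step can be done mechanically.
A minimal sketch, mine rather than his, assuming the newsgroup/mm-yy
layout he quotes; the function name is invented, and gzip stands in for
the compress(1) a 1987 system would use. --mod.]

```shell
#!/bin/sh
# Compress every article file under any month directory except the
# current one, leaving the month being actively written alone.
compress_old_months() {
    archdir=$1
    current=`date +%m-%y`
    for dir in "$archdir"/*/*; do
        [ -d "$dir" ] || continue
        case "$dir" in
            */"$current") continue ;;   # skip the current month
        esac
        # skip files already compressed on a previous pass
        find "$dir" -type f ! -name '*.gz' ! -name '*.Z' -exec gzip {} \;
    done
}

# e.g.  compress_old_months /usr/spool/oldnews
```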