[comp.sources.wanted] FAQ extractor

wengland@stephsf.stephsf.com (Bill England) (01/03/91)

In article <1048@unisql.UUCP> you write:
>Does anyone have a little cron daemon or some such that recognizes and
>extracts the growing number of FAQ (Frequently Asked Questions) postings
>and puts them in some other location?  I currently extract those in the
>newsgroups that I follow and put them in a public location, but I'm
>thinking of writing something like the above to relieve myself of the 
>burden and to also catch those FAQs in newsgroups I don't follow (but
>it's possible someone has already considered and done this) ...
>-- 
>alfred

  It should be pretty easy to hack such an animal in perl.  Right now 
  I have a utility that renames archive news files as a preliminary for 
  archiving comp.sources.* to floppy disk. 

  As an initial design how about running find /usr/spool/news -ctime 1
  -print to find new news articles and then parsing the Subject line 
  for the string FAQ.

  You will probably need a copy/rename function to translate file names ...
  Hmm, maybe copy /usr/spool/news/GROUP/0000 to /u/ftp/FAQS/GROUP.  If your
  file system only supports 14 chars then some group renaming/mapping
  function will be required.
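
  A minimal sketch of such a mapping function, under stated assumptions:
  the name short_group_dir() and the choice to give each group component
  its own directory, cut to 14 characters, are illustrative and not part
  of the utility mentioned above. The actual copy step is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: short_group_dir() is a hypothetical helper.
# It turns a spool path such as /usr/spool/news/comp/sources/wanted/1234
# into a destination directory below $dest_root. Each group component
# becomes its own directory, cut to at most $max characters.
sub short_group_dir {
    my ($spool_root, $dest_root, $article, $max) = @_;
    $max = 14 unless defined $max;

    my $rel = $article;
    $rel =~ s/^\Q$spool_root\E\/*//;       # e.g. comp/sources/wanted/1234
    my @parts = split(/\//, $rel);
    pop @parts;                             # drop the article number
    my @short = map { substr($_, 0, $max) } @parts;
    return join('/', $dest_root, @short);
}

print short_group_dir('/usr/spool/news', '/u/ftp/FAQS',
                      '/usr/spool/news/comp/sources/wanted/1234'), "\n";
```

  Plain truncation can make two long component names collide, so a real
  version would probably want a lookup table for the awkward cases.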

  A modification of this utility could provide you with a daily or weekly
  posting list of your favorite authors or topics.
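
  One way that modification might look, as a rough sketch: the watch
  lists and the helper names article_headers() and on_watch_list() are
  assumptions made up for illustration. It checks the From and Subject
  headers of one article against lists of author and topic strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical watch lists; the entries are placeholders.
my @authors = ('wengland@stephsf');
my @topics  = ('FAQ');

# Return the From and Subject header values from the text of one article.
sub article_headers {
    my ($text) = @_;
    my %h;
    foreach my $line (split(/\n/, $text)) {
        last if $line eq '';                # blank line ends the headers
        $h{lc $1} = $2 if $line =~ /^(From|Subject): (.*)$/i;
    }
    return %h;
}

# True if the author or the subject contains any entry of the watch
# lists (case-insensitive substring match).
sub on_watch_list {
    my ($text, $authors, $topics) = @_;
    my %h = article_headers($text);
    my $from = defined $h{from}    ? lc($h{from})    : '';
    my $subj = defined $h{subject} ? lc($h{subject}) : '';
    return 1 if grep { index($from, lc($_)) >= 0 } @$authors;
    return 1 if grep { index($subj, lc($_)) >= 0 } @$topics;
    return 0;
}

my $sample = "From: wengland\@stephsf.com (Bill England)\n" .
             "Subject: FAQ extractor\n\nbody text\n";
print "listed\n" if on_watch_list($sample, \@authors, \@topics);
```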



  Bill England
  wengland@stephsf.COM

#!/u/bin/perl
##
 # As a starter here is a perl script that finds some FAQ files
 # (This one currently goes through all news files, in practice
 # a -ctime 1 would go into the find.)
 #
 #  Bill England
 #  wengland@stephsf.com
##
eval 'exec /u/bin/perl -S $0 ${1+"$@"}'
	if $running_under_some_shell;


open (FIND_PIPE, "find /usr/spool/news/ -print|")
	|| die "cannot run find: $!\n";

while (<FIND_PIPE>){
	($fn)=split;

	# Skip files that are not text files ... ( Skip Directory names )
	#
	next unless -T "$fn";

	open(IN_NEWS, $fn) || next;	# skip files we cannot read

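	while(<IN_NEWS>){
		last if /^$/;	# blank line ends the headers; no Subject seen
		next unless /^Subject: /;

		($junk, $subject)= split(/: /, $_, 2);
		chop($subject);	# strip the trailing newline

		# Only catches subjects that start with "Freq", e.g.
		# "Frequently Asked Questions".
		printf("name: %s, Subject: %s\n", $fn, $subject)
			if $subject =~ /^Freq/;

		last; # IN_NEWS
	}
	close(IN_NEWS);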
} # end of find_pipe 
-- 
 +-  Bill England,  wengland@stephsf.COM -----------------------------------+
 |   * *      H -> He +24Mev                                                |
 |  * * * ... Oooo, we're having so much fun making itty bitty suns *       |
 |__ * * ___________________________________________________________________| 

harald.alvestrand@elab-runit.sintef.no (01/04/91)

There is a simpler way, at least for the end-user.
I hacked up a Perl script that speaks NNTP (the socket-based mechanism), and
by using the XHDR command, I can get the subject fields and search them.

This gives me at least the local article numbers of the FAQ articles.

I expect the server would not like me to implement a command that searches
all groups for FAQ subjects :-)
(no, the script IS too ugly for posting :-)
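
For readers who want a starting point, here is a rough sketch of the XHDR
approach. It is not the script mentioned above. It assumes the Net::NNTP
module from libnet is installed; the helper names faq_article_numbers()
and group_subjects() are made up for illustration, and the server host
and group name must be given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pick the article numbers whose Subject looks like a FAQ posting.
# $subjects is a hash ref of article number => Subject text, which is
# the shape returned by Net::NNTP's xhdr().
sub faq_article_numbers {
    my ($subjects) = @_;
    return sort { $a <=> $b }
           grep { $subjects->{$_} =~ /\bFAQ\b|^Freq/i } keys %$subjects;
}

# Ask an NNTP server for the Subject headers of one group.
sub group_subjects {
    my ($host, $group) = @_;
    require Net::NNTP;          # loaded only when the network is used
    my $nntp = Net::NNTP->new($host) or die "cannot connect to $host\n";
    my ($count, $first, $last) = $nntp->group($group)
        or die "no such group: $group\n";
    my $subjects = $nntp->xhdr('Subject', [$first, $last]);
    $nntp->quit;
    return $subjects || {};
}

# Usage: script HOST GROUP
if (@ARGV == 2) {
    my $subjects = group_subjects(@ARGV);
    print "$_: $subjects->{$_}\n" foreach faq_article_numbers($subjects);
}
```

As noted, this only yields the server's local article numbers for one
group at a time.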


                   Harald Tveit Alvestrand
Harald.Alvestrand@elab-runit.sintef.no
C=no;PRMD=uninett;O=sintef;OU=elab-runit;S=alvestrand;G=harald
+47 7 59 70 94