wyle@lavi.uucp (Mitchell Wyle) (03/02/90)
Usenet News readers present you with *ALL* new articles in the groups you are reading.  Kill files help filter, but you are still bombarded with articles.  Information Retrieval (boolean search over full-text databases) presents you with all articles matching your regular expression, regardless of whether you have seen them previously.  You actively search, instead of passively reading, junking, or killing.  SDI systems (Selective Dissemination of Information) are sort of like boolean retrieval, but they give you only the new items since the last query, and they query for you automatically and periodically.  The system searches for you.

I want to build a very simple e-mail based SDI system for usenet news.  Perl seems ideal for such an application.  If I create a directory of user profiles with files in them like:

          |-.newsrc
          |-address
 person1-|-limit
          |-query

and have the program maintain the individual .newsrc's, it should be easier than using find based on date and a "touchfile."  query contains an egrep regular expression, and limit contains the maximum number of articles the user wants to receive.
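To make the layout above concrete, here is a hypothetical setup for one subscriber's profile directory.  The path, address, query, and limit values are examples for illustration, not from any real installation:

```shell
# Hypothetical per-subscriber profile setup; the server would maintain
# the .newsrc, and the other three files are written by the subscriber.
PR=/tmp/profiles
mkdir -p $PR/person1
echo 'person1@somewhere.edu' > $PR/person1/address   # where to mail hits
echo '25'                    > $PR/person1/limit     # max articles per run
echo 'perl|sdi'              > $PR/person1/query     # egrep regular expression
echo 'comp.lang.perl: 1-0'   > $PR/person1/.newsrc   # high-water marks per group
```

The .newsrc high-water marks are what replace the "find plus touchfile" scheme: the server only ever looks at article numbers above the recorded high end of each range.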
Here is a rather weak first attempt in awk:

#!/bin/sh
# @(#)server 1.2 90/02/28 11:50:36
echo " " | awk 'BEGIN {
    sp = "/usr/spool/news/spool/"
    ac = "/usr/spool/news/lib/active"
    pr = "/usr/wyle/pasadina/3/profiles"
    while ( getline < ac > 0 ) {        # load active file in assoc arrays
        active_high[$1] = $2 + 0        # highest active article
    }
    "ls " pr | getline t                # load all users profile dirs
    split(t, profs)                     # with a hack (using split)
    close("ls " pr)
}
{
    for (ind in profs) {
        prof = profs[ind]
        ne = pr "/" prof "/newsrc"
        system("rm /tmp/re " ne)        # remove old work file(s)
        f = pr "/" prof "/.newsrc"
        cmd = "xargs egrep -l \"`cat " pr "/" prof "/query`\" >> /tmp/re"
        while ( getline < f > 0 ) {     # go through the .newsrc
            high_nrc = substr($2, index($2, "-") + 1, 9)  # high end of read range
            low_nrc  = substr($2, 1, index($2, "-") - 1)  # low end of read range
            ng = substr($1, 1, length($1) - 1)  # newsgroup name, ":" stripped
            upper = active_high[ng]             # get active file high val
            gsub("\\.", "/", ng)                # convert . to /
            for ( i = high_nrc + 1; i <= upper; i++ ) {  # for all unread articles
                print sp ng "/" i | cmd
                close(cmd)
            }                           # end for
            gsub("/", ".", ng)
            print ng ": " low_nrc "-" upper >> ne
        }                               # end while
        close(ne)
        system("mv " ne " " f)          # update current .newsrc

        # We have a list of articles which have matched the regex in
        # the file /tmp/re.  Loop through it and send stuff to subscribers.
        getline ad < (pr "/" prof "/address")   # users address
        getline li < (pr "/" prof "/limit")     # message limit
        li += 0
        mn = 0
        while ( (getline < "/tmp/re" > 0) && (mn++ < li) ) {
            system("cat " $0 " | mail " ad)
        }
    }
}'

This system is pretty slow because it forks and execs an "xargs egrep ..." for every article.  Anyone want to give this a whirl in perl?  You could use it to supplement your usenet habit ;-)  I'd be much obliged.
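The per-article fork/exec is the bottleneck: each pass through the inner loop starts a fresh "xargs egrep".  One cheap improvement is to emit the whole list of unread article paths first and let a single xargs/egrep scan the batch.  A minimal sketch, with a hypothetical spool directory and query standing in for the real ones:

```shell
# Build the full list of candidate article files, then grep them all in
# one pass.  SPOOL, the article numbers, and QUERY are made up here.
SPOOL=/tmp/spool
mkdir -p $SPOOL/comp/lang/perl
echo 'an article about perl scripts' > $SPOOL/comp/lang/perl/101
echo 'nothing relevant here'         > $SPOOL/comp/lang/perl/102
QUERY='perl|sdi'
i=101
while [ $i -le 102 ]; do              # unread range from the .newsrc
    echo $SPOOL/comp/lang/perl/$i
    i=`expr $i + 1`
done | xargs egrep -l "$QUERY" > /tmp/re   # one egrep over the batch
cat /tmp/re
```

This keeps the number of egrep processes proportional to the number of subscribers rather than the number of unread articles.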
Another hack using rn:
-----------------------------------------------------------------
#!/bin/sh
export RNMACRO RNINIT WODIR
TDIR=/home/antares1/wyle/research/wan/pasadina/3
WODIR=$TDIR"/work/news"
PRDIR=$TDIR"/profiles"
RNINIT="-s -T -t -d$WODIR"
RNMACRO=/tmp/rnkill$$
trap 'rm -f $RNMACRO; exit' 1 2 3 15

/bin/rm $PRDIR/*/hits           # remove old hitlists
/bin/rm -rf $WODIR/*            # remove old KILL file macros

# create all KILL file directories, and empty KILL files:
cat $PRDIR/*/groups | sed -e 's;\.;/;g' -e "s;^;mkdir -p $WODIR/;" > $WODIR/x
. $WODIR/x
sed -e "s/mkdir -p/touch/" -e "s;$;/KILL;" < $WODIR/x > $WODIR/y
. $WODIR/y
/bin/rm -f $WODIR/x $WODIR/y

# create all KILL file entries:
cd $PRDIR
for p in * ; do
    touch $p/hits
    groups=`cat $p"/groups"`
    c=`cat $p/query`
    c=`echo $c | sed "s;^;/:\.\*;"`
    c=`echo $c | sed "s;$;\.\*/a:\!echo \%A \>\> $PRDIR/$p/hits;"`
    for group in $groups; do
        kf=$WODIR"/"`echo $group | sed 's;\.;/;g'`"/KILL"
        echo $c >> $kf
    done
done

# We have to hack the .newsrc here for testing:
export RNMACRO RNINIT WODIR
WODIR=$TDIR"/work"
# RNINIT="-s -T -t -d$WODIR"
RNINIT="-s -T -d$WODIR"
RNMACRO=/tmp/rnkill$$
mv $HOME/.newsrc $HOME/s.newsrc
cp $WODIR/.newsrc $HOME/.newsrc

# use rn to create hit lists:
echo "z %(%m=n?.qcy^M:n)^(z^)" > $RNMACRO
echo "z" | rn
rm $RNMACRO
# mv $HOME/s.newsrc $HOME/.newsrc

# Now we should have, for each subscriber, a new "hit" list of
# articles which match the regular expressions in the rn kill files
exit 0
-------------------------------------------------------------------

I'd be much obliged.  Does anything (other than newsclip) like this exist?

-Mitchell F. Wyle                         wyle@inf.ethz.ch
Institut fuer Informationssysteme         +41 1 254 7224
ETH Zentrum / 8092 Zurich, Switzerland

Kleptomaniac, n.:
        A rich thief.
                -- Ambrose Bierce, "The Devil's Dictionary"
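The `sed 's;\.;/;g'` substitution in the script is what maps each dotted newsgroup name onto its KILL-file path under the work directory.  Isolated, with a hypothetical WODIR standing in for the real one, the mapping looks like this:

```shell
# Map a dotted newsgroup name to its per-group KILL file path.
# WODIR and the group name are examples, not from the script's real run.
WODIR=/tmp/work/news
group=comp.lang.perl
kf=$WODIR/`echo $group | sed 's;\.;/;g'`/KILL
echo $kf
```

Each subscriber's query then gets appended to every KILL file named this way, so rn applies it per group as it reads.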
emv@math.lsa.umich.edu (Edward Vielmetti) (03/03/90)
Mitchell Wyle, in <1190@gorath.cs.utexas.edu>, writes:

   SDI systems (Selective Dissemination of Information) are sort of like
   boolean retrieval but they give you only new items since the last
   query, and query for you automatically and periodically.  The system
   searches for you.  I want to build a very simple e-mail based SDI
   system for usenet news.  Perl seems ideal for such an application.

I do this in 'gnus', the gnu emacs newsreader.  Here's a sample kill file:

~News/comp.lang.c.KILL:
(gnus-kill "" "FTP" "u")
(gnus-kill "From" "Torek\\|Spencer\\|Gwyn\\|pardo" "u")
(gnus-kill "Subject" ".")
(gnus-expunge "X")

i.e. search everywhere for FTP, read stuff by these folks, and ditch the
rest.  For comp.sys.amiga.hardware:

(gnus-kill "" "FTP\\|SCSI" "u")
(gnus-kill "Subject" ".")
(gnus-expunge "X")

No doubt there's an equally good way with perl.
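For readers without gnus, the same keep-or-ditch logic can be sketched as a plain egrep pass over one-article-per-file input.  The directory and the sample headers below are made up for illustration; the patterns mirror the comp.lang.c kill file above:

```shell
# Shell analogue of the gnus kill logic: keep articles mentioning FTP
# anywhere, or whose From: line matches the name list; drop the rest.
mkdir -p /tmp/arts
printf 'From: someone@site\nSubject: new FTP site list\n\nbody\n' > /tmp/arts/1
printf 'From: Torek\nSubject: casts considered\n\nbody\n'         > /tmp/arts/2
printf 'From: nobody@site\nSubject: flame war\n\nbody\n'          > /tmp/arts/3
> /tmp/hits
for a in /tmp/arts/*; do
    if egrep 'FTP' $a >/dev/null ||
       egrep '^From:.*(Torek|Spencer|Gwyn|pardo)' $a >/dev/null; then
        echo $a >> /tmp/hits          # "u" in gnus terms: mark unread/keep
    fi
done                                  # anything not listed is "expunged"
cat /tmp/hits
```

Articles 1 and 2 survive (FTP match and author match respectively); article 3 is dropped, just as (gnus-kill "Subject" ".") plus (gnus-expunge "X") would do.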