mcg@omepd.UUCP (06/17/87)
Summary: Enclosed below is a simple shell script that we use to archive
our news sources. Its input comes from two places. Clean, moderated
source groups arrive directly from rnews via lines in 'sys':

    sources:world,na,comp.sources.unix::/usr/lib/news/src_arch mod
    bsdbugs:world,na,comp.bugs.4bsd.ucb-fixes::/usr/lib/news/src_arch 4bsd-bugs

Users can also mail sources in to be archived, via some mail aliases
in /usr/lib/aliases, e.g.:

    netsrc: "|/usr/local/lib/news/src_arch net"
    netbug: "|/usr/local/lib/news/src_arch bug"
    netgames: "|/usr/local/lib/news/src_arch games"

Thus we archive the moderated stuff, along with whatever our user
community feels is important. If more than one user mails the same
article to the archiver, the duplicates are screened out by Message-ID.

The script uses the existing 'Archive-name:' lines (actually in the
body, not the header) to rename the files when they are available;
otherwise the articles simply accumulate in numbered files until a
human can rename them. The archiver is careful not to overwrite
existing files.

Thus I have little problem with cruft in comp.sources.misc: it doesn't
get archived unless someone thinks it's important.

In any case, the script follows.

S. McGeady

#! /bin/sh
#
# src_arch group
#
# archive a source file in the source archive directory
# source file is on stdin
#
SOURCEDIR=/usr/src/netsrc
LOG=/usr/local/lib/news/src_arch.log

umask 002

if [ $# != 1 ]; then
    d=$SOURCEDIR/misc
else
    d=$SOURCEDIR/$1
fi

if [ ! -d $d ]; then
    /bin/echo "`date`: $0: can't access $d" >> $LOG
    d=$SOURCEDIR/misc
fi

cd $d

# compute the next index number
if [ -s cur ]; then
    cur=`cat cur`
else
    cur=1
fi

# see if this number is already in use
while [ -f $cur ]; do
    cur=`expr "$cur" + 1`
done

expr "$cur" + 1 > cur

# save the article from stdin
cat > $cur
chmod 0444 $cur

# do some checking
# (expr patterns are anchored at the start already, so no leading '^')
subject=`head -25 $cur | grep '^Subject: ' | tail -1`
msg_id=`head -50 $cur | grep '^Message-ID: '`
msg_id=`expr "$msg_id" : 'Message-ID: *<\([^>]*\)>'`

# if the message-id appears in the INDEX, it's a duplicate
if [ "$msg_id" != "" ]; then
    if grep "$msg_id" INDEX > /dev/null 2>&1; then
        rm -f $cur
        /bin/echo "$cur: DUPLICATE $msg_id ($subject)" >> INDEX
        /bin/echo "$cur: DUPLICATE $msg_id ($subject)" >> $LOG
        exit 0
    fi
fi

# pick first word of Subject line (not used further below)
vol=`expr "$subject" : 'Subject: *\([^ :]*\):.*'`

# if it has an 'Archive-name' line, save it there
if name=`head -50 $cur | grep '^Archive-name: '`; then
    name=`expr "$name" : 'Archive-name: *\(.*\)'`
    if [ "$name" != "" ]; then
        file=`basename "$name"`
        dir=`expr "$name" : "\(.*\)/$file"`
        if [ "$dir" != "" ]; then
            if [ ! -d "$dir" ]; then
                mkdir "$dir" 2>> $LOG
            fi
        fi
        if [ ! -f "$name" ] && cp $cur "$name" 2>> $LOG; then
            chmod 0444 "$name"
            rm -f $cur
            /bin/echo "$name: $msg_id: $subject" >> INDEX
        else
            /bin/echo "$cur: '$name' ALREADY EXISTS for $msg_id: $subject" >> INDEX
            /bin/echo "$cur: '$name' ALREADY EXISTS for $msg_id: $subject" >> $LOG
        fi
    else
        /bin/echo "$cur: $msg_id: $subject" >> INDEX
    fi
else
    /bin/echo "$cur: $msg_id: $subject" >> INDEX
fi

# the end
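
For anyone adapting the duplicate screening on its own, here is a minimal sketch of that one step, pulled out into two functions. The names extract_msg_id and is_duplicate are hypothetical and do not appear in the script above. The sketch also swaps in sed for the expr match and uses a fixed-string grep, on the assumption that your grep supports -F and -q, so that the dots in a Message-ID are not treated as wildcards.

```shell
#!/bin/sh
# Sketch only: extract_msg_id and is_duplicate are hypothetical helper
# names, not part of the original src_arch script.

# Print the Message-ID (without the angle brackets) found in the first
# 50 lines of the article file $1. Prints nothing if there is none.
extract_msg_id() {
    head -n 50 "$1" |
        sed -n 's/^Message-ID: *<\([^>]*\)>.*/\1/p' |
        head -n 1
}

# Succeed if the ID $1 is non-empty and already appears in index file $2.
# -F matches the ID as a fixed string; -q suppresses all output.
is_duplicate() {
    [ -n "$1" ] && [ -f "$2" ] && grep -F -q -- "$1" "$2"
}
```

A caller would run something like `id=$(extract_msg_id "$cur")` and then `if is_duplicate "$id" INDEX; then ...` in place of the inline grep. An empty ID is never reported as a duplicate, which matches the script's behavior of archiving articles that carry no Message-ID.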