[comp.sources.d] A simple source archiver

mcg@omepd.UUCP (06/17/87)

Below is a simple shell script that we use to archive our news
sources.  Its input comes from two sources: clean, moderated
source groups, fed directly from rnews via lines in 'sys':

sources:world,na,comp.sources.unix::/usr/lib/news/src_arch mod
bsdbugs:world,na,comp.bugs.4bsd.ucb-fixes::/usr/lib/news/src_arch 4bsd-bugs

and users who mail in sources to be archived, via mail aliases in
/usr/lib/aliases, e.g.:

	netsrc: "|/usr/local/lib/news/src_arch net"
	netbug: "|/usr/local/lib/news/src_arch bug"
	netgames: "|/usr/local/lib/news/src_arch games"


Thus we archive the moderated groups, along with whatever our user
community feels is important.  If more than one user mails the same
article to the archiver, the duplicates are screened out by Message-ID.
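
The duplicate screening can be sketched on its own as follows.  This is
only an illustration, not the script itself: the function names
extract_msg_id and is_duplicate are hypothetical, and it assumes a sed
and a grep that support -n and -F/-q, where the script below uses expr
and plain grep.

```shell
#!/bin/sh
# Sketch only: extract_msg_id and is_duplicate are hypothetical helpers,
# not part of src_arch.

# extract_msg_id FILE
# Print the first Message-ID (without the angle brackets) found in the
# first 50 lines of FILE; print nothing if there is none.
extract_msg_id() {
	sed -n '1,50s/^Message-ID:  *<\([^>]*\)>.*/\1/p' "$1" | head -n 1
}

# is_duplicate ID INDEXFILE
# Succeed if ID is non-empty and already appears in INDEXFILE.
# grep -F matches ID as a fixed string, so the dots in a Message-ID
# are not treated as wildcards.
is_duplicate() {
	[ -n "$1" ] && [ -f "$2" ] && grep -F -q -- "$1" "$2"
}
```

A caller would extract the ID from the newly saved article, test it with
is_duplicate against the INDEX file, and only then decide whether to keep
or remove the article.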

The script uses 'Archive-name:' lines (which actually appear in the
body, not the header) to name the files when they are present;
otherwise the articles simply accumulate in numbered files until a
human can rename them.  The archiver is careful not to overwrite
existing files.
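
The renaming step might look roughly like this when pulled out on its
own.  Again a sketch under assumptions: archive_name and install_as are
hypothetical names, and it uses dirname and mkdir -p where the script
below uses expr and a plain mkdir.

```shell
#!/bin/sh
# Sketch only: archive_name and install_as are hypothetical helpers,
# not part of src_arch.

# archive_name FILE
# Print the value of the first 'Archive-name:' line found in the first
# 50 lines of FILE; print nothing if there is none.
archive_name() {
	sed -n '1,50s/^Archive-name:  *\(.*\)$/\1/p' "$1" | head -n 1
}

# install_as SRC NAME
# Copy SRC to NAME, creating NAME's directory if needed, then make the
# copy read-only and remove SRC.  Fail without touching anything if
# NAME is empty or already exists, so existing files are never
# overwritten.
install_as() {
	[ -n "$2" ] || return 1
	[ ! -e "$2" ] || return 1
	_dir=`dirname "$2"`
	[ -d "$_dir" ] || mkdir -p "$_dir" || return 1
	cp "$1" "$2" && chmod 0444 "$2" && rm -f "$1"
}
```

If install_as fails, the article simply stays in its numbered file, which
matches the fallback described above.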

Thus I have little problem with cruft in comp.sources.misc -- it doesn't
get archived if no one thinks it's important.

In any case, the script follows.

S. McGeady


#! /bin/sh
#
# src_arch group
#
# archive a source file in the source archive directory
# source file is on stdin
#
SOURCEDIR=/usr/src/netsrc
LOG=/usr/local/lib/news/src_arch.log
umask 002
if [ $# != 1 ]; then
	d=$SOURCEDIR/misc
else
	d=$SOURCEDIR/$1
fi;
if [ ! -d $d ]; then
	/bin/echo "`date`: $0: can't access $d" >> $LOG
	d=$SOURCEDIR/misc
fi;
cd $d;
# compute the next index number
if [ -s cur ]; then
	cur=`cat cur`
else
	cur=1
fi;
# see if this number is already in use
while [ -f $cur ]; do
	cur=`expr "$cur" + 1`
done;
expr "$cur" + 1 > cur
# save the article exactly as received (headers included)
cat > $cur
chmod 0444 $cur
# do some checking
subject=`head -25 $cur | grep '^Subject: ' | tail -1`
msg_id=`head -50 $cur | grep '^Message-ID: '`
msg_id=`expr "$msg_id" : '^Message-ID:  *<\([^>]*\)>'`
# if the message-id appears in the INDEX, it's a duplicate
if [ "$msg_id" != "" ]; then
	if grep -s "$msg_id" INDEX > /dev/null 2>&1; then
		rm -f $cur
		/bin/echo "$cur: DUPLICATE $msg_id ($subject)" >> INDEX
		/bin/echo "$cur: DUPLICATE $msg_id ($subject)" >> $LOG
		exit;
	fi;
fi;
# pick first word of Subject line (the part before the first colon)
vol=`expr "$subject" : '^Subject:  *\([^ 	:]*\):.*'`

# if it has an 'archive-name' header, save it there
if name=`head -50 $cur | grep '^Archive-name: '`; then
	name=`expr "$name" : '^Archive-name:  *\(.*\)'`
	if [ "$name" != "" ] ; then
		file=`basename "$name"`
		dir=`expr "$name" : "\(.*\)/$file"`
		if [ "$dir" != "" ]; then
			if [ ! -d "$dir" ]; then
				mkdir $dir 2>> $LOG
			fi;
		fi;
		if [ ! -f "$name" ] && cp $cur $name 2>>$LOG; then
			chmod 0444 $name
			rm -f $cur
			/bin/echo "$name: $msg_id: $subject" >> INDEX
		else
			/bin/echo "$cur: '$name' ALREADY EXISTS for $msg_id: $subject" >> INDEX
			/bin/echo "$cur: '$name' ALREADY EXISTS for $msg_id: $subject" >> $LOG
		fi;
	else
		/bin/echo "$cur: $msg_id: $subject" >> INDEX
	fi;
else
	/bin/echo "$cur: $msg_id: $subject" >> INDEX
fi;
# the end