[net.sources] keepnews -- A Usenet news archival system

mms@bocklin.UUCP (05/14/85)

			     KEEPNEWS

     Keepnews is a program that attempts to provide a sane way of
     extracting and keeping information that comes over Usenet.
     The basic idea is to allow users to designate an article
     as being of some worth and then save it in a common place.
     Thus, keepnews provides an alternative to personal archival
     of news articles.  Keepnews has been in use here at the
     University of Arizona for about two years and has proven to
     be a simple and viable way of cataloging useful information
     that arrives via the net.

     Keepnews accepts a news article on standard input, analyzes
     the headers of the article and stores the article in a file.
     Articles are stored as individual files in a tree structure
     matching that of the newsgroup hierarchy.  The Article-I.D.
     is used to uniquely identify an article; each article is
     stored only once.  If an article is associated with multiple
     newsgroups, it is linked into all appropriate directories.
     Both a master log file (kept in the keepnews root directory)
     and individual group log files (kept in each group's node)
     are maintained with the relevant information regarding each
     of the archived files (newsgroups, subject, source, etc.).

     The initial version of keepnews was written in Icon by
     Bill Mitchell at the University of Arizona.  This implemen-
     tation in C was motivated by the need for greater speed and
     various functional improvements.


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Mark M. Swenson                          Department of Computer Science
mms%arizona@csnet-relay                       The University of Arizona
{ihnp4,noao,mcnc,utah-cs}!arizona!mms     Tucson, Arizona  85721 // USA
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


# This is a shell archive.  Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by arizona!mms on Thu May 13 22:49:27 MST 1985
# Contents:  README keepnews.1 Makefile keepnews.c

echo unbundling README 1>&2
cat >README <<'AlBeRtEiNsTeIn'
Before installing keepnews, change the KROOT definition in
keepnews.c and Makefile as appropriate for your system.

Also, you will have to create the KROOT directory manually
before you can use keepnews.

AlBeRtEiNsTeIn
echo unbundling keepnews.1 1>&2
cat >keepnews.1 <<'AlBeRtEiNsTeIn'
.TH KEEPNEWS 1 
.UC 4
.SH NAME
keepnews \- selectively archive (keep) news articles in a common location
.SH SYNOPSIS
\fBkeepnews\fP [ -n additional.newsgroups ] [ -t more subject line ]
.SH DESCRIPTION
\fIkeepnews\fP accepts a news article on standard input, analyzes the
headers of the article and stores the article in a file.  Articles
are stored as individual files in a tree structure matching that of
the newsgroup hierarchy.  The Article-I.D. is used to uniquely
identify an article; each article is stored only once.  If an article
is associated with multiple newsgroups, it is linked into all
appropriate directories.
.PP
The directory \fI/usr/spool/keepnews\fP serves as the root of
the \fIkeepnews\fP
tree.  If an article, \fIsomesite.100\fP
for example, is posted to net.general and
subsequently archived, it is stored in
\fI/usr/spool/keepnews/net/general/somesite.100\fP.
Log files which contain
information about articles that have been kept are
maintained in both the root directory and individual group
directories.  Log files do not record any information about who
kept an article.
.TP
.B \-n
The remaining arguments (to the next -n, -t, or newline) will be
taken as additional newsgroups, eg.,
\fB-n\fP
\fInet.additional.group.1 net.additional.group.2\fP.
The additional groups will only be added to the newsgroup field of
the related log files.
.TP
.B \-t
The remaining arguments (to the next -n, -t, or newline) will be
taken as additional information for the article subject, eg.,
\fB-t\fP
\fIadditional subject material\fP.
This will only be added to the subject field of the related log files.
.SH PURPOSE
\fIkeepnews\fP is an attempt to provide a sane way of extracting and
keeping information that comes over Usenet.  The basic idea
is to allow users to designate an article as being of some worth and
then save it in a common place.  Thus, \fIkeepnews\fP provides an
alternative to personal archival of news articles.
.SH USAGE
Because there are a variety of news reading programs available,
the exact usage of \fIkeepnews\fP depends on the program being used.
If \fIreadnews\fP is being used, responding with
.br
.ti 1i
\fBs- |keepnews\fP
.br
to the \fB[ynq]\fP prompt that appears after an article will cause
the article to be kept.  \fIkeepnews\fP produces no messages unless an
error occurs.
.SH "SEE ALSO"
\fIreadnews(1)\fP, \fInews(5)\fP
.SH BUGS
If \fIkeepnews\fP is given something that doesn't appear to be a news
article, a program abort may occur.
.PP
Some news reading programs (including \fIreadnews\fP) will prepend
a summary header line to redirected standard output that is subsequently
included in the archive created by \fIkeepnews\fP.  It is harmless.
AlBeRtEiNsTeIn
echo unbundling Makefile 1>&2
cat >Makefile <<'AlBeRtEiNsTeIn'
SRCS =	keepnews.c
OBJS =	keepnews.o
BINDIR= /usr/local
CFLAGS= -O

keepnews:	${OBJS}
	${CC} -o keepnews ${OBJS}

install: keepnews
	install -m 6755 -s keepnews ${BINDIR}/keepnews
	chown news ${BINDIR}/keepnews
	chgrp 0 ${BINDIR}/keepnews

clean:
	rm -f *.o

print:	${SRCS}
	lpr -p ${SRCS} Makefile
AlBeRtEiNsTeIn
echo unbundling keepnews.c 1>&2
cat >keepnews.c <<'AlBeRtEiNsTeIn'
/*
** The initial implementation of keepnews was written in Icon by
** Bill Mitchell at the University of Arizona.  This version
** is motivated by the need for greater speed and various func-
** tional improvements.
** 
** -- Mark Swenson {ihnp4,noao,utah-cs,mcnc}!arizona!mms
**    Dept. of Computer Science / University of Arizona
**
**
** keepnews extracts the header information from a redirected news article
** and checks to see if the article has already been saved.  If so, it
** appends a corresponding message in KROOT/log, otherwise it creates a
** file in the location specified by the first newsgroup listed, and
** links the remaining newsgroups to this file.  The body of the article
** is not read unless it is actually saved, and if that is the case,
** stdin is copied directly into this new file.
**
** The -n and -t options have been added to allow the user to supply
** additional newsgroups (eg. -n net.another.one net.another.two) or
** additional subject matter (eg. -t more subject material) respectively.
** These flags can be intermixed, and everything following each is taken
** as its argument up to the next -n, -t, or newline.
** 
*/

#include		<stdio.h>
#define	KROOT		"/usr/spool/keepnews"
#define	MAXGROUPS	20
#define	LINE		128

extern	FILE	*fopen();
extern	char	*index(), *malloc();
char		*match(); 
static	char	*head[5] =	{
				"From: ",
			 	"Newsgroups: ",
			 	"Subject: ",
			 	"Posted: ",
			 	"Article-I.D.: "
				} ;
main(argc,argv) char **argv;
{   FILE  *rlogfp, *flogfp, *filefp;
					/* use buffers extensively for speed */
    char  line[2*LINE];
    char  content[5][2*LINE];		/* data corresponding to head[] */
    char  *dirlist[MAXGROUPS];		/* individual newsgroups */
    char  pathlist[MAXGROUPS][LINE];	/* path to archive each group */
    char  flog[LINE];			/* path to leaf log */
    char  tmpbuf[LINE];			/* temp buffer */
    char  logline[4*LINE];		/* data for log files */
    char  rlog[LINE];			/* path to KROOT/log */
    char  tfile[LINE];			/* temp buffer */
    char  installfile[LINE];		/* actual file name for archive */
    char  lnfilename[LINE];		/* temp name for linking */
    char  wbuf[8*LINE];			/* write output buffer */
    char  *f = wbuf;			/* index to header lines in wbuf */
    char  t_list[LINE];			/* user supplied subject */
    char  n_list[LINE];			/* user supplied newsgroups */
    char  ch, *m, *tmpline;
    int   t_flag = 0;			/* user supplied subject added */
    int   n_flag = 0;			/* user supplied groups added */
    int   c, i, j, ofd, rcnt;
    int   numgroups = 1 ;
    int   line_length = LINE;		/* for internal reference */

    while (--argc > 0)			/* get any user supplied info */
      { if ((++argv)[0][0] == '-')
	  { ch = argv[0][1] ;
	    argv++ ;
	    argc-- ;
	  }
	switch (ch)
	  { case 'n':
		n_flag = 1 ;
		sprintf(n_list,"%s,%s",n_list,*argv) ;
		break ;
	    case 't':
		t_flag = 1 ;
		sprintf(t_list,"%s %s",t_list,*argv) ;
		break ;
	    default:
		fprintf(stderr,"Unknown flag: -%s\n",c) ;
		exit(1) ;
	  }
      }

    umask(0022) ;
    setuid(geteuid());
    if (access(KROOT,7))
      { fprintf(stderr,"Can't find or access keepnews root directory %s\n",
				KROOT);
	exit(1);
      }

    rcnt = read(0, wbuf, 1024);
    while ((j=getline(line,2*line_length,f)) > 0) 	/* get header info */
      { if (j < 0)
	  { fprintf(stderr,"Attempt to install incompatible article failed.\n");
	    exit(1);
	  }
	f += j + 1;
        for (i = 0 ; i <= 4 ; i++)
	    if (tmpline = match(line,head[i]))
		{ strcpy(content[i],tmpline) ; break ; }
      }

    /* add '-t' and '-n' information to article information */
    if (n_flag)
	sprintf(content[1],"%s%s",content[1],n_list);
    if (t_flag)
	sprintf(content[2],"%s (+%s +)",content[2],t_list);
    if (*content[1] == NULL)
      { fprintf(stderr,"Attempt to install incompatible article failed.\n");
        exit(1);
      }

    /* break newsgroups into separate groups */
    strcpy(*dirlist=malloc(strlen(content[1])),content[1]) ;
    while (dirlist[numgroups] = index(dirlist[numgroups-1],','))
	*dirlist[numgroups++]++ = NULL ;
    numgroups-- ;

    /* convert each newsgroup name into a directory path */
    for (j=0 ; j <= numgroups ; j++)
      { sprintf(pathlist[j],"%s/%s",KROOT,dirlist[j]) ;
	while (m = index(pathlist[j],'.')) *m = '/' ;

	m = pathlist[j] ;	/* create any missing ancestor directories */
	while (m = index(m,'/'))
	  { *m = NULL ; mkdir(pathlist[j],0755) ; *m++ = '/' ; }
        mkdir(pathlist[j],0755) ;
      }

    sprintf(rlog,"%s/log",KROOT) ;
    sprintf(logline,"%s; %s, %s, %s, %s",content[1],content[4],
		content[0],content[3],content[2]) ;
    sprintf(installfile,"%s/%s",pathlist[0],content[4]) ;
    sprintf(flog,"%s/log",pathlist[0]) ;

    rlogfp = fopen(rlog,"a") ;
    if (access(installfile,0))
      { fprintf(rlogfp,"Saved: %s\n",logline) ;	/* update "root" log */
	fclose(rlogfp) ;

	flogfp = fopen(flog,"a") ;	/* update "leaf" log */
	fprintf(flogfp,"%s\n",logline) ;
	fclose(flogfp) ;

	/*  write entire article header and body to archive file */
	ofd = creat(installfile, 0644) ;
	write(ofd, wbuf, rcnt) ;
	if (rcnt == 1024)
	  { while ((rcnt = read(0, wbuf, 1024)) == 1024)
	        write(ofd, wbuf, 1024) ;
	    write(ofd, wbuf, rcnt) ;
	  }
	close(ofd) ;

	/* install hard links to remaining newsgroups */
	for (i = 1 ; i <= numgroups ; i++)
	  { sprintf(tfile,"%s/%s",pathlist[i],content[4]) ;
	    sprintf(tmpbuf,"%s/log",pathlist[i]) ;
	    link(installfile,tfile) ;
	    flogfp = fopen(tmpbuf,"a") ;
	    fprintf(flogfp,"%s\n",logline) ;
	    fclose(flogfp) ;
	  }
      }
    else if (n_flag)		/* t_flag is implied */
      { fprintf(rlogfp,"Saved: %s\n",logline) ;	/* update "root" log */
	fclose(rlogfp) ;

	/* install hard links to remaining newsgroups */
	for (i = 0 ; i <= numgroups ; i++)
	  { sprintf(tfile,"%s/%s",pathlist[i],content[4]) ;
	    sprintf(tmpbuf,"%s/log",pathlist[i]) ;
	    link(installfile,tfile) ;
	    flogfp = fopen(tmpbuf,"a") ;
	    fprintf(flogfp,"%s\n",logline) ;
	    fclose(flogfp) ;
	  }
      }
    else if (t_flag)
      { fprintf(rlogfp,"Saved: %s\n",logline) ;	/* update "root" log */
	fclose(rlogfp) ;

	/* update "leaf" log for each leaf */
	for (i = 0 ; i <= numgroups ; i++)
	  { sprintf(tmpbuf,"%s/log",pathlist[i]) ;
	    flogfp = fopen(tmpbuf,"a") ;
	    fprintf(flogfp,"%s\n",logline) ;
	    fclose(flogfp) ;
	  }
      }
    else	/* update only KROOT log */
      { fprintf(rlogfp,"Duplicate request: %s\n",content[4]) ;
	fclose(rlogfp) ;
      }
    exit(0) ;
}

char *
match(s,t) char *s, *t;   	/* return end of t in s, NULL if none */
{   while (*t != NULL && *s++ == *t++) ;
    if (*t == NULL) return(s) ;
    else return(NULL) ;
}

getline(s,lim,f) char *s, *f;	/* get line into s from f & return length */
{   int c, i;			/* don't include newline; -1 if len > lim */
    char *ftmp;

    ftmp = index(f,'\n');
    *ftmp = NULL;
    if (strlen(f) <= lim)
	strcpy(s,f);
    else
	return(-1);
    *ftmp++ = '\n';
    f = ftmp;
    return(strlen(s));
}
AlBeRtEiNsTeIn
 
echo Inspecting for damage in transit...
temp=/tmp/shar$$; dtemp=/tmp/.shar$$
trap "rm -f $temp $dtemp; exit" 0 1 2 3 15
cat > $temp <<\!!!
      18      40     298 Makefile
       6      31     205 README
      67     415    2700 keepnews.1
     223     973    7038 keepnews.c
     314    1459   10241 total
!!!
wc  Makefile README keepnews.1 keepnews.c | sed 's=[^ ]*/==' | diff -b $temp - >$dtemp
if [ -s $dtemp ]
then echo "Ouch [diff of wc output]:" ; cat $dtemp
else echo "No problems found."
fi
exit 0