[comp.sources.misc] v17i001: news_split1.6 - News to Archive, Part01/01

fmc@cnam.cnam.fr (Frederic Chauveau) (02/19/91)

Submitted-by: Frederic Chauveau <fmc@cnam.cnam.fr>
Posting-number: Volume 17, Issue 1
Archive-name: news_split1.6/part01

This is version 1.6 of news_split, a tool that creates and updates an
archive tree from the *.sources.* Usenet newsgroups.

Frederic
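
The destination rules described in the README can be summarised in a few
lines of C. What follows is only a sketch under stated assumptions, not the
code shipped in the archive: dest_path() and its parameters are hypothetical
names, header lines are assumed to arrive with their trailing newline already
removed (an empty string meaning "header not found"), and the exact wording of
an Info posting line ("Volume N, Info M") is a guess.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * dest_path() is a hypothetical helper, not part of news_split.c.
 * It builds the destination path for one article from its header lines.
 * Pass "" for a header that was not found.  Returns 0 on success, -1 if
 * a Posting-number line is present but cannot be parsed.
 */
int dest_path(char *out, size_t outsz, const char *ftp_dir,
              const char *group, const char *posting,
              const char *archive, const char *subject,
              const char *artnum)
{
    const char *prefix = "Archive-name: ";
    int vol, num;

    assert(out != NULL && outsz > 0);

    /* No Posting-number line: store under the article number. */
    if (posting[0] == '\0') {
        snprintf(out, outsz, "%s/%s/%s", ftp_dir, group, artnum);
        return 0;
    }
    /* Information posting (assumed form: "Volume N, Info M"). */
    if (sscanf(posting, "Posting-number: Volume %d, Info %d", &vol, &num) == 2) {
        snprintf(out, outsz, "%s/%s/volume%d/Info%d", ftp_dir, group, vol, num);
        return 0;
    }
    /* Regular posting: "Volume N, Issue M". */
    if (sscanf(posting, "Posting-number: Volume %d, Issue %d", &vol, &num) != 2)
        return -1;

    if (strncmp(archive, prefix, strlen(prefix)) != 0)
        /* No Archive-name line: fall back to the first word of the subject. */
        snprintf(out, outsz, "%s/%s/%s", ftp_dir, group, subject);
    else
        /* Volume directory plus the name given on the Archive-name line. */
        snprintf(out, outsz, "%s/%s/volume%d/%s", ftp_dir, group, vol,
                 archive + strlen(prefix));
    return 0;
}
```

Fed the headers of this very posting with FTP_DIR set to /local/ftp/pub, the
sketch yields /local/ftp/pub/comp.sources.misc/volume17/news_split1.6/part01.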
----
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then feed it
# into a shell via "sh file" or similar.  To overwrite existing files,
# type "sh file -c".
# The tool that generated this appeared in the comp.sources.unix newsgroup;
# send mail to comp-sources-unix@uunet.uu.net if you want that tool.
# If this archive is complete, you will see the following message at the end:
#		"End of archive 1 (of 1)."
# Contents:  README INSTALL Makefile news_split.c
# Wrapped by kent@sparky on Mon Feb 18 16:23:01 1991
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'README' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'README'\"
else
echo shar: Extracting \"'README'\" \(2932 characters\)
sed "s/^X//" >'README' <<'END_OF_FILE'
X/*
X * news_split		[fmc] 09/02/91		Version 1.6
X *			Browse a given NewsGroup (usually comp.sources.???)
X *			and try to update/maintain a directory of the
X *			sources found in it.
X *			I use it to maintain the ~ftp/pub/comp.sources.???
X *			from the equivalent News directories.
X *
X *	CopyLeft and CopyWrong Frederic Chauveau [fmc@cnam.cnam.fr]
X */
X
XUsage: news_split [-s|-n] [-l logdir] [-f groupfile | group1 ... groupN]
X
XThe group_name is in Usenet format (e.g. comp.sources.games). A file with
Xthe same name is created, containing the Min and Max article number already
Xprocessed.
X
XA log file is created for each group processed. This logfile can be used to
Xmaintain an index of the created files. These logfiles are created in the
XLOGDIR directory (same as the destination directory by default). You can
Xspecify another directory with the '-l' option.
X
XIf the '-s' flag is present, new files will supersede old files in case of
Xconflicting names. This is the default.
X
XIf the '-n' flag is present, new files will not supersede old files.
X
XGroup names are specified either on the command line or with the
X'-f groupfile' option. In the latter case, groupfile is a list of group names,
Xone per line.
X
XYou can merge group_name and options on the command line.
X
X
XKnown Bugs/Limitations. 
X
XIt works on Ultrix 4.0 and later. The main system dependency should
Xbe the scandir function. I have provided a replacement, but it has not been
Xchecked.
X
XThe header format is the most restrictive factor. We're looking for:
X
Xa Subject line ("^Subject:")
Xa Posting-number line ("^Posting-number:") and
Xan Archive-name line ("^Archive-name:").
X
Xin the first 30 lines, or before the first line beginning with a '#',
Xwhichever happens first.
X
XFrom the Subject line we get a first subject as the word following Subject:
XFrom the Post line we get a Volume-number and perhaps an Issue number.
XFrom the Arch line (if present) we get a destination filename.
X
XThe destination file will be :
X
X  if No Posting-number line, we copy the file as the article number.
X
X  if No Archive-name line, and Posting-number isn't recognized as an
X     Information posting, we copy the file under the subject name (i.e.
X     vxxxcnnn usually.)
X
X  if We have a volume number VV (from the Posting-number line) and an issue
X     number, we save as FTP_DIR/group_name/volumeVV/filename, where
X     filename comes from the archive-name line.
X
X  if we have a volume number VV (from the Posting-number line) and an info
X     number NN (From the Posting-number line) we save as
X     FTP_DIR/group_name/volumeVV/InfoNN.
X
XElse we warn that we couldn't parse the Posting-number line.
X
XMy current use for it is to run it once a day on the following groups :
X
Xcomp.sources.amiga	comp.sources.apple2	comp.sources.atari
Xcomp.sources.games	comp.sources.mac	comp.sources.misc
Xcomp.sources.sun	comp.sources.unix	comp.sources.x
Xcomp.binaries.amiga	comp.binaries.atari	comp.binaries.ibm.pc
X
END_OF_FILE
if test 2932 -ne `wc -c <'README'`; then
    echo shar: \"'README'\" unpacked with wrong size!
fi
# end of 'README'
fi
if test -f 'INSTALL' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'INSTALL'\"
else
echo shar: Extracting \"'INSTALL'\" \(1931 characters\)
sed "s/^X//" >'INSTALL' <<'END_OF_FILE'
Xnews_split Version 1.6		Bugs, Gripes, etc... to fmc@cnam.cnam.fr
X				Last Modification:	12 Feb 91
X
XBefore compiling, check to see if the default values for the following
Xdefines are acceptable:
X
X- NEWS_SPOOL	(/usr/spool/news)
X  The root directory of the news tree. news_split will try to find
X  articles in this tree.
X
X- FTP_DIR	(/local/ftp/pub)
X  The root directory where the destination files will be created. I
X  use it to update an anonymous ftp server.
X
X- LOGFILE	(%s/Index.ns)
X  A format string for printf.
X- LOGDIR	(FTP_DIR)
X  The root directory for logfiles.
X  A log file for comp.sources.misc will be created as :
X	LOGDIR/comp.sources.misc/Index.ns
X
X
XCompilation options and caveats :
X
X- For systems without /usr/include/unistd.h, add -DNOUNISTD to CFLAGS.
X  It will include <sys/file.h> instead. The only purpose is to get
X  defines for access() arguments.
X
X- For systems without the scandir() function, add -DNOSCANDIR to CFLAGS.
X  A scandir replacement is provided, but hasn't been extensively
X  tested. Purpose is to fill an array with 'interesting' file names.
X
X- Some systems have no DIRENT variable type (usually defined in
X  <sys/dir.h>). In most cases it is similar to 'struct direct'. I
X  think you need the BSD-style directory functionalities.
X
X
X
XFor news_split to work, you need read access to your NEWS_SPOOL
Xdirectory (usually the case) and write access to FTP_DIR and LOGDIR
Xdirectories.
X
X
XIf you encounter troubles with news_split, don't sue me, warn me :-)
X
X
X							[fmc]
X
X-------------------------------------------------------------------------------
XFrederic Chauveau		     Conservatoire National des Arts et Metiers
Xfmc@cnam.cnam.fr
X-------------------------------------------------------------------------------
XParadise is exactly like where you are right now, only much, much better.
X							   William S. Burroughs
X-------------------------------------------------------------------------------
X
X
X
END_OF_FILE
if test 1931 -ne `wc -c <'INSTALL'`; then
    echo shar: \"'INSTALL'\" unpacked with wrong size!
fi
# end of 'INSTALL'
fi
if test -f 'Makefile' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'Makefile'\"
else
echo shar: Extracting \"'Makefile'\" \(539 characters\)
sed "s/^X//" >'Makefile' <<'END_OF_FILE'
XALLFILES = news_split.c Makefile README INSTALL
X
X# for system with no scandir function add -DNOSCANDIR to CFLAGS
X# for system with no <unistd.h> file, add -DNOUNISTD  to CFLAGS
X
XCFLAGS = -O
X
X# Perhaps you need to add -ldir to LDFLAGS.
X
XLDFLAGS =
X
XDESTDIR = /usr/local/bin
XINSTALL = install -c
X
Xnews_split : news_split.o
X	$(CC) $(CFLAGS) news_split.o -o news_split $(LDFLAGS)
X
Xclean :
X	rm -f news_split *.o *~ core
X
Xinstall : news_split
X	$(INSTALL) news_split $(DESTDIR)
X
Xshar :
X	shar -n news_split -a -x -c  -o news_split.shar $(ALLFILES)
END_OF_FILE
if test 539 -ne `wc -c <'Makefile'`; then
    echo shar: \"'Makefile'\" unpacked with wrong size!
fi
# end of 'Makefile'
fi
if test -f 'news_split.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'news_split.c'\"
else
echo shar: Extracting \"'news_split.c'\" \(7587 characters\)
sed "s/^X//" >'news_split.c' <<'END_OF_FILE'
X/*
X * news_split		[fmc] 09/02/91		Version 1.6
X *			Browse a given NewsGroup (usually comp.sources.???)
X *			and try to update/maintain a directory of the
X *			sources found in it.
X *			I use it to maintain the ~ftp/pub/comp.sources.???
X *			from the equivalent News directories.
X *
X *			Usage: news_split [-sn] [-l logdir] group ... group
X *			    or news_split [-sn] [-l logdir] -f groupfile
X *
X *			In the second case, groupfile is a file containing 
X *			each group to be processed, one per line.
X *
X *	CopyLeft and CopyWrong Frederic Chauveau [fmc@cnam.cnam.fr]
X */
X
X#include <stdio.h>
X#ifdef NOUNISTD
X#include <sys/file.h>
X#else
X#include <unistd.h>
X#endif
X#include <ctype.h>
X#include <errno.h>
X#include <sys/types.h>
X#include <sys/dir.h>
X
X
X#ifndef NEWS_SPOOL
X#define NEWS_SPOOL "/usr/spool/news"
X#endif /* no NEWS_SPOOL */
X
X#ifndef FTP_DIR
X#define FTP_DIR "/local/ftp/pub"
X#endif /* no FTP_DIR */
X
X#ifndef LOGFILE
X#define LOGFILE "%s/Index.ns"
X#endif /* no LOGFILE */
X
X#ifndef LOGDIR
X#define LOGDIR FTP_DIR
X#endif
X
XFILE *logfile;
Xchar *logdir = LOGDIR;
X
Xchar logname[BUFSIZ];
Xchar Supercede_P = 1;
X
X#ifndef DIRENT
X#define DIRENT struct direct
X#endif
X
X#ifdef NOSCANDIR
X
Xint scandir(DirName,NameList,Select,Sort)
Xchar *DirName;
XDIRENT *(*NameList[]);
Xint (*Select)(), (*Sort)(); {
X  int c = 0;
X  DIRENT **tab, *entry;
X  DIR *dirp;
X
X  dirp = opendir(DirName);
X  while (entry = readdir(dirp))
X    c += (Select ? (*Select)(entry) : 1);
X  tab = (DIRENT **) malloc(c * sizeof(DIRENT *));
X  rewinddir(dirp); c = 0;
X  while (entry = readdir(dirp))
X    if (!Select || (*Select)(entry))
X      tab[c++] = entry;
X  if (Sort)
X    qsort(tab,c,sizeof(DIRENT *),Sort);
X  *NameList = tab;
X  return c;
X}
X
X#endif /* NOSCANDIR */  
X
XDirSelect(dp)
XDIRENT *dp; {
X  return isdigit(dp->d_name[0]);
X}
X
Xvoid CreateDirAndFile(dirname,infile)
Xchar *dirname;
XFILE *infile; {
X  FILE *outfile;
X  char *p = dirname;
X  char buf[BUFSIZ];
X
X  while (p = (char *) strchr(p+1,'/'))
X    {
X      *p = '\0';
X      if (mkdir(dirname,0755) && (errno != EEXIST))
X	{
X	  FILE *foo = stderr;
X	  *stderr = *logfile;
X	  perror(dirname);
X	  *stderr = *foo;
X	  perror(dirname);
X	  return;
X	}
X      *p = '/';
X    }
X  if (!infile)
X    return;
X  if (access(dirname,F_OK) != -1)
X    {
X      fprintf(stderr,"  Warning: %s %s\n",
X	      (Supercede_P ? "Superceding" : "Not Superceding"),dirname);
X      fprintf(logfile,"  Warning: %s %s\n",
X	      (Supercede_P ? "Superceding" : "Not Superceding"),dirname);
X      if (!Supercede_P)
X	return;
X    }
X  outfile = fopen(dirname,"w");
X  if (!outfile)
X    {
X      FILE *foo = stderr;
X      *stderr = *logfile;
X      perror(dirname);
X      *stderr = *foo;
X      perror(dirname);
X    }
X  else
X    {
X      rewind(infile);
X      while (fgets(buf,BUFSIZ,infile))
X	fputs(buf,outfile);
X      fclose(outfile);
X    }
X}  
X
Xvoid SaveIt(pnumb,aname,subj,anumb,infile,group)
Xchar *pnumb, *aname, *group, *subj, *anumb;
XFILE *infile; {
X  int vol, iss;
X  char dirname[BUFSIZ];
X
X  pnumb[strlen(pnumb)-1] = '\0';
X  if (sscanf(pnumb,"Posting-number: Volume %d, Info%*[^0-9] %d",&vol,&iss) != 2)
X    {
X      if (sscanf(pnumb,"Posting-number: Volume %d %*[^0-9] %d",&vol,&iss) != 2)
X	{
X	  fprintf(stderr," Couldn't get volume for article [%s]\n",anumb);
X	  fprintf(logfile," Couldn't get volume for article [%s]\n",anumb);
X	  sprintf(dirname,"%s/%s/%s",FTP_DIR,group,anumb);
X	  CreateDirAndFile(dirname,infile);
X	}
X    }
X  else
X      sprintf(aname,"Archive-name: Info%d\n",iss);
X  if (!aname[0])
X    {
X      sprintf(dirname,"%s/%s/%s",FTP_DIR,group,subj);
X      CreateDirAndFile(dirname,infile);
X    }
X  else
X    {
X      aname += 14; aname[strlen(aname)-1] = '\0';
X      sprintf(dirname,"%s/%s/volume%d/%s",FTP_DIR,group,vol,aname);
X      fprintf(logfile,"Volume %3d, Issue %3d :\t\t%s\n",vol,iss,aname);
X      CreateDirAndFile(dirname,infile);
X    }
X}
X
Xvoid ProcArticle(art,group)
Xchar *art, *group; {
X  FILE *inp;
X  char buf[BUFSIZ], postnum[BUFSIZ], archnam[BUFSIZ];
X  char subject[BUFSIZ];
X  int i = 0;
X
X  if (!(inp = fopen(art,"r")))
X    {
X      perror(art);
X      return;
X    }
X  *postnum = *archnam = *subject = '\0';
X  while (fgets(buf,BUFSIZ,inp))
X    {
X      if ((*postnum && *archnam && *subject) ||
X	  (*buf == '#') || (i > 30))
X	{
X	  SaveIt(postnum,archnam,subject,art,inp,group);
X	  break;
X	}
X      if (!strncmp("Subject:",buf,8)) 
X	sscanf(buf,"Subject: %[^ :]",subject);
X      else if (!strncmp("Posting-number",buf,14))
X	strcpy(postnum,buf);
X      else if (!strncmp("Archive-name",buf,12))
X	strcpy(archnam,buf);
X      i++;
X    }
X  fclose(inp);
X}
X
XGroupToDir(group,dirname)
Xchar *group, *dirname; {
X  char *p;
X  strcpy(dirname,group);
X  
X  for (p = (char *) strchr(dirname,'.'); p; p = (char *) strchr(p,'.'))
X    *p++ = '/';
X}
X
Xvoid ProcGroup(group,first,last)
Xint *first, *last;
Xchar *group; {
X  register int i, art_num, max, nf = *first, nl = *last;
X  char dirname[BUFSIZ];
X  DIRENT **namelist;
X
X  chdir(NEWS_SPOOL);
X  GroupToDir(group,dirname);
X  chdir(dirname);
X  max = scandir(".",&namelist,DirSelect,NULL);
X  if (max == -1)
X    {
X      perror(dirname);
X      fprintf(stderr,"Cannot scan %s/%s\n",NEWS_SPOOL,group);
X      return;
X    }
X  for (i = 0; i < max; i++)
X    {
X      art_num = atoi(namelist[i]->d_name);
X      if (art_num < *first)
X	ProcArticle(namelist[i]->d_name,group);
X      else if (art_num > *last)
X	ProcArticle(namelist[i]->d_name,group);
X      if (art_num && art_num < nf)
X	nf = art_num;
X      if (art_num && art_num > nl)
X	nl = art_num;
X    }
X  *first = nf;
X  *last = nl;
X}
X
XFILE *OpenLog(group_name)
Xchar *group_name; {
X  char tmp[BUFSIZ], ln[BUFSIZ];
X  FILE *logfile;
X
X  sprintf(tmp,LOGFILE,group_name);
X  sprintf(ln,"%s/%s",logdir,tmp);
X  CreateDirAndFile(ln,NULL);
X  if (!(logfile = fopen(ln,"w")))
X    {
X      perror(ln);
X      logfile = stderr;
X    }
X  return logfile;
X}
X
Xvoid ProcName(group_name, backtodir)
Xchar *group_name, *backtodir; {
X  FILE *tmp;
X  int first, last;
X  long now;
X
X  now = time((long *) 0);
X  logfile = OpenLog(group_name);
X  tmp = fopen(group_name,"r");
X  if (!tmp)
X    {
X      first = 9999;
X      last = 0;
X    }
X  else
X    {
X      fscanf(tmp,"%d %d",&first,&last);
X      fclose(tmp);
X    }
X  fprintf(logfile," -- Processing group %s (%d %d) at %s",group_name,first,
X	  last,ctime(&now));
X  fprintf(stderr,"Processing group %s (%d %d)\n",group_name,first,last);
X  ProcGroup(group_name,&first,&last);
X  chdir(backtodir);
X  tmp = fopen(group_name,"w");
X  fprintf(tmp,"%d %d\n",first,last);
X  fclose(tmp);
X  if (logfile != stderr)
X    fclose(logfile);
X  fprintf(stderr,"Processed  group %s (%d %d)\n",group_name,first,last);
X}
X
X
Xvoid Usage(s,pn)
Xchar *s, *pn; {
X  fprintf(stderr,"Unknown option %s\n",s);
X  fprintf(stderr,"Usage: %s [-sn] [-l logdir] [-f group_file | group1 .. groupn]\n",pn);
X  exit(1);
X}
X
Xmain(argc,argv)
Xchar **argv; {
X  FILE *from_file=0;
X  char cwd[BUFSIZ], *pname = *argv;
X
X  getcwd(cwd,BUFSIZ);
X  while (++argv, --argc)
X    {
X      if (**argv == '-')
X	{
X	  switch (argv[0][1])
X	    {
X	    case 's' : case 'S' : Supercede_P = 1; break;
X	    case 'n' : case 'N' : Supercede_P = 0; break;
X	    case 'l' : case 'L' : logdir = argv[1]; argv++; argc--; break;
X	    case 'f' : case 'F' : 
X	      from_file = fopen(argv[1],"r"); 
X	      if (!from_file)
X		{
X		  perror(argv[1]);
X		  argv++; argc--; 
X		}
X	      break;
X	    default: Usage(*argv,pname);
X	    }
X	  continue;
X	}
X      if (from_file)
X	{
X	  char gname[BUFSIZ];
X
X	  while (fgets(gname,BUFSIZ,from_file))
X	    {
X	      gname[strlen(gname)-1] = '\0';
X	      ProcName(gname,cwd);
X	    }
X	  fclose(from_file);
X	  from_file = NULL;
X	}
X      else 
X	ProcName(*argv,cwd);
X    }
X}
X
X	  
END_OF_FILE
if test 7587 -ne `wc -c <'news_split.c'`; then
    echo shar: \"'news_split.c'\" unpacked with wrong size!
fi
# end of 'news_split.c'
fi
echo shar: End of archive 1 \(of 1\).
cp /dev/null ark1isdone
MISSING=""
for I in 1 ; do
    if test ! -f ark${I}isdone ; then
	MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have the archive.
    rm -f ark[1-9]isdone
else
    echo You still need to unpack the following archives:
    echo "        " ${MISSING}
fi
##  End of shell archive.
exit 0
exit 0 # Just in case...
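
INSTALL notes that the scandir() replacement compiled in with -DNOSCANDIR has
not been extensively tested. One point worth keeping in mind when testing it:
readdir() is allowed to reuse its buffer from one call to the next, so an entry
that has to outlive the read loop should be copied. The sketch below shows one
way to collect the article files with that in mind. It is not a drop-in
scandir(): numeric_entries() and free_names() are hypothetical names, the
function returns copied file names rather than directory entries, and the
"first character is a digit" test simply mirrors what DirSelect() accepts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/*
 * numeric_entries() is a hypothetical helper, not the author's code.
 * It collects the names of all entries in `dir` whose first character is
 * a digit.  Every name is copied, because the buffer returned by readdir()
 * may be overwritten by the next call.  Returns the number of names, or -1
 * on error; release the result with free_names().
 */
int numeric_entries(const char *dir, char ***names)
{
    DIR *dirp;
    struct dirent *entry;
    char **tab = NULL;
    int count = 0, cap = 0;

    assert(dir != NULL && names != NULL);
    if ((dirp = opendir(dir)) == NULL)
        return -1;

    while ((entry = readdir(dirp)) != NULL) {
        if (!isdigit((unsigned char) entry->d_name[0]))
            continue;
        if (count == cap) {             /* grow the table when it is full */
            int ncap = cap ? cap * 2 : 16;
            char **grown = realloc(tab, (size_t) ncap * sizeof *grown);
            if (grown == NULL)
                goto fail;
            tab = grown;
            cap = ncap;
        }
        tab[count] = malloc(strlen(entry->d_name) + 1);
        if (tab[count] == NULL)
            goto fail;
        strcpy(tab[count], entry->d_name);
        count++;
    }
    closedir(dirp);
    *names = tab;
    return count;

fail:
    while (count > 0)
        free(tab[--count]);
    free(tab);
    closedir(dirp);
    return -1;
}

/* Free a table returned by numeric_entries(). */
void free_names(char **names, int count)
{
    while (count > 0)
        free(names[--count]);
    free(names);
}
```

A caller would loop over the returned names with atoi(), much as ProcGroup()
does with the d_name fields, and then hand the table to free_names().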
-- 
Kent Landfield                   INTERNET: kent@sparky.IMD.Sterling.COM
Sterling Software, IMD           UUCP:     uunet!sparky!kent
Phone:    (402) 291-8300         FAX:      (402) 291-4362
Please send comp.sources.misc-related mail to kent@uunet.uu.net.