[comp.sources.misc] v16i084: News to Archive, Part01/01

fmc@cnam.cnam.fr (Frederic Chauveau) (01/29/91)

Submitted-by: fmc@cnam.cnam.fr (Frederic Chauveau)
Posting-number: Volume 16, Issue 84
Archive-name: news_split/part01

OK, here is version 1.5. It has been tested on Ultrix. I've provided
a replacement for scandir in case your system lacks that function. I
couldn't test it on other machines, but it compiles OK on a Sony RISC
(4.3BSD, looking a bit like MIPS/OS). There is still no man page, but
there is a (simple-minded) Makefile. I have run it for the last couple
of months on comp.sources.* with (apparently) no trouble.

I've added two compilation flags:

NOUNISTD if you don't have /usr/include/unistd.h (sys/file.h is
         included instead).
DIRENT   if your directory entry type is not called struct direct
         (the default). Define it to the type name your system uses,
         e.g. struct dirent on Ultrix (although Ultrix provides an
         alias).

It should run OK on all BSD-derived versions of Un*x; I don't know
about SysV.

Frederic Chauveau
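
A note on the scandir replacement in news_split.c (enabled with
-DNOSCANDIR): it stores the pointers returned by readdir(), and readdir()
may overwrite its buffer on the next call for the same directory stream.
A replacement probably wants to copy each entry instead. Below is a
minimal sketch of that idea, assuming a POSIX <dirent.h>. The names
scan_names and starts_with_digit are hypothetical and not part of the
posting, and unlike scandir this returns plain name strings rather than
directory entries, so it is not a drop-in substitute.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Selector in the spirit of DirSelect: keep names starting with a digit. */
static int starts_with_digit(const char *name)
{
    return isdigit((unsigned char)name[0]) != 0;
}

/*
 * Hypothetical helper (not in the posting): collect private copies of the
 * selected entry names in dir. Returns the number of names stored in *out,
 * or -1 on error. The caller frees each name and then the table.
 */
static int scan_names(const char *dir, char ***out,
                      int (*select)(const char *))
{
    DIR *dirp = opendir(dir);
    struct dirent *entry;
    char **tab = NULL, **grown;
    char *copy;
    size_t len;
    int count = 0, cap = 0, ncap;

    if (dirp == NULL)
        return -1;
    while ((entry = readdir(dirp)) != NULL) {
        if (select != NULL && !select(entry->d_name))
            continue;
        if (count == cap) {                 /* grow the table as needed */
            ncap = cap ? cap * 2 : 16;
            grown = realloc(tab, (size_t)ncap * sizeof *grown);
            if (grown == NULL)
                goto fail;
            tab = grown;
            cap = ncap;
        }
        len = strlen(entry->d_name) + 1;
        copy = malloc(len);                 /* copy: readdir may reuse entry */
        if (copy == NULL)
            goto fail;
        memcpy(copy, entry->d_name, len);
        tab[count++] = copy;
    }
    closedir(dirp);
    *out = tab;
    return count;

fail:
    while (count > 0)
        free(tab[--count]);
    free(tab);
    closedir(dirp);
    return -1;
}
```

Calling scan_names(".", &names, starts_with_digit) from inside a spool
directory would give the article file names, which is what ProcGroup
needs from scandir.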

---- Cut Here and unpack ----
#!/bin/sh
# This is news_split, a shell archive (shar 3.32)
# made 01/21/1991 10:14 UTC by fmc@cnam.cnam.fr
# Source directory /users/labinf/fmc/tools/news_split
#
# existing files will NOT be overwritten
#
# This shar contains:
# length  mode       name
# ------ ---------- ------------------------------------------
#   2608 -rw-r--r-- README
#    196 -rw-r--r-- Makefile
#   6441 -rw-r--r-- news_split.c
#
if touch 2>&1 | fgrep 'amc' > /dev/null
 then TOUCH=touch
 else TOUCH=true
fi
# ============= README ==============
if test X"$1" != X"-c" -a -f 'README'; then
	echo "File already exists: skipping 'README'"
else
echo "x - extracting README (Text)"
sed 's/^X//' << 'SHAR_EOF' > README &&
X/*
X * news_split		[fmc] 15/01/91		Version 1.5
X *			Browse a given NewsGroup (usualy comp.sources.???
X *			and try to update/maintain a directory for the
X *			sources found in it. 
X *			I use it to maintain the ~ftp/pub/comp.sources.???
X *			from the equivalent News directories.
X *
X *			Variables for you to overide :
X *			- LOGFILE a one argument string to generate 
X *	 		  the log file name. Default to "%s.log" where
X *			  %s will be the group name (eg. comp.sources.???)
X *			- NEWS_SPOOL the root of the News Spool tree. 
X *			  When looking for group.misc.name we will cd to
X *			  NEWS_SPOOL/group/misc/name
X *			- FTP_DIR the root of destination tree. Usualy
X *			  the absolute name of ~ftp/pub.
X */
X
XUsage: news_split [-sn] group_name ... group_name
X
XThe group_name is in Usenet format (eg. comp.sources.games). A file with
Xthe same name is created, containing the Min and Max article number already
Xprocessed. 
X
XA log file is created for each group processed. This logfile can be used to
Xmaintain an index of the created files. (See LOGFILE above)
X
XIf the '-s' flag is present, new files will supercede old files in case of
Xconflicting names. This is the default.
X
XIf the '-n' flag is present, new files will not supercede old files.
X
XYou can merge group_name and options on the command line.
X
XKnown Bugs/Limitations. 
X
XIt works on Ultrix 4.0 and later. The main system dependencies should
Xbe the scandir function. I've provided a replacement but it has not been
Xchecked. 
X
XThe header format is the most limitating factor. We're looking for :
X
Xa Subject line ("^Subject:") 
Xa Posting-number line ("^Posting-number:") and
Xa Archive-name line ("^Archive-name:").
X
Xin the first 30 lines, or before the first line beginning with a '#',
Xwhichever happend first.
X
XFrom the Subject line we get a first subject as the word following Subject:
XFrom the Post line we get a Volume-number and perhaps an Issue number.
XFrom the Arch line (if present) we get a destination filename.
X
XThe destination file will be :
X
X  if No Archive-name line, and Posting-number isn't recognized as an
X     Information posting, we copy the file under the subject name (i.e.
X     vxxxcnnn usualy.)
X
X  if We have a volume number VV (from the Posting-number line) and an issue
X     number, we save as FTP_DIR/group_name/volumeVV/filename, where
X     filename comes from the archive-name line.
X
X  if we have a volume number VV (from the Posting-number line) and an info
X     number NN (From the Posting-number line) we save as
X     FTP_DIR/group_name/volumeVV/InfoNN.
X
XElse we warn that we couldn't parse the Posting-number line.
X
SHAR_EOF
$TOUCH -am 0121095891 README &&
chmod 0644 README ||
echo "restore of README failed"
set `wc -c README`;Wc_c=$1
if test "$Wc_c" != "2608"; then
	echo original size 2608, current size $Wc_c
fi
fi
# ============= Makefile ==============
if test X"$1" != X"-c" -a -f 'Makefile'; then
	echo "File already exists: skipping 'Makefile'"
else
echo "x - extracting Makefile (Text)"
sed 's/^X//' << 'SHAR_EOF' > Makefile &&
X
X# for system with no scandir function add -DNOSCANDIR to CFLAGS
X
XCFLAGS = -O
X
X# you should add -ldir
X
XLDFLAGS =
X
Xnews_split : news_split.o
X	$(CC) $(CFLAGS) news_split.o -o news_split $(LDFLAGS)
X
SHAR_EOF
$TOUCH -am 0121095291 Makefile &&
chmod 0644 Makefile ||
echo "restore of Makefile failed"
set `wc -c Makefile`;Wc_c=$1
if test "$Wc_c" != "196"; then
	echo original size 196, current size $Wc_c
fi
fi
# ============= news_split.c ==============
if test X"$1" != X"-c" -a -f 'news_split.c'; then
	echo "File already exists: skipping 'news_split.c'"
else
echo "x - extracting news_split.c (Text)"
sed 's/^X//' << 'SHAR_EOF' > news_split.c &&
X/*
X * news_split		[fmc] 15/01/91		Version 1.5
X *			Browse a given NewsGroup (usualy comp.sources.???
X *			and try to update/maintain a directory for the
X *			sources found in it. 
X *			I use it to maintain the ~ftp/pub/comp.sources.???
X *			from the equivalent News directories.
X *
X *			Variables for you to overide :
X *			- LOGFILE a one argument string to generate 
X *	 		  the log file name. Default to "%s.log" where
X *			  %s will be the group name (eg. comp.sources.???)
X *			- NEWS_SPOOL the root of the News Spool tree. 
X *			  When looking for group.misc.name we will cd to
X *			  NEWS_SPOOL/group/misc/name
X *			- FTP_DIR the root of destination tree. Usualy
X *			  the absolute name of ~ftp/pub.
X *			
X *			Compile with $(CC) -o news_split news_split.c
X *			Usage: news_split [-sn] group_name ... group_name
X *
X *	CopyLeft and CopyWrong Frederic Chauveau [fmc@cnam.cnam.fr]
X */
X
X#include <stdio.h>
X#ifdef NOUNISTD
X#include <sys/file.h>
X#else
X#include <unistd.h>
X#endif
X#include <ctype.h>
X#include <sys/types.h>
X#include <sys/dir.h>
X
X#ifndef LOGFILE
X#define LOGFILE "%s.log"
X#endif /* no LOGFILE */
X
X#ifndef NEWS_SPOOL
X#define NEWS_SPOOL "/usr/spool/news"
X#endif /* no NEWS_SPOOL */
X
X#ifndef FTP_DIR
X#define FTP_DIR "/local/ftp/pub"
X#endif /* no FTP_DIR */
X
XFILE *logfile;
Xchar Supercede = 1;
X
X#ifndef DIRENT
X#define DIRENT struct direct
X#endif
X
X#ifdef NOSCANDIR
X
Xint scandir(DirName,NameList,Select,Sort)
Xchar *DirName;
XDIRENT *(*NameList[]);
Xint (*Select)(), (*Sort)(); {
X  int c = 0;
X  DIRENT **tab, *entry;
X  DIR *dirp;
X
X  dirp = opendir(DirName);
X  while (entry = readdir(dirp))
X    c += (Select ? (*Select)(entry) : 1);
X  tab = (DIRENT **) malloc(c * sizeof(DIRENT *));
X  rewinddir(dirp); c = 0;
X  while (entry = readdir(dirp))
X    if (!Select || (*Select)(entry))
X      tab[c++] = entry;
X  if (Sort)
X    qsort(tab,c,sizeof(DIRENT *),Sort);
X  *NameList = tab;
X  return c;
X}
X
X#endif /* NOSCANDIR */  
X
XDirSelect(dp)
XDIRENT *dp; {
X  return isdigit(dp->d_name[0]);
X}
X
Xvoid CreateDirAndFile(dirn,infile)
Xchar *dirn;
XFILE *infile; {
X  FILE *outfile;
X  char *p = dirn;
X  char buf[BUFSIZ];
X
X  while (p = (char *) strchr(p+1,'/'))
X    {
X      *p = '\0';
X      mkdir(dirn,0766);
X      *p = '/';
X    }
X  if (access(dirn,F_OK) != -1)
X    {
X      fprintf(stderr,"  Warning: %s %s\n",
X	      (Supercede ? "Superceding" : "Not Superceding"),dirn);
X      fprintf(logfile,"  Warning: %s %s\n",
X	      (Supercede ? "Superceding" : "Not Superceding"),dirn);
X      if (!Supercede)
X	return;
X    }
X  outfile = fopen(dirn,"w");
X  if (!outfile)
X    {
X      FILE *foo = stderr;
X      *stderr = *logfile;
X      perror(dirn);
X      *stderr = *foo;
X      perror(dirn);
X    }
X  else
X    {
X      rewind(infile);
X      while (fgets(buf,BUFSIZ,infile))
X	fputs(buf,outfile);
X      fclose(outfile);
X    }
X}  
X
Xvoid SaveIt(pnumb,aname,subj,infile,group)
Xchar *pnumb, *aname, *group;
XFILE *infile; {
X  int vol, iss;
X  char dirn[BUFSIZ];
X
X  pnumb[strlen(pnumb)-1] = '\0';
X  if (sscanf(pnumb,"Posting-number: Volume %d, Info%*[^0-9] %d",&vol,&iss) != 2)
X    {
X      if (sscanf(pnumb,"Posting-number: Volume %d %*[^0-9] %d",&vol,&iss) != 2)
X	{
X	  fprintf(stderr," Couldn't get volume for article [%s]\n",pnumb);
X	  fprintf(logfile," Couldn't get volume for article [%s]\n",pnumb);
X	  sprintf(dirn,"%s/%s/%s",FTP_DIR,group,subj);
X	  CreateDirAndFile(dirn,infile);
X	}
X    }
X  else
X      sprintf(aname,"Archive-name: Info%d\n",iss);
X  if (!aname[0])
X    {
X      sprintf(dirn,"%s/%s/%s",FTP_DIR,group,subj);
X      CreateDirAndFile(dirn,infile);
X    }
X  else
X    {
X      aname += 14; aname[strlen(aname)-1] = '\0';
X      sprintf(dirn,"%s/%s/volume%d/%s",FTP_DIR,group,vol,aname);
X      fprintf(logfile,"Volume %3d, Issue %3d :\t\t%s\n",vol,iss,aname);
X      CreateDirAndFile(dirn,infile);
X    }
X}
X
Xvoid ProcArticle(art,group)
Xchar *art, *group; {
X  FILE *inp;
X  char buf[BUFSIZ], postnum[BUFSIZ], archnam[BUFSIZ], subject[BUFSIZ];
X  int i = 0;
X
X  if (!(inp = fopen(art,"r")))
X    {
X      perror(art);
X      return;
X    }
X  *postnum = *archnam = *subject = '\0';
X  while (fgets(buf,BUFSIZ,inp))
X    {
X      if ((*postnum && *archnam && *subject) ||
X	  (*buf == '#') || (i > 30))
X	{
X	  SaveIt(postnum,archnam,subject,inp,group);
X	  break;
X	}
X      if (!strncmp("Subject:",buf,8))
X	sscanf(buf,"Subject: %[^ :]",subject);
X      else if (!strncmp("Posting-number",buf,14))
X	strcpy(postnum,buf);
X      else if (!strncmp("Archive-name",buf,12))
X	strcpy(archnam,buf);
X      i++;
X    }
X  fclose(inp);
X}
X
XGroupToDir(group,dirname)
Xchar *group, *dirname; {
X  char *p;
X  strcpy(dirname,group);
X  
X  for (p = (char *) strchr(dirname,'.'); p; p = (char *) strchr(p,'.'))
X    *p++ = '/';
X}
X
Xvoid ProcGroup(group,first,last)
Xint *first, *last;
Xchar *group; {
X  register int i, art_num, max, nf = *first, nl = *last;
X  char dirname[BUFSIZ];
X  DIRENT **namelist;
X
X  chdir(NEWS_SPOOL);
X  GroupToDir(group,dirname);
X  chdir(dirname);
X  max = scandir(".",&namelist,DirSelect,NULL);
X  if (max == -1)
X    {
X      perror(dirname);
X      fprintf(stderr,"Cannot scan %s/%s\n",NEWS_SPOOL,group);
X      return;
X    }
X  for (i = 0; i < max; i++)
X    {
X      art_num = atoi(namelist[i]->d_name);
X      if (art_num < *first)
X	ProcArticle(namelist[i]->d_name,group);
X      else if (art_num > *last)
X	ProcArticle(namelist[i]->d_name,group);
X      if (art_num && art_num < nf)
X	nf = art_num;
X      if (art_num && art_num > nl)
X	nl = art_num;
X    }
X  *first = nf;
X  *last = nl;
X}
X
Xvoid Usage(s)
Xchar *s; {
X  fprintf(stderr,"Unknown option %s, valid flags are -s or -n\n",s);
X}
X
Xmain(argc,argv)
Xchar **argv; {
X  FILE *tmp;
X  int first, last;
X  char cwd[BUFSIZ], Logname[BUFSIZ];
X  
X  getcwd(cwd,BUFSIZ);
X  while (++argv, --argc)
X    {
X      if (**argv == '-')
X	{
X	  switch (argv[0][1])
X	    {
X	    case 's' : case 'S' : Supercede = 1; break;
X	    case 'n' : case 'N' : Supercede = 0; break;
X	    default: Usage(*argv); exit(1);
X	    }
X	  continue;
X	}
X      tmp = fopen(*argv,"r");
X      sprintf(Logname,LOGFILE,*argv);
X      logfile = fopen(Logname,"a");
X
X      if (!tmp)
X	{
X	  first = 9999;
X	  last = 0;
X	}
X      else
X	{
X	  fscanf(tmp,"%d %d",&first,&last);
X	  fclose(tmp);
X	}
X      fprintf(logfile,"Processing group %s (%d %d)\n",*argv,first,last);
X      fprintf(stderr,"Processing group %s (%d %d)\n",*argv,first,last);
X      ProcGroup(*argv,&first,&last);
X      chdir(cwd);
X      tmp = fopen(*argv,"w");
X      fprintf(tmp,"%d %d\n",first,last);
X      fclose(tmp);
X      fclose(logfile);
X    }
X}
X
X	  
SHAR_EOF
$TOUCH -am 0121101391 news_split.c &&
chmod 0644 news_split.c ||
echo "restore of news_split.c failed"
set `wc -c news_split.c`;Wc_c=$1
if test "$Wc_c" != "6441"; then
	echo original size 6441, current size $Wc_c
fi
fi
exit 0
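
The destination rule described in the README can be sketched as a small
standalone function. This is only an illustration under assumptions: it
reuses the two sscanf patterns from SaveIt, but dest_path and its
parameters are hypothetical names, and the fallback of saving under the
subject name when no Archive-name line is present is left out (the sketch
simply reports failure in that case).

```c
#include <stdio.h>

/*
 * Hypothetical helper (not in the posting): build the destination path
 * for one article. posting_line is the Posting-number header without its
 * trailing newline; archive_name is the value of the Archive-name header,
 * or NULL if the article has none. Returns 0 on success, -1 if the line
 * cannot be parsed, the archive name is missing, or out is too small.
 */
static int dest_path(char *out, size_t outsz, const char *ftp_dir,
                     const char *group, const char *posting_line,
                     const char *archive_name)
{
    int vol, num, len;

    /* Information posting: FTP_DIR/group/volumeVV/InfoNN */
    if (sscanf(posting_line,
               "Posting-number: Volume %d, Info%*[^0-9] %d",
               &vol, &num) == 2)
        len = snprintf(out, outsz, "%s/%s/volume%d/Info%d",
                       ftp_dir, group, vol, num);
    /* Ordinary issue: FTP_DIR/group/volumeVV/<archive name> */
    else if (archive_name != NULL &&
             sscanf(posting_line,
                    "Posting-number: Volume %d %*[^0-9] %d",
                    &vol, &num) == 2)
        len = snprintf(out, outsz, "%s/%s/volume%d/%s",
                       ftp_dir, group, vol, archive_name);
    else
        return -1;

    /* Treat truncation as an error rather than using a clipped path. */
    return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
}
```

With the headers of this very posting (Volume 16, Issue 84, archive name
news_split/part01) and the default FTP_DIR, the sketch yields
/local/ftp/pub/comp.sources.misc/volume16/news_split/part01.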

							[fmc]

-------------------------------------------------------------------------------
Frederic Chauveau		     Conservatoire National des Arts et Metiers
fmc@cnam.cnam.fr
-------------------------------------------------------------------------------
Paradise is exactly like where you are right now, only much, much better.
							   William S. Burroughs
-------------------------------------------------------------------------------

exit 0 # Just in case...
-- 
Kent Landfield                   INTERNET: kent@sparky.IMD.Sterling.COM
Sterling Software, IMD           UUCP:     uunet!sparky!kent
Phone:    (402) 291-8300         FAX:      (402) 291-4362
Please send comp.sources.misc-related mail to kent@uunet.uu.net.