[comp.sources.misc] v16i101: newsclean2.1 - tool to cleanup .newsrc, Part01/01

kluge@informatik.tu-muenchen.de (Oliver Kluge) (02/16/91)

Submitted-by: Oliver Kluge <kluge@informatik.tu-muenchen.de>
Posting-number: Volume 16, Issue 101
Archive-name: newsclean2.1/part01

This is the brand new 2.10 version of NewsClean, my .newsrc-compactor.
Please read README for details on how the program works and what
improvements are implemented in the current version.

Thanks to all of you who wrote me e-mail to tell me what you would
like to see improved in the new version!

Oliver Kluge
-------
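
The README in the archive below describes how NewsClean locates the
.newsrc file: NEWSRC, if set, is taken as the complete path; otherwise
HOME, LOGDIR and DOTDIR are tried in that order, and the current
directory is the last resort. A minimal sketch of that lookup order
follows. The function names pick_newsrc_path and find_newsrc are
hypothetical and do not appear in newsclean.c; the sketch also assumes a
C99 library for snprintf, which the original K&R source does not use.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of newsclean.c): choose the .newsrc
   path from already-fetched environment values. Any of the four
   string arguments may be NULL, meaning "variable not set". */
void pick_newsrc_path(const char *newsrc, const char *home,
                      const char *logdir, const char *dotdir,
                      char *out, size_t size)
{
    const char *dir;

    assert(out != NULL && size > 0);
    if (newsrc != NULL) {
        /* NEWSRC holds a complete file path and overrides the rest */
        snprintf(out, size, "%s", newsrc);
        return;
    }
    dir = home;
    if (dir == NULL) dir = logdir;
    if (dir == NULL) dir = dotdir;
    if (dir == NULL) dir = ".";      /* fall back to current directory */
    snprintf(out, size, "%s/.newsrc", dir);
}

/* Convenience wrapper that reads the real environment */
void find_newsrc(char *out, size_t size)
{
    pick_newsrc_path(getenv("NEWSRC"), getenv("HOME"),
                     getenv("LOGDIR"), getenv("DOTDIR"), out, size);
}
```

Keeping the decision in a function that takes plain strings makes the
order easy to check without touching the environment; snprintf also
bounds the write, whereas the fixed 80-byte buffers in the original are
filled with sprintf and strcpy.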

#!/bin/sh
# shar:	Shell Archiver  (v1.23)
#
#	Run the following text with /bin/sh to create:
#	  README
#	  Makefile
#	  newsclean.c
#
sed 's/^X//' << 'SHAR_EOF' > README &&
XVersion 2.10 of 15.2.1991
X
X- Introduced new option -q (quiet), telling NewsClean not to print
X  status messages of each stage of compression.
X- Option -h now gives an options summary of NewsClean's features. Upon
X  detection of illegal options, NewsClean will also show the summary.
X- Temporary files now go where they belong: in /tmp. To avoid
X  conflicts with other users, the filenames contain the pid of the
X  newsclean session that creates them.
X- Now error checking is performed on each opening of files.
X- Bug reported: The use of the markers of the compact routine core-
X  dumped on any non-VAX machine. Fixed (thanks to Dean Luick!).
X
XVersion 2.00 of 29.1.1991 
X
X- The program has been widely restructured in order to speed it up.
X- The program can now update your .newsrc to contain the latest
X  newsgroups. With command line switch -u (lower case u) new
X  newsgroups get listed (and sorted!) as unsubscribed, with -U
X  (upper case u) they get appended to the end of subscribed
X  newsgroups, themselves being subscribed-to. If you want to use
X  this feature, please check if you have the program "newsgroups"
X  available which prints out newsgroups not recorded in .newsrc.
X- The program is now able to deal with any .newsrc, not only with
X  rrn's. E.g. some news readers support "options" lines within .newsrc
X  (rrn doesn't). NewsClean now handles options correctly. They are
X  placed between subscribed and unsubscribed newsgroups.
X- Additionally, the program can now be run in any directory. If HOME
X  is set correctly, the program will automatically find .newsrc.
X  If HOME is not set, the program will try to access LOGDIR and 
X  DOTDIR (in that order). If all this fails, the current directory 
X  is assumed.
X- Furthermore, if the environment variable NEWSRC is set, the
X  complete file path in it is used instead of $HOME/.newsrc.
X- Also a small bug was fixed: NewsClean used to demand dictionary
X  sorting from sort by the flag -d. Together with flag -u, which
X  eliminates duplicate entries, it deleted C++ newsgroups if C
X  newsgroups were present, because sort recognizes duplicates
X  according to its key, and in dictionary sorting, there is no ++, so
X  C and C++ were virtually identical for sort. Very funny. Fixed.
X
XVersion 1.21 of 17.1.1991
X
X- Oops - rec.newsrc wasn't unlinked after use. Fixed.
X
XVersion 1.20 of 17.1.1991
X
X- Additional feature: Now the program can handle .newsrc files that
X  have invalid data structure. It is possible to remove the separating 
X  space between a newsgroup name and the read article counter without
X  rrn complaining immediately. Now newsclean can handle such
X  .newsrc and will correct the data structure.
X- Another small bug fixed: If the .newsrc's last subscribed-to
X  newsgroup had no read articles, it got merged with the first
X  unsubscribed newsgroup. Fixed.
X
XVersion 1.10 of 16.1.1991
X
X- A bug occurred only when there was a subscribed-to newsgroup with
X  no read articles followed by at least one additional subscribed-to
X  newsgroup with read articles. The bug is fixed.
X- An additional bug was fixed. The program did not optimally compact
X  read article counters that had combinations of skip marks and
X  ranges. Not really a bug, but now the result is even more compact.
X
XVersion 1.00 of 15.1.1991
X
XThis program compacts the .newsrc file of news readers. It does
Xthis by eliminating the read article counter of unsubscribed
Xnewsgroups. Additionally, the counters of subscribed-to newsgroups
Xget compacted by removing the unavailable article skip marks. It 
Xalso sorts the file so that all subscribed-to newsgroups appear at
Xthe beginning of the file. This also speeds up the newsreader because
Xit needs less time to seek your subscribed-to newsgroups. The 
Xunsubscribed ones get alphabetically sorted so whenever you want to 
Xrearrange your .newsrc, they will be neatly ordered (for convenience, 
Xcase is ignored). The order in which the subscribed-to newsgroups 
Xappear is left unchanged, so you can order them in your favorite 
Xreading order using vi.
X
XTo compile, have a look at Makefile and adjust as necessary.
XNewsclean was originally written for a DEC MicroVAX II under Ultrix 
Xto clean up rrn's .newsrc. Currently it is also tested on Sun's SunOS.
XAfter checking, simply type make.
X
XOliver Kluge
X
X                                         / relay.cs.net (CS-NET, ARPA)
Xkluge%lan.informatik.tu-muenchen.dbp.de@ - unido.uucp   (UUCP)
X                                         \ unido.bitnet (BITNET)
XTTTTTTUU  MUMMMMMMMM            Munich Institute of Technology
XTTTTTTUU  UMMMMMMMMM  Department of Mathematics and Computer Sciences SAB
X  TT  UU  MU  MM  MM           Laboratory for Parallel Computing
X  TT  UU  UM  MM  MM                    Arcisstrasse 21
X  TT  UU  MU  MM  MM                     8000-Munich 2
X  TT  UUUUUM  MM  MM              Federal Republic of Germany
X  TT  UUUUMU  MM  MM       Voice +49 89 2105-3251, Fax +49 89 2800529
X"Why stop now just when I'm hating it?" Marvin, the paranoid android
SHAR_EOF
chmod 0644 README || echo "restore of README fails"
sed 's/^X//' << 'SHAR_EOF' > Makefile &&
XCFLAGS=-O
X
Xnewsclean: newsclean.c
X	$(CC) $(CFLAGS) -o newsclean newsclean.c 
X	strip newsclean
X
Xlint:
X	lint newsclean.c > lint.out
X	
Xclean:
X	rm -f newsclean.o lint.out core
SHAR_EOF
chmod 0644 Makefile || echo "restore of Makefile fails"
sed 's/^X//' << 'SHAR_EOF' > newsclean.c &&
X/* ================================================================
X   |                                                              |
X   |   N E W S C L E A N                                          |
X   |                                                              |
X   |   Utility to clean up .newsrc. This program removes read     |
X   |   article count of unsubscribed newsgroups, eliminates       |
X   |   unavailable article skips in count, rearranges .newsrc so  |
X   |   subscribed newsgroups appear first and unsubscribed ones   |
X   |   get alphabetically sorted. By doing this, newsclean sig-   |
X   |   nificantly decreases .newsrc's size and increases the      |
X   |   newsreader's startup speed. Command line option -u adds    |
X   |   the latest newsgroups not recorded as unsubscribed, -U as  |
X   |   subscribed-to. Option -q activates quiet mode, no status   |
X   |   messages will be printed. -h prints options summary.       |
X   |   Version 2.10 (c) 15.2.1991 Oliver Kluge                    |
X   |   Syntax: newsclean [<-u><-U>] [-q] [-h]                     |
X   |                                                              |
X   |   Please send corrections or suggestions for improvement to: |
X   |   kluge%lan.informatik.tu-muenchen.dbp.de@relay.cs.net       |
X   |                                      ... @unido.uucp         |
X   |   >>>>> Bug reports *must* include:                          |
X   |   a) Version number of NewsClean                             |
X   |   b) Name and version of your newsreader                     |
X   |   c) Sample .newsrc which provokes the bug                   |
X   |   d) System name and operating system version you use.       |
X   |   No warranty, expressed or implied, is made that this pro-  |
X   |   gram fits a specific purpose or that it does no damage.    |
X   |   This program is free to be used as long as the copyright   |
X   |   notices are not removed or altered in any way.             |
X   |                                                              |
X   |==============================================================| */
X
X#include <stdio.h>
X#include <string.h>
X#include <ctype.h>
X
X#define FALSE 0
X#define TRUE  1
X
Xchar *getenv();
Xint UpdUns, UpdSub, Quiet;
XFILE *Old, *Clean, *Sub, *Opt, *Unsub, *Cmp, *New;
Xchar OldPath[80], ClnPath[80], SubPath[80], OptPath[80], UnsPath[80],
X     CmpPath[80];
Xchar Entry[1024];
Xint EndOfInput, CharEntry;
X
XFILE *open_file (fname, option)
Xchar *fname, *option;
X{
X	/* This routine checks for I/O errors when opening files */
X	FILE *fp;
X
X	if ((fp=fopen(fname, option))==NULL) {
X		fprintf(stderr,"Error - Can't open %s for %s!\n", fname,
X			(*option == 'r' ? "reading" : "writing"));
X		exit(1);
X	}
X	return fp;
X}
X
Xcleanup ()
X{
X	/* This routine checks the .newsrc for invalid newsgroup
X	   entries. These have no separating space between the
X	   newsgroup name and the read article counter */
X	int i;
X	char *Mark, *Mark2;
X	char Entry2[1024];
X
X	if (Quiet==FALSE) {
X		printf("Checking syntax   \r"); fflush(stdout);
X	}
X	Old = open_file(OldPath, "r");
X	Clean = open_file(ClnPath, "w");
X	do {
X		EndOfInput = fscanf(Old, "%s", Entry);
X		if (EndOfInput!=EOF) {
X			if (!isdigit(Entry[0])) {
X			/* Newsgroup name */
X				if (strncmp(Entry, "options", 7)==0)
X				/* Options found, process */
X					options();
X				else {
X					if (isdigit(Entry[strlen(Entry)-1])) {
X					/* Illegal counter without
X					   preceding space found */
X						Mark = strpbrk(Entry, ":!");
X						strcpy(Entry2, Entry);
X						Mark2 = strpbrk(Entry2, ":!");
X						strcpy(Mark2+2, Mark+1);
X						Mark2[1] = ' ';
X						fprintf(Clean, "\n%s", Entry2);
X					}
X					else fprintf(Clean, "\n%s", Entry);
X					/* No illegal counter */
X				}
X			}
X			else fprintf(Clean, " %s", Entry);
X			/* Read article counter */
X		}
X	}
X	while (EndOfInput!=EOF);
X	fclose(Clean);
X	fclose(Old);
X}
X
Xoptions ()
X{
X	/* This procedure does processing of options lines.
X	   Extra processing is needed, because options lines have
X	   a different syntax. */
X	int EndOpt;
X
X	if (Quiet==FALSE) {
X		printf("Processing options\r"); fflush(stdout);
X	}
X	Opt = open_file(OptPath, "a");
X	fprintf(Opt, "%s", Entry);
X	/* Save the word "options" */
X	EndOpt = FALSE;
X	do {
X		CharEntry = getc(Old);
X		putc(CharEntry, Opt);
X		if (CharEntry=='\n')
X		/* Look ahead into following line: Still options? */
X		{
X			CharEntry = getc(Old);
X			if ((CharEntry!=' ') &&  (CharEntry!='\t')) {
X			/* Option continuation lines always start
X			   with a space or tab */
X				EndOpt = TRUE;
X				ungetc(CharEntry, Old);
X				/* Oops - one too many, put it back
X				   for cleanup()! */
X			}
X			else putc(CharEntry, Opt);
X		}
X	}
X	while ((!EndOpt) && (CharEntry!=EOF));
X	fclose(Opt);
X}
X
Xcompact (Count)
Xchar Count[1024];
X{
X	/* This routine eliminates all skips for unavailable
X	   articles in a newsgroup's read article count entry */
X	int i;
X	char *Mark, *Mark2, *Mark2a;
X
X	if ((Mark=strpbrk(Count, "-,"))!=NULL) {
X		Mark2 = strrchr(Count, '-');
X		Mark2a = strrchr(Count, ',');
X 		/* One of Mark2 and Mark2a is not NULL. */
X 		if ((Mark2==NULL) ||
X 			((Mark2a!=NULL) && (strlen(Mark2a)<strlen(Mark2))))
X			Mark2 = Mark2a;
X		/* Source and destination overlap, so use memmove
X		   rather than strcpy */
X		memmove(Mark+1, Mark2+1, strlen(Mark2+1)+1);
X		Mark[0] = '-';
X	}
X} 
X
Xhelp ()
X{
X	/* This routine prints a command line options summary */
X	printf("\n===Options summary===\n");
X	printf("-u Update. Check for new newsgroups, add as unsubscribed.\n");
X	printf("-U Update. As above, but add as subscribed.\n");
X	printf("-q Quiet mode. Don't print status messages.\n");
X	printf("-h Help. Print this summary.\n\n");
X	fflush(stdout);
X	exit(0);
X}
X
Xmain (argc, argv) 
Xchar *argv[];
X{
X	int Subscribed, NoCount; 
X	int MyPid;
X	int i;
X	char SortCom[160], CompCom[80];
X	char *Home, *DiffNewsrc;
X	extern opterr;
X
X	printf("NewsClean - Version 2.10 (c) 15.2.1991 Oliver Kluge\n");
X	if ((Home = getenv("HOME"))==NULL)
X		if ((Home = getenv("LOGDIR"))==NULL)
X			if ((Home = getenv("DOTDIR"))==NULL)
X				Home = ".";
X	if ((DiffNewsrc = getenv("NEWSRC"))==NULL) 
X		sprintf(OldPath, "%s/.newsrc", Home);
X	else strcpy(OldPath, DiffNewsrc);
X	MyPid = getpid();
X 	sprintf(ClnPath, "/tmp/newsclean.cln.%d", MyPid);
X 	sprintf(SubPath, "/tmp/newsclean.sub.%d", MyPid);
X 	sprintf(CmpPath, "/tmp/newsclean.cmp.%d", MyPid);
X 	sprintf(OptPath, "/tmp/newsclean.opt.%d", MyPid);
X 	sprintf(UnsPath, "/tmp/newsclean.uns.%d", MyPid);
X	Subscribed = FALSE;
X	UpdUns = UpdSub = Quiet = FALSE;
X	opterr = 0;
X	while ((i=getopt(argc, argv, "uUqQ"))!=-1) {
X		switch(i) {
X			case 'u': UpdUns = TRUE; break;
X			case 'U': UpdSub = TRUE; break;
X			case 'q':
X			case 'Q': Quiet = TRUE; break;
X			case '?': help(); break;
X		}
X	}
X	
X	/* Clean up .newsrc of entries with invalid structure */
X	cleanup();
X	
X	/* Distribute the contents of .newsrc in two new files, one
X	   to hold all subscribed newsgroups and one to hold the
X	   unsubscribed. The latter gets stripped of the read article
X	   counter entries. */
X	if (Quiet==FALSE) {
X		printf("Splitting         \r"); fflush(stdout);
X	}
X	Clean = open_file(ClnPath, "r");
X	Sub = open_file(SubPath, "w");
X	Unsub = open_file(UnsPath, "w");
X
X	do {
X		EndOfInput = fscanf(Clean, "%s", Entry);
X		if (EndOfInput!=EOF) {
X			if (!isdigit(Entry[0])) {
X			/* Got a newsgroup name! */
X				if (Entry[strlen(Entry)-1]==':') {
X				/* It is subscribed-to */
X					Subscribed = TRUE;
X					fprintf(Sub, "\n%s", Entry);
X				}
X				else {
X				/* It is unsubscribed */
X					Subscribed = FALSE;
X					fprintf(Unsub, "\n%s", Entry);
X				}
X			}
X			else if (Subscribed==TRUE) {
X			/* Got a newsgroup read article counter! */
X				/* It is subscribed-to */
X					compact(Entry);
X					fprintf(Sub, " %s", Entry);
X				}
X		}
X	}
X	while (EndOfInput!=EOF);
X	fprintf(Sub, "\n");
X	fprintf(Unsub, "\n");
X
X	/* Updating requested? */
X	if ((UpdUns==TRUE) || (UpdSub==TRUE)) {
X		if (Quiet==FALSE) {
X			printf("Updating          \r"); fflush(stdout);
X		}
X 		sprintf(CompCom, "newsgroups \\* flag > %s", CmpPath);
X		system(CompCom);
X		Cmp = open_file(CmpPath, "r");
X		if (UpdUns==TRUE) do {
X			EndOfInput=fscanf(Cmp, "%s", Entry);
X			if (EndOfInput!=EOF) {
X				fprintf(Unsub, "%s!\n", Entry);
X				printf("Adding %s!\n", Entry);
X			}
X		}
X		while(EndOfInput!=EOF);
X		else do {
X			EndOfInput=fscanf(Cmp, "%s", Entry);
X			if (EndOfInput!=EOF) {
X				fprintf(Sub, "%s:\n", Entry);
X				printf("Adding %s:\n", Entry);
X			}
X		}
X		while(EndOfInput!=EOF);
X		fclose(Cmp);
X		unlink(CmpPath);
X	}
X
X	fclose(Clean);
X	fclose(Sub);
X	fclose(Unsub);
X	/* Trash the temporary file */
X	unlink (ClnPath);
X
X	/* Now let UNIX's sort do the alphabetical sorting of the
X	   unsubscribed newsgroups. */
X	if (Quiet==FALSE) {
X		printf("Sorting           \r"); fflush(stdout);
X	}
X 	sprintf(SortCom, "sort -i -f -u %s -o %s", UnsPath, UnsPath);
X	system(SortCom);
X
X	/* And reunite the subscribed and the unsubscribed to form
X	   the new .newsrc */
X	if (Quiet==FALSE) {
X		printf("Re-uniting        \r"); fflush(stdout);
X	}
X	New = open_file(OldPath, "w");
X	/* Subscribed */
X	Sub = open_file(SubPath, "r");
X	NoCount = FALSE;
X	do {
X		/* Newsgroup names first */
X		EndOfInput = fscanf(Sub, "%s", Entry);
X		if (EndOfInput!=EOF) {
X			if (!isdigit(Entry[0])) {
X			/* Newsgroup name */
X				if (NoCount==TRUE) fprintf(New, "\n");
X				fprintf(New, "%s", Entry);
X				NoCount = TRUE;
X			} 
X			else {
X			/* Read article counter */
X				fprintf(New, " %s\n", Entry);
X				NoCount = FALSE;
X			}
X		}
X	}
X	while (EndOfInput!=EOF);
X	if (NoCount==TRUE) fprintf(New, "\n");
X	fclose(Sub);
X
X	/* Options */
X	if ((Opt=fopen(OptPath, "r"))!=NULL) {
X		do {
X			CharEntry = getc(Opt);
X			/* Must use getc, because leading spaces and
X			   tabs are important for option continuation
X			   lines to be recognized! */
X			if (CharEntry!=EOF) putc(CharEntry, New);
X		}
X		while (CharEntry!=EOF);
X		fclose(Opt);
X	}
X
X	/* Unsubscribed */
X	Unsub = open_file(UnsPath, "r");
X	do {
X		/* Unsubscribed ones have no counter anymore */
X		EndOfInput = fscanf(Unsub, "%s", Entry);
X		if (EndOfInput!=EOF) fprintf(New, "%s\n", Entry);
X	}
X	while (EndOfInput!=EOF);
X	fclose(Unsub);
X	fclose(New);
X
X	/* Now trash the temporary files */
X	unlink(OptPath);
X	unlink(SubPath);
X	unlink(UnsPath);
X	if (Quiet==FALSE) {
X		printf("Finished.         \n"); fflush(stdout);
X	}
X	return 0;
X}
SHAR_EOF
chmod 0644 newsclean.c || echo "restore of newsclean.c fails"
exit 0
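
The compact() routine in newsclean.c reduces a read article counter
such as "1-10,12,15-20" to a single range by keeping only the first and
the last article number. The following is a minimal standalone sketch of
that idea, not the author's code: the name compact_counter is
hypothetical, and memmove is used because the source and destination
regions overlap.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical standalone version of the compaction step: rewrite the
   counter in place so that it runs from its first number to its last
   number, dropping every skip mark in between. */
void compact_counter(char *counter)
{
    char *first_sep, *last_dash, *last_comma, *last_sep;

    assert(counter != NULL);

    /* A counter holding a single number has nothing to compact */
    first_sep = strpbrk(counter, "-,");
    if (first_sep == NULL)
        return;

    /* Find whichever separator occurs last in the string */
    last_dash = strrchr(counter, '-');
    last_comma = strrchr(counter, ',');
    last_sep = last_dash;
    if (last_sep == NULL || (last_comma != NULL && last_comma > last_sep))
        last_sep = last_comma;

    /* Pull the final number up behind the first separator; the regions
       overlap, hence memmove (the +1 copies the terminating NUL) */
    memmove(first_sep + 1, last_sep + 1, strlen(last_sep + 1) + 1);
    *first_sep = '-';
}
```

Note that, like the original, this treats every article between the
first and last number as read, which is the intended effect for skip
marks left by unavailable articles.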

-------
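
The README entry for version 2.00 mentions that sorting with -d and -u
together deleted the C++ newsgroups when C newsgroups were present. The
reason is that in dictionary order only letters, digits and blanks are
significant, so the "++" never reaches the sort key. A small sketch of
such a key comparison is given here for illustration only; the function
names are hypothetical and this is not how sort itself is implemented.

```c
#include <assert.h>
#include <ctype.h>

/* Only letters, digits and blanks count in dictionary order */
static int is_key_char(int c)
{
    return isalnum(c) || c == ' ' || c == '\t';
}

/* Hypothetical illustration: return 1 if two lines have the same key
   under dictionary order with case folding, 0 otherwise. */
int same_dictionary_key(const char *a, const char *b)
{
    for (;;) {
        /* Skip characters that are not part of the key */
        while (*a != '\0' && !is_key_char((unsigned char)*a)) a++;
        while (*b != '\0' && !is_key_char((unsigned char)*b)) b++;

        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;          /* keys differ (or one ended early) */
        if (*a == '\0')
            return 1;          /* both keys ended together */
        a++;
        b++;
    }
}
```

With this comparison, "comp.lang.c!" and "comp.lang.c++!" yield the
same key, so a duplicate-removing sort keeps only one of them. This
matches the behaviour the README describes and explains why the current
source no longer passes -d to sort.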

exit 0 # Just in case...
-- 
Kent Landfield                   INTERNET: kent@sparky.IMD.Sterling.COM
Sterling Software, IMD           UUCP:     uunet!sparky!kent
Phone:    (402) 291-8300         FAX:      (402) 291-4362
Please send comp.sources.misc-related mail to kent@uunet.uu.net.