allbery@uunet.UU.NET (Brandon S. Allbery - comp.sources.misc) (03/05/89)
Posting-number: Volume 6, Issue 45
Submitted-by: mirk@warwick.UUCP (Mike Taylor)
Archive-name: dissect2

[Okay, so csplit can't be this tricky.  Still, you could do wonders with it
and a shell script wrapper....  Of course, BSD may not have "csplit".  ++bsa]

Here is a simple and self-explanatory little number for comp.sources.misc.
It should run on any UNIX machine, though I've only tried it on a Sun3 with
Berkeley 4.3.  It splits a large mbox into individually named personal mboxes,
one for each person who has composed one or more of the mbox's constituent
articles.  See the manual page for more details.

-------------------------- Cut here, cheese-heads! --------------------------
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	Makefile
#	Manifest
#	README
#	dissect.1
#	dissect.c
# This archive created: Sat Jan 21 18:30:57 1989
# By:	Mike Taylor ()
export PATH; PATH=/bin:$PATH
if test -f 'Makefile'
then
	echo shar: will not over-write existing file "'Makefile'"
else
cat << \SHAR_EOF > 'Makefile'
all: dissect.c
	cc -O -s dissect.c -o dissect
	rm -f count
	ln dissect count

dissect: dissect.c
	cc -O -s dissect.c -o dissect

count: dissect.c
	cc -O -s dissect.c -o count
SHAR_EOF
fi # end of overwriting check
if test -f 'Manifest'
then
	echo shar: will not over-write existing file "'Manifest'"
else
cat << \SHAR_EOF > 'Manifest'
-rw-r--r--  1 mirk     csother       182 Jan 21 18:01 Makefile
-rw-r--r--  1 mirk     csother       315 Jan 21 18:28 Manifest
-rw-r--r--  1 mirk     csother       564 Jan 21 18:24 README
-rw-r--r--  1 mirk     csother      1955 Jan 21 18:21 dissect.1
-rw-r--r--  1 mirk     csother      3194 Jan 21 17:57 dissect.c
SHAR_EOF
fi # end of overwriting check
if test -f 'README'
then
	echo shar: will not over-write existing file "'README'"
else
cat << \SHAR_EOF > 'README'
Evening, all.

This is a program I wrote in a hurry one night because I was sick of wading
through 1/4M mailboxes, trying to find some archaic piece of correspondence.
It breaks up a large mailbox (or several of them, if you like) into smaller
ones, named after the senders of the pieces of mail they contain.  See the
manual entry if this is unclear.

Mail bugs, flames, pieces of frozen vomit, slices of intestinal lining etc.,
to mirk@uk.ac.warwick.cs.

That's about it really.  Lap it up!

PS.	1st man: "My dog's got no nose"
	2nd man: "Frog off."
SHAR_EOF
fi # end of overwriting check
if test -f 'dissect.1'
then
	echo shar: will not over-write existing file "'dissect.1'"
else
cat << \SHAR_EOF > 'dissect.1'
.\" @(#)dissect.1 1.17 89/01/20 SMI; from HACKERS 1.1
.TH DISSECT 1 "20 January 1989"
.SH NAME
dissect \- Break up an mbox into smaller mboxes
.br
count \- Count number of articles in an mbox
.SH SYNOPSIS
.B dissect
.I filename1
.I [ filename2 ... ]
.br
.B count
.I filename1
.I [ filename2 ... ]
.br
.SH DESCRIPTION
.B dissect
reads through one or more files in mbox format (e.g. the file mbox created
by most "mail" programs, and the newsgroup files created by rn(1)).
It creates new files, each named after the sender of an item of mail in
one of the specified mboxes, and deposits in each file copies of all the
mail sent by that user, so that together the new files contain exactly the
same data as the old ones.
If the files that would be created already exist, then
.B dissect
will append the items from the specified mboxes onto the end of the
existing files.
.B dissect
will refuse to overwrite any of its arguments.
.sp
.B count
counts how many articles are in each mbox specified on the command line,
and prints this on standard output.
.SH EXAMPLES
example% ls
.br
mbox
.br
example% dissect mbox
.br
example% ls
.br
VIRUS-L cee074 erict jec1 mbox
.br
andy chip.uucp hjt martin weemba
.br
example% count mbox martin hjt
.br
count: 11 items of mail in input file mbox.
.br
count: 1 items of mail in input file martin.
.br
count: 1 items of mail in input file hjt.
.br
example% dissect hjt
.br
dissect: won't overwrite input file hjt.
.SH "SEE ALSO"
.BR mail (1),
.BR rn (1)
.SH BUGS
.B dissect
creates the new files using only the local name of the user who sent the
mail item being saved, so a piece of mail sent by a user
.B mirk@uk.ac.warwick.cs
would be saved in a file called simply
.BR mirk .
.SH AUTHOR
.B dissect
and
.B count
were written by Michael Taylor (mirk@uk.ac.warwick.cs) in the early hours
of the morning of Friday, 20th January, 1989, on Warwick University's Sun3
"emerald".
SHAR_EOF
fi # end of overwriting check
if test -f 'dissect.c'
then
	echo shar: will not over-write existing file "'dissect.c'"
else
cat << \SHAR_EOF > 'dissect.c'
/****************************************************************************\
|*                                                                          *|
|*  Dissect.c: a rough-and-ready heap of junk to split a file in mbox       *|
|*             format into a number of mbox-format files, each containing   *|
|*             all the messages from a sender whose mail was in the         *|
|*             original mbox, and named after that sender.                  *|
|*                                                                          *|
|*  Also:      it will count the number of articles in each mbox in its     *|
|*             argument list, when invoked under any name other than        *|
|*             dissect (ignoring any directory prefix).                     *|
|*                                                                          *|
|*  This program was written in the early hours of 21st January 1989.       *|
|*  Copyright (C) 1989 by Mike Taylor.  No rights reserved - copy me!       *|
|*                                                                          *|
\****************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINELEN 1024

static int onlycount = 0;

/*--------------------------------------------------------------------------*/

int handle (int argc, char **argv, int index)
{
    FILE *fp;
    FILE *to = NULL;
    static char name[LINELEN];
    static char line[LINELEN];
    static char last[LINELEN];
    char *cp;
    int flag = 0;
    int i;

    if ((fp = fopen (argv[index], "r")) == NULL) {
        (void) fprintf (stderr, "%s: couldn't open input file %s.\n",
                        argv[0], argv[index]);
        return (1);
    }

    /* Treat the start of each input file as if it followed a blank line. */
    (void) strcpy (last, "\n");

    while (fgets (line, LINELEN, fp) != NULL) {
        /* A "From " line that follows a blank line starts a new message. */
        if ((!strncmp (line, "From ", 5)) && (*last == '\n')) {
            flag++;
            if (!onlycount) {
                if (to != NULL)
                    (void) fclose (to);
                to = NULL;      /* drop this message unless a file opens */

                /* The sender's name ends at ' ', '@', '%' or newline. */
                (void) strcpy (name, line+5);
                for (cp = name; (*cp != '\0') && (*cp != ' ') && (*cp != '@')
                         && (*cp != '%') && (*cp != '\n'); cp++)
                    ;
                *cp = '\0';

                /* Never append to any of the files named on the command line. */
                for (i = 1; i < argc; i++)
                    if (!strcmp (name, argv[i]))
                        break;
                if (i < argc)
                    (void) fprintf (stderr, "%s: won't overwrite input file %s.\n",
                                    argv[0], name);
                else if ((to = fopen (name, "a")) == NULL) {
                    (void) fprintf (stderr, "%s: couldn't open output file %s.\n",
                                    argv[0], name);
                    (void) fclose (fp);
                    return (1);
                }
            }
        }
        if ((to != NULL) && (!onlycount))
            (void) fputs (line, to);
        (void) strcpy (last, line);
    }

    if (to != NULL)
        (void) fclose (to);
    (void) fclose (fp);

    if (flag == 0)
        (void) fprintf (stderr, "%s: found no mail in input file %s.\n",
                        argv[0], argv[index]);
    else if (onlycount)
        (void) printf ("%s: %3d items of mail in input file %s.\n",
                       argv[0], flag, argv[index]);
    return (flag == 0);
}

/*--------------------------------------------------------------------------*/

int main (int argc, char **argv)
{
    int status = 0;
    int i;
    char *prog;

    if (argc < 2) {
        (void) fprintf (stderr, "Usage: %s file [ file ... ]\n", argv[0]);
        exit (255);
    }

    /* Behave as count unless invoked as dissect (any directory prefix). */
    prog = strrchr (argv[0], '/');
    prog = (prog == NULL) ? argv[0] : prog + 1;
    if (strcmp (prog, "dissect"))
        onlycount = 1;

    for (i = 1; i < argc; i++)
        status += handle (argc, argv, i);
    exit (status > 255 ? 255 : status);
}

/*--------------------------------------------------------------------------*/
SHAR_EOF
fi # end of overwriting check
# End of shell archive
exit 0
______________________________________________________________________________
Mike Taylor - {Christ,M{athemat,us}ic}ian ...  Email to: mirk@uk.ac.warwick.cs
*** Unkle Mirk sez: "Em9 A7 Em9 A7 Em9 A7 Em9 A7 Cmaj7 Bm7 Am7 G Gdim7 Am" ***
------------------------------------------------------------------------------