[comp.unix.questions] Parsing Mail Headers

itkin@mrspoc.Transact.COM (Steven M. List) (04/30/91)

Does anyone have a program or PERL script or... that already does parsing
of mail headers?  I have no doubt that this exists, and based on a
thread realized that a mail folder sorter could be written fairly easily
once the parsing is done.  And why reinvent the wheel, right?
-- 
 +----------------------------------------------------------------------------+
 :                Steven List @ Transact Software, Inc. :^>~                  :
 :           Chairman, Unify User Group of Northern California                :
 :                         itkin@Transact.COM                                 :

garyp@cognos.UUCP (Gary Puckering) (05/01/91)

In article <1991Apr30.164513.391@mrspoc.Transact.COM> steven@Transact.COM writes:
>Does anyone have a program or PERL script or... that already does parsing
>of mail headers?  I have no doubt that this exists, and based on a
>thread realized that a mail folder sorter could be written fairly easily
>once the parsing is done.  And why reinvent the wheel, right?

Here's a script I wrote which sorts mailbox files.  The one thing it doesn't do
is normalize dates to GMT before sorting.  Other than that, it does the job.
Also included is a script which checks your mailboxes for various kinds of problems,
such as messages with the same date and From lines that are not preceded by
blank lines.
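
For anyone who needs the GMT step that the sort script below leaves out, here
is a minimal sketch, not part of the scripts themselves.  It assumes a modern
Perl 5 with the core Time::Local module.  The name gmt_sort_key and the
%ZONE_OFFSET table are made up for illustration, the table covers only a few
North American zone abbreviations, and an unknown or missing zone is treated
as GMT.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical zone table: hours offset from GMT.  Extend it to suit the
# zone abbreviations that actually appear in your mail.
my %ZONE_OFFSET = (
    GMT => 0,  UTC => 0,
    EST => -5, EDT => -4,
    CST => -6, CDT => -5,
    MST => -7, MDT => -6,
    PST => -8, PDT => -7,
);

my %MONTH_INDEX;
@MONTH_INDEX{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Turn a "From " separator line into a sort key expressed in GMT.
# Returns undef if the line does not look like a separator.
sub gmt_sort_key {
    my ($line) = @_;
    return unless $line =~
        /^From +(\S+) +(\w+) +(\w+) +(\d+) +(\d+):(\d+):(\d+)(?: +([A-Za-z]+))? +(\d{4})/;
    my ($mon, $day, $h, $m, $s, $zone, $year) = ($3, $4, $5, $6, $7, $8, $9);
    return unless exists $MONTH_INDEX{$mon};

    # Assumption: an unknown or missing zone counts as GMT.
    my $offset = 0;
    $offset = $ZONE_OFFSET{$zone}
        if defined $zone && exists $ZONE_OFFSET{$zone};

    # Read the wall-clock fields as if they were GMT, then remove the offset.
    my $epoch = timegm($s, $m, $h, $day, $MONTH_INDEX{$mon}, $year)
              - $offset * 3600;
    my @t = gmtime($epoch);
    return sprintf("%04d-%02d-%02d_%02d:%02d:%02d",
                   $t[5] + 1900, $t[4] + 1, $t[3], $t[2], $t[1], $t[0]);
}

# 23:30 EDT on May 1 is 03:30 GMT on May 2.
print gmt_sort_key("From garyp Wed May  1 23:30:00 EDT 1991\n"), "\n";
```

The key has the same YYYY-MM-DD_hh:mm:ss shape as $date in the scripts, so it
sorts correctly as a plain string.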

---------------------- cut here --------------------------------------------

#!/usr/local/bin/perl -i.bak

# Sort mail folder in date sequence
# Usage:
#        sortmail file...
#
# Backup saved in "file.bak".

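# Month names packed three letters apiece; index()/3+1 gives the month number.
$MONTH = "JanFebMarAprMayJunJulAugSepOctNovDec";

while (<>) {
	# Separator line: From sender weekday month day hh:mm:ss [zone] year
	if (/^From +(.+) +(\w+) +(\w+) +(\w+) +(\d+[:]\d+[:]\d+)( *\w*) +(\d+)/) {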
		$from = $1;
		$dow = $2;
		$month = index($MONTH,$3)/3+1;
		$day = $4;
		$time = $5;
		$tzone = $6;
		$year = $+;
		$date = sprintf("%04d-%02d-%02d_%s",$year,$month,$day,$time);
		print STDERR "\n$ARGV:\n  Sorting " if $ARGV ne $oldargv;
		$oldargv = $ARGV;
		$msg_no++;
		print STDERR "$msg_no ";
		print STDERR "\n          "       if !($msg_no % 20);
		if ($msg{$date} ne "") {
			print STDERR "\n**Duplicate message $msg_no\n->$_          ";
		}
	}
	$msg{$date} .= $_;
	$map{$date} = $msg_no;
	$in_cnt += length($_);
	if (eof) {
		print STDERR "\n  Writing ";
		foreach $k (sort (keys %msg)) {
			$len = length($msg{$k});
			print $msg{$k};
			$out_cnt += $len;
			$msg_no = $map{$k};
			print STDERR "$msg_no ";
			print STDERR "\n          "       if !($msg_no % 20);
		}
		$delta = $in_cnt - $out_cnt;
		print STDERR 
		  "\n  Byte count: input $in_cnt  output $out_cnt  diff $delta\n";
		undef %msg;
		undef %map;
		undef $date;
		$in_cnt = 0;
		$out_cnt = 0;
		$msg_no = 0;
	}
}


---------------------- cut here --------------------------------------------

#!/usr/local/bin/perl

# Check sequence of mail folders
# Usage:
#        checkmail file...

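# Month names packed three letters apiece; index()/3+1 gives the month number.
$MONTH = "JanFebMarAprMayJunJulAugSepOctNovDec";

while (<>) {
	# Separator line: From sender weekday month day hh:mm:ss [zone] year
	if (/^From +(.+) +(\w+) +(\w+) +(\w+) +(\d+[:]\d+[:]\d+)( *\w*) +(\d+)/) {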
		$from = $1;
		$dow = $2;
		$month = index($MONTH,$3)/3+1;
		$day = $4;
		$time = $5;
		$tzone = $6;
		$year = $+;
		$date = sprintf("%04d-%02d-%02d_%s",$year,$month,$day,$time);
		print "$ARGV:\n" if $ARGV ne $oldargv;
		$oldargv = $ARGV;
		$msg_no++;
		if ($msg{$date} ne "") {
			print STDERR "** Duplicate message $msg_no\n-> $_";
		}
		$msg{$date} = "X";
		if ($date lt $last_date) {
			print STDERR "** Out of sequence at $msg_no\n-> $_";
		}
		$last_date = $date;
		if ($last_line !~ /^\s*$/) {
			print STDERR "** From line not preceded by blank line at $msg_no\n-> $_";
		}
	}
	$last_line = $_; chop($last_line);
	if (eof) {
		undef %msg;
		undef $date;
		undef $last_date;
		undef $last_line;
		$msg_no = 0;
	}
}
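
Both scripts above look only at the "From " separator line.  For the header
fields proper (Subject:, Date: and so on), which is what the original question
asks about, something along these lines might serve as a starting point.  It
is a rough sketch assuming a modern Perl 5.  parse_headers is a made-up name,
field names are lower-cased, and a repeated field (Received:, say) keeps only
its last value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse the header block of one message (everything before the first
# blank line) into a hash keyed by lower-cased field name.
sub parse_headers {
    my ($text) = @_;
    my ($block) = split /\n\s*\n/, $text, 2;
    $block = '' unless defined $block;

    my (%field, $last);
    for my $line (split /\n/, $block) {
        next if $line =~ /^From /;              # mbox separator, not a field
        if (defined $last && $line =~ /^\s+(\S.*)$/) {
            $field{$last} .= " $1";             # continuation of previous field
        } elsif ($line =~ /^([^\s:]+):\s*(.*)$/) {
            $last = lc $1;
            $field{$last} = $2;                 # a repeat overwrites the earlier value
        }
    }
    return %field;
}

my %h = parse_headers(
    "From garyp Wed May  1 23:30:00 1991\n"
  . "Subject: Parsing\n"
  . " Mail Headers\n"
  . "Date: Wed, 1 May 91 23:30:00 EDT\n"
  . "\n"
  . "Subject: this line is body text, not a header\n"
);
print "$h{subject}\n";    # prints "Parsing Mail Headers"
```
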
-- 
Gary Puckering                             Cognos Incorporated
  VOICE: (613) 738-1338 x6100              P.O. Box 9707
  UUCP:  uunet!mitel!cunews!cognos!garyp   Ottawa, Ontario
  INET:  garyp%cognos.uucp@uunet.uu.net    CANADA  K1G 3Z4

Dan_Jacobson@ATT.COM (05/03/91)

The mailer discussed in the gnu.emacs.vm.* newsgroups (VM, for GNU Emacs) is
packed with parsing/reordering power.