[comp.mail.headers] Summary of header stripper responses

mjo@irie.ais.org (Mike O'Connor) (04/06/91)

Here's a summary of programs and pointers to programs that will truncate
headers.  Gratuitous thanks to all who responded to my messages!  If I
incorrectly posted a mail address or organization, oops!  Sorry!

Use at your own risk.  I don't know enough about header munging to
ascertain whether all these programs work 100% reliably.

Now to tackle perl...

-------------------------------------------------------------------------------
Reply-To: Paul-Pomes@uiuc.edu
Organization: University of Illinois at Urbana

Some time back I wrote the thdr filter to do just that for mail messages
inbound to notesfiles.  It uses the regexp package (every program should
be a learning experience).  Now for the incantation:

It's available via anon-FTP from uxc.cso.uiuc.edu in utils/thdr .
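thdr itself is only available by FTP, so its exact rules aren't shown here. As a rough illustration of the regexp approach, here is a minimal C sketch. It uses the POSIX <regex.h> interface (regcomp/regexec), which may not be the regexp package thdr uses. The function name strip_with_regex and the pattern list are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical list of unwanted fields, written as one extended regex.
 * REG_ICASE makes the match case-insensitive; REG_NOSUB because only a
 * yes/no answer is needed, not the matched substrings. */
static const char *strip_pattern =
    "^(Received|Return-Path|Message-Id|X-[^:]*):";

/* Copy one message from in to out, dropping header fields (and their
 * continuation lines) that match strip_pattern.  Returns 0 on success,
 * -1 if the pattern fails to compile or a read error occurs. */
int strip_with_regex(FILE *in, FILE *out)
{
    regex_t re;
    char line[4096];
    int in_headers = 1, deleting = 0;

    assert(in != NULL && out != NULL);
    if (regcomp(&re, strip_pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;

    while (fgets(line, sizeof line, in) != NULL) {
        if (in_headers) {
            if (strcmp(line, "\n") == 0) {
                in_headers = 0;              /* blank line ends the header */
            } else if (line[0] == ' ' || line[0] == '\t') {
                if (deleting)
                    continue;                /* continuation of a dropped field */
            } else {
                deleting = (regexec(&re, line, 0, NULL, 0) == 0);
                if (deleting)
                    continue;                /* field matched the pattern */
            }
        }
        fputs(line, out);
    }
    regfree(&re);
    return ferror(in) ? -1 : 0;
}
```

Call it as strip_with_regex(stdin, stdout) from a main(). Because the whole blacklist is one alternation, adding a header means editing a single string.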

/pbp
--
         Paul Pomes

UUCP: {att,iuvax,uunet}!uiucuxc!paul   Internet, BITNET: paul@uxc.cso.uiuc.edu
US Mail:  UofIllinois, CSO, 1304 W Springfield Ave, Urbana, IL  61801-2910

-------------------------------------------------------------------------------
Reply-To: romain@pyramid.com
Organization: Pyramid Technology Corp., Mountain View, CA

Peter Honeyman's old motf ("mailer of the future"), which was basically
a cleaned up and enhanced BSD Mail, stripped the "ignored" headers
before saving the message to the files.  It didn't have a visual
interface like elm, but I still use it.  I don't think it's been
revised since around 1986 (when Peter started to play with Macs), but
it might still be available for anonymous ftp on citi.umich.edu.
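motf's source isn't reproduced here. The idea of an "ignored" list, as in BSD Mail's ignore command, can be sketched in C. Each field name (the text before the first colon) is compared against a case-insensitive list. The list contents and the name is_ignored_header are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ignore list, in the spirit of BSD Mail's "ignore" command. */
static const char *ignored[] = { "received", "message-id", "status", NULL };

/* Case-insensitive comparison of the first n bytes of a with the
 * NUL-terminated string b; true only if b is exactly n bytes long. */
static int name_equals(const char *a, size_t n, const char *b)
{
    size_t i;

    if (strlen(b) != n)
        return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Return 1 if line starts a header field whose name is on the ignore
 * list.  The field name is everything before the first colon. */
int is_ignored_header(const char *line)
{
    const char *colon;
    const char **p;

    assert(line != NULL);
    if (*line == ' ' || *line == '\t')
        return 0;                 /* continuation line, not a new field */
    colon = strchr(line, ':');
    if (colon == NULL)
        return 0;                 /* not a header field at all */
    for (p = ignored; *p != NULL; p++)
        if (name_equals(line, (size_t)(colon - line), *p))
            return 1;
    return 0;
}
```

This compares the whole field name rather than a prefix. As a result, an ignored name does not also catch a longer field that merely begins with it.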

-------------------------------------------------------------------------------
From: lyndon@cs.athabascau.ca (Lyndon Nerenberg)
Organization: Athabasca University

Regexp? Ick! Here's something I hacked together a while back that should
do what he asked for. Also available via anonymous ftp from
aupair.cs.athabascau.ca:mail/striphdrs.c.

/*
 * Clean up headers in mail destined for a mailing list. I usually invoke
 * this from the smail alias file as follows:
 *
 * foo:	"|/usr/local/smail/striphdrs|/usr/local/bin/smail -oi -q -f foo-request foo-redist"
 *
 * Written January 1991 (or thereabouts) by Lyndon Nerenberg.
 * This program is in the public domain.
 *
 * --lyndon@cs.athabascau.ca
 */

#ifndef lint
static char RCSid[] = "$Id: striphdrs.c,v 1.6 91/04/03 10:05:09 lyndon Rel $";
#endif /* ! lint */

#include <stdio.h>
#include <stdlib.h>	/* malloc(), exit() */
#include <string.h>	/* strlen() */
#include <strings.h>	/* strncasecmp() */

/*
 * Define MAILLIST if you want a "Precedence: bulk" header to be automatically
 * included in every message (useful for mailing list traffic).
 */

#define MAILLIST

#define INBUFSIZE 4096		/* Size of input buffer. Longer lines are */
				/* read in pieces, and each piece is */
				/* examined as if it were a new line. */
#define	TRUE	1
#define FALSE	0

static char *hdr_del[] = {	/* NULL terminated list of hdrs to delete */
  "Return-Path:",
  "Received:",
  "Errors-To:",
  "Sender:",
  "Precedence:",
  NULL };

int main(int argc, char *argv[])
{
  
  char *inbuf;			/* Input buffer */
  char **c;			/* Temporary pointer */
  int  in_headers = TRUE;	/* Set to 0 when last header encountered */
  int  deleting = FALSE;	/* Set to 1 if actively deleting header */
  
  inbuf = (char *) malloc((unsigned) INBUFSIZE);
  if (inbuf == NULL) {
    (void) fprintf(stderr, "%s: malloc(INBUFSIZE) failed!\n", argv[0]);
    exit(1);
  }
  
  while ((fgets(inbuf, INBUFSIZE, stdin)) != NULL) {
    
    if (in_headers) {
      
      if (*inbuf == '\n') {
	in_headers = FALSE;	/* Header/body separator found */
#ifdef MAILLIST
	(void) fputs("Precedence: bulk\n\n", stdout);
#else /* ! MAILLIST */
	(void) fputs("\n", stdout);
#endif /* ! MAILLIST */
	continue;
      }

      if (deleting && ((*inbuf == ' ') || (*inbuf == '\t')))
	continue;		/* Skip any continuation lines */
      else
	deleting = FALSE;
      
      /* See if this is a bogus header */
      for (c = hdr_del; *c != NULL; c++)
	if (strncasecmp(inbuf, *c, strlen(*c)) == 0)
	  deleting = TRUE;

      if (!deleting)
	(void) fputs(inbuf, stdout);
    }
    else
      (void) fputs(inbuf, stdout);
  }
  exit(0);
/*NOTREACHED*/
}
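One caveat with striphdrs.c: fgets() splits a line longer than INBUFSIZE into pieces. Each later piece is then examined as if it were a new line, so the tail of a long Received: line can leak into the output. A minimal sketch of one fix keeps the same delete list and remembers whether the previous piece ended in a newline. A tail piece then inherits the decision made for the start of its line. The function name strip_headers is hypothetical, and the "Precedence: bulk" handling is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define INBUFSIZE 4096

static const char *hdr_del[] = {   /* same delete list as striphdrs.c */
    "Return-Path:", "Received:", "Errors-To:", "Sender:", "Precedence:", NULL
};

/* Case-insensitive test: does line begin with one of hdr_del? */
static int is_deleted(const char *line)
{
    const char **c;
    size_t i, n;

    for (c = hdr_del; *c != NULL; c++) {
        n = strlen(*c);
        for (i = 0; i < n; i++)
            if (tolower((unsigned char)line[i]) != tolower((unsigned char)(*c)[i]))
                break;
        if (i == n)
            return 1;
    }
    return 0;
}

int strip_headers(FILE *in, FILE *out)
{
    char buf[INBUFSIZE];
    int in_headers = 1;
    int deleting = 0;
    int line_start = 1;              /* does the next piece begin a line? */

    assert(in != NULL && out != NULL);
    while (fgets(buf, sizeof buf, in) != NULL) {
        int starts_line = line_start;

        /* The next piece starts a new line only if this one ended one. */
        line_start = (strchr(buf, '\n') != NULL);

        if (!in_headers) {
            fputs(buf, out);                 /* body: copy as-is */
        } else if (!starts_line) {
            if (!deleting)                   /* tail of an over-long line */
                fputs(buf, out);
        } else if (buf[0] == '\n') {
            in_headers = 0;                  /* header/body separator */
            fputs(buf, out);
        } else if (buf[0] == ' ' || buf[0] == '\t') {
            if (!deleting)                   /* continuation line */
                fputs(buf, out);
        } else {
            deleting = is_deleted(buf);      /* start of a new field */
            if (!deleting)
                fputs(buf, out);
        }
    }
    return ferror(in) ? -1 : 0;
}
```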

-- 
    Lyndon Nerenberg  VE6BBM / Computing Services / Athabasca University
           atha!cs.athabascau.ca!lyndon || lyndon@cs.athabascau.ca
                    Packet: ve6bbm@ve6bbm.ab.can.noam
      The only thing open about OSF is their mouth.  --Chuck Musciano

-------------------------------------------------------------------------------
From: moore@cs.utk.edu

Well, here's a perl filter that I use to print out mail messages
without all of those extra headers -- it should be easy to hack it a
bit to remove the troff macros and just have it delete the headers.

1.  Remove the open (STDOUT,...).
2.  Take out the printfs that set up page breaks and margins for troff.
3.  Take out the s/pattern/replacement/ lines that escape troff commands
    and escape characters so they will print, and also the ones that do
    boldfacing/italicizing.
4.  Replace the if (eof) {...} at the end with a test at the top of the
    loop (just after the chop), something like
    if ($_ =~ /^From /) { printf "%s\n", $_ ; $inheader=1; next; }
    so that perl will recognize message boundaries in mbox files.

I use mh myself, which has a different format for messages, so I
really can't try this out for you.  Sorry.
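Step 4 is about finding message boundaries. The same idea can be sketched in C. A line starting with "From " that follows a blank line, or opens the file, begins a new message and puts the filter back into header mode. The two unwanted fields and the name strip_mbox are placeholders. The sketch also assumes body lines beginning with "From " have already been escaped (for example as ">From ") by the delivery agent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Placeholder delete test: drop Received: and Status: fields. */
static int unwanted(const char *line)
{
    return strncmp(line, "Received:", 9) == 0 ||
           strncmp(line, "Status:", 7) == 0;
}

/* Strip unwanted header fields from every message in an mbox file. */
int strip_mbox(FILE *in, FILE *out)
{
    char line[4096];
    int in_headers = 0;
    int deleting = 0;
    int prev_blank = 1;   /* the start of the file counts as a boundary */

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        int blank = (line[0] == '\n');
        int print = 1;

        if (prev_blank && strncmp(line, "From ", 5) == 0) {
            in_headers = 1;            /* envelope line: a new message */
            deleting = 0;
        } else if (in_headers) {
            if (blank) {
                in_headers = 0;        /* end of this message's header */
            } else if (line[0] == ' ' || line[0] == '\t') {
                print = !deleting;     /* continuation follows its field */
            } else {
                deleting = unwanted(line);
                print = !deleting;
            }
        }
        if (print)
            fputs(line, out);
        prev_blank = blank;
    }
    return ferror(in) ? -1 : 0;
}
```

A "Received:" line in a message body is left alone because the filter is only in header mode between a "From " line and the next blank line.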

-------- begin enclosure
#!/usr/local/bin/perl
#
# Filter to be used when printing a message within xmh.
# Takes a mail message as input or on command line, deletes certain headers,
# and boldfaces the remaining header keywords.
#
# This used to be an awk program before being translated into perl.
#
# Keith Moore
#
eval "exec /usr/local/bin/perl -S $0 $*"
    if $running_under_some_shell;
			# this emulates #! processing on NIH machines.
			# (remove #! line above if indigestible)

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_]+=)(.*)/ && shift;
			# process any FOO=bar switches

# output filter for printing.  Modify this as needed.
open (STDOUT, "|/usr/local/bin/ptroff -F Helvetica");

# start out in message header 
$inheader = 1;
# set up page breaks and margins for troff
printf ".po 1.5i\n.ll 6i\n" ;			# page offset, line length
printf ".de EP\n'bp\n..\n" ;			# end page macro
printf ".de BP\n'sp 1i\n..\n" ;			# begin page macro
printf ".wh -1i EP\n.wh 0i BP\n.nj\n.nf\n" ;	# traps for EP and BP

while (<>) {
    chop;	# strip record separator

    if (/^[ 	]*$/) {
	# a completely blank line indicates end-of-header
	$inheader = 0;
    }

    if ($inheader == 1) {
	#
	# what follows is a list of regular expressions that match header
	# lines that should not appear in the output.  Add more as necessary.
	#
	if (($_ =~ /^[Ee][Rr][Rr][Oo][Rr][Ss]-[Tt][Oo]:/)		||
	    ($_ =~ /^Flags:/)						||
	    ($_ =~ /^[Ff]ull-[Nn]ame:/)					||
	    ($_ =~ /^Lines:/)						||
	    ($_ =~ /^Mail-System-Version:/)				||
	    ($_ =~ /^News-Software:/ )					||
	    ($_ =~ /^Organization:/)					||
	    ($_ =~ /^Path:/)						||
	    ($_ =~ /^[Pp][Rr][Ee][Cc][Ee][Dd][Ee][Nn][Cc][Ee]:/)	||
	    ($_ =~ /^[Rr][Ee][Cc][Ee][Ii][Vv][Ee][Dd]:/)		||
	    ($_ =~ /^[Rr][Ee][Ff][Ee][Rr][Ee][Nn][Cc][Ee][Ss]:/)	||
	    ($_ =~ /^Replied:/)						||
	    ($_ =~ /^[Rr][Ee][Tt][Uu][Rr][Nn]-[Pp][Aa][Tt][Hh]:/)	||
	    ($_ =~ /^[Ss][Tt][Aa][Tt][Uu][Ss]:/)			||
	    ($_ =~ /^[Xx]-.*:/)) {
	    $ig = 1;
	}
	# if line starts with blank space it's a continuation of the previous
	# header line -- print only if last line was printed.
	elsif ($_ =~ /^[ 	]/ ) {
	    if (!$ig) {
	        s/\\/\\\\/g ;
	        printf "%s\n", $_ ;
	    }
	}
	# otherwise it's a header that we should print
	else {
	    $ig = 0;
	    if ($_ =~ /^([A-Za-z][^:]*:)(.*)$/ ) {
		printf "\\fB%s\\fR%s\n", $1, $2;
	    }
	    else {
		s/\\/\\\\/g ;
		s/^\./\\./ ;
		s/^'/\\'/ ;
		printf "bogus header line %s\n", $_;
	    }
	}
    }
    else {
	#
	# print a line from the message body
	#
	s/\\/\\\\/g ;
	s/^\./\\./ ;	
	s/^'/\\'/ ;
	#
	# bold face words surrounded by asterisks, and italicize
	# words surrounded by underlines
	#
	s/( |^)\*([ -)+-~]+)\*( |$)/\1\\fB\2\\fR\3/g ;
	s/( |^)_([ -^`-~]+)_( |$)/\1\\fI\2\\fR\3/g ;
	printf "%s\n", $_ ;
    }
    if (eof) {
	printf ".bp\n" ;
	$inheader = 1;
    }
}

-------- end enclosure
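The escaping rules are the fiddly part of the script above. A hedged C sketch of the same idea follows. Backslashes are doubled, as the script does (troff also offers \e for a printable backslash). A leading '.' or ''' is protected with troff's zero-width escape \& instead of the script's \. form. When bold_name is set, the field name up to and including the colon is wrapped in \fB...\fR. The function name troff_line is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write one line of text for troff.  If bold_name is set and the line
 * contains a colon, everything up to and including the first colon is
 * set in bold. */
void troff_line(const char *line, int bold_name, FILE *out)
{
    const char *colon;
    const char *p;

    assert(line != NULL && out != NULL);
    colon = bold_name ? strchr(line, ':') : NULL;
    if (line[0] == '.' || line[0] == '\'')
        fputs("\\&", out);         /* keep troff from reading a request */
    if (colon != NULL)
        fputs("\\fB", out);        /* switch to bold for the field name */
    for (p = line; *p != '\0'; p++) {
        if (*p == '\\')
            fputs("\\\\", out);    /* doubled, as the Perl script does */
        else
            putc(*p, out);
        if (p == colon)
            fputs("\\fR", out);    /* back to roman after the colon */
    }
}
```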

--
Keith Moore / U.Tenn CS Dept / 107 Ayres Hall / Knoxville TN  37996-1301
Internet: moore@cs.utk.edu      BITNET: moore@utkvx

-------------------------------------------------------------------------------
From: metcalf@masala.LCS.MIT.EDU (Chris Metcalf)
Organization: MIT Laboratory for Computer Science

My "striphdr" script in Perl:

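#!/bin/perl
$inheader = 0;
$blank = 1;
$keep = 0;

while (<>) {
	if ($inheader) {
		if (/^$/) {
			# blank line ends the header; print it and switch to body
			$inheader = 0;
			$blank = 1;
			print;
		}
		elsif (/^[ \t]/) {
			# continuation line: print only if its field was kept
			print if $keep;
		}
		else {
			$keep = (/^From: ./ || /^To: ./ || /^Cc: ./ ||
			         /^Bcc: ./ || /^Subject: ./ || /^Date: ./);
			print if $keep;
		}
	}
	else {
		# a "From " line after a blank line starts a new message
		if ($blank && /^From /) {
			$inheader = 1;
		}
		$blank = /^$/;
		print;
	}
}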
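The whitelist approach keeps only the named fields and drops everything else. It can be sketched in C for a single message, with each continuation line kept or dropped together with the field it belongs to. The names wanted and keep_headers are hypothetical. The field list mirrors the Perl script's, and the matching is case-sensitive as it is there.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields to keep (case-sensitive, like the Perl version). */
static const char *keep_fields[] = {
    "From:", "To:", "Cc:", "Bcc:", "Subject:", "Date:", NULL
};

static int wanted(const char *line)
{
    const char **f;

    for (f = keep_fields; *f != NULL; f++)
        if (strncmp(line, *f, strlen(*f)) == 0)
            return 1;
    return 0;
}

/* Print only the whitelisted header fields of one message, plus their
 * continuation lines, then the blank separator and the whole body. */
int keep_headers(FILE *in, FILE *out)
{
    char line[4096];
    int in_headers = 1;
    int keeping = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        if (in_headers) {
            if (line[0] == '\n') {
                in_headers = 0;              /* separator: print, then body */
            } else if (line[0] == ' ' || line[0] == '\t') {
                if (!keeping)
                    continue;                /* continuation of a dropped field */
            } else {
                keeping = wanted(line);
                if (!keeping)
                    continue;                /* field not on the list */
            }
        }
        fputs(line, out);
    }
    return ferror(in) ? -1 : 0;
}
```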

-------------------------------------------------------------------------------
From: Gary Gitzen <garyg@hpda.cup.hp.com>

Hi Mike,

I'll include a filter I use on all of my incoming mail. It's a bit
specialized due to my job (postmaster) and environment, but it lets
one do interesting things with incoming mail.
I use a variation of it to filter mail while I'm on vacation, to sort
mail based on who it is from and to whom it is addressed (I'm on dozens
of mail aliases). 

BTW, the routine "strhead" is similar to
        hdr=true
        cat $file | while read line
        do
                if [ $hdr = true ]
                then
                        if [ `echo "$line" | wc -c` -lt 2 ]
                        then
                                hdr=false
                        fi
                else
                        echo "$line" >>$newmail
                fi
        done
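For comparison, here is a C sketch of what the strhead loop above does: discard everything up to and including the first empty line, then copy the rest. It works one character at a time, so it has no line-length limit. Unlike the shell loop, where read strips surrounding blanks, it does not treat a line holding only spaces as the separator. strip_header_block is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Copy everything after the first empty line (the header/body separator)
 * from in to out; the header block and the separator are discarded. */
int strip_header_block(FILE *in, FILE *out)
{
    int c;
    int prev = '\n';          /* treat the start of input as a line start */
    int in_header = 1;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        if (in_header) {
            if (c == '\n' && prev == '\n')
                in_header = 0;    /* empty line found; drop it too */
            prev = c;
            continue;
        }
        putc(c, out);
    }
    return ferror(in) ? -1 : 0;
}
```

Used as a filter, it would be called as strip_header_block(stdin, stdout).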



The actual code works a bit differently, and I really should someday include
a variation of the above code because it would work better. We learn as
we do.

Hope it helps.

Regards,
Gary

#! /bin/sh
# This script is a filter for incoming mail. It puts a copy of all incoming
# mail in "rcvdmail", then goes looking for "unprotected" mail headers
# embedded in the message, which give Elm fits. It puts a "<" in front of
# every line of the message body if it finds embedded headers.
# 11/29/89 garyg
#
owner=garyg
maildir=/mnt/admin/garyg/Mail
mboxdir=$maildir/Mailbox
mbox=$mboxdir/$owner
rcvdmail=$mboxdir/rcvdmail

cd $mboxdir

file=new$$

# Get the input
cat > $file
# ensure that the file ends with a blank line
echo "" >>$file

# Elm has problems with included mail headers if mail doesn't go to
# /usr/mail/$owner. Protect any secondary mail headers included in the
# message.

strip=/usr/lib/mail/path.db/strhead     # Strips off mail header
cat $file |$strip > tmp$$ # file without newest mail header
# Now isolate the current header
diff $file tmp$$ >tmpa$$
line1=`head -1 tmpa$$`
cat tmpa$$ |grep -v "$line1" |cut -c3- >hdr$$

# We want to trash certain classes of mail.
if [ `cat hdr$$ |grep "^Subject: " |grep -c "AUTO ANSWER"` -ne 0 ]
then
        rm *$$
        exit 0
fi
if [ `cat hdr$$ |grep "^Subject: " |grep -c "I'm on vacation"` -ne 0 ]
then
        rm *$$
        exit 0
fi




# Now go looking for lines in the message body that act like mail headers.
cat /dev/null >old$$
mailstuff='^From: |^To: |^Received: |^Date: |^Subject: '
cat tmp$$ |egrep "$mailstuff" >>old$$

if [ `cat old$$ |wc -c` -gt 1 ] 
then
# Assume a second mail header exists. Put "<" at the front of every line.
        cat hdr$$ >n$file
        cat tmp$$ |sed 's/^/</' >>n$file
        mv n$file $file
fi
# Now add the mail message to our mailbox
if [ ! -f $mbox ]
then
        touch $mbox
        chown $owner $mbox
        chgrp mail $mbox
        chmod 760 $mbox
fi


cat $file >>$mbox

# Save a copy in rcvdmail
if [ ! -f $rcvdmail ]
then
        touch $rcvdmail
        chown $owner $rcvdmail
        chgrp mail $rcvdmail
        chmod 760 $rcvdmail
fi
cat $file >>$rcvdmail

rm -f *$$

exit 0
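The script's embedded-header check first greps the body for header-like lines, then runs sed 's/^/</' if any turn up. Both steps can be done in one program with two passes over a saved file. This sketch assumes a seekable input, which the script guarantees by writing the message to $file first. It also assumes lines shorter than its 4096-byte buffer. protect_body and looks_like_header are hypothetical names, and the prefixes copy the script's $mailstuff pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same lookalike prefixes as $mailstuff in the script. */
static const char *lookalikes[] = {
    "From: ", "To: ", "Received: ", "Date: ", "Subject: ", NULL
};

static int looks_like_header(const char *line)
{
    const char **p;

    for (p = lookalikes; *p != NULL; p++)
        if (strncmp(line, *p, strlen(*p)) == 0)
            return 1;
    return 0;
}

/* Copy the header unchanged.  If any body line looks like a header,
 * prefix every body line with '<'; otherwise copy the body as-is.
 * in must be seekable (e.g. a saved file, not a pipe). */
int protect_body(FILE *in, FILE *out)
{
    char line[4096];
    long body_start = -1;
    int found = 0;

    assert(in != NULL && out != NULL);
    /* Pass 1: copy the header, then scan the body for lookalikes. */
    while (fgets(line, sizeof line, in) != NULL) {
        if (body_start < 0) {
            fputs(line, out);
            if (line[0] == '\n')
                body_start = ftell(in);   /* body begins after the blank */
        } else if (looks_like_header(line)) {
            found = 1;
        }
    }
    if (body_start < 0)
        return ferror(in) ? -1 : 0;       /* no separator, so no body */

    /* Pass 2: re-read the body, adding the prefix if needed. */
    if (fseek(in, body_start, SEEK_SET) != 0)
        return -1;
    while (fgets(line, sizeof line, in) != NULL) {
        if (found)
            putc('<', out);
        fputs(line, out);
    }
    return ferror(in) ? -1 : 0;
}
```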

-------------------------------------------------------------------------------
From: Bill Wisner <wisner@ims.alaska.edu>
Organization: Amnesia International

Removes boring headers from file given on command line or standard input;
writes to standard output.  Deals with continuation lines too.  If you don't
have perl, you're living in the dark ages.  Tailor at will.

#!/usr/bin/perl

# strip nasty headers from a message

while (<>) {
    if (/^$/) { $body = 1; }
    if (!$body && /^[ 	]/ && $inhdr) { next; }
    if (!$body && /^Message-Id: /io) { $inhdr = 1; next; }
    if (!$body && /^Received: /io) { $inhdr = 1; next; }
    if (!$body && /^X-Mailer: /io) { $inhdr = 1; next; }
    $inhdr = 0;
    print;
}



====
Mike O'Connor <mjo@ais.org>