[comp.mail.headers] Header stripper

mjo@irie.ais.org (Mike O'Connor) (04/02/91)

I am looking for a program/script that strips extraneous mail header
stuff from my mbox...  I generally don't need to see the 20 or so
machines that my mail goes through to get to me.  Does anyone have any
suggestions?  I will post to comp.mail.headers if I receive a large
body of mail on the subject.

Thanks!

____
Mike O'Connor <mjo@ais.org>

rickert@mp.cs.niu.edu (Neil Rickert) (04/02/91)

In article <1991Apr2.063227.25582@engin.umich.edu> mjo@ais.org writes:
>I am looking for a program/script that strips extraneous mail header
>stuff from my mbox...  I generally don't need to see the 20 or so
>machines that my mail goes through to get to me.  Does anyone have any
>suggestions?  I will post to comp.mail.headers if I receive a large
>body of mail on the subject.

 What you really need is a mail program with a user interface which does not
display these lines.  There is a good chance you already have one.  Most
halfway decent mail programs have a setup file where you can ask it to hide
certain headers, so you don't see them.

 Removing and discarding the 'Received:' headers is usually a bad idea, as you
will discover as soon as you run into a message with a bad reply address and
need to look at those headers to try to construct a valid reply address.

-- 
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
  Neil W. Rickert, Computer Science               <rickert@cs.niu.edu>
  Northern Illinois Univ.
  DeKalb, IL 60115                                   +1-815-753-6940

mjo@irie.ais.org (Mike O'Connor) (04/02/91)

In article <1991Apr2.123656.24499@mp.cs.niu.edu> rickert@mp.cs.niu.edu (Neil Rickert) writes:

: What you really need is a mail program with a user interface which does not
:display these lines.  There is a good chance you already have one.  Most
:halfway decent mail programs have a setup file where you can ask it to hide
:certain headers, so you don't see them.

Sorry...  poor choice of words in my initial post.  I not only don't
want to SEE the headers, but I also don't want to KEEP the headers.  I
use elm, and find that I am making large folders that are half
headers!  I would like some automagic way of stripping those headers.  

: Removing and discarding the 'Received:' headers is usually a bad idea, as you
:will discover as soon as you run into a message with a bad reply address and
:need to look at those headers to try to construct a valid reply address.

In most cases, I will already know that the address involved is
Reply-To:-able.  I just don't want 60 lines of header per message, and
am wondering if there's a simple way to do that.  (Editing folders by
hand is NOT simple.)

Thanks for your reply.

____
Mike O'Connor <mjo@ais.org>

romain@pyramid.pyramid.com (Romain Kang) (04/03/91)

Peter Honeyman's old motf ("mailer of the future"), which was basically
a cleaned up and enhanced BSD Mail, stripped the "ignored" headers
before saving the message to the files.  It didn't have a visual
interface like elm, but I still use it.  I don't think it's been
revised since around 1986 (when Peter started to play with Macs), but
it might still be available for anonymous ftp on citi.umich.edu.

paul@uxc.cso.uiuc.edu (Paul Pomes - UofIllinois CSO) (04/03/91)

mjo@irie.ais.org (Mike O'Connor) writes:

>Sorry...  poor choice of words in my initial post.  I not only don't
>want to SEE the headers, but I also don't want to KEEP the headers.  I
>use elm, and find that I am making large folders that are half
>headers!  I would like some automagic way of stripping those headers.  

Some time back I wrote the thdr filter to do just that for mail messages
inbound to notesfiles.  It uses the regexp package (every program should
be a learning experience).  Now for the incantation:

It's available via anon-FTP from uxc.cso.uiuc.edu in utils/thdr .

/pbp
--
         Paul Pomes

UUCP: {att,iuvax,uunet}!uiucuxc!paul   Internet, BITNET: paul@uxc.cso.uiuc.edu
US Mail:  UofIllinois, CSO, 1304 W Springfield Ave, Urbana, IL  61801-2910

lyndon@cs.athabascau.ca (Lyndon Nerenberg) (04/04/91)

paul@uxc.cso.uiuc.edu (Paul Pomes - UofIllinois CSO) writes:
>mjo@irie.ais.org (Mike O'Connor) writes:

>>Sorry...  poor choice of words in my initial post.  I not only don't
>>want to SEE the headers, but I also don't want to KEEP the headers.  I
>>use elm, and find that I am making large folders that are half
>>headers!  I would like some automagic way of stripping those headers.  

>Some time back I wrote the thdr filter to do just that for mail messages
>inbound to notesfiles.  It uses the regexp package (every program should
>be a learning experience).  Now for the incantation:

Regexp? Ick! Here's something I hacked together a while back that should
do what he asked for. Also available via anonymous ftp from
aupair.cs.athabascau.ca:mail/striphdrs.c.

/*
 * Clean up headers in mail destined for a mailing list. I usually invoke
 * this from the smail alias file as follows:
 *
 * foo:	"|/usr/local/smail/striphdrs|/usr/local/bin/smail -oi -q -f foo-request foo-redist"
 *
 * Written January 1991 (or thereabouts) by Lyndon Nerenberg.
 * This program is in the public domain.
 *
 * --lyndon@cs.athabascau.ca
 */

#ifndef lint
static char RCSid[] = "$Id: striphdrs.c,v 1.6 91/04/03 10:05:09 lyndon Rel $";
#endif /* ! lint */

#include <stdio.h>
#include <stdlib.h>	/* malloc(), exit() */
#include <string.h>	/* strlen() */
#include <strings.h>	/* strncasecmp() */

/*
 * Define MAILLIST if you want a "Precedence: bulk" header to be automatically
 * included in every message (useful for mailing list traffic).
 */

#define MAILLIST

#ifdef sun
extern void exit();
#endif /* sun */

#define INBUFSIZE 4096		/* Size of input buffer. Lines longer than */
				/* this are read in several pieces. */
#define	TRUE	1
#define FALSE	0

static char *hdr_del[] = {	/* NULL terminated list of hdrs to delete */
  "Return-Path:",
  "Received:",
  "Errors-To:",
  "Sender:",
  "Precedence:",
  NULL };

int main(argc, argv)
  int argc; char *argv[];
{
  
  char *inbuf;			/* Input buffer */
  char **c;			/* Temporary pointer */
  int  in_headers = TRUE;	/* Set to 0 when last header encountered */
  int  deleting = FALSE;	/* Set to 1 if actively deleting header */
  
  inbuf = (char *) malloc((unsigned) INBUFSIZE);
  if (inbuf == NULL) {
    (void) fprintf(stderr, "%s: malloc(INBUFSIZE) failed!\n", argv[0]);
    exit(1);
  }
  
  while ((fgets(inbuf, INBUFSIZE, stdin)) != NULL) {
    
    if (in_headers) {
      
      if (*inbuf == '\n') {
	in_headers = FALSE;	/* Header/body separator found */
#ifdef MAILLIST
	(void) fputs("Precedence: bulk\n\n", stdout);
#else /* ! MAILLIST */
	(void) fputs("\n", stdout);
#endif /* ! MAILLIST */
	continue;
      }

      if (deleting && ((*inbuf == ' ') || (*inbuf == '\t')))
	continue;		/* Skip any continuation lines */
      else
	deleting = FALSE;
      
      /* See if this is a bogus header */
      for (c = hdr_del; *c != NULL; c++)
	if (strncasecmp(inbuf, *c, strlen(*c)) == 0) {
	  deleting = TRUE;
	  break;		/* No need to check the remaining names */
	}

      if (!deleting)
	(void) fputs(inbuf, stdout);
    }
    else
      (void) fputs(inbuf, stdout);
  }
  exit(0);
/*NOTREACHED*/
}
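
For whole folders rather than single messages, the same loop can be
extended.  What follows is only a sketch, assuming a UUCP-style mbox in
which every message starts with a "From " line that follows a blank line
(or the start of the file), and in which body lines beginning with "From "
have already been escaped by whatever delivered the mail.  strip_mbox(),
has_prefix_ci() and drop_list are names made up for this example; they are
not part of striphdrs.c.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Headers to drop; an illustrative list only. */
static const char *drop_list[] = { "Received:", "Return-Path:", NULL };

/* Return 1 if s begins with prefix, ignoring case. */
static int has_prefix_ci(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Return 1 if line starts with one of the names in drop_list. */
static int is_dropped(const char *line)
{
    const char **h;

    for (h = drop_list; *h != NULL; h++)
        if (has_prefix_ci(line, *h))
            return 1;
    return 0;
}

/*
 * Copy an mbox folder from in to out, omitting the headers named in
 * drop_list (and their continuation lines) from every message.
 */
static void strip_mbox(FILE *in, FILE *out)
{
    char line[4096];
    int in_headers = 0;   /* nonzero while inside a header block */
    int deleting = 0;     /* nonzero while skipping a dropped header */
    int prev_blank = 1;   /* start of file counts as "after a blank line" */

    while (fgets(line, sizeof line, in) != NULL) {
        if (prev_blank && strncmp(line, "From ", 5) == 0) {
            in_headers = 1;             /* separator line: new message */
            deleting = 0;
        } else if (in_headers) {
            if (line[0] == '\n') {
                in_headers = 0;         /* blank line ends the headers */
            } else if (deleting && (line[0] == ' ' || line[0] == '\t')) {
                prev_blank = 0;
                continue;               /* continuation of a dropped header */
            } else {
                deleting = is_dropped(line);
                if (deleting) {
                    prev_blank = 0;
                    continue;
                }
            }
        }
        fputs(line, out);
        prev_blank = (line[0] == '\n');
    }
}
```

Called as strip_mbox(stdin, stdout) from a small main(), this would filter
a folder into a new file.  Lines longer than the buffer are handled in
pieces, so an extremely long header line could be misclassified.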

-- 
    Lyndon Nerenberg  VE6BBM / Computing Services / Athabasca University
           atha!cs.athabascau.ca!lyndon || lyndon@cs.athabascau.ca
                    Packet: ve6bbm@ve6bbm.ab.can.noam
      The only thing open about OSF is their mouth.  --Chuck Musciano

randy@rls.UUCP (Randall L. Smith) (04/09/91)

In article <1991Apr2.063227.25582@engin.umich.edu>, mjo@irie.ais.org (Mike O'Connor) writes:
> I am looking for a program/script that strips extraneous mail header
> stuff from my mbox...  I generally don't need to see the 20 or so
> machines that my mail goes through to get to me.  Does anyone have any
> suggestions?  I will post to comp.mail.headers if I receive a large
> body of mail on the subject.

I dunno, is this what you're looking for?

/*
 *    header.c
 *
 *    Author:  Randall L. Smith
 *    Bangpath: ...<backbone>!tut!rls!randy
 *    Internet: rls!randy@tut.cis.ohio-state.edu
 *
 *    Donated to the public domain March 1, 1991.  Do whatever you
 *    like with it, even make money, as if you could. :-)
 *
 *    Simply strips headers off news & mail text files. (destructively)
 *
 *    Easily achieved because the first line of a mail or news
 *    header must begin with "Path:" and the last line of the
 *    header must be a blank line.
 *
 *    BUGS:  None known.
 *
 */

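#include <stdio.h>
#include <stdlib.h>	/* exit() */
#include <string.h>	/* strcpy(), strcat(), strlen(), strncmp() */
#include <unistd.h>	/* getopt() */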

#ifndef NULL
#define NULL  (void* ) 0
#endif

#ifndef BUFSIZE
#define BUFSIZE 512
#endif

#ifndef FILENAME
#define FILENAME 256
#endif

FILE *fil_desc, *temp_chan;

int main(argc, argv)
int argc;
char *argv[];
{
    int file_count = 0, first_blank_line, ch, verbose = 0;

    /*
     *  large BUFSIZE is for news posters that don't know what <cr> is for....
     */
    char fil_name[FILENAME], inp_str[BUFSIZE], temp_file[FILENAME];

    while ((ch = getopt(argc, argv, "hv")) != EOF) {
	switch (ch) {
	case 'v':
	    verbose = -1;
	    file_count++;
	    break;
	case 'h':
	    fprintf(stdout, "Usage: %s [ filenames to have mail or news headers stripped ]\n", argv[0]);
	    exit(0);
	}
    }

    while (file_count < argc)
	while (++file_count < argc) {
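	    /* Leave room for the ".t" suffix and the terminating NUL. */
	    if (strlen(argv[file_count]) > FILENAME - 3) {
		(void) fprintf(stderr, "File name %s too long.\n", argv[file_count]);
		break;
	    }
	    (void) strcpy(fil_name, argv[file_count]);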
	    first_blank_line = 0;
	    fil_desc = fopen(fil_name, "r");
	    if (fil_desc == NULL) {
		(void) fprintf(stderr, "Error opening %s\n", fil_name);
		break;
	    }
	    /*
	     *  Make sure there's a header to strip...
	     *  The weird main while loop is due to this break statement.
	     *  The other option was a goto, ... not. :-|
	     */
	    (void) fgets(inp_str, BUFSIZE, fil_desc);
	    if (strncmp(inp_str, "Path:", 5) != 0) {
		(void) fclose(fil_desc);
		if (verbose)
		    fprintf(stdout, "No header to strip from %s\n", fil_name);
		break;
	    }
	    /*
	     *  Waddle through the file until the first blank line
	     *  past the Path: header line.  Once we find it, open
	     *  a work file to copy the rest of the file's contents.
	     */
	    temp_chan = NULL;
	    while (first_blank_line == 0 &&
		   fgets(inp_str, BUFSIZE, fil_desc) != NULL) {
		if (strlen(inp_str) <= 1) {
		    first_blank_line = -1;
		    strcpy(temp_file, fil_name);
		    strcat(temp_file, ".t");
		    temp_chan = fopen(temp_file, "w");
		    if (temp_chan == NULL) {
			(void) fprintf(stderr, "Error opening %s\n", temp_file);
			break;
		    }
		    if (verbose)
			fprintf(stdout, "Stripping header from %s\n", fil_name);
		}
	    }

	    /*
	     *  No blank line, or no work file: leave the original alone.
	     */
	    if (temp_chan == NULL) {
		(void) fclose(fil_desc);
		break;
	    }

	    /*
	     *  Copy the rest of the original file into the temp file.
	     *  Testing fgets() rather than feof() avoids writing the
	     *  last line twice.
	     */
	    while (fgets(inp_str, BUFSIZE, fil_desc) != NULL)
		(void) fputs(inp_str, temp_chan);

	    /*
	     *  Close original and temp files, then rename temp over
	     *  the original.  Go back and do it again.
	     */
	    (void) fclose(temp_chan);
	    (void) fclose(fil_desc);
	    if (rename(temp_file, fil_name) != 0)
		(void) fprintf(stderr, "Error replacing %s\n", fil_name);
	}
    exit(0);
}

Cheers!

- randy

Usenet: randy@rls.uucp 
Bangpath: ...<backbone>!osu-cis!rls!randy
Internet: rls!randy@tut.cis.ohio-state.edu
%CC-I-ANACRONISM, The operator is an obsolete form and may not be portable.

randy@rls.UUCP (Randall L. Smith) (04/09/91)

In article <10712@rls.UUCP>, randy@rls.UUCP (Randall L. Smith) writes:
> In article <1991Apr2.063227.25582@engin.umich.edu>, mjo@irie.ais.org (Mike O'Connor) writes:
>> I am looking for a program/script that strips extraneous mail header
>> stuff from my mbox...  I generally don't need to see the 20 or so
>> machines that my mail goes through to get to me.  Does anyone have any
>> suggestions?  I will post to comp.mail.headers if I receive a large
>> body of mail on the subject.
> 
> I dunno, is this what you're looking for?
> [....] 
>  *    Easily achieved because the first line of a mail or news
>  *    header must begin with "Path:" and the last line of the
>  *    header must be a blank line.

Rats!!  I test *after* I post.  Sorry folks.  Here's the *corrected*
version.  The last version only trims news headers.  This one does both
mail and news.

/*
 *    header.c
 *
 *    Author:  Randall L. Smith
 *    Bangpath: ...<backbone>!tut!rls!randy
 *    Internet: rls!randy@tut.cis.ohio-state.edu
 *
 *    Donated to the public domain March 1, 1991.  Do whatever you
 *    like with it, even make money, as if you could. :-)
 *
 *    Simply strips headers off news & mail text files. (destructively)
 *
 *    Easily achieved because the first line of a mail or news
 *    header must begin with "From " or "Path:", respectively and
 *    the last line of the header must be a blank line.
 *
 *    BUGS:  Has potential to destroy files ending in .t
 *
 */

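#include <stdio.h>
#include <stdlib.h>	/* exit() */
#include <string.h>	/* strcpy(), strcat(), strlen(), strncmp() */
#include <unistd.h>	/* getopt() */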

#ifndef NULL
#define NULL  (void* ) 0
#endif

#ifndef BUFSIZE
#define BUFSIZE 512
#endif

/* Making room to append .t to the temp file name.  If you have longer
   file names, then change FILENAME to max_file_name_length - 2.  */

#ifndef FILENAME
#define FILENAME 12
#endif

FILE *fil_desc, *temp_chan;

int main(argc, argv)
int argc;
char *argv[];
{
    int file_count = 0, first_blank_line, ch, verbose = 0;

    /*
     *  large BUFSIZE is for news posters that don't know what <cr> is for....
     */
    char fil_name[FILENAME], inp_str[BUFSIZE], temp_file[FILENAME+2];

    while ((ch = getopt(argc, argv, "hv")) != EOF) {
	switch (ch) {
	case 'v':
	    verbose = -1;
	    file_count++;
	    break;
	case 'h':
	    fprintf(stdout, "Usage: %s [ filenames to have mail or news headers stripped ]\n", argv[0]);
	    exit(0);
	}
    }

    while (file_count < argc)
	while (++file_count < argc) {
	    /* Check the length before copying, so fil_name cannot overflow. */
	    if ( strlen(argv[file_count]) >= FILENAME ) {
	        (void) fprintf(stderr, "File name %s too long.\n", argv[file_count]);
	        break;
	    }
	    (void) strcpy(fil_name, argv[file_count]);
	    first_blank_line = 0;
	    fil_desc = fopen(fil_name, "r");
	    if (fil_desc == NULL) {
		(void) fprintf(stderr, "Error opening %s\n", fil_name);
		break;
	    }
	    /*
	     *  Make sure there's a header to strip...
	     *  The weird main while loop is due to this break statement.
	     *  The other option was a goto, ... not. :-|
	     */
	    (void) fgets(inp_str, BUFSIZE, fil_desc);
	    if (strncmp(inp_str, "Path:", 5) != 0 && strncmp(inp_str, "From ", 5) != 0) {
		(void) fclose(fil_desc);
		if (verbose)
		    fprintf(stdout, "No header to strip from %s\n", fil_name);
		break;
	    }
	    /*
	     *  Waddle through the file until the first blank line past
	     *  the Path: or From header line.  Once we find it, open a
	     *  work file to copy the rest of the file's contents.
	     */
	    temp_chan = NULL;
	    while (first_blank_line == 0 &&
		   fgets(inp_str, BUFSIZE, fil_desc) != NULL) {
		if (strlen(inp_str) <= 1) {
		    first_blank_line = -1;
		    strcpy(temp_file, fil_name);
		    strcat(temp_file, ".t");
		    temp_chan = fopen(temp_file, "w");
		    if (temp_chan == NULL) {
			(void) fprintf(stderr, "Error opening %s\n", temp_file);
			break;
		    }
		    if (verbose)
			fprintf(stdout, "Stripping header from %s\n", fil_name);
		}
	    }

	    /*
	     *  No blank line, or no work file: leave the original alone.
	     */
	    if (temp_chan == NULL) {
		(void) fclose(fil_desc);
		break;
	    }

	    /*
	     *  Copy the rest of the original file into the temp file.
	     *  Testing fgets() rather than feof() avoids writing the
	     *  last line twice.
	     */
	    while (fgets(inp_str, BUFSIZE, fil_desc) != NULL)
		(void) fputs(inp_str, temp_chan);

	    /*
	     *  Close original and temp files, then rename temp over
	     *  the original.  Go back and do it again.
	     */
	    (void) fclose(temp_chan);
	    (void) fclose(fil_desc);
	    if (rename(temp_file, fil_name) != 0)
		(void) fprintf(stderr, "Error replacing %s\n", fil_name);
	}
    exit(0);
}
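
The BUGS note about files ending in .t could probably be avoided by
letting the system pick the work file name.  The sketch below assumes a
POSIX system with mkstemp(), and replaces the original only once the copy
is complete.  strip_header_in_place() is a hypothetical name, and like
header.c it treats everything up to the first blank line as the header.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() and fdopen() */
#endif

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Replace path with the part of the file that follows the first blank
 * line.  Returns 0 on success; returns -1 on any error or if no blank
 * line is found, in which case the original file is left untouched.
 */
static int strip_header_in_place(const char *path)
{
    char tmp_name[1024];
    char line[4096];
    FILE *in, *out;
    int fd, found_blank = 0;

    /* sizeof ".XXXXXX" counts the terminating NUL as well. */
    if (strlen(path) + sizeof ".XXXXXX" > sizeof tmp_name)
        return -1;
    strcpy(tmp_name, path);
    strcat(tmp_name, ".XXXXXX");

    if ((in = fopen(path, "r")) == NULL)
        return -1;
    if ((fd = mkstemp(tmp_name)) == -1) {   /* creates a unique new file */
        fclose(in);
        return -1;
    }
    if ((out = fdopen(fd, "w")) == NULL) {
        close(fd);
        unlink(tmp_name);
        fclose(in);
        return -1;
    }

    /* Skip up to and including the first blank line, copy the rest. */
    while (fgets(line, sizeof line, in) != NULL) {
        if (found_blank)
            fputs(line, out);
        else if (line[0] == '\n')
            found_blank = 1;
    }
    fclose(in);

    /* Only a complete copy is allowed to replace the original. */
    if (fclose(out) != 0 || !found_blank || rename(tmp_name, path) != 0) {
        unlink(tmp_name);
        return -1;
    }
    return 0;
}
```

One trade-off to be aware of: mkstemp() creates the work file readable and
writable by the owner only, so the permissions of the original are not
carried over by this sketch.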

Cheers!

- randy

Usenet: randy@rls.uucp 
Bangpath: ...<backbone>!osu-cis!rls!randy
Internet: rls!randy@tut.cis.ohio-state.edu
%CC-I-ANACRONISM, The operator is an obsolete form and may not be portable.

lyndon@cs.athabascau.ca (Lyndon Nerenberg) (04/10/91)

randy@rls.UUCP (Randall L. Smith) writes:
> *    Easily achieved because the first line of a mail or news
> *    header must begin with "Path:" and the last line of the
> *    header must be a blank line.

This is incorrect. There is no "Path:" header defined for RFC822 mail,
and if one is present, there's no guarantee that it will be the first
header. The "Path:" header must be present in news articles, but again
there is no guarantee that it will be the first header, although all
existing news transports that I'm aware of will place it there.
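
A check that does not depend on header order might look like the sketch
below, which scans the whole header block for a given name and stops at
the blank line.  header_present() and has_prefix_ci() are names invented
for this illustration, and the header block is assumed to end at the first
empty line.

```c
#include <ctype.h>
#include <stdio.h>

/* Return 1 if s begins with prefix, ignoring case. */
static int has_prefix_ci(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/*
 * Return 1 if a header called name (for example "Path:") occurs anywhere
 * in the header block read from in, 0 otherwise.  Reading stops at the
 * blank line that separates headers from body, so a matching line in the
 * body is not counted.  Continuation lines start with white space and
 * therefore never match a header name.
 */
static int header_present(FILE *in, const char *name)
{
    char line[4096];

    while (fgets(line, sizeof line, in) != NULL) {
        if (line[0] == '\n')
            break;
        if (has_prefix_ci(line, name))
            return 1;
    }
    return 0;
}
```

With this, an article whose first header is "Xref:" and whose second is
"Path:" is still recognised, because position plays no part in the test.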

-- 
    Lyndon Nerenberg  VE6BBM / Computing Services / Athabasca University
           atha!cs.athabascau.ca!lyndon || lyndon@cs.athabascau.ca
                    Packet: ve6bbm@ve6bbm.ab.can.noam
      The only thing open about OSF is their mouth.  --Chuck Musciano

randy@rls.UUCP (Randall L. Smith) (04/15/91)

In article <1621@aupair.cs.athabascau.ca>, lyndon@cs.athabascau.ca (Lyndon Nerenberg) writes:
> randy@rls.UUCP (Randall L. Smith) writes:
>> *    Easily achieved because the first line of a mail or news
>> *    header must begin with "Path:" and the last line of the
>> *    header must be a blank line.
> 
> This is incorrect. There is no "Path:" header defined for RFC822 mail,
> and if one is present, there's no guarantee that it will be the first
> header. The "Path:" header must be present in news articles, but again
> there is no guarantee that it will be the first header, although all
> existing news transports that I'm aware of will place it there.

Quite right.  See my follow up posting with corrections.  I didn't read
the RFC's for this information.  I simply looked at the actual headers of
news and mail.  Mail consistently has "From " on the first line and news
has "Path:" on the first line, both in the first 5 characters.  My guess
is the RFC's somewhere define this. 

BTW, has anyone had any problem compiling that code?  Just curious.

Cheers!

- randy

Usenet: randy@rls.uucp 
Bangpath: ...<backbone>!osu-cis or mstar!rls!randy
Internet: rls!randy@tut.cis.ohio-state.edu	rls!randy@mstar.com
%CC-I-ANACRONISM, The operator is an obsolete form and may not be portable.

blarson@blars (04/15/91)

In article <10715@rls.UUCP> randy@rls.UUCP (Randall L. Smith) writes:
>In article <1621@aupair.cs.athabascau.ca>, lyndon@cs.athabascau.ca (Lyndon Nerenberg) writes:
>> randy@rls.UUCP (Randall L. Smith) writes:
>>> *    Easily achieved because the first line of a mail or news
>>> *    header must begin with "Path:" and the last line of the
>>> *    header must be a blank line.
>> 
>> This is incorrect. There is no "Path:" header defined for RFC822 mail,
>> and if one is present, there's no guarantee that it will be the first
>> header. The "Path:" header must be present in news articles, but again
>> there is no guarantee that it will be the first header, although all
>> existing news transports that I'm aware of will place it there.

C news usually (but not always) puts the Xref: header first if there is
one.  The Path: header usually follows.  (There is no RFC specifying
header order, and C news will put the headers in a different order if
a buffer fills before all headers have been read.)

>Quite right.  See my follow up posting with corrections.  I didn't read
>the RFC's for this information.  I simply looked at the actual headers of
>news and mail.  

And guessed the rest of the world followed what you saw on one system.

>Mail consistently has "From " on the first line 

Only with UUCP-style mailboxes, which are common (but not universal) on
Unix systems and rare elsewhere.

>and news has "Path:" on the first line, both in the first 5 characters.

I think this is the case with B news.

>  My guess
>is the RFC's somewhere define this. 

Nope.


-- 
blarson@usc.edu
		C news and rn for os9/68k!
-- 
Bob Larson (blars)	blarson@usc.edu			usc!blarson
	Hiding differences does not make them go away.
	Accepting differences makes them unimportant.

steve@thelake.mn.org (Steve Yelvington) (04/15/91)

[In article <188@blars>,
     blarson@blars writes ... ]

> In article <10715@rls.UUCP> randy@rls.UUCP (Randall L. Smith) writes: 
>>My guess 
>>is the RFC's somewhere define this.  
> 
> Nope.  

If you want a copy of the relevant RFCs, you can send mail to
info-server@sh.cs.net with the following lines:

Request: RFC
Topic:   RFC850
Topic:   RFC1036
Topic:   RFC822
Request: End

It helps to have a dependable address -- there's never a guarantee with
``.UUCP.''

 ----
 Steve Yelvington, Marine on St. Croix, Minnesota, USA / steve@thelake.mn.org

lyndon@cs.athabascau.ca (Lyndon Nerenberg) (04/16/91)

randy@rls.UUCP (Randall L. Smith) writes:

>Quite right.  See my follow up posting with corrections.  I didn't read
>the RFC's for this information.  I simply looked at the actual headers of
>news and mail.  Mail consistently has "From " on the first line and news
>has "Path:" on the first line, both in the first 5 characters.  My guess
>is the RFC's somewhere define this. 

The followup posting was also incorrect (see my second followup
to it :-)

"The RFC's" don't define the headers in the manner you allude to.
The "Path:" header is defined in the news message-format RFC (RFC 1036).
That definition places no restrictions on where it is located in the
message headers. It can be located anywhere, so depending on it being
the first header is incorrect.

The "From_" header is not defined in any of the RFC's, although
references are made to it in RFC976. The "From_" header is specific
to the UUCP *transport*, and *some* UNIX mail UA's. However there
is no *requirement* that that header be present. For example, none
of my incoming mail has a "From_" header - it gets stripped
out during delivery into my mailbox. (I use MH under smail3.1 on a
Sparcstation.)
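
To illustrate the distinction, here is a rough sketch of classifying the
first line of a file without assuming any particular header comes first.
It relies on the RFC822 rule that a field name is one or more printable
ASCII characters other than space and colon, followed by a colon.  The
enum and classify_line() are invented names, not anything a mailer
provides.

```c
#include <string.h>

enum line_kind {
    LINE_FROM_,     /* UUCP/mbox "From " envelope line */
    LINE_HEADER,    /* looks like an RFC822 "name:" header field */
    LINE_OTHER      /* anything else */
};

/* Classify one line by its syntax alone. */
static enum line_kind classify_line(const char *line)
{
    const unsigned char *p = (const unsigned char *) line;

    if (strncmp(line, "From ", 5) == 0)
        return LINE_FROM_;

    /* Field name: characters 33..126, excluding ':' */
    while (*p > 32 && *p < 127 && *p != ':')
        p++;
    if (*p == ':' && p != (const unsigned char *) line)
        return LINE_HEADER;

    return LINE_OTHER;
}
```

A stripper built on this would accept a message starting with "Xref:",
"Received:" or any other well-formed field, and would treat the "From "
line as optional instead of required.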
-- 
    Lyndon Nerenberg  VE6BBM / Computing Services / Athabasca University
           atha!cs.athabascau.ca!lyndon || lyndon@cs.athabascau.ca
                    Packet: ve6bbm@ve6bbm.ab.can.noam
      The only thing open about OSF is their mouth.  --Chuck Musciano