[comp.lang.perl] Parsing RFC822 in Perl?

housel@en.ecn.purdue.edu (Peter S. Housel) (01/08/90)

Does anybody have a perl subroutine which can (fully) parse RFC822-compatible
mail headers? I'm looking to replace the MH 'slocal' program with
something simpler.

Thanks in advance.

-Peter S. Housel-	housel@ecn.purdue.edu		...!pur-ee!housel

lamy@cs.toronto.edu (Jean-Francois Lamy) (01/09/90)

housel@en.ecn.purdue.edu (Peter S. Housel) writes:

>Does anybody have a perl subroutine which can (fully) parse RFC822-compatible
>mail headers? I'm looking to replace the MH 'slocal' program with
>something simpler.

If you really mean "fully" then you'll likely be out of luck. "slocal" is likely
complicated because it has to be.

Looking at a completely different implementation (the local mailer) reveals
that just the grammar, written in a real compiler-compiler language (SSL in this
case), takes about 25K, and that excludes any semantic processing (i.e. it
doesn't do anything with what it accepts).  The code to parse dates according to
spec is about 25K on its own, and the various semantic code to stash headers,
collect addresses, and so on is at least another 50 to 100K, from what I can
estimate.

To illustrate the fun things a "full" parser must catch, consider that the
following lines are all (currently) illegal.

To: <user@foo.bar.edu>			-- needs a "phrase"
From: John H. Doe <doe@bar.mumble.com>  -- "." is illegal in a "phrase"
					   (Make the grammar non-LALR(1))
References: <jjj@foo>, <jjj@bar>	-- "," is illegal

So what you probably want is not a full RFC 822 parser (and you can probably
understand now why slocal has lots of goo under the hood).

We're now pretty far away from perl, but such queries about mail parsers
come up often enough that I thought it worthwhile to post.
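
For the simpler, non-"full" case, something like the following may be enough.
This is only a minimal sketch, assuming all you need is the field names and
their unfolded values: it does not validate addresses, phrases, comments, or
dates, so it would accept all three illegal lines above without complaint. The
name parse_headers is made up for illustration, and the code assumes a current
Perl (it uses "my" and array references).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from slocal or any mailer): split a raw header
# block into [name, value] pairs. Continuation lines, i.e. lines starting
# with a space or tab, are unfolded onto the previous field. Parsing stops
# at the first blank line, which separates the header from the body.
sub parse_headers {
    my ($text) = @_;
    my @fields;
    for my $line (split /\r?\n/, $text) {
        last if $line eq '';                      # end of header
        if ($line =~ /^[ \t]+(.*)$/) {            # folded continuation line
            $fields[-1][1] .= " $1" if @fields;
        }
        elsif ($line =~ /^([^\s:]+)[ \t]*:[ \t]*(.*)$/) {
            push @fields, [$1, $2];               # "Name: value"
        }
        # Anything else (e.g. an mbox "From " line) is silently skipped.
    }
    return @fields;
}

# Usage: print each field as "name => value".
my $msg = "To: <user\@foo.bar.edu>\nSubject: a long\n\tsubject line\n\nbody\n";
for my $f (parse_headers($msg)) {
    print "$f->[0] => $f->[1]\n";
}
```

Keeping the fields as an ordered list rather than a hash preserves repeated
fields such as Received:, which a filter replacing slocal may well want to see.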

Jean-Francois Lamy               lamy@cs.utoronto.ca, uunet!cs.utoronto.ca!lamy
Department of Computer Science, University of Toronto, Canada M5S 1A4