housel@en.ecn.purdue.edu (Peter S. Housel) (01/08/90)
Does anybody have a perl subroutine which can (fully) parse RFC822-compatible mail headers? I'm looking to replace the MH 'slocal' program with something simpler.

Thanks in advance.

-Peter S. Housel- housel@ecn.purdue.edu ...!pur-ee!housel
lamy@cs.toronto.edu (Jean-Francois Lamy) (01/09/90)
housel@en.ecn.purdue.edu (Peter S. Housel) writes:
>Does anybody have a perl subroutine which can (fully) parse RFC822-compatible
>mail headers? I'm looking to replace the MH 'slocal' program with
>something simpler.

If you really mean "fully" then you'll likely be out of luck. "slocal" is likely complicated because it has to be. Looking at a completely different implementation (the local mailer) reveals that just the grammar, written in a real compiler-compiler language (SSL in this case), takes about 25K, and that excludes any semantic processing (i.e. it doesn't do anything with what it accepts). The code to parse dates according to spec is about 25K on its own, and the various semantic stuff to stash headers, collect addresses, and so on is at least another 50 to 100K, from what I can estimate.

To illustrate the fun things a "full" parser must catch, consider that the following lines are all (currently) illegal:

    To: <user@foo.bar.edu>                   -- needs a "phrase"
    From: John H. Doe <doe@bar.mumble.com>   -- "." is illegal in a "phrase" (makes the grammar non-LALR(1))
    References: <jjj@foo>, <jjj@bar>         -- "," is illegal

So what you probably want is not a full RFC 822 parser (and you can probably understand now why slocal has lots of goo under the hood).

We're now pretty far away from perl, but such queries about mail parsers come up often enough that I thought it worthwhile to post.

Jean-Francois Lamy lamy@cs.utoronto.ca, uunet!cs.utoronto.ca!lamy
Department of Computer Science, University of Toronto, Canada M5S 1A4
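For contrast with a "full" parser, here is a minimal sketch of the loose kind of header splitting that is probably enough for a slocal-style filter. It is written in Python purely for illustration (the same logic carries over to perl), and the function name parse_headers is hypothetical, not something from MH or from either poster. It assumes only two things about the input: a header field is "name: body", and a line starting with a space or tab continues the previous field. It does no validation of addresses, phrases, or dates, so it accepts all three "illegal" lines above without complaint.

```python
def parse_headers(text):
    """Split a message's header block into (name, body) pairs.

    Loose sketch only: unfolds continuation lines and splits each field
    on its first colon. Nothing is checked against the RFC 822 grammar.
    """
    fields = []
    for line in text.splitlines():
        if line == "":
            break  # a blank line ends the header block
        if line[0] in " \t" and fields:
            # Continuation line: fold it into the previous field's body.
            name, body = fields[-1]
            fields[-1] = (name, body + " " + line.strip())
            continue
        name, sep, body = line.partition(":")
        if not sep:
            continue  # no colon; a strict parser would reject this line
        fields.append((name.strip(), body.strip()))
    return fields


if __name__ == "__main__":
    sample = (
        "To: <user@foo.bar.edu>\n"
        "Subject: a long\n"
        "\tsubject line\n"
        "From: John H. Doe <doe@bar.mumble.com>\n"
        "\n"
        "message body starts here\n"
    )
    for name, body in parse_headers(sample):
        print(name, "=>", body)
```

Anything that depends on the structure inside a field body (picking out addresses, comparing dates) is deliberately left out, since that is where the bulk of the code size estimated above goes.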