tkevans@oss670.UUCP (Tim Evans) (05/01/91)
My vendor's proprietary e-mail package, non-standard in almost every way (including not using sendmail), is generating duplicate headers. Generally, these are the Subject header and/or the From header, and they are always consecutive. That is, identical Subject headers are generated, and they occur consecutively.

Since we are connected to the outside world via UUCP, this means we are doing Bad Things to other people's mail. Naturally, I'm anxious to fix this, but the vendor, whose contract doesn't require UUCP support or compliance with RFC-822, isn't.

I'm not able to fix the mailer myself, but can pass its output through standard filters--awk, sed, etc.--before it goes out the door. My first thought was to pass things through 'uniq', but this would also delete consecutive identical lines in the body (the mailer doesn't distinguish between header and body). The probability of consecutive, identical lines in the body of mail messages seems low, but not low enough to chance this.

So, can anyone provide a solution that would delete the second (and subsequent?) occurrences of identical lines that are RFC-822-style headers? I'd prefer not using 'perl' as I haven't installed it here yet (Real Soon Now).

--
INTERNET  tkevans%woodb@mimsy.umd.edu
UUCP      ...!{rutgers|ames|uunet}!mimsy!woodb!tkevans
US MAIL   6401 Security Blvd, 2-Q-2 Operations, Baltimore, MD 21235
PHONE     (301) 965-3286
lyndon@cs.athabascau.ca (Lyndon Nerenberg) (05/02/91)
[ Tried mailing this but oss670.uucp was unknown to us ]

In comp.mail.headers you write:

>I'm not able to fix the mailer myself, but can pass its output
>through standard filters--awk, sed, etc.--before it goes
>out the door. My first thought was to pass things through 'uniq',
>but this would also delete consecutive identical lines in the body (the
>mailer doesn't distinguish between header and body). The probability
>of consecutive, identical lines in the body of mail messages seems
>low, but not low enough to chance this.

You almost answered your own question :-) Use sed to split the headers and body into separate files. Run the header file through sort|uniq, then append the body file. Note that you will have to deal with header continuation lines somehow. A short piece of C code should handle folding the headers, and unfolding them when you're done.

Perhaps the easiest way to deal with this would be to write the entire filter in C. All you need to do is maintain a linked list of headers you have seen. During the scanning phase, if you encounter a header that's already on the linked list, ignore it (and any possible continuation lines). If it's a new header, start up a second linked list of lines containing the header contents. If there are continuation lines in the header, simply append them to the linked list for that header. This eliminates the need to fold/spindle/mutilate the header continuation lines. Once you've fallen out of the headers, just copy the message body through and you're done!
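
The single-pass filter described above might be sketched roughly as follows. This is Python rather than the C the reply suggests, and it uses a set where the reply describes a linked list; the function name dedup_headers is made up for illustration. It assumes that a blank line separates headers from body (the RFC-822 convention), that continuation lines begin with a space or tab, and that a header counts as a duplicate only when its full text, continuation lines included, matches one already seen.

```python
def dedup_headers(lines):
    """Drop repeated identical headers; copy the body through untouched.

    Assumptions (not stated in the thread): `lines` is an iterable of
    newline-terminated strings, and the first blank line ends the headers.
    """
    out = []
    seen = set()       # full text of every header already emitted
    current = []       # lines of the header currently being collected
    in_headers = True

    def flush():
        # Emit the collected header unless an identical one came earlier.
        if current:
            key = "".join(current)
            if key not in seen:
                seen.add(key)
                out.extend(current)
            current.clear()

    for line in lines:
        if not in_headers:
            out.append(line)          # body: copy through verbatim
        elif line.strip() == "":
            flush()                   # blank line ends the header section
            in_headers = False
            out.append(line)
        elif line[0] in " \t" and current:
            current.append(line)      # continuation of the current header
        else:
            flush()
            current.append(line)      # start of a new header
    flush()                           # handles a message with no body
    return out
```

To use it as a filter, one could pass sys.stdin to the function and write the result with sys.stdout.writelines. Matching on the whole header rather than on the field name alone is a design choice of this sketch, not something the thread specifies; it is meant to let legitimately repeated fields with differing contents (several Received: lines, say) survive.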