edward@csvaxa.UUCP (Edward Wilkinson) (01/30/88)
Disclaimer: please excuse me if this idea seems silly - I'm only a
beginner at sendmail hacking :-)

After getting rather confused (as many seem to do) fiddling with
sendmail's rewriting rules, I tried to think of a way to improve them.
What follows are a few preliminary ideas which I hope will be
discussed, criticized, improved & hopefully not ignored!

<1> My current problem is that I can't work out exactly which sets of
rewriting rules get applied to which headers. How about having a set
of rules for each header. e.g.

        From(1):-
                rule#1
                rule#2
                etc,etc

The (1) is a number similar to the current setup, so that you can call
this ruleset as a `subroutine' from elsewhere if necessary. There
would be a set of these rules for the header & another for the
envelope. If no ruleset appears for a particular header, no
transformation is applied.

<2> There could be a couple of reserved rulesets, such as Initial &
Final, which would respectively be applied at the start and end of each
and every address manipulation.

<3> Lastly, there could be general sets of rules which could be called
from all over the place to do common sets of transformations. These
would be like the current rulesets which get called from different
places.

I don't think that these ideas would be hard to implement, but they
would make sendmail's configuration file a little easier to read,
understand and therefore modify. Any and all discussion on these ideas
is welcomed and if I'm just being completely ridiculous, PLEASE someone
tell me! I suppose this article comes from the fact that I can't
understand sendmail as it is.

--
Ed Wilkinson @ Computer Centre, Massey University, Palmerston North, NZ
uucp: ...!uunet!vuwcomp!{massey, csvaxa}!edward   DTE: 530163000005
Greybook: E.Wilkinson@nz.ac.massey                Phone: +64 63 69099 x8587
CSNET/ACSnet/Internet: E.Wilkinson@massey.ac.nz   New Zealand = GMT+12
jeff@tc.fluke.COM (Jeff Stearns) (02/10/88)
In article <180@csvaxa.UUCP> E.Wilkinson@massey.ac.nz writes:
>Disclaimer: please excuse me if this idea seems silly - I'm only a
>beginner at sendmail hacking :-)
>
>After getting rather confused (as many seem to do) fiddling with
>sendmail's rewriting rules, I tried to think of a way to improve them.
>What follows are a few preliminary ideas which I hope will be
>discussed, criticized, improved & hopefully not ignored!
>
><1> My current problem is that I can't work out exactly which sets of
>rewriting rules get applied to which headers. How about having a set
>of rules for each header. e.g.
>

First, I urge you to acquire the ``ease'' translator for sendmail config
files; it's a very useful tool for sendmail.cf hacking. It has been
posted to the sources newsgroup and is available via the archiving
mechanisms described in that group. (In fact, I wouldn't be surprised
to see news of it posted there again soon.)

The ``ease'' translator was written by:

        James S. Schoner
        Mathematical Sciences Building, Office 204
        Purdue University
        West Lafayette, Indiana 47907
        jss@purdue-asc.ARPA

In the meantime, here is some sendmail information I've collected over
the years; I think you may find it useful for getting the Big Picture.
(These comments are excerpted from our custom-written sendmail.cf file.
It includes information posted by others over the past couple of years.
Thank you all.)

        Jeff Stearns                jeff@tc.fluke.COM
        John Fluke Mfg. Co, Inc.    (206) 356-5064

/*
 *
 * Some words about the following data paths and how they are CODED INTO
 * SENDMAIL:
 *
 *                  (Sun release 3.2 paths)
 *              +-> 3 -> 1 -> 4 -> ${m_saddr}
 *              |
 *              +-> 3 -> 0 -> {mailer, host, user}
 *              |                              |
 *              |                              `--> 2 -> R -> 4 -> ${m_sreladdr}
 *              |                                (N.B. "R" here is not a typo!)
 *              |
 *              |
 *              |   (4.xBSD and Sun release 2.x path?)
 *              +--> 3 -> 1 -> 4 --+--> 3 -> 1 -> S -> 4 -> ${m_sreladdr}
 *              |                  |
 *              |                  `--> ${m_saddr}
 *              |
 *              |
 *              |
 *              |
 *              |
 *              |    .---> 3 -> 0 -> {mailer, host, user}
 *              |    |                          |     |
 *              |    |                     ${m_rhost} |
 *              |    |                                |
 *              |    |                                |
 *              |    |                                |
 *              |    |                                |
 *              |    |                                |
 *              |    |               (4.3BSD)         |
 *              |    |         +------ 4 <- R <- 2 <--+
 *              |    |         |                      |
 *              |    |         |     (4.2BSD)         |
 *              |    |         +---------------- 2 <--+
 *              |    |         |
 *              |    |         v
 *              |    |  if (mailer == local)
 *              |    |    then {expand aliases}
 *              |    |      |
 *              |    |      |
 *              |    |      `--> 3 -> 0 -> {mailer, host, user}
 *              |    |                                      |
 *              |    |                                      `-> 2 -> R -> 4 -> ${m_ruser}
 *              |    |
 * sendmail -f FROM RCPT ...
 *  ______________________                              ______________________
 * | From: sender ----------> 3 -> D -> 1 -> S -> 4 ---> From: sender         |
 * | To: recipient -------> 3 ------> 2 -> R -> 4 -----> To: recipient        |
 * | Cc: cc-recipient ----> 3 ------> 2 -> R -> 4 -----> Cc: cc-recipient     |
 * |                      |                            |                      |
 * | .................... |                            | .................... |
 * | ... message body ... | ------------------------>  | ... message body ... |
 * | .................... |                            | .................... |
 * |______________________|                            |______________________|
 *
 *
 * A sendmail configuration file is similar to a giant sed script; it contains
 * sets of regular expressions called rulesets. Rulesets have integer names.
 * (Rulesets named "R" and "S" above are notational artifacts which
 * represent rulesets whose numerical name may vary by context; the exact
 * numerical value is not important here. Ruleset "D" is the addition of
 * "@domain" to the sender's address iff the C flag is set in the mailer
 * definition corresponding to the *sending* mailer. See the Sendmail
 * Installation and Operation Guide.) In this file, we will take advantage
 * of the ease compiler's ability to bind mnemonic names to rulesets.
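As a rough illustration of the sed-script analogy, the Python sketch below models a ruleset as an ordered list of (pattern, replacement) pairs, retrying each rule until it stops changing the address. This is a toy under stated assumptions: real sendmail rules match tokenized addresses with sendmail's own pattern operators rather than Python regular expressions, and the names `Rule`, `apply_ruleset` and `BANG_TO_AT` are hypothetical, not taken from any sendmail.cf.

```python
import re
from typing import List, Tuple

# Hypothetical representation: one rule is (regex pattern, replacement).
Rule = Tuple[str, str]


def apply_ruleset(rules: List[Rule], address: str, max_passes: int = 20) -> str:
    """Apply each rule in order, retrying a rule until it no longer rewrites."""
    for pattern, replacement in rules:
        for _ in range(max_passes):  # guard against rules that never settle
            rewritten = re.sub(pattern, replacement, address, count=1)
            if rewritten == address:
                break  # rule no longer applies; move on to the next rule
            address = rewritten
    return address


# Toy ruleset (an assumption for illustration): turn a one-hop bang path
# such as host!user into user@host.
BANG_TO_AT: List[Rule] = [
    (r"^([^!@]+)!([^!@]+)$", r"\2@\1"),
]

if __name__ == "__main__":
    print(apply_ruleset(BANG_TO_AT, "fluke!jeff"))
```

An address the rule does not match, such as a bare user name, passes through unchanged.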
 *
 * Expanding the ruleset numbers into their mnemonic names gives us a clearer
 * idea of the address transformations and macro definitions as they occur at
 * our site:
 *
 *      Envelope FROM
 *              -> Canonicalize
 *              -> Add_Local_Hostname
 *              -> Uncanonicalize
 *              => ${m_saddr}
 *
 *      Envelope FROM
 *              -> Canonicalize
 *              -> Add_Local_Hostname
 *              -> Uncanonicalize
 *              -> Canonicalize
 *              -> Add_Local_Hostname
 *              -> Delete_Tc_Hosts / Externalize_Fluke_Domain / Null
 *              -> Uncanonicalize
 *              => ${m_sreladdr}
 *
 *      Envelope RCPT
 *              -> Canonicalize
 *              -> Zero
 *              => ${m_rhost}
 *
 *      Envelope RCPT
 *              -> Canonicalize
 *              -> Zero
 *              -> Null
 *              -> Delete_Tc_Hosts / Null
 *              -> Uncanonicalize
 *              => ${m_ruser}
 *
 *      Envelope FROM
 *              -> Canonicalize
 *              -> Add_Local_Hostname
 *              -> Uncanonicalize
 *              -> Canonicalize
 *              -> Add_Local_Hostname
 *              -> Delete_Tc_Hosts / Externalize_Fluke_Domain / Null
 *              -> Uncanonicalize
 *              => ${m_sreladdr}
 *
 *      From:
 *              -> Canonicalize
 *              -> "@domain" (optional; probably NOT done)
 *              -> Add_Local_Hostname
 *              -> Delete_Tc_Hosts / Externalize_Fluke_Domain / Null
 *              -> Uncanonicalize
 *              => From:
 *
 *      To, Cc:
 *              -> Canonicalize
 *              -> Delete_Tc_Hosts / Null
 *              -> Uncanonicalize
 *              => To, Cc:
 *
 * Rulesets control the rewriting of header lines as well as the routing of
 * the letter itself. Some rulesets are applied automatically to certain
 * addresses in the letter. The diagram above shows which rulesets are applied
 * sequentially; a ruleset may also recursively call another as a subroutine.
 *
 * The paths correspond to the three parts of a letter: the envelope,
 * the header, and the body.
 *
 * The envelope is never seen by the average user. It is the argv[]
 * passed to sendmail (or uux or /bin/mail) when these processes are
 * invoked. Thus the envelope changes as transport agents pass the
 * letter from one process to the next. In our diagram above, the
 * envelope is "sendmail -f FROM RCPT".
 *
 * In the case of SMTP (where there is no argv[]), the envelope is
 * represented by the MAIL FROM: and RCPT TO: commands. Note that the
 * envelope is NOT derived from the header (the converse is also generally
 * true).
 *
 * When a message is sitting in the sendmail queue, the envelope is kept
 * in the qf* file (the header & body reside in the df* file [but as an
 * optimization, sendmail keeps a second copy of the header in the qf*
 * file to avoid the overhead of reparsing]). In our diagram above,
 * the envelope is "sendmail -f FROM RCPT".
 *
 * It is the envelope - not the header - which actually directs the flow
 * and disposition of the letter. You are free to do whatever you wish
 * to the envelope in order to make it comprehensible to the next transport
 * or delivery agent. Envelope addresses are typically expressed at a
 * transport addressing level ("user@host" or "host!user" or just "user").
 * Smail is an exception; it can receive user@domain and maps it to
 * a simple transport address comprehensible to other transport agents.
 *
 * When sendmail is called in mailer mode (the default mode), it calls
 * rulesets 3-1-4 and then 3-1-S-4 on the sender's address. It then calls
 * 3-0-4 on each recipient address in the envelope. This generates,
 * for each address, two or three things:
 *      - a mailer name for that address
 *      - a name to pass that mailer to make it work
 *      - for non-local mailers only, a host address
 *
 * The header and body live together in the actual message itself,
 * although some user interfaces are smart enough to hide some header
 * lines from view. The header lines may contain transport addresses
 * ("user" or "user@host" or "path!user") or they may be domainist --
 * they just reflect whatever the user typed in when she created the
 * letter. It's important to realize that the header lines exist for
 * cosmetic purposes only -- the mail transport and delivery programs
 * deliver the letter to the address(es) on the envelope. Headers and
 * envelopes are like thunder and lightning -- the headers are impressive,
 * but the envelope does all the work.
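The ruleset chains written as "3-1-4" in the notes above can be pictured as plain function composition. The Python sketch below is only a toy model: the three functions stand in for rulesets 3, 1 and 4 and are hypothetical simplifications (real rulesets are written in sendmail.cf and are far more involved), and `LOCAL_HOST`, `RULESETS` and `run_chain` are names invented for this illustration.

```python
from typing import Callable, Dict

LOCAL_HOST = "tc"  # hypothetical local host name, for illustration only


def canonicalize(addr: str) -> str:
    """Toy stand-in for ruleset 3: user@host -> user<@host>."""
    if "<@" in addr or "@" not in addr:
        return addr
    user, host = addr.rsplit("@", 1)
    return f"{user}<@{host}>"


def add_local_hostname(addr: str) -> str:
    """Toy stand-in for ruleset 1: give a bare user name the local host."""
    return addr if "<@" in addr else f"{addr}<@{LOCAL_HOST}>"


def uncanonicalize(addr: str) -> str:
    """Toy stand-in for ruleset 4: user<@host> -> user@host."""
    return addr.replace("<@", "@").replace(">", "")


RULESETS: Dict[str, Callable[[str], str]] = {
    "3": canonicalize,
    "1": add_local_hostname,
    "4": uncanonicalize,
}


def run_chain(chain: str, addr: str) -> str:
    """Run an address through a chain written like '3-1-4', left to right."""
    for name in chain.split("-"):
        addr = RULESETS[name](addr)
    return addr


if __name__ == "__main__":
    print(run_chain("3-1-4", "jeff"))
    print(run_chain("3-1-4", "edward@csvaxa"))
```

In this toy, the chain "3-4" on its own returns its input unchanged, because the stand-in for ruleset 4 simply undoes the stand-in for ruleset 3.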
 *
 * There are opposing views on whether it's moral to edit or tamper with
 * the header lines. (Certainly one doesn't edit the message body.)
 * System V doesn't tamper with message headers (not to any significant
 * degree, anyway). Sendmail does rewrite header lines, as shown in
 * the diagram above. (There are other header lines which are edited
 * or inserted by sendmail (e.g. "Received:"), but they're not terribly
 * relevant here.)
 *
 * It's best to keep header munging to a minimum. This is especially true
 * as the world becomes domainist and addresses are invariant regardless
 * of your point of view. The counterexample is uucp (actually rmail),
 * which prepends its hostname to the "From" line (but not "From:").
 * The "To:" line also gets munged for uucp mail.
 *
 * Some guidelines:
 *
 *   - We want the ability to get header lines through sendmail
 *     unscathed, but all header addresses are passed through at least
 *     the rules [3] -> [4]. Therefore, this path should be a no-op.
 *     Ruleset [4] should be the inverse of [3].
 *
 *   - All permanent header changes should happen in [1] or [2] or
 *     [S] or [R], which are invoked under more controllable circumstances.
 *
 *   - Ruleset 0 processes the envelope, and so it works at the level
 *     of transport addresses. This is not the place for heavy emphasis
 *     on domain addresses.
 *
 *   - Mapping from domain addresses to transport addresses should
 *     happen outside of sendmail. This is the job of programs like
 *     smail, which maps domain addresses to uucp transport addresses
 *     (with modest concessions to other transport agents). If sendmail
 *     encounters a domain address in the ENVELOPE, it should generally
 *     pass the message to smail for domain address -> transport address
 *     mapping.
 *
 *   - There is no real need for the inverse mapping (transport
 *     addresses to domain addresses), except for the special case of
 *     mapping fluke local addresses to their external domain representation
 *     for offsite (actually, out-of-domain) mail.
 *
 *   - Mail from the .tc subdomain to another Fluke subdomain should have
 *     return addresses of user@tc.
 *
 * The rulesets are mnemonically named by the following bindings.
 */

bind
        Add_Local_Hostname       = ruleset 1;
        Bangify                  = ruleset 23;
        Canonicalize             = ruleset 3;
        Canon_And_Zero           = ruleset 29;  /* more/BSD won't let you use 30 */
        Delete_Tc_Hosts          = ruleset 7;
        Domainify_Name           = ruleset 8;
        Externalize_Fluke_Domain = ruleset 14;
        Null                     = ruleset 21;
        Uncanonicalize           = ruleset 4;
        UUX_From                 = ruleset 13;
        Zero                     = ruleset 0;

#if COMMENT

Once again I found myself fooling with sendmail and as usual, I couldn't
locate my assorted sheets of notes about what debugging flags do what.
I decided to bite the bullet and go through and make notes about what
all the flags do. It turned out to only take a couple of hours and the
results were fairly reasonable, so I thought I'd pass this information
along, for what it's worth.

I don't think I overlooked any of the debug operations, but accidents
will happen. Note that I didn't follow the logic back far enough to
note the conditions when a particular debugging action would be
executed. For example, -d0.15 prints the configuration table only if
the configuration file is read.

                                Bill Mitchell
                                whm@arizona.edu
                                {allegra,cmcl2,ihnp4,noao}!arizona!whm

Here's the list.
---------------
0 -- main.c, recipient.c, util.c
    0,1   -- don't fork in daemon mode, permit direct mailings to files,
             programs, and :includes:'s.
    0,4   -- print names for this host
    0,15  -- print configuration table
    0,44  -- printav() -- prints addresses of elements

1 -- main.c, envelope.c
    1,1   -- main() -- prints From person

2 -- main.c
    2,1   -- finis() -- print exit status and envelope flags

5 -- clock.c
    5,4   -- print calls to tick
    5,5   -- print set/clrevent args
    5,6   -- prints event queue on each tick

6 -- savemail.c
    6,1   -- print savemail() error mode and return-to-sender information
    6,5   -- trace states in savemail() state machine

8 -- domain.c
    8,1   -- print various information regarding resolver operations

10,11,13 -- deliver.c
    10,1  -- print various address information
    11,1  -- print openmailer() args
    13,1  -- sendall() -- print all addresses being sent to
    13,3  -- sendall() -- prints each addr in loop looking for failures
    13,4  -- sendall() -- follows above, printing who gets the error

15,16 -- daemon.c
    15,1  -- print port and socket numbers in getrequests()
    15,2  -- getrequests -- note forking/returning
    15,15 -- activate network debugging on daemon socket
    16,1  -- makeconnection() -- print host, addr, socket
    16,15 -- print network debugging on daemon socket

18 -- usersmtp.c
    18,1  -- note openmailer failure, note entry to reply, print
             smtpmessage() args

20 -- parseaddr.c
    20,1  -- print parseaddr() arg and result

21 -- parseaddr.c
    21,2  -- print rewrite() arg and result
    21,3  -- note ruleset subroutine call
    21,4  -- rewritten as ...
    21,10 -- note rule failure
    21,12 -- note rule attempt and success
    21,15 -- print replacement string in hex chars (?)
    21,35 -- print elements in pattern and subject

25 -- recipient.c
    25,1  -- print sendto() arguments
    26,1  -- print recipient in recipient() and duplicate suppression

27 -- alias.c
    27,1  -- print arg to alias(), print info about alias, note failure to
             open alias file, print arg to forward()

30 -- collect.c
    30,1  -- note EOH
    30,2  -- print eatfrom arg
    30,3  -- note addition of Apparently-To

31,32,33,14 -- headers.c
    31,6  -- print chompheader argument
    32,1  -- print collected header
    33,1  -- print crackaddr arg and return value
    14,2  -- print headers being commaized(?)

35 -- macro.c
    35,9  -- print define() args
    35,24 -- print expand() arg and return value

36 -- stab.c
    36,5  -- print stab args, sym found/not found, entered
    36,9  -- print hfunc value

37 -- readcf.c
    37,1  -- print info re option setting/values

40,41,7,51 -- queue.c
    40,1  -- note queue insertion and print queue contents
    40,4  -- show queue file contents
    41,2  -- note open failure on cf file.
    7,1   -- print info on envelope assigned to queue file
    7,2   -- print selected queue file name
    51,4  -- don't unlink x file

45 -- envelope.c
    45,1  -- print setsender argument

50 -- envelope.c
    50,1  -- print dropenvelope argument

52 -- main.c
    52,1  -- print i/o fd's for tty disconnection
    52,5  -- don't disconnect
---------------

Article 1634 of net.mail:
From: jim@cs.hw.AC.UK (Jim Crammond)
Newsgroups: net.mail
Subject: sendmail changes in 5.45
Date: 30 Jul 86 19:00:13 GMT
Organization: Computer Science, Heriot-Watt U., Scotland

I've noticed a modification to 4.3bsd's sendmail (version 5.45), and
also SUN 3.0 sendmail, which I consider to be a mistake. This concerns
the rulesets which the user part of the resolved transport address goes
through after returning from ruleset 0.
To clarify:

    4.12 and 4.40 did this :-

        address ---> [3]->-[0] ---> { mailer, host, user }
                                                      |
                                                      `---> [4] -->

    5.45 does this :-

        address ---> [3]->-[0] ---> { mailer, host, user }
                                                      |
                                                      `---> [2]->-[R]->-[4] -->

        ("R" is the header recipient ruleset)

I consider it a mistake to make the assumption that transport addresses
have to be in the same format as header addresses; for example, uucp
mail should (!) use bang form transport addresses (i.e. in the rmail
command line) whilst using RFC822 style addresses in the headers.
[ I think UK-sendmail configuration is about the only one that really
does this - hence our problem ]

Comments, anyone? Is there any chance of getting this "undone"?

p.s. The offending code is in buildaddr()/parseaddr.c and is, as far
as I am aware, still completely undocumented.
--
-------------
-Jim Crammond           JANET: jim@uk.ac.hw.cs
                        ARPA:  jim@cs.hw.ac.uk
                        UUCP:  ..!ukc!cs.hw.ac.uk!jim

From: mark@cbosgd.ATT.COM (Mark Horton)
Subject: Re: Do you rewrite "From:" lines?
Date: 28 Sep 86 15:09:54 GMT
Organization: AT&T Bell Laboratories, Columbus, Oh

In article <358@tc-jeff.fluke.UUCP> jeff@fluke.UUCP (Jeff Stearns) writes:
>Now that I've installed smail, a nasty question comes up. Should our sendmail
>modify "From:" lines?

If you have a legal 822 domain From: line, and you modify it by
sticking your host name on the front, you violate RFC976 and 822,
and your mailer is badly broken. Unfortunately, this applies to
most 4.2BSD hosts, and I think the bug is still in 4.3. I think the
problem is related to the fact that the From_ and From: lines are
tied together by sendmail, and it's very hard to prepend to one
without breaking the other. One solution is to install smail, which
won't involve sendmail in ordinary pass-through mail.

If you have a From: line written with bangs, then there are no
standards that apply, and you can just about do what you please with it.
But note that it's pointless to stick your host name on the front of
a bang path in a From: line to help out a reply command. This doesn't
work unless EVERY HOP along the way sticks its name in, and since
System III and V don't modify the From: line, the message need go
only through one System V hop to render the From: line meaningless.
Berkeley long ago gave up on expecting this to work - domains are
a much better solution to the reply problem.

>But all of our neighbors tack their sitename onto the "From:" line as well!
>There's gonna be lots of unreplyable(?) mail if we start playing by The Right
>Rules while everybody else is Doing What The Majority Does. (Recall that
>4.2BSD /usr/ucb/mail sends replies to the "From:" address, not the "From"
>address.)
>
>What's a mother to do?

That mail is generally unreplyable anyway, so you really aren't breaking
anything by stopping an unsavory practice.

        Mark

#endif COMMENT
--
Jeff Stearns
    Domain: jeff@tc.fluke.COM           Voice: +1 206 356 5064
      UUCP: {uw-beaver,decvax!microsof,ucbvax!lbl-csam,allegra,sun}!fluke!jeff
     Snail: John Fluke Mfg. Co. / P.O. Box C9090 / Everett WA 98206