[comp.mail.misc] Idea for changes to sendmail rewriting rules

edward@csvaxa.UUCP (Edward Wilkinson) (01/30/88)

Disclaimer: please excuse me  if this idea seems   silly - I'm  only a
beginner at sendmail hacking :-)

After getting rather confused  (as  many seem  to do)   fiddling  with
sendmail's rewriting rules, I tried to think of a way to improve them.
What  follows  are  a few   preliminary ideas which   I  hope  will be
discussed, criticized, improved & hopefully not ignored!

<1> My current problem is that I can't work  out exactly which sets of
rewriting rules get applied to which headers.  How  about having a set
of rules for each header? e.g.

From(1):-

	rule#1
	rule#2
	etc,etc

The (1) is a number similar to the current setup, so that you can call
this ruleset as  a  `subroutine'  from  elsewhere if  necessary. There
would  be  a set of  these  rules for  the  header  & another for  the
envelope.   If   no ruleset  appears    for  a particular  header,  no
transformation   is applied.

<2> There  could be a  couple of reserved rulesets, such  as Initial &
Final, which would be applied at the start and end, respectively, of
each and every address manipulation.

<3> Lastly, there could be general sets of rules which could be called
from  all over  the place to  do common sets of transformations. These
would be  like the  current rulesets which  get called  from different
places.
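
As a rough illustration of ideas <1> to <3>, here is a minimal Python
sketch. Everything in it is an assumption made for the example: the
rule format (plain regex/replacement pairs, not sendmail's own pattern
syntax), the names apply_rules, COMMON, INITIAL, FINAL, HEADER_RULES
and rewrite_header, and the sample rules themselves. It only shows how
per-header rulesets, reserved Initial/Final rulesets and shared
rulesets could fit together.

```python
import re

# Hypothetical rule format: (regex pattern, replacement). Real sendmail
# rules use their own token-matching syntax, not Python regexes.
def apply_rules(rules, address):
    """Apply each rewriting rule once, in order."""
    for pattern, replacement in rules:
        address = re.sub(pattern, replacement, address)
    return address

# Shared rulesets that can be called from anywhere (idea <3>).
COMMON = {
    "strip_brackets": [(r"^<(.*)>$", r"\1")],
}

# Reserved rulesets run around every manipulation (idea <2>).
INITIAL = COMMON["strip_brackets"]
FINAL = [(r"^(.*)$", r"<\1>")]

# One ruleset per header (idea <1>). The sample rule appends a domain
# to a bare user name; the domain is just an illustrative value.
HEADER_RULES = {
    "From": [(r"^([^@!]+)$", r"\1@massey.ac.nz")],
}

def rewrite_header(name, address):
    """Rewrite one header address; headers without a ruleset are untouched."""
    rules = HEADER_RULES.get(name)
    if rules is None:
        return address
    address = apply_rules(INITIAL, address)
    address = apply_rules(rules, address)
    return apply_rules(FINAL, address)
```

With these sample rules, rewrite_header("From", "<edward>") yields
"<edward@massey.ac.nz>", while a "To" address is returned unchanged
because no ruleset is registered for that header.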

I don't think that these ideas would be hard to implement, and they
would make sendmail's configuration file a little easier to read,
understand and therefore modify.

Any and  all discussion on  these ideas is  welcomed  and if  I'm just
being completely  ridiculous, PLEASE someone  tell  me! I suppose this
article comes from the fact that I can't understand sendmail as it is.


-- 
Ed Wilkinson @ Computer Centre, Massey University, Palmerston North, NZ
uucp: ...!uunet!vuwcomp!{massey, csvaxa}!edward       DTE: 530163000005
Greybook: E.Wilkinson@nz.ac.massey            Phone: +64 63 69099 x8587
CSNET/ACSnet/Internet: E.Wilkinson@massey.ac.nz    New Zealand = GMT+12

jeff@tc.fluke.COM (Jeff Stearns) (02/10/88)

In article <180@csvaxa.UUCP> E.Wilkinson@massey.ac.nz writes:

>Disclaimer: please excuse me  if this idea seems   silly - I'm  only a
>beginner at sendmail hacking :-)
>
>After getting rather confused  (as  many seem  to do)   fiddling  with
>sendmail's rewriting rules, I tried to think of a way to improve them.
>What  follows  are  a few   preliminary ideas which   I  hope  will be
>discussed, criticized, improved & hopefully not ignored!
>
><1> My current problem is that I can't work  out exactly which sets of
>rewriting rules get applied to which headers.  How  about having a set
>of rules for each header. e.g.
>

First, I urge you to acquire the ``ease'' translator for sendmail config files;
it's a very useful tool for sendmail.cf hacking.  It has been posted to the
sources newsgroup and is available via the archiving mechanisms described
in that group.  (In fact, I wouldn't be surprised to see news of it posted
there again soon.)
The ``ease'' translator was written by:
    James S. Schoner
    Mathematical Sciences Building, Office 204
    Purdue University
    West Lafayette, Indiana  47907
    jss@purdue-asc.ARPA

In the meantime, here is some sendmail information I've collected over the
years; I think you may find it useful for getting the Big Picture.  (These
comments are excerpted from our custom-written sendmail.cf file.  It includes
information posted by others over the past couple of years.  Thank you all.)

	Jeff Stearns			jeff@tc.fluke.COM
	John Fluke Mfg. Co, Inc.	(206) 356-5064

/*
 *
 *  Some words about the following data paths and how they are CODED INTO
 *  SENDMAIL:
 *
 *                   (Sun release 3.2 paths)
 *		+-> 3 -> 1 -> 4 -> ${m_saddr}
 *              |
 *              +-> 3 -> 0 -> {mailer, host, user}
 *              |                             |
 *              |                             `--> 2 -> R -> 4 -> ${m_sreladdr}
 *              |				 (N.B. "R" here is not a typo!)
 *              |
 *              |
 *              |    (4.xBSD and Sun release 2.x path?)
 *		+--> 3 -> 1 -> 4 --+--> 3 -> 1 -> S -> 4 -> ${m_sreladdr}
 *		|                  |
 *		|                  `--> ${m_saddr}
 *		|
 *		|
 *		|
 *		|
 *		|
 *		|     .---> 3 -> 0 -> {mailer, host, user}
 *		|     |			        |     |
 *		|     |			   ${m_rhost} |
 *		|     |			              |
 *		|     |			              |
 *		|     |			              |
 *		|     |			              |
 *		|     |			              |
 *		|     |		         (4.3BSD)     |
 *		|     |	       +------ 4 <- R <- 2 <--+
 *		|     |	       |    	              |
 *		|     |	       |    	 (4.2BSD)     |
 *		|     |	       +---------------- 2 <--+
 *		|     |	       |    	              
 *		|     |	       v    	             
 *		|     |	if (mailer == local)
 *		|     | then {expand aliases}
 *		|     |        |                  
 *		|     |        |                 
 *		|     |        `--> 3 -> 0 -> {mailer, host, user}
 *		|     |                                       |
 *		|     |                                       `-> 2 -> R -> 4 -> ${m_ruser}
 *              |     |
 * sendmail -f FROM  RCPT ...
 *  ______________________                             ______________________
 * | From: sender ----------> 3 -> D -> 1 -> S -> 4 ---> From: sender        |
 * |   To: recipient -------> 3 ------> 2 -> R -> 4 -----> To: recipient     |
 * |   Cc: cc-recipient ----> 3 ------> 2 -> R -> 4 -----> Cc: cc-recipient  |
 * |	                  |                           |                      |
 * | .................... |                           | .................... |
 * | ... message body ... | ------------------------> | ... message body ... |
 * | .................... |                           | .................... |
 * |______________________|                           |______________________|
 *
 *
 *  A sendmail configuration file is similar to a giant sed script; it contains
 *  sets of regular expressions called rulesets.  Rulesets have integer names.
 *  (Rulesets named "R" and "S" above are notational artifacts which
 *  represent rulesets whose numerical name may vary by context; the exact
 *  numerical value is not important here.  Ruleset "D" is the addition of
 *  "@domain" to the sender's address iff the C flag is set in the mailer
 *  definition corresponding to the *sending* mailer.  See the Sendmail
 *  Installation and Operation Guide.)  In this file, we will take advantage
 *  of the ease compiler's ability to bind mnemonic names to rulesets.
 *
 *  Expanding the ruleset numbers into their mnemonic names gives us a clearer
 *  idea of the address transformations and macro definitions as they occur at
 *  our site:
 *
 * Envelope FROM
 *	-> Canonicalize
 *	-> Add_Local_Hostname
 *	-> Uncanonicalize
 *	=> ${m_saddr}
 *
 * Envelope FROM
 *	-> Canonicalize
 *	-> Add_Local_Hostname
 *	-> Uncanonicalize
 *	-> Canonicalize
 *	-> Add_Local_Hostname
 *	-> Delete_Tc_Hosts / Externalize_Fluke_Domain / Null
 *	-> Uncanonicalize
 *	=> ${m_sreladdr}
 *
 * Envelope RCPT
 *	-> Canonicalize
 *	-> Zero
 *	=> ${m_rhost}
 *
 * Envelope RCPT
 *	-> Canonicalize
 *	-> Zero
 *	-> Null
 *	-> Delete_Tc_Hosts / Null
 *	-> Uncanonicalize
 *	=> ${m_ruser}
 *
 * From:
 *	-> Canonicalize
 *	-> "@domain"  (optional; probably NOT done)
 *	-> Add_Local_Hostname
 *	-> Delete_Tc_Hosts / Externalize_Fluke_Domain / Null
 *	-> Uncanonicalize
 *	=> From:
 *
 * To, Cc:
 *	-> Canonicalize
 *	-> Delete_Tc_Hosts / Null
 *	-> Uncanonicalize
 *	=> To, Cc:
 *
 *  Rulesets control the rewriting of header lines as well as the routing of
 *  the letter itself.  Some rulesets are applied automatically to certain
 *  addresses in the letter.  The diagram above shows which rulesets are applied
 *  sequentially; a ruleset may also recursively call another as a subroutine.
 *
 *  The paths correspond to the three parts of a letter: the envelope,
 *  the header, and the body.
 *
 *  The envelope is never seen by the average user.  It is the argv[]
 *  passed to sendmail (or uux or /bin/mail) when these processes are
 *  invoked.  Thus the envelope changes as transport agents pass the
 *  letter from one process to the next.  In our diagram above, the
 *  envelope is "sendmail -f FROM RCPT".
 *
 *  In the case of SMTP (where there is no argv[]), the envelope is
 *  represented by the MAIL FROM: and RCPT TO: commands.  Note that the
 *  envelope is NOT derived from the header (the converse is also generally
 *  true).
 *
 *  When a message is sitting in the sendmail queue, the envelope is kept
 *  in the qf* file (the header & body reside in the df* file [but as an
 *  optimization, sendmail keeps a second copy of the header in the qf*
 *  file to avoid the overhead of reparsing]).
 *
 *  It is the envelope - not the header - which actually directs the flow
 *  and disposition of the letter.  You are free to do whatever you wish
 *  to the envelope in order to make it comprehensible to the next transport
 *  or delivery agent.  Envelope addresses are typically expressed at a
 *  transport addressing level ("user@host" or "host!user" or just "user").
 *  Smail is an exception; it can receive user@domain and maps it to
 *  a simple transport address comprehensible to other transport agents.
 *
 *  When sendmail is called in mailer mode (the default mode), it calls
 *  rulesets 3-1-4 and then 3-1-S-4 on the sender's address.  It then calls
 *  3-0-4 on each recipient address in the envelope.  This generates,
 *  for each address, two or three things:
 *     - a mailer name for that address
 *     - a name to pass that mailer to make it work
 *     - for non-local mailers only, a host address
 *
 *  The header and body live together in the actual message itself,
 *  although some user interfaces are smart enough to hide some header
 *  lines from view.  The header lines may contain transport addresses
 *  ("user" or "user@host" or "path!user") or they may be domainist --
 *  they just reflect whatever the user typed in when she created the
 *  letter.  It's important to realize that the header lines exist for
 *  cosmetic purposes only -- the mail transport and delivery programs
 *  deliver the letter to the address(es) on the envelope.  Headers and
 *  envelopes are like thunder and lightning -- the headers are impressive,
 *  but the envelope does all the work.
 *
 *  There are opposing views on whether it's moral to edit or tamper with
 *  the header lines.  (Certainly one doesn't edit the message body.)
 *  System V doesn't tamper with message headers (not to any significant
 *  degree, anyway).  Sendmail does rewrite header lines, as shown in
 *  the diagram above.  (There are other header lines which are edited
 *  or inserted by sendmail (e.g. "Received:"), but they're not terribly
 *  relevant here.)
 *
 *  It's best to keep header munging to a minimum.  This is especially true
 *  as the world becomes domainist and addresses are invariant regardless
 *  of your point of view.  The counterexample is uucp (actually rmail),
 *  which prepends its hostname to the "From" line (but not "From:").
 *  The "To:" line also gets munged for uucp mail.
 *
 *  Some guidelines:
 *
 *      - We want the ability to get header lines through sendmail
 *        unscathed, but all header addresses are passed through at least
 *	  the rules [3] -> [4].  Therefore, this path should be a no-op.
 *	  Ruleset [4] should be the inverse of [3].
 *
 *      - All permanent header changes should happen in [1] or [2] or
 *        [S] or [R], which are invoked under more controllable circumstances.
 *
 *      - Ruleset 0 processes the envelope, and so it works at the level
 *        of transport addresses.  This is not the place for heavy emphasis
 *        on domain addresses.
 *
 *      - Mapping from domain addresses to transport addresses should
 *        happen outside of sendmail.  This is the job of programs like
 *        smail, which maps domain addresses to uucp transport addresses
 *        (with modest concessions to other transport agents).  If sendmail
 *        encounters a domain address in the ENVELOPE, it should generally
 *        pass the message to smail for domain address -> transport address
 *        mapping.
 *
 *      - There is no real need for the inverse mapping (transport
 *        addresses to domain addresses), except for the special case of
 *	  mapping fluke local addresses to their external domain representation
 *	  for offsite (actually, out-of-domain) mail.
 *
 *	- Mail from the .tc subdomain to another Fluke subdomain should have
 *	  return addresses of user@tc.
 *
 *  The rulesets are mnemonically named by the following bindings.
 */
bind
    Add_Local_Hostname  = ruleset 1;
    Bangify		= ruleset 23;
    Canonicalize	= ruleset 3;
    Canon_And_Zero	= ruleset 29;	/* more/BSD won't let you use 30 */
    Delete_Tc_Hosts	= ruleset 7;
    Domainify_Name	= ruleset 8;
    Externalize_Fluke_Domain = ruleset 14;
    Null		= ruleset 21;
    Uncanonicalize	= ruleset 4;
    UUX_From		= ruleset 13;
    Zero		= ruleset 0;
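
The idea of chaining named rulesets, and the guideline that the
[3] -> [4] path should be a no-op, can be pictured with a small Python
sketch. This is not taken from the sendmail.cf described here: the
function bodies are toy stand-ins, and the behaviour of
delete_tc_hosts in particular is a guess based only on the "user@tc"
guideline in the comment block. The helper name run and the host name
"vax.tc" are likewise invented for the example.

```python
def canonicalize(addr):
    """Toy stand-in for ruleset 3: mark the host part as user<@host>."""
    user, sep, host = addr.partition("@")
    return f"{user}<@{host}>" if sep else addr

def uncanonicalize(addr):
    """Toy stand-in for ruleset 4: undo the marking done by canonicalize."""
    return addr.replace("<@", "@").replace(">", "")

def delete_tc_hosts(addr):
    """Hypothetical: collapse any host inside the tc subdomain to plain tc."""
    user, sep, rest = addr.partition("<@")
    if sep and rest.endswith(".tc>"):
        return f"{user}<@tc>"
    return addr

def run(path, addr):
    """Pass an address through each ruleset in the path, in order."""
    for ruleset in path:
        addr = ruleset(addr)
    return addr

# To:/Cc: header path, following the mnemonic listing in the comment:
# Canonicalize -> Delete_Tc_Hosts -> Uncanonicalize
TO_CC_PATH = [canonicalize, delete_tc_hosts, uncanonicalize]

# The bare [3] -> [4] path, which should leave an address unchanged.
ROUND_TRIP = [canonicalize, uncanonicalize]
```

Under these assumptions, run(ROUND_TRIP, "jeff@tc.fluke.COM") returns
the address unchanged, and run(TO_CC_PATH, "jeff@vax.tc") returns
"jeff@tc".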



#if COMMENT
Once again I found myself fooling with sendmail and as usual, I couldn't
locate my assorted sheets of notes about what debugging flags do what.
I decided to bite the bullet and go through and make notes about what
all the flags do.  It turned out to only take a couple of hours and the
results were fairly reasonable, so I thought I'd pass this information
along, for what it's worth.  I don't think I overlooked any of the debug
operations, but accidents will happen.

Note that I didn't follow the logic back far enough to note the conditions
when a particular debugging action would be executed.  For example, -d0.15
prints the configuration table only if the configuration file is read.

				Bill Mitchell
				whm@arizona.edu
				{allegra,cmcl2,ihnp4,noao}!arizona!whm

Here's the list.
---------------
 0 -- main.c, recipient.c, util.c
        0,1  -- don't fork in daemon mode, permit direct mailings to files,
                 programs, and :include:'s.
        0,4  -- print names for this host
        0,15 -- print configuration table
        0,44 -- printav() -- prints addresses of elements

 1 -- main.c, envelope.c
        1,1 -- main() -- prints From person

 2 -- main.c
        2,1 -- finis() -- print exit status and envelope flags

 5 -- clock.c
        5,4 -- print calls to tick
        5,5 -- print set/clrevent args
        5,6 -- prints event queue on each tick

 6 -- savemail.c
        6,1 -- print savemail() error mode and return-to-sender information
        6,5 -- trace states in savemail() state machine

 8 -- domain.c
        8,1 -- print various information regarding resolver operations

10,11,13 -- deliver.c
        10,1 -- print various address information
        11,1 -- print openmailer() args
        13,1 -- sendall() -- print all addresses being sent to
        13,3 -- sendall() -- prints each addr in loop looking for failures
        13,4 -- sendall() --  follows above, printing who gets the error

15,16 -- daemon.c
        15,1  -- print port and socket numbers in getrequests()
        15,2  -- getrequests -- note forking/returning
        15,15 -- activate network debugging on daemon socket
        16,1  -- makeconnection() -- print host, addr, socket
        16,15 -- print network debugging on daemon socket

18 -- usersmtp.c
        18,1 -- note openmailer failure, note entry to reply,
                 print smtpmessage() args

20 -- parseaddr.c
        20,1 -- print parseaddr() arg and result
 
21 -- parseaddr.c
        21,2  -- print rewrite() arg and result
        21,3  -- note ruleset subroutine call
        21,4  -- rewritten as ...
        21,10 -- note rule failure
        21,12 -- note rule attempt and success
        21,15 -- print replacement string in hex chars (?)
        21,35 -- print elements in pattern and subject

25,26 -- recipient.c
        25,1  -- print sendto() arguments
        26,1  -- print recipient in recipient() and duplicate suppression

27 -- alias.c
        27,1  -- print arg to alias(), print info about alias, note failure
                  to open alias file, print arg to forward()

30 -- collect.c
        30,1  -- note EOH
        30,2  -- print eatfrom arg
        30,3  -- note addition of Apparently-To
 
31,32,33,14 -- headers.c
        31,6 -- print chompheader argument
        32,1 -- print collected header
        33,1 -- print crackaddr arg and return value
        14,2 -- print headers being commaized(?)
 
35 -- macro.c
        35,9  -- print define() args
        35,24 -- print expand() arg and return value

36 -- stab.c
        36,5 -- print stab args, sym found/not found, entered
        36,9 -- print hfunc value

37 -- readcf.c
        37,1 -- print info re option setting/values

40,41,7,51 -- queue.c
        40,1 -- note queue insertion and print queue contents
        40,4 -- show queue file contents
        41,2 -- note open failure on cf file.
        7,1  -- print info on envelope assigned to queue file
        7,2  -- print selected queue file name
        51,4 -- don't unlink x file

45 -- envelope.c
        45,1  -- print setsender argument
         
50 -- envelope.c
        50,1  -- print dropenvelope argument

52 -- main.c
        52,1 -- print i/o fd's for tty disconnection
        52,5 -- don't disconnect
---------------
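
The list writes each flag as "category,level", whereas on the command
line the same flag is given as -dcategory.level (as in -d0.15 above).
The Python sketch below builds such an option string. The helper name
debug_option is made up for the example, and the sendmail path and the
use of -bt (address test mode) are assumptions about a typical
installation, so adjust them for your system. Joining several flags
with commas reflects my understanding of the -d syntax; check your
version before relying on it.

```python
def debug_option(*flags):
    """Build a -d option string from (category, level) pairs."""
    return "-d" + ",".join(f"{cat}.{lvl}" for cat, lvl in flags)

# Trace rule attempts and successes during address rewriting
# (21,12 in the list above) while in address test mode.
argv = ["/usr/lib/sendmail", "-bt", debug_option((21, 12))]
```

The resulting argv could be handed to subprocess.run() on a machine
that has sendmail installed at that path.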




Article 1634 of net.mail:
From: jim@cs.hw.AC.UK (Jim Crammond)
Newsgroups: net.mail
Subject: sendmail changes in 5.45
Date: 30 Jul 86 19:00:13 GMT
Organization: Computer Science, Heriot-Watt U., Scotland

I've noticed a modification to 4.3bsd's sendmail (version 5.45),
and also SUN 3.0 sendmail, which I consider to be a mistake.
This concerns the rulesets which the user part of the resolved
transport address goes through after returning from ruleset 0.

To clarify:  4.12 and 4.40 did this :-

        address ---> [3]->-[0] --->  { mailer, host, user }
                                                      |
						      `---> [4] -->

5.45 does this :-

        address ---> [3]->-[0] --->  { mailer, host, user }
                                                      |
						      `---> [2]->-[R]->-[4] -->


("R" is the header recipient ruleset)

I consider it a mistake to make the assumption that transport addresses
have to be in the same format as header addresses; for example, uucp mail
should (!) use bang form transport addresses (i.e. in the rmail command line)
whilst using RFC822 style addresses in the headers.

[ I think UK-sendmail configuration is about the only one that
  really does this - hence our problem ]

Comments, anyone?  Is there any chance of getting this "undone"?

p.s. The offending code is in buildaddr()/parseaddr.c  and is, as far as
     I am aware, still completely undocumented.
-- 
-------------
-Jim Crammond			JANET:	jim@uk.ac.hw.cs
				ARPA:	jim@cs.hw.ac.uk
				UUCP:	..!ukc!cs.hw.ac.uk!jim

From: mark@cbosgd.ATT.COM (Mark Horton)
Subject: Re: Do you rewrite "From:" lines?
Date: 28 Sep 86 15:09:54 GMT
Organization: AT&T Bell Laboratories, Columbus, Oh

In article <358@tc-jeff.fluke.UUCP> jeff@fluke.UUCP (Jeff Stearns) writes:
>Now that I've installed smail, a nasty question comes up.  Should our sendmail
>modify "From:" lines?

If you have a legal 822 domain From: line, and you modify it by sticking
your host name on the front, you violate RFC976 and 822, and your mailer
is badly broken.  Unfortunately, this applies to most 4.2BSD hosts, and
I think the bug is still in 4.3.  I think the problem is related to the
fact that the From_ and From: lines are tied together by sendmail, and
it's very hard to prepend to one without breaking the other.  One solution
is to install smail, which won't involve sendmail in ordinary pass-through
mail.

If you have a From: line written with bangs, then there are no standards
that apply, and you can just about do what you please with it.

But note that it's pointless to stick your host name on the front of a
bang path in a From: line to help out a reply command.  This doesn't
work unless EVERY HOP along the way sticks its name in, and since
System III and V don't modify the From: line, the message need go only
through one System V hop to render the From: line meaningless.  Berkeley
long ago gave up on expecting this to work - domains are a much better
solution to the reply problem.

>But all of our neighbors tack their sitename onto the "From:" line as well!
>There's gonna be lots of unreplyable(?) mail if we start playing by The Right
>Rules while everybody else is Doing What The Majority Does.  (Recall that
>4.2BSD /usr/ucb/mail sends replies to the "From:" address, not the "From"
>address.)
>
>What's a mother to do?

That mail is generally unreplyable anyway, so you really aren't breaking
anything by stopping an unsavory practice.

	Mark
#endif COMMENT
-- 
	    Jeff Stearns
    Domain: jeff@tc.fluke.COM
     Voice: +1 206 356 5064
      UUCP: {uw-beaver,decvax!microsof,ucbvax!lbl-csam,allegra,sun}!fluke!jeff
     Snail: John Fluke Mfg. Co. / P.O. Box C9090 / Everett WA  98206