[comp.mail.sendmail] The algorithm for rewriting rules

nelson@sun.soe.clarkson.edu (Russ Nelson) (05/31/90)

A ruleset is like a subroutine that sendmail calls when it wants an
address transformed.  Sendmail calls rulesets 0 through 4 (or 6,
depending on your version) by itself.  You can call other rulesets as
subroutines.  Specific mailers can call rulesets for the sender or
recipient address.

I can't find a documented algorithm for how the rewriting rules are
applied.  As best I can determine from the source, this is it:

	Try a rule against the address.
	If it matches, rewrite the address and try the same rule again.
	If it doesn't match, try the next rule.

	If you start a RHS with $:, then after rewriting, apply the NEXT rule.
	If you start a RHS with $@, then after rewriting, exit this ruleset.
	If you start a RHS with $#, then after rewriting, exit this ruleset,
	  and ruleset zero will see the $#, and use that information to deliver
	  the mail using the given mailer.
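
The loop described above can be sketched in Python.  This is only a
sketch under stated assumptions, not sendmail's code: Python regexes
over a plain string stand in for sendmail's token patterns, the names
apply_ruleset and max_passes are made up, and the sample rules are
invented to exercise each control path.  $# is left out; per the list
above it would exit the ruleset the same way $@ does.

```python
import re

# RHS control prefixes from the post: none = retry the same rule,
# "$:" = rewrite once then go on, "$@" = rewrite then leave the ruleset.
REPEAT, NEXT, EXIT = "", "$:", "$@"

def apply_ruleset(rules, address, max_passes=100):
    """Run (pattern, replacement, control) rules over an address string.

    Assumption: Python regexes stand in for sendmail's token patterns.
    max_passes is a hypothetical guard against a rule that never stops
    matching.
    """
    for pattern, replacement, control in rules:
        for _ in range(max_passes):
            match = re.fullmatch(pattern, address)
            if match is None:
                break                      # no match: try the next rule
            address = match.expand(replacement)
            if control == EXIT:
                return address             # "$@": exit this ruleset
            if control == NEXT:
                break                      # "$:": apply the NEXT rule
            # no prefix: loop and try the same rule again
    return address

# Made-up rules, only to exercise the three control paths.
rules = [
    (r"(.*)\.", r"\1", REPEAT),            # strip trailing dots, one per pass
    (r"([^!]+)!(.+)", r"\2@\1", NEXT),     # host!user -> user@host, once
    (r"(.+)@(.+)", r"\1<@\2>", EXIT),      # focus on the host, then exit
    (r".*", "never reached", REPEAT),      # skipped because of the exit
]
print(apply_ruleset(rules, "bar.baz.org!foo.."))
```

The first rule fires twice (once per trailing dot), the second fires
once, and the third returns before the last rule is ever tried.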


	The picture given in the documentation that looks roughly like this:


          /----> 0 ----> resolved address
          |
          |           /---> 1 ---> S ---\
          |           |                 |
  --> 3 ----->  D  ---|                 |---> 4 ----> Message
                      |                 |
                      \---> 2 ---> R ---/


	is terribly misleading.  It should really look like this:



 ---> 3 -------> 0 ----> resolved address

                      /---> 1 ---> S ---\
                      |                 |
 ---> 3 ----->  D  ---|                 |---> 4 ----> Message
                      |                 |
                      \---> 2 ---> R ---/


	My documentation is not clear on whether the envelope's recipient
	or the header's To: is used to deliver the message.

I hope this helps someone besides myself.  If anyone can improve on it,
please do.

-- 
--russ (nelson@clutx [.bitnet | .clarkson.edu])  Russ.Nelson@$315.268.6667
Violence never solves problems, it just changes them into more subtle problems

michael@fts1.uucp (Michael Richardson) (06/06/90)

In article <1990May31.152505.28721@sun.soe.clarkson.edu> nelson@clutx.clarkson.edu writes:
> ---> 3 -------> 0 ----> resolved address
>
>                      /---> 1 ---> S ---\
>                      |                 |
> ---> 3 ----->  D  ---|                 |---> 4 ----> Message
>                      |                 |
>                      \---> 2 ---> R ---/

  My experience with ruleset 3 is that an address like:

   Joe Blow <jb@foo.bar.com>

   is turned into:
   jb<@foo.bar.com>

   In the case of the To:/From: addresses, obviously we want to
keep the "comment" [stripping out () comments and replacing them
is fairly easy] intact, while still rewriting the address portion.
This is confounded by the existence of multiple addresses.
  Does/should the rewriting preserve these comment strings?
Or is this the reason that (Person's Name) is still used a lot?
(Is that even preserved?)
  The reason I ask rather than look at the source is that I'd like
to know the theory here, rather than the current implementation.

>	My documentation is not clear on whether the envelope's recipient
>	or the header's To: is used to deliver the message.

  The 'resolved address' is used to deliver the message.
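
  As a rough illustration of what "resolved" means here: ruleset 0
hands back a token list that starts with $# and names a mailer, a
host and a user.  The Python sketch below assumes the layout
$# mailer $@ host $: user and simply splits it apart; split_resolved
is a made-up name and this is not how sendmail's own code is written.

```python
def split_resolved(tokens):
    """Split ruleset 0 output shaped like: $# mailer $@ host $: user.

    Assumption: each marker appears as its own token and the list
    starts with "$#".  Returns (mailer, host, user) as joined strings.
    """
    if not tokens or tokens[0] != "$#":
        raise ValueError("address did not resolve to a mailer")
    parts = {"$#": [], "$@": [], "$:": []}
    current = None
    for token in tokens:
        if token in parts:
            current = token                # switch to the next field
        else:
            parts[current].append(token)   # collect tokens for this field
    return tuple("".join(parts[key]) for key in ("$#", "$@", "$:"))

resolved = ["$#", "tcp", "$@", "baz", ".", "org",
            "$:", "foo", "@", "bar", ".", "baz", ".", "org"]
print(split_resolved(resolved))
```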

-- 
   :!mcr!:               | Tellement de lettres, si peu de temps.
   Michael Richardson    |  If Meech passes, no one will understand that.
 Play: mcr@julie.UUCP Work: michael@fts1.UUCP Fido: 1:163/109.10 1:163/138
    Amiga----^     - Pay attention only to _MY_ opinions. -   ^--Amiga--^

Lovstrand.EuroPARC@Xerox.COM (Lennart Lovstrand) (06/06/90)

In article <1990Jun5.173609.15672@fts1.uucp> michael@fts1.uucp (Michael
Richardson) writes:
>   My experience with ruleset 3 is that an address like:
> 
>    Joe Blow <jb@foo.bar.com>
> 
>    is turned into:
>    jb<@foo.bar.com>
> 
>    In the case of the To:/From: addresses, obviously we want to
> keep the "comment" [stripping out () comments and replacing them
> is fairly easy] intact, while still rewriting the address portion.
> This is confounded by the existence of multiple addresses.

Don't worry, the comment part of each header address is extracted by
crackaddr() {in headers.c} and then tucked away in the header structure.

Both header and envelope addresses are independently parsed by prescan()
{in parseaddr.c}, which removes comments but retains angle brackets.  The
result is then rewritten by a sequence of calls to rewrite().  The headers
are finally recomposed by splicing the rewritten address back into its
comment context.

Or in other words (pictures, diagrams, whatever):

#
#  From chapter 42 of:
#  "Sendmail and other mysteries explained in twenty minutes or less"
#
#  Section 11: How addresses /really/ are parsed.
#
#  ENVELOPE ADDRESSES
#  Called from parseaddr() with the raw envelope address as supplied on the
#  command line or given in the SMTP MAIL FROM:/RCPT TO: command.
#
   "bar.baz.org!foo"
=> prescan() =>
   "bar" "." "baz" "." "org" "!" "foo"
=> rewrite(3, 0) =>
   $# "tcp" $@ "baz" "." "org" $: "foo" "@" "bar" "." "baz" "." "org"
=> buildaddr() -> rewrite(2, [mailer specific if IDA], 4) =>
   "foo" "@" "bar" "." "baz" "." "org"
=> cataddr() =>
   "foo@bar.baz.org"
#
#  HEADER ADDRESSES
#  Called from putheader() -> commaize() -> remotename() with the value of
#  header fields such as From:, To:, Resent-Reply-To:, etc.
#
  "Jan Foo <foo@bar.baz.org> (SysOp)"
=> crackaddr() =>
  "Jan Foo <$g> (SysOp)"
#
# Note the use of the $g macro
#
# And at the same time (well, almost):
#
  "Jan Foo <foo@bar.baz.org> (SysOp)"
=> prescan() =>
  "<foo@bar.baz.org>"
=> rewrite(3, 1/2 [or 5/6 if IDA], [mailer specific if IDA], 4) =>
  "foo" "@" "bar" "." "baz" "." "org"
=> cataddr() =>
  "foo@bar.baz.org"
#
# Finally, with both results above:
#
=> expand() =>
  "Jan Foo <foo@bar.baz.org> (SysOp)"
#
# And we're back to what we started from, voila!
#
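
The header path above (template with $g, rewrite the address, splice it
back) can be sketched in Python.  This is a simplified illustration and
not the real crackaddr() or expand(): the function names crack,
recompose and bang_to_at are made up, only the angle-bracket form is
handled, and bang_to_at is a hypothetical stand-in for whatever the
rulesets actually do.

```python
import re

def crack(header_value):
    """Return (template, address), with the <...> contents replaced by $g."""
    match = re.search(r"<([^<>]*)>", header_value)
    if match is None:
        # No angle brackets: treat the whole value as the address.
        return "$g", header_value.strip()
    template = (header_value[:match.start(1)] + "$g"
                + header_value[match.end(1):])
    return template, match.group(1)

def recompose(header_value, rewrite):
    """Rewrite only the address and splice it back into its context."""
    template, address = crack(header_value)
    return template.replace("$g", rewrite(address), 1)

def bang_to_at(address):
    """Hypothetical rewrite: host!user -> user@host, else unchanged."""
    host, sep, user = address.partition("!")
    return f"{user}@{host}" if sep else address

print(crack("Jan Foo <foo@bar.baz.org> (SysOp)"))
print(recompose("Jan Foo <bar.baz.org!foo> (SysOp)", bang_to_at))
```

The name and the (SysOp) comment never reach the rewrite function; they
ride along in the template.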

Of course, the addresses would change rather substantially if the source and
destination mailers were on two completely different networks, but I think
you get the gist.  Exactly what you see as the result of rewrite(3, 0) is
also up for grabs: standard UCB sendmail will enclose the immediate
destination address in angle brackets; IDA sendmail will append a dot after
the at sign and make sure that there is nothing after the domain.

--
--Lennart <Lovstrand.EuroPARC@Xerox.COM>		R   _A _  N_   K
Rank Xerox EuroPARC, 61 Regent St			\/ |_ |_) | | \/
Cambridge, CB2 1AB, United Kingdom			/\ |_ | \ |_| /\
QOTW: "Missing keyboard; hit F1 to continue."   	E u r o  P A R C

nelson@sun.soe.clarkson.edu (Russ Nelson) (06/07/90)

In article <1990Jun5.173609.15672@fts1.uucp> michael@fts1.uucp (Michael Richardson) writes:

      In the case of the To:/From: addresses, obviously we want to
   keep the "comment" [stripping out () comments and replacing them
   is fairly easy] intact, while still rewriting the address portion.
   This is confounded by the existence of multiple addresses.
     Does/should the rewriting preserve these comment strings?
   Or is this the reason that (Person's Name) is still used a lot?
   (Is that even preserved?)

As far as I can tell from reading the source, the person's name isn't
even passed to the rewriting rules.  Sendmail fetches the address,
strips out the comments, passes it to the rewriting rules, and then
pastes it back from whence it came.

Having said this, I've taken out the "cruft" rule that I referred
to earlier, and now I see addresses of the form:

	Russ.Nelson.nelson@ftp.ecs.clarkson.edu

from which I would guess that sendmail *doesn't* strip out all the
comments.  However, it *does* preserve the comments.


Lovstrand.EuroPARC@Xerox.COM (Lennart Lovstrand) (06/07/90)

In article <423@roo.UUCP>, I wrote:
] Both header and envelope addresses are independently parsed by prescan()
] {in parseaddr.c}, which removes comments but retains angle brackets.
[...]
] #  HEADER ADDRESSES
[...]
]   "Jan Foo <foo@bar.baz.org> (SysOp)"
] => prescan() =>
]   "<foo@bar.baz.org>"
] => rewrite(3, 1/2 [or 5/6 if IDA], [mailer specific if IDA], 4) =>
]   "foo" "@" "bar" "." "baz" "." "org"
] => cataddr() =>
]   "foo@bar.baz.org"

which should have been (changes in caps):

] Both header and envelope addresses are independently parsed by prescan()
] {in parseaddr.c}, which removes PARENTHESIZED comments but retains
] angle brackets AND ANY OTHER TEXT SURROUNDING THEM.
[...]
] #  HEADER ADDRESSES
[...]
]   "Jan Foo <foo@bar.baz.org> (SysOp)"
] => prescan() =>
]   "JAN" "FOO" "<" "FOO" "@" "BAR" "." "BAZ" "." "ORG" ">"
] => rewrite(3, 1/2 [or 5/6 if IDA], [mailer specific if IDA], 4) =>
]   "foo" "@" "bar" "." "baz" "." "org"
] => cataddr() =>
]   "foo@bar.baz.org"

Sorry for the confusion.
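
For illustration, the corrected prescan() behaviour (drop parenthesized
comments, keep angle brackets and the surrounding text, split into
tokens) might look roughly like this in Python.  It is a sketch, not the
code in parseaddr.c: the operator set is an assumption chosen for the
example, and quoting and backslash escapes are ignored.

```python
def prescan(text, operators="<>@.!%:"):
    """Tokenize an address, skipping (possibly nested) () comments.

    Assumption: `operators` is a made-up delimiter set; each operator
    becomes its own token and whitespace only separates tokens.
    """
    tokens, word, depth = [], [], 0

    def flush():
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if ch == "(":
            flush()                # a comment also ends the current word
            depth += 1
        elif depth > 0:
            if ch == ")":
                depth -= 1         # still inside a comment: drop the char
        elif ch in operators:
            flush()
            tokens.append(ch)
        elif ch.isspace():
            flush()
        else:
            word.append(ch)
    flush()
    return tokens

print(prescan("Jan Foo <foo@bar.baz.org> (SysOp)"))
print(prescan("bar.baz.org!foo"))
```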