[comp.mail.sendmail] sendmail parsing questions: "%"

parmelee@wayback.cs.cornell.edu (Larry Parmelee) (04/25/89)

In my opinion, the problem with "%" when used as an operator in
a mail address is that its meaning is not standardized anywhere,
to the best of my knowledge.  As long as this is the case, I will
avoid both using it and encouraging its use, which usually means
using the RFC822 "route-addr" form instead, in spite of its problems.

As background for your thinking:

Thanks to RFCs 822 and 976, the mail address operators "!" and
"@" have a defined meaning and precedence in relation to each
other:  at least for internet hosts, "@" has the highest
precedence.  Where does "%" fit in relative to "@" and "!"?  To
make a concrete example, given an address like:

			a!b%c@d
	
What path should the mail travel?  Assuming that "a", "b", "c", and "d"
are suitable single host or user names,  your answer should be of the
form:  The mail should first be delivered to host _?_, then to host _?_,
finally to host _?_, where it should be delivered to user _?_.

There are 24 different orderings of "a", "b", "c", and "d", but
probably you will think one of these three is the right one:

	1)	-> host a -> host d -> host c -> user b.
	2)	-> host d -> host a -> host c -> user b.
	3)	-> host d -> host c -> host a -> user b.
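
To make the three candidate readings concrete, here is a minimal Python
sketch.  It is only an illustration: the function name route() and the idea
of passing the operator precedence as a string are assumptions of the sketch,
not anything defined by RFC 822, RFC 976, or sendmail.  Each ordering of the
three operators produces one of the paths listed above.

```python
def route(address, precedence):
    """Return the hops for an address: hosts in delivery order, then the user.

    precedence lists the operators from highest (split first) to lowest.
    This is a toy model of the ambiguity, not a real address parser.
    """
    for op in precedence:
        if op in address:
            if op == "!":
                # UUCP style "host!rest": the leftmost host is the next hop.
                host, rest = address.split("!", 1)
            else:
                # "rest@host" or "rest%host": the rightmost host is the next hop.
                rest, host = address.rsplit(op, 1)
            return [host] + route(rest, precedence)
    return [address]  # no operators left: this is the user


for order in ("!@%", "@!%", "@%!"):
    print(order, " -> ".join(route("a!b%c@d", order)))
```

Under these assumptions, the precedence "!@%" gives path 1, "@!%" gives
path 2, and "@%!" gives path 3.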

If you try to use an address like "a!b%c@d", and discover that the
mail is undeliverable, and you suspect the problem is that one of
your neighbors is doing something wrong, how do you get the problem
resolved?

-Larry Parmelee
parmelee@cs.cornell.edu

dhesi@bsu-cs.bsu.edu (Rahul Dhesi) (04/25/89)

In article <27172@cornell.UUCP> parmelee@wayback.cs.cornell.edu (Larry
Parmelee) writes:
>			a!b%c@d
>	
>What path should the mail travel?

This points out an omission in the Internet RFCs.  There is no
provision for a bracketing syntax, e.g.

     a!<b%c>@d
     <a!b%c>@d
     a!<b%c@d>

We are stuck for the moment with a horrible design decision.  I hope
X.400 is better.
-- 
Rahul Dhesi <dhesi@bsu-cs.bsu.edu>
UUCP:    ...!{iuvax,pur-ee}!bsu-cs!dhesi

diamant@hpfclp.SDE.HP.COM (John Diamant) (04/25/89)

> > That is not a legal RFC822 address, please consider using something
> > like "user%machine1@machine2.dom" instead.
> 
> That is not a legal RFC822 address, either.  Please never recommend the
> use of '%' in addresses.  Please consider using something like

Actually, it is perfectly legal RFC822.  "user%machine1" is local-part and is
subject to interpretation by machine2.dom.  Its meaning is not specified in
RFC822 (except that it must be interpreted by machine2.dom and no one else),
but it is legal and as long as machine2.dom consistently interprets "%" as a
routing character, there is no problem.

> 	@machine2.dom:user@machine1

> Unfortunately, this is not quite legal either, is it?  I seem to recall
> that route-addrs need an accompanying phrase and should be in <>'s, as in:
> 
> 	Mark Sirota <@relay.cs.net:msir@cc.rochester.edu>

Right.  The angle brackets are required for source routes.


John Diamant
Software Engineering Systems Division
Hewlett-Packard Co.		ARPA Internet: diamant@hpfclp.sde.hp.com
Fort Collins, CO		UUCP:  {hplabs,hpfcla}!hpfclp!diamant

rang@cpsin3.cps.msu.edu (Anton Rang) (04/25/89)

Just to add my own thoughts on this matter..."@" identifies a host to
send the current message to.  Everything to the left (whether there is
a "%" or not) should be sent to that host (ignoring the UUCP "!"
problem).
  If the destination host wants to handle "%", that's fine.  The way
that we (and most other sites) seem to interpret it is:

  Received mail address is "user%percent-host@at-host"...
  becomes outgoing mail "user@percent-host" from the at-host.
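
As a rough sketch of that rewrite (the function name percent_hack() and the
my_host parameter are hypothetical, and a real mailer does this with rewriting
rules rather than Python), the at-host only touches addresses whose "@" part
names itself, and then promotes the rightmost "%" to "@":

```python
def percent_hack(address, my_host):
    """Rewrite user%percent-host@at-host to user@percent-host at the at-host.

    Addresses that are not for my_host, or that carry no "%" in the local
    part, are returned unchanged.  Illustrative only.
    """
    local, at, host = address.rpartition("@")
    if not at or host.lower() != my_host.lower():
        return address  # not addressed to this host: leave it alone
    user, percent, next_host = local.rpartition("%")
    if not percent:
        return address  # plain local user, nothing to rewrite
    return f"{user}@{next_host}"


print(percent_hack("user%percent-host@at-host", "at-host"))
```

Only the rightmost "%" is promoted, so an address with several "%" hops is
unwound one relay at a time.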

This is useful for local hosts when dealing with "dumb" mailers.  For
instance, my local mail address is "rang@cpsin3.cps.msu.edu".  This
isn't registered in the host tables (though it is in the domain
system).  On systems without host tables, they can send to
"rang%cpsin3@cpswh.cps.msu.edu" or "rang%cpsin3.cps.msu.edu@cpswh.cps.msu.edu"
(for instance)--cpswh is our registered host.
  I believe that most of the relay sites (uunet, relay.cs.net) treat
these types of addresses the same way.  "!" is a whole other matter....

+---------------------------+------------------------+---------------------+
| Anton Rang (grad student) | "VMS Forever!"         | rec.music.newage is |
| Michigan State University | rang@cpswh.cps.msu.edu | under discussion... |
+---------------------------+------------------------+---------------------+

blarson@skat.usc.edu (Bob Larson) (04/25/89)

In article <27172@cornell.UUCP> parmelee@wayback.cs.cornell.edu (Larry Parmelee) writes:
>	1)	-> host a -> host d -> host c -> user b.
>	2)	-> host d -> host a -> host c -> user b.
>	3)	-> host d -> host c -> host a -> user b.

	4)	-> host d -> user a!b%c

RFC 822 is not at all ambiguous about this.  I consider RFC 976 an
explanation of how some non-RFC 822 mailers do things at best.  RFC
976 adds to the confusion by giving more credibility to the use of !
other than in the local part of the address.  Use of % as a routing
character has never been anything but a hack, as the csnet people who
started it admit, and the need for it will disappear when MX records
become universal.

-- 
Bob Larson	Arpa: Blarson@Ecla.Usc.Edu	blarson@skat.usc.edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson
Prime mailing list:	info-prime-request%ais1@ecla.usc.edu
			oberon!ais1!info-prime-request

cfe+@andrew.cmu.edu (Craig F. Everhart) (04/25/89)

> *Excerpts from ext.nn.comp.mail.sendmail: 24-Apr-89 Re: sendmail parsing*
> *questi.. Larry Parmelee@wayback.c (1466)*
> Thanks to RFCs 822 and 976, the mail address operators "!" and
> "@" have a defined meaning and precedence in relation to each
> other:  at least for internet hosts, "@" has the highest
> precedence.  Where does "%" fit in relative to "@" and "!"?
You never have to worry about the ``%'' operator unless there are no other
operators in the address.  The hard problem is the relative (and probably
source-dependent) precedence of ``@'' and ``!''; ``%'' always has the lowest
precedence, since no transport mechanism other than the final one is supposed to
look at it.

diamant@hpfclp.SDE.HP.COM (John Diamant) (04/26/89)

> To make a concrete example, given an address like:
> 
> 			a!b%c@d
> 	
> 	1)	-> host a -> host d -> host c -> user b.
> 	2)	-> host d -> host a -> host c -> user b.
> 	3)	-> host d -> host c -> host a -> user b.

If all hosts are RFC976 compliant, the answer is 2.  This is because all
defined routing characters must take precedence over any character that is
not defined by the standard.  "@" and "!" are defined, and "%" is not; thus,
"%" must be lower than "@" and "!".  So far, so good.  The real problem is
that only one of the hosts is actually a gateway between internet and UUCP
and thus is required to be RFC976 compliant.  Choice 1 above is
illegal, because a is acting as a gateway between UUCP and Internet and it
is not complying with RFC976.  The problem is distinguishing between 2 and 3.
If d is not in fact a UUCP gateway, then it isn't required to be 976
compliant, and it may thus give "%" precedence over "!" (not even knowing
what "!" is).  In this case, you'd get choice 3.

So, in a nutshell, if everyone could be guaranteed to be 976 compliant,
then the answer is unambiguously 2.  Otherwise (as the real world is), it is
ambiguous between 2 and 3.


John Diamant
Software Engineering Systems Division
Hewlett-Packard Co.		ARPA Internet: diamant@hpfclp.sde.hp.com
Fort Collins, CO		UUCP:  {hplabs,hpfcla}!hpfclp!diamant

barnett@crdgw1.crd.ge.com (Bruce G. Barnett) (04/26/89)

In article <1410011@hpfclp.SDE.HP.COM>, diamant@hpfclp (John Diamant) writes:

>Right.  The angle brackets are required for source routes.

But most sendmail.cf files I have seen include something like:
	R$*<$+>$*       $1$2$3
It may be that the spec says angle brackets are required,
but is that really the case?

--
Bruce G. Barnett	<barnett@crdgw1.ge.com>  a.k.a. <barnett@[192.35.44.4]>
			uunet!steinmetz!barnett, <barnett@steinmetz.ge.com>

steve@nuchat.UUCP (Steve Nuchia) (04/27/89)

In article <237@crdgw1.crd.ge.com> barnett@crdgw1.crd.ge.com (Bruce G. Barnett) writes:
>In article <1410011@hpfclp.SDE.HP.COM>, diamant@hpfclp (John Diamant) writes:
>>Right.  The angle brackets are required for source routes.
>But most sendmail.cf files I have seen include something like:
>	R$*<$+>$*       $1$2$3
>It may be that the spec says angle brackets are required,
>but is that really the case?

The angle brackets you see inside sendmail rulesets are put there
to "focus" the address by the rules themselves.  This is not a
requirement but a convention.

The hard-wired address parsing logic that invokes the rules takes
care of the syntactic angle brackets -- the rules never see brackets
in their input.  Any comments associated with an address, ie
	Comm Ent <foo@bar>
or	foo@bar (Comm Ent)

are stripped off before calling the rules and added back to
what the rules return when they are done.  If you see something
like
	R$*<$*>$*	$2		basic RFC822 parsing
near the beginning of S3 it is a no-op.
-- 
Steve Nuchia	      South Coast Computing Services
uunet!nuchat!steve    POB 890952  Houston, Texas  77289
(713) 964 2462	      Consultation & Systems, Support for PD Software.

cfe+@andrew.cmu.edu (Craig F. Everhart) (04/27/89)

> *Excerpts from ext.nn.comp.mail.sendmail: 26-Apr-89 Re: sendmail parsing*
> *questi.. Bruce G. Barnett@crdgw1. (446)*
> But most sendmail.cf files I have seen include something like:
>       R$*<$+>$*       $1$2$3
> It may be that the spec says angle brackets are required,
> but is that really the case?
Most sendmail.cf files use the ``<>'' characters to bracket the name of the
destination machine, strictly as a means of making the string processing
possible.  This use of ``<>'' is totally independent of the use of ``<>'' in
surrounding source-routes.  (Check the crackaddr() routine for confirmation.)
When you see a rule such as you quote, you can be sure that the address being
manipulated isn't a full RFC822 address, but is the essential (rfc821) form.
Thus, even if the To: field of a message were
        To: Joe <@host1,@host2:joe@host3>
the text that would be sent to the rule above would be simply
        @host1,@host2:joe@host3
and the job of Sendmail's rule 3 is usually to turn that into
        <@host1>:@host2:joe@host3
so that rule 0 can look at the bracketed host name to figure out where to send
it.
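
A small Python sketch of that focusing step, for the source-route case only.
The function name focus() is made up for the illustration, and real
sendmail.cf files do this with rewriting rules on tokens; addresses that are
not source routes are simply passed through here.

```python
def focus(address):
    """Bracket the first hop of an RFC 822 source route.

    "@host1,@host2:joe@host3" becomes "<@host1>:@host2:joe@host3".
    Anything that is not a source route is returned unchanged.
    """
    if not address.startswith("@") or ":" not in address:
        return address
    route, mailbox = address.split(":", 1)   # "@host1,@host2" and "joe@host3"
    hops = route.split(",")                  # ["@host1", "@host2"]
    remainder = hops[1:] + [mailbox]         # what the next host will see
    return f"<{hops[0]}>:" + ":".join(remainder)


print(focus("@host1,@host2:joe@host3"))
```

The brackets added here exist only inside the rewriting process; they mark
the host that rule 0 should act on.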

None of this has anything to do with source-route brackets, which should be
added back to the source-route (if it is a source-route) when it is actually
put back into a header or used in an SMTP conversation.

                Craig

steve@nuchat.UUCP (Steve Nuchia) (04/29/89)

That's me, the 1989 hoof-in-mouth disease poster child |
						       V
In article <7306@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>The hard-wired address parsing logic that invokes the rules takes
>care of the syntactic angle brackets -- the rules never see brackets
>in their input.  Any comments associated with an address, ie

>	Comm Ent <foo@bar>
>or	foo@bar (Comm Ent)

>are stripped off before calling the rules and added back to
>what the rules return when they are done.  If you see something


That is only half right.  The hard logic sees and remembers the
bracketing and comments but passes the whole address to the
rewrite logic.  It then wraps whatever gets returned in the
remembered text.  My sendmail today greeted me with an
address of the form:

	First Last <First Last First Last First Last foo@bar>

proving conclusively that I didn't know what I was doing.

In the above case my S3 was getting "First Last <foo@bar>"
and returning, ultimately, "First Last foo@bar".  This
got the remembered text added back: "First Last <First Last foo@bar>".
A few relays later it was a real mess :-(
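
The failure can be imitated with a toy model in Python.  Everything here is
an assumption made for the illustration: s3() stands in for ruleset 3,
relay() stands in for the hard-wired logic that remembers the text around
the brackets, and the "later rules drop leftover brackets" step is a guess
at what the rest of the ruleset did.

```python
import re

# Rough regex analogue of the LHS "$*<$+>$*".
BRACKETED = re.compile(r"^(.*?)<(.+?)>(.*)$")


def s3(address, keep_strip_rule):
    """Toy stand-in for ruleset 3."""
    if keep_strip_rule:
        m = BRACKETED.match(address)
        if m:
            address = m.group(2)  # like: R$*<$+>$*  $2
    # Assumption: later focus/defocus rules end up dropping leftover brackets.
    return address.replace("<", "").replace(">", "")


def relay(header, keep_strip_rule):
    """Remember the text around <...>, rewrite the whole header, wrap the result."""
    m = BRACKETED.match(header)
    if not m:
        return s3(header, keep_strip_rule)
    return f"{m.group(1)}<{s3(header, keep_strip_rule)}>{m.group(3)}"


header = "First Last <foo@bar>"
for _ in range(3):  # three relays, each missing the strip rule
    header = relay(header, keep_strip_rule=False)
print(header)
```

With keep_strip_rule=True the header comes back unchanged on every pass;
without it the phrase is duplicated once per relay.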

So, ignore me, leave the "basic RFC822 parsing" line at the
beginning of S3, and make sure you don't call it from another
rule without defocusing the address first.
-- 
Steve Nuchia	      South Coast Computing Services
uunet!nuchat!steve    POB 890952  Houston, Texas  77289
(713) 964 2462	      Consultation & Systems, Support for PD Software.

barnett@crdgw1.crd.ge.com (Bruce G. Barnett) (04/29/89)

In article <237@crdgw1.crd.ge.com> I  (Bruce G. Barnett) wrote:
>	R$*<$+>$*       $1$2$3
I meant to write
	R$*<$+>$*	$2

Sigh.

Steve@nuchat.UUCP says:
> If you see something
>like
>	R$*<$*>$*	$2		basic RFC822 parsing
>near the beginning of S3 it is a no-op.

I had no idea.  No WONDER people don't understand sendmail files!
Look at the sendmail file distributed in Ultrix 3.0 in ruleset 3:

R$*<$*<$*<$+>$*>$*>$*	$4			3-level <> nesting
R$*<$*<$+>$*>$*		$3			2-level <> nesting
R$*<$+>$*		$2			basic RFC821/822 parsing

So these lines are complete garbage?
There must be some reason why they are there.
Perhaps earlier versions were different?
Or is this to catch bad rules?
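
One way to read the three lines is as "keep only the innermost bracketed
text", with the deeper patterns tried first.  The Python sketch below is a
loose analogue, not sendmail itself: it works on characters rather than
tokens, applies each rule once, and assumes that $* and $+ match as little
as possible, which the non-greedy regex quantifiers imitate.  The names
RULES and strip_brackets() are invented for the example.

```python
import re

# Hypothetical regex equivalents of the three ruleset 3 lines quoted above,
# each paired with the number of the group kept on the right-hand side.
RULES = [
    (re.compile(r"^(.*?)<(.*?)<(.*?)<(.+?)>(.*?)>(.*?)>(.*)$"), 4),  # 3-level
    (re.compile(r"^(.*?)<(.*?)<(.+?)>(.*?)>(.*)$"), 3),              # 2-level
    (re.compile(r"^(.*?)<(.+?)>(.*)$"), 2),                          # basic
]


def strip_brackets(address, rules=RULES):
    """Return the text selected by the first matching rule, else the input."""
    for pattern, group in rules:
        m = pattern.match(address)
        if m:
            return m.group(group)
    return address


nested = "a <b <c@d> e> f"
print(strip_brackets(nested))             # all three rules available
print(strip_brackets(nested, RULES[2:]))  # basic rule only
```

In this model the basic rule alone leaves a stray "<" behind on nested
input, which would explain why the nesting rules are listed ahead of it.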

--
Bruce G. Barnett	<barnett@crdgw1.ge.com>  a.k.a. <barnett@[192.35.44.4]>
			uunet!steinmetz!barnett, <barnett@steinmetz.ge.com>