[net.mail] parsing host1!user@host2 - a new idea

chongo@nsc.UUCP (Landon C. Noll) (10/16/84)

Chuqui has some good points in his article.  To remove confusion, why not
allow '('s?  You could say:

		(host1!user)@host2  -or-  host1!(user@host2)

The '(' is your friend.  (and so is ')')


Always keep a record in the message of where the message has gone.  If a
message hits a dead/wrong end, DON'T optimize the path of the return message.


On another point, I think each site should not just blindly adjust the
path of a message.  Here are some reasons why:

	1) I don't want site foo to receive my letter because the uucp
	   manager just learned how to monitor letters and kills messages
	   which he/she does not like.  (this actually happened in a
	   local S.F. site)  The normal path might be:
			   a!bletch!foo!bar!mojo
	   but I want:
			a!bletch!curds!and!whey!bar!mojo. 

	
	2) There are two sites named foo.  I send a letter to:
			    a!bletch!c!d!e!foo!mojo
	   but bletch changes the path to:
				a!bletch!foo!mojo
	   where e!foo is not the same site as bletch!foo.



I can see a site adjusting a domain directed message path if it is a gateway,
or sending it on to a gateway site if it is not.  The cases where you want
a forced path are few.  Most of the time, mailer path optimization does help.
But there ARE reasons why you might want to force a given path.


I have a suggestion and another use for the '(' in a path.  Let the
'(' guide you on which paths NOT to touch.   For example:

	a!b!c!(x!y!z)!d!e!foo!mojo  where (x!y!z) is a path expression.

We will talk about this path below, so keep it in mind.

Here are some ideas on how to treat this path expression:

	- Deal with the path  c!(expression)!d  as a glued in path and
	  NOT subject to change.  That is,  c  MUST receive the message and
	  pass it along to the leftmost site inside the expression.  The
	  rightmost site MUST send the message to site  d.  In other words,
	  c!x and z!d are 'glued' in.

	- Disallow a site to change the path to the right of any expression.
	  That is, site  b  can not change the  d!e!foo!mojo path because
	  it is to the right of the path expression.

	- Allow any site within an expression to ONLY change the path inside
	  that expression.  That is,  x  can send it along to  z  and bypass
	  site  y  if it wants to, but don't allow  x  to send it to  d  or  e.

	- Pull the left '('s along until you reach a right ')'.  The life
	  of the path could go like:

	  a!b!c!(x!y!z)!d!e!foo!mojo	- as sent by the user
	    b!c!(x!y!z)!d!e!foo!mojo    - the user's site talks directly to the
					  site  b, so  a  is bypassed.  The
					  user's site also talks to e, but
					  sending directly to  e  would bypass
					  the expression.
	      c!(x!y!z)!d!e!foo!mojo    - b sent it to c.
		(x!y!z)!d!e!foo!mojo    - c is FORCED to send it to x.
		  (y!z)!d!e!foo!mojo	- x sends it to y.
		    (z)!d!e!foo!mojo	- y sends it to z.
			d!e!foo!mojo	- z is FORCED to send it to d.  The
					  '('s meet and they go away.
			    foo!mojo	- d can send directly to foo, and does.
					  here mojo gets the message.


One additional note: expressions can be nested, such as:

		a!(b!c!(x!y!z))!d!e!(foo)!g!mojo

a MUST send to b;  c MUST send to x;  z MUST send to d;  e MUST send to foo;
foo MUST send to g.
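
The "MUST send" pairs can be read off a path mechanically.  What follows is
only a minimal Python sketch of that reading, not part of the proposal: the
function name forced_links is hypothetical, and it assumes that the last
component of the path is the user and that parentheses only ever appear at the
start or end of a site name.

```python
def forced_links(path):
    """Return the (sender, receiver) pairs glued together by a parenthesis.

    Assumption: the final '!' component is the user name, not a site.
    """
    hops = path.split("!")
    sites = hops[:-1]  # drop the user at the end of the path
    links = []
    for left, right in zip(sites, sites[1:]):
        # A ')' closing on the left hop, or a '(' opening on the right hop,
        # marks a link that no site may optimize away.
        if left.endswith(")") or right.startswith("("):
            links.append((left.strip("()"), right.strip("()")))
    return links


print(forced_links("a!b!c!(x!y!z)!d!e!foo!mojo"))
print(forced_links("a!(b!c!(x!y!z))!d!e!(foo)!g!mojo"))
```

For the first path this reports the c-to-x and z-to-d links, and for the nested
path it reports the five links listed above.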


chongo <nsc!(chongo)!bats> /\(--)/\
-- 
 ~ Imagine UN*X source, being in the public domain... ~ 
					J. Alton 84'

hokey@plus5.UUCP (Hokey) (10/17/84)

I was hoping I could stay out of this.  Where do you want to use
these headers?  In the transport address (the stuff sent to rmail)?
What about the addresses used in the From: and To: lines?

The uucp-mail project has addressed these issues in gory detail.  I will
tell you the decisions that have been reached.  I hope I don't get a lot
of mail on this issue because I need to get the new mail software out in
a hurry.  I believe the issues should be aired ASAP.  Many of the others
on the project would rather wait until the software is ready before mouths
are opened.  I am not always easy to work with.  Many are concerned that
rehashing these issues will unnecessarily delay the project.  I hope I
am not the only one from the project addressing these issues, because I
have code to write.

First, all addresses sent *from* the new software across uucp will be in
strictly bang format.  Mail to user@dom.ain will be converted to dom.ain!user.
This works for almost all of the cases (the exception being explicitly
routed RFC822 addresses across gateways, near as I can tell).  Mail sent
*to* the new software will accept non-hybrid addresses only, because of
the parsing ambiguity.  This means mail to a!b@c.d will be rejected by
the new rmail.  This is necessary because the mail must go through, and
if, someday, we all end up with RFC mailers, the parsing of addresses
will change and then we are all in trouble again.
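
As an illustration only, the two rules above (convert user@dom.ain to
dom.ain!user, and reject hybrids such as a!b@c.d) might be sketched in Python
as follows.  The function name to_bang is hypothetical and this is not the
project's code; explicitly routed RFC822 addresses, the exception mentioned
above, are simply rejected here.

```python
def to_bang(address):
    """Convert user@dom.ain to dom.ain!user; pass pure bang paths through.

    Assumption: anything mixing '!' and '@', or carrying more than one '@',
    is treated as unparseable and rejected.
    """
    if "!" in address and "@" in address:
        raise ValueError("hybrid address rejected: " + address)
    if address.count("@") > 1:
        raise ValueError("routed address not handled: " + address)
    if "@" not in address:
        return address  # already in bang format, or a plain local user
    user, _, domain = address.partition("@")
    return domain + "!" + user


print(to_bang("user@dom.ain"))
print(to_bang("plus5!hokey"))
```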

Sendmail sites should leave the From: and To: lines *alone*.  There is
a difference between an address and a route.  Non-RFC mailers won't
look at these lines for replies, and they certainly won't update them
when the mail passes through their site!  These addresses should be
in RFC822 format *if you are using an RFC mailer*.  If a non-RFC
mail message passes through an RFC mailer, it *might* add a From:
line and an Apparently-to: line.  The >addresses< on these lines should
probably take the form  "a!b!c"@thissite  and thissite should be qualified
(site.UUCP) if the site is registered with the Uucp Site Registry
(lauren@vortex.UUCP).

This implies that the mailers invoked by sendmail should be able to
handle routing to non-neighbors, by using things like pathalias.  Note
that routing information is added, the addresses are not changed.  Rob
Warnock wrote a very clear discussion on the issue of route optimization
which clearly states the conditions under which paths may be rerouted.

Sites like ihnp4 optimize paths because many people do not have access
to RFC mailers or routing software, so when people reply to news articles
the mail goes along a path which is usually absurd.  My solution to
the problem of Path: replies to mail is for (at least) the first "smart"
registered site in the path to use its fully-qualified name.  This means
it is safe for any other "smart" mailer to optimize directly to the last
fully-rooted machine on the path.  This may be enough to prevent the
"optimization" of explicitly specified paths.  Chuqui can still send
out his looping paths to check path validity and the speed with which
the message travels, without worrying that the path will be "fixed"
for him.

This also means that these smartmailers will handle mail to other domains
as well as recognizing subdomains.  Just think, no more problems with
the duplicate named machines gang, vortex{.DEC,.UUCP}, rigel{.sun.UUCP,
.oddjob.UUCP}, regina{.DEC,.UUCP}, and a host (no pun...) of others!
Leaves you kind of breathless, huh?  This also means that you could
send mail to, say, ucbvax!site.CSNET!user and not have to worry about
the hilarious contortions with csnet-relay.  Specifically, you would
only have to address the mail to *any* convenient smarthost!do.ma.in!user
and it will get there (assuming the addressee exists!).

I have left out a lot of points in my discussion.  These ideas have been
gone over quite thoroughly by many people, and I have done a mediocre
job of covering the issue.  Many of the problems must be solved in
specific places.  The use of parentheses in an address will mess up
most sendmail sites in the universe.  We can leave most of the routing
software alone if we have user-interface mailers which do the job they
are supposed to do, specifically, produce an *appropriately* formatted
mail message with the help of the user.  This stuff must be easy to use,
and should not get in our way.  We are getting there.
-- 
Hokey           ..ihnp4!plus5!hokey
		  314-725-9492

andrew@garfield.UUCP (Andrew Draskoy) (10/17/84)

Re:  using ()'s to decide precedence in addresses.
The main problem is that this is not in RFC822.
What IS in the standard is using double-quotes to delimit an atom.
Hence the addresses

"joe@blah.ARPA"@foo.BAR.UUCP
and
"a!b!c"@host.ARPA
and maybe even
a!b!c!"user@host.ARPA"

should work if I read the standard correctly.  Since no one seems to be
doing this, I suspect that there is a problem with this.  If that is the
case, I would like someone to tell me why.
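
A small Python sketch of the quoting idea, for illustration: split an address
at the last '@' that lies outside double quotes.  The function name is
hypothetical, and backslash escapes inside quoted strings are deliberately
ignored, so this is a simplification rather than a full RFC822 parser.

```python
def split_at_unquoted_at(address):
    """Split at the last '@' outside double quotes.

    Returns (local_part, domain); domain is None when no unquoted '@' exists.
    Assumption: no backslash-escaped quotes appear in the address.
    """
    in_quotes = False
    split_index = None
    for i, ch in enumerate(address):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "@" and not in_quotes:
            split_index = i
    if split_index is None:
        return address, None
    return address[:split_index], address[split_index + 1:]


for example in ('"joe@blah.ARPA"@foo.BAR.UUCP',
                '"a!b!c"@host.ARPA',
                'a!b!c!"user@host.ARPA"'):
    print(split_at_unquoted_at(example))
```

In the third address the only '@' is inside the quotes, so no domain is split
off and the whole string is left for bang-style routing.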

Andrew Draskoy
{akgua, allegra, ihnp4, utcsrgv}!garfield!andrew

fair@dual.UUCP (Erik E. Fair) (10/18/84)

I'm about ready to bring up sendmail here at DUAL, and my rules of
thumb are quite simple:

1.	When you're on the ARPA INTERNET, obey RFC822 exactly
		(i.e. `@' takes precedence).

2.	When you're on the UUCP network, obey UUCP bangist conventions
		(i.e. `!' takes precedence).

3.	Rule 1 takes precedence over Rule 2.

In our case, since we're not on the ARPA INTERNET (too bad), I will
implement Rule 2. This is especially important for us, since we have
UUCP connections to not one, not two, but four ARPA INTERNET sites.
These rules will keep the gateways working, and should prevent any
problems with internal internets that people have running around (like
we now do), because they reflect the current state of the world of
networks.
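
To see how the two rules disagree on a hybrid address, here is a minimal
Python sketch.  The function next_hop and its at_has_precedence flag are
hypothetical names chosen for the example; a real mailer would do this in its
rewriting rules, not in a function like this one.

```python
def next_hop(address, at_has_precedence):
    """Return (next_host, remaining_address) for a mail address.

    at_has_precedence=True models Rule 1 ('@' binds first, RFC822 style);
    False models Rule 2 ('!' binds first, UUCP style).
    A next_host of None means local delivery.
    """
    if at_has_precedence and "@" in address:
        rest, _, host = address.rpartition("@")
        return host, rest
    if "!" in address:
        host, _, rest = address.partition("!")
        return host, rest
    if "@" in address:
        rest, _, host = address.rpartition("@")
        return host, rest
    return None, address


print(next_hop("host1!user@host2", at_has_precedence=True))
print(next_hop("host1!user@host2", at_has_precedence=False))
```

Under Rule 1 the message goes to host2 first with host1!user left over; under
Rule 2 it goes to host1 first with user@host2 left over.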

	Erik E. Fair	ucbvax!fair	fair@ucb-arpa.ARPA

	dual!fair@BERKELEY.ARPA
	{ihnp4,ucbvax,hplabs,decwrl,cbosgd,sun,nsc,apple,pyramid}!dual!fair
	Dual Systems Corporation, Berkeley, California

wls@astrovax.UUCP (William L. Sebok) (10/19/84)

Chuqui and Chongo's articles <1606@nsc.UUCP> and <1608@nsc.UUCP> have many
good ideas.  Many of the points mentioned there are things I had thought of
myself, and I wondered why they weren't thought of in the standard.  I think that
RFC 822 is an inferior standard because it does not allow something like
nestable parentheses to indicate and override precedence.  The double quote
character provides part of the functionality mentioned, but unfortunately,
because a double quoted clause is closed by another double quote, double
quoted clauses cannot be nested.
-- 
Bill Sebok			Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,noao,princeton,vax135}!astrovax!wls

piet@mcvax.UUCP (Piet Beertema) (10/19/84)

<...>

	>I'm about ready to bring up sendmail here at DUAL, and my rules of
	>thumb are quite simple:
	>1.	When you're on the ARPA INTERNET, obey RFC822 exactly
	>		(i.e. `@' takes precedence).
	>2.	When you're on the UUCP network, obey UUCP bangist conventions
	>		(i.e. `!' takes precedence).
	>3.	Rule 1 takes precedence over Rule 2.
Simple solutions sometimes work very well.... Yes, I strongly agree with
the given parsing, especially in relation to pathaliasing:

It seems there are quite a few pathalias versions around, the latest one being
the Peter SteveMark (:-)) one. However some articles about these pathaliases
indicate that people see it as a user interface that should be installed on
every site ("what use is a pathalias that runs only on 32-bit machines?").
I strongly object to that view. Pathaliasing requires a database that should
be maintained on a high-priority level; and you can't expect that from all
sites.
Thus pathaliasing should be done only by backbone or whatever-you-may-call-them
sites and be fully transparent to the users both on the backbone site and on
the leaf nodes; therefore pathaliasing (routing) should be linked to sendmail.
This means that at any site you should be able to have mail routed by
the nearest backbone site; and since lots of sites don't run sendmail, the
bang convention should/can be used to get the mail to the backbone. Thus a user
could say: "mail path!backbone!user@site.domain" where "path" is the path to
the backbone. That means that on all intermediate hops the '!' should have
precedence over the '@'; only the backbone site will expand (if necessary)
the remaining part using pathaliasing. A backbone should also know/add a path
to a given network gateway if necessary and convert to the proper syntax.
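
The flow described above can be sketched in Python as follows.  This is an
illustration under stated assumptions, not the mcvax setup: the ROUTES table,
its single entry, and the function names are all invented for the example, and
the "%s" placeholder merely imitates the style of route that pathalias
produces.

```python
# Hypothetical routing table kept only at the backbone site.
ROUTES = {"site.domain": "gateway!site!%s"}


def strip_hop(address):
    """Intermediate hop: '!' has precedence, so peel off the leftmost host."""
    host, _, rest = address.partition("!")
    return host, rest


def expand_at_backbone(address, routes):
    """Backbone: turn user@domain into a bang route using the table."""
    user, _, domain = address.rpartition("@")
    return routes[domain] % user


def trace(address, routes):
    """Return (hops taken with '!' precedence, route chosen by the backbone)."""
    hops = []
    while "!" in address:
        host, address = strip_hop(address)
        hops.append(host)
    return hops, expand_at_backbone(address, routes)


print(trace("b!backbone!user@site.domain", ROUTES))
```

Every hop before the backbone only strips its own name; the '@' part is looked
at once, by the one site that maintains the database.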

In fact the above scheme has been operating satisfactorily for more than
a year now here at mcvax.
-- 
	Piet Beertema, CWI, Amsterdam
	...{decvax,philabs}!mcvax!piet

teus@haring.UUCP (10/21/84)

The idea of having parentheses in routing addresses popped up here in Europe
as a way of getting around the problem of "different parsing ideas in
different networks". Together with "do not touch it for rerouting" there are
enough arguments for it. I think it is a good idea. The only problem left
is: which character(s) are used for it? Are '(' and ')' suitable?
-- 
	Teus Hagen	teus@mcvax.UUCP  (CWI, Amsterdam)

smb@ulysses.UUCP (Steven Bellovin) (10/23/84)

Using parentheses will badly break lots of existing software, for several
reasons.

	(a) rfc822 treats parenthesized stuff as comment.

	(b) lots of mailers use 'system()' or equivalent to invoke the
	next mailer down the forwarding chain; think of what unquoted
	parentheses do when the shell gets hold of them.  (You wouldn't
	believe what our shell does, but even /bin/sh won't like them.)

	(c) Parentheses are already used as an (undocumented) part of the
	uux protocol...  That is, if you say

		uux foo!command a!b

	your uux will try to ship file 'b' from machine 'a' to 'foo' before
	executing 'command'.  It doesn't matter that the command is really

		uux foo!rmail host!user

	uux has no way of knowing.  To get around this, mailers generate

		uux foo!rmail \(host!user\)

	to pass to the shell; the parentheses in turn tell uux that the
	funny string with a '!' is to be passed on literally, and is not
	a file transfer request.
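
Points (a) and (b) can be illustrated with a short Python sketch.  The
function names are hypothetical; strip_822_comments is a simplified reading of
the RFC822 comment rule (no quoted strings or backslash escapes), and
rmail_command uses single quotes from shlex.quote where the mailers above use
backslashes.  Either way /bin/sh hands the parentheses to uux untouched.

```python
import shlex


def strip_822_comments(text):
    """Drop parenthesized comments, tracking nesting depth (simplified)."""
    depth = 0
    kept = []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


def rmail_command(next_host, address):
    """Build a shell command line in which the parentheses reach uux intact."""
    return "uux {} {}".format(shlex.quote(next_host + "!rmail"),
                              shlex.quote("(" + address + ")"))


# An RFC822 mailer sees the parenthesized route as a comment and discards it.
print(strip_822_comments("(host1!user)@host2"))
# Quoting keeps the shell from interpreting the parentheses.
print(rmail_command("foo", "host!user"))
```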

ag5@pucc-i (Henry C. Mensch) (10/24/84)

<<>>

	Well, we could always use {} or [] since the various and
sundry mailers don't seem to use these...

--------------------------------------------------------------------
Henry C. Mensch  |  User Confuser |  Purdue University User Services
{ihnp4|decvax|ucbvax|purdue|sequent|inuxc|uiucdcs}!pur-ee!pucc-i!ag5
{allegra|cbosgd|hao|harpo|seismo|intelca|masscomp}!pur-ee!pucc-i!ag5
--------------------------------------------------------------------
                  "Hit me with your laser beam!"