[net.mail] Parsing mixed form addresses

dpk@mcvax.uucp (Doug Kingston) (02/01/86)

One of the big problems facing Unix mail systems now is the ambiguous
nature of "mixed form addresses", e.g. x!y@z.  As was recently mentioned
by some others, the key to correctly parsing these addresses is to
know the source.  If you know the source (or more correctly its
addressing format), then you can almost always determine the correct
meaning of the address.  UUCP addresses should be assumed to have
! precedence.  Expect that when you generate them, and accept it
when you receive them.  Mail from 822 standards sources (PhoneNet,
SMTP, Bitnet, ...) should be given %/@ precedence.  This is the basic
policy being adopted in Europe.
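
As a rough illustration of the policy above (a sketch only, not any
mailer's actual code), the next hop of an address can be picked
according to the convention of the source it arrived from.  The function
name and the two source labels "uucp" and "rfc822" are made up for this
example.

```python
def next_hop(address, source):
    """Return (next_host, remainder) for one routing step.

    source is "uucp" (! takes precedence) or "rfc822" (@ then % take
    precedence).  Both labels are hypothetical names for this sketch.
    """
    if source not in ("uucp", "rfc822"):
        raise ValueError("unknown source convention: %r" % source)
    # UUCP sources: the leftmost ! names the next host.
    if source == "uucp" and "!" in address:
        host, rest = address.split("!", 1)
        return host, rest
    # Otherwise the rightmost @ (or, failing that, %) names the next host.
    for sep in "@%":
        if sep in address:
            rest, host = address.rsplit(sep, 1)
            return host, rest
    # A pure bang path is unambiguous whatever the source.
    if "!" in address:
        host, rest = address.split("!", 1)
        return host, rest
    return None, address  # nothing left to route: a local user


print(next_hop("x!y@z", "uucp"))    # ('x', 'y@z')
print(next_hop("x!y@z", "rfc822"))  # ('z', 'x!y')
```

The same string x!y@z yields a different first hop depending only on the
declared source, which is the whole point of the policy.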

This was one of the major problems with Sendmail until the recent
enhanced config files were published.  Sendmail would always use the
same input parsing rules regardless of the source of the message.  This
is doomed to failure with mixed form addresses and more than one type
of source.  I am glad this is being changed.

Addresses that are all ! or all @/% are not a problem.  They are unambiguous.

				-Doug-

				Doug Kingston
				Centrum voor Wiskunde en Informatica
				Kruislaan 413
				1098 SJ Amsterdam, The Netherlands

ulmo@well.UUCP (Brad Allen) (02/03/86)

>One of the big problems facing Unix mail systems now is the ambiguous
>nature of "mixed form addresses", e.g. x!y@z.  As was recently mentioned
>by some others, the key to correctly parsing these addresses is to
>know the source.  If you know the source (or more correctly its
>addressing format), then you can almost always determine the correct
>meaning of the address.  UUCP addresses should be assumed to have
>! precedence.  Expect that when you generate them, and accept it
>when you receive them.  Mail from 822 standards sources (PhoneNet,
>SMTP, Bitnet, ...) should be given %/@ precedence.  This is the basic
>policy being adopted in Europe.

The way I have been sending mail through mixed standards is to assume
that each machine just looks at the part of the address it cares about.
That is, if a machine is ARPA, then it will look to the right, and if
it's UUCP, it will look to the left.

This makes things ambiguous since some machines have more than
one network operating, but I make sure I specify what network
I'm sending to, and this makes it unambiguous.

I think I once sent successfully through this path:

ptsfa!ucbvax.ARPA!ucscl!firezar@ucscc.UUCP

This goes (from my machine well) to ptsfa via UUCP (the Well
tends to assume uucp addresses), and then to ucbvax (via uucp).
Then ucbvax switches over to the ARPA format, looks at the end
of the address, and sends to ucscc.UUCP (which I suppose
gets converted right over to the uucp send program on ucbvax ...!)
then ucscc sends to ucscl!firezar as if it were a uucp address.
ucscc assumes the ! or : format, unless I specify a network
like .arpa or .bitnet or .UCSC.
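
The walk described above can be traced mechanically.  This is only a
sketch: the function name and the "left"/"right" labels are invented
here, and the per-machine conventions are supplied by hand from the
description above rather than looked up anywhere.

```python
def trace_route(address, conventions):
    """Peel one host off the address per handling machine.

    conventions[i] is "left" if the i-th machine reads the leftmost !
    component (UUCP style) or "right" if it reads the part after the
    rightmost @ (ARPA style).  Returns (hops, what_is_left).
    """
    hops = []
    for side in conventions:
        if side == "left":
            host, sep, address = address.partition("!")
        elif side == "right":
            address, sep, host = address.rpartition("@")
        else:
            raise ValueError("convention must be 'left' or 'right'")
        if not sep:
            raise ValueError("no separator left for a %s-reading machine" % side)
        hops.append(host)
    return hops, address


# well and ptsfa read left, ucbvax reads right, ucscc reads left.
hops, user = trace_route(
    "ptsfa!ucbvax.ARPA!ucscl!firezar@ucscc.UUCP",
    ["left", "left", "right", "left"],
)
print(hops)  # ['ptsfa', 'ucbvax.ARPA', 'ucscc.UUCP', 'ucscl']
print(user)  # firezar
```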

Until every machine in the whole universe knows of every
other machine (!!!, someday ...), and even when we get
to that point, I strongly believe in the "view it in context"
argument.

		..ucbvax!{ptsfa,dual,lll-crg}!well!ulmo
		..ucbvax!ucscc!ucscl!firezar (specify to Ulmo)
		Brad Allen

gnu@hoptoad.uucp (John Gilmore) (02/03/86)

Let me cast a dissenting vote on interpreting incoming mail based
on what transport mechanism was used.

I run a 4.2BSD based system which connects to all other systems via
uucp.  I ran a similar setup at Sun with a few hundred machines on
a network, all connected to the outside world via uucp.  In both these
setups, RFC822 rules are used -- @ always takes precedence.  Incoming
mail from the Arpanet gateway has the gateway info removed, making it
look like we are on the Arpanet, and outgoing mail for the Arpa
domains is of course forwarded immediately to the gateway.

If the gateway sites that we connect to suddenly decide that "since we talked
uucp we must care more about !" then the whole thing falls down.

Just because I run uucp doesn't mean I don't know better!  It means
nobody has offered me an Arpanet connection in the same price range.

----

No matter what scheme you use, unless every site follows the same
rules, there will always be places that (theoretically or practically)
cannot be reached.  The key as I see it is to make it clear what will
go on, so that humans can plan around the limitations of the medium.
Are we gonna add another field to the uucp maps for "parses ! ahead of
% but behind @", or are we going to try to move towards a standard?  If
we solidly move towards a standard, (eg, cutting out the "%" crap),
then sites that do that stuff will not be reachable and will have to
fix themselves up.

It is *possible* for any site to fix its mailer to produce and accept
unmixed ! addresses.  Look at DEC or Sun.  Ever had problems getting a
decnet "::" address parsed?  Of course not, they translate it properly at
the gateway.  Woe to those who cheap out and make their local kludges
visible to the whole net!
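
One way a gateway might produce an unmixed ! address is sketched below.
It assumes the RFC822-style reading (@ first, then %) and, rather
optimistically, that each domain name can be used directly as a hop
name; the function name is made up for the illustration.

```python
def to_bang_path(address):
    """Rewrite a mixed address as a pure bang path.

    Assumes @ takes precedence, then %, with any ! path inside the
    local part left in its original order.
    """
    hops = []
    while True:
        for sep in "@%":
            if sep in address:
                # The host after the rightmost separator is visited first.
                address, host = address.rsplit(sep, 1)
                hops.append(host)
                break
        else:
            break  # no @ or % left
    return "!".join(hops + [address])


print(to_bang_path("x!y@z"))     # z!x!y
print(to_bang_path("user%b@a"))  # a!b!user
```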
-- 
# I resisted cluttering my mail with signatures for years, but the mail relay
# situation has gotten to where people can't reach me without it.  Dammit!
# John Gilmore  {sun,ptsfa,lll-crg,nsc}!hoptoad!gnu    jgilmore@lll-crg.arpa

ka@hropus.UUCP (Kenneth Almquist) (02/08/86)

> Just because I run uucp doesn't mean I don't know better!  It means
> nobody has offered me an Arpanet connection in the same price range.

You want to run RFC822 mail over UUCP.  Fine.  But pretending that UUCP
mail is RFC822 mail doesn't work very well.  Consider a simple RFC822
address:
	"Kenneth Almquist"@hropus.att
The first uux that gets this address will strip off the quotes; the
second will split the address at the space, which is now unquoted.  Or
consider any address that contains angle brackets; uuxqt will kick back
all such addresses for security reasons.
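
The quote-stripping effect can be imitated in a few lines.  Python's
shlex module is used here only as a stand-in for shell-style word
splitting; it does not reproduce what uux or uuxqt actually do, and the
relay function is a made-up model of one hop.

```python
import shlex


def relay(args):
    """Model one hop: paste the arguments into a command line, then
    split that line again the way a shell would."""
    return shlex.split(" ".join(args))


first = relay(['"Kenneth Almquist"@hropus.att'])
second = relay(first)
print(first)   # ['Kenneth Almquist@hropus.att']
print(second)  # ['Kenneth', 'Almquist@hropus.att']
```

After one hop the quotes are gone but the address is still a single
argument; after the second hop it has become two.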

Berkeley's attempts to treat UUCP mail like RFC822 mail have not given
us a correct implementation of RFC822 over UUCP.  What they *have* done
is to make a mess of UUCP mail; it is clear to me that things like
giving @ precedence over ! are undesirable in UUCP mail and should be
fixed.

But you want to run RFC822 mail to your Arpanet gateways.  This seems
like a reasonable idea to me.  If nothing else RFC822 mail has the
advantage of having a well defined standard.  But you lose this
advantage if you don't implement the standard correctly.  If RFC822 is
worth doing at all, it is worth doing right.  I will briefly consider
some problems:

First problem:  some UUCP's insist on sending back acknowledgements for
every remote execution unless the program being run is rmail.
Solution:  send RFC822 mail by remotely executing rmail with the -a
option.  Install a new version of rmail which execs the old version
unless it is called with the -a option.

Second problem:  RFC822 assumes that the destination(s) and return path
for the mail will be sent in the SMTP envelope.  Solution:  store this
information in two lines preceding the piece of mail.  For example:
	Currently-To: <ka@hropus>, <@hropus: xxx@houxm>
	Return-Path: <yyy@hrumr>
	[The RFC822 header starts here]
The addresses are all route-addresses in the format specified by
RFC822.

Third problem:  Once you start talking RFC822 to some sites,
you must be able to convert UUCP mail to RFC822 and vice versa.
Solution:  you can use the Berkeley mail code, which does this badly,
or write your own and do a better job.
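
A minimal sketch of the two envelope lines proposed under the second
problem might look like this.  Only the Currently-To and Return-Path
line layout comes from the example above; the function names and the
error handling are assumptions made for illustration.

```python
import re


def encapsulate(recipients, return_path, message):
    """Prefix an RFC822 message with the two proposed envelope lines."""
    to_line = "Currently-To: " + ", ".join("<%s>" % r for r in recipients)
    return "%s\nReturn-Path: <%s>\n%s" % (to_line, return_path, message)


def decapsulate(blob):
    """Split the envelope lines back off; returns (recipients, path, message)."""
    to_line, path_line, message = blob.split("\n", 2)
    if not to_line.startswith("Currently-To:"):
        raise ValueError("missing Currently-To line")
    if not path_line.startswith("Return-Path:"):
        raise ValueError("missing Return-Path line")
    # Take whatever sits between < and >, so a route such as
    # "@hropus: xxx@houxm" survives even though it contains punctuation.
    recipients = re.findall(r"<([^>]*)>", to_line)
    return_path = re.findall(r"<([^>]*)>", path_line)[0]
    return recipients, return_path, message


blob = encapsulate(["ka@hropus", "@hropus: xxx@houxm"], "yyy@hrumr",
                   "From: yyy@hrumr\n\nhello\n")
print(blob.splitlines()[0])  # Currently-To: <ka@hropus>, <@hropus: xxx@houxm>
```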

So if you like RFC822, write some code to do it, get your neighbors to
install it, and live happily ever after.  The only reason that we don't
have RFC822 mail running on top of UUCP now is that no one has gotten
around to doing it.  It would not take all *that* much work to put
together a system to handle RFC822 mail over UUCP, and anyone who did
it would be rewarded by the love of all the RFC822 lovers out there.
My only suggestion is that you write up a proposal for encapsulating
RFC822 mail and post it before you start coding, since everybody will
have to live with this standard.  I would be willing to expand upon my
own ideas of how to do this if anyone is interested.
				Kenneth Almquist
				ihnp4!houxm!hropus!ka	(official name)
				ihnp4!opus!ka		(shorter path)


But we *had* a nice mail standard, until people started contaminating
it with ideas from RFC822  :-)

david@ukma.UUCP (David Herron, NPR Lover) (02/13/86)

In article <253@hropus.UUCP> ka@hropus.UUCP (Kenneth Almquist) writes:
>> Just because I run uucp doesn't mean I don't know better!  It means
>> nobody has offered me an Arpanet connection in the same price range.
>
>First problem:  some UUCP's insist on sending back acknowledgements for
>every remote execution unless the program being run is rmail.
>Solution:  send RFC822 mail by remotely executing rmail with the -a
>option.  Install a new version of rmail which execs the old version
>unless it is called with the -a option.  Second problem:  RFC822
>assumes that the destination(s) and return path for the mail will be
>sent in the SMTP envelope.  Solution:  store this information in two
>lines preceding the piece of mail.  For example:
>        Currently-To: <ka@hropus>, <@hropus: xxx@houxm>
>        Return-Path: <yyy@hrumr>
>        [The RFC822 header starts here]
>The addresses are all route-addresses in the format specified by
>RFC822.  Third problem:  Once you start talking RFC822 to some sites,
>you must be able to convert UUCP mail to RFC822 and vice versa.
>Solution:  you can use the Berkeley mail code, which does this badly,
>or write your own and do a better job.

Oh geez.... It's not a good idea to go mucking up a standard to handle
local problems.  What do you do with these nifty-keeno new-style
header lines at gateways?  (A position we're in here by sitting
on 3 networks (bitnet, csnet, and usenet)).

BITNET has something which will solve this problem already.  It's called
BSMTP, the B stands for Batch.  (Remember what kind of computer makes
up half of BITNET?).  Anyway, BSMTP is a batch form of SMTP.  All the
same information is exchanged between the two machines, but as two
separate file transfers rather than a two-way conversation.
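
To give the flavor of the idea, here is a sketch that writes the
sending side's half of an SMTP dialogue into one batch.  The commands
are ordinary SMTP ones; the exact file conventions BSMTP uses on BITNET
are not reproduced here, and the function name is made up.

```python
def build_batch(helo_host, sender, recipients, message):
    """Return the client's SMTP commands as one batch of text, to be
    shipped as a file instead of being typed into a live connection."""
    lines = ["HELO " + helo_host, "MAIL FROM:<%s>" % sender]
    lines += ["RCPT TO:<%s>" % r for r in recipients]
    lines.append("DATA")
    for line in message.splitlines():
        # Double a leading dot so it is not mistaken for end-of-data.
        lines.append("." + line if line.startswith(".") else line)
    lines += [".", "QUIT"]
    return "\n".join(lines) + "\n"


print(build_batch("ukma", "david@ukma", ["ka@hropus"],
                  "Subject: test\n\nhello\n"), end="")
```

The replies the receiver would have given interactively go back the
same way, as a second file.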

In addition.... there is a program ALREADY EXISTING which will handle
BSMTP and do routing amongst a bunch of domains, etc, etc...  It's
the mail portion of the UREP package (Unix RSCS Emulation Program,
the little gadget which attaches us to BITNET).

Some problems:

1) UREP documentation is abominable, and the code reads even worse.
   (It looks like it's well designed code, but there's SO MUCH of it
   and it's uncommented and not very self-documenting)
2) UREP requires source licenses, etc.  Also some dealings with Penn State.
3) The file which controls the routing is ONE FILE for ALL domains
   you provide routing into.  (This is fine for me because that program
   only handles stuff going into bitnet, so the file is 1100 lines
   long, buuuut, between my entire database I probably have 10,000
   hosts listed)  Also, I don't think there's any hashing on the
   file meaning linear searches... again, for a huge domain table
   this will be murder.
4) The code has some hard-wired assumptions (I think) about bitnet.
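
On problem 3, the difference between scanning the routing file for
every message and indexing it once can be sketched as follows.  The
one-line-per-host "name route" layout and the host names are invented
for the example; UREP's real file format is not shown here.

```python
def linear_lookup(lines, host):
    """Scan the table top to bottom for every query (cost grows with
    the number of hosts)."""
    for line in lines:
        name, route = line.split(None, 1)
        if name == host:
            return route.strip()
    return None


def build_index(lines):
    """Read the table once into a hash table keyed by host name."""
    index = {}
    for line in lines:
        name, route = line.split(None, 1)
        # Keep the first entry for a host, as the linear scan would.
        index.setdefault(name, route.strip())
    return index


# Hypothetical table entries.
table = ["hosta bitnet hosta", "hostb csnet relay!hostb", "hostc uucp x!y!hostc"]
index = build_index(table)
print(linear_lookup(table, "hostb"))  # csnet relay!hostb
print(index.get("hostb"))             # csnet relay!hostb
```

With 10,000 hosts the scan reads about 5,000 lines per lookup on
average; the index costs one pass at startup and a hash probe per
lookup after that.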

On the other hand, it looks like a very capable system and can easily
be made to interface with uucp mail.  (bitnet doesn't talk directly
to this program, but goes through 2 other programs before it gets
to the router, then for sending it back out it goes through another
2 programs).

I can probably dig up some documentation if anybody's interested...


-- 
David Herron,  cbosgd!ukma!david, david@UKMA.BITNET, david@mathsci.uky.csnet
							  ^
			Notice new and improved address---|

Postmaster for Kentucky
"'New and improved' is a misnomer" -- David Herron, 1986