[net.mail] Mail Addressing [2 of 4] Semantics

kre@ucbvax.ARPA (Robert Elz) (08/03/85)

Addressing semantics is the most difficult problem currently facing
us.  It is also the one most urgently in need of a solution.

I'm going to try and discuss this without resort to any magic symbols
(! @ and the like), since as soon as some people see any of those symbols
various preconceptions leap out at them, and they are prevented from
seeing the real issues.  It's not going to be easy, and I'm sure that at
least a couple of times I'm not going to be able to do it.

What we have to decide here is what information must be present in a
mail address in order to uniquely specify the intended recipient.

First I'm going to assume a couple of things that I think are
non-contentious in this forum (though they are by no means necessary).

A mail address is just a means of specifying a person (or program, or
whatever) that should receive the mail.

In a tiny system that's all that we would ever need.

Since there is no practicable way that we can make these names unique
worldwide, we add to this the notion of a "host" - the place where this
person receives his mail.

In a smallish network this would be all that would be needed, however
as the number of hosts grows, specifying names for them uniquely also
becomes impossible.

Here is where the recent discussion has exposed two possible methods
for proceeding.

We can associate with each host some additional "attribute" that we
must also specify, to make its name unique.  (And as necessary, we can
add more attributes to the host name, or we can successively qualify
the attributes already given).

Or we can group hosts into "lumps", and then specify the name of the
host, and the lump to which it belongs.  Again, we can bundle lumps of
hosts together to make bigger lumps, and so on as necessary.  To name a
host we specify its name, and the names of all lumps it belongs to.
(Sometimes as a contraction we are permitted to omit the names of any
lumps that the sender also belongs to, but this is just an abbreviated
form of addressing, and like all abbreviations, should be avoided
wherever possible).

The former technique is apparently the one that is being considered
(adopted?) by the ISO MHS committee.  (See Mark Horton's article
<1343@cbosgd.UUCP>).  [Note: I am not claiming that the MHS naming
scheme is anything very closely related to the one I am about to
describe - just that the principles are similar].

It is also the technique advocated in a series of articles by Peter
Honeyman (<537@down.FUN>, <545@down.FUN>, <552@down.FUN>).  The
attribute he suggests that we should use is (I suspect) not the same
one that the ISO people would adopt, but the essentials of the scheme
are still there.  The "attribute" used is the name of some other host
that this one has a direct link to.  If that doesn't prove to be unique,
then the attribute host is further qualified by giving it a similar
attribute.

The "lumps" scheme is the one commonly labelled "domains".

Before looking at advantages & disadvantages of each of these, let's
consider a small anecdote to put things in perspective.

While I have been in the US, my employer has seen fit to change its
telephone number.  This means that I am going to have to have new
business cards printed.  On those cards I will include my paper-mail
address, and my phone number (and conceivably the telex number as well,
though no-one I know would even want to contact me that way).  Each of
these things can be specified in a way that they can be used anywhere.
Wherever I am, I can give a card to someone I meet, and if they have
access to the proper equipment they can use the address on my card to
contact me.

The addresses also remain constant - other than actions I may take (or
my employer may take) they are unlikely to change for years.  It is
vaguely possible that the body responsible for getting messages to me
may need to change my address, but they do this very rarely.  Certainly
nothing any other user of similar addresses can do can affect my
address, in any of these forms.

I would also like to include my electronic-mail address.  To really be
useful, this must meet the same criteria.  It must be usable (as it
stands) by anyone, anywhere.  Of course, the situation in the e-mail
world is not nearly this stable, but ideally, it would be, and just
perhaps, we can help push things this way, by leading from the front.

Now, to look at attribute addresses, and domains.

1) Domain addresses can be lengthy - they include a lot of information,
that is, in most cases, redundant.  In many cases, some of this excess
information is nearly meaningless (doesn't mean anything recognisable
to the average observer; it's simply a magic incantation).

2) Attribute addresses can be shorter: only enough attributes to make
the address unique need be specified.  Of course, it's perfectly legal
to overspecify an address, but there is (usually) no way to determine
in advance exactly which attributes need to be specified, or even, if
in fact, any current set of attributes will give a unique address at
all, or whether a new one will need to be used.

Now let's examine how a name gets assigned in each of the two cases.
With domains, the maintainer of the domain name list is asked to add a
specific (chosen) name to the list of names known in that domain.  If
the name chosen is unacceptable (either because it violates the
appropriate syntax requirements, or because it duplicates an existing
name, and hence would cause ambiguities), it is rejected, and the
proposer would then choose another.  Having reserved a name, nothing
more need ever be done.

With attributes, each site simply chooses a name, and makes known some
relevant attributes that apply to it.  Should the chosen name duplicate
the name of an existing site, then the attributes are used to
disambiguate.  This is the one BIG disadvantage of this scheme: if my
site had been the one whose name was (inadvertently) duplicated, then
my address now needs an attribute that it didn't need before, and until
I know the attributes of the new site, I can have no idea what new
attribute people mailing to me must specify.  That is, there's no way I
can predict this and print it on my business cards.
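
A minimal sketch of the hazard just described, in Python.  The model
is an assumption made purely for illustration: each site is a
(name, attributes) pair, the attributes are an ordered list of
neighbouring hosts, and an address is the name plus the shortest
attribute prefix that no other site of the same name shares.  The
site names are hypothetical.

```python
def minimal_address(site, sites):
    """Return (name, attributes needed) for `site`, or None if no
    prefix of its current attributes makes the address unique."""
    name, attrs = site
    # Only other sites with the same name can cause an ambiguity.
    rivals = [s for s in sites if s is not site and s[0] == name]
    for k in range(len(attrs) + 1):
        prefix = attrs[:k]
        if not any(r[1][:k] == prefix for r in rivals):
            return (name, prefix)
    return None


sites = [("vax", ["hub1"])]
print(minimal_address(sites[0], sites))   # ('vax', [])

# A new site picks the same name; the old site's address changes
# even though the old site did nothing.
sites.append(("vax", ["hub2"]))
print(minimal_address(sites[0], sites))   # ('vax', ['hub1'])
```

The second result depends entirely on what the newcomer published,
which is why the address cannot be fixed in advance.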

To my mind, domain type addresses are the only ones that make sense.
Except in the case that all of the currently 'top' level domains are
bundled together, and placed in a new 'top' domain, addresses don't
change once issued.  And that action is one that is taken only by the
people who manage the name space, and is likely to be made very rarely,
and with plenty of advance warning.  This is similar to the procedures
when the telephone company decides to change your area code, or when the
post office (or whatever appropriate body it is) decides to rename the
street that you live in.  Those well publicised changes we can
tolerate; unannounced changes occasioned by some new site joining the
network are intolerable.

Now domains do have some problems.  There has to be someone to
co-ordinate names in each domain (some "authority").  See Henry
Spencer's article <5836@utzoo.UUCP> in which he co-opts me as a
volunteer to do this work (:-).  Henry makes the point that volunteer
labour isn't easy to get to do this task.  Of course, "labour" isn't
really needed, we have all this computing power just waiting to be
used.  No-one has ever said that the naming authority must be a
"human".  The task to be performed isn't overly onerous, and creating a
program to receive mail from someone wanting to register a name, check
the proposed name for syntax problems and potential clashes, and either
add the name to the list, or reject it, is not something I would feel
to be beyond my capabilities.  Neither is it a perpetual task.
The fact that this has not yet (to my knowledge) been automated,
only testifies to the comparative simplicity of the task.
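
To make the size of the task concrete, here is a rough sketch in
Python of the checking step of such a program.  It is not an existing
registry; the class name, the syntax rule (a letter followed by
letters, digits or hyphens) and the example names are all assumptions,
and receiving and answering the mail is left out.

```python
import re

# Hypothetical syntax rule; a real domain would publish its own.
NAME_SYNTAX = re.compile(r"[a-z][a-z0-9-]{0,30}")


class DomainRegistry:
    """Keeps the list of names known in one domain."""

    def __init__(self, domain):
        self.domain = domain
        self.names = {}          # registered name -> contact address

    def register(self, name, contact):
        """Return (accepted, reason) for a proposed name."""
        name = name.lower()
        if not NAME_SYNTAX.fullmatch(name):
            return (False, "rejected: violates syntax requirements")
        if name in self.names:
            return (False, "rejected: duplicates an existing name")
        self.names[name] = contact
        return (True, "registered " + name + " in " + self.domain)


registry = DomainRegistry("example")
print(registry.register("alpha", "postmaster at alpha"))
print(registry.register("alpha", "someone else"))
```

Once a name is accepted nothing further happens to it, which matches
the point that registration is not a perpetual task.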

There is a third possibility for addressing (not mentioned above).
An address could be specified by detailing the route used to
get to the address.  This would be something akin to specifying
a postal address as "go north 3 blocks, take a left, continue
2 blocks, take a right, then veer right again past the big tree,
continue till you see a supermarket on the right, then take the
next left, and the third house on the right is it".  Of course
this presupposes that everyone starts from the same point, which
we usually solve by making that point be some 'well known monument'
and leave it up to the individual to work out how to get from where
he is to the monument.  Sometimes we may even give routes from
a few well known monuments, so people can pick one that they know,
and is close to where they are starting from.  These addresses are
unambiguous, and remain constant (unless someone blocks off
a street, or there's a traffic jam, or ..)  Of course, it's
inconceivable that anyone would actually choose a scheme like
this for addressing .. or is it? (see the article <686@umd5.UUCP>)

franka@mmintl.UUCP (Frank Adams) (08/07/85)

I would like to propose a UUCP naming scheme which would be simple to
implement, yet deal with the need to supply a unique, unvarying address.
What I propose is to designate a few sites as "root" sites.  Your full
address is a route from any one root site to your host (and then to you).
The requirements to make this work are threefold:

1) The names of root sites must be reserved; no other site may be permitted
   to adopt such a name.

2) Each host must know how to deliver mail to a root site.  (This may mean
   requiring the user to prefix a route to the root site to the destination
   address with existing mailers.)

3) Each root site must know how to deliver mail to every other root site.

The only problem I see with these is how to designate the root sites.  I
suspect that about a dozen are sufficient, so this could be resolved in an
ad hoc manner.
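
As a rough illustration of requirements 1 and 2, here is a sketch in
Python of what a "dumb" mailer would do.  The root names, the site
names and the shape of the data (an address as a list of hops starting
at a root, plus a local table of routes to roots) are assumptions for
the example, not part of any existing mailer.

```python
# Hypothetical reserved root names (requirement 1).
ROOT_SITES = {"root1", "root2"}


def full_route(address, routes_to_roots):
    """Prefix the local route to the root named first in `address`.

    address:          [root, hop, ..., host, user]
    routes_to_roots:  root name -> list of hops from here to that
                      root, not including the root itself
    """
    root = address[0]
    if root not in ROOT_SITES:
        raise ValueError("address does not start at a root site")
    if root not in routes_to_roots:
        raise ValueError("no known route to root site " + root)
    return routes_to_roots[root] + address


local_routes = {"root1": ["neighbour"]}     # requirement 2
route = full_route(["root1", "siteb", "hostc", "user"], local_routes)
print("!".join(route))    # neighbour!root1!siteb!hostc!user
```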

Some caveats: I do not mean that mail should be forced to follow the implied
route specified by the address.  The point of this scheme is that "dumb"
mailers can follow a simple set of directions to forward mail, while "smart"
mailers can reroute mail without error.

I have not addressed the issues involved in cross-net mail here, either.
I have the impression that those problems are more syntactic than semantic.
Whether the items in an address represent machines or domain names does
not matter *for a mailer on another network*.  How they are presented does.


I believe that this scheme avoids the danger of a "takeover" of the net,
as well.  It would be relatively simple, if such were attempted, to
redesignate the root systems; all that is required is a check that their
names are not duplicated.

henry@utzoo.UUCP (Henry Spencer) (08/11/85)

> Now domains do have some problems.  There has to be someone to
> co-ordinate names in each domain (some "authority")... volunteer
> labour isn't easy to get to do this task... No-one has ever said that
> the naming authority must be a "human".  The task to be performed isn't
> overly onerous, and creating a program to [handle name registration]
> is not something I would feel to be beyond my capabilities.  Neither is
> it a perpetual task...

This actually just shifts the issue, to finding a volunteer to provide the
machine time for the job.  For name registration, this probably isn't too
much of a problem (although Lauren could probably tell you some interesting
stories about the legal aspects, e.g. pinheads who feel they have a divine
right to use some specific name and threaten to sue the registry when they
find the name is taken already...[I kid you not]).  It brings in a more
troublesome issue, however.

What does random site X do when a user there (or a machine it connects to)
asks for mail transmission to site foo.bar, which X doesn't know about?
Right:  it punts the mail to the domain-administration site.  Given the
explosive growth rate of the network, how long will it be before that site
is swamped and its sponsors get fed up and pull the plug?

Having multiple administration sites for each domain only postpones the
problem slightly.

Of course, site X "ought" to know about any site it talks to frequently,
so that it doesn't need to hit the domain administrator every time.  But
this assumes that the dissemination of such knowledge can keep pace with
the growth of the network, which is an *assumption*, not a self-evident
fact.  I'm afraid I have little confidence in it.

One idea which almost nobody has discussed, but which might really help,
is to take the position that site X must *not* just forward the mail to
the domain administrator.  It must ask the domain administrator for the
routing information, and then use that information *itself*.  This has
the disadvantage that it slows down mail traffic considerably, but the
advantage that it gives X considerable incentive to do the work locally
if at all possible -- incentive that is lacking otherwise.  If we really
want domains to work, it is vitally important to do everything possible
to limit the load on the administering sites.

Perceptive observers may have noticed that utzoo is a fairly obvious
candidate as one of the domain-administration sites for eastern Canada,
and that utzoo has not yet volunteered to do it.  Don't hold your breath.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

tp@ndm20 (08/22/85)

>is to take the position that site X must *not* just forward the mail to
>the domain administrator.  It must ask the domain administrator for the
>routing information, and then use that information *itself*.  This has
.
.
.
>to limit the load on the administering sites.

As somewhat of an add on to Henry's idea, how about this.   If a site
can not route a message, it asks the domain  administrator the proper
route.  This route, through an *automatic* mechanism, is updated into
the requestor's database,  so he  will never  have to  ask about that
route again.  This provides an automatic way for routing databases to
be updated.  As I understand the idea of domains,  these routes would
not  automatically  disseminate,  as  the  idea is  to minimize  any
node's knowledge of the full configuration of  the net.   This scheme
allows a node to only keep track of the  sites he  actually mails to.
If the route ever fails, then the node can ask the domain
administrator and get an updated route.  
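
A small sketch of that mechanism in Python, under stated assumptions:
the class name is made up, and ask_admin stands in for whatever
exchange with the domain administrator would really be used (here it
is just a function from destination to route).

```python
class RouteCache:
    """Remembers routes learned from the domain administrator."""

    def __init__(self, ask_admin):
        self.ask_admin = ask_admin   # hypothetical: destination -> route
        self.routes = {}             # only sites we actually mail to

    def route_for(self, destination):
        # Ask the administrator only the first time.
        if destination not in self.routes:
            self.routes[destination] = self.ask_admin(destination)
        return self.routes[destination]

    def delivery_failed(self, destination):
        # The stored route went stale: forget it and ask again.
        self.routes.pop(destination, None)
        return self.route_for(destination)


requests = []

def fake_admin(destination):
    requests.append(destination)
    return ["hub", destination]

cache = RouteCache(fake_admin)
cache.route_for("foo.bar")
cache.route_for("foo.bar")
print(len(requests))    # 1
```

Note that the cache only learns a route is bad when delivery_failed
is called, i.e. after a message has already bounced.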

The problem with this whole line of reasoning is that it requires new
software that is completely different from what is  already in place.
The  mailer  would  have to  know to  ask for  a route,  and hold the
message until it got one.  It should also receive undeliverable mail,
contact the domain administrator for a new route, and re-send it.  It
could be a long time before someone found out his mail was
undeliverable.  

Unless  the  routes given  out by  the domain  administrator are kept
around, the administrator  will be  plagued by  route requests, which
probably accounts for just as much load (if not  more) as  if it just
forwarded the message.  The catch is that if they are kept around, you
never know when they become invalid.

Terry Poot
Nathan D. Maier Consulting Engineers
(214)739-4741
Usenet: ...!{allegra|ihnp4}!convex!smu!ndm20!tp
CSNET:  ndm20!tp@smu
ARPA:   ndm20!tp%smu@csnet-relay.ARPA