sam@delftcc.UUCP (Sam Kendall) (02/01/86)
I think the algorithms that underlie a pathalias-routing program such as Stan Barber's uumail are worth discussing, independent of their implementation. What we need is a sort of "Guide to Using the Pathalias Database".

I think there are two independent questions: first, given an address, what name(s) do we look up? (I am using "name" to mean site name, possibly qualified with domains.) Second, how do we look up those name(s)?

Okay, first question. Given a path

	a!b!c!...!p!q!r!stuff@z

which name(s) do we try to look up? Basically, either (1) "a", or (2) "r", or (3) "z". We can refine these choices a bit: (1) if we don't find "a" in the database, we try "b", then "c", and so on; likewise, (2) if we don't find "r", we try "q", then "p", and so on. Also, (3) if we don't find "z", graduate to option (1) or (2). "uumail" lacks these refinements, I think, and it also has no option (2).

Second question. Give an algorithm for mapping a name into a path. This is a series of alternatives to be tried in order until one works, something like:

First strip off any ".uucp" domain; it is default, sort of. (Note: I am assuming case-independence, as in pathalias -i.)

(1) Look it (the name) up in the pathalias database.

(2) Look up all domains (if any), inner to outer (e.g., for "ernie.berkeley.edu", look up ".berkeley.edu" then ".edu"). If found, use the fully domain-qualified name in the path.

(3) Prepend a "." to the name, and look it up. This is to cover things like "larry.rosler@ATT.UUCP" or "sob@harvard.edu"; "ATT.UUCP" and "harvard.edu" are domains, not actual hosts, but it can still make sense to send things to them. (I'm not sure that these addresses actually work.)

(4) A questionable step: append ".UUCP" and use step (2), i.e., send the letter to the nearest UUCP gateway. This is justified if UUCP gateways tend to have more up-to-date pathalias entries than most sites. UUCP gateways should omit this step. But this might lead to UUCP gateways getting a lot of traffic in dead letters; this will be more tolerable when there are more UUCP gateways.

(5) Give up.

This assumes domain names have an initial dot, as in the latest version of pathalias. I don't know if this makes "uumail"'s domain table obsolete; the domain table is more flexible than domains handled through pathalias, but it would be much more elegant and convenient to handle domains entirely through pathalias. Certainly pathalias's domain handling (when combined with the algorithm above) is sufficient for my needs, but my site is UUCP-only.

Implemented, I think my loose outline would result in at least two layers of subroutines. The top layer (which corresponds to my first question) parses the address, picking out names from it and calling the bottom layer (second question) to do the lookup(s).

Finally, a couple of miscellaneous items.

First, another question: that of private names. RFC 822 says that you can leave off the domain part of your destination site name if it is the same as your own. I don't remember how this works in a routing spec. Anyway, suppose I am mailing from site "a.WOMBAT.EDU" (also on the UUCP network) to site "x.WOMBAT.EDU". If I mail to "user@x", RFC 822 says it should go to "x.WOMBAT.EDU". No problem so far. But there is an actual site "x" on the UUCP network. Where should a letter to "x!user" go, "x.UUCP" or "x.WOMBAT.EDU"? There are frustrating name collision problems inherent in merging domains and the flat and/or relative UUCP namespace.

Second, Peter Honeyman doesn't worry too much about address routing using just pathalias, I guess, because he is working with his more ambitious edge database. He talks about it a bit in the current issue of Unix Review. Eric Allman interviewed him, and it was pretty interesting to me (but I haven't been following net.mail for more than a few months).

Comments, anyone?

----
Sam Kendall                       allegra \
Delft Consulting Corp.       seismo!cmcl2  ! delftcc!sam
+1 212 243-8700                     ihnp4 /
ARPA: delftcc!sam@nyu.ARPA
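The lookup order in steps (1) through (5) of the post above can be sketched in Python. This is only an illustration under stated assumptions, not code from uumail or pathalias: the database is modelled as a plain dict from lower-cased names to route templates containing "%s", the entries in DB are made up, and the choice in step (3) to hand the gateway the full name is a guess at what the post intends.

```python
# Hypothetical pathalias-style database: name -> route template.
# Domain entries start with a dot, as the post assumes.
DB = {
    "delftcc": "seismo!cmcl2!delftcc!%s",
    ".edu": "seismo!%s",
    ".att": "ihnp4!%s",
    ".uucp": "ihnp4!%s",
}


def domains_inner_to_outer(name):
    """For 'ernie.berkeley.edu' return ['.berkeley.edu', '.edu']."""
    parts = name.split(".")
    return ["." + ".".join(parts[i:]) for i in range(1, len(parts))]


def via_domain(name, db):
    """Step (2): route through the innermost known domain, keeping the
    fully domain-qualified name in the path."""
    for domain in domains_inner_to_outer(name):
        if domain in db:
            return db[domain].replace("%s", name + "!%s", 1)
    return None


def name_to_path(name, db, is_uucp_gateway=False):
    """Try steps (1)-(5) in order; return a route template or None."""
    name = name.lower()                  # case-independence, as in pathalias -i
    if name.endswith(".uucp"):
        name = name[: -len(".uucp")]     # ".uucp" is the default domain
    if name in db:                       # (1) plain lookup
        return db[name]
    route = via_domain(name, db)         # (2) domains, inner to outer
    if route is not None:
        return route
    if "." + name in db:                 # (3) the name is itself a domain
        # Assumption: pass the full name on to the domain's gateway.
        return db["." + name].replace("%s", name + "!%s", 1)
    if not is_uucp_gateway:              # (4) nearest UUCP gateway
        route = via_domain(name + ".uucp", db)
        if route is not None:
            return route
    return None                          # (5) give up
```

The two helper functions correspond to the "bottom layer" of the outline; a top layer that picks "a", "r" or "z" out of an address would call name_to_path once per candidate name.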
jer@peora.UUCP (J. Eric Roskos) (02/03/86)
> I think the algorithms that underlie a pathalias-routing program such as
> Stan Barber's uumail are worth discussing, independent of their
> implementation. What we need is a sort of "Guide to Using the Pathalias
> Database".

First of all, by way of clarification, I wrote the "opath" routing code in Stan's uumail (though I am very grateful to him for including it in his program; he also cleaned up a number of things). Actually I had a lot of reasons for the approaches used; let me comment in response to your observations on why I did various things (most of all, leaving things alone sometimes).

> Okay, first question. Given a path
>
> 	a!b!c!...!p!q!r!stuff@z
>
> which name(s) do we try to look up? Basically, either (1) "a", or (2)
> "r", or (3) "z". We can refine these choices a bit: (1) if we don't
> find "a" in the database, we try "b", then "c", and so on; likewise, (2)
> if we don't find "r", we try "q", then "p", and so on. Also, (3) if we
> don't find "z", graduate to option (1) or (2). "uumail" lacks these
> refinements, I think, and it also has no option (2).

My algorithm was to look up "a", iff "a" is not a neighbor. (In the original version of the program, I had a table of neighbors so that the database didn't have to be referenced in that case; but since the path for neighbors is just the name of the neighbor, and since obtaining the table in a secure manner required either hardcoding the table or calling a setuid program, I eventually eliminated that and just looked up all the names.)

"z" definitely should not be looked up; that is the essence of what I've been arguing for for a long time now, viz., that the string representing the path should consist of "names interpreted at the next site" separated by "!"s, with no characters other than "!" significant (at that level of the parsing; however, a given "name interpreted at the next site" can be further parsed by the site it was intended for).

I have mixed feelings about looking up "r". I definitely don't think it should be looked up if there is any alternative; a lot of AT&T mailer sites do that, and it causes a lot of trouble from time to time when you want to explicitly specify a path, for whatever reason, and then a mailer down the line "optimizes" it. I do think it is somewhat more reasonable to rewrite the path if you're a site such as the gateways in Europe which have to choose between low-cost packet networks and high-cost conventional telephone connections, though.

The algorithm of trying successive sites down the path if the next one is unknown is an interesting improvement, though. The problem is, since site names are not unique, if you don't know some of the names that make up the context of one you do know, you may end up choosing the wrong one. Peter Honeyman's new "pathparse" program might be better to use for this.

> Second question. Give an algorithm for mapping a name into a path.

This is well-defined in RFC822. The current "opath" routines let you "cheat" somewhat on this, since you have to explicitly specify what ".GIZMO" means as distinct from ".GIZMO.UUCP".

> (3) Prepend a "." to the name, and look it up. This is to cover
> things like "larry.rosler@ATT.UUCP" or "sob@harvard.edu";
> "ATT.UUCP" and "harvard.edu" are domains, not actual hosts, but
> it can still make sense to send things to them. (I'm not sure
> that these addresses actually work.)

First of all, if you look in the latest distribution of the UUCP map, you'll find that the map folks have already started implementing the domains (including, alas, geographic subdomains); they're just commented out.

For UUCP routing, in my opinion, a domain name (e.g., "ATT.UUCP") does map to one or more site names; actually the opath code (I think in the version Stan used) lets you choose from among a number of alternative sites when resolving the domain name, with a weighting, so that you can have several different nameservers for a domain, and you can route to more than one of them, with the frequency of routing weighted in proportion to how much you want to send to each of them.

> This assumes domain names have an initial dot, as in the latest version
> of pathalias. I don't know if this makes "uumail"'s domain table
> obsolete; the domain table is more flexible than domains handled through
> pathalias, but it would be much more elegant and convenient to handle
> domains entirely through pathalias. Certainly pathalias's domain
> handling (when combined with the algorithm above) is sufficient for my
> needs, but my site is UUCP-only.

I've been thinking about this a lot the past few days. For the present, you can use ".ATT", etc., in place of the gateway names in the ">gateway" field of the routing table. I haven't decided yet what the relative merits of the two approaches (other than the probabilistic routing) are.

Well, I could go on at length, but our system is going down for maintenance, so I'll leave it at that for now...
--
UUCP: Ofc:  jer@peora.UUCP  Home: jer@jerpc.CCUR.UUCP  CCUR DNS: peora, pesnta
  US Mail:  MS 795; CONCURRENT Computer Corp. SDC; (A Perkin-Elmer Company)
            2486 Sand Lake Road, Orlando, FL 32809-7642     xxxxx4xxx
        "There are other places that are also the world's end ... But this
         is the nearest ... here and in England." -TSE
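The weighted choice among several gateways for one domain, described in the post above, can be illustrated with a small Python sketch. It does not reproduce opath's actual table format or selection code, which the post does not show; the table layout, the site names chosen for ".att" and the 3:1 weights are all assumptions made for the example.

```python
import random

# Hypothetical table: domain -> list of (gateway site, relative weight).
DOMAIN_GATEWAYS = {
    ".att": [("ihnp4", 3), ("cbosgd", 1)],
}


def pick_gateway(domain, table=DOMAIN_GATEWAYS, rng=random):
    """Choose one gateway for the domain, with probability proportional
    to its weight, so traffic is spread across the alternatives."""
    entries = table[domain.lower()]
    sites = [site for site, _ in entries]
    weights = [weight for _, weight in entries]
    return rng.choices(sites, weights=weights, k=1)[0]
```

With the weights above, roughly three messages in four for ".att" would go through the first site over a long run; any single message may go either way.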
greg@ncr-sd.UUCP (Greg Noel) (02/05/86)
In article <122@delftcc.UUCP> sam@delftcc.UUCP (Sam Kendall) writes:
>I think the algorithms that underlie a pathalias-routing program such as
>Stan Barber's uumail are worth discussing, independent of their
>implementation. What we need is a sort of "Guide to Using the Pathalias
>Database".

I agree. After looking at both smail and uumail, I am becoming convinced that they differ more in detail than in any major way. I have also looked briefly at HP's domain mailer and find that it is doing much the same sort of thing as the other two. In fact, I have been looking at extracting as much as possible of the common code and putting it in a library that could be shared among all the domain mailers; I have already started work on it. So far, I haven't been able to do much (my boss has this strange idea that I should be doing some stuff for him), but libmail already has a couple of routines in it.

I think there should be a forum for the discussion of the algorithms, the computational model, and for the interchange of code. I don't think that there is the kind of general interest needed for a newsgroup, but there probably is enough grounds for a mailing list. It could easily be an extension of smailers@cbosgd (for the folks trying the beta version of smail). I would volunteer to moderate it except that San Diego is a bit of a backwater in terms of network traffic and I'm not sure how good the connectivity would be. In any event, I \will/ volunteer to help out in any way I can.

OK, domain mailer gurus, how about it? Would you participate in a mailing list to develop the underlying algorithms and thereby reduce the distance between your separate implementations?

Oh, to answer some of your questions, smail has a much better nomenclature for describing the types of mail routing; most of your variations are included, as well as some others.
--
-- Greg Noel, NCR Rancho Bernardo    Greg@ncr-sd.UUCP or Greg@nosc.ARPA
joel@gould9.UUCP (Joel West) (02/05/86)
In article <1954@peora.UUCP>, jer@peora.UUCP (J. Eric Roskos) writes:
> > Okay, first question. Given a path
> >
> > 	a!b!c!...!p!q!r!stuff@z
> >
> > which name(s) do we try to look up?
>
> My algorithm was to look up "a", iff "a" is not a neighbor.
> "z" definitely should not be looked up; that is the essence of
> what I've been arguing for for a long time now,

although jer@peora.UUCP, like some other uucp-only sites, expresses a preference for ! precedence, many of us have to co-exist in the ARPA-mandated RFC-822 world, which requires "@" precedence.

If someone wants to have "!" precedence for his personal use, fine. But anything that claims to be a generalized smart mailer must support "@" precedence, at least as an option. As <gnu@hoptoad.UUCP> remarked in an earlier message, learn from DEC and Sun and don't show your particular addressing problems and perversions to the rest of the net.
--
	Joel West	 	(619) 457-9681
	CACI, Inc. Federal, 3344 N. Torrey Pines Ct., La Jolla, CA 92037
	{cbosgd,ihnp4,pyramid,sdcsvax,ucla-cs}!gould9!joel
	gould9!joel@nosc.ARPA
jer@peora.UUCP (J. Eric Roskos) (02/06/86)
This is a continuation of my posting explaining the rationale behind "opath"'s route-generating scheme, <1954@peora>.

Underlying the approach used are a number of principles which are not documented in the program or manual pages. I have explained them in the past here, usually as they evolved, but not in summary.

The first is the design decision that there should be a distinct "routing language" separate from the RFC822 language. The term "language" here should be interpreted as the definition accepted in formal language theory, not as some loose, generic term.

The language for RFC822-compliant addresses is well-defined in RFC822. That is a superset of the language recognized by the routine opath(), since opath only recognizes the <addr-spec> from RFC822.

The language for the "UUCP routing language" is recognized by the routine oupath(), and has the syntax

	<u-addr> ::= <dest-spec>!<uninterpreted> | <dest-spec>

<dest-spec> consists of all strings of at most some finite length l from the 96-character printable ASCII set, except strings that contain "!" anywhere in the string.

<uninterpreted> consists of all strings from the 96-character printable ASCII set.

The value of "l" is defined in opath.c, but I increased the value subsequent to posting the source because someone (I think Chris Torek) pointed out that some site names in the UK were longer than the value of "l" I had specified. That is a sort of practical problem, the kind of thing standards are good for; the only thing limiting l is the fact that it takes a lot of space to store long strings.

The interpretation of <dest-spec> is a semantic issue that is particular to the site interpreting it. In other words, a given mail site may be of one of the three semantic classes Mark Horton has previously specified, reflecting the "smartness" of the mailer. It would be somewhat desirable if all sites interpreted <dest-spec> the same way, but they don't have to -- all that is required is that <dest-spec> not contain a "!".

Now, actually there is a "loophole" involved in the language that eliminates even this requirement. In our use of the routing language, <dest-spec> tells where to send the message to, giving that site the <uninterpreted> part of the routing string (having removed the <dest-spec> and the "!"); and it's not necessary that that site accept the same language we have used. This is why, for example, peora can take an address like

	...!pesnta!peora!csnet-gw!y!z@slowvax.csnet

and correctly transmit it to csnet-gw (where csnet-gw is a neighbor of ours who happens to run Sendmail and be a CSnet site), and have that site give @-precedence to the <uninterpreted> part, with the result that the message is delivered via the path we would express as

	csnet-gw!slowvax.csnet!y!z

It could, for example, be the case that csnet-gw is really a different mail system at peora, instead of a neighbor of ours. So this simple language also lets you express some constructs that are a little more complex than it seems, at first, you are allowed to express -- because the language does not provide a way for you to interpret <uninterpreted> at all, and so you can't get confused by it yourself, but can pass it to some other mailer that recognizes <uninterpreted> as a whole different language, and does something different with it.

This does, however, have its limitations, and that is why it would be best if all sites recognized the same language. That is also why the "imbedded domain" constructs like

	...!peora!slowvax.csnet!samsvax.arpa!sam

are needed. However, a fundamental principle in the design of this simple language *is* its simplicity -- it involves the idea of not adding any feature that you haven't found you actually need (which, if I were expounding upon Unix, I would say is the fundamental difference between the BSD Unix and System V) -- and thus the language at first may seem "too simple".

[Now it's time for another meeting, so I'll have to explain some more of the rationale some other time.]
--
UUCP: Ofc:  jer@peora.UUCP  Home: jer@jerpc.CCUR.UUCP  CCUR DNS: peora, pesnta
  US Mail:  MS 795; CONCURRENT Computer Corp. SDC; (A Perkin-Elmer Company)
            2486 Sand Lake Road, Orlando, FL 32809-7642     xxxxx4xxx
        "There are other places that are also the world's end ... But this
         is the nearest ... here and in England." -TSE
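A recognizer for the <u-addr> syntax given in the post above is short enough to sketch in Python. This is an illustration, not the oupath() routine itself: MAX_DEST_LEN stands in for the limit "l", whose actual value in opath.c the post does not state; "printable" is taken here to mean codes 0x20 through 0x7E; and rejecting an empty <dest-spec> is an added assumption.

```python
MAX_DEST_LEN = 64  # hypothetical stand-in for the limit "l"


def split_u_addr(addr, max_len=MAX_DEST_LEN):
    """Split a <u-addr> into (<dest-spec>, <uninterpreted>).

    Only the first "!" is significant. Everything after it is returned
    untouched, to be interpreted by the site named in <dest-spec>.
    <uninterpreted> is None when the address is a bare <dest-spec>.
    """
    if not all(0x20 <= ord(ch) <= 0x7E for ch in addr):
        raise ValueError("non-printable character in address")
    dest, bang, rest = addr.partition("!")
    if not dest:
        raise ValueError("empty <dest-spec>")
    if len(dest) > max_len:
        raise ValueError("<dest-spec> longer than the limit")
    return dest, (rest if bang else None)
```

For the example in the post, split_u_addr("csnet-gw!y!z@slowvax.csnet") yields "csnet-gw" and the string "y!z@slowvax.csnet"; the "@" is never examined at this level, which is the "loophole" the post describes.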
ulmo@well.UUCP (Brad Allen) (02/09/86)
In article <309@gould9.UUCP>, joel@gould9.UUCP (Joel West) writes:
> In article <1954@peora.UUCP>, jer@peora.UUCP (J. Eric Roskos) writes:
> > > Okay, first question. Given a path
> > >
> > > 	a!b!c!...!p!q!r!stuff@z
> > >
> > > which name(s) do we try to look up?
> >
> > My algorithm was to look up "a", iff "a" is not a neighbor.
> > "z" definitely should not be looked up; that is the essence of
> > what I've been arguing for for a long time now,
>
> although jer@peora.UUCP, like some other uucp-only sites, expresses
> a preference for ! precedence, many of us have to co-exist in
> the ARPA-mandated RFC-822 world, which requires "@" precedence.
>
> If someone wants to have "!" precedence for his personal use, fine.
> But anything that claims to be a generalized smart mailer must
> support "@" precedence, at least as an option. As <gnu@hoptoad.UUCP>
> remarked in an earlier message, learn from DEC and Sun and don't show
> your particular addressing problems and perversions to the rest
> of the net.
> --
> Joel West (619) 457-9681

I agree strongly with the last comment. @ should basically always take precedence, since most organized networks use the @ standard (exclusively?).

But while this standard isn't completely adopted, the pathalias files (or other nodelist files) kept by smart mailers could keep information about how each node prioritizes the @ and the !, and thus be able to know exactly how to create paths from that information.

Knowing this, hopefully most mail would arrive at any given host with a path that it can recognize the way it wants to.

{hplabs,dual,ptsfa,lll-crg}!well!ulmo
greg@ncr-sd.UUCP (Greg Noel) (02/11/86)
In article <400@ncr-sd.UUCP> greg@ncr-sd.UUCP (Greg Noel) writes:
>I think there should be a forum for the discussion of the algorithms, the
>computational model, and for the interchange of code. .....
>
>OK, domain mailer gurus, how about it? Would you participate in a mailing
>list to develop the underlying algorithms and thereby reduce the distance
>between your separate implementations?

I've gotten some replies to this; it seems that there is some interest in such a mailing list. I'm still collecting names; if anybody is interested in participating in such a list, drop me a line.

One of the respondents was J. Eric Roskos, who points out that the following could be interpreted as somewhat gratuitous:

>Oh, to answer some of your questions, smail has a much better nomenclature
>for describing the types of mail routing; most of your variations are
>included, as well as some others.

Well, I didn't intend it that way, but under the circumstances (there was a very pretty lady blowing in my ear so I was, um, under some pressure to finish the reply quickly) I was a little briefer than I could have been. In any event, I will expand a bit on how smail describes it.

Smail allows you to determine how each address is evaluated. (The following is an over-simplification, and uses the names from the Beta version -- the distribution names were cleaned up, presumably to protect the guilty.) There are four ways in which an address is considered; one at-based scheme and three bang-based schemes:

a. The at-based scheme is well defined in RFC-822, so I won't go into it. (Although it didn't handle @host,@host:user@host; I hope this was fixed in the production release.)

b. NORMAL bang-routing is just to pass the mail to the left-most host. If the left-most host is not directly connected, it is an error. Actual routing is performed only if the left-most host is a domain name (i.e., it has dots in it). You can consider this mode to be a replacement for rmail without being too far wrong.

c. PUSHY bang-routing routes (i.e., looks up the path) to the left-most host. The documentation claims that this is so you can impress your friends with how many sites you can reach.

d. BULLY bang-routing starts at the \right/-most host and works to the left. It routes to the first host it finds to which it can do PUSHY routing.

The choice of algorithm is done by flags, so hybrid addresses are treated by whatever is the dominant mode; smail doesn't try to make the decision. If the address mode is wrong (i.e., no @-signs in an address being at-routed), it will try the other mode. If that fails, it will route it to a local delivery agent. At-routing is essentially treated as a PUSHY bang-routed address.

If routing was done, the new full-path address is ROUTED, which is essentially the NORMAL bang-routing case above, and a delivery agent (uux or sendmail) is invoked to do the transmission to the next site. (The choice of delivery agent is wired in at compile time; it must be either uux or sendmail.)

The point of all this is that the nomenclature is independent of the particular implementation. In fact, I would argue that the algorithms implied above are tactical routines that all of the domain mailers use in one form or the other, and that the major difference between them is at the level of strategy -- which fragment to use in which order, and what kind of external hints can be used to better the routing process. I would like to see the development of a common set of routines shared between the various mailers/routers; then we could talk rationally about the actual differences in strategy without the obscuring cloud of implementation considerations.

Enuf soapboxing! Anybody else want to talk about it on a mailing list?
--
-- Greg Noel, NCR Rancho Bernardo    Greg@ncr-sd.UUCP or Greg@nosc.ARPA
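The three bang-routing modes described in the post above can be compared side by side in a small Python sketch. It follows the post's simplified description only and is not smail's code: the function name, the dict-based path database with "%s" templates, the neighbor set and the error handling are all assumptions made for the illustration.

```python
def route_bang(addr, mode, neighbors, db):
    """Rewrite a bang address under the NORMAL, PUSHY or BULLY mode.

    addr      -- e.g. "a!b!gould9!joel"; the last component is the user
    neighbors -- set of directly connected hosts
    db        -- hypothetical path database: host -> template with "%s"
    """
    parts = addr.split("!")
    hosts = parts[:-1]
    if not hosts:
        return addr  # no host part: leave it for local delivery

    def routed(i):
        # Drop hosts[0..i] and substitute the database route to hosts[i].
        return db[hosts[i]].replace("%s", "!".join(parts[i + 1:]), 1)

    if mode == "NORMAL":
        # Route only if the left-most host is a domain name (has dots).
        if "." in hosts[0] and hosts[0] in db:
            return routed(0)
        if hosts[0] in neighbors:
            return addr
        raise LookupError("left-most host is not directly connected")
    if mode == "PUSHY":
        # Look up the path to the left-most host.
        if hosts[0] in db:
            return routed(0)
        raise LookupError("no route to left-most host")
    if mode == "BULLY":
        # Work from the right-most host leftwards; take the first hit.
        for i in range(len(hosts) - 1, -1, -1):
            if hosts[i] in db:
                return routed(i)
        raise LookupError("no host in the path is known")
    raise ValueError("unknown mode: " + mode)
```

BULLY is the mode that "optimizes" an explicitly specified path: with a database entry for gould9, the address "a!b!gould9!joel" loses its "a!b!" prefix, which is exactly the behaviour the earlier posts argue about.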
jer@peora.UUCP (J. Eric Roskos) (02/11/86)
This is the third (and possibly last) posting in a series explaining "opath"'s routing scheme. The previous two articles are <1954@peora.UUCP> and <1958@peora.UUCP>. I have written these articles with little provocation (and probably with few readers, and even fewer who agree) because for the past year I have been persistent in expounding on the way I think the UUCP mail should work; I am not one to hold to a position in the face of widespread disagreement unless I have thought through my position carefully, and thus wanted to explain the rationale.

The final major tenet behind opath, which actually ties the previous two postings together, involves an abstract model for how the mail is delivered.

A user at some originating site o writes a message, which is a string of characters S. He intends to send it to some destination site d, where it will be read.

Hopefully, the format for messages will agree, possibly after some trivial transformation (e.g., converting LFs to CR/LFs) is made on the message, between the two sites. The extent to which they *have* to agree depends on the complexity of the programs originating and receiving them; this is a strong argument for simplicity of the programs (call them "mailers") at each end, but there are also a lot of beneficial things that such mailers can do (for example, automatically generating a reply) if a certain level of complexity is permitted.

Presently there are (at least) two major formats for messages. One, which is an extension of the original Unix mailers, treats a message as simply a string of text beginning with the characters "From ", with messages separated by a blank line (so that any message in a file other than the first will actually begin with the string "\nFrom ". It is because of this standard that many Unix mailers insert the character ">" at the start of any paragraph beginning with the word "From".). This is an old standard that is not very compatible with mailers elsewhere.

The second standard is RFC822. RFC822 is a standard for the format of mail messages, although it is usually discussed in net.mail in the context only of mail addresses.

In any case, if the originating and destination mailers agree on a message format, that should be sufficient; all that is necessary is to get the message there. To do this requires what we usually call, in here, a "transport mechanism".

Ideally, the transport mechanism should be entirely distinct from the mailers. It should be equally capable of sending arbitrary files between sites (e.g., binary object files) as mail messages. It shouldn't know and shouldn't be required to know either that it is transmitting a mail message, or what the format of the message is.

In the domain of well-defined networks, this is accomplished by defining a series of "layers"; messages consist of a block of data, whose meaning is unimportant to the software at a given layer, encapsulated in an "envelope" (whose meaning *is* important to the software at that layer) describing how to deliver the message, along with information for validating that nothing has been lost out of the message (e.g., a checksum, CRC, etc.). At the next higher layer, this envelope is itself treated as data, along with all the rest of the data for the message, and another envelope is put around that. The software at this next-higher layer doesn't even know where the envelope for the layer below it ends, and the message begins.

Since only the envelope is interpreted, the meaning of the data is unimportant, and no meaning is even defined for it. If the information in the envelope on routing is considered a language, then it is not necessary that the languages at two different layers be in any way compatible, as long as the integrity of the message/envelope distinction is not violated. This idea is fundamental to my arguments in favor of a distinct routing language for UUCP.

It is in fact the case that in System V UUCP, the transport mechanism can deliver arbitrary data files without awareness of their contents, across many "hops". In prior UUCPs, the transport mechanism could only deliver the message across one "hop", i.e., to a neighboring site, after which a program (rmail) had to be run to decide where to send the message next. This is where the trouble started, since it provided the potential for circumventing the distinction between the message and its envelope.

But, in fact, this was not done in standard Unix. A routing language as described previously was used; each rmail was given a string in the language, it took off the <nextsite> part, and delivered the message, along with an envelope consisting of the <uninterpreted> part, to the site named by <nextsite>. It also prepended a "routing stamp" to the front of the message it was delivering; although the routing string was in a separate file from the message and routing stamp, the routing string and routing stamp can be considered the envelope, as distinct from the message body.

In this way, the message can be delivered without tampering with the message body; and, as discussed in the previous posting on the routing language, the message can even be moved across transport mechanisms (e.g., between the ARPAnet and UUCP network) without problems, as long as the receiving transport mechanism accepts a string in the form of <uninterpreted> as an instruction on how to deliver the message.

The problem, and source of much debate and confusion, occurs when the envelope/message distinction is not maintained, however. This is especially easy to do when Sendmail is used to process the messages, since Sendmail provides nothing to prevent the combining of the two other than careful discipline. Fortunately, the "interpret the routing string in the context in which it was delivered" method provided by Gene Spafford does preserve that distinction.

In reality, of course, mailers do make changes to the message. The main change they make is to add lines telling how the message was delivered ("Received:" lines). Unfortunately, some also make other changes; I have argued in the past that this is a result of confusing the routing language with the language used to define the standard format for the message; i.e., making the assumption (which I have claimed is incorrect) that because the envelope uses one language, the message must also comply with it (or vice versa).

It is my contention, based on the model given above, that no such compliance is needed; and furthermore, that since the original Unix mailers had a very trivial definition of the structure of the message itself, the message can be made to comply with RFC822, and thus with other RFC822-compliant networks, without the great deal of confusion that now exists over how to do so, and without making the message (while in the domain of UUCP) non-compliant with RFC822.
--
UUCP: Ofc:  jer@peora.UUCP  Home: jer@jerpc.CCUR.UUCP  CCUR DNS: peora, pesnta
  US Mail:  MS 795; CONCURRENT Computer Corp. SDC; (A Perkin-Elmer Company)
            2486 Sand Lake Road, Orlando, FL 32809-7642     xxxxx4xxx
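The single-hop step described in the post above, where rmail strips the next site from the routing string and forwards the rest as the new envelope, can be sketched in Python. This is a model of the idea only, not rmail's code: the function name is made up, and the stamp text is a placeholder, since the post does not give the real format of the routing stamp.

```python
def rmail_hop(route, message, here, date):
    """Perform one hop of the model described in the post.

    route   -- routing string, e.g. "csnet-gw!y!z@slowvax.csnet"
    message -- the message as received (earlier stamps plus the body)
    here    -- name of the site doing the forwarding
    date    -- date string to put in the stamp

    Returns (nextsite, new_route, new_message). nextsite is None when
    the route names no further site and the message is delivered here.
    """
    nextsite, bang, uninterpreted = route.partition("!")
    if not bang:
        return None, route, message
    # Placeholder stamp format (an assumption). It goes in front of the
    # message; the text that was received is not modified in any way.
    stamp = "forwarded by %s on %s\n" % (here, date)
    return nextsite, uninterpreted, stamp + message
```

The point of the model shows up in the return value: the part of the route after the first "!" and the message text both pass through this step without being parsed, so the next site is free to read the remaining route in whatever language it likes.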