[net.mail.headers] ARPA to BITNET munging - example

zben@umd5.UUCP (09/10/85)

This is a somewhat contrived example of mailing messages through the ARPA
Internet and through a BitNet mail gateway.  My apologies for its length and
complexity, but there are some subtleties that I especially wanted to show.
To illustrate the processing required for addresses that appear in the SMTP
out-of-band (OOBS) information but not in the message header, I decided to
use a "BCC:" (blind carbon copy) address for the message, although the
implementation of BCC: is probably different everywhere.  This same case does
occur frequently in the real world, however.  For example, mail sent from a
"mail reflector" typically has a bundle of OOBS addresses that are not even
mentioned in the header.

Note also the rules for the BitNet gateway to which I talk and which is
assumed for this example:

The user mail agent (mail program) sends one copy of the message to the
mailer virtual machine on its host.  After syntax checking, that mailer in
turn sends one copy of the message to the mailer virtual machine at EACH
destination site.  If two or more users were named at the same site, only
ONE copy of the message is sent to the mailer virtual machine at that site.

The sending is point-to-point within BitNet and is mediated by the RSCS
system, atop which BitNet mail is built, in the same fashion that UUCP
mail is built atop the UUCP/UUX network.

When the mailer virtual machine at a site gets mail from a mailer virtual
machine at another site, it examines the HEADER fields (yuck!) and sends a
copy on to each local user named.  It utterly ignores addresses that name
other machines (as the originating mailer sent those separately).

This is why we carefully move addresses to an ALSO-TO: field below.  We do
not want to accidentally generate a duplicate copy of the message in the case
that it was "split up" by a dumb ARPANET mailer before it got to us.
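
The receiving rule above can be sketched in Python.  This is only an
illustration, not the BitNet mailer's code: the function name, the dictionary
layout, and above all the list of fields that the receiving mailer scans
(SCANNED_FIELDS) are assumptions of mine.  It shows why an address parked in
ALSO-TO: does not trigger a delivery.

```python
# Sketch only.  Which header fields the receiving mailer really examines
# is an assumption; the post says only that it reads "the HEADER fields".
SCANNED_FIELDS = ("TO", "CC")


def local_deliveries(header, my_site):
    """Return the local users named in the scanned fields of `header`.

    `header` maps an upper-case field name to a list of USER@SITE strings.
    Addresses that name other sites are ignored, as the rule requires.
    """
    users = []
    for field in SCANNED_FIELDS:
        for addr in header.get(field, []):
            user, _, site = addr.rpartition("@")
            if site.upper() == my_site.upper() and user not in users:
                users.append(user)
    return users


header = {
    "TO": ["MARY@SCHOOL.BITNET", "HERBOSS@SCHOOL.BITNET"],
    "ALSO-TO": ["BILL%OTHER.ARPA@HERE.BITNET"],
}
print(local_deliveries(header, "SCHOOL.BITNET"))   # ['MARY', 'HERBOSS']
print(local_deliveries(header, "HERE.BITNET"))     # []
```

Because ALSO-TO: is not among the scanned fields in this sketch, BILL is
never picked up for delivery by a BitNet mailer.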


Technical terms:

MUNGE   - The act of translation of the syntax and semantics of the fields of
          the headers of messages passed across domain boundaries.  The aim of
          this translation is to make the header comprehensible to mail agents
          in the new (destination) domain.

OOBS    - This refers to the SMTP out-of-band information that is NOT a part
          of the message header at all, but is passed from sending host to
          receiving host as part of the SMTP mail interaction.

OURNAME - Any of the various names for the site that is actually running this
          software.  This includes the primary (official) name and all defined
          alias or NIC names.

TO*     - Any of the header fields which carry "To:" semantic information.
          In my implementation these are: To:, CC:, BCC:, Resent-To:,
          Resent-CC:, and Resent-BCC: fields.

The example is the sending of a message from BEN at site ORIG to several
people on both ARPA and BitNet.  To set the stage, the hypothetical network
configuration is shown here in diagram form.  Just to make things a little
more complicated, we assume ORIG cannot talk directly to HERE, so several
addresses must be forwarded through host RELAY.


                                               ::
   (--------)    ---------                     ::           (---------)
   (  BILL  )----| OTHER |         ARPA        ::  BitNet   (  MARY   )
   (--------)    ---------        domain       ::  domain   (---------)
                     |                         ::                |
                     |                         ::                |
   (--------)    ---------     ---------    --------        -----------
   (  BEN   )----| ORIG  |-----| RELAY |----| HERE |--------| SCHOOL  |
   (--------)    ---------     ---------    --------        -----------
                     |                         ::                |
                     |                         ::                |
   (--------)    ---------                     ::           (---------)
   ( MYBOSS )----| ADMIN |                     ::           ( HERBOSS )
   (--------)    ---------                     ::           (---------)
                                               ::


         Figure 1: Hypothetical configuration for munging example

---------

Hypothetical commands to mail sending agent at ORIG.ARPA:

   send  BILL@OTHER.ARPA    @RELAY.ARPA,@HERE.ARPA:MARY@SCHOOL.BITNET
   ?bcc  MYBOSS@ADMIN.ARPA  @RELAY.ARPA,@HERE.ARPA:HERBOSS@SCHOOL.BITNET

Mail sending agent on ORIG.ARPA generates header:

FROM: <BEN@ORIG.ARPA>
TO:   <BILL@OTHER.ARPA>, <@RELAY.ARPA,@HERE.ARPA:MARY@SCHOOL.BITNET>

Out-of-band (OOBS) information generated as:

MAIL FROM: <BEN@ORIG.ARPA>
RCPT TO:   <BILL@OTHER.ARPA>
RCPT TO:   <MYBOSS@ADMIN.ARPA>
RCPT TO:   <@RELAY.ARPA,@HERE.ARPA:MARY@SCHOOL.BITNET>
RCPT TO:   <@RELAY.ARPA,@HERE.ARPA:HERBOSS@SCHOOL.BITNET>

---------

When message is passed from ORIG.ARPA to RELAY.ARPA, the header is untouched,
and the OOBS information is the subset for which RELAY.ARPA is responsible:

MAIL FROM: <BEN@ORIG.ARPA>
RCPT TO:   <@RELAY.ARPA,@HERE.ARPA:MARY@SCHOOL.BITNET>
RCPT TO:   <@RELAY.ARPA,@HERE.ARPA:HERBOSS@SCHOOL.BITNET>
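
One way to picture how ORIG.ARPA arrives at this subset is to group the
RCPT TO: paths by the first host each must be handed to.  The Python below
is a hedged sketch, not any real mailer's logic: next_hop and
envelopes_by_host are hypothetical names, angle brackets are assumed to be
stripped already, and every path is assumed to be either a plain USER@HOST
or an RFC 822 source route.

```python
def next_hop(path):
    """Return the host that a forward path must be handed to first."""
    if path.startswith("@") and ":" in path:
        route = path.split(":", 1)[0]        # '@RELAY.ARPA,@HERE.ARPA'
        return route.split(",")[0][1:]       # first hop, minus the '@'
    return path.rsplit("@", 1)[1]            # plain USER@HOST


def envelopes_by_host(rcpt_to):
    """Group forward paths by next hop: one SMTP transaction per host."""
    groups = {}
    for path in rcpt_to:
        groups.setdefault(next_hop(path).upper(), []).append(path)
    return groups


rcpts = [
    "BILL@OTHER.ARPA",
    "MYBOSS@ADMIN.ARPA",
    "@RELAY.ARPA,@HERE.ARPA:MARY@SCHOOL.BITNET",
    "@RELAY.ARPA,@HERE.ARPA:HERBOSS@SCHOOL.BITNET",
]
for host, paths in envelopes_by_host(rcpts).items():
    print(host, paths)
```

With the four recipients of the example, the RELAY.ARPA group holds exactly
the two source-routed paths shown above.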

---------

When message is passed from RELAY.ARPA to HERE.ARPA, the header is untouched,
and the OOBS information has been updated to add RELAY.ARPA to the back path,
and to remove RELAY.ARPA from the forward paths:

MAIL FROM: <@RELAY.ARPA:BEN@ORIG.ARPA>
RCPT TO:   <@HERE.ARPA:MARY@SCHOOL.BITNET>
RCPT TO:   <@HERE.ARPA:HERBOSS@SCHOOL.BITNET>
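
The relay's update of the OOBS information can be sketched as below.  This
is a minimal illustration under my own assumptions: parse_path, format_path
and relay_step are hypothetical helper names, angle brackets are assumed to
be stripped, and no error handling is attempted.

```python
def parse_path(path):
    """Split '@A,@B:user@host' into (['A', 'B'], 'user@host')."""
    if path.startswith("@") and ":" in path:
        route, mailbox = path.split(":", 1)
        return [hop.lstrip("@") for hop in route.split(",")], mailbox
    return [], path


def format_path(hops, mailbox):
    """Inverse of parse_path."""
    if not hops:
        return mailbox
    return ",".join("@" + hop for hop in hops) + ":" + mailbox


def relay_step(me, mail_from, rcpt_to):
    """Return the (MAIL FROM, RCPT TO list) that host `me` passes onward."""
    hops, mailbox = parse_path(mail_from)
    new_from = format_path([me] + hops, mailbox)    # add me to the back path
    new_rcpts = []
    for path in rcpt_to:
        hops, mailbox = parse_path(path)
        if hops and hops[0].upper() == me.upper():  # remove me from the
            hops = hops[1:]                         # forward path
        new_rcpts.append(format_path(hops, mailbox))
    return new_from, new_rcpts


print(relay_step(
    "RELAY.ARPA",
    "BEN@ORIG.ARPA",
    ["@RELAY.ARPA,@HERE.ARPA:MARY@SCHOOL.BITNET",
     "@RELAY.ARPA,@HERE.ARPA:HERBOSS@SCHOOL.BITNET"],
))
```

Fed the envelope that RELAY.ARPA received, this yields the MAIL FROM: and
RCPT TO: values listed above.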

---------

At HERE.ARPA the munging actually gets done.  Note that the header is
unchanged, but the OOBS information has been processed by the relay.
We extract the addresses from the header and correlate with the OOBS info:

       HEADER ADDRESSES

(1) BILL@OTHER.ARPA
(2) @RELAY.ARPA,@HERE.ARPA:MARY@SCHOOL.BITNET

       OOBS ADDRESSES

(A) @HERE.ARPA:MARY@SCHOOL.BITNET
(B) @HERE.ARPA:HERBOSS@SCHOOL.BITNET

       CORRELATIONS

               (1)    -    Header address not correlated with OOBS (case a)
        (A)<-->(2)    -    Correlation                             (case b)
        (B)           -    OOBS address not correlated with header (case c)

       ACTIONS

(case a): Any header TO* address that does NOT correlate with an OOBS address
          gets moved to a new ALSO-TO: field in the output header.  OURNAME
          is also added so that replies (if any) come back through here.

(case b): Any header TO* address that CORRELATES with an OOBS address is
          passed through UNCHANGED, except for removing OURNAME and rewriting
          to BitNet address syntax.

(case c): Any OOBS address that does NOT correlate with a header TO* field
          is MATERIALIZED in a new TO: field in the output header.  OURNAME
          is removed from address so BitNet node can recognize it.
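
The sorting of addresses into the three cases can be sketched as follows.
The correlation test itself is passed in as a parameter; the same_mailbox
predicate used in the demonstration is a deliberately crude stand-in of my
own (it compares only the final USER@HOST part) and is NOT the correlation
algorithm of this note.  All names here are hypothetical.

```python
def classify(header_addrs, oobs_addrs, correlates):
    """Return (case_a, case_b, case_c) lists for the three cases.

    case_a: header addresses with no correlating OOBS address
    case_b: header addresses that correlate with some OOBS address
    case_c: OOBS addresses that correlate with no header address
    """
    case_a, case_b, matched = [], [], set()
    for h in header_addrs:
        hits = [o for o in oobs_addrs if correlates(h, o)]
        if hits:
            case_b.append(h)
            matched.update(hits)
        else:
            case_a.append(h)
    case_c = [o for o in oobs_addrs if o not in matched]
    return case_a, case_b, case_c


def same_mailbox(header_addr, oobs_addr):
    # Stand-in predicate only: ignores the routes entirely.
    tail = lambda a: a.rsplit(":", 1)[-1].upper()
    return tail(header_addr) == tail(oobs_addr)


header_addrs = ["BILL@OTHER.ARPA",
                "@RELAY.ARPA,@HERE.ARPA:MARY@SCHOOL.BITNET"]
oobs_addrs = ["@HERE.ARPA:MARY@SCHOOL.BITNET",
              "@HERE.ARPA:HERBOSS@SCHOOL.BITNET"]
a, b, c = classify(header_addrs, oobs_addrs, same_mailbox)
print("ALSO-TO candidates:", a)
print("pass through:      ", b)
print("materialize in TO: ", c)
```

On the example data, BILL lands in case a, MARY in case b, and HERBOSS in
case c, matching the correlation table above.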

       OUTPUT HEADER                                                (note)

FROM:    <BEN%ORIG.ARPA%RELAY.ARPA@HERE.BITNET>                      (i)
TO:      <MARY@SCHOOL.BITNET>,                                       (ii)
         <HERBOSS@SCHOOL.BITNET>                                     (iii)
ALSO-TO: <BILL%OTHER.ARPA@HERE.BITNET>                               (iv)

       NOTES

(i)    FROM: field is recoding of OOBS MAIL FROM: information, with OURNAME
       added.  This gets advisories (if any) back to the original sender.
       Equivalent ARPA syntax: @HERE.BITNET,@RELAY.ARPA:BEN@ORIG.ARPA
       Equivalent UUCP syntax: HERE.BITNET!RELAY.ARPA!ORIG.ARPA!BEN

(ii)   According to (case b), MARY@SCHOOL.BITNET is passed through unchanged.
       OURNAME is NOT added to the address.

(iii)  According to (case c), HERBOSS@SCHOOL.BITNET is materialized into
       a TO: field in the header, and OURNAME is NOT added.

(iv)   According to (case a), BILL@OTHER.ARPA is moved to ALSO-TO: field.
       In this example, OURNAME is added, but I don't know what algorithm
       should determine this.  Suppose OTHER is a BitNet site???
       Equivalent ARPA syntax: @HERE.BITNET:BILL@OTHER.ARPA
       Equivalent UUCP syntax: HERE.BITNET!OTHER.ARPA!BILL
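
The three equivalent syntaxes quoted in notes (i) and (iv) are related
mechanically.  The sketch below converts an ARPA source route into the
"%" form and the UUCP bang form.  The function names are hypothetical, and
the code assumes a well-formed route whose mailbox contains exactly one
USER@HOST with no "%" already present in the user part.

```python
def route_chain(path):
    """'@A,@B:user@host' -> (['A', 'B', 'host'], 'user')."""
    if path.startswith("@") and ":" in path:
        route, mailbox = path.split(":", 1)
        hops = [hop.lstrip("@") for hop in route.split(",")]
    else:
        hops, mailbox = [], path
    user, host = mailbox.rsplit("@", 1)
    return hops + [host], user


def to_percent(path):
    """First hop goes after '@'; the rest follow the user, innermost first."""
    chain, user = route_chain(path)
    inner = list(reversed(chain[1:]))
    return "%".join([user] + inner) + "@" + chain[0]


def to_uucp(path):
    """Hosts in travel order, then the user, separated by '!'."""
    chain, user = route_chain(path)
    return "!".join(chain + [user])


for arpa in ("@HERE.BITNET,@RELAY.ARPA:BEN@ORIG.ARPA",
             "@HERE.BITNET:BILL@OTHER.ARPA"):
    print(to_percent(arpa), "   ", to_uucp(arpa))
```

Applied to the two ARPA-syntax strings in the notes, this reproduces the
FROM: and ALSO-TO: values of the output header and their UUCP equivalents.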

A word about correlation:  This example is carefully contrived to pass the
message through an ARPA relay site before munging to BitNet, thus highlighting
some of the differences in processing between the header addresses and the
OOBS addresses.  Note that in the above example we had to correlate the OOBS:

RCPT TO: <@HERE.ARPA:MARY@SCHOOL.BITNET>

with the header TO: field:

TO: <@RELAY.ARPA,@HERE.ARPA:MARY@SCHOOL.BITNET>

so we have to ignore the leading RELAY.ARPA in the TO* field.  At the very
least we have to ignore leading sites from the header address until we find
an instance of OURNAME.  The algorithm I use is a bit more general, and I
believe it will work even if the message is looped through the munge site
multiple times!  To wit:

Algorithm C: Test two address strings for correlation.

[C0] (initiation)
     Remove leading OURNAMEs from OOBS string.

[C1] (passive string search part 1)
     Attempt to extract one site from HEADER string.
     If no more sites, return UNCORRELATED, else goto C2.

[C2] (passive string search part 2)
     Look up the extracted site to see if it is an "OURNAME".
     If it is NOT "OURNAME" then goto C1 else goto C3.

[C3] (begin active string search)
     Remove leading OURNAMEs from HEADER string.
     Save current location in HEADER string scanning (in case the match fails).
     Begin scanning OOBS at first non-OURNAME.

[C4] (active string search part 1)
     Attempt to extract leading site from HEADER string.
     If successful goto C5 else goto C7.

[C5] (active string search part 2)
     Attempt to extract leading site from OOBS string.
     If successful goto C6 else goto C9.

[C6] (active string search part 3)
     If sites extracted from HEADER and OOBS string are the same, goto C4.
     If they are different sites, goto C9.

[C7] (End of HEADER string encountered)
     Attempt to extract leading site from OOBS string.
     If successful goto C9 else goto C8.

[C8] (Both strings exhausted except for USER parts)
     Compare user parts, if equal then return CORRELATED, else goto C9.

[C9] (partial failure)
     Restore scanning position in HEADER string saved in C3 and goto C1.
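
For readers who prefer code, here is one possible Python rendering of
Algorithm C.  It is a sketch of my reading of the steps, not the original
implementation.  Assumptions: addresses arrive without angle brackets and
always end in USER@HOST; the final HOST is treated as the last "site" of the
string; site names are compared textually without regard to case (no alias
lookup); user parts are compared exactly.  The names split_address and
correlated are hypothetical.

```python
def split_address(addr):
    """'@A,@B:user@host' -> (['A', 'B', 'host'], 'user')."""
    if addr.startswith("@") and ":" in addr:
        route, mailbox = addr.split(":", 1)
        sites = [s.strip().lstrip("@") for s in route.split(",")]
    else:
        sites, mailbox = [], addr
    user, host = mailbox.rsplit("@", 1)
    return sites + [host], user


def correlated(header, oobs, ournames):
    """Return True if the HEADER and OOBS address strings correlate."""
    ours = {name.upper() for name in ournames}
    h_sites, h_user = split_address(header)
    o_sites, o_user = split_address(oobs)

    # C0: remove leading OURNAMEs from the OOBS string.
    while o_sites and o_sites[0].upper() in ours:
        o_sites.pop(0)

    i = 0                                  # scan position in HEADER
    while i < len(h_sites):                # C1: extract one site
        site = h_sites[i]
        i += 1
        if site.upper() not in ours:       # C2: not OURNAME, keep looking
            continue
        # C3: skip further OURNAMEs, save position, restart the OOBS scan.
        while i < len(h_sites) and h_sites[i].upper() in ours:
            i += 1
        saved, j = i, 0
        # C4-C6: consume sites from both strings while they agree.
        while (i < len(h_sites) and j < len(o_sites)
               and h_sites[i].upper() == o_sites[j].upper()):
            i += 1
            j += 1
        # C7/C8: both site lists exhausted, so compare the user parts.
        if i == len(h_sites) and j == len(o_sites) and h_user == o_user:
            return True
        i = saved                          # C9: partial failure, back to C1
    return False                           # C1: no more sites


OURS = ["HERE.ARPA", "HERE.BITNET"]
print(correlated("@RELAY.ARPA,@HERE.ARPA:MARY@SCHOOL.BITNET",
                 "@HERE.ARPA:MARY@SCHOOL.BITNET", OURS))      # True
print(correlated("BILL@OTHER.ARPA",
                 "@HERE.ARPA:MARY@SCHOOL.BITNET", OURS))      # False
```

Each pass through the outer loop starts at or beyond the previously saved
position, so the scan always terminates, even when OURNAME appears in the
header string more than once.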

A word on step C6, above.  We want to detect that the site is the same
even if the two site names differ only by being different aliases for the
same site.  I accomplish this step by looking up both of them, and then
testing the ARPA site words (#,#,#,#) for equality.  If lookup fails for
one, but not the other, step C6 returns "not-equal".  If lookup fails for
BOTH site names, I compare the names textually.  This means that correlation
can continue beyond the names known to my domain, with only the loss of the
alias-detecting facility.
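
The comparison rule for step C6 can be sketched as below.  The host table
and its addresses are invented purely for illustration, and the names
HOST_TABLE, lookup and same_site are hypothetical.  Treating the textual
fallback as case-insensitive is also my assumption; the text above says
only that the names are compared textually.

```python
# Made-up table: site name -> ARPA address words (#,#,#,#).
HOST_TABLE = {
    "HERE.ARPA":   (10, 0, 0, 5),
    "HERE.BITNET": (10, 0, 0, 5),   # alias for the same site
    "RELAY.ARPA":  (10, 0, 0, 9),
}


def lookup(name):
    """Return the address words for `name`, or None if lookup fails."""
    return HOST_TABLE.get(name.upper())


def same_site(a, b, lookup=lookup):
    """Step C6: decide whether two site names denote the same site."""
    addr_a, addr_b = lookup(a), lookup(b)
    if addr_a is None and addr_b is None:
        return a.upper() == b.upper()    # both unknown: textual compare
    if addr_a is None or addr_b is None:
        return False                     # only one known: "not-equal"
    return addr_a == addr_b              # both known: compare addresses


print(same_site("HERE.ARPA", "HERE.BITNET"))       # True  (aliases)
print(same_site("HERE.ARPA", "SCHOOL.BITNET"))     # False (one lookup fails)
print(same_site("SCHOOL.BITNET", "school.bitnet")) # True  (textual)
```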

REFERENCES:

"Simple Mail Transfer Protocol",
     Jonathan B. Postel, ARPA RFC 821

"Standard for the Format of ARPA Internet Text Messages",
     David H. Crocker, ARPA RFC 822

"Proposed Standard for Message Header Munging",
     Marshall T. Rose, ARPA RFC 886

ACKNOWLEDGEMENTS:

The author would like to credit Hans Breitenlohner for his modifications to
the Sperry RTP system, which implement physical transfer for messages in a
reasonable way (i.e. *I* don't have to do the EBCDIC-ASCII translation :-),
and to credit Bruce Crabill with the modifications to IBM RSCS to allow the
Sperry mainframe (read Remote HASP Workstation) to participate in the RSCS
network.

-zben
Ben Cranston <ZBEN@UMD2.ARPA>
             <ZBEN@UMDC.BITNET>
             <ZBEN@UMD5.UUCP>
-- 
Ben Cranston  ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben  zben@umd2.ARPA